I'm creating a php based web application which allows the user to upload a PDF file. This file will then be read and checked for certain data (text).
The problem is I can't figure out how to even open a PDF file in PHP. There are some PDF libraries mainly for creating PDF's, but they don't seem to be very good at reading them.
An alternative solution would be to use an already available solution in Python or something else (as described in other threads on this site) but I'd really like to stay as much as possible in PHP as I intend to later export the data to mysql, etc.
Any input on how to read a PDF and extract data from it would be much appreciated.
I personally haven't tried this out, but it looks like this one works: http://www.pdfparser.org/documentation
It's just a matter of downloading and telling your code to include it, just like the documentation shows.
Or you could try the class.pdf2text.php found in http://www.phpclasses.org/browse/file/31030.html
I have been scouting the web for a pdf editing tool for quite some time now. And it would therefore be nice with some suggestions/recommendations to the problem.
I have read a bunch of other topics around StackOverflow, but havent quite been able to find a full solution yet.
The case: I have an application written in php/javascript on a linux server and recently it has become a requirement for my customers that they are able to edit Pdf documents direct in the browser. The functions i need are primarily the ability to make annotations and to draw on the PDF. That means. They load an already uploaded PDF-document and edits it and then saves it in PDF-format again. All done within the browser.
The second requirement is that the program must save pdf document in PDF-format again, since i have an Ipad app with all of this functionality, and i need them to play nicely together. It is therefore not an option to save an image of html of something like that.
I read a comment suggesting the Zend framework, and it did sound quite useful. However i have developed my own platform from the ground, and therefore my second question is, if it is possible to embed only the PDF-tool from Zend or something?
Thank you in advance.
ps. If i missed a simular subject that answers exactly this, please let me know and i'll delete again.
There are several libraries that can be used to edit PDF-files. To name a few: Tcpdf, fpdf and Zend_Pdf.
The later can be used even without the complete ZendFramework. But be cautious: Currently it only allows editing of PDF-Files up to version 1.4.
If you need speed you should also have a look at the PDFlib which could be tweaked to support your usecase
I've used a couple of days to think of a best practice to generate a PDF, which end users can customize the layout for themselves. The PDF output needs to be saved on the server or sent back to the PHP file so the PHP file can save it, and the PHP file needs to know that it went OK.
I thought the best way to do this was to use XML, XSLT and Apache Cocoon. But I'm not sure if this is possible or if it's a good idea since I can't find any information of people doing anything similar. It cannot be an uncommon problem.
The idea came when I read about Cocoon converting XML through XSLT to PDF:
http://cocoon.apache.org/2.1/howto/howto-html-pdf-publishing.html
and being able to take in variables:
http://old.nabble.com/how-to-access-post-parameters-from-sitemap-td31478752.html
This is what I had in mind:
A php file gets called by a user, the php file generates a source XML file with a specific name
The php file then makes a request to Cocoon (on the same web server) to apply the user defined XSLT on the XML file. A parameter will be needed here to know which XSLT to apply.
The request is handled by the PHP file and then saved as a PDF on the server, and can later be mailed away.
Will this work at all? Is there a better way to handle this?
The core problem is that the users need to be able to customize the layout on the PDFs themselves, and I need the server to save the PDF and to mail it later on. The users will use it for order confirmations, invoices, etc. And I wouldn't like to hard code the layout for each user.
I've had some good results in the past by setting up JasperReports Server and creating reports using iReport Designer. They're both available in F/OSS ("community") editions, though you can pay for support and value-adds if you need those things.
This was a good solution for us, since we could access it via the Java API for our Java system, and via SOAP for our PHP system. The GUI designer made tweaking reports very easy for non-technical business staff too.
I use webkithtml2pdf to generate my PDF:s. Just create a document with HTML and CSS for printing like you would usually do, the run it through the converter.
It works great for generating things like invoices. You can use SVG for logos and illustrations, and they will look great in print since they are vector based. Even rounded corners with dotted outlines works perfectly.
A minor gotcha is that the input html must have th htm or html file name suffix, so you can't use the default tempfile functions.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
Improve this question
Is there any PHP PDF library that can replace placeholder variables in an existing PDF, ODT or DOCX document, and generate a PDF file as the end result, without screwing up the layout?
Requirements:
Needs no 3rd party web service
Ability to run on shared web hosting would be ideal (no binary installations / packages required)
Mind you, a library that is able to load an existing PDF file and insert text programmatically at a specific position is not enough for my use case.
As far as my research shows, there is no library that can do this:
TCPDF can only generate documents from scratch
FPDI can read existing PDF templates, but can only add contents programmatically (no template variable replacement)
There are various DOCX/ODT template libraries out there but they don't output PDF
PHPDOCx claims to be able to do exactly what I need - but they don't offer a trial version and I'm not going to buy a cat in a bag, especially not when there seems to be no other product on the web that does this. I find it hard to believe they can do this without problems - if you have successfully done this using the product, please drop a line here.
Am I overlooking something?
Is there a way to do this using PDF forms? I am creating the source documents in OpenOffice 3.
I may be able to use standard Linux commands (pdftk is available for example, trying that out right now.)
Update: *Argh!* I was called out of the office and the bounty expired in the meantime. Starting a new bounty: As far as my testing shows, no solution works for me perfectly yet.
Update II: I will be looking the pdftk approach soon, but I am also starting another bounty for one more round of collecting additional input. This question has now seen 1300 rep points in bounties, must be some kind of a record :)
This is not very practical, but for completeness: If you already have an ODT template, then you might very well retain that as template. Modifying the OpenDocument content.xml and replacing placeholders therein is pretty simple. If so, you could use unoconv or pyodconverter to transform the ODT into a final PDF.
unoconv -f pdf -o final.pdf template.odt
Very obviously this requires a full OpenOffice setup (UNO and Writer) on the webserver. And obviously not every webhoster would go with that! haha. Even if it's simple on any Debian or Fedora setup. The execution speed would probably not be stellar either. But then it might be the cleanest approach, since OOo governs both formats way better than any PHP class ever could.
Pekka,
I looked in to this previously, I think you can use pdftk (a command line utility), to fill in a PDF form using FDF/XFDF data files, which you could easily generate from within PHP. That was the best option I've seen so far, though there may well be a native library.
pdftk is quite useful in general, worth having a look at.
Update: Have a look here: http://php.net/manual/en/book.fdf.php
Have you considered using something like XSL:Formatting Objects (XSL:FO)? Basically they're XML documents that are processed and turned into PDFs. Doing string - or better, DOM - replacements within that should be pretty simple. It supports embedding images, links, annotations, etc.
It's not PHP but there are a number of PHP wrappers for it along with ways of using it via exec, etc. Not an ideal but it takes care of the template portion completely. For some more info: http://techportal.inviqa.com/2009/12/16/transforming-xml-with-php-and-xsl/
There's an implementation available as an Apache project - http://xmlgraphics.apache.org/fop/
fpdf and there is another extention on top of it, which I can't remember, which allows you to import templates
Your best bet would be to generate the entire document on the fly, with the template defined programatically using fpdf or something similar. That way, your text will not be cut off by paragraphs or anything like that, and you can easily position images/other elements as required.
Late, but you can use OpenSource template designer https://github.com/applicius/dhek/releases , to define pkaceholders/areas over any existing PDF, then load it in PHP (as it's JSON format) and write accordingly on original PDF using fpdf lib, to generate custom PDF with dynamic data written on.
Altough not exactly thing you asked, you may consider to make it at two steps: using some php templating sytem (smarty, dwoo) to generate html page and then using tools like Html2Pdf convert it to pdf. I am using it, and results are good (no problems with page layout etc)
Of course it depends of your input documents (can you use html instead of PDF/ ODT as source ) and complexity of the layout of those.
Ok I'm trying to help you solve the problem a little.
First the answer for couple of your question.
Q - Am I overlooking something?
A - No. There is a PHP PDF library that can replace placeholder variables in an existing PDF and generate a PDF file as the end result, without screwing up the layout
Q - Is there a way to do this using PDF forms?
A - Yes. absolutelly the tric to doing this is by using a PDF Forms
For both answer you can use Justin Koivisto fill pdf form field php library.
For more detail you please go to http://koivi.com/fill-pdf-form-fields/tutorial.php.
Take a look there for additional information.
Credit to Justin Koivisto for his work
P.S
For workaround for displaying a table like output from pdf form
please consider to take some reading on Oracle Business Intelligence Publisher User's Guide - Creating a PDF Template
I'll add this new answer since the FDF PHP extension is now dead.
I've just followed these instructions and ended up executing one perl script then the pdftk command
I'm pretty aware it's far from being a real PHP solution but it's reliable and fairly easy to implement on any *nix platform.
The tools described there are also available on Debian, just in case you were wondering.
It's a litte bit late but have a look at the PDFTemplate Library it does exatly what you want. You can create Open Document files (odt) and add placeholders in it. The PDFTemplate library can fill out these placeholders (even with images) and create a PDF file.
ODT Files with placeholders to PDF
First of all my apologies to all the people who think this question is a repeated one or they find a similar question to this.
I am working on a project in which I have an online form and some PDFs stored on the server.
Functionality
On the submit action I have to get the data from the form, fill it to the copy of PDF and finally download it.
Approach
I followed these steps to achieve this functionality:
Converted the pdfs to html with this http://www.pdfdownload.org/free-pdf-to-html.aspx online tool.
Embedded the html with form variables and regenerated the PDFs with this library / dompdf library.
Problem
The approach is a brute force one as the html generated are far away from the real ones. So lot of effort is wasted in adjusting the html.
The process is so slow and not reliable as most of the time I get memory error or some other issues.
I need to to automate this process. What I have found through searching is I should create an FDF file that contains my variable and pass it to the PDF using some library and then download it.
I am able to create the FDF file but missing any library in PHP (I found one in JAVA) that I can use to create the PDF and download it. One library that I found is pdf tool kit but that is a command line tool and I am not able to use it on the server at run time and download the PDF file.
Anybody having done this before please help.
(Sorry for this long post)
Thanks,
Madhup
Check out FPDI. It allows you to load some existing PDF, draw on it programatically, and output a new PDF. Which, if I read your question right, is what you're trying to do.
There's some example code here.