I'm creating a PHP-based web application which allows the user to upload a PDF file. This file will then be read and checked for certain data (text).
The problem is I can't figure out how to even open a PDF file in PHP. There are some PDF libraries, mainly for creating PDFs, but they don't seem to be very good at reading them.
An alternative would be to use an existing solution in Python or something else (as described in other threads on this site), but I'd really like to stay in PHP as much as possible, as I intend to export the data to MySQL later, etc.
Any input on how to read a PDF and extract data from it would be much appreciated.
I personally haven't tried this out, but it looks like this one works: http://www.pdfparser.org/documentation
It's just a matter of downloading and telling your code to include it, just like the documentation shows.
Or you could try the class.pdf2text.php found at http://www.phpclasses.org/browse/file/31030.html
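For what it's worth, the library behind the first link is, as far as I can tell, the one published on Composer as smalot/pdfparser. A minimal sketch of the "read the upload and check it for certain text" step might look like this, assuming that package is installed and autoloaded. findTerms() and the search terms are hypothetical names for illustration, and a scanned (image-only) PDF would yield no text without OCR.

```php
<?php
// Sketch only: assumes `composer require smalot/pdfparser` and that
// vendor/autoload.php has been required before extractPdfText() is called.

/** Return the plain text of a PDF using smalot/pdfparser. */
function extractPdfText(string $path): string
{
    $parser = new \Smalot\PdfParser\Parser();
    return $parser->parseFile($path)->getText();
}

/**
 * Hypothetical helper: report which search terms occur in the text
 * (case-insensitive). Returns a map of term => bool.
 */
function findTerms(string $text, array $terms): array
{
    $found = [];
    foreach ($terms as $term) {
        $found[$term] = stripos($text, $term) !== false;
    }
    return $found;
}
```

After validating the upload, usage would be along the lines of findTerms(extractPdfText($_FILES['pdf']['tmp_name']), ['Invoice', 'Total']), where the form field name 'pdf' is again just an assumption.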
Related
A client has given me the task of creating a site that can convert their file uploads into HTML or PDF for storage on the web server. I want them to be able to upload files (.doc, .tiff, .jpg, etc.) and have the site convert them on the fly into either HTML or PDF.
I am open to software and APIs that do the trick, but the file MUST BE STORED ON THE CLIENT'S WEB SERVER after conversion. The client is using GoDaddy with SSL, if that helps. Any input is greatly appreciated, as I have been looking for a long-term solution to this problem that I will be able to use in future projects.
Things I have looked into but have had trouble using this way: Scribd, the OpenOffice API
Places I've found the most help so far here
Well... as for storing the file as HTML: when images are uploaded, all you need to do is store the file somewhere and then create an HTML file that looks something like this:
<html>
<body>
<img src="path/to/the/image/file.png" />
</body>
</html>
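In PHP, generating that wrapper could be sketched as follows. imageWrapperHtml() is a hypothetical helper name; the path is escaped so that unusual file names cannot break out of the src attribute.

```php
<?php
/**
 * Build the wrapper page for a stored image. $src is the public path
 * of the uploaded file; it is HTML-escaped before being embedded.
 */
function imageWrapperHtml(string $src): string
{
    $safe = htmlspecialchars($src, ENT_QUOTES, 'UTF-8');
    return "<html>\n<body>\n<img src=\"{$safe}\" />\n</body>\n</html>\n";
}
```

The result could then be written next to the image with something like file_put_contents('uploads/scan.html', imageWrapperHtml('uploads/scan.png')), where both paths are placeholders.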
It might be worth converting the large files (especially TIFF) to another format. Converting .doc files might be a little trickier. Have a look here: Convert .doc to html in php
Maybe also take a look at the Document zetaComponent, which is able to convert between different document types, although not all of those you mentioned are supported so far.
Creating a PDF should be almost as easy, as there are several libraries for PHP that can help you. Just poke around on SO: Convert HTML + CSS to PDF with PHP?
Overall you will have to mix up a whole lot of stuff to get that job done. There is no "simple" solution to this.
We have a high-resolution PDF (for printing) which has some form fields on it. We would like to have an HTML form which submits to the PDF, which is then placed into the respective fields.
I found a solution on Google: http://koivi.com/fill-pdf-form-fields/
However, with that solution you only get an FDF file... And the demo does not work for me, opening the FDF file simply downloads another FDF file.
Since this PDF will be available to the public we would like to keep it as simple as possible. If we must open our original PDF and import this FDF file, we need a different solution (which I'm not sure is what the FDF file is for, since it didn't work).
A related post about the .NET Framework had the same idea, but there were only paid commercial solutions: From HTML form to PDF
The PHP solutions I have found so far are for creating a new PDF, which is not what I need. Our PDF is created with Adobe Illustrator (or a similar Adobe product) and is high-res with embedded fonts, SVG and image content.
The form elements are in place, we just need to get the data to there.
Update April 11, 2013:
Since posting this question I have been utilizing FPDF on multiple projects where I needed to accomplish this goal. Although it cannot seem to "merge" template PDFs with the provided data, it can create the PDF from scratch.
In one example, I had a high-resolution PNG for printing (similar to the initial question) on which we had to write the customer's name and today's date clearly in the center. I simply set the background of the PDF using FPDF->Image() and wrote the text afterwards using FPDF->Text().
It was very simple after all. You will need to look up the paper sizes to determine the X, Y, W, H of the image and then position your text fields relative to those numbers.
There was even a Form Filling extension, but I couldn't get it to work.
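The background-plus-text approach from this update can be sketched roughly as below. This is not the exact code from those projects: it assumes FPDF 1.8 or later is loaded, an A4 portrait page (210 x 297 mm), and placeholder file names, font and coordinates. centeredX() and makeCertificate() are hypothetical helper names.

```php
<?php
// Sketch only: assumes FPDF is available (e.g. require 'fpdf.php';).
// File names, font and coordinates are placeholders.

/** X offset (same unit as the page) that centers text of a given width. */
function centeredX(float $pageWidth, float $textWidth): float
{
    return ($pageWidth - $textWidth) / 2;
}

function makeCertificate(string $backgroundPng, string $name, string $outFile): void
{
    $pdf = new FPDF('P', 'mm', 'A4');            // A4 portrait is 210 x 297 mm
    $pdf->AddPage();
    $pdf->Image($backgroundPng, 0, 0, 210, 297); // stretch image over the page
    $pdf->SetFont('Arial', 'B', 24);
    $x = centeredX(210.0, $pdf->GetStringWidth($name));
    $pdf->Text($x, 148.5, $name);                // y is the text baseline
    $pdf->Output('F', $outFile);                 // FPDF 1.8+ argument order
}
```

GetStringWidth() has to be called after SetFont(), since the width depends on the selected font and size.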
It seems as though I should answer my own question, although Vision's answer may be better (seems to be deleted?). I used Vasiliy Faronov's link, which was posted as a comment on my main question: https://stackoverflow.com/a/1890835/200445
Here I found how to install pdftk and run a command to merge (flatten) my FDF and PDF files. I still used the "hacky" way to generate an FDF using Koivi's FDF Generator but it works for the most part.
One caveat is that some characters, like single and double quotes, are not inserted correctly. It may be an issue of escaping the field values, but I could not find an answer.
Regardless, my PDF form generator is working, but anyone with a similar issue should look for a better solution.
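For anyone following the same route, here is a minimal sketch of generating the FDF by hand and building the pdftk call. It is not Koivi's generator; fdfEscape(), buildFdf() and pdftkFillCommand() are hypothetical names. In a PDF literal string only the backslash and parentheses need escaping, so plain ASCII quotes should pass through as they are. If typographic ("curly") quotes are involved, the problem may be character encoding rather than escaping, which this sketch does not address.

```php
<?php
/** Escape a value for a PDF literal string: backslash and parentheses. */
function fdfEscape(string $value): string
{
    return strtr($value, ['\\' => '\\\\', '(' => '\\(', ')' => '\\)']);
}

/** Build a minimal FDF document from a field-name => value map. */
function buildFdf(array $fields): string
{
    $out = "%FDF-1.2\n1 0 obj\n<< /FDF << /Fields [\n";
    foreach ($fields as $name => $value) {
        $out .= '<< /T (' . fdfEscape((string) $name) . ') /V ('
              . fdfEscape((string) $value) . ") >>\n";
    }
    return $out . "] >> >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n";
}

/** Shell command that merges the FDF into the form and flattens it. */
function pdftkFillCommand(string $form, string $fdf, string $out): string
{
    return sprintf(
        'pdftk %s fill_form %s output %s flatten',
        escapeshellarg($form),
        escapeshellarg($fdf),
        escapeshellarg($out)
    );
}
```

The FDF string would be written to a file with file_put_contents() and the command run with exec(), checking the exit code to see whether pdftk succeeded.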
There are a number of free tools, such as iTextSharp. Try the following: https://web.archive.org/web/20211020001747/https://www.4guysfromrolla.com/articles/030211-1.aspx I hope this code helps; I have tried it and it worked for me. If you can pay, there are also a number of commercial tools that convert HTML to PDF, such as ABCPDF. This example is in ASP.NET, but I am sure that if you can port it to PHP it will work for you too.
I've spent a couple of days thinking about a best practice for generating a PDF whose layout end users can customize themselves. The PDF output needs to be saved on the server or sent back to the PHP file so the PHP file can save it, and the PHP file needs to know that it went OK.
I thought the best way to do this was to use XML, XSLT and Apache Cocoon. But I'm not sure if this is possible or if it's a good idea, since I can't find any information about people doing anything similar. It cannot be an uncommon problem.
The idea came when I read about Cocoon converting XML through XSLT to PDF:
http://cocoon.apache.org/2.1/howto/howto-html-pdf-publishing.html
and being able to take in variables:
http://old.nabble.com/how-to-access-post-parameters-from-sitemap-td31478752.html
This is what I had in mind:
A PHP file gets called by a user, and the PHP file generates a source XML file with a specific name.
The PHP file then makes a request to Cocoon (on the same web server) to apply the user-defined XSLT to the XML file. A parameter will be needed here to know which XSLT to apply.
The response is handled by the PHP file and then saved as a PDF on the server, and can later be mailed away.
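The PHP side of those steps could be sketched like this. Everything Cocoon-specific here is an assumption: the pipeline URL and the "xml"/"xslt" parameter names are hypothetical and would have to match whatever the sitemap defines. Checking that the body starts with "%PDF-" is one simple way for the PHP file to know that it went OK.

```php
<?php
// Sketch only: the Cocoon URL, the "xml"/"xslt" parameter names and the
// file paths are hypothetical; the sitemap would have to define them.

/** Build the request URL for a Cocoon pipeline that returns a PDF. */
function cocoonUrl(string $base, string $xmlName, string $xsltName): string
{
    return $base . '?' . http_build_query(['xml' => $xmlName, 'xslt' => $xsltName]);
}

/** True if the response body looks like a PDF (starts with "%PDF-"). */
function looksLikePdf(string $body): bool
{
    return strncmp($body, '%PDF-', 5) === 0;
}

/** Fetch the rendered PDF and save it; returns false on any failure. */
function renderAndSave(string $url, string $target): bool
{
    $body = @file_get_contents($url);
    if ($body === false || !looksLikePdf($body)) {
        return false;
    }
    return file_put_contents($target, $body) !== false;
}
```

file_get_contents() is used for brevity; cURL would give access to the HTTP status code and timeouts as well.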
Will this work at all? Is there a better way to handle this?
The core problem is that the users need to be able to customize the layout on the PDFs themselves, and I need the server to save the PDF and to mail it later on. The users will use it for order confirmations, invoices, etc. And I wouldn't like to hard code the layout for each user.
I've had some good results in the past by setting up JasperReports Server and creating reports using iReport Designer. They're both available in F/OSS ("community") editions, though you can pay for support and value-adds if you need those things.
This was a good solution for us, since we could access it via the Java API for our Java system, and via SOAP for our PHP system. The GUI designer made tweaking reports very easy for non-technical business staff too.
I use webkithtml2pdf to generate my PDFs. Just create a document with HTML and CSS for printing like you usually would, then run it through the converter.
It works great for generating things like invoices. You can use SVG for logos and illustrations, and they will look great in print since they are vector-based. Even rounded corners with dotted outlines work perfectly.
A minor gotcha is that the input HTML file must have the .htm or .html file name suffix, so you can't use the default tempfile functions.
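One way around the suffix gotcha, as a rough sketch: create the temp file with tempnam() and rename it so that it ends in ".html". htmlTempFile() and convertCommand() are hypothetical helper names, and the "binary input.html output.pdf" argument order is an assumption (it is how wkhtmltopdf is invoked), so adjust it to the converter actually installed.

```php
<?php
/**
 * Write HTML to a temp file whose name ends in ".html".
 * tempnam() cannot add a suffix, so the file is renamed afterwards.
 */
function htmlTempFile(string $html): string
{
    $tmp = tempnam(sys_get_temp_dir(), 'pdf');
    if ($tmp === false) {
        throw new RuntimeException('Could not create a temp file');
    }
    $path = $tmp . '.html';
    if (!rename($tmp, $path)) {
        throw new RuntimeException('Could not rename temp file');
    }
    file_put_contents($path, $html);
    return $path;
}

/** Command line for a converter that takes "<input.html> <output.pdf>". */
function convertCommand(string $binary, string $htmlPath, string $pdfPath): string
{
    return escapeshellcmd($binary) . ' ' . escapeshellarg($htmlPath)
         . ' ' . escapeshellarg($pdfPath);
}
```

Remember to unlink() the temp file once the converter has finished.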
I want to know if it is possible to view a PDF file in a webapp of some kind, then be able to trace over lines in the PDF file and submit the lengths to a MySQL database.
Basically import a PDF file, trace around a shape in that PDF file and save those values.
I work with House plans and this would make my job a lot easier.
Is it possible? And can someone point me in the right direction on where to read more?
I know it's possible to pull text from a PDF file; I've seen heaps of info on how to do that using various libraries, but nothing on a vector shape (house plan).
If it's not possible in PHP, which language should I look into? The only reason I ask is that I have basic skills in PHP and have already written some simple apps that help me out; this would just be the final touch to make my job easier.
Thanks
Parsing PDFs is a world of pain... And PHP is probably as good a language to jump into this as any other.
I would recommend starting your research with Ghostscript; it has a command-line interface which can be called from PHP.
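As a starting point, and only as a sketch, Ghostscript can render a page of the plan to a PNG that a web page could display for tracing. The tracing itself would happen in the browser and is not shown here. gsRenderCommand() and pixelsToMm() are hypothetical helper names; the gs switches used are standard ones.

```php
<?php
/** Build a Ghostscript command that renders one PDF page to a PNG. */
function gsRenderCommand(string $pdf, string $png, int $page = 1, int $dpi = 150): string
{
    return sprintf(
        'gs -q -dNOPAUSE -dBATCH -sDEVICE=png16m -r%d -dFirstPage=%d -dLastPage=%d -sOutputFile=%s %s',
        $dpi, $page, $page, escapeshellarg($png), escapeshellarg($pdf)
    );
}

/** Convert a length measured in pixels on that PNG to millimetres on paper. */
function pixelsToMm(float $pixels, int $dpi = 150): float
{
    return $pixels / $dpi * 25.4;
}
```

The millimetres are distances on the printed sheet; to get real-world lengths they still have to be multiplied by the plan's scale (for example 100 for a 1:100 drawing).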
First of all my apologies to all the people who think this question is a repeated one or they find a similar question to this.
I am working on a project in which I have an online form and some PDFs stored on the server.
Functionality
On the submit action I have to get the data from the form, fill it to the copy of PDF and finally download it.
Approach
I followed these steps to achieve this functionality:
Converted the pdfs to html with this http://www.pdfdownload.org/free-pdf-to-html.aspx online tool.
Embedded the html with form variables and regenerated the PDFs with this library / dompdf library.
Problem
The approach is a brute-force one, as the generated HTML is far from the original PDFs, so a lot of effort is wasted in adjusting the HTML.
The process is slow and unreliable, as most of the time I get a memory error or some other issue.
I need to automate this process. What I have found through searching is that I should create an FDF file that contains my variables, pass it to the PDF using some library, and then download the result.
I am able to create the FDF file, but I am missing a library in PHP (I found one in Java) that I can use to create the PDF and download it. One library that I found is pdf tool kit, but that is a command-line tool and I am not able to use it on the server at run time to produce and download the PDF file.
Anybody having done this before please help.
(Sorry for this long post)
Thanks,
Madhup
Check out FPDI. It allows you to load an existing PDF, draw on it programmatically, and output a new PDF, which, if I read your question right, is what you're trying to do.
There's some example code here.
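In case that link is unavailable, a rough sketch of the idea follows, assuming FPDI 2.x together with FPDF, installed through Composer and autoloaded. Note that this draws text on top of the imported page at fixed coordinates; it does not fill the PDF's own form fields. layoutFields() and stampPdf() are hypothetical helper names, and the field names and coordinates are placeholders.

```php
<?php
// Sketch only: assumes FPDI 2.x with FPDF, installed via Composer
// (setasign/fpdi, setasign/fpdf) and autoloaded. Field names and
// coordinates are hypothetical.

/** Pair submitted values with page positions; unknown fields are skipped. */
function layoutFields(array $values, array $positions): array
{
    $placed = [];
    foreach ($positions as $field => [$x, $y]) {
        if (isset($values[$field])) {
            $placed[] = [$x, $y, (string) $values[$field]];
        }
    }
    return $placed;
}

function stampPdf(string $template, string $outFile, array $placed): void
{
    $pdf = new \setasign\Fpdi\Fpdi();
    $pdf->setSourceFile($template);
    $page = $pdf->importPage(1);
    $pdf->AddPage();
    $pdf->useTemplate($page);          // original page as background
    $pdf->SetFont('Helvetica', '', 11);
    foreach ($placed as [$x, $y, $text]) {
        $pdf->Text($x, $y, $text);     // coordinates in mm by default
    }
    $pdf->Output('F', $outFile);
}
```

Usage would be something like stampPdf('form.pdf', 'filled.pdf', layoutFields($_POST, ['name' => [20, 40], 'date' => [20, 50]])). AddPage() defaults to A4, so a template with a different page size would need the size passed explicitly.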