Get list of text-fields within PDF using PHP - php

I have tried googling this but can't seem to find an answer.
I have a client with hundreds of PDF files, each containing a form with an unknown number of text fields, anywhere from 2 to 30.
What I need to do is read each PDF file and find all of the text fields (including their names) so that I can dynamically generate an HTML form for a user to complete, which then populates the PDF form.
How can I get a list of text fields and their names within a PDF document using PHP?

There are few tools to work on PDFs that are natively written in PHP.
You might however be able to run the pdftk binary to accomplish this task:
$fields = `pdftk input.pdf dump_data_fields`;
I haven't tried this since I don't have a PDF with forms handy, but supposedly you get a textual list of fields as described here: http://www.cs.unb.ca/~bremner/blog/posts/filling_in_forms_with_pdftk/
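To turn that textual result into something usable, a minimal parsing sketch follows. It assumes the dump consists of records separated by "---" lines, each made of "Key: value" lines such as FieldType and FieldName. The function names parsePdftkFields and textFieldNames and the sample field names are made up for illustration.

```php
<?php
// Parse the text printed by `pdftk input.pdf dump_data_fields`.
// Assumption: records are separated by lines containing only "---",
// and every other line has the shape "Key: value".
function parsePdftkFields(string $dump): array
{
    $fields = [];
    foreach (preg_split('/^---[ \t]*$/m', $dump) as $block) {
        $field = [];
        foreach (preg_split('/\R/', trim($block)) as $line) {
            if (strpos($line, ': ') === false) {
                continue; // skip blank or malformed lines
            }
            [$key, $value] = explode(': ', $line, 2);
            $field[$key] = $value; // a repeated key keeps only its last value
        }
        if (isset($field['FieldName'])) {
            $fields[] = $field;
        }
    }
    return $fields;
}

// Return only the names of fields whose FieldType is "Text".
function textFieldNames(string $dump): array
{
    $names = [];
    foreach (parsePdftkFields($dump) as $field) {
        if (($field['FieldType'] ?? '') === 'Text') {
            $names[] = $field['FieldName'];
        }
    }
    return $names;
}

// Hypothetical sample in the shape pdftk prints; in practice use
// $dump = shell_exec('pdftk ' . escapeshellarg($path) . ' dump_data_fields');
$sample = "---\nFieldType: Text\nFieldName: first_name\nFieldFlags: 0\n"
        . "---\nFieldType: Button\nFieldName: agree\nFieldFlags: 0\n"
        . "---\nFieldType: Text\nFieldName: last_name\nFieldFlags: 0\n";

print_r(textFieldNames($sample));
```

The resulting list of names can then drive the generation of the HTML form, one input per text field.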

Related

Convert PDF to HTML in PHP similar to DocuSign

We are developing a website that needs to convert PDF files into HTML because some of the PDFs contain a form (not necessarily a fillable PDF; these PDFs are meant to be printed and filled in by hand).
We want them to be filled in through our website instead of being printed and filled in by pen. We are going paperless.
DocuSign provides this: you can upload a PDF, then customize it with text boxes and checkboxes. We're using DocuSign as a reference but still haven't figured out how they did it (almost perfect conversion from PDF to HTML and vice versa).
So far I've tried several third-party tools for converting PDF to HTML: XPDF, Poppler, and ImageMagick.
ImageMagick converts a PDF to an image, which is not suitable because the images become very large when converted back to a PDF for printing.
Poppler is a fork of XPDF according to my research. I tried it after XPDF to see if it's better; it basically does what XPDF does, but it uses larger pixel dimensions in the CSS when converting to HTML. That's fine, but it loses the font family.
XPDF converts PDF to HTML, but the pixel dimensions are smaller, so when I convert it back to PDF it does not fill the whole page, and I still have to manually adjust all the CSS to fit it.
After using these third-party tools, I convert the HTML files back into PDF using MPDF, and the converted files have many inconsistencies. Text is not aligned properly. The result is basically not the same as the original PDF.
Any help will be appreciated, thanks!
What you are trying to do is not as straightforward as it may seem. I have worked with Adobe Sign, formerly known as EchoSign, for years and I have a pretty good idea of how these services work. With that being said, I strongly suggest looking into one of these eSign services instead of trying to roll your own. It will save you a lot of time.
This is how it all works:
The PDF itself must have a form with named fields. In other words, if you open such a PDF in Adobe Reader or Chrome you should be able to fill in the fields. If your PDF does not have a PDF form you will need additional software like Acrobat Pro to create the form.
You must convert the PDF into a flat image that can be rendered in the browser.
You will need a tool to extract the PDF Form information, such as the field names, types, dimensions, and coordinates.
With all this information you can then render the PDF image(s) in the browser. Place absolute positioned HTML form elements over the image using the field type, dimensions, and coordinates from the previous step. Each HTML element needs to reference a PDF form field by name.
Once you have collected the information and a data map like field_name => field_value from your HTML widget, you will need to use additional software to programmatically fill in the PDF form in the original PDF. PDF form data is often stored in an FDF or XFDF file.
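The overlay step can be sketched as follows. This is only an illustration under stated assumptions: the field array shape (a name plus a rect of lower-left and upper-right corners in PDF points) is hypothetical and stands in for whatever your extraction tool reports, and the function name overlayInput is made up. PDF user space normally has its origin at the bottom-left of the page, while CSS positions from the top-left, so the vertical coordinate has to be flipped.

```php
<?php
// Build an absolutely positioned <input> for one PDF form field.
// $field is a hypothetical shape: ['name' => string, 'rect' => [llx, lly, urx, ury]]
// with coordinates in PDF points, measured from the bottom-left of the page.
// $scale is the number of CSS pixels per PDF point in the rendered page image.
function overlayInput(array $field, float $pageHeightPt, float $scale): string
{
    [$llx, $lly, $urx, $ury] = $field['rect'];

    $style = sprintf(
        'position:absolute;left:%dpx;top:%dpx;width:%dpx;height:%dpx',
        round($llx * $scale),
        round(($pageHeightPt - $ury) * $scale), // flip the y axis for CSS
        round(($urx - $llx) * $scale),
        round(($ury - $lly) * $scale)
    );

    return sprintf(
        '<input type="text" name="%s" style="%s">',
        htmlspecialchars($field['name'], ENT_QUOTES),
        $style
    );
}

// Example: a 200 x 20 pt field near the top of a US Letter page (792 pt tall),
// rendered at 1.5 pixels per point.
echo overlayInput(['name' => 'first_name', 'rect' => [72, 700, 272, 720]], 792.0, 1.5), "\n";
```

The inputs would be placed inside a container with position:relative that also holds the page image, so the offsets are relative to the image's top-left corner.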
I don't know of a single tool that will help you with all of the things outlined above, at least not in PHP. However, I can offer a couple of suggestions that may be helpful:
PDFtk Server - Can help you both extract the PDF form field information and fill in the same form using an XFDF file. Unfortunately, the form field information that you can extract with this tool does not include dimensions and coordinates.
iText - A library available in .NET and Java that can be used to extract detailed information about the PDF form, including the dimensions and coordinates of the fields. You can create a microservice using this toolkit that communicates with PHP.
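For the filling step, a minimal sketch of building an XFDF document from a field_name => field_value map might look like this. The function name buildXfdf and the sample values are made up for illustration, and the sketch only covers simple top-level text values.

```php
<?php
// Build a minimal XFDF document from a field_name => field_value map.
// Names and values are XML-escaped; the input is assumed to be UTF-8.
function buildXfdf(array $values): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">' . "\n"
         . "  <fields>\n";

    foreach ($values as $name => $value) {
        $xml .= sprintf(
            "    <field name=\"%s\"><value>%s</value></field>\n",
            htmlspecialchars((string) $name, ENT_QUOTES | ENT_XML1, 'UTF-8'),
            htmlspecialchars((string) $value, ENT_QUOTES | ENT_XML1, 'UTF-8')
        );
    }

    return $xml . "  </fields>\n</xfdf>\n";
}

// Hypothetical values collected from the HTML form.
echo buildXfdf(['first_name' => 'Tom & "Jerry"', 'course' => 'PHP <101>']);
```

Assuming PDFtk Server is installed, the saved file can then be merged into the original with something like pdftk form.pdf fill_form data.xfdf output filled.pdf.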
There are definitely a lot more tools out there for the job. Hopefully, this information will guide you in the right direction or help you make a decision on how to move forward with your project.

How can I replace content in a PDF file using PHP?

I have an issue with PDF files.
I generate a dynamic PDF template (like a certificate). The file contains some fields like Firstname, Lastname, Course type, etc.
I want to write dynamic content such as the first name and last name into the PDF.
I have tried tools such as MPDF and FPDF, but they did not work. I also tried PDF-to-HTML conversion, but after converting, regenerating a properly formatted PDF from the HTML was not possible.
Please let me know if anyone has another idea for directly replacing content in an existing PDF using PHP.
I have sample PDF files.
I want to replace content in the given attachment. This is a demo file; I have many files with different templates.
Okay, let go of the idea that one could easily change anything in a PDF from PHP.
The one way to do something like generating a certificate is to create a "blank" PDF and then put in the names with a PHP library (like Zend_Pdf or anything else).
To make it short: There is no (easy, free) way to manipulate data in a PDF.
To make it complicated: Yes, there is a way. One has to open the file, read its content, understand where to search for the variables, manipulate the data and regenerate the PDF with a PDF generator. And yes, I know, it is very complicated, so stick with the short and easy way: generate a blank and put in the variables as needed ;)

Fill pdf form and insert image, without Pdftk

Is there any way to avoid losing content when inserting an image into a filled PDF? I am using the fpdm.php script from here, and it works pretty well, I might add. I pass the PDFs I am using through pdftk, as in pdftk.exe insert.pdf output output.pdf, so they can be filled via PHP without throwing errors.
My problem is this: I have a PDF template which I fill with an array passed from PHP and output to the browser or server, and that works OK. But when I try to insert an image into it, the image is inserted but all filled data is lost, and I need to retain that data. I can't use pdftk because I'm on a GoDaddy shared hosting plan. I know the Setasign scripts work, but I am trying to find a way without buying anything yet.
I found this stamper, which stamps OK but loses the PDF data (all boxes get blanked), and also this one, which places the image and loses all data too. Setasign is doing some magic stuff right there.
All mentioned scripts use FPDI in the background, which doesn't modify the original document but lets you create a completely new PDF document by importing another one page by page into reusable structures (XObjects). Because form fields and other dynamic content such as links or any other annotation type are not part of a page's content stream, they get lost.
The mentioned "magic" of the SetaPDF products is that they modify the original document. Because of this, all content is retained.

Determine if a .pdf file has any form elements of any type

I need to determine if a given .pdf file has any form element in it (could be text input, check box, list, etc...)
I don't need to know what kind they are or how many there are, I just need to know that there is more than 1 field of any type in the file.
I already have PHP (5.3), Zend_Pdf and tcpdf at my disposal.
It does not appear that Zend_Pdf offers anything to simply list or count form fields.
My option seems to be to convert the pdf to html and parse the result for form fields or convert the pdf to a text file and parse it.
Are there any better solutions out there?
The Zend_Pdf class has the information you need, but it's not accessible or available through a function. You would need to add a public function to look through $this->_trailer->Root->AcroForm->Fields->items (for the Zend_Pdf object itself) and look for the frequencies of unique values of ->FT->value (the field type) for the fields stored in it.
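If patching Zend_Pdf is not an option, a much cruder alternative is to scan the raw file for the /AcroForm key that a document catalog carries when the PDF has an interactive form. This is only a heuristic sketch, not a parser: it can miss forms in PDFs whose catalog is stored in a compressed object stream, and it will report true for a form whose field list is empty. The function name looksLikePdfForm is made up.

```php
<?php
// Heuristic: report whether the raw PDF bytes mention an /AcroForm entry.
// Assumption: the document catalog is stored uncompressed, which is common
// but not guaranteed (PDF 1.5+ may place it in a compressed object stream).
function looksLikePdfForm(string $pdfBytes): bool
{
    return strpos($pdfBytes, '/AcroForm') !== false;
}

// Typical use (the path is hypothetical):
// $hasForm = looksLikePdfForm(file_get_contents('/path/to/file.pdf'));

// Tiny synthetic fragments, just to show the behaviour.
var_dump(looksLikePdfForm("1 0 obj\n<< /Type /Catalog /AcroForm 2 0 R >>\nendobj\n"));
var_dump(looksLikePdfForm("1 0 obj\n<< /Type /Catalog /Pages 3 0 R >>\nendobj\n"));
```

For anything beyond a quick pre-filter, a real PDF library or a tool that lists the fields is the safer choice.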

Submit HTML form to PDF

We have a high-resolution PDF (for printing) which has some form fields on it. We would like to have an HTML form which submits to the PDF, which is then placed into the respective fields.
I found a solution on google: http://koivi.com/fill-pdf-form-fields/
However, with that solution you only get an FDF file... And the demo does not work for me, opening the FDF file simply downloads another FDF file.
Since this PDF will be available to the public we would like to keep it as simple as possible. If we must open our original PDF and import this FDF file, we need a different solution (which I'm not sure is what the FDF file is for, since it didn't work).
A related post talking about .net framework had the same idea, but there were only paid commercial solutions: From HTML form to PDF
The PHP solutions I have found so far are for creating a new PDF, which is not what I need. Our PDF is created with Adobe Illustrator (or a similar adobe product) and is high-res with embedded fonts, svg and image content.
The form elements are in place, we just need to get the data to there.
Update April 11, 2013:
Since posting this question I have been utilizing FPDF on multiple projects where I needed to accomplish this goal. Although it cannot seem to "merge" template PDFs with the provided data, it can create the PDF from scratch.
In one example, I had a high-resolution PNG for printing (similar to the initial question) on which we had to write the customer's name and today's date clearly in the center. I simply made the PNG the background of the PDF using FPDF->Image() and wrote the text afterwards using FPDF->Text().
It was very simple after all. You will need to look up the paper sizes to determine the X, Y, W, H of the image and then position your text fields relative to those numbers.
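The positioning arithmetic can be sketched like this. It is a minimal illustration assuming millimetre units (FPDF's default) and a background image that covers the whole page; the constant PAGE_SIZES_MM and the function centeredTextOrigin are names made up for this example.

```php
<?php
// Paper sizes in millimetres as [width, height], portrait orientation.
const PAGE_SIZES_MM = [
    'A4'     => [210.0, 297.0],
    'Letter' => [215.9, 279.4],
];

// Compute where a line of text must start so that it is horizontally
// centred on the page. $textWidthMm is the rendered width of the string
// and $baselineFromTopMm is the distance of the text baseline from the top.
function centeredTextOrigin(string $page, float $textWidthMm, float $baselineFromTopMm): array
{
    [$pageWidth] = PAGE_SIZES_MM[$page];

    return [
        'x' => ($pageWidth - $textWidthMm) / 2,
        'y' => $baselineFromTopMm,
    ];
}

// Example: a 60 mm wide name, with its baseline at the vertical middle of an A4 page.
print_r(centeredTextOrigin('A4', 60.0, 148.5));
```

With FPDF itself, the text width would come from GetStringWidth() after SetFont(), the background from Image() with x = 0, y = 0 and the page's width and height, and the result would be passed to Text().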
There was even a Form Filling extension, but I couldn't get it to work.
It seems as though I should answer my own question, although Vision's answer may be better (seems to be deleted?). I used Vasiliy Faronov's link, which was a comment on my main question: https://stackoverflow.com/a/1890835/200445
Here I found how to install pdftk and run a command to merge (flatten) my FDF and PDF files. I still used the "hacky" way to generate an FDF using Koivi's FDF Generator but it works for the most part.
One caveat is that some characters, like single and double quotes, are not inserted correctly. It may be an issue of escaping the fields, but I could not find an answer.
Regardless, my PDF form generator is working, but anyone with a similar issue should look for a better solution.
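On the escaping question, here is a minimal sketch of generating an FDF by hand. It is not Koivi's generator, and the function names escapePdfString and buildFdf are made up. In PDF literal strings only the backslash and the two parentheses need escaping; plain ASCII quotes do not. Whether this explains the quote problem described above is an assumption: if the quotes are "curly" (non-ASCII) characters, the cause is more likely the text encoding than the escaping, and this sketch does not handle that case.

```php
<?php
// Escape a value for use inside a PDF literal string: ( ... ).
// Only backslash and parentheses are special there.
function escapePdfString(string $s): string
{
    return strtr($s, ['\\' => '\\\\', '(' => '\\(', ')' => '\\)']);
}

// Build a minimal FDF file body from a field_name => field_value map.
// Assumption: names and values are plain ASCII.
function buildFdf(array $values): string
{
    $fields = '';
    foreach ($values as $name => $value) {
        $fields .= sprintf(
            "<< /T (%s) /V (%s) >>\n",
            escapePdfString((string) $name),
            escapePdfString((string) $value)
        );
    }

    return "%FDF-1.2\n1 0 obj\n<< /FDF << /Fields [\n"
         . $fields
         . "] >> >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n";
}

// Hypothetical form data containing quotes and parentheses.
echo buildFdf(['customer' => 'O\'Brien (Jr)', 'note' => 'Said "hello"']);
```

The output can be saved to a file and merged with the template using the pdftk command mentioned above.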
There are a number of free tools, such as iTextSharp. Try the following: https://web.archive.org/web/20211020001747/https://www.4guysfromrolla.com/articles/030211-1.aspx I hope this code helps; I have tried it and it worked for me. If you can pay, there are a number of paid tools that convert HTML to PDF, such as ABCpdf. This example is in ASP.NET, and I am sure that if you can convert it to PHP it will work for you too.
