Find & Replace Text with an image in a PDF using PHP

I am currently building a digital signature system for my company and I need to be able to add the signature once signed to a PDF document.
I have been using FPDF & FPDI to overlay the signature onto the document (this works great if the document is static).
The problem I have is that the document starts out as a Word doc and is converted to PDF. The document has 20 pages and tons of fields passed to it by our case management software, so rebuilding it as a PDF template would take me ages. It also changes in size based on the information passed from the fields.
I was wondering if anyone has come across a way to search & replace text in a PDF, or is there a way to parse the document, add a signature and re-create a new PDF?
Any help on this would be great as I have spent a week now trying to find a solution.
Oh sorry for lengthy post just trying to get as much info in as possible.
Thanks
Brad

Actually I'm not aware of any class/library in PHP that is able to replace text in a PDF document with other content. If it is possible to convert the "fields" into real PDF form fields, you may check this out (not free!).

Thanks for your reply
I have managed to sort it. I am going to parse the document using parserpdf, find and replace a placeholder in the document with the signature, and then re-build the PDF using FPDF.
Bit long winded but tested and works well.

Related

Convert PDF to HTML in PHP similar to DocuSign

We are developing a website that needs to convert PDF files into HTML because some of the PDFs contain forms (not necessarily fillable PDFs; these PDFs are printed to be filled in by hand).
So we want them to be filled in through our website instead of being printed and filled in by pen. We are going paperless.
DocuSign provides this: you can upload a PDF, then customize it to have textboxes and checkboxes. So we're using DocuSign as a reference but still haven't figured out how they did it (almost perfect conversion of PDF to HTML and vice versa).
So far I've tried several third-party tools for converting PDF to HTML: XPDF, Poppler, & ImageMagick.
ImageMagick converts a PDF to an image, which is not suitable because these images are very large when converted back to a PDF for printing.
Poppler is a fork of XPDF based on my research. I tried it after XPDF to see if it's better. It basically does what XPDF does, but it produces bigger pixel values in the CSS when converting to HTML. That's fine, but it loses the font family.
XPDF converts PDF to HTML but the pixel values are smaller, so when I convert it back to PDF it does not fill the whole page, and I still have to manually adjust all the CSS to make it fit.
After using these third-party tools, I convert the HTML files back into PDF using MPDF, and the converted files have many inconsistencies. Text is not aligned properly. The result is basically not the same as the original PDF.
Any help will be appreciated thanks!

What you are trying to do is not as straightforward as it may seem. I have worked with Adobe Sign, formerly known as EchoSign, for years and I have a pretty good idea of how these services work. With that being said, I strongly suggest looking into one of these eSign services instead of trying to roll your own. It will save you a lot of time.
This is how it all works:
1. The PDF must have a form itself with named fields. In other words, if you open such a PDF in Adobe Reader or Chrome you should be able to fill in the fields. If your PDF does not have a PDF form you will need additional software like Acrobat Pro to create the form.
2. You must convert the PDF into a flat image that can be rendered in the browser.
3. You will need a tool to extract the PDF form information, such as the field names, types, dimensions, and coordinates.
4. With all this information you can then render the PDF image(s) in the browser. Place absolutely positioned HTML form elements over the image using the field type, dimensions, and coordinates from the previous step. Each HTML element needs to reference a PDF form field by name.
5. Once you have collected the information and a data map like field_name => field_value from your HTML widget, you will need to use additional software to programmatically fill in the PDF form in the original PDF. PDF form data is often stored in an FDF or XFDF file.
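The overlay step described above (placing HTML elements from the extracted field coordinates) needs a unit and axis conversion. Here is a minimal sketch, assuming each field rectangle arrives as [llx, lly, urx, ury] in PDF points measured from the bottom-left corner of the page, and that the page image was rendered at a known DPI. The helper name pdfRectToCss, the field name and the sample numbers are hypothetical.

```php
<?php
/**
 * Convert a PDF field rectangle into CSS pixel offsets for an overlay.
 * Hypothetical helper: $rect is [llx, lly, urx, ury] in points (1/72 inch),
 * measured from the bottom-left corner of the page.
 */
function pdfRectToCss(array $rect, float $pageHeightPt, float $dpi): array
{
    [$llx, $lly, $urx, $ury] = $rect;
    $scale = $dpi / 72.0; // rendered pixels per PDF point

    return [
        'left'   => round($llx * $scale, 2),
        // CSS measures from the top edge, so the vertical axis is flipped.
        'top'    => round(($pageHeightPt - $ury) * $scale, 2),
        'width'  => round(($urx - $llx) * $scale, 2),
        'height' => round(($ury - $lly) * $scale, 2),
    ];
}

// Sample values: a field on a 612 x 792 pt page rendered at 144 DPI.
$css = pdfRectToCss([72.0, 700.0, 272.0, 720.0], 792.0, 144.0);

printf(
    '<input name="%s" style="position:absolute;left:%spx;top:%spx;width:%spx;height:%spx">' . PHP_EOL,
    htmlspecialchars('customer_name', ENT_QUOTES, 'UTF-8'),
    $css['left'],
    $css['top'],
    $css['width'],
    $css['height']
);
```

The container holding the page image would need position:relative so the inputs line up with it.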
I don't know of a single tool that will help you with all the things outlined above, at least not in PHP. However, I can suggest a couple of tools that may be helpful:
PDFtk Server - Can help you both extract the PDF form field information and fill in the form from an XFDF file. Unfortunately, the form field information that you can extract with this tool does not include dimensions and coordinates.
iText - A library available in .NET and Java that can be used to extract detailed information about the PDF form, including the dimensions and coordinates of the fields. You can create a microservice using this toolkit that communicates with PHP.
There are definitely a lot more tools out there for the job. Hopefully, this information will guide you in the right direction or help you make a decision on how to move forward with your project.
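For the fill-in step, the field_name => field_value map can be serialised to XFDF with plain PHP before handing it to a tool such as PDFtk. This is only a sketch: buildXfdf is a hypothetical helper, and the field names are sample values that would have to match the names in the real PDF form.

```php
<?php
/**
 * Build an XFDF document from a field_name => field_value map.
 * Hypothetical helper; keys must match the field names in the PDF form.
 */
function buildXfdf(array $fields): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">' . "\n"
         . "  <fields>\n";

    foreach ($fields as $name => $value) {
        // Escape XML special characters in both the name and the value.
        $xml .= sprintf(
            "    <field name=\"%s\"><value>%s</value></field>\n",
            htmlspecialchars((string) $name, ENT_QUOTES | ENT_XML1, 'UTF-8'),
            htmlspecialchars((string) $value, ENT_QUOTES | ENT_XML1, 'UTF-8')
        );
    }

    return $xml . "  </fields>\n</xfdf>\n";
}

// Sample data as it might arrive from the HTML form.
$xfdf = buildXfdf([
    'company'   => 'Smith & Sons <Ltd>',
    'signed_on' => '2020-01-31',
]);

echo $xfdf;
```

The output could then be saved to a file and passed to PDFtk, for example `pdftk form.pdf fill_form data.xfdf output filled.pdf flatten`, where the file names are placeholders.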

Can we replace custom tokens in PDF using PHP

I am creating a small application which manipulates PDF files as follows:
A customer creates a CSV to a given specification that contains the name, address, country, ink type, and station to use. This CSV could also include customized tokens which replace tokens that are written within their PDF document.
A customer creates a PDF document. It could be a standard document that's exactly the same for everyone in their CSV file, or it could contain special tokens which are replaced with specific contact details from the CSV.
I've briefly looked at http://us.php.net/pdf and FPDF, but I was wondering what specific technique I'd use to achieve this.
I was thinking I'd insert an address token string where I want the address to go, and then use some function to update those tokens in the PDF document.
Can anyone point me in the right direction? I have php experience, but not with editing / generating pdf documents from php.

You can use the MPDF library to generate the PDF. You just need to pass it your HTML content.
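If the document is kept as an HTML template rather than a finished PDF, the token replacement can happen on the HTML before it is handed to a PDF generator. A minimal sketch, assuming a {{token}} syntax and CSV column names that are made up for this example; it does not edit an existing PDF.

```php
<?php
/**
 * Replace {{token}} markers in an HTML template with values from one CSV row.
 * The {{...}} syntax and the column names are assumptions for this sketch.
 */
function fillTemplate(string $template, array $row): string
{
    $map = [];
    foreach ($row as $column => $value) {
        $map['{{' . $column . '}}'] = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    }
    // strtr replaces all tokens in a single pass.
    return strtr($template, $map);
}

$template = '<p>{{name}}<br>{{address}}<br>{{country}}</p>';

$csv = "name,address,country\n"
     . "Ada Lovelace,\"12 Example Square, London\",UK\n"
     . "Grace Hopper,\"1 Navy Way, Arlington\",USA\n";

// Simplification: assumes no line breaks inside quoted CSV fields.
$lines  = preg_split('/\R/', trim($csv));
$header = str_getcsv(array_shift($lines), ',', '"', '\\');

$documents = [];
foreach ($lines as $line) {
    $row = array_combine($header, str_getcsv($line, ',', '"', '\\'));
    $documents[] = fillTemplate($template, $row);
}

// Each entry is now ready to be passed to an HTML-to-PDF library.
echo implode(PHP_EOL, $documents), PHP_EOL;
```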

From half the PHP devs I talk to they recommend this:
http://www.setasign.de/products/pdf-php-solutions/setapdf-linkreplacer/
For your problem - I believe you need to buy it but there is an eval copy for your devving needs.
HTHs - Thanks,
//P

I am using the TCPDF library to generate PDFs.
Please check this URL for examples: http://www.tcpdf.org/examples.php

If you kept all the contact information on a single page, you could use poppler-utils (https://packages.debian.org/sid/poppler-utils), which includes pdfunite, to merge the page of contact info with the bulk of the PDF. Use the TCPDF library to create the one page with contact info.

Submit HTML form to PDF

We have a high-resolution PDF (for printing) which has some form fields on it. We would like to have an HTML form which submits to the PDF, which is then placed into the respective fields.
I found a solution on Google: http://koivi.com/fill-pdf-form-fields/
However, with that solution you only get an FDF file... And the demo does not work for me, opening the FDF file simply downloads another FDF file.
Since this PDF will be available to the public we would like to keep it as simple as possible. If we must open our original PDF and import this FDF file, we need a different solution (which I'm not sure is what the FDF file is for, since it didn't work).
A related post talking about .net framework had the same idea, but there were only paid commercial solutions: From HTML form to PDF
The PHP solutions I have found so far are for creating a new PDF, which is not what I need. Our PDF is created with Adobe Illustrator (or a similar Adobe product) and is high-res with embedded fonts, SVG and image content.
The form elements are in place, we just need to get the data to there.

Update April 11, 2013:
Since posting this question I have been using FPDF on multiple projects where I needed to accomplish this goal. Although it does not seem able to "merge" template PDFs with the provided data, it can create the PDF from scratch.
In one example, I had a high-resolution PNG for printing (similar to the initial question) on which we had to write the customer's name and today's date clearly in the center. I simply made the background of the PDF using FPDF->Image() and wrote the text afterwards using FPDF->Text().
It was very simple after all. You will need to look up the paper sizes to determine the X, Y, W, H of the image and then position your text fields relative to those numbers.
There was even a Form Filling extension, but I couldn't get it to work.
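The X, Y, W, H arithmetic mentioned above can be sketched without any PDF library. The helpers below are hypothetical, all page units are millimetres, and the A4 size, image size and text width are sample values. In a real script the text width would come from the PDF library's own string-measuring function, and the results would be passed to its image and text calls.

```php
<?php
/**
 * Scale an image to fit the page while keeping its aspect ratio,
 * and centre it. Hypothetical helper; page units are millimetres.
 */
function fitImage(float $pageW, float $pageH, int $imgPxW, int $imgPxH): array
{
    $scale = min($pageW / $imgPxW, $pageH / $imgPxH);
    $w = $imgPxW * $scale;
    $h = $imgPxH * $scale;

    return [
        'x' => ($pageW - $w) / 2,
        'y' => ($pageH - $h) / 2,
        'w' => $w,
        'h' => $h,
    ];
}

/** Left edge for a line of text of known width, centred on the page. */
function centreTextX(float $pageW, float $textW): float
{
    return ($pageW - $textW) / 2;
}

// Sample values: A4 portrait (210 x 297 mm) and a 2100 x 2970 px background.
$box   = fitImage(210.0, 297.0, 2100, 2970);
$textX = centreTextX(210.0, 60.0); // 60 mm is an assumed text width

printf("image: x=%.2f y=%.2f w=%.2f h=%.2f\n", $box['x'], $box['y'], $box['w'], $box['h']);
printf("text:  x=%.2f y=%.2f\n", $textX, 297.0 / 2);
```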
It seems as though I should answer my own question, although Vision's answer may be better (it seems to have been deleted?). I used Vasiliy Faronov's link, which was a comment on my main question: https://stackoverflow.com/a/1890835/200445
Here I found how to install pdftk and run a command to merge (flatten) my FDF and PDF files. I still used the "hacky" way to generate an FDF using Koivi's FDF Generator but it works for the most part.
One caveat is that some characters, like single and double quotes, are not inserted correctly. It may be an issue of escaping the fields, but I could not find an answer.
Regardless, my PDF form generator is working, but anyone with a similar issue should look for a better solution.
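For anyone generating the FDF by hand, here is a minimal sketch of the escaping that PDF literal strings require: backslashes and parentheses must be escaped inside "( ... )". It does not diagnose the quote problem described above (plain ASCII quotes need no escaping in this syntax, so character encoding may be worth checking separately); it only rules out one possible cause. The helper names and field names are hypothetical.

```php
<?php
/**
 * Escape a value for use inside a PDF literal string "( ... )".
 * Backslashes and parentheses have special meaning there.
 */
function escapeFdfString(string $value): string
{
    // strtr with an array works in a single pass, so nothing is escaped twice.
    return strtr($value, [
        '\\' => '\\\\',
        '('  => '\\(',
        ')'  => '\\)',
        "\r" => '\\r',
        "\n" => '\\n',
    ]);
}

/** Build a minimal FDF file body from a field_name => field_value map. */
function buildFdf(array $fields): string
{
    $entries = '';
    foreach ($fields as $name => $value) {
        $entries .= sprintf(
            "<< /T (%s) /V (%s) >>\n",
            escapeFdfString((string) $name),
            escapeFdfString((string) $value)
        );
    }

    return "%FDF-1.2\n1 0 obj\n<< /FDF << /Fields [\n"
         . $entries
         . "] >> >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n";
}

$fdf = buildFdf(['notes' => 'Paid (in full) via C:\\temp']);
echo $fdf;
```

The result could be written to a file and merged with something like `pdftk form.pdf fill_form data.fdf output filled.pdf flatten`, where the file names are placeholders.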

There are a number of free tools, like iTextSharp. Try the following: https://web.archive.org/web/20211020001747/https://www.4guysfromrolla.com/articles/030211-1.aspx Hope this code will help you. I have tried it and it worked for me. If you can pay, there are a number of paid tools which convert HTML to PDF, like ABCPDF. This example is in ASP.NET, and I am sure that if you can convert it to PHP it will work for you too.

PHP - Workaround for reading user-selected text from PDF?

I am working on a project that allows a user to upload text or content from an HTML page in Japanese and then use their cursor to select words in the text/content to translate into English. However, I would like to be able to expand this functionality to PDF files. Essentially, I'd like the user to be able to submit a PDF file and have the browser render that PDF file in such a way that when the user selects/highlights words in the PDF, the browser can somehow relay what the text of the highlighted section is, such as via JavaScript, to be then relayed to a PHP variable.
I know there are a lot of posts on Stack Overflow asking similar questions (I've spent hours upon hours trying to sort through them all!), but I can't seem to find a definitive answer on whether this is possible. It seems there are lots of options for converting PDF to HTML or extracting text from PDF, but to be quite honest, I'm confused about whether any of those options are relevant to what I am trying to accomplish. And I know there's a JavaScript API for Adobe, but I'm under the impression the JavaScript needs to be embedded in the PDF already, which will not be true if the user is uploading their own PDF files to render. Even if that is possible, it seems there's no native text selection support in the Adobe JavaScript API.
Is there a straightforward workaround (oxymoron?) to doing this? Again, I want to be able to pass text selected in a PDF to a variable -- the effect is the user highlights words they don't know so those words can be added to a word bank for retrieval in a dictionary.
Let me know if I can be clearer on anything. Thank you!

I think your best bet is to convert the PDF to HTML (see these answers). Then you are set, as you have already implemented everything for regular HTML.

PHP to edit PDF

I have some random PDF that I need to edit. By edit, I mean replacing an image and some text.
All of the PHP PDF libraries that I saw create a PDF from scratch.
Is there a way to edit a page of the PDF by replacing images and text?

There was another recent discussion on this: PHP PDF template library with PDF output? - There is no ready-made library for that.
While technically it's doable (PDF is actually a simple text based registry format, looked through specification once); the internal structure and encoding of text make it awfully difficult to locate and replace text. If you hardcode the object ids, and just create a new 25 1 obj revision for example, then a simple programmatic update might work. But neither FPDF nor TCPDF can do that AFAIK. (Look into FPDI import however.) And if you say you have some "random pdf" it's even less likely.
Try one of the format conversion methods (openoffice to pdf). You could manually convert PDF to OpenDraw probably, and after PHP-based editing convert it back. I'm very unsure if it brings usable results though.
