saving PDFs for later manipulation in PHP - php

I have a PDF with some text in it that I would like to modify dynamically using PHP. This is already being done with another PDF: PHP simply replaces a token of the form %token% with a value pulled from a database. If you open that PDF in a text editor, you can find the %token% in plain text. But with the other PDF that I want to do the same thing with, there are no tokens in plain text when you open it in a text editor (even though I explicitly created one using Adobe Acrobat Pro). Evidently, the string content in this PDF is either encrypted, compressed, or both. What I want to know is how I can save a PDF so that the string content remains plain text that PHP can manipulate.
Please note, I do not want to dynamically create the whole PDF from scratch using some PHP library. I know that is something that can be done, but the PDF I am working with already exists and I just want to modify it slightly in the manner described.

For things like that I like to use the free command-line tool PDFtk, which can compress and decompress PDFs and do a few other nice things. Have a look at: https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
PS: I use it to edit a PDF calendar form I found on the internet. I decompress it and replace the awful pink weekend color with gray (Saturday) and blue (Sunday). I use violet for a small bar that marks vacation days.
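As a rough sketch of that workflow (written in Python for brevity; the same steps port directly to PHP): run the PDF through pdftk with its uncompress option so the page streams become plain text, then substitute the tokens. The file names, the %name% token and the helper names below are made up for illustration. Two caveats: a token is only findable if the producing application wrote it as one contiguous string, and replacing it with text of a different length changes stream lengths and byte offsets, so it is usually worth passing the result through pdftk once more afterwards.

```python
import re


def uncompress_command(src, dst):
    """Argument list for pdftk's 'uncompress' output option."""
    return ["pdftk", src, "output", dst, "uncompress"]


def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def replace_tokens(pdf_bytes, values):
    """Replace %token% markers in uncompressed PDF bytes; unknown tokens stay as-is."""
    def substitute(match):
        key = match.group(1).decode("ascii")
        if key not in values:
            return match.group(0)
        return escape_pdf_string(values[key]).encode("latin-1")

    return re.sub(rb"%([A-Za-z_][A-Za-z0-9_]*)%", substitute, pdf_bytes)


# Hypothetical fragment of an uncompressed page stream.
page_stream = b"BT /F1 12 Tf (Dear %name%,) Tj ET"
print(replace_tokens(page_stream, {"name": "Ann (QA)"}))
```

The command list can be handed to subprocess.run() in Python, or the equivalent string to exec() in PHP, once pdftk is installed.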

Related

Convert PDF to HTML in PHP similar to DocuSign

We are developing a website that needs to convert PDF files into HTML because some of the PDFs contain forms (not necessarily fillable PDF forms; these PDFs are meant to be printed and filled in by hand).
We want them to be filled in through our website instead of being printed and completed with a pen. We are going paperless.
DocuSign offers this: you can upload a PDF and then customize it with text boxes and checkboxes. So we are using DocuSign as a reference, but we still haven't figured out how they do it (an almost perfect conversion from PDF to HTML and back).
So far I've tried several third-party tools for converting PDF to HTML: Xpdf, Poppler, and ImageMagick.
ImageMagick converts a PDF to an image, which is not suitable because the images become very large when converted back to a PDF for printing.
Poppler is a fork of Xpdf, based on my research. I tried it after Xpdf to see if it was better. It basically does what Xpdf does, but it uses larger pixel values in the CSS when converting to HTML. That's fine, but it loses the font family.
Xpdf converts PDF to HTML with smaller pixel values, so when I convert the result back to PDF it does not fill the whole page, and I still have to adjust all the CSS manually to make it fit.
After using these third-party tools, I convert the HTML files back into PDF using mPDF, and the converted files have many inconsistencies. Text is not aligned properly. The result is basically not the same as the original PDF.
Any help will be appreciated, thanks!
What you are trying to do is not as straightforward as it may seem. I have worked with Adobe Sign, formerly known as EchoSign, for years and I have a pretty good idea of how these services work. With that being said, I strongly suggest looking into one of these eSign services instead of trying to roll your own. It will save you a lot of time.
This is how it all works
The PDF itself must contain a form with named fields. In other words, if you open such a PDF in Adobe Reader or Chrome, you should be able to fill in the fields. If your PDF does not have a PDF form, you will need additional software like Acrobat Pro to create the form.
You must convert the PDF into a flat image that can be rendered in the browser.
You will need a tool to extract the PDF form information, such as the field names, types, dimensions, and coordinates.
With all this information you can then render the PDF image(s) in the browser. Place absolutely positioned HTML form elements over the image using the field type, dimensions, and coordinates from the previous step. Each HTML element needs to reference a PDF form field by name.
Once you have collected the information and a data map like field_name => field_value from your HTML widget, you will need additional software to programmatically fill in the PDF form in the original PDF. PDF form data is often stored in an FDF or XFDF file.
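The overlay step can be sketched as follows (in Python for brevity; the arithmetic is the same in PHP or JavaScript). PDF field rectangles are given in points (1/72 inch) with the origin at the bottom-left of the page, while CSS positions are measured from the top-left of the rendered image, so the y axis has to be flipped and everything scaled by the render resolution. The FieldRect type, the function names and the sample numbers are hypothetical, and the sketch assumes an unrotated page whose MediaBox starts at 0,0.

```python
import html
from dataclasses import dataclass


@dataclass
class FieldRect:
    """A form field rectangle in PDF points (lower-left and upper-right corners)."""
    name: str
    llx: float
    lly: float
    urx: float
    ury: float


def css_box(field, page_height_pt, dpi=96):
    """Convert a PDF rectangle to CSS pixel offsets on an image rendered at `dpi`."""
    scale = dpi / 72.0  # pixels per PDF point
    return {
        "left": round(field.llx * scale, 2),
        "top": round((page_height_pt - field.ury) * scale, 2),  # flip the y axis
        "width": round((field.urx - field.llx) * scale, 2),
        "height": round((field.ury - field.lly) * scale, 2),
    }


def input_tag(field, page_height_pt, dpi=96):
    """Build an absolutely positioned text input that refers to the field by name."""
    box = css_box(field, page_height_pt, dpi)
    style = "position:absolute;" + "".join(f"{k}:{v}px;" for k, v in box.items())
    name = html.escape(field.name, quote=True)
    return f'<input type="text" name="{name}" style="{style}">'


# Hypothetical field on a US Letter page (792 pt tall), image rendered at 144 dpi.
print(input_tag(FieldRect("full_name", 72, 700, 272, 720), 792, dpi=144))
```

The inputs go inside a container with position:relative that wraps the page image, so the offsets are relative to the image rather than the browser window.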
I don't know of a single tool that will help you with everything outlined above, at least not in PHP. However, I can suggest two that may be helpful:
PDFtk Server - Can help you both extract the PDF form field information and fill in the same form from an XFDF file. Unfortunately, the form field information that you can extract with this tool does not include dimensions and coordinates.
iText - A library available for .NET and Java that can be used to extract detailed information about the PDF form, including the dimensions and coordinates of the fields. You can create a microservice using this toolkit that communicates with PHP.
There are definitely a lot more tools out there for the job. Hopefully, this information will guide you in the right direction or help you make a decision on how to move forward with your project.
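For the PDFtk route, the fill-in step might look like the sketch below (Python for brevity; the same string building is easy in PHP). It turns a field_name => field_value map into a minimal XFDF document and builds the pdftk fill_form command line. The field names, file names and helper names are invented for the example, and only flat text fields are covered.

```python
from xml.sax.saxutils import escape, quoteattr


def build_xfdf(values):
    """Serialize a {field_name: field_value} map as a minimal XFDF document."""
    fields = "".join(
        f"<field name={quoteattr(name)}><value>{escape(value)}</value></field>"
        for name, value in values.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">'
        f"<fields>{fields}</fields></xfdf>\n"
    )


def fill_form_command(form_pdf, xfdf_path, output_pdf):
    """Argument list for pdftk's fill_form operation."""
    return ["pdftk", form_pdf, "fill_form", xfdf_path, "output", output_pdf]


# Hypothetical data collected from the HTML overlay.
print(build_xfdf({"full_name": "Ann & Bob", "agree": "Yes"}))
print(fill_form_command("form.pdf", "data.xfdf", "filled.pdf"))
```

Writing the XFDF string to a file and running the command leaves the original form untouched and produces a filled copy.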

How would I draw a rectangle in PHP, and then convert to a .pdf file?

I'm just wondering about a small problem I have. I'm creating a simple website with PHP, HTML and CSS. The user enters the dimensions of a rectangle into a form, and this all works perfectly. However, I'm unsure how to proceed.
The system should then use these dimensions (both stored as separate variables) to draw a rectangle of that size (in cm). The rectangle should be draw in a .pdf document, which the user can then download.
I'm a beginner, so sorry if there's anything wrong with this question.
You're going to want to start by looking into the HTML5 canvas element for drawing. There are plenty of tutorials online. As for saving to PDF, see this: Convert canvas to PDF. Good luck.
Take a look at the FPDF library available for PHP. Or, even better, use its Unicode-capable variant, tFPDF.
It allows you to generate a PDF document on the server side, step by step. You can control all the details of that document. When finished, you can return the document to the requesting client.
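To make the unit handling concrete, here is a minimal sketch (in Python, standard library only) of roughly what such a generator does under the hood. PDF user space is measured in points of 1/72 inch, so centimetres are scaled by 72/2.54, and the rectangle is drawn with the `re` path operator followed by `S` (stroke). The function name, the A4 page size and the 1 cm margin are assumptions made for the example, and no check is made that the rectangle fits on the page. With FPDF itself you would normally choose 'cm' as the unit in the constructor and call Rect() rather than writing PDF syntax by hand.

```python
CM_TO_PT = 72 / 2.54  # PDF points per centimetre


def rectangle_pdf(width_cm, height_cm, margin_cm=1.0):
    """Return the bytes of a one-page A4 PDF with a stroked rectangle near the top-left."""
    page_w, page_h = 595.28, 841.89  # A4 in points
    w, h, m = (v * CM_TO_PT for v in (width_cm, height_cm, margin_cm))
    # The PDF y axis points upwards, so the lower-left corner sits below the top margin.
    content = f"{m:.2f} {page_h - m - h:.2f} {w:.2f} {h:.2f} re S".encode("ascii")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {page_w} {page_h}] "
         "/Resources << >> /Contents 4 0 R >>").encode("ascii"),
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, needed by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_pos))
    return bytes(out)


# Hypothetical dimensions taken from the user's form.
with open("rectangle.pdf", "wb") as fh:
    fh.write(rectangle_pdf(10, 5))
```

On a web server the same bytes would be sent back with a Content-Type of application/pdf instead of being written to disk.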

PHP - Workaround for reading user-selected text from PDF?

I am working on a project that allows a user to upload text or content from an HTML page in Japanese and then use their cursor to select words in the text/content to translate into English. However, I would like to be able to expand this functionality to PDF files. Essentially, I'd like the user to be able to submit a PDF file and have the browser render that PDF file in such a way that when the user selects/highlights words in the PDF, the browser can somehow relay what the text of the highlighted section is, such as via javascript, to be then relayed to a PHP variable.
I know there are a lot of posts on Stack Overflow asking similar questions (I've spent hours upon hours trying to sort through them all!), but I can't seem to find a definitive answer on whether this is possible. There seem to be lots of options for converting PDF to HTML or extracting text from PDF, but to be quite honest, I'm not sure whether any of those options are relevant to what I am trying to accomplish. I know there's a JavaScript API for Adobe, but I'm under the impression that the JavaScript needs to be embedded in the PDF already, which will not be the case if users upload their own PDF files to render. Even if that is possible, there seems to be no native text selection support in the Adobe JavaScript API....
Is there a straightforward workaround (oxymoron?) to doing this? Again, I want to be able to pass text selected in a PDF to a variable -- the effect is the user highlights words they don't know so those words can be added to a word bank for retrieval in a dictionary.
Let me know if I can be clearer on anything. Thank you!
I think your best bet is to convert the PDF to HTML (see these answers); then you are all set, since you have already implemented everything for regular HTML.

PHP to edit PDF

I have some random PDF that I need to edit. By edit, I mean replacing an image and some text.
All of the PHP PDF libraries that I have seen create a PDF from scratch.
Is there a way to edit a page of the PDF by replacing images and text?
There was another recent discussion on this: PHP PDF template library with PDF output? - There is no ready-made library for that.
While it's technically doable (PDF is actually a simple text-based registry format; I looked through the specification once), the internal structure and encoding of text make it awfully difficult to locate and replace text. If you hardcode the object ids and just create a new 25 1 obj revision, for example, then a simple programmatic update might work. But neither FPDF nor TCPDF can do that AFAIK. (Look into FPDI import, however.) And if, as you say, you have some "random pdf", it's even less likely to work.
Try one of the format conversion methods (openoffice to pdf). You could manually convert PDF to OpenDraw probably, and after PHP-based editing convert it back. I'm very unsure if it brings usable results though.

Problem in picture overlapping when I convert HTML page to PDF

I want to overlap pictures, but it is not working and I need some help.
Here's the link to the page I'd like to convert:
http://9m9.com/innovative/sample/two.html
I want to convert this page to a PDF. You can see the small image overlapping the bigger one.
This is the page where you can click on a link that will convert the page to PDF.
http://citysoftsolutions.com/eclients/virtualtour/view_property_images.php?pid=9&uid=67
As you can see the image is placed behind the big image.
I'm using this converter script: http://mpdf.bpm1.com/
When I printed it using the PrimoPDF driver it came out just fine. The last image was laid over the other one without any problem. So there must be a bug in the script you're using.
What do I suggest?
If you'd like to convert your pages to PDFs "on the fly" I suggest you either
contact the script's creator and inform them of the bug in the script
use a different script (I'd check out this question, which may help you)
If you'd just like to provide PDFs of your pages, I suggest you install a PDF printer driver (like PrimoPDF, which I'm using), print those pages yourself, and use the results.
I'm not working for Nitro PDF Software company nor am I related to them in any way. So this is not me advertising their products/services.
On a sidenote
Something's telling me that what you'd actually like to do is create a PDF flyer, promo material or something similar. If that's what you're after, I suggest you do it using software that's meant for such a job. Microsoft Office Word will do, but you'll be better off using something else. If it's a one-page leaflet you could use Adobe Illustrator or CorelDraw. But if it's going to be an actual multipage document, use something like Word or Adobe InDesign.
Word is probably something you can easily master. So go with that one.
