PHP - Workaround for reading user-selected text from PDF?

PHP - Workaround for reading user-selected text from PDF? - php

I am working on a project that allows a user to upload text or content from an HTML page in Japanese and then use their cursor to select words in the text/content to translate into English. However, I would like to be able to expand this functionality to PDF files. Essentially, I'd like the user to be able to submit a PDF file and have the browser render that PDF file in such a way that when the user selects/highlights words in the PDF, the browser can somehow relay what the text of the highlighted section is, such as via javascript, to be then relayed to a PHP variable.
I know there are a lot of posts on stackoverflow asking similar questions (I've spent hours upon hours trying to sort through them all!), but I can't seem to find a definitive answer on whether this is possible. It seems there are lots of options for converting PDF to HTML or extracting text from PDF, but to be quite honest, I'm confused if any of those options are relevant to what I am trying to accomplish. And I know there's a javascript API for Adobe, but I'm under the impression the javascript needs to be embedded in the PDF already, which will not be true if the user is uploading their own PDF files to render. Even if that is possible, it seems there's no native text selection support in the Adobe javascript API....
Is there a straightforward workaround (oxymoron?) to doing this? Again, I want to be able to pass text selected in a PDF to a variable -- the effect is the user highlights words they don't know so those words can be added to a word bank for retrieval in a dictionary.
Let me know if I can be clearer on anything. Thank you!

I think your best bet is to convert the PDF to HTML (see this answers) and then you are already set as you already implemented everything for regular HTML.

Related

Determine if PDF file has searchable text in PHP

We have hundreds of PDF files on a server. Some of them contain searchable text and others do not.
I was asked to find out which are searchable and which are not.
Does anybody know of a way to read in a bunch of PDFs and determine if that PDF document contains text that is searchable/selectable or if the pdf only contains non-selectable/searchable text which needs to be OCRd?
I don't even need to actually read in the text; I just need to be able to detect possibly by tags or keywords, something that suggests that there are fonts or something like that in the raw data.
Are there tags in a searchable PDF that make it easy to detect?
Thanks

You could modify this code(pdf2text) to suit your purposes, I believe. Or this answer might get you to the right spot as well.

Filling an existing dynamic and complex PDF form from Website

I have seen a lot of questions regarding PDF's forms filled from a Website by using several PDF libraries but I can't find an straight answer to my concern. I have a very complex PDF that somebody built with LiveCycle, which includes things like buttons for dynamic addition of rows in tables.
I wonder if it is possible to fill this form from a Website's form or a DDBB, and if there is any library that can achieve things like adding the extra rows to the table and/or to peep inside the internal PDF's javascript to understand it's logic.
Adobe LiveCycle is not even an option, as their server solutions' prices, are far out of the mortals' reach.
Can anybody point me to the right direction?

It's possible if you hand write a parser (which is quite a feat..) but it's extremely difficult. It would probably be easiest to simply generate a new pdf with the needed information stored as static text.

After some experiments I have found that the only thing you need is to modify the XFA lines inside the XDP file which is contained inside the PDF (when it is a LiveCycle generated one) and everything will work.
An explanation can be found in here:
How can I merge data into an XDP file and return a PDF (via .NET)?

Submit HTML form to PDF

We have a high-resolution PDF (for printing) which has some form fields on it. We would like to have an HTML form which submits to the PDF, which is then placed into the respective fields.
I found a solution on google: http://koivi.com/fill-pdf-form-fields/
However, with that solution you only get an FDF file... And the demo does not work for me, opening the FDF file simply downloads another FDF file.
Since this PDF will be available to the public we would like to keep it as simple as possible. If we must open our original PDF and import this FDF file, we need a different solution (which I'm not sure is what the FDF file is for, since it didn't work).
A related post talking about .net framework had the same idea, but there were only paid commercial solutions: From HTML form to PDF
The PHP solutions I have found so far are for creating a new PDF, which is not what I need. Our PDF is created with Adobe Illustrator (or a similar adobe product) and is high-res with embedded fonts, svg and image content.
The form elements are in place, we just need to get the data to there.

Update April 11, 2013:
Since posting this question I have been utilizing FPDF on multiple projects where I needed to accomplish this goal. Although it cannot seem to "merge" template PDFs with the provided data, it can create the PDF from scratch.
One example I have used, I had a high resolution PNG for printing (similar to initial question) which we had to write the customer's name and today's date clearly in the center. I simply made the background of the PDF using FPDF->Image() and write the text afterwards using FPDF->Text().
It was very simple after all, you will need to look up the paper sizes to determine the X,Y,W,H of the image and then base your text fields relative to those numbers.
There was even a Form Filling extension, but I couldn't get it to work.
It seems as though I should answer my own question, although Visions answer may be better (seems to be deleted?). I used Vasiliy Faronov's link which was a comment to my main question: https://stackoverflow.com/a/1890835/200445
Here I found how to install pdftk and run a command to merge (flatten) my FDF and PDF files. I still used the "hacky" way to generate an FDF using Koivi's FDF Generator but it works for the most part.
One caveat is that some characters, like single and double quotes are not inserted correctly. It may be an issue of escaping the fields, but I could not find an answer.
Regardless, my PDF form generator is working, but anyone with a similar issue should look for a better solution.

There are number of tools which are not paid like itextsharp. try the following https://web.archive.org/web/20211020001747/https://www.4guysfromrolla.com/articles/030211-1.aspx Hope this code will help you. I have tried it its worked for me. If you can pay then there are number of paid tools which convert the HtML to PDF like ABCPDF etc.This example is in Asp.net and i am sure if you can convert it in PHP it will work for you too.

Problem in picture overlapping when I convert HTML page to PDF

I want to overlap pictures, but it is not working and I need some help.
Here's the link to the page I'd like to convert:
http://9m9.com/innovative/sample/two.html
I want to convert this page to a PDF. You can see the small image overlapping the bigger one.
This is the page where you can click on a link that will convert the page to PDF.
http://citysoftsolutions.com/eclients/virtualtour/view_property_images.php?pid=9&uid=67
As you can see the image is placed behind the big image.
I'm using this converter script: http://mpdf.bpm1.com/

When I printed it using PrimoPDF driver it came out just fine. Last image was easily laid over. So there must be a bug with the script you're using.
What do I suggest?
If you'd like to convert your pages to PDFs "on the fly" I suggest you either
contact script creator and inform them of a bug in the script
use a different script (I'd check out this question that can help you)
If you'd like to just provide PDFs of your page I suggest you install a PDF printer driver (like PrimoPDF that I'm using) and print those pages yourself and use those.
I'm not working for Nitro PDF Software company nor am I related to them in any way. So this is not me advertising their products/services.
On a sidenote
Something's telling me that what you'd actually like to do is to create a PDF flyer/promo material or something. If that's actually what you're after I suggest you do that using some software that's meant for such a job. Microsoft Office Word will do, but you'll better off using some other. If it's a one page leaflet you could use Adobe Illustrator or CorelDraw. But if it's going to be an actual multipage document use something like Word or Adobe InDesign.
Word is probably something you can easily master. So go with that one.

Spinning PDF's by passing a URL and $_GET parameters?

I have spent a lot of time trying to get dompdf (http://www.digitaljunkies.ca/dompdf/) to work but I keep running into problems. I am trying to generate a PDF from a PHP script which generates a fairly complex, filled out web form. The script accepts a $_GET parameter (record number) and fills out the form accordingly with data from the database. I have no problem getting this data into the script as a string or any type of value really. What I am wondering is what the best approach would be for converting this type of data to a PDF?
The flow is as follows: user completes form and is taken to confirmation page which I would like to add a "Save as PDF" button. At this point one of two things could happen, the page that is currently being displayed in the browser could be spun directly to a pdf or a call to itself (scriptname.php?id=xyz) could be made using something like PHP's http_get() function and store the HTML as a string. From there I am having issues with preparing an accurate representation as a PDF.
I have heard some talk about fpdf but their examples don't really lead me to believe you can use dynamic data as the source, but please correct me if I am wrong about this.
Any input would be appreciated.
-- Nicholas

Well, I didn't know dompdf. Strange that it uses either a commercial library (PDFlib) or an outdated (?) one (CPDF, not updated for 3 years). But well, as long as it works (concept is interesting).
I don't understand what you mean by "use dynamic data as the source" (or rather, I see not point in generating static PDF!), but FPDF is used to generate various dynamic documents, like invoices in e-commerce products. I saw people using forks (like TCPDF) to handle Unicode data, though.
You can't transform an HTML page to PDF with FPDF, but you have a quite precise control of layout, using concept of cells with data.
You can see such kind of code there: http://svn.prestashop.com/trunk/classes/PDF.php

In the past when faced with this issue, I have used FPDF for the placement of data on a PDF template. Then, by setting the appropriate HTTP header, force the browser to pop open the Download / Save As box for the user to save said PDF.
In a class that extends FPDF/FPDI appropriately, use something like the following to generate a PDF from a template PDF you've already created (http://www.setasign.de/products/pdf-php-solutions/fpdi/):
$this->setSourceFile('pdf_template.pdf');
$template_page = $this->importPage(1, '/MediaBox');
$this->useTemplate($template_page, 0, 0);
Then, have FPDF generate the PDF for output using:
$this->Output();
You can also extend FPDF to accept (limited) HTML for formatting using the script found here: http://www.fpdf.org/en/script/script41.php

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.