Convert docx to pdf in PHP

Right, first a little background that will help put this all in focus.
I have several .indd (InDesign) files. I can convert these to PDF and then to DOCX.
Using the PHPWord library, I can then effectively do a mail merge, replacing several areas of my document with text and one image.
I then want to convert that to a PDF, so that I can stitch several PDFs together with Ghostscript for printing.
I have a Word macro that I can execute just fine via standard command-line functions. If I try that same command line in PHP, it just hangs.
I've tried various forms of that, using system, exec and passthru, and using PsExec; they all either hang and then time out, or don't work and skip through.
I've seen other examples using COM objects, such as this one:
http://www.sitepoint.com/make-microsoft-word-documents-php/
They all either hang or give me problems with the COM object that I'm trying to create.
Am I trying for the impossible, or is there perhaps another way?
I've also given e-PDF Document Converter v2.1 a go, but without success.
Currently I suspect some kind of permissions issue, but I'm really at a loss as to how to get around it or what to do.
I would like to use either LibreOffice or OpenOffice, as they both seem to have command-line tools, but when I open the PDF or the DOC file with them, they display very poorly.
Any help would be appreciated.
Thanks
Richard
Update
Thinking about it, maybe I'll just stitch the Word documents together and let the user download the result and print it.
Job done easy!
But if there is a better way - I'm open to it.
Update 2
On a Windows platform.

Maybe something like the following?
sudo apt-get install unoconv
doc2pdf respondus-docx-sample-file.docx
In PHP:
exec('doc2pdf ' . escapeshellarg($yourDocxFile));
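A slightly more defensive sketch of the same idea, assuming a Linux server with the unoconv package from the answer installed (it calls unoconv directly instead of the doc2pdf wrapper) and Ghostscript available as gs. It escapes file names, checks the exit code instead of failing silently, and then stitches the PDFs together with Ghostscript as the question describes. The function names and paths are hypothetical.

```php
<?php
// Hypothetical helpers: build the shell commands separately so they can be
// inspected (and tested) without running anything.
function docxToPdfCommand(string $docxFile): string
{
    // By default unoconv writes <name>.pdf next to the input file.
    return 'unoconv -f pdf ' . escapeshellarg($docxFile);
}

function mergePdfsCommand(array $pdfFiles, string $outFile): string
{
    // Ghostscript's pdfwrite device concatenates the input PDFs in order.
    $inputs = implode(' ', array_map('escapeshellarg', $pdfFiles));
    return 'gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile='
        . escapeshellarg($outFile) . ' ' . $inputs;
}

function runOrFail(string $command): void
{
    // exec() fills $output with the lines printed; the exit code goes into $status.
    exec($command . ' 2>&1', $output, $status);
    if ($status !== 0) {
        throw new RuntimeException("Command failed ($status): " . implode("\n", $output));
    }
}

// Example usage (commented out so nothing runs on include; paths are hypothetical):
// runOrFail(docxToPdfCommand('/tmp/letter1.docx'));
// runOrFail(docxToPdfCommand('/tmp/letter2.docx'));
// runOrFail(mergePdfsCommand(['/tmp/letter1.pdf', '/tmp/letter2.pdf'], '/tmp/print.pdf'));
```

Capturing stderr with 2>&1 means a failing conversion reports its error message instead of just "skipping through".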

Related

PHP flatten PDF form after filling fields with fpdm WITHOUT PDFtk

I've seen a lot of questions with the same problem, but most solutions either ended with installing PDFtk or didn't need the flatten function. Sadly, neither applies to me.
The issue
Using fpdm, I've created an automatically filled PDF form.
The problem is that the checkbox values do not show up when the PDF is opened and printed in the browser's PDF viewer (in all browsers tested). As users will download this PDF, the values should be visible no matter which program is used to view it.
So my idea was to flatten the PDF, but fpdm needs PDFtk to do that, and PDFtk cannot be installed on the server.
Also, as we're using TYPO3 CMS, I would like to avoid adding a complete framework (such as Zend Framework).
I have tried to flatten the PDF using GraphicsMagick, but firstly the quality was really bad, and secondly one of the two pages was missing, using the following command from PHP:
shell_exec('gm -flatten form.pdf form.pdf')
Do I simply need to change the gm command? If so, how?
And if GM doesn't work, does anyone know an open-source/free solution to flatten PDF forms?
Thanks in advance for your help.
Update
I found a command to flatten the PDF with GraphicsMagick, and the result looks a lot better than my last attempt:
shell_exec('gm convert -density 300 ' . escapeshellarg($tmpFile) . ' +adjoin -append -flatten ' . escapeshellarg($PDFFile));
It's still not perfect, as the fonts differ from those used before flattening, but maybe I'll find a solution for that too. Unlike my last attempt, though, no page is missing.
Still, I'm wondering whether PDFtk is the only free, PHP-compatible tool that can fill AND flatten PDF forms (ignoring all the tools that ultimately depend on PDFtk).
Instead of using GraphicsMagick to flatten the PDF, we stumbled upon pdftocairo. The following command flattens the PDF without changing fonts, and it supports comb fields (which neither GraphicsMagick nor Ghostscript supported):
shell_exec('pdftocairo -pdf ' . escapeshellarg($tmpFile) . ' ' . escapeshellarg($PDFFile));
So if anyone needs to flatten PDFs on an Ubuntu 18.04 server: unlike PDFtk, pdftocairo can be installed without additional packages, and it works like a charm for what I needed. It also has further PDF-related functions, which may make it comparable to PDFtk.
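A minimal sketch of the pdftocairo step wrapped in a function, assuming pdftocairo (from poppler-utils) is on the PATH. It checks both the exit code and that the output file exists, so a failed flatten does not go unnoticed. The function names are hypothetical; $tmpFile would be the file that fpdm wrote, as in the question.

```php
<?php
// Build the pdftocairo command; re-rendering the page through cairo is what
// bakes the form field appearances into ordinary page content.
function flattenPdfCommand(string $inFile, string $outFile): string
{
    return 'pdftocairo -pdf ' . escapeshellarg($inFile) . ' ' . escapeshellarg($outFile);
}

function flattenPdf(string $inFile, string $outFile): void
{
    exec(flattenPdfCommand($inFile, $outFile) . ' 2>&1', $output, $status);
    if ($status !== 0 || !is_file($outFile)) {
        throw new RuntimeException('pdftocairo failed: ' . implode("\n", $output));
    }
}

// Example usage (hypothetical paths):
// flattenPdf($tmpFile, $PDFFile);
```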

Retrieving images in the uploaded pdf document in php

I am trying to display the images in a PDF document that I uploaded to the server as hyperlinks in PHP, so that when a user clicks on one of them they get the corresponding document.
Please help me, thanks in advance!
Use pdfimages, which comes with the open-source xpdf software package (for *nix operating systems). You'll have to call it through exec or the like, then work with the output from PHP. I am not aware of any PHP library that provides this functionality, so you're going to have to experiment.
EDIT
You mentioned that you aren't experienced with PHP... I thought I'd add that this isn't a quick-and-easy type of task; you certainly won't find a bunch of tutorials around the internet for it.
To get started, you'll have to install the xpdf package on your server. There are many different ways to do this, depending on which OS you have.
After that is set up, you'll be using a command line to execute a program on your server; you'll want to capture the output of that command in PHP and work with it further. So initially, work out exactly what your command line will look like, as well as what the output looks like and means. Do this from the command line and don't worry about the PHP part yet. In this case, the output is a set of image files extracted from the given PDF, and your command-line call will look something like "pdfimages mypdf.pdf img", where "img" is the prefix for the extracted image file names. Play around and find out what happens.
After you work out exactly what command line you need to send and what the command does, you can focus on the PHP angle. In a nutshell, you want PHP to execute the exact command that you've already worked out. Look at the manual for exec for information on how to run a command and get its output back. Write your script to make the correct call and show the call's output.
Next, move on to doing something with that output. I presume you'll want to somehow store the extracted images in a web-accessible place, put them in the database, show them to the user, etc. That is the very last stage after you've worked out the initial steps.
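A minimal sketch of those steps, assuming the poppler-utils build of pdfimages (which supports a -png option and names its files <root>-NNN.png). The "img" prefix, directory names and URLs are hypothetical; adjust them to your web root.

```php
<?php
// Step 1: the command line worked out by hand, with escaped arguments.
function extractImagesCommand(string $pdfFile, string $outDir): string
{
    $root = rtrim($outDir, '/') . '/img';   // hypothetical file-name prefix
    return 'pdfimages -png ' . escapeshellarg($pdfFile) . ' ' . escapeshellarg($root);
}

// Step 3: one hyperlink per extracted image; clicking it opens the PDF.
function imageLinks(array $imageUrls, string $pdfUrl): string
{
    $html = '';
    foreach ($imageUrls as $src) {
        $html .= '<a href="' . htmlspecialchars($pdfUrl, ENT_QUOTES) . '">'
               . '<img src="' . htmlspecialchars($src, ENT_QUOTES) . '" alt=""></a>' . "\n";
    }
    return $html;
}

// Step 2 and usage (commented out; paths are hypothetical):
// exec(extractImagesCommand('uploads/doc.pdf', 'public/images'), $out, $status);
// if ($status === 0) {
//     $files = glob('public/images/img-*.png');
//     $urls = array_map(function ($f) { return '/images/' . basename($f); }, $files);
//     echo imageLinks($urls, '/uploads/doc.pdf');
// }
```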
Good luck!

Converting doc, docx, pdf to HTML using PHP linux

I run a job search site, and I need to convert DOC, DOCX and PDF files into HTML on a Linux CentOS server running PHP. People submit these files as resumes. So far, I've found PHPDocx to be great at converting DOCX to HTML, but I am stuck on DOC and PDF. pdftohtml gives a "bad color" error when I run tests. As for DOC, I've only found wvWare, which seems complex and bulky to install.
Does anyone have any ideas on how to easily convert DOC/PDF to HTML?
The only thing I can think of is FPDF.
It is intended for creating PDF files in PHP; opening existing PDF files requires an add-on such as FPDI.
Maybe you can use that as a base and develop some sort of toHTML function for it.
It is completely free to use, and some extensions for it already exist.
It MIGHT help you.
http://www.fpdf.org
EDIT:
Thanks to Pierre for this addition in the comments:
You can use FPDI: http://www.setasign.de/products/pdf-php-solutions/fpdi, but the imported PDF is treated much like an image.
I haven't looked at it myself so far, but this might help.
As far as .doc files go, how about trying OpenOffice/LibreOffice, with something like:
lowriter --headless --convert-to html doc_file.doc
As far as PDF goes, if the PDF is a graphical representation of text (e.g. a scan), you're out of luck; the best you can do is try converting it to an image with ImageMagick. If it contains proper text, it should convert easily.
There are various tools out there that already do this, such as http://dag.wieers.com/home-made/unoconv/ and http://www.phpdocx.com/ (which you've already tried).
http://www.phplivedocx.org/2009/08/13/convert-docx-doc-rtf-to-html-in-php/ looks promising.
Or you could install a portable version of LibreOffice on your server, which allows command-line conversion:
https://help.libreoffice.org/Common/Starting_the_Software_With_Parameters
I'm sure there are tutorials out there (in the LibreOffice support area).
To convert PDF to HTML easily, I would suggest pdf2htmlEX, which produces outstanding HTML and is fast enough for on-the-fly conversion. You should first put some effort into optimizing and building it for your system; a simple build how-to is included on the project page.
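A sketch of how the suggestions above could be combined for resume uploads: pick a converter by file extension, with LibreOffice (via lowriter) for DOC/DOCX and pdf2htmlEX for PDF. Both tools are assumed to be installed and on the PATH, and the function name is hypothetical. DOCX could equally stay with PHPDocx, which the question says already works.

```php
<?php
// Hypothetical dispatcher: returns the shell command that converts $file to
// HTML inside $outDir, or throws for unsupported types.
function htmlConversionCommand(string $file, string $outDir): string
{
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    switch ($ext) {
        case 'doc':
        case 'docx':
            // LibreOffice writes <basename>.html into --outdir.
            return 'lowriter --headless --convert-to html --outdir '
                . escapeshellarg($outDir) . ' ' . escapeshellarg($file);
        case 'pdf':
            // pdf2htmlEX: input PDF, then the output file name inside --dest-dir.
            $out = pathinfo($file, PATHINFO_FILENAME) . '.html';
            return 'pdf2htmlEX --dest-dir ' . escapeshellarg($outDir) . ' '
                . escapeshellarg($file) . ' ' . escapeshellarg($out);
        default:
            throw new InvalidArgumentException("Unsupported file type: $ext");
    }
}

// Example usage (hypothetical paths):
// exec(htmlConversionCommand('/uploads/cv.doc', '/var/www/cv-html') . ' 2>&1', $out, $status);
```

Escaping matters here in particular because the file names come from site visitors.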

Open documents(pdf,doc,docx,txt) in browser page using php (without using google docs viewer) [duplicate]

This question already has answers here:
Is there a PDF parser for PHP? [closed]
My question: I want to open documents (PDF, DOC, DOCX, TXT) in a browser page using PHP (without using the Google Docs viewer). Can anyone help me?
Some of these are doable. Some, not so much. Let's tackle the low-hanging fruit first.
Text files
You can just wrap the content in <pre> tags after running it through htmlspecialchars.
PDF
There is no native way for PHP to turn a PDF document into HTML and images. Your best bet is probably ImageMagick, a common image manipulation program. You can basically call convert file.pdf file.png and it will convert the PDF file into a PNG image that you can then serve to the user. ImageMagick is installed on many Linux servers. If it's not available on your host's machine, ask them to install it; most quality hosts shouldn't have a problem with this.
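A minimal sketch of that call from PHP, assuming ImageMagick's convert is installed along with Ghostscript, which ImageMagick relies on to read PDFs. For a multi-page PDF, ImageMagick typically numbers the outputs (file-0.png, file-1.png, ...). The function name and the 150 DPI default are assumptions.

```php
<?php
// -density must come before the input file so it sets the rasterisation DPI;
// without it, PDF text usually comes out blurry.
function pdfToPngCommand(string $pdfFile, string $pngFile, int $dpi = 150): string
{
    return 'convert -density ' . $dpi . ' '
        . escapeshellarg($pdfFile) . ' ' . escapeshellarg($pngFile);
}

// Example usage (hypothetical paths):
// exec(pdfToPngCommand('uploads/doc.pdf', 'public/pages/doc.png') . ' 2>&1', $out, $status);
// foreach (glob('public/pages/doc*.png') as $page) {
//     echo '<img src="/pages/' . htmlspecialchars(basename($page)) . '" alt="">';
// }
```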
DOC & DOCX
This is where it gets a bit trickier. Again, there's no way to do this in pure PHP. The Docvert extension looks like a possible choice, though it requires OpenOffice to be installed as well. I was actually going to recommend plain vanilla OpenOffice/LibreOffice too, because it can do the job directly from the command line. However, it's very unlikely that a shared host will want to install this, so you'll probably need your own dedicated or virtual private server.
In the end, while these options can be made to work, the output quality cannot be guaranteed. Overall, this is kind of a bad idea that you should not seriously consider implementing.
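A minimal sketch of the command-line route, assuming LibreOffice's soffice binary is installed and on the PATH. Converting DOC/DOCX to PDF lets you reuse the PDF display options discussed in this answer. The function names are hypothetical.

```php
<?php
// soffice writes <basename>.pdf into --outdir. If it fails when run as the
// web server user, pointing it at a writable profile directory, e.g.
// -env:UserInstallation=file:///tmp/lo-profile, is a common workaround.
function officeToPdfCommand(string $file, string $outDir): string
{
    return 'soffice --headless --convert-to pdf --outdir '
        . escapeshellarg($outDir) . ' ' . escapeshellarg($file);
}

// LibreOffice keeps the base name and swaps the extension.
function expectedPdfPath(string $file, string $outDir): string
{
    return rtrim($outDir, '/') . '/' . pathinfo($file, PATHINFO_FILENAME) . '.pdf';
}

// Example usage (hypothetical paths):
// exec(officeToPdfCommand('/uploads/report.docx', '/tmp/out') . ' 2>&1', $out, $status);
// if ($status === 0 && is_file(expectedPdfPath('/uploads/report.docx', '/tmp/out'))) { /* serve it */ }
```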
I am sure libraries and such exist that can do this. Google could probably help you there more than I can.
For .txt files, I would suggest breaking lines after a certain number of characters and putting them inside <pre> tags.
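A minimal sketch combining both text-file suggestions: wrap long lines with wordwrap(), escape the text with htmlspecialchars(), and put the result inside <pre> tags. The 100-character default width and the function name are assumptions.

```php
<?php
function textToHtml(string $text, int $width = 100): string
{
    // Normalise Windows/old-Mac line endings so each line is wrapped separately.
    $lines = explode("\n", str_replace(["\r\n", "\r"], "\n", $text));
    $lines = array_map(function (string $line) use ($width): string {
        // false = only break at spaces, so multibyte UTF-8 characters are never split.
        return wordwrap($line, $width, "\n", false);
    }, $lines);
    // Escape after wrapping so entities like &lt; don't count towards the width.
    return '<pre>' . htmlspecialchars(implode("\n", $lines), ENT_QUOTES, 'UTF-8') . '</pre>';
}

// Usage (hypothetical path):
// echo textToHtml(file_get_contents('uploads/notes.txt'));
```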
I know people will not be happy about this response, but if you are in a Linux environment and have pdf2html installed, you could use shell_exec and call pdf2html.
Note: If you use shell_exec be wary of what you pass to it since it will be executed on the server outside of PHP.
I thought I'd just add that PDFs generally display well in a simple <embed> tag.
Or use an <object> tag, so you can have a fallback if the PDF cannot be displayed on the client.
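A minimal sketch of the <object> approach with a fallback, generated from PHP. The URL is assumed to point at a PDF served with Content-Type: application/pdf; the function name, default height and fallback wording are hypothetical.

```php
<?php
// Browsers that can't render the PDF inline show the inner content instead.
function pdfViewerHtml(string $url, string $height = '600px'): string
{
    $u = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    $h = htmlspecialchars($height, ENT_QUOTES, 'UTF-8');
    return '<object data="' . $u . '" type="application/pdf" width="100%" height="' . $h . '">'
         . '<p>Your browser cannot display this PDF. <a href="' . $u . '">Download it</a> instead.</p>'
         . '</object>';
}

// Usage: echo pdfViewerHtml('/files/manual.pdf');
```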

I want to write multiple images to an ODT file, either in Python or PHP

I want to write multiple image files to an ODT file. I will specify a directory, and the script will take it from there in a loop. But where do I start? I have never done anything like this before!
I found this Python code, which can convert HTML to ODT... so we could generate HTML first and then call this code. But there is no documentation on how to use it.
html2odt code
At last I found a PHP way to write ODT directly! It's well documented.
http://www.odtphp.com/
I have also written a complete, practical solution in PHP. You can upload multiple images and get the ODT document generated.
The code is hosted at http://code.google.com/p/images2odt/
The first post is done here.
Anyone wanting to use the Python code will need a Python 2.6 interpreter; it might also work with version 2.7. It's mainly used on Linux, but there are Windows and Mac versions as well. You will also need the files listed in the from and import statements; these files are in some of the other folders. It looks like it is part of a much bigger Linux package. One last thing: Python scripts usually take their arguments from the command line.
Additional info:
I looked over the setup.py file, and it told me that this is an API library for OpenDocument files called odfpy. The version is 0.9.2. The link it gives for the documentation is broken. A Google search for odfpy turned up a place to download a more recent version (0.9.4) as a tarball here:
http://pypi.python.org/pypi/odfpy
The documentation can be found here in an Open Office document:
https://joinup.ec.europa.eu/software/odfpy/document/api-odfpyodt
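For comparison, a hand-rolled PHP sketch that uses neither odtphp nor odfpy. It relies on the fact that an ODT file is a ZIP package, built here with ZipArchive (ext-zip). It loops over a directory as the question describes and places each PNG/JPEG in its own paragraph. Assumptions: every image gets a fixed 15cm x 10cm frame (aspect ratio is not preserved), only PNG and JPEG are handled, and a package containing just mimetype, content.xml and the manifest is enough for your office suite to open. Test that last point before relying on it. The function names are hypothetical.

```php
<?php
const ODT_MIME = 'application/vnd.oasis.opendocument.text';

// content.xml: one paragraph per image, each holding an embedded draw:frame.
function odtContentXml(array $pictureNames): string
{
    $paras = '';
    foreach ($pictureNames as $i => $name) {
        $href = htmlspecialchars('Pictures/' . $name, ENT_QUOTES | ENT_XML1);
        $paras .= '<text:p><draw:frame draw:name="img' . $i . '" text:anchor-type="as-char"'
            . ' svg:width="15cm" svg:height="10cm"><draw:image xlink:href="' . $href . '"'
            . ' xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/></draw:frame></text:p>';
    }
    return '<?xml version="1.0" encoding="UTF-8"?>'
        . '<office:document-content'
        . ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
        . ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'
        . ' xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"'
        . ' xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"'
        . ' xmlns:xlink="http://www.w3.org/1999/xlink" office:version="1.2">'
        . '<office:body><office:text>' . $paras . '</office:text></office:body>'
        . '</office:document-content>';
}

// META-INF/manifest.xml: lists the package itself, content.xml and every picture.
function odtManifestXml(array $pictureNames): string
{
    $entries = '<manifest:file-entry manifest:full-path="/" manifest:version="1.2" manifest:media-type="' . ODT_MIME . '"/>'
        . '<manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>';
    foreach ($pictureNames as $name) {
        $type = strtolower(pathinfo($name, PATHINFO_EXTENSION)) === 'png' ? 'image/png' : 'image/jpeg';
        $entries .= '<manifest:file-entry manifest:full-path="Pictures/'
            . htmlspecialchars($name, ENT_QUOTES | ENT_XML1) . '" manifest:media-type="' . $type . '"/>';
    }
    return '<?xml version="1.0" encoding="UTF-8"?>'
        . '<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" manifest:version="1.2">'
        . $entries . '</manifest:manifest>';
}

function imagesToOdt(string $imageDir, string $odtFile): void
{
    $entries = scandir($imageDir);   // sorted alphabetically by default
    if ($entries === false) {
        throw new RuntimeException("Cannot read $imageDir");
    }
    $names = array_values(array_filter($entries, function (string $f): bool {
        return (bool) preg_match('/\.(png|jpe?g)$/i', $f);
    }));
    $zip = new ZipArchive();
    if ($zip->open($odtFile, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot create $odtFile");
    }
    // The mimetype entry must be the first entry and stored uncompressed.
    $zip->addFromString('mimetype', ODT_MIME);
    $zip->setCompressionName('mimetype', ZipArchive::CM_STORE);
    $zip->addFromString('content.xml', odtContentXml($names));
    $zip->addFromString('META-INF/manifest.xml', odtManifestXml($names));
    foreach ($names as $name) {
        $zip->addFile($imageDir . '/' . $name, 'Pictures/' . $name);
    }
    $zip->close();
}

// Usage (hypothetical paths): imagesToOdt('/path/to/images', '/tmp/images.odt');
```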

Categories