I've seen a lot of questions describing the same problem, but most answers either ended with installing PDFtk or didn't need the flatten function at all. Sadly, neither works for me.
The issue
Using fpdm I've created an automatically filled PDF-form.
The problem is that the checkbox values do not show up when the PDF is opened and printed in the browser's PDF viewer (in all tested browsers). Since users will download this PDF, the values should be visible no matter which program is used to view it.
So my idea was to flatten the PDF, but fpdm needs PDFtk to flatten it, and PDFtk cannot be installed on the server.
Also, as we're using TYPO3 CMS, I would like to avoid adding a complete framework (such as zend-framework).
I have tried to flatten the PDF using GraphicsMagick, but the quality was really bad and one of the two pages was missing when I ran the following command from PHP:
shell_exec('gm -flatten form.pdf form.pdf')
Do I simply need to change the gm-command? If yes, how so?
And if GM doesn't work does anyone know an open source/free solution to flatten PDF forms?
Thanks in advance for your help.
Update
I found a command to flatten the PDF with graphicsMagick and it looks a lot better than with my last attempt:
shell_exec('gm convert -density 300 ' . $tmpFile . ' +adjoin -append -flatten ' . $PDFFile);
It's still not perfect, as the fonts differ from those used before flattening, but maybe I'll find a solution for that too. Unlike my last attempt, though, no page is missing.
Still, I'm wondering whether PDFtk is the only free, PHP-compatible tool to fill AND flatten PDF forms (ignoring all the tools that in the end depend on PDFtk).
Instead of using GraphicsMagick to flatten the PDF, we stumbled upon pdftocairo. The following command flattens the PDF without changing the fonts and with support for comb fields (which neither GraphicsMagick nor Ghostscript supported):
shell_exec('pdftocairo -pdf ' . escapeshellarg($tmpFile) . ' ' . escapeshellarg($PDFFile));
So if anyone needs to flatten PDFs on an Ubuntu 18.04 server: unlike PDFtk, pdftocairo can be installed on it without additional packages, and it works like a charm for what I needed. It also has more PDF-related functions, which are probably comparable to those of PDFtk.
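For anyone who wants a bit more safety around that call, here is a minimal shell sketch of the same pdftocairo step. The function name `flatten_pdf` and the temporary-file naming are my own assumptions, not part of fpdm or poppler; the only real tool call is `pdftocairo -pdf`, exactly as in the command above.

```shell
# Flatten a filled PDF form by re-rendering it through pdftocairo.
# Usage: flatten_pdf INPUT.pdf OUTPUT.pdf
# (flatten_pdf is a hypothetical helper name used for illustration.)
flatten_pdf() {
    src=$1
    dst=$2
    # Fail early with a clear message if the tool is not installed.
    if ! command -v pdftocairo >/dev/null 2>&1; then
        echo "pdftocairo not found (on Ubuntu it ships in poppler-utils)" >&2
        return 127
    fi
    if [ ! -f "$src" ]; then
        echo "input file not found: $src" >&2
        return 1
    fi
    # Render to a temporary file first so a failed run never leaves
    # a half-written PDF at the final location.
    tmp="$dst.tmp.$$"
    if pdftocairo -pdf "$src" "$tmp"; then
        mv "$tmp" "$dst"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

From PHP this could be called through `shell_exec` like the one-liner above, with both paths passed through `escapeshellarg`.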
Related
I am developing an API, in PHP, hosted on a linux server, that requires me to make jpeg previews for a .pptx powerpoint presentation.
I first convert the file to pdf and then convert the pdf to jpegs.
The second step is easy, with ghostscript, it's the first part that's proving difficult.
I have tried using the libreoffice executable, but pptx isn't completely compatible. Certain backgrounds become invisible.
I have the same problem with many third-party APIs (which I suspect also use LibreOffice); the ones that do work are ridiculously expensive.
Installing office on a Linux server and using COM functions seems impossible, or very tedious at best.
I have looked at Aspose.Slides, which also seems rather expensive, and their documentation is filled with errors.
I could use suggestions on how to tackle this problem.
I have tried to find the underlying problem of why LibreOffice and online conversion tools have a problem with the backgrounds of the presentations I need to convert.
The background is a .emf file, which has bad support.
My solution
I've unzipped the presentation, converted the .emf files to png (using ghostscript), changed all mentions of .emf to .png in the XML, and rezipped the altered presentation.
When I now use the LibreOffice headless to convert to pdf, the background shows up.
It might be a bit hacky, but it works for the intent of my program.
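The reference-rewriting step can be sketched in shell roughly like this. It assumes the presentation has already been unzipped into a directory and the `.emf` files have already been converted to `.png` next to the originals; the function name `rewrite_emf_refs` is made up for this example, and the match on `.emf"` assumes the references appear at the end of quoted attribute values.

```shell
# Rewrite .emf references to .png inside an unpacked .pptx directory.
# Usage: rewrite_emf_refs UNPACKED_DIR
# (rewrite_emf_refs is a hypothetical helper name used for illustration.)
rewrite_emf_refs() {
    dir=$1
    # Slide XML and relationship (.rels) files are the parts that point at media files.
    find "$dir" -type f \( -name '*.xml' -o -name '*.rels' \) |
    while IFS= read -r f; do
        # Only touch files that actually mention an .emf target.
        if grep -q '\.emf"' "$f"; then
            # Avoid sed -i, which behaves differently on GNU and BSD systems.
            sed 's/\.emf"/.png"/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
        fi
    done
}
```

The surrounding steps would then be: unzip the `.pptx` into a working directory, convert the images, run `rewrite_emf_refs` on that directory, zip its contents back into a new `.pptx`, and hand that file to LibreOffice headless.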
ps. I see that my question has gathered a few downvotes. In my opinion it was a valid question, and listed the various solutions that had worked for others, but not for me. If anyone has insights or ways to improve it, feel free to comment.
Right, first a little background that will help put this all in focus.
I have several indd files (indesign). I can convert these to pdf and then to docx.
Using the phpword library I can then effectively do a mail merge and replace several areas of my document with text and one image.
I then want to convert that to a pdf, which I can then stitch several pdfs together for printing with ghostscript.
I have a Word macro that I can execute just fine via standard command line functions. If I try that same command line in PHP, it just hangs.
I've tried various forms of that, using system, exec, passthru, and PsExec; they all either hang and then time out, or don't work and skip through.
I've seen other examples using COM objects, like this one:
http://www.sitepoint.com/make-microsoft-word-documents-php/
They all either hang or give me problems with the COM object that I'm trying to create.
Am I trying for the impossible, or is there perhaps another way?
I've also given e-PDF Document Converter v2.1 a go but without success.
Currently I'm thinking that there is some permission thing going on but I'm really at a loss as to how to get around it or what to do.
I would maybe like to use either the libreoffice or the openoffice as they both seem to have command line tools but when I open the pdf or the doc file they display very poorly.
Any help.
Thanks
Richard
Update
Just thinking maybe I'll stitch the word documents together and then just allow the user to download it and then they can print it.
Job done easy!
But if there is a better way - I'm open to it.
Update 2
On a windows platform
Maybe something like the following?
sudo apt-get install unoconv
doc2pdf respondus-docx-sample-file.docx
In PHP:
exec('doc2pdf ' . escapeshellarg($youPdfFile));
I'm creating a docx file from an HTML page using pandoc, but for the life of me I can't seem to get it to take on any kind of styling or successfully use a dotx template. I don't know if it's because you can't style docx files or I'm doing something wrong - documentation isn't all that verbose for pandoc.
I've also tried just echoing the html out and setting headers so the client will open the file as a doc, but this has some problems when you save it (it will try to save as an html file and converting to a doc isn't all that easy).
What I want to do is create an editable document which is styled and contains a logo image - just font types, colours and sizes would be enough, maybe some basic positioning would be nice.
Does anyone know how to achieve this on a LAMP-like system?
I stumbled on using LibreOffice on the CLI to do the conversion, with a much greater degree of success. It's still not perfect but a lot better than what I was getting, and it seems to take on board font types, sizes and colours a lot better.
Steps to install and use (CentOS / Redhat here):
sudo yum install libreoffice libreoffice-headless
You may need some X11 / Xorg libs, easiest to just install Xorg if it won't run.
libreoffice --headless --convert-to docx --outdir ./ myfile.html
Worked for me, I ended up with a serviceable .docx file which could be read by MS Word 2008 and LibreOffice 3.5.6.2.
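If you call this from a script, a small wrapper along these lines may help. It is only a sketch: `lo_convert` is a name I made up, and it assumes the binary is on the PATH as `libreoffice`, as in the command above (some installs expose it as `soffice` instead).

```shell
# Convert a document with headless LibreOffice and print the path of the result.
# Usage: lo_convert FORMAT OUTDIR FILE   (e.g. lo_convert docx ./out myfile.html)
# (lo_convert is a hypothetical helper name used for illustration.)
lo_convert() {
    fmt=$1
    outdir=$2
    file=$3
    mkdir -p "$outdir" || return 1
    # Same invocation as above, with every argument quoted.
    libreoffice --headless --convert-to "$fmt" --outdir "$outdir" "$file" >/dev/null || return 1
    # The output is named after the input file, with the new extension.
    base=$(basename "$file")
    result="$outdir/${base%.*}.$fmt"
    # Check that the file really exists rather than trusting the exit status alone.
    [ -f "$result" ] || return 1
    printf '%s\n' "$result"
}
```

Printing the resulting path makes it easy to pick up the converted file from PHP via the output of `exec` or `shell_exec`.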
Other tools that might also be worth examining are JODReports and Docmosis, which are focussed on generation from templates (mail-merge) rather than just format conversion. JODReports is free/Open Source, Docmosis is not. Both can be invoked from PHP in various ways, and Docmosis has a cloud service, which means a zero-install footprint if your application is allowed to reach the cloud. Please note I work for the company that created Docmosis.
I think they both can work from docx/dotx templates and produce a variety of output formats, including DocX.
Hope that helps.
I run a job search site, and I need to convert doc, docx and pdf files into HTML on a Linux CentOS server running PHP. People submit these files as resumes. So far, I found PHPDocx to be great at converting docx to HTML, but I am stuck at doc/pdf. PDFTOHTML gives the error "bad color" when I run tests. As for doc, I only found wvWare, which seems complex and bulky to install.
Does anyone have any ideas on how to easily convert doc/pdf to HTML?
The only thing I can think of is FPDF.
It is intended for creating PDF files in PHP but it can also open PDF files.
Maybe you can use that as a base and develop some sort of toHTML function for it.
It is completely free to use and it has some extensions already.
It MIGHT help you.
http://www.fpdf.org
EDIT:
Thanks to Pierre for the addition to my post in the comments:
You can use fpdi: http://www.setasign.de/products/pdf-php-solutions/fpdi but the input pdf is just like an image.
I haven't taken a look at it myself so far, but this might help.
As far as .doc files go how about trying OpenOffice/LibreOffice, something like:
lowriter --headless --convert-to html doc_file.doc
As far as PDF goes, if the PDF is just a graphical representation of text, then you're out of luck; the best you can do is try to convert it to an image with ImageMagick. If it contains proper text, it should convert easily.
There are various tools out there already to do this, such as http://dag.wieers.com/home-made/unoconv/, http://www.phpdocx.com/ (which you've already tried)
http://www.phplivedocx.org/2009/08/13/convert-docx-doc-rtf-to-html-in-php/ looks promising.
Or, you could install a portable version of libreoffice on your server which allows command line conversion
https://help.libreoffice.org/Common/Starting_the_Software_With_Parameters
I'm sure there'll be tutorials out there (on libreoffice support area)
To easily convert pdf to html, I would suggest pdf2htmlEX, which produces outstanding HTML and is fast enough for converting at runtime. You should first put some effort into optimizing and building it for your system. There is a simple build how-to included on the project link.
I need to include pdf files in some webpages, and I'm running into trouble.
The app is a simple newspaper archive, in which I can read pages right in the browser or download them as pdf files, one file per page. What my customer can provide is one pdf file for each page; what my customer wants from me is to navigate them through indexes (with page thumbnails) and read a chosen one directly in the page. I'm using php/mysql.
I started by trying the <object> tag with type="application/pdf", but I found it's deprecated because it's not cross-platform at all (there's no support in Linux browsers, and even my Windows Firefox 3.5 couldn't show me anything).
I guessed I could transform the pdf into something different (html or simply images are good enough), but the only thing I found is ImageMagick, which I cannot use because it must be installed on the server and I'm not the admin of that machine.
So, I'm finally looking for suggestions
Thanks
Display the pdf inline using an IFRAME. The thumbnail you can generate with ImageMagick. You should be able to use the command line version of ImageMagick to resize and convert to jpg.
edit
Your best bet is to talk to the server admin and have them install PHP support for ImageMagick; then you can use it as a class.
If you can't get support to install on the server, you will have to use the command line version.
You might be able to Google around for a library that wraps the command line, but it would be trivial to write it yourself.
With this in place you can create a large readable black and white png for each page. It should click through to the pdf.
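A wrapper around the command line version could look roughly like this. Treat it as a sketch: `pdf_thumbnail` is a made-up name, the density and default width are arbitrary values, and it assumes ImageMagick's classic `convert` command is available (newer releases also ship it as `magick`) with PDF reading working on the server.

```shell
# Render the first page of a PDF as a JPEG thumbnail with ImageMagick.
# Usage: pdf_thumbnail INPUT.pdf OUTPUT.jpg [WIDTH_IN_PIXELS]
# (pdf_thumbnail is a hypothetical helper name used for illustration.)
pdf_thumbnail() {
    pdf=$1
    jpg=$2
    width=${3:-200}   # default width is an arbitrary choice
    # "[0]" selects the first page; -thumbnail scales to the given width
    # and keeps the aspect ratio because no height is specified.
    convert -density 100 "${pdf}[0]" -thumbnail "${width}x" "$jpg"
}
```

Since each of the customer's files holds a single page, calling this once per file when it is uploaded and caching the resulting jpg should be enough for the index pages.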