I'm having an issue with some PDF files not displaying properly in our iPad app. I have come to the conclusion that we need to standardize them by "converting" PDF to PDF. I have done this successfully by using ImageMagick to convert the PDF to (resized) PNGs and then pushing the PNGs back into a PDF. However, something within ImageMagick makes photos within the PDFs display wrong. The same issue occurs when simply converting a JPG or other graphic to PDF in ImageMagick. I worked around that by taking ImageMagick's output, converting it again to PNG using GD, and then pushing that through our PDF converter.
So my question is this: what other PHP workflows would work for this, other than using ImageMagick for the conversion back to PDF? We are not opposed to a paid solution; we just need something that works. Our server runs CentOS.
My gut instinct, other than yelling at whoever wrote the PDF reader that you're suffering with, would be to convert the PDF to PostScript using pdftops, then convert it back into a PDF using Ghostscript. You can enable a number of document compatibility options at that point, which may make it more digestible.
While this may have side effects, they should be minimal. PDF is closely related to PostScript (the two share the same imaging model), and it looks like pdftops manages not to do anything utterly stupid during the conversion process.
This may break or simply not work with advanced PDF features, like digital signatures or forms.
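A minimal sketch of that round trip as a shell function, assuming `pdftops` (from Poppler or Xpdf) and Ghostscript's `gs` are on the PATH. The function name `normalize_pdf` and the choice of PDF 1.4 as the compatibility level are illustrative assumptions, not something the answer above prescribes.

```shell
#!/bin/sh
# normalize_pdf IN.pdf OUT.pdf
# Round-trips a PDF through PostScript and back into a PDF.
normalize_pdf() {
    in=$1
    out=$2
    tmp=$(mktemp "${TMPDIR:-/tmp}/normalize.XXXXXX") || return 1

    # Step 1: PDF -> PostScript.
    # Step 2: PostScript -> PDF. -dCompatibilityLevel is one of the
    # compatibility options; 1.4 is an assumed value, adjust as needed.
    pdftops "$in" "$tmp" &&
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
       -dCompatibilityLevel=1.4 \
       -sOutputFile="$out" "$tmp"
    rc=$?

    rm -f "$tmp"
    return $rc
}

# Example (file names are placeholders):
#   normalize_pdf upload.pdf upload-normalized.pdf
```

From PHP this could be called through `exec()` with arguments passed through `escapeshellarg()`.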
I am developing an API in PHP, hosted on a Linux server, that requires me to make JPEG previews of a .pptx PowerPoint presentation.
I first convert the file to PDF and then convert the PDF to JPEGs.
The second step is easy with Ghostscript; it's the first part that's proving difficult.
I have tried using the LibreOffice executable, but its pptx support isn't complete. Certain backgrounds become invisible.
I have the same problem with many third-party APIs (which I suspect also use LibreOffice); the ones that do work are ridiculously expensive.
Installing Office on a Linux server and using COM functions seems impossible, or very tedious at best.
I have looked at Aspose.Slides, which also seems rather expensive, and their documentation is filled with errors.
I could use suggestions on how to tackle this problem.
I have tried to find the underlying problem of why LibreOffice and online conversion tools have a problem with the backgrounds of the presentations I need to convert.
The background is an .emf file, a format that is poorly supported.
My solution
I've unzipped the presentation, converted the .emf files to PNG (using Ghostscript), changed all mentions of .emf to .png in the XML, and rezipped the altered presentation.
When I now use headless LibreOffice to convert to PDF, the background shows up.
It might be a bit hacky, but it works for the intent of my program.
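A rough sketch of the patching step, for anyone who wants to try the same thing. It works on an already-unzipped presentation directory. `patch_emf_refs` is an illustrative name, and `emf_to_png` is a hypothetical helper you would have to define yourself around whichever converter handles your EMF files; neither comes from a real tool.

```shell
#!/bin/sh
# patch_emf_refs DIR
# DIR is an unzipped .pptx. Converts every .emf under DIR to .png and
# rewrites ".emf" references in the XML and .rels parts.
# emf_to_png SRC DST is a hypothetical helper that must be defined by
# the caller before this function is used.
patch_emf_refs() {
    dir=$1

    # 1. Convert each EMF and drop the original once conversion succeeds.
    find "$dir" -type f -name '*.emf' | while IFS= read -r emf; do
        emf_to_png "$emf" "${emf%.emf}.png" && rm -f "$emf"
    done

    # 2. Point every XML/.rels reference at the new PNG files.
    find "$dir" -type f \( -name '*.xml' -o -name '*.rels' \) |
    while IFS= read -r part; do
        sed 's/\.emf"/.png"/g' "$part" > "$part.tmp" && mv "$part.tmp" "$part"
    done
}

# Typical use around it (paths are placeholders):
#   unzip -q deck.pptx -d work
#   patch_emf_refs work
#   (cd work && zip -q -r ../deck-fixed.pptx .)
#   soffice --headless --convert-to pdf deck-fixed.pptx
```

The sed expression only touches references that end in `.emf"`, so the `Extension="emf"` entry in `[Content_Types].xml` is left alone; if the package does not already declare a content type for `png` there, one may need to be added.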
ps. I see that my question has gathered a few downvotes. In my opinion it was a valid question, and listed the various solutions that had worked for others, but not for me. If anyone has insights or ways to improve it, feel free to comment.
I'm currently working on a project involving converting form results to a downloadable PDF, which is simple enough. I was recently asked, however, to add attachment functionality. I'm using dompdf to convert the form results to PDF, but is there a way to convert the attachments separately (can be jpg, png, doc/x, or pdf) to a PDF file and then append the attachment file to the dompdf output?
I can handle the implementation; are there any free libraries that will support anything like this? I found FPDF, which supports images, but it does not support Word files.
First of all, you will need to find a library for every kind of conversion you need (you mentioned jpg, png, and doc/x, but you didn't say if that was all of them).
For common office formats, you can launch a headless (meaning it can run on a server without a graphical display) instance of OpenOffice or LibreOffice. Then you can interact with it from various programming languages, or you can use a ready-made command-line tool such as pyodconverter to ask it to convert between various file formats. This is the best way to convert doc and docx files to PDF, by the way, short of spending money on Microsoft software.
As for "appending the attachment file", by which I take it you mean concatenating a bunch of PDF files together, you can use the free tool pdftk.
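As a sketch of those two steps from the command line, assuming LibreOffice's `soffice` binary and `pdftk` are installed. The function names and file names are made up for illustration, and image attachments (jpg, png) would still need their own converter.

```shell
#!/bin/sh
# office_to_pdf FILE OUTDIR
# Converts a doc/docx (or other office file) to PDF with headless LibreOffice.
# The result lands in OUTDIR under the same base name with a .pdf extension.
office_to_pdf() {
    soffice --headless --convert-to pdf --outdir "$2" "$1"
}

# merge_pdfs OUT.pdf IN1.pdf [IN2.pdf ...]
# Concatenates the input PDFs, in the order given, into OUT.pdf using pdftk.
merge_pdfs() {
    out=$1
    shift
    pdftk "$@" cat output "$out"
}

# Example (file names are placeholders):
#   office_to_pdf attachment.docx /tmp/converted
#   merge_pdfs final.pdf form-results.pdf /tmp/converted/attachment.pdf
```

Here `form-results.pdf` stands in for the file dompdf produced, so the attachments end up appended after the form output.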
With PHP I'm creating an SVG image with embedded content (base64-encoded files). If I view it in Firefox, or download it and open it with Inkscape (www.inkscape.org), the image is fine!
But when I try to convert it using ImageMagick (using the convert command or the Imagick extension for PHP), the embedded image doesn't appear in the final result.
I don't know if there is a special command or configuration I'm missing. I'm not using any special setup. Just ...
convert image.svg image.png
Thank you very much for your answer.
ImageMagick's built-in SVG rendering is really pretty horrible. Don't use it if you can avoid it. I'd recommend using librsvg instead, either using the command-line rsvg tool, or possibly with the PHP rsvg extension.
(Librsvg's rendering isn't always perfect either, but it should be able to handle embedded images just fine. If you want even better rendering, you could always try using Inkscape from the command line.)
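For reference, a small sketch of what that could look like in place of `convert image.svg image.png`. It assumes librsvg's `rsvg-convert` binary is available and falls back to Inkscape otherwise; `svg_to_png` is just an illustrative name.

```shell
#!/bin/sh
# svg_to_png IN.svg OUT.png
# Renders an SVG to PNG with librsvg's rsvg-convert, falling back to
# Inkscape if rsvg-convert is not installed.
svg_to_png() {
    if command -v rsvg-convert >/dev/null 2>&1; then
        rsvg-convert -o "$2" "$1"
    else
        # Inkscape 1.x syntax; older 0.9x releases used --export-png instead.
        inkscape "$1" --export-type=png --export-filename="$2"
    fi
}

# Example:
#   svg_to_png image.svg image.png
```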
Is the embedded image in PNG or in JPEG format? What converter is used in PHP? I tried on Windows with the latest ImageMagick, which doesn't come with RSVG. I found an RSVG converter build for Windows, and while it does a good job, it skips the JPEG version of the image in a test file.
On the other hand, the built-in converter handles correctly both images, but does an awful job on the home image shown in the SVG tutorial.
What is, according to you, the best way to convert uploaded files of any kind (.doc, .docx,...) into a pdf-file using nothing but php. Is it even possible to do so?
I looked at FPDF, but this creates the pdf files from text.
Another solution previously given was to use the PDFlib library on your server, but unfortunately, my server doesn't support this library...
What is the best way to convert the files my users upload to my site into PDF files?
A simpler approach would be to restrict uploads to PDF format programmatically and require your users to upload only .pdf files. Provide a link on the upload page to a free PDF printer (e.g. CutePDF Writer) that the user can install to create .pdf documents from any file that can be printed.
Trying to do it through PHP will be problematic because the uploads could be generated from many different programs that would be impossible to cater for in their entirety. e.g. How would it handle Scribus or ABC Flowcharter or any other 'non-standard' application someone used to create a document?
Much better to filter the upload upfront.
The best server-side PDF generator from those I tried was, so far, wkhtmltopdf, a WebKit-based, self-contained invisible browser that can render any HTML+CSS and generate a PDF from it. Reasonably fast and fairly reliable, has some useful PDF options, such as page size, orientation, etc.
The second part of the job in your case is to convert documents to HTML prior to feeding them to wkhtmltopdf. If possible, have your users upload the docs in HTML (Word and Co. can export (crappy) HTML). If this is not an option, you will have to find a tool just for that, which, in my opinion, is much easier than finding a tool that converts Word docs directly into PDF.
Good thing about wkhtmltopdf is also that you can feed the output of your PHP script to it using the ob_xxx() functions.
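As a command-line counterpart to the ob_xxx() idea, here is a sketch that feeds HTML to wkhtmltopdf on standard input. The `html_to_pdf` name, the A4/Portrait settings, and the PHP script name are assumptions for illustration only.

```shell
#!/bin/sh
# html_to_pdf OUT.pdf
# Reads HTML on stdin and writes a PDF; "-" tells wkhtmltopdf to read stdin.
# Page size and orientation are examples of the PDF options it offers.
html_to_pdf() {
    wkhtmltopdf --quiet --page-size A4 --orientation Portrait - "$1"
}

# Example: pipe the output of a (hypothetical) PHP script straight in.
#   php render_form.php | html_to_pdf form.pdf
```

Inside a PHP request you would instead capture the page with output buffering and hand the buffered HTML to the same command.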
PHPExcel is the best simple way to create doc, docx, xls, xlsx, and pdf files with PHP. It's a lot easier, and the documentation is clear.
Use Microsoft Office to render Microsoft Office documents, if you care about accuracy at all. This is easily done by invoking Office over COM.
Get access to your server, and install what you need. Doing so would be far easier than monkeying around with sub-par solutions.
Well... I can think of one way of doing it quite easily, but it doesn't involve using PHP.
Upload your documents to a folder on your server that is browsable by your users,
e.g. http://mysite.com/docs/
Then get your users to install a virtual printer driver such as Primo PDF
http://www.primopdf.com/index.aspx
then they can load the document into their browser, and print to PDF for offline browsing.
If this is not an option, and you're dealing with Office documents that conform to the Open XML standard, you could attempt to parse the XML doc into a PHP page for display in the browser, then use JavaScript to trigger a print.
Unfortunately, it does still depend on your user having a PDF printer installed.
Alternatively, you could just load the docs natively, print them to your own PDF printer, and then upload the PDFs to the web server for download.
I can't think of any easy way of doing this otherwise, without installing all sorts of different document parser tool-kits and doing a huge amount of behind the scenes work.
I run a job search site, and I need to convert doc, docx and pdf files into HTML on a Linux CentOS server running PHP. People submit these files as resumes. So far, I have found PHPDocx to be great at converting docx to HTML, but I am stuck at doc/pdf. pdftohtml gives the error "bad color" when I run tests. As for doc, I only found wvWare, which seems complex and bulky to install.
Does anyone have any ideas on how to easily convert doc/pdf to HTML?
The only thing I can think of is FPDF.
It is intended for creating PDF files in PHP but it can also open PDF files.
Maybe you can use that as a base and develop some sort of toHTML function for it.
It is completely free to use and it has some extensions already.
It MIGHT help you.
http://www.fpdf.org
EDIT:
Thanks to Pierre for this addition in the comments:
You can use FPDI: http://www.setasign.de/products/pdf-php-solutions/fpdi but the input PDF is treated just like an image.
I haven't taken a look at it myself so far, but this might help.
As far as .doc files go how about trying OpenOffice/LibreOffice, something like:
lowriter --convert-to html doc_file.doc
As far as PDF goes, if the PDF is a graphical representation of text (for example, scanned pages), then you're out of luck; the best you can do is try to convert it to an image with ImageMagick. If it contains proper text, it should convert easily.
There are various tools out there already to do this, such as http://dag.wieers.com/home-made/unoconv/, http://www.phpdocx.com/ (which you've already tried)
http://www.phplivedocx.org/2009/08/13/convert-docx-doc-rtf-to-html-in-php/ looks promising.
Or you could install a portable version of LibreOffice on your server, which allows command-line conversion:
https://help.libreoffice.org/Common/Starting_the_Software_With_Parameters
I'm sure there'll be tutorials out there (on libreoffice support area)
To easily convert PDF to HTML, I would suggest pdf2htmlEX, which produces outstanding HTML and is fast enough for converting at runtime. You should first put some effort into optimizing and building it for your system. There is a simple build how-to included on the project link.
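Once it is built, a basic invocation can be wrapped like this. This is only a sketch: `pdf_to_html` and the paths are illustrative, and it relies on pdf2htmlEX naming the output after the input file by default.

```shell
#!/bin/sh
# pdf_to_html IN.pdf OUTDIR
# Converts a PDF to a single HTML file inside OUTDIR using pdf2htmlEX.
# The output file takes its base name from the input (resume.pdf -> resume.html).
pdf_to_html() {
    pdf2htmlEX --dest-dir "$2" "$1"
}

# Example (paths are placeholders):
#   pdf_to_html uploads/resume.pdf public/resumes
```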