Converting PPTX to PDF with PHP - php

I am developing an API in PHP, hosted on a Linux server, that needs to generate JPEG previews of a .pptx PowerPoint presentation.
I first convert the file to PDF and then convert the PDF to JPEGs.
The second step is easy with Ghostscript; it's the first part that's proving difficult.
I have tried using the LibreOffice executable, but its PPTX support is incomplete: certain backgrounds become invisible.
I have the same problem with many third-party APIs (which I suspect also use LibreOffice); the ones that do work are ridiculously expensive.
Installing Office on a Linux server and using COM functions seems impossible, or very tedious at best.
I have looked at Aspose.Slides, which also seems rather expensive, and their documentation is filled with errors.
I could use suggestions on how to tackle this problem.
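For reference, the PDF-to-JPEG step mentioned above could look roughly like the following. This is only a sketch: buildPdfToJpegCommand() is a hypothetical helper name, the 150 dpi and quality 85 defaults are arbitrary assumptions, and it assumes the gs binary is on the PATH of a Linux host.

```php
<?php
// Sketch: build (not run) a Ghostscript command that renders every PDF page to a JPEG.
// buildPdfToJpegCommand() is a hypothetical helper, not part of any library.
function buildPdfToJpegCommand(string $pdfPath, string $outDir, int $dpi = 150): string
{
    // Ghostscript expands %03d in the output name to the page number (001, 002, ...).
    $pattern = rtrim($outDir, '/') . '/page-%03d.jpg';

    return sprintf(
        'gs -dBATCH -dNOPAUSE -dSAFER -sDEVICE=jpeg -r%d -dJPEGQ=85 -sOutputFile=%s %s',
        $dpi,
        escapeshellarg($pattern), // quote paths so spaces and shell metacharacters are safe
        escapeshellarg($pdfPath)
    );
}

// Usage (assumption: /tmp/previews exists and is writable):
// exec(buildPdfToJpegCommand('/tmp/deck.pdf', '/tmp/previews'), $output, $exitCode);
```

Building the command separately from running it makes it easy to log or inspect before it is passed to exec().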

I tried to find the underlying reason why LibreOffice and online conversion tools have trouble with the backgrounds of the presentations I need to convert.
The background is an .emf file, a format that is poorly supported.
My solution
I unzipped the presentation, converted the .emf files to PNG (using Ghostscript), changed all mentions of .emf to .png in the XML, and rezipped the altered presentation.
When I now use LibreOffice in headless mode to convert to PDF, the background shows up.
It might be a bit hacky, but it works for the purposes of my program.
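The unzip, rewrite, and rezip steps can be sketched in PHP with ZipArchive. This is not my exact script, just a minimal illustration: both function names are hypothetical, it assumes the PNG versions of the images already exist in a directory ($pngDir) under the same base names, and it does not touch [Content_Types].xml, which stricter consumers than LibreOffice might require.

```php
<?php
// Hypothetical helper: swap .emf image references for .png inside one XML part.
function rewriteEmfReferences(string $xml): string
{
    // Only match ".emf" that ends an attribute value, e.g. Target="../media/image1.emf".
    return preg_replace('/\.emf(?=["\'])/i', '.png', $xml);
}

// Hypothetical helper: copy a .pptx, replacing EMF media with pre-converted PNGs.
function replaceEmfInPptx(string $srcPptx, string $dstPptx, string $pngDir): void
{
    $src = new ZipArchive();
    $dst = new ZipArchive();
    if ($src->open($srcPptx) !== true) {
        throw new RuntimeException("Cannot open $srcPptx");
    }
    if ($dst->open($dstPptx, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot create $dstPptx");
    }

    for ($i = 0; $i < $src->numFiles; $i++) {
        $name = $src->getNameIndex($i);
        if (substr($name, -1) === '/') {
            continue; // skip directory entries
        }
        $data = $src->getFromIndex($i);

        if (preg_match('/\.emf$/i', $name)) {
            // Replace the EMF entry with the PNG that was converted beforehand.
            $png = rtrim($pngDir, '/') . '/' . pathinfo($name, PATHINFO_FILENAME) . '.png';
            if (!is_file($png)) {
                throw new RuntimeException("Missing converted image $png");
            }
            $name = preg_replace('/\.emf$/i', '.png', $name);
            $data = file_get_contents($png);
        } elseif (preg_match('/\.(xml|rels)$/i', $name)) {
            // Slides, layouts and relationship files may all point at the image.
            $data = rewriteEmfReferences($data);
        }
        $dst->addFromString($name, $data);
    }

    $dst->close();
    $src->close();
}
```

Writing a new archive instead of editing the original in place keeps the source presentation intact if anything fails halfway.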
ps. I see that my question has gathered a few downvotes. In my opinion it was a valid question, and listed the various solutions that had worked for others, but not for me. If anyone has insights or ways to improve it, feel free to comment.

Related

Convert office files (Excel spreadsheets, Word documents, Powerpoint presentations) into images or pdfs in a PHP/Asp.net web app

I'm doing a job for a client where I'm supposed to build a web application that converts Office documents and PDFs into a series of images, one per page/sheet. It's really easy with PDFs but much harder with the Office files. A solution that converts the Office files into PDFs would also be fine; I can do the conversion to images later.
The environment will most likely be a Windows server with the Office products already installed (weird, I know).
Ideas I tried so far:
Using Office COM objects - I ran into multiple problems and it seems inefficient and unreliable.
OpenOffice and unoconv - I ran into problems using them on Windows. I may try Linux, but my client says OpenOffice didn't work for them, and I think I read somewhere that this solution is not recommended.
Aspose - I tried it before realizing my client won't pay for it. It seems like the ideal solution but it's too expensive, so I need a solution without paid software.
Other options I thought of and didn't try:
Office interop dll - sounds ideal but people talk about performance issues and memory leaks.
Power Tools for Open XML (https://powertools.codeplex.com/) or DocX (http://docx.codeplex.com/) - libraries that sound useful but if I understand correctly they'll only work for docx files.
Any suggestions?
Are you aware that Adobe offers this as a service?

Displaying word documents, excel sheet, power point in browser

Is there any way to display Word documents, Excel sheets and PowerPoint presentations in the browser without downloading them?
I assume that you are going to use PHP for this, so you can try checking some libraries such as PHPWord (a PHPOffice project), for example.
If you only wish to display the document content, that is possible with a scripting language such as PHP. Office 2007+ formats are basically zipped XML documents with a changed extension. Create a simple Word 2007+ document, save it, and change the extension from .docx to .zip; then you can extract it and see what it's made of. You can find a lot of details here. Displaying the content may be a little tricky. As mentioned, there are libraries out there to handle this, but I am not really sure how well they handle the documents. Most of them are abandoned, and PHPWord has been in beta since 2011.
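To illustrate the "zipped XML" point, here is a minimal sketch that pulls the plain text out of a .docx with ZipArchive. The function names are hypothetical, and the tag stripping is deliberately crude: it ignores tables, headers, footnotes and all formatting, so treat it as a starting point rather than a real renderer.

```php
<?php
// Hypothetical helper: reduce the main WordprocessingML part to plain text.
function docxXmlToText(string $documentXml): string
{
    // Turn paragraph ends into newlines before the tags are stripped.
    $withBreaks = str_replace('</w:p>', "\n", $documentXml);
    $text = strip_tags($withBreaks);

    // Decode XML entities such as &amp; back into characters.
    return trim(html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8'));
}

// Hypothetical helper: open the .docx as a zip and read word/document.xml.
function readDocxText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();

    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }
    return docxXmlToText($xml);
}

// Usage (assumption: resume.docx exists):
// echo nl2br(htmlspecialchars(readDocxText('resume.docx')));
```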
There are some indications that Apache is working on a cloud version of OpenOffice, but there is no release date yet. Once done, you will have a full-featured office suite web app.
If you feel really creative you could use a cron job (or a scheduled task if you like Windows) to open a document, take a screenshot and basically make a .jpg or .png version of the document (works fine with short documents, longer ones may be problematic), which can be displayed in a browser without much complication. It is also possible to schedule an export to .pdf; most browsers can display PDFs, natively or through a plugin.
To sum up, using PHP for parsing simple documents should be fine, but getting complex documents to display properly may be a much more difficult task and possibly not worth your time. I would go for a cron export to PDF, to preserve most if not all of the document's structure.

How to convert documents from .doc to text

I have been pondering writing this question for quite some time.
I work for a small-sized news corporation in Vietnam.
The server I use for documents runs the latest version of Ubuntu (with PHP/Apache, obviously), which means that formats such as .doc and .docx cannot be opened natively, as far as I know.
However, when reporters upload documents, half the time they do it in some sort of Microsoft format. This means my Linux machine cannot open them and pick out keywords, which is extremely frustrating, because tools like pdf2txt.py do not work on these files.
Is there a way to get around this problem without inconveniencing the reporters too much? I understand that since I am running a Linux server, I may have to run some sort of third-party application to do the work for me, which could work in the short run but could pose some security risks.
Summary: How can I have a Linux server automatically convert any format such as .doc and .docx to PDF for further manipulation?
For old-school .doc files, take a look at catdoc and wv.
For an all-around solution that can convert anything that OpenOffice can open to anything that OpenOffice can save, there is unoconv.
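Since the server already runs PHP, choosing between the two tools could be sketched like this. buildConversionCommand() is a hypothetical helper, and it assumes catdoc and unoconv are installed and on the PATH; by default unoconv writes the converted file next to the input.

```php
<?php
// Hypothetical helper: pick a command-line converter based on the upload's extension.
function buildConversionCommand(string $path, string $format = 'pdf'): string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    $arg = escapeshellarg($path);

    if ($format === 'txt' && $ext === 'doc') {
        // catdoc prints the plain text of a legacy .doc file to stdout.
        return "catdoc $arg";
    }

    // unoconv drives OpenOffice/LibreOffice; -f selects the output format.
    return 'unoconv -f ' . escapeshellarg($format) . ' ' . $arg;
}

// Usage (assumption: the upload was saved to /tmp/report.docx):
// exec(buildConversionCommand('/tmp/report.docx'), $output, $exitCode);
```

Because the file names come from user uploads, every path is passed through escapeshellarg() before it reaches the shell.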

Converting doc, docx, pdf to HTML using PHP linux

I run a job search site, and I need to convert doc, docx and pdf files into HTML on a Linux (CentOS) server running PHP. People submit these files as resumes. So far, I have found PHPDocx to be great at converting docx to HTML, but I am stuck at doc/pdf. PDFTOHTML gives a "bad color" error when I run tests. As for doc, I only found wvWare, which seems complex and bulky to install.
Does anyone have any ideas on how to easily convert doc/pdf to HTML?
The only thing I can think of is FPDF.
It is intended for creating PDF files in PHP but it can also open PDF files.
Maybe you can use that as a base and develop some sort of toHTML function for it.
It is completely free to use and it has some extensions already.
It MIGHT help you.
http://www.fpdf.org
EDIT:
Thanks to Pierre for the addition in the comments:
You can use FPDI: http://www.setasign.de/products/pdf-php-solutions/fpdi but the input PDF is treated just like an image.
I haven't taken a look at it myself so far, but this might help.
As far as .doc files go, how about trying OpenOffice/LibreOffice, something like:
lowriter --convert-to html doc_file.doc
As far as PDF goes, if the PDF is a graphical representation of text then you're out of luck; the best you can do is try to convert it to an image with ImageMagick. If it contains proper text, it should convert easily.
There are various tools out there already to do this, such as http://dag.wieers.com/home-made/unoconv/, http://www.phpdocx.com/ (which you've already tried)
http://www.phplivedocx.org/2009/08/13/convert-docx-doc-rtf-to-html-in-php/ looks promising.
Or, you could install a portable version of LibreOffice on your server, which allows command-line conversion:
https://help.libreoffice.org/Common/Starting_the_Software_With_Parameters
I'm sure there'll be tutorials out there (in the LibreOffice support area).
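A rough sketch of calling that command-line conversion from PHP follows. buildLibreOfficeCommand() is a hypothetical helper, and it assumes the soffice binary is on the PATH; the --headless, --convert-to and --outdir switches are the ones described on the parameters page linked above.

```php
<?php
// Hypothetical helper: build a headless LibreOffice conversion command.
function buildLibreOfficeCommand(string $input, string $outDir, string $format = 'html'): string
{
    return sprintf(
        'soffice --headless --convert-to %s --outdir %s %s',
        escapeshellarg($format), // target format, e.g. html or pdf
        escapeshellarg($outDir), // directory that receives the converted file
        escapeshellarg($input)
    );
}

// Usage (assumption: /var/uploads/cv.doc exists and /var/converted is writable):
// exec(buildLibreOfficeCommand('/var/uploads/cv.doc', '/var/converted'), $output, $exitCode);
```

The converted file keeps the input's base name with the new extension, so cv.doc would end up as cv.html in the output directory.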
To easily convert PDF to HTML, I would suggest pdf2htmlEX, which produces outstanding HTML and is fast enough for converting at runtime. You should first put some effort into optimizing and building it for your system. There is a simple build how-to included on the project link.

Programmatically changing PDF quality in PHP

Can I programmatically change the quality of a PDF?
I have a client that is a newspaper, and when they submit the PDF form of the paper to their site they are submitting the same copy they send to the printer, which can range from 30 to 50 MB. I can manually lower the quality (still plenty high for the web) to get it down to 3-5 MB, which would help my hosting substantially.
That seems like something that would require nothing short of the Adobe PDF SDK / libraries. I have worked with them quite a bit, but I have never attempted to change the resolution of an existing PDF. The libraries are pricey, so it's likely that is not an option for you.
I want to say that Perl's PDF::API2 has an optimize script bundled with it, but I have never used that functionality. It may be worth a look. The module itself is pretty thorough, although a PDF that large may not be the fastest to process with it.
You should check the TCPDF library or the PHP documentation.
I have never worked with PDFs in PHP, but I think you can do what you need with TCPDF easily. If your PDF is composed of images, check this example; it may help you.
Regards
Zend_Pdf is the best free library for reading and manipulating existing PDFs, but it does not go nearly deep enough to do what you need. I do not believe there is a PHP library for manipulating PDF files at that level (that is, being able to extract embedded images and replace them with lower-quality versions).
