Convert .PPT (2007-x) to JPEG/GIF in Linux - php

I am having trouble finding a way to convert PowerPoint documents to JPEG.
Imagick is not able to handle .PPT files.
So I used unoconv, which handles .ppt files, but only up to the 97/2003/XP format, not 2007, although its format list says that input is supported. It tells me that it cannot handle the source.
Is there a command-line solution or library that is able to do that?
PS: unoconv is based on the OpenOffice libraries.
Thanks in advance!

First try using unoconv v0.6, and make sure you read the README which has lots of tips to troubleshoot problems. Often issues are caused because e.g. not all required packages have been installed for a specific document format, or an existing office process is causing errors.
The troubleshooting section should be the first thing to look at:
https://github.com/dagwieers/unoconv#troubleshooting-instructions

You could try JODConverter and a recent version of OpenOffice.

Related

Converting PPTX to PDF with PHP

I am developing an API in PHP, hosted on a Linux server, that needs to generate JPEG previews of .pptx PowerPoint presentations.
I first convert the file to PDF and then convert the PDF to JPEGs.
The second step is easy with Ghostscript; it's the first part that's proving difficult.
I have tried using the LibreOffice executable, but its pptx support isn't complete: certain backgrounds become invisible.
I have the same problem with many third-party APIs (which I suspect also use LibreOffice); the ones that do work are ridiculously expensive.
Installing office on a Linux server and using COM functions seems impossible, or very tedious at best.
I have looked at Aspose.Slides, which also seems rather expensive, and their documentation is filled with errors.
I could use suggestions on how to tackle this problem.
I have tried to find the underlying problem of why LibreOffice and online conversion tools have a problem with the backgrounds of the presentations I need to convert.
The background is a .emf file, which has bad support.
My solution
I've unzipped the presentation, converted the .emf files to png (using ghostscript), changed all mentions of .emf to .png in the XML, and rezipped the altered presentation.
When I now use the LibreOffice headless to convert to pdf, the background shows up.
It might be a bit hacky, but it works for the intent of my program.
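The unzip, rewrite and rezip steps can be sketched in PHP with ZipArchive. This is only a rough sketch under assumptions, not the exact code I used: the function names emfRefsToPng and rewritePptx are made up for illustration, the actual EMF-to-PNG conversion is left to a callback that you supply yourself, and [Content_Types].xml is assumed to already declare the png extension.

```php
<?php
declare(strict_types=1);

/**
 * Rewrite ".emf" references to ".png" inside one XML part.
 * Hypothetical helper: a plain string replace, as described above.
 */
function emfRefsToPng(string $xml): string
{
    return str_ireplace('.emf', '.png', $xml);
}

/**
 * Copy a .pptx to $dest, swapping every .emf entry for a PNG.
 * $emfToPng is a caller-supplied callable (assumption): it receives
 * the raw EMF bytes and must return PNG bytes.
 */
function rewritePptx(string $src, string $dest, callable $emfToPng): void
{
    $in = new ZipArchive();
    if ($in->open($src) !== true) {
        throw new RuntimeException("Cannot open $src");
    }
    $out = new ZipArchive();
    if ($out->open($dest, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        $in->close();
        throw new RuntimeException("Cannot create $dest");
    }

    for ($i = 0; $i < $in->numFiles; $i++) {
        $name = $in->getNameIndex($i);
        $data = $in->getFromIndex($i);
        if ($name === false || $data === false || substr($name, -1) === '/') {
            continue; // skip unreadable entries and directory entries
        }
        if (preg_match('/\.emf$/i', $name) === 1) {
            // Image part: rename it and replace its bytes with PNG data.
            $name = substr($name, 0, -4) . '.png';
            $data = $emfToPng($data);
        } elseif (preg_match('/\.(xml|rels)$/i', $name) === 1) {
            // XML part: point every reference at the renamed image.
            $data = emfRefsToPng($data);
        }
        $out->addFromString($name, $data);
    }

    $out->close();
    $in->close();
}
```

The rewritten file can then be passed to LibreOffice headless as before.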
ps. I see that my question has gathered a few downvotes. In my opinion it was a valid question, and listed the various solutions that had worked for others, but not for me. If anyone has insights or ways to improve it, feel free to comment.

generate docx file with basic styling in php

I'm creating a docx file from an HTML page using pandoc, but for the life of me I can't seem to get it to take on any kind of styling or successfully use a dotx template. I don't know if it's because you can't style docx files or I'm doing something wrong - documentation isn't all that verbose for pandoc.
I've also tried just echoing the html out and setting headers so the client will open the file as a doc, but this has some problems when you save it (it will try to save as an html file and converting to a doc isn't all that easy).
What I want to do is create an editable document which is styled and contains a logo image - just font types, colours and sizes would be enough, maybe some basic positioning would be nice.
Does anyone know how to achieve this on a LAMP-like system?
I stumbled on using LibreOffice on the CLI to do the conversion, with a much greater degree of success. It's still not perfect but a lot better than what I was getting, and it seems to take on board font types, sizes and colours a lot better.
Steps to install and use (CentOS / Redhat here):
sudo yum install libreoffice libreoffice-headless
You may need some X11 / Xorg libs, easiest to just install Xorg if it won't run.
libreoffice --headless --convert-to docx --outdir ./ myfile.html
Worked for me, I ended up with a serviceable .docx file which could be read by MS Word 2008 and LibreOffice 3.5.6.2.
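To call this from PHP, something like the following may work. It is a minimal sketch: the function names are hypothetical, it assumes the libreoffice binary is on the PATH of the web server user, and it assumes the target format is a plain extension such as docx or pdf (no filter suffix).

```php
<?php
declare(strict_types=1);

/** Build the same command line as above, with shell-escaped arguments. */
function buildConvertCommand(string $input, string $format, string $outDir): string
{
    return sprintf(
        'libreoffice --headless --convert-to %s --outdir %s %s',
        escapeshellarg($format),
        escapeshellarg($outDir),
        escapeshellarg($input)
    );
}

/** Run the conversion and return the path where the result is expected. */
function convertWithLibreOffice(string $input, string $format, string $outDir): string
{
    exec(buildConvertCommand($input, $format, $outDir) . ' 2>&1', $output, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("LibreOffice failed:\n" . implode("\n", $output));
    }
    // LibreOffice keeps the input's base name and swaps the extension.
    return rtrim($outDir, '/') . '/' . pathinfo($input, PATHINFO_FILENAME) . '.' . $format;
}
```

Usage would be along the lines of convertWithLibreOffice('myfile.html', 'docx', './'). Checking that the returned file really exists is a good idea, since a zero exit code alone does not prove the conversion produced output.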
Other tools that might also be worth examining are JODReports and Docmosis, which are focussed on generation from templates (mail-merge) rather than just format conversion. JODReports is free/Open Source, Docmosis is not. Both can be invoked from PHP in various ways, and Docmosis has a cloud service, which means a zero-install footprint if your application is allowed to reach the cloud. Please note I work for the company that created Docmosis.
I think they both can work from docx/dotx templates and produce a variety of output formats, including DocX.
Hope that helps.

.docx, .xlsx, .pdf to .pdf using PHP

I have relatively sensitive data in .docx, .xlsx and PDF files that all need to be converted to a single PDF file locally. Sending these files off to phpdocx or Google Docs or anything like this is not an option.
The only other option I am seeing is OpenOffice / LibreOffice but I am not satisfied with how they are converting the documents.
Is there any other alternative anyone is aware of? Thanks!
Definitely a difficult task. The very recent release of LibreOffice 3.6 has fixes to its docx processing, if that might help, but you haven't specified what actual problems you encountered when you tried OpenOffice.
If you have time to experiment (and bring in any tools/languages you need to get the job done) you could try LibreOffice to produce PDFs, then use one of the many PDF libs to stitch the PDFs into the single file you require.
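As one possible way to do the stitching step locally, here is a rough PHP sketch that shells out to Ghostscript's pdfwrite device instead of using a PDF library. That is a swapped-in technique, not something named above; the function name is hypothetical and gs is assumed to be installed and on the PATH.

```php
<?php
declare(strict_types=1);

/**
 * Build a Ghostscript command that concatenates PDFs in the given order.
 * Hypothetical helper; all paths are shell-escaped.
 */
function buildMergeCommand(array $pdfs, string $merged): string
{
    if ($pdfs === []) {
        throw new InvalidArgumentException('No input PDFs given');
    }
    return 'gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile='
        . escapeshellarg($merged) . ' '
        . implode(' ', array_map('escapeshellarg', $pdfs));
}

/** Run the merge and fail loudly if Ghostscript reports an error. */
function mergePdfs(array $pdfs, string $merged): void
{
    exec(buildMergeCommand($pdfs, $merged) . ' 2>&1', $output, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("Ghostscript failed:\n" . implode("\n", $output));
    }
}
```

Everything stays on the local machine, which matters for sensitive data. Note that pdfwrite re-generates the pages rather than copying them byte for byte, so the result is worth checking by eye.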
You could also look at ODFConverter which has traditionally been much better with DOCX than either OpenOffice or LibreOffice. This would allow you docx -> odt -> pdf. I think it can do the xlsx also. Then do the PDF stitching again.
I suggest testing the stages manually at first and if promising, try something like JODConverter (requires Java) to allow you to automate the process via scripts.
Good luck.

Converting doc, docx, pdf to HTML using PHP linux

I run a job search site, and I need to convert doc, docx and pdf files into HTML on a Linux CentOS server running PHP. People submit these files as resumes. So far, I found PHPDocx to be great at converting docx to HTML. But I am stuck at doc/pdf. PDFTOHTML gives the error "bad color" when I run tests. As for doc, I only found wvWare, which seems complex and bulky to install.
Does anyone have any ideas on how to easily convert doc/pdf to HTML?
The only thing I can think of is FPDF.
It is intended for creating PDF files in PHP but it can also open PDF files.
Maybe you can use that as a base and develop some sort of toHTML function for it.
It is completely free to use and it has some extensions already.
It MIGHT help you.
http://www.fpdf.org
EDIT:
Thanks to Pierre for the addition to my post in the comments:
You can use fpdi: http://www.setasign.de/products/pdf-php-solutions/fpdi but the input pdf is just like an image.
I haven't taken a look at it myself so far, but this might help.
As far as .doc files go, how about trying OpenOffice/LibreOffice, something like:
lowriter --convert-to html doc_file.doc
As far as PDF goes, if the PDF is a graphical representation of text then you're out of luck; the best you can do is try to convert it to an image with ImageMagick. If it is proper text, it should convert easily.
There are various tools out there already to do this, such as http://dag.wieers.com/home-made/unoconv/ and http://www.phpdocx.com/ (which you've already tried).
http://www.phplivedocx.org/2009/08/13/convert-docx-doc-rtf-to-html-in-php/ looks promising.
Or, you could install a portable version of LibreOffice on your server, which allows command-line conversion:
https://help.libreoffice.org/Common/Starting_the_Software_With_Parameters
I'm sure there'll be tutorials out there (in the LibreOffice support area).
To easily convert pdf to html, I would suggest pdf2htmlEX, which produces outstanding HTML and is fast enough for runtime converting. You should first put some effort into building and optimizing it for your system. There is a simple build how-to included at the project link.

How do I read Word, Excel, and PDF docs in PHP?

I need to be able to read the text of many different file types in PHP, including .doc, .docx, Excel, and PDF files. I found a few methods online that require installing multiple packages, but I was wondering if there was a better way to do this?
No matter which way you swing it, there is no way to kill all these birds with one stone.
Word Thread:
Reading/Writing a MS Word file in PHP
Excel Thread:
Reading an Excel file in PHP
PDF Thread:
Read pdf files with php
Office 2007 files are very easy: you just need to unzip them and read the XML files inside. Older versions of Office and PDF will need extra packages.
I don't think there is native support for reading documents with PHP. Installing these packages is the only choice. :-)
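For .docx, the unzip-and-read approach could look roughly like this. It is a minimal sketch with hypothetical function names: it assumes the main text lives in word/document.xml inside the archive and simply strips the markup, so headers, footers and tables are not treated specially.

```php
<?php
declare(strict_types=1);

/** Turn WordprocessingML into plain text, one line per paragraph. */
function docxXmlToText(string $xml): string
{
    // End each paragraph with a newline before the tags are stripped.
    $xml = str_replace('</w:p>', "\n", $xml);
    $text = strip_tags($xml);
    // Decode XML entities such as &amp; back into characters.
    return trim(html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8'));
}

/** Read the main document part out of a .docx file and return its text. */
function readDocxText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("No word/document.xml in $path");
    }
    return docxXmlToText($xml);
}
```

The same idea applies to .xlsx and .pptx, but the part names inside the archive differ, so the path would need adjusting.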
Maybe this URL can help you:
https://github.com/PHPOffice
It hosts:
- PHPWord
- PhpSpreadsheet (the replacement for PHPExcel)
...
