I need to be able to read the text of many different file types in PHP, including .doc, .docx, Excel, and PDF files. I found a few methods online that require installing multiple packages, but I was wondering whether there is a better way to do this.
No matter which way you swing it, there is no way to kill all these birds with one stone.
Word Thread:
Reading/Writing a MS Word file in PHP
Excel Thread:
Reading an Excel file in PHP
PDF Thread:
Read pdf files with php
Office 2007 files are very easy: you just need to unzip them and read the XML files inside. Older versions of Office and PDF will need extra packages.
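As a rough sketch of the unzip-and-read approach for a .docx, assuming the zip and DOM extensions are enabled: the function name `docxToText` is made up for illustration, and the code only looks at the main document part (`word/document.xml`), so headers, footers and footnotes are ignored.

```php
<?php
// Hypothetical helper: pull the plain text out of a .docx (Office 2007+) file.
function docxToText(string $docxPath): string
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a ZIP archive");
    }

    // The body of a Word document lives in this entry of the archive.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    // One output line per <w:p> paragraph; the text sits in <w:t> runs.
    $paragraphs = [];
    foreach ($xpath->query('//w:p') as $paragraph) {
        $text = '';
        foreach ($xpath->query('.//w:t', $paragraph) as $run) {
            $text .= $run->textContent;
        }
        $paragraphs[] = $text;
    }

    return implode("\n", $paragraphs);
}
```

Something like `echo docxToText('report.docx');` would then print the text (the file name is just an example). Excel 2007 files work the same way in principle, only with different entry names and XML inside the archive.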
I don't think there is native support for reading documents with PHP. Installing these packages is the only choice. :-)
Maybe this URL can help you:
https://github.com/PHPOffice
It hosts:
- PHPWord,
- PhpSpreadsheet (instead of PHPExcel)
...
Related
There are many Excel reader libraries for PHP, but they need the zip extension enabled. Without any such library:
1. Can we read an Excel file with pure PHP?
2. How much effort does it need?
Thanks /G
http://phpexcel.codeplex.com/
The effort depends on what you want to do :-)
I am having problems finding a solution to convert PowerPoint documents to JPEG.
Imagick is not able to handle .ppt files.
So I used unoconv, which handles .ppt files, but only up to the 97/2003/XP format, not 2007, although it claims to accept that format as input. It tells me that it cannot handle the source.
Is there a command-line solution or library that is able to do that?
PS: unoconv is based on openoffice lib.
Thanks in advance!
First try using unoconv v0.6, and make sure you read the README which has lots of tips to troubleshoot problems. Often issues are caused because e.g. not all required packages have been installed for a specific document format, or an existing office process is causing errors.
The troubleshooting section should be the first thing to look at:
https://github.com/dagwieers/unoconv#troubleshooting-instructions
You could try JODConverter and a recent version of OpenOffice.
I have relatively sensitive data in .docx, .xlsx and PDF files that all need to be converted to a single PDF file locally. Sending these files off to phpdocx or Google Docs or anything like this is not an option.
The only other option I am seeing is OpenOffice / LibreOffice but I am not satisfied with how they are converting the documents.
Is there any other alternative anyone is aware of? Thanks!
Definitely a difficult task. The very recent release of LibreOffice 3.6 has fixes to its docx processing, if that might help, but you haven't specified what actual problems you encountered when you tried OpenOffice.
If you have time to experiment (and bring in any tools/languages you need to get the job done) you could try LibreOffice to produce PDFs, then use one of the many PDF libs to stitch the PDFs into the single file you require.
You could also look at ODFConverter, which has traditionally been much better with DOCX than either OpenOffice or LibreOffice. This would allow you to go docx -> odt -> pdf. I think it can do xlsx as well. Then do the PDF stitching again.
I suggest testing the stages manually at first and if promising, try something like JODConverter (requires Java) to allow you to automate the process via scripts.
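For the stitching stage, one possibility (my assumption, not the only choice among the many PDF tools) is the qpdf command-line tool. A minimal PHP sketch that only builds the command, with a made-up helper name `buildPdfMergeCommand`:

```php
<?php
// Hypothetical helper: build a qpdf command line that concatenates PDFs
// in the given order. Assumes the qpdf binary is on the PATH.
function buildPdfMergeCommand(array $inputPdfs, string $outputPdf): string
{
    if (count($inputPdfs) === 0) {
        throw new InvalidArgumentException('Need at least one input PDF');
    }

    // Quote every path so spaces and shell metacharacters are harmless.
    $inputs = implode(' ', array_map('escapeshellarg', $inputPdfs));

    // "--empty" starts from a blank document;
    // "--pages ... --" lists the source files whose pages are appended.
    return 'qpdf --empty --pages ' . $inputs . ' -- ' . escapeshellarg($outputPdf);
}

// Example use (file names are placeholders):
// exec(buildPdfMergeCommand(['letter.pdf', 'sheet.pdf'], 'combined.pdf'), $out, $exitCode);
```

Everything stays on the local machine, which matters given the sensitive data. Check `$exitCode` after `exec()` before trusting the output file.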
Good luck.
I am developing a web app in PHP 5 that reads and updates xlsm files.
I have tried PHPExcel, but this library does not support the xlsm file format.
All I need is to open the file, write data into it, and save it as an xlsm file again.
The macros must not be changed.
It is important that the returned file still contains the macros, because this file is used daily to run several imports. I may not change the file format.
If somebody has tips or tutorials for this specific task, please write to me.
Thanks for your help
If you need to retain macros from an Excel template, then you'll need to use something like COM because there aren't any other libraries that handle macros from PHP
xlsm files are actually ZIP files with XML documents and other assets inside them, including the binary part that holds the macros. PHPExcel and similar libraries can read the newer Office Open XML formats as well as the older binary formats, but, as the question found, they may not accept macro-enabled files.
Try using ZipArchive to open the file in PHP, and one of the PHP XML libraries to read the xml inside the file. As long as you don't alter the macros, the macros will be preserved.
However, if you actually need to execute the macros, you need a full Office runtime. In this case you must use COM on Windows with a copy of Office to run the file.
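A minimal sketch of the ZipArchive idea, under some assumptions: the helper name `setNumericCell` is invented, the cell must already exist in the sheet XML, and only a plain number is written (strings normally go through the shared-strings table, which this does not handle). The macro part, usually `xl/vbaProject.bin`, is never touched.

```php
<?php
// Hypothetical helper: overwrite one existing cell with a number inside an
// .xlsm file, leaving every other archive entry (including macros) as is.
function setNumericCell(string $xlsmPath, string $sheetEntry, string $cellRef, float $value): void
{
    $ns = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main';

    if (!preg_match('/^[A-Z]+[0-9]+$/', $cellRef)) {
        throw new InvalidArgumentException("Bad cell reference: $cellRef");
    }

    $zip = new ZipArchive();
    if ($zip->open($xlsmPath) !== true) {
        throw new RuntimeException("Cannot open $xlsmPath");
    }

    $xml = $zip->getFromName($sheetEntry);
    if ($xml === false) {
        $zip->close();
        throw new RuntimeException("Missing entry $sheetEntry");
    }

    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('m', $ns);

    $cell = $xpath->query('//m:c[@r="' . $cellRef . '"]')->item(0);
    if (!$cell instanceof DOMElement) {
        $zip->close();
        throw new RuntimeException("Cell $cellRef not found");
    }

    // Drop the type marker (e.g. t="s" for shared strings) and any old
    // children (value or formula), then store the number in a <v> element.
    $cell->removeAttribute('t');
    while ($cell->firstChild) {
        $cell->removeChild($cell->firstChild);
    }
    $cell->appendChild($doc->createElementNS($ns, 'v', (string) $value));

    // Replace only the sheet entry; all other entries stay as stored.
    $zip->addFromString($sheetEntry, $doc->saveXML());
    $zip->close();
}

// Example use (path and entry name are placeholders):
// setNumericCell('import.xlsm', 'xl/worksheets/sheet1.xml', 'B2', 42);
```

Work on a copy of the file while testing, since the archive is modified in place.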
I have a module which merges database records into a .docx or .odt document template.
I have to output .docx, .odt or .pdf. For outputting to Microsoft and Open formats, there is no problem, all works properly.
But what I want to know is, can I output to a format (like XML or HTML) which I can use to subsequently build a PDF document?
If I can't, are there any libraries which provide a merge document capability like:
DOCX (or ODT) + database record => PDF
And I don't want to use phplivedocx.
I successfully put a portable version of LibreOffice on my host's webserver, which I call from PHP to do a command-line conversion from .docx, etc. to PDF on the fly. I do not have admin rights on my host's webserver. Here is my blog post about what I did:
http://geekswithblogs.net/robertphyatt/archive/2011/11/19/converting-.docx-to-pdf-or-.doc-to-pdf-or-.doc.aspx
Yay! Convert directly from .docx or .odt to .pdf using PHP with LibreOffice (OpenOffice's successor)!
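For reference, calling LibreOffice from PHP could look roughly like this. It is only a sketch: the helper name `buildPdfConvertCommand` and the paths are made up, and it assumes a LibreOffice build whose `soffice` binary accepts `--headless`, `--convert-to` and `--outdir`. See the blog post for the exact setup that was used on the shared host.

```php
<?php
// Hypothetical helper: build the command line for a headless
// LibreOffice conversion of one document to PDF.
function buildPdfConvertCommand(string $sofficeBinary, string $inputFile, string $outputDir): string
{
    return implode(' ', [
        escapeshellarg($sofficeBinary),            // path to the (portable) soffice binary
        '--headless',                              // no GUI
        '--convert-to pdf',                        // target format
        '--outdir ' . escapeshellarg($outputDir),  // where the PDF is written
        escapeshellarg($inputFile),                // .docx, .odt, ...
    ]);
}

// Example use (all paths are placeholders):
// $cmd = buildPdfConvertCommand('/home/me/libreoffice/program/soffice', '/home/me/in/letter.docx', '/home/me/out');
// exec($cmd, $output, $exitCode);
```

The resulting PDF keeps the input's base name and lands in the output directory. Quoting every path with `escapeshellarg()` matters if file names come from users.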
I don't know of any PHP library that does DOCX => PDF. In fact, converting DOCX to something else in PHP is an open problem today. This is independent of how you made the DOCX.
But as you said, there are PHP libraries for HTML => PDF.
Html2Pdf is a well-regarded PHP library that does HTML => PDF.
There is also DomPdf.
So if you can find a PHP library for DOCX => HTML, then it would work.
Of course it has some limitations: even though both PDF and DOCX are open formats, each has very specific features and needs a heavy rendering process, and the vendors keep some of the tricks to themselves.
Converting DOCX to HTML is theoretically possible. EpingSoft makes Windows software that does it. If you need to do it in PHP, some web articles tell you how, but since I could not find any PHP code doing this, I guess it is more theoretical than practical.
http://www.quepublishing.com/articles/article.aspx?p=691502
How complicated that process would be
depends on how much of Word's native
formatting you need to preserve during
the conversion.
If you want to try this way, it's good to know that OpenTBS enables you to read the XML before and after the merge. It is based on a PHP class named TbsZip that can read any XML file in the DOCX, since a DOCX is in fact a zip archive.
It is also possible to use PDF files directly in TBS after decompressing them:
qpdf --qdf --object-streams=disable in.pdf out.pdf