Manipulating Microsoft Word (Office 2007) .docx documents from PHP

I need a way to manipulate .docx (Microsoft Office 2007) documents from within PHP.
I need to:
Read the internal text.
Convert them to .html so they can be viewed inside a browser.
Replace text.
I know I can use Word Automation by creating a COM object of Microsoft Word, but it's too slow and unstable, and it requires Word to be installed on the server.
Is there any library or code that can do it from PHP?

There is PHPWord for that by the authors of PHPExcel.

Docx is just a ZIP file containing multiple XML files and embedded media files like images. Because of this, you can read and edit the document with ease. Just unzip it, open word/document.xml, do reading & writing, and repack the files.
Converting to HTML may be difficult. But you may find a thumbnail of the first page in docProps/thumbnail.jpeg.
Note that you'll have to familiarize yourself with the XML structure to do any complex edits. There's a summary XML docProps/app.xml which has some metadata for the file so don't forget to update it. Read more from Wikipedia: http://en.wikipedia.org/wiki/Office_Open_XML
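As a minimal sketch of the "unzip and read word/document.xml" approach described above, the following uses PHP's bundled ZipArchive and DOM extensions to pull out the plain text. The function name docxToText() is hypothetical, and the sketch assumes the zip and dom extensions are enabled. It only looks at the main document body, not headers, footers or footnotes.

```php
<?php
// Hypothetical helper: extract plain text from a .docx file.
// Assumes ext-zip and ext-dom are available.
function docxToText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    // The main document body lives in word/document.xml.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }

    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    // Each w:p is a paragraph; its visible text is spread over w:t nodes.
    $paragraphs = [];
    foreach ($xpath->query('//w:p') as $p) {
        $text = '';
        foreach ($xpath->query('.//w:t', $p) as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }
    return implode("\n", $paragraphs);
}
```

Calling docxToText('report.docx') (the file name is just an example) would return one line of text per paragraph.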

You may have a look at PHPDocX; I believe it does all you are asking for.
You may replace variables in a template or just plain text from a preexisting Word document.
It offers quite a few conversion options.
You can also extract the text.

You can work with the internal format directly.
DOCX is just a zip file, and inside that there's word/document.xml containing the actual document.
It's quite trivial to unzip the file, read document.xml, str_replace() what you're looking for, save it and re-zip the directory, and it makes for a lightweight, quick and easy mail merge capability for word documents. This also works for other office formats.
Here's the official docs on the internal structure for more information.
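The unzip, str_replace(), re-zip cycle described in this answer can be sketched with ZipArchive alone, without unpacking to disk. The function name docxReplace() and the token style are assumptions for illustration. One caveat: Word sometimes splits a piece of text across several XML runs, in which case a plain str_replace() on the raw XML will not find it.

```php
<?php
// Hypothetical helper: copy a .docx and replace text in its main body.
// Assumes ext-zip is available and PHP 7.4+ (arrow function).
function docxReplace(string $source, string $target, array $replacements): void
{
    // Work on a copy so the template stays untouched.
    if (!copy($source, $target)) {
        throw new RuntimeException("Cannot copy $source to $target");
    }
    $zip = new ZipArchive();
    if ($zip->open($target) !== true) {
        throw new RuntimeException("Cannot open $target as a ZIP archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    if ($xml === false) {
        $zip->close();
        throw new RuntimeException('word/document.xml not found');
    }

    // Replacement values end up inside XML, so escape &, < and > first.
    $escaped = array_map(
        fn ($v) => htmlspecialchars((string) $v, ENT_XML1 | ENT_QUOTES, 'UTF-8'),
        $replacements
    );
    $xml = str_replace(array_keys($escaped), array_values($escaped), $xml);

    // Writing an entry under an existing name replaces that entry.
    $zip->addFromString('word/document.xml', $xml);
    $zip->close();
}
```

For example, docxReplace('template.docx', 'letter.docx', ['{name}' => 'Ann']) would write a copy with every literal {name} in the body replaced.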

There is also a PHP class for merging new content into an existing .docx file. It is available here: http://www.tinybutstrong.com/ . The documentation is pretty good as well as having many examples and it is all free and open source. It does require familiarity with the .docx concepts, though.

Related

Create Word Document from PHP Documentation

To document my code I thought it would be best practice to use phpDoc syntax, because there are several parsers out there and some IDEs create IntelliSense out of it.
Now I need to put the documentation (API) into a word file, but I don't know which parser is able to output .doc or similar.
I tried DoxyGen, which outputs .rtf and phpDocumentor2, which can only export to .html and .xml (?).
Is there a way to generate a .doc(x) file from phpDoc? Or a simple way to get a document which can be imported to word?
I would appreciate if I don't have to change the phpDoc syntax, because my documentation is very long.
Edit: The preferred parser would be phpDocumentor2, because it supports PHP 5.3 features and is faster than DoxyGen, but phpDocumentor2 supports fewer output formats than the original phpDocumentor, which is no longer maintained.
Edit: I tried to copy content from the .rtf file into the .docx file, but when I select 'Use Destination Styles', both Word instances suspend and do not respond.
Presumably you want one large Word doc that contains all the info for your project in the one doc/file... therefore just opening the phpDoc2 HTML output into Word in order to convert it to docx will not meet your need, since that would be one docx per phpdoc2 HTML page.
You might try altering your searches to be for a tool that can spider a given HTML page, recursively pick up all its target page hierarchy, and convert it all into a single docx. You might have more luck finding a tool that does this but produces a PDF... then you could just use Word to convert the PDF into docx.

word document {tokens} replacement using PHP

I am trying to read a .doc file and find tokens like {name}, {phone}, {address}, etc. I then want to display each token with a text box so the user can enter the real data, and write that data back into the .doc file in place of the tokens. How can I do this using PHP? The colors, fonts, and styles of the .doc should not be changed.
thanks....
This will be very tricky if you are using old-style Word documents. The new Word documents (.docx) are saved as a Zip archive and are therefore much easier to edit.
You can extract the files and, with some knowledge of the contents and of WordprocessingML (the XML vocabulary Word uses), edit the contents of the file.
Much easier is to use the PHPDocX library. We are using it in a project and it works like a charm. The only disadvantage is that it only works with .docx files.
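If the template is saved as .docx, the first half of the task (discovering which {tokens} a document contains, so a form can be built from them) can be sketched without any library. The helper name docxFindTokens() and the token pattern (letters, digits and underscores between braces) are assumptions for illustration. Stripping the tags before matching means a token is still found when Word has split it across several runs, although such a split token would need more than a plain string replace to fill in.

```php
<?php
// Hypothetical helper: list the distinct {token} names used in a .docx body.
// Assumes ext-zip is available.
function docxFindTokens(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }

    // Drop all markup so only the document text remains.
    $text = strip_tags($xml);

    // Collect names such as "name" from "{name}", in order of first use.
    preg_match_all('/\{([A-Za-z0-9_]+)\}/', $text, $matches);
    return array_values(array_unique($matches[1]));
}
```

Each returned name could then be rendered as one text input, and the submitted values written back into a copy of the document.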

Is it possible to extract Meta information from MS office files and/or PDFs with PHP?

So I have files....
.doc
.docx
.xls
.xlsx
and .pdf
that are on my server.
Is it possible (and if it is, how) to extract the meta data from those files using PHP?
I'm looking for things like Author, keywords, title, etc...
In office documents it's the information stored along with the document properties (File...Properties...Summary for 2003, Prepare...Properties for 2007).
In PDFs it's information found in Document Properties.
This is not on a Windows server.
I have managed to extract a lot of Meta information using XPDF on a linux system a few years back. Nowadays, though, I would say Zend_PDF is your best bet. Haven't used it myself but looks good and promises everything you need. Seems to have no library dependencies, either.
For Word .DOCs, if you don't find a better way, plug into an OpenOffice server instance or its command line and convert the files to ODT, which is XML-based and parseable. It may also be possible to extract the metadata directly with a macro; it should be, but I don't know how much work it is. This OpenOffice Forum entry gives a ton of starting points for automated conversion.
The ...X formats are ZIP archives of XML files, so it should be easy to fetch the metadata from them. Alternatively, you should be able to use OpenOffice's conversion filters here as well, if they carry the metadata over.
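For the .docx and .xlsx case, properties such as title, author and keywords are stored in docProps/core.xml inside the archive. A minimal sketch for reading them follows. The function name ooxmlCoreProperties() and the returned array keys are made up for illustration, and the sketch assumes the zip and dom extensions are enabled.

```php
<?php
// Hypothetical helper: read core document properties from a .docx or .xlsx.
// Assumes ext-zip and ext-dom are available.
function ooxmlCoreProperties(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    $xml = $zip->getFromName('docProps/core.xml');
    $zip->close();
    if ($xml === false) {
        return []; // the file carries no core properties part
    }

    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');
    $xpath->registerNamespace(
        'cp',
        'http://schemas.openxmlformats.org/package/2006/metadata/core-properties'
    );

    // Map friendly names to the elements that hold them.
    $fields = [
        'title'    => '//dc:title',
        'author'   => '//dc:creator',
        'subject'  => '//dc:subject',
        'keywords' => '//cp:keywords',
    ];
    $props = [];
    foreach ($fields as $name => $query) {
        $node = $xpath->query($query)->item(0);
        $props[$name] = $node !== null ? $node->textContent : null;
    }
    return $props;
}
```

Fields that are absent from the file come back as null rather than an empty string.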

How can I convert a PHP page into a .doc file with PHP

I recently worked on a project where I need to convert a page into a Microsoft Word document (.doc file) and offer the document for download, all using PHP. But I can't solve this problem.
Please help me. Thank You very much, Arif
This is not easy to solve.
First off, if you want to write real Word documents, you will have to do it on Windows. You can use COM to talk to Word, and this is how you get good results. I've tried all the unix/linux based solutions and the results were not so great.
Otherwise, I'd suggest you write RTF -- which is just as good. And in the end, you can call the .rtf-file, .doc and no one will notice it. RTF has a couple limitations (formatting), but on the flipside -- it's all ASCII and the RTF standard is pretty comprehensive and well documented.
There's a class which does it pretty nicely -- phpLiveDocx (this is a great introduction). And this class also claims to write PDF and DOC -- but I haven't tried those yet. I use another solution for PDF.
I would recommend using the RTF format instead of the .doc - it's much simpler to write to, and all text editors understand it. Similar recommendation for .csv when you want to output an Excel file.
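To illustrate how little is needed for the RTF route, here is a minimal sketch that wraps plain text in an RTF document. The function name textToRtf() and the choice of Arial at 12pt are assumptions for illustration. It escapes the three characters RTF treats specially, turns newlines into paragraph breaks, and writes non-ASCII characters as \u escapes so the output stays pure ASCII. It assumes the iconv extension is available.

```php
<?php
// Hypothetical helper: wrap UTF-8 plain text in a minimal RTF document.
// Assumes ext-iconv is available.
function textToRtf(string $text): string
{
    $text = str_replace("\r", '', $text);
    $body = '';
    // Split into Unicode characters (returns false on invalid UTF-8).
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($chars as $ch) {
        if ($ch === '\\' || $ch === '{' || $ch === '}') {
            $body .= '\\' . $ch;           // RTF control characters
        } elseif ($ch === "\n") {
            $body .= "\\par\n";            // paragraph break
        } elseif (strlen($ch) === 1) {
            $body .= $ch;                  // plain ASCII
        } else {
            // \uN takes a signed 16-bit value per UTF-16 code unit,
            // followed by a fallback character for old readers.
            $units = unpack('n*', iconv('UTF-8', 'UTF-16BE', $ch));
            foreach ($units as $unit) {
                $body .= '\\u' . ($unit >= 32768 ? $unit - 65536 : $unit) . '?';
            }
        }
    }
    // Header: RTF version 1, ANSI charset, default font 0 = Arial, 12pt.
    return '{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\f0\\fs24 ' . $body . '}';
}
```

To offer the result as a download, the script would send a Content-Type of application/rtf and a Content-Disposition: attachment header before echoing the returned string.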
Perhaps not the answer you seek, but still interesting to note: there is an open source word processor called AbiWord that has a CLI (Command Line Interface). You can use it to easily convert between document formats. I know that at least one website uses it to convert text files into various formats.
It is actively developed and could easily be used as a third-party black box solution for converting documents server side.
Here is a blog from one of the developers on how to integrate it with PHP
Server-Side AbiWord
abiword home page

Extract hyperlinks from .doc

Is there any way to extract hyperlinks from a .doc file? I have a bunch of hyperlinks in a .doc that I need to import into my database.
I have tried converting the .doc to HTML, but the hyperlinks are not transferred.
Regardz,
Mladen
We had a similar issue and ended up using a third party component called Aspose.Words.
You can find it here: http://www.aspose.com
It's available for .NET and Java.
You could try importing the file into OpenOffice and see whether hyperlinks are transferred. OpenDocument is just a ZIP file with XML inside, very easy to parse once you've got the hang of it.
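If the hyperlinks do survive the import into OpenOffice and the file is saved as .odt, they can be read from content.xml inside the archive, where each link is a text:a element with an xlink:href attribute. A minimal sketch follows. The function name odtHyperlinks() is hypothetical, and the sketch assumes the zip and dom extensions are enabled.

```php
<?php
// Hypothetical helper: list the hyperlinks in an OpenDocument text file.
// Assumes ext-zip and ext-dom are available.
function odtHyperlinks(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    // The document body of an .odt file is stored in content.xml.
    $xml = $zip->getFromName('content.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('content.xml not found');
    }

    $xlinkNs = 'http://www.w3.org/1999/xlink';
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace(
        'text',
        'urn:oasis:names:tc:opendocument:xmlns:text:1.0'
    );
    $xpath->registerNamespace('xlink', $xlinkNs);

    // Collect the target URL and the visible link text of every link.
    $links = [];
    foreach ($xpath->query('//text:a[@xlink:href]') as $a) {
        $links[] = [
            'href' => $a->getAttributeNS($xlinkNs, 'href'),
            'text' => $a->textContent,
        ];
    }
    return $links;
}
```

The resulting array of href/text pairs can then be inserted into the database row by row.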
I have done the following thing. I have opened the .doc file with officeXP, then published it as a blog and after that I have saved that blog in the form of filtered web page. That gives you nice HTML which you can parse with ease.
I realise this is some months after your initial question; however, you can also extract hyperlinks from a .doc file through Word Automation. There are hyperlink objects in the API that you can easily extract.
