Word document {tokens} replacement using PHP

I am trying to read a .doc file and find tokens like {name}, {phone}, {address}, etc. I then want to display each token with a text box so the user can enter the real data, and replace the tokens in the .doc file with that data. How can I do this using PHP? The colors, fonts, and styles of the .doc should not change.
thanks....

This will be very tricky if you are using old-style Word documents (.doc). The newer .docx documents are saved as a Zip archive and are therefore much easier to edit.
You can extract these files and, with some knowledge of the contents and of WordprocessingML (the XML format Word uses), edit the contents of the file.
Much easier is to use the PHPDocX library. We are using it in a project and it works like a charm. The only disadvantage is that it only works with .docx files.
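
If you want to try the do-it-yourself route first, here is a minimal sketch of the "find the tokens" step. It assumes $documentXml holds the contents of word/document.xml from a .docx (for example read with ZipArchive::getFromName()), and that token names consist of letters, digits and underscores. The function name findTokens is just made up for this illustration.

```php
<?php
/**
 * Return the unique {token} names found in a WordprocessingML string.
 * Tags are stripped first, so a token that Word has split across
 * formatting runs ("{na" in one run, "me}" in the next) is still found.
 * Assumption: token names match [A-Za-z_][A-Za-z0-9_]*.
 */
function findTokens(string $documentXml): array
{
    $text = strip_tags($documentXml);
    preg_match_all('/\{([A-Za-z_][A-Za-z0-9_]*)\}/', $text, $matches);

    // $matches[1] holds the names without the braces; drop duplicates
    return array_values(array_unique($matches[1]));
}
```

Each returned name can then be used to print one text box in an HTML form. Because the text of all paragraphs is glued together before matching, this is only a discovery step and should not be used to do the replacement itself.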

Related

Extracting a String from .doc or .docx file, removing that string, and saving file again in origin format using php

I have an invoice.doc file and want to extract a customer email address, remove it from the doc file, add a company logo on top right, and save the file in original format using php.
MS Word does not save plain text: .docx files are compressed (zip) archives and old .doc files are a binary format, so you won't be able to see or edit the contents directly. If you open one with a regular text editor you will see what I mean.
Your best shot would probably be to use PHPWord.
Take a look at it here: http://phpword.codeplex.com/
For old .DOC documents, you could use AntiWord to extract the e-mail. Altering the document is a different story, though; perhaps use ActiveX (COM) if you are on Windows with MS Office installed.
For the new .DOCX format you do have some options, because the document is basically just a zip archive of XML files.
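
For the .DOCX case, a rough sketch of the e-mail part could look like this. It works on the XML string taken from word/document.xml (reading and writing that entry can be done with ZipArchive::getFromName() and ZipArchive::addFromString()). The pattern is a deliberately simple assumption, not a full e-mail validator, and the function names are made up. Adding the logo is not covered here.

```php
<?php
// Simplified e-mail pattern (an assumption for this sketch, not RFC 5322)
const EMAIL_PATTERN = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/';

/** List the unique e-mail addresses that appear in the document text. */
function extractEmails(string $documentXml): array
{
    // Replace tags with spaces so text from neighbouring runs is not glued together
    $text = preg_replace('/<[^>]+>/', ' ', $documentXml) ?? '';
    preg_match_all(EMAIL_PATTERN, $text, $matches);

    return array_values(array_unique($matches[0]));
}

/**
 * Remove e-mail addresses from the XML itself, leaving all tags untouched.
 * Only works when an address sits inside a single run, which is not guaranteed.
 */
function removeEmails(string $documentXml): string
{
    return preg_replace(EMAIL_PATTERN, '', $documentXml) ?? $documentXml;
}
```

Since only the text between tags changes, the formatting of the invoice should stay as it was.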

Is there a way with PHP to access a file on a server and save only the first half of the file?

I want to give users a preview of certain files on my site and will be using the Scribd API. Does anyone know how I can access the full file on my server and save only the first half of it under a different name, which I will then show to users? I can't think of a way to do this with PHP for .docx and image files. Help is much appreciated.
For "splitting" images, use an image processing library like gd to crop the image (lots of examples to be found on how to do that all over the place). For Word documents, use a library like PHPWord (or one of the other myriad such libraries) to open the document, remove/extract as much text as you need, then save that into a new Word file.
For other file types, find the appropriate method that allows you to manipulate that format, then do whatever you need to do with it.
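
For the image case, a minimal sketch with gd might look like the following. It assumes the gd extension is loaded, that the source is a JPEG, and that "first half" means the top half of the picture. The function names are made up for the example.

```php
<?php
/** Rectangle covering the top half of a $width x $height image. */
function topHalfRect(int $width, int $height): array
{
    return [
        'x' => 0,
        'y' => 0,
        'width' => $width,
        'height' => max(1, intdiv($height, 2)), // never ask for a 0px crop
    ];
}

/** Save the top half of a JPEG under a new name. Requires the gd extension. */
function savePreviewJpeg(string $source, string $target): void
{
    $image = imagecreatefromjpeg($source);
    if ($image === false) {
        throw new RuntimeException("Cannot read $source as a JPEG");
    }

    $preview = imagecrop($image, topHalfRect(imagesx($image), imagesy($image)));
    if ($preview === false) {
        throw new RuntimeException('Cropping failed');
    }

    imagejpeg($preview, $target, 85); // 85 = JPEG quality
}
```

Something like savePreviewJpeg('full.jpg', 'preview.jpg') would then produce the file to show to users, while the original stays untouched.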

How to clean up garbage text from string using PHP?

I am trying to parse a Word document file. I upload the file using PHP and then try to get its contents using file_get_contents(), but when the contents are displayed on the front end there is a lot of garbage in them, like:
Æ�Ѐ¤d�¤d�[$\$gd®l±����„h¤d�¤d�[$\$^„hgd®l±���
&�F�¤d�¤d�[$\$gd3¡���gd3¡����„,¤d�¤d�[$\$^„,gd(E����¤d�¤d�[$\$gdÿ/��<��C��D��I��Å������O��P��‚��¡��¢��¬��­��®��Ù��ã��ó��ô�����
So my question is how can I clean up this text?
Maybe give this a shot? http://www.phpclasses.org/package/3553-PHP-Edit-Microsoft-Word-documents-using-COM-objects.html
Word documents (like docx and doc) are not straight text files - they are actually proprietary file types that do not just have the text from byte 0 - this is how they have fancy formatting and fonts. .docx files are actually archives (.zip files) that contain a myriad of XML and styles.
Your best bet is to use a text input form, or find code online that allows you to extract just the text. Or, download the doc files to your own computer and use your own copy of MS word to open it.
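
If the uploads are .docx files, "code that extracts just the text" can be fairly short. This is only a sketch under some assumptions: the zip extension (ZipArchive) is available, the body text lives in word/document.xml, and headers, footers, footnotes and tab characters are ignored. It does not work for old .doc files. Both function names are made up.

```php
<?php
/** Turn the XML of word/document.xml into plain text, one line per paragraph. */
function docxXmlToText(string $documentXml): string
{
    // End of a paragraph and explicit line breaks become newlines
    $xml = str_replace('</w:p>', "\n", $documentXml);
    $xml = preg_replace('/<w:br\b[^>]*\/>/', "\n", $xml) ?? $xml;

    // Drop all remaining tags, then decode &amp;, &lt; and friends
    $text = strip_tags($xml);

    return trim(html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8'));
}

/** Read the text of a .docx file instead of using file_get_contents() on it. */
function readDocxText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }

    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }

    return docxXmlToText($xml);
}
```

Remember to run the result through htmlspecialchars() before echoing it on the front end.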

Manipulating Microsoft Word Office 2007 .docx document from PHP

I need a way to manipulate .docx (Microsoft Office 2007) documents from within PHP.
I need to:
Read the internal text
Convert to .html
To view them inside a browser.
To replace text.
I know I can use Word Automation, creating a COM object of Microsoft Word, but it's too slow, unstable and I have to have it installed on the server.
Is there any library or code that can do it from PHP?
There is PHPWord for that by the authors of PHPExcel.
Docx is just a ZIP file containing multiple XML files and embedded media files like images. Because of this, you can read and edit the document with ease. Just unzip it, open word/document.xml, do reading & writing, and repack the files.
Converting to HTML may be difficult. But you may find a thumbnail of the first page in docProps/thumbnail.jpeg, if the document was saved with one.
Note that you'll have to familiarize yourself with the XML structure to do any complex edits. There's a summary XML docProps/app.xml which has some metadata for the file so don't forget to update it. Read more from Wikipedia: http://en.wikipedia.org/wiki/Office_Open_XML
You may have a look at PHPDocX; I believe it does all you are asking for.
You may replace variables in a template or just plain text from a pre-existing Word document.
It offers quite a few conversion options.
You can also extract the text.
You can work with the internal format directly.
DOCX is just a zip file, and inside that there's word/document.xml containing the actual document.
It's quite trivial to unzip the file, read document.xml, str_replace() what you're looking for, save it and re-zip the directory, and it makes for a lightweight, quick and easy mail merge capability for word documents. This also works for other office formats.
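
As a rough sketch of that unzip / replace / re-zip idea, using ZipArchive so nothing has to be extracted to disk: the {name} token syntax and both function names are assumptions made for this example. One known limitation: Word sometimes splits a token across several formatting runs, and a plain string replace will not find those.

```php
<?php
/** Replace {token} placeholders in the XML; values are escaped for XML. */
function fillTemplateXml(string $documentXml, array $values): string
{
    $map = [];
    foreach ($values as $name => $value) {
        // Escape &, <, > and quotes so user input cannot break the XML
        $map['{' . $name . '}'] = htmlspecialchars((string) $value, ENT_QUOTES | ENT_XML1, 'UTF-8');
    }

    // strtr() does not re-scan inserted values, unlike chained str_replace()
    return strtr($documentXml, $map);
}

/** Copy a .docx template and fill in its tokens. Requires the zip extension. */
function fillDocxTemplate(string $template, string $output, array $values): void
{
    if (!copy($template, $output)) {
        throw new RuntimeException("Cannot copy $template to $output");
    }

    $zip = new ZipArchive();
    if ($zip->open($output) !== true) {
        throw new RuntimeException("Cannot open $output as a zip archive");
    }

    $xml = $zip->getFromName('word/document.xml');
    if ($xml === false) {
        $zip->close();
        throw new RuntimeException('word/document.xml not found');
    }

    // Adding an entry under an existing name replaces that entry
    $zip->addFromString('word/document.xml', fillTemplateXml($xml, $values));
    $zip->close();
}
```

A call such as fillDocxTemplate('template.docx', 'letter.docx', ['name' => 'Jane']) would then write the merged copy and leave the template as it was. Fonts and styles are kept because only the text inside the runs changes.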
Here's the official docs on the internal structure for more information.
There is also a PHP class for merging new content into an existing .docx file. It is available here: http://www.tinybutstrong.com/ . The documentation is pretty good as well as having many examples and it is all free and open source. It does require familiarity with the .docx concepts, though.

Is there a Windows program that can convert Word (.doc and .docx) into text?

I need a Windows program to convert a Word file (.doc) into text. Something like "antiword" for Windows.
I need it because I need to convert Word files into text and use Lucene to index them, and I am in a Windows environment :(
Thanks for all your help!!!
Yes. That program is called MS Word.
Open the file in Word via COM, and save it as text programmatically. On the other hand, is Lucene not able to read Word documents natively?
If you really need a program, here's one. I have not tried it, but you can give it a shot. Otherwise, you can just use COM / VBScript.
Using POI (http://poi.apache.org/) you should be able to index the old binary DOC formats. Relevant code snippets can be found on http://kalanir.blogspot.com/2008/08/how-to-index-microsoft-format-documents.html.
And for DOCX, since that's basically a ZIP file which contains a bunch of XML and resource files, it should be relatively easy to find the XML file containing the actual text (I think it's word/document.xml) and indexing the text contained in it (after stripping off all XML data)...
You can use the OpenXML SDK to easily strip the text out of DOCX files. Does not work with .doc though--you probably need to use MS Word and COM for that.
