Read complete content of .rtf document - php

I have to create a document from user-entered data plus data from a .rtf document, and show it in a web page layout I created (HTML + CSS, with PHP for scripting).
My problem is that I can't find any way to obtain the full content of the .rtf document.
Because it is a technical document, symbols, tables, graphs and images are included very often. With the methods I've found I could obtain the text and symbols in decent formatting, but I had no luck with images.
So what I need is a way to obtain the full content of a .rtf file, if possible preserving the document formatting, so I can display and organize it in a web page; preferably in pure PHP, but using JS or executables called from PHP is fine.
I've tried:
- RTF-to-HTML converters: the best I could get was plain text and symbols, but no images;
- using the COM extension to open the .rtf in MS Word and save it as .html (I noticed that if I open the .rtf and save it as a web page in Word, it creates a perfect HTML page): it only changed the extension and didn't create an HTML page;
- extracting text and images separately: this works, but since it is a technical document, image placement is very important.
This is my first question here, after a lot of research; please bear with me in case of errors.

Related

Opening HTML as Word or PDF using PHP/Laravel

I'm a little confused about PHPWord w/ DomPDF. If I generate HTML using Laravel/PHP/Blade, store it in a text file saving it with .html extension, and open it with Word (open Word, select File open, etc), I see the page as I would like to see it in Word.
But if I use PHPWord to write the HTML to a file and then open it as Word2007 for download, PHPWord strips out all the CSS etc. And it looks horrible.
Also, when I use dompdf with the help of PHPWord (again writing the HTML to a file and reading it back as PDF), the PDF looks much better than the generated Word version but still has no styling.
I have reviewed the samples for each, and all the articles I can find on here. Is there another way to do with PHP what I do manually (open the HTML in Word) and offer the result for download?
I cannot recode the blade template with the PHPWord tagging. But if the Windows laptop Word app can open my HTML and save it as a Word file just fine I would think I could do this relatively easily with PHP programmatically somehow.

Extracting a String from .doc or .docx file, removing that string, and saving file again in origin format using php

I have an invoice.doc file and want to extract a customer email address, remove it from the doc file, add a company logo on top right, and save the file in original format using php.
MS Word saves its files in a compressed format, so you won't be able to see or edit the contents without decompressing it first. If you pop it open with a regular text editor you will know what I mean.
Your best shot would probably be to use PHPWord.
Take a look at it here: http://phpword.codeplex.com/
For old .DOC documents, you could use AntiWord to extract the e-mail address. Altering the document is a different story, though; perhaps use ActiveX (COM) if you are on Windows with MS Office installed.
For the new .DOCX format you have more options, because the document is basically just a zip archive of XML files.
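To illustrate the "zipped XML" point, here is a minimal sketch that reads the body text of a .docx with PHP's ZipArchive extension and looks for e-mail addresses in it. The helper names docxToPlainText() and findEmails() are made up for this example, the e-mail pattern is deliberately simple, and the code assumes the main body is stored at word/document.xml, which is the usual location. It only reads the file; it does not remove anything or add a logo.

```php
<?php
// Hypothetical helpers for illustration; they are not part of any library.

// Return the visible text of a .docx, one line per paragraph.
function docxToPlainText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }
    // Assumption: the main document part is word/document.xml.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }
    // Turn paragraph ends into newlines, then drop all remaining markup.
    $xml = str_replace('</w:p>', "\n", $xml);
    $text = preg_replace('/<[^>]+>/', '', $xml);
    // Decode XML entities such as &amp; back into plain characters.
    return html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Return the distinct e-mail-like strings found in a piece of text.
function findEmails(string $text): array
{
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $text, $m);
    return array_values(array_unique($m[0]));
}
```

With a file called invoice.docx (a placeholder name), print_r(findEmails(docxToPlainText('invoice.docx'))); would list the addresses found in the body text. Text in headers, footers or text boxes lives in other parts of the archive and is not covered by this sketch.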

edit uploaded docx file in PHP

I need to open an uploaded .docx file and be able to change values in it. I know that a .docx file consists of XML files. So the main question is: does anybody know a good WYSIWYG web-based XML editor?
I know one called XOPUS, but I have no idea how to configure it. Maybe somebody knows other alternatives for this task, or can advise how to put the XML file into a text field where I could change values.
There are a couple of PHP toolkits that you can use for this task. First off, there's one in early development on CodePlex:
http://openxmlapi.codeplex.com/
However, you may be better off with one of the more mature ones:
http://holloway.co.nz/docvert/index.html
http://www.phpdocx.com/
Both of these can convert from docx to most of the popular formats, HTML included.
Once you've converted to something like HTML, you can use an on-screen editor such as TinyMCE:
http://www.tinymce.com/
This provides in-page rich editing capabilities, before you finally use the above toolkits to convert back to DOCX or any other applicable format.
Update February 2014
Since I first wrote this reply, things have moved on. The Open XML kits I mentioned above are still valid; however, in-page editing is now more of a possibility than ever, using the new HTML5 contenteditable attribute and the designMode property.
It's now very easy to add your own buttons (using something like Bootstrap) above a div that has the contenteditable attribute set.
Connecting the buttons to "document.execCommand" then lets you send bold, italic, underline, link and image creation, list insertion and all manner of other HTML construction commands directly to this div, without needing TinyMCE or another in-page editor.
Full details are available on the Mozilla Developer Network, and I am planning to do a blog post on using this stuff very soon.
Have you tried PHPWord?
One may use the DocxUtilities class of PHPDocX to do some partial editing of an existing Word document.
This class allows:
- searching for and replacing a particular string of text
- searching for a string of text and removing the paragraph or section that contains it
- highlighting predefined strings (search and highlight)
- full merging of docx files (text, images, charts, footnotes, ...)
If that is not enough for your purposes, you should prepare a PHPDocX template to fully customize an existing Word document.

word document {tokens} replacement using PHP

I am trying to read a .doc file and find tokens like {name}, {phone}, {address}, etc., then display each token with a text box so the user can enter the real data, and finally replace the tokens in the .doc file with that data. How can I do this using PHP? The colors, fonts and styles of the .doc should not change.
Thanks.
This will be very tricky if you are using old-style Word documents (.doc). The new Word documents (.docx) are saved as a zip archive and are therefore much easier to edit.
You can extract the files and, with some knowledge of the contents and of WordprocessingML (the XML format Word uses), edit the contents.
Much easier is to use the PHPDocX library. We are using it in a project and it works like a charm. The only disadvantage is that it only works with .docx files.
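For the do-it-yourself route on .docx files, the idea can be sketched with nothing more than ZipArchive. This is not how PHPDocX works internally; replaceDocxTokens() is a hypothetical helper written for this example, and it assumes the body is stored at word/document.xml. Because only the text inside the XML changes, fonts, colors and styles are left alone.

```php
<?php
// Hypothetical helper for illustration; it is not part of PHPDocX or PHPWord.

// Copy $source to $target and replace {token} placeholders in the body XML.
// $values maps token names (without braces) to their replacement text.
function replaceDocxTokens(string $source, string $target, array $values): void
{
    if (!copy($source, $target)) {
        throw new RuntimeException("Cannot copy $source to $target");
    }
    $zip = new ZipArchive();
    if ($zip->open($target) !== true) {
        throw new RuntimeException("Cannot open $target as a zip archive");
    }
    // Assumption: the main document part is word/document.xml.
    $xml = $zip->getFromName('word/document.xml');
    if ($xml === false) {
        $zip->close();
        throw new RuntimeException('word/document.xml not found');
    }
    $map = [];
    foreach ($values as $name => $value) {
        // Escape user input so characters like & or < cannot break the XML.
        $map['{' . $name . '}'] = htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    }
    // Adding an entry under an existing name replaces that entry.
    $zip->addFromString('word/document.xml', strtr($xml, $map));
    $zip->close();
}
```

One known limitation of this approach: Word often splits a typed token such as {name} across several runs in the XML (for example after spell checking or partial formatting), in which case a plain string replacement will not find it. Retyping the token in one go in Word, or using a library that handles split runs, avoids that.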

How to clean up garbage text from string using PHP?

I am trying to parse a Word document file. I upload the file using PHP and then try to get its contents with file_get_contents(), but the problem is that when it is displayed in the front end there is a lot of garbage in it, like
Æ�Ѐ¤d�¤d�[$\$gd®l±����„h¤d�¤d�[$\$^„hgd®l±���
&�F�¤d�¤d�[$\$gd3¡���gd3¡����„,¤d�¤d�[$\$^„,gd(E����¤d�¤d�[$\$gdÿ/��<��C��D��I��Å������O��P��‚��¡��¢��¬��­��®��Ù��ã��ó��ô�����
So my question is how can I clean up this text?
Maybe give this a shot? http://www.phpclasses.org/package/3553-PHP-Edit-Microsoft-Word-documents-using-COM-objects.html
Word documents (.doc and .docx) are not plain text files. They are proprietary file types in which the text does not simply start at byte 0; this is how they carry fancy formatting and fonts. .docx files are actually archives (.zip files) that contain a myriad of XML parts and styles.
Your best bet is to use a text input form, or to find code online that extracts just the text. Or download the .doc files to your own computer and open them with your own copy of MS Word.
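Since file_get_contents() returns the raw container bytes, a reasonable first step before any text extraction is to check which kind of file was uploaded. The sketch below looks at the first bytes of the file; detectWordFormat() and its return values are names made up for this example. The signatures used are the zip local-file header ("PK" followed by bytes 03 04), the OLE compound file header used by legacy .doc files, and the "{\rtf" prefix of RTF files.

```php
<?php
// Hypothetical helper for illustration: classify an upload by its first bytes.
function detectWordFormat(string $path): string
{
    // Read only the first 8 bytes; a failed read becomes an empty string.
    $head = (string) file_get_contents($path, false, null, 0, 8);

    if (strncmp($head, "PK\x03\x04", 4) === 0) {
        return 'zip';     // .docx, or any other zip-based file
    }
    if ($head === "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1") {
        return 'ole';     // legacy binary container, e.g. old .doc
    }
    if (strncmp($head, '{\rtf', 5) === 0) {
        return 'rtf';     // Rich Text Format
    }
    return 'unknown';
}
```

A 'zip' result only says the file is a zip archive, so it is still worth checking that word/document.xml exists inside it before treating it as a .docx. For an 'ole' result, the garbage shown above is expected, and an external tool or library is needed to get at the text.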
