Is it possible to read a text file with PHP and keep its styling?
My client doesn't want to write any markup (not even [b][/b]), and he has to send those text files to translators who will translate them into 4 languages.
Then I have to post them on a site. The texts are very large and are updated often, so I was wondering how I can keep the formatting without having to go through all of them and add BBCode or HTML by hand.
I see 2 possible answers:
Strip all tags, send the texts to the translators, and re-add the style formatting tags afterwards (see strip_tags()).
Write a script that converts the texts into an editable file format like .docx or .odt, and reverse the process when the texts come back (there are some PHP libraries that can do that).
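A minimal sketch of the first option, assuming the stored texts are HTML. The sample string and the choice to turn paragraph ends into blank lines are illustrative assumptions, not part of the original answer:

```php
<?php
// Hypothetical sample; in practice this would be the stored, formatted text.
$formatted = '<p>Welcome to <b>our</b> site.</p><p>Prices are <i>updated</i> weekly.</p>';

// Turn paragraph ends into blank lines first, so the translator can still
// see where paragraphs break once the tags are gone.
$withBreaks = preg_replace('~</p>\s*~i', "\n\n", $formatted);

// strip_tags() removes the remaining HTML tags and keeps the text content.
$plain = trim(strip_tags($withBreaks));

echo $plain, "\n";
```

Note that this only covers the outbound direction: the inline formatting (bold, italic) is lost, so re-adding it to the translated text is still a manual step.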
Related
I was wondering whether there is a way to convert a document (doc or docx) that contains images and text into Markdown.
Ex: the document contains an image and a description of that image.
I was trying to convert that document into Markdown such as the following:
<img src="doument_name/media/image1.png" width="624" height="505" />
Followed by the description in Markdown.
When I searched, I only found Markdown parsers and converters that turn text into HTML.
Doc and docx are complex formats (doc is a proprietary binary format, docx is a zipped package of XML files), and Markdown is not widely used. Converting one to the other directly will be difficult. It's better to use HTML as an intermediate step.
There are many PHP solutions out there for reading MS Word documents but, as you have perhaps found, they're all slightly flawed: some read the file but don't convert it to HTML, others don't include images, and so on.
As an alternative you could try an online API, like:
http://apiv2.online-convert.com/
I haven't tested this, but it could be a good solution: it converts to HTML, and you have to write and maintain very little code yourself.
The conversion from HTML to Markdown is relatively easy; you can find examples online:
https://github.com/thephpleague/html-to-markdown
https://github.com/Elephant418/Markdownify
I am still new to programming, but I am doing just fine.
This is what I am trying to achieve: take a template Word doc with some text and blank spaces, then populate those spaces with variables coming from a form processed in PHP.
e.g.
**In the Microsoft Word file I have:**
Name:
Age:
Sex:
**Then in my PHP file**
I have
$name;
$age;
$sex;
So I want to populate the Word doc with the variables from the processed form, and also be able to download the populated document both in the same .doc format and as PDF.
I understand that the document needs to be downloaded from a web application and probably needs no extensive customising by the user. In that case, I would recommend using docx or RTF instead of doc.
With RTF you will have problems making tables look good, ditto headers/footers, but otherwise it is rather easy and portable to generate. If you choose docx, you can generate the document as XML and compress it. There are various libraries for this purpose.
Since you indicate you will also need a PDF representation, I would recommend choosing a library that can generate DOCX as well as RTF and PDF. For Java, my advice would be JasperReports. For PHP, not being an enterprise platform (end of flame war), you might consider alternatives such as http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2009/12/18/7923.aspx or LiveDOCX.
If you choose to generate the two output types through different means (introducing more maintenance and possible future bugs), I would recommend generating RTF and publishing the RTF as PDF, for instance as described in "rtf format to pdf".
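If you go the docx route without a library, the "XML plus compression" idea can be sketched roughly as follows. This assumes a template that contains literal placeholders such as {{name}} (a hypothetical convention, not something Word defines) and that the zip extension (ZipArchive) is available. The function names are made up for illustration. Be aware that Word sometimes splits typed text across several XML runs, in which case a placeholder will not be found by a plain string replace.

```php
<?php
/**
 * Replace {{key}} placeholders in a WordprocessingML string.
 * Values are XML-escaped so characters like & or < cannot break the document.
 */
function fillPlaceholders(string $xml, array $values): string
{
    $map = [];
    foreach ($values as $key => $value) {
        $map['{{' . $key . '}}'] = htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    }
    return strtr($xml, $map);
}

/**
 * Copy a .docx template and rewrite word/document.xml inside the copy.
 * Requires the zip extension.
 */
function fillDocxTemplate(string $templatePath, string $outputPath, array $values): void
{
    if (!copy($templatePath, $outputPath)) {
        throw new RuntimeException("Cannot copy template to $outputPath");
    }
    $zip = new ZipArchive();
    if ($zip->open($outputPath) !== true) {
        throw new RuntimeException("Cannot open $outputPath as a zip archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    if ($xml === false) {
        $zip->close();
        throw new RuntimeException('word/document.xml not found');
    }
    // Adding an entry under an existing name replaces that entry.
    $zip->addFromString('word/document.xml', fillPlaceholders($xml, $values));
    $zip->close();
}
```

A call might look like `fillDocxTemplate('template.docx', 'filled.docx', ['name' => $name, 'age' => $age, 'sex' => $sex]);`, with both file names being placeholders. The PDF version would still have to come from a separate conversion step.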
I need to open an uploaded .docx file and be able to change values in it. I know that a docx file consists of XML files. So the main question is: does anybody know a good web-based WYSIWYG XML editor?
I know of one called XOPUS, but I have no idea how to configure it. Maybe somebody knows other alternatives for this task, or has advice on how to load the XML file into a text field where I could change values.
There are a couple of PHP toolkits that you can use for this task. First off, there's an early development one on CodePlex:
http://openxmlapi.codeplex.com/
However, you may be better off with one of the more mature ones:
http://holloway.co.nz/docvert/index.html
http://www.phpdocx.com/
Both of these can convert from docx to most of the popular formats, HTML included.
Once you've converted to something like HTML, you can use an on-screen editor such as TinyMCE:
http://www.tinymce.com/
to provide in-page rich editing capabilities, before finally using the above toolkits to convert back to DOCX or any other applicable format.
Update February 2014
Since I first wrote this reply, things have moved on. The Open XML kits I mentioned above are still valid; however, in-page editing is now more of a possibility than ever using the new HTML5 contenteditable and edit mode attributes.
It's now insanely easy to add your own buttons (using something like Bootstrap) above a div that has the contenteditable attribute set.
Connecting the buttons to "document.execCommand" lets you send bold, italic, underline, link and image creation, list insertion and all manner of other HTML construction commands directly to this div, without needing TinyMCE or another in-page editor.
Full details are available on the Mozilla Developer Network, and I am planning to do a blog post on using this stuff very soon.
Have you tried PHPWord?
One may use the DocxUtilities class of PHPDocX to do some partial editing of an existing Word document.
This class allows you to:
search and replace a particular string of text
search for a string of text and remove the containing paragraph or section
highlight predefined strings (search and highlight)
fully merge docx files (text, images, charts, footnotes, ...)
If that is not enough for your purposes, you should prepare a PHPDocX template to fully customize an existing Word document.
I am trying to parse a Word document. I upload the file using PHP and then try to get its contents with the file_get_contents() function, but when the result is displayed on the front end there is a lot of garbage in it, like:
Æ�Ѐ¤d�¤d�[$\$gd®l±����„h¤d�¤d�[$\$^„hgd®l±���
&�F�¤d�¤d�[$\$gd3¡���gd3¡����„,¤d�¤d�[$\$^„,gd(E����¤d�¤d�[$\$gdÿ/��<��C��D��I��Å������O��P��‚��¡��¢��¬����®��Ù��ã��ó��ô�����
So my question is how can I clean up this text?
Maybe give this a shot? http://www.phpclasses.org/package/3553-PHP-Edit-Microsoft-Word-documents-using-COM-objects.html
Word documents (like docx and doc) are not plain text files. They are file types that do not simply hold the text starting at byte 0, which is how they support fancy formatting and fonts. .docx files are actually archives (.zip files) that contain a myriad of XML files and styles.
Your best bet is to use a text input form, or find code online that allows you to extract just the text. Or, download the doc files to your own computer and use your own copy of MS Word to open them.
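For .docx files specifically, extracting "just the text" can be sketched as below. This is a rough approach under a few assumptions: the zip extension (ZipArchive) is installed, the main text lives in word/document.xml, and losing all formatting, tables and images is acceptable. The function names are made up for illustration, and this does not work for the older binary .doc format.

```php
<?php
/**
 * Reduce the XML of word/document.xml to plain text.
 * Paragraph ends (</w:p>) become newlines; all other tags are dropped.
 */
function documentXmlToText(string $xml): string
{
    // Drop the XML declaration, then mark paragraph boundaries.
    $xml = preg_replace('~<\?xml.*?\?>~s', '', $xml);
    $xml = str_replace('</w:p>', "\n", $xml);

    // Remove the remaining tags and decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8');
    return trim($text);
}

/**
 * Open a .docx file as a zip archive and return its text content.
 */
function readDocxText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }
    return documentXmlToText($xml);
}
```

With an uploaded file, the call would be something like `echo htmlspecialchars(readDocxText($_FILES['doc']['tmp_name']));`, where the form field name `doc` is an assumption.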
I have a script that takes the articles out of the database and places them in a .txt file, but I would like to put them in .rtf format instead. Is there a way to convert to or generate a .rtf file?
Thanks in advance.
RTF is just a text format (kind of like HTML), so yes, you can convert it easily enough by opening the file, inserting the right RTF codes (basically, just a header), and saving it as a .rtf file. Google for the RTF spec. What would get complicated are things like making headlines bold, etc.
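As a rough illustration of the "just a header" idea, here is a minimal sketch that wraps plain text in an RTF document. The function names and the Arial font are assumptions chosen for the example, and it only handles ASCII text: non-ASCII characters would need extra escaping that is not shown here.

```php
<?php
/**
 * Escape the three characters that have special meaning in RTF.
 */
function rtfEscape(string $text): string
{
    return strtr($text, ['\\' => '\\\\', '{' => '\\{', '}' => '\\}']);
}

/**
 * Wrap plain text in a minimal RTF document, one \par per line break.
 */
function textToRtf(string $text): string
{
    // Header: RTF version, character set, and a one-entry font table.
    $header = '{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}\f0\fs24';

    // Split on any newline sequence and escape each paragraph.
    $paragraphs = array_map('rtfEscape', preg_split('/\R/', $text));

    return $header . "\n" . implode("\\par\n", $paragraphs) . "\n}";
}

// Hypothetical article text standing in for a database row.
$rtf = textToRtf("First paragraph\nSecond {paragraph}");
```

The result can then be written out with `file_put_contents('article.rtf', $rtf);`, where the file name is only a placeholder.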
Seems to be a lot of information on this on Google.
http://lab.artlung.com/rtf/
Seems to possibly be what you are looking for. I found that link through this forum thread.