Word XML to HTML (with styling) - php

I have a Word template (MS Word 2010) that I inject variables into using PHPWord, and I would like to convert the result into a PDF.
My plan is to convert the Word document into XML (which I have done), then turn that XML into styled HTML.
So far I have managed to replace the XML elements that represent line breaks and paragraphs, but I am wondering whether there is existing code that will convert the other XML elements into styled HTML. I know it is unlikely to be perfect, but something close would be good.

Your best bet is to use XSLT. There are some good tutorials on the web. This page gives the code for doing this in PHP.
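To give an idea of what the XSLT route looks like, here is a minimal sketch using PHP's XSLTProcessor (this needs the xsl extension). The stylesheet is my own illustration, not the code from the linked page: it only handles paragraphs, bold runs, text and line breaks, and it treats any w:b element as bold without checking its w:val attribute. The function name wordXmlToHtml is made up for the example.

```php
<?php
// Minimal sketch: transform a WordprocessingML fragment to HTML with XSLT 1.0.
// Assumes the PHP xsl extension is installed. Only a few elements are handled.

$xsl = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    exclude-result-prefixes="w">
  <xsl:output method="html" indent="no"/>
  <xsl:strip-space elements="*"/>

  <!-- Each Word paragraph becomes an HTML paragraph -->
  <xsl:template match="w:p">
    <p><xsl:apply-templates select="w:r"/></p>
  </xsl:template>

  <!-- A run carrying w:rPr/w:b is wrapped in <strong> (simplification) -->
  <xsl:template match="w:r[w:rPr/w:b]">
    <strong><xsl:apply-templates select="w:t | w:br"/></strong>
  </xsl:template>

  <xsl:template match="w:r">
    <xsl:apply-templates select="w:t | w:br"/>
  </xsl:template>

  <xsl:template match="w:t"><xsl:value-of select="."/></xsl:template>
  <xsl:template match="w:br"><br/></xsl:template>
</xsl:stylesheet>
XSL;

// Hypothetical helper: apply the stylesheet to the document XML.
function wordXmlToHtml(string $docXml, string $xsl): string
{
    $source = new DOMDocument();
    $source->loadXML($docXml);

    $stylesheet = new DOMDocument();
    $stylesheet->loadXML($xsl);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($stylesheet);

    return (string) $processor->transformToXml($source);
}

// Tiny sample in the shape of word/document.xml
$docXml = <<<'XML'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r><w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>
  </w:body>
</w:document>
XML;

$html = wordXmlToHtml($docXml, $xsl);
echo $html;
```

Further styling (italics, font sizes, tables) would mean adding one template per WordprocessingML element you care about, which is why the result is unlikely to ever be a perfect copy of the Word layout.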

Related

Parse HTML and replace content in DIV

I want to know how I can find a DIV tag in an HTML page, because I want to replace the links inside that DIV with different links. I do not understand exactly what code I need.
First, note that PHP does nothing on the client side, though you probably know that already.
You should use file_get_contents to read the web page into a string (or whatever your HTML parsing library provides for loading a page).
There is already a question that explains how to parse HTML in general: Robust and Mature HTML Parser for PHP
If that doesn't fit your needs, try searching Google for "php html parsing"; I found several libraries that way.
For example, this library lets you find all tags: http://simplehtmldom.sourceforge.net/
Note that this is not a great approach. I suggest you change your HTML page into a PHP page and insert some code in place of the A tags, which will make everything easier.
Finally, if the HTML page is static (it doesn't change), you can simply count lines: output the contents from line X to line Y, put in your customized A tags, and then output from line J to the end of the file.
Good luck anyway.
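If you would rather not pull in a third-party library, the same thing can be sketched with PHP's built-in DOMDocument and DOMXPath. This is only an illustration: the div id "links", the URL map and the function name replaceLinksInDiv are assumptions, and in practice $html would come from file_get_contents.

```php
<?php
// Sketch: rewrite the links inside one <div>, leaving the rest of the page alone.
// The id "links" and the old => new URL map are hypothetical.

function replaceLinksInDiv(string $html, string $divId, array $map): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally; real-world HTML is rarely perfect.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // $divId is trusted here; do not build the query from user input as-is.
    foreach ($xpath->query('//div[@id="' . $divId . '"]//a[@href]') as $a) {
        $old = $a->getAttribute('href');
        if (isset($map[$old])) {
            $a->setAttribute('href', $map[$old]);
        }
    }

    return $doc->saveHTML();
}

$page = '<html><body><div id="links"><a href="/old">One</a></div>'
      . '<p><a href="/old">Untouched</a></p></body></html>';

$result = replaceLinksInDiv($page, 'links', ['/old' => '/new']);
echo $result;
```

Only the link inside the div gets the new href; the link in the paragraph keeps "/old".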

How to "read" a HTML document in PHP?

I've been facing a problem for quite a long time. Unfortunately I was not able to find the solution on my own, so I have to post my question here.
I am writing a little PHP script that creates a PDF file from a dynamically created HTML file.
Now I want to "parse" the HTML file and perform an action depending on which tag comes next in the HTML.
E.g.
<div><p>Test</p></div>
My script should recognize:
First tag is a div: do function for div
Second tag is a p: do function for p
I don't know what I should search for. Regular expressions? An HTML parser?
Thanks for a hint!
Try an XML parser. In PHP, SimpleXML is probably what you are looking for.
I've used phpQuery several times. It's a nice solution, although it's quite big and seems to be no longer supported (last commit > 10 months ago).
What you need to do is read the HTML file into a PHP variable/object:
http://www.php-mysql-tutorial.com/wikis/php-tutorial/read-html-files-using-php.aspx
And then use regular expressions to parse the HTML tags and attributes:
http://www.codeproject.com/Articles/297056/Most-Important-Regular-Expression-for-parsing-HTML
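For the "do function for div, do function for p" part, here is a rough sketch of the parser approach. It uses the built-in DOMDocument rather than SimpleXML or regular expressions (my substitution, since DOMDocument tolerates HTML that is not well-formed XML). The handler functions and the name walkElements are invented for the example.

```php
<?php
// Sketch: visit each element in document order and call a handler chosen by tag name.
// walkElements and the handlers below are hypothetical names for illustration.

function walkElements(DOMNode $node, array $handlers, array &$log): void
{
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $tag = strtolower($child->tagName);
            if (isset($handlers[$tag])) {
                // Record what the handler returned, in visiting order.
                $log[] = $handlers[$tag]($child);
            }
            // Descend so nested tags are handled after their parent.
            walkElements($child, $handlers, $log);
        }
    }
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<div><p>Test</p></div>');
libxml_clear_errors();

$handlers = [
    'div' => function (DOMElement $el): string {
        return 'div handler';
    },
    'p' => function (DOMElement $el): string {
        return 'p handler: ' . $el->textContent;
    },
];

$log = [];
walkElements($doc->documentElement, $handlers, $log);
print_r($log);
```

In a PDF generator the handlers would emit drawing commands instead of strings; the log is only there to make the visiting order visible.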

Which is the best option, SimpleXml or XML Parser in PHP?

I have gone through the Stack Overflow post "Best XML Parser for PHP", which asks the same question. It mentions that if I need to manipulate XML files, I should go for DOM XML. My requirements are:
I have navigation saved in the database. It is an HTML string.
I want to remove some pages, that is, the li tags wrapping pages that the user doesn't want on his/her page. After removing the unwanted li's, I want to save the whole string back to the database.
The same navigation will be used on another page, but the HTML will be different. It will be similar, with the ul and li, but I need to add some more divs and spans to it.
The navigation will be edited on this page, and on each change (e.g. changing a page title, deleting a node/page, moving a page under another page as a child) an Ajax call will save the changes to another table in the database.
Using the new structure, the navigation is built again and updated in the first navigation table.
Which will be the best option?
For your use case, using XML Parser does not make much sense, since you want to save the whole file back after modifying it.
SimpleXML makes that much easier (saveXML() method); with XMLReader you would have to generate the XML on your own during parsing.
I'd recommend SimpleXML.
My personal preference is XML Parser, but I think the library you choose probably will not affect the solution that much. Usage of any XML manipulation library is largely the same.
For larger files you will want to use XML Parser, because SimpleXML loads the entire XML into memory to parse it, whereas XML Parser is stream-based.
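As a rough sketch of the SimpleXML route for the "remove some li tags and save the string back" requirement: this assumes the stored navigation is well-formed XML, and the data-page attribute and the function name removePages are made up for the example. SimpleXML has no removeChild-style method, so the removal itself goes through dom_import_simplexml.

```php
<?php
// Sketch: drop unwanted <li> items from a stored navigation string.
// Assumes well-formed markup; the data-page attribute is hypothetical.

function removePages(string $navHtml, array $unwanted): string
{
    $nav = new SimpleXMLElement($navHtml);

    foreach ($nav->xpath('//li[@data-page]') as $li) {
        if (in_array((string) $li['data-page'], $unwanted, true)) {
            // Switch to the DOM view of the same node to detach it.
            $domLi = dom_import_simplexml($li);
            $domLi->parentNode->removeChild($domLi);
        }
    }

    // Serialize only the root element so no XML declaration is prepended.
    $root = dom_import_simplexml($nav);
    return $root->ownerDocument->saveXML($root);
}

$stored  = '<ul><li data-page="home">Home</li><li data-page="jobs">Jobs</li></ul>';
$cleaned = removePages($stored, ['jobs']);
echo $cleaned;
```

The returned string is what you would write back to the navigation table. If the stored HTML is not guaranteed to be well-formed, DOMDocument::loadHTML is the more forgiving entry point.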

HTML Bullet points to Word XML

I need to take user input from a web page and write it to a Word document.
I'm using a WYSIWYG editor that allows bullet points, with the output being an HTML list.
I need to then convert that to Word XML.
Any suggestions?
I have the syntax/structure for the XML bullet lists, but I need to convert the HTML list to the XML bullet point list.
Maybe preg_replace? I'm not 100% sure how to do that, though.
If you need to write an actual Word document (.docx), then a library like phpdocx or PHPWord should be able to do that for you.
Alternatively, Word is quite capable of reading HTML files.
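If you do want to produce the bullet XML yourself, parsing the list is more robust than preg_replace. Below is a sketch with DOMDocument that emits one w:p per li. The numId value of 1 is an assumption: it has to match a bullet definition that actually exists in the target document's numbering.xml. The function name htmlListToWordXml is made up, nested lists and inline formatting are ignored, and non-ASCII input may need an explicit encoding hint before loadHTML.

```php
<?php
// Sketch: turn each <li> of a flat HTML list into a WordprocessingML bullet paragraph.
// numId 1 is an assumption; it must exist in the target document's numbering.xml.

function htmlListToWordXml(string $html, int $numId = 1): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $out = '';
    foreach ($doc->getElementsByTagName('li') as $li) {
        // Escape &, < and > so the item text is safe inside <w:t>.
        $text = htmlspecialchars(trim($li->textContent), ENT_XML1 | ENT_NOQUOTES, 'UTF-8');
        $out .= '<w:p><w:pPr><w:numPr><w:ilvl w:val="0"/>'
              . '<w:numId w:val="' . $numId . '"/></w:numPr></w:pPr>'
              . '<w:r><w:t xml:space="preserve">' . $text . '</w:t></w:r></w:p>';
    }

    return $out;
}

$wordXml = htmlListToWordXml('<ul><li>First</li><li>Second &amp; third</li></ul>');
echo $wordXml;
```

The resulting fragment still has to be placed inside the w:body of a document whose root declares the w namespace.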

What's the best way to pass HTML embed code via an RSS feed to an RSS parser in PHP?

I'm trying to put an HTML embed code for a Flash video into an RSS feed, which will then be parsed by a parser (Magpie) on my other site. How should I encode the embed code on one side, and then decode it on the other, so I can insert clean HTML into the DB on the receiving server?
Since RSS is XML, you might want to check out CDATA, which I believe is valid in the various RSS specs.
<summary><![CDATA[Data Here]]></summary>
Here's the w3schools entry on it: http://www.w3schools.com/XML/xml_cdata.asp
htmlencode/htmldecode should do the trick.
I've been using htmlentities/html_entity_decode, but for some reason it doesn't work with the parser. In a normal test it works, but the parser always returns HTML code without the < > " characters.
RSS is XML. It has very specific rules for encoding HTML. If you're generating it, I'd recommend using an XML library to write the node containing HTML, to be sure you get the encoding right.
HTMLencode will only perform the escaping necessary for embedding data within HTML; XML rules are stricter.
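A small sketch of what "use an XML library to write the node" could look like in PHP, combined with the CDATA suggestion above. It uses DOMDocument to build a single item and SimpleXML to stand in for the receiving parser (Magpie itself is not used here). The function name buildItem and the example.com URL are placeholders.

```php
<?php
// Sketch: write embed HTML into an RSS item as CDATA, then read it back unchanged.
// buildItem and the example.com URL are placeholders for illustration.

function buildItem(string $title, string $embedHtml): string
{
    $doc  = new DOMDocument('1.0', 'UTF-8');
    $item = $doc->appendChild($doc->createElement('item'));

    // Text nodes are escaped by the library when serialized.
    $item->appendChild($doc->createElement('title'))
         ->appendChild($doc->createTextNode($title));

    // The embed markup goes into a CDATA section, so it is stored verbatim.
    $item->appendChild($doc->createElement('description'))
         ->appendChild($doc->createCDATASection($embedHtml));

    return $doc->saveXML($item);
}

$embed = '<embed src="http://example.com/video.swf" width="400" height="300"></embed>';
$xml   = buildItem('My "video" & more', $embed);

// Receiving side: an XML parser hands back the original markup.
$parsed       = new SimpleXMLElement($xml);
$roundTripped = (string) $parsed->description;
```

With this approach there is no separate decode step on the receiving server: the string read from the description element is the markup to store. Embed code that itself contains the sequence ]]> cannot sit in a single CDATA section and would need extra handling.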
Instead of writing your own RSS XML feed, consider using the Django syndication framework from django.contrib.syndication:
https://docs.djangoproject.com/en/dev/ref/contrib/syndication/
It also supports enclosures, which is the RSS way for embedding images or video.
For custom tags, there is also a low-level API which allows you to change the XML:
https://docs.djangoproject.com/en/dev/ref/contrib/syndication/#the-low-level-framework
