HTML Bullet points to Word XML - php

I need to take user input from a web page and write it to a Word document.
I'm using a WYSIWYG editor that allows bullet points, with the output being an HTML list.
I need to then convert that to Word XML.
Any suggestions?
I have the syntax/structure for the XML bullet lists, but I need to convert the HTML list to the XML bullet point list.
Maybe preg_replace? I'm not 100% sure how to do that, though.

If you need to write an actual Word document (.docx), then a library like phpdocx or PHPWord should be able to do that for you.
Alternatively, Word is quite capable of reading HTML files.
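If you do want to build the list XML yourself, a DOM parser is usually safer than preg_replace for this. The following is only a sketch using PHP's bundled DOM extension: the function name htmlListToWordXml is made up for illustration, and it assumes that numbering definition 1 (numId) in your document's numbering.xml is a bullet list, which you would need to check against your own template.

```php
<?php
// Sketch: turn every <li> of an HTML list into a WordprocessingML paragraph.
// Assumption: $numId refers to a bullet-list definition in numbering.xml.
function htmlListToWordXml(string $html, int $numId = 1): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);            // tolerate sloppy editor HTML
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html); // prefix hints UTF-8 input
    libxml_clear_errors();

    $xml = '';
    foreach ($doc->getElementsByTagName('li') as $li) {
        // Nesting level = number of enclosing <ul>/<ol> elements, minus one.
        $level = -1;
        for ($n = $li->parentNode; $n !== null; $n = $n->parentNode) {
            if ($n->nodeName === 'ul' || $n->nodeName === 'ol') {
                $level++;
            }
        }
        // Collect the item's own text, skipping any nested lists.
        $text = '';
        foreach ($li->childNodes as $child) {
            if ($child->nodeName !== 'ul' && $child->nodeName !== 'ol') {
                $text .= $child->textContent;
            }
        }
        $text = htmlspecialchars(trim($text), ENT_XML1 | ENT_QUOTES, 'UTF-8');

        $xml .= '<w:p><w:pPr><w:numPr>'
              . '<w:ilvl w:val="' . max($level, 0) . '"/>'
              . '<w:numId w:val="' . $numId . '"/>'
              . '</w:numPr></w:pPr>'
              . '<w:r><w:t xml:space="preserve">' . $text . '</w:t></w:r></w:p>';
    }
    return $xml;
}

$wordXml = htmlListToWordXml('<ul><li>One</li><li>Two &amp; three</li></ul>');
```

Inline formatting inside the items (bold, links) is flattened to plain text here; preserving it would mean emitting one w:r run per formatted span.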

Related

jBBCode - BBcode to html

What I'm trying to do
I'm trying to use jBBCode to parse and de-parse bbcode into and from html.
The problem
When I try to take the HTML and turn it back into BBCode, it just displays HTML.
Here is the code I'm using to try and switch html back into bbcode.
$parser = new JBBCode\Parser();
$parser->loadDefaultCodes();
$parser->parse($MYHTMLSTRING);
echo $parser->getAsBBCode();
Does anyone know what I'm doing wrong here? I'm sure it's something very simple that I haven't figured out. Any help is appreciated! :D
$parser->parse() takes as input BBCode, not HTML.
After studying the documentation, it is my understanding that this is a one-way parser:
BBCode -> HTML
I believe the design is for you to store the BBCode in your database, and then when it is time to render HTML to visitors, you parse the BBCode at that time.
This way, you are always storing the raw, editable BBCode in the database.
This is a pretty common design-pattern. For example, for applications that use the Markdown Language (instead of BBCode), they typically store raw markdown in the database and only render it to HTML at page-load time.
In Summary:
Store Raw BBCode/Text in your database
When you render a page to a visitor, do the conversion to HTML at that time:
$parser->parse($MyBBCode); echo $parser->getAsHTML();
If the user edits the BBCode, save it straight back to the database as BBCode.
Documentation Reference
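To illustrate the store-raw, render-late pattern without depending on any library, here is a minimal sketch. The renderBBCode function is a hypothetical stand-in for a real parser such as jBBCode and only understands [b], [i] and [u]; it is not how jBBCode itself is implemented.

```php
<?php
// Hypothetical stand-in for a real BBCode parser; handles [b], [i], [u] only.
function renderBBCode(string $bbcode): string
{
    // Escape first, so HTML typed by the user is shown as text, not markup.
    $html = htmlspecialchars($bbcode, ENT_QUOTES, 'UTF-8');

    $map = ['b' => 'strong', 'i' => 'em', 'u' => 'u'];
    foreach ($map as $tag => $element) {
        $html = preg_replace(
            '~\[' . $tag . '\](.*?)\[/' . $tag . '\]~is',
            '<' . $element . '>$1</' . $element . '>',
            $html
        );
    }
    return nl2br($html);
}

// The database only ever holds the raw BBCode ...
$stored = '[b]Hello[/b] <world>';
// ... and HTML is produced at display time, never stored or parsed back.
$rendered = renderBBCode($stored);
```

Because the raw BBCode is what gets stored, the edit form can show $stored unchanged and no HTML-to-BBCode conversion is ever needed.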

Word XML to HTML (with styling)

I have a Word template (MS Word 2010) that I inject variables into using PHPWord, and I would like to convert the result into a PDF.
My thought process is to convert the Word document into XML (which I have done), then turn that XML into styled HTML.
So far I have managed to replace the XML elements that represent line breaks and paragraphs, but I am wondering if there is some code somewhere that will convert the other XML elements into styled HTML. I know it is unlikely to be perfect, but something close would be good.
Your best bet is to use XSLT. There are some good tutorials on the web. This page gives the code for doing this in PHP.

How to "read" a HTML document in PHP?

I've been facing a problem for quite a long time. Unfortunately, I was not able to find the solution on my own, so I have to post my question here.
I am writing a little PHP script that creates a PDF file from a dynamically created HTML file.
Now I want to "parse" the HTML file and perform an action depending on which tag comes next in the HTML.
E.g.
<div><p>Test</p></div>
My script should recognize:
First tag is a div: do function for div
Second tag is a p: do function for p
I don't know what I should search for. Regular expressions? An HTML parser?
Thanks for a hint!
Try an XML parser. In PHP, SimpleXML is probably what you are looking for.
I've used phpQuery several times. It's a nice solution, although it's quite big and seems to be no longer supported (last commit > 10 months ago).
What you need to do is read the HTML file into a PHP variable/object
http://www.php-mysql-tutorial.com/wikis/php-tutorial/read-html-files-using-php.aspx
And then use RegEx to parse the HTML Tags and Attributes
http://www.codeproject.com/Articles/297056/Most-Important-Regular-Expression-for-parsing-HTML
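As an alternative to both SimpleXML and regular expressions, PHP's bundled DOM extension can load HTML that is not well-formed XML and hand you the elements in document order. The sketch below is one possible way to dispatch on tag names; the dispatchTags function and the handler closures are invented for illustration and would be replaced by whatever your PDF code needs to do per tag.

```php
<?php
// Sketch: call a handler for each element, in the order the tags appear.
function dispatchTags(string $html, array $handlers): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // do not emit warnings for sloppy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $results = [];
    // getElementsByTagName('*') yields every element in document order.
    foreach ($doc->getElementsByTagName('*') as $element) {
        $name = $element->nodeName;      // lowercase tag name, e.g. "div"
        if (isset($handlers[$name])) {
            $results[] = $handlers[$name]($element);
        }
    }
    return $results;
}

// Hypothetical handlers: one function per tag you care about.
$handlers = [
    'div' => function (DOMElement $e) { return 'div handler'; },
    'p'   => function (DOMElement $e) { return 'p handler: ' . $e->textContent; },
];

$log = dispatchTags('<div><p>Test</p></div>', $handlers);
```

loadHTML wraps fragments in html and body elements; they are simply skipped here because no handler is registered for them.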

Extract all text from a HTML page without losing context

For a translation program, I am trying to extract roughly 95% accurate text from an HTML file in order to translate the sentences and links.
For example:
<div>Overflow <span>Texts <b>go</b> here</span></div>
Should give me 2 results to translate:
Overflow
Texts <b>go</b> here
Any suggestions or commercial packages available for this problem?
I'm not exactly sure what you're asking, but look at simplehtmldom. Specifically the "Extract Contents from HTML" tab under quick start on that front page (can't link directly, sigh). With that you can extract the text of a website without all those pesky tags.

Is there a way to decode html e-mails?

I am writing support software and I figured for highlighting stuff it would be great to have HTML support.
Looking at Outlook's "HTML", I want to crawl up into the fetal position and cry!
Is there a PHP class to unscramble HTML emails down to basic HTML? I don't want to display the e-mails in a frame, because I want to work with the data and analyse it. I also don't want to support stupid things like changing the font: since it's a web app, I want my web app to say what the font is, and not have some hippie who sends the support team e-mails in Comic Sans and yellow. I want to support bold, italic, underlined, stretched-out text and lists (http://dl.getdropbox.com/u/5910/Jing/2009-02-23_2100.png).
I also don't quite know the difference between rich text and HTML. I always thought rich text only allowed the functions I wanted, but I seem to be able to do everything in rich text that I can do in HTML.
Also, I should add that I am using the Zend Framework because of the fabulous Zend_Mail.
You can pipe it through htmltidy and then further filter it with something like HtmlPurifier, but of course you may strip out something that is essential to understanding the contents. That's the problem with a visual format, like html.
You can use PHP's strip_tags() function and its optional "allowable_tags" parameter. This will allow you to strip out all the tags that are not <em> <b> <strong> <u> etc.
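A small sketch of that whitelist approach follows. The sample markup and the exact list of allowed tags are assumptions chosen to match the formatting the question asks for.

```php
<?php
// Sample Outlook-like markup (made up for this example).
$html = '<p style="color:yellow"><font face="Comic Sans MS">Hi <b>there</b></font></p>';

// Keep only basic formatting and list tags; everything else is removed,
// but the text inside the removed tags is kept.
$allowed = '<b><strong><em><i><u><ul><ol><li>';
$clean = strip_tags($html, $allowed);
```

Note that strip_tags() does not touch attributes on the tags it keeps, so a style or onclick attribute on an allowed tag survives. That is one reason to run the result through a filter such as HtmlPurifier as well.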
About RTF vs. HTML, my understanding is that when Outlook and Exchange communicate with non-RTF compliant systems they convert RTF to HTML. I'm not sure this is always true, or how consistent that function is, but that might explain why messages sent RTF appear to be HTML.
I'm pretty sure you'll have to write your own class. There is no real class like that in the PHP documentation I've seen.
Or you could use the plain-text variant attached to the e-mail. If there is no plain-text variant, you could use a stripped version of the HTML. I think these steps would give a nice result:
Remove newlines
Turn </p> and <br/> into newline
Strip all html tags
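The three steps above can be sketched as follows. The function name htmlMailToText is made up, and the entity decoding and whitespace tidying at the end are additions that go beyond the steps as listed.

```php
<?php
// Sketch: reduce an HTML e-mail body to plain text, keeping paragraph breaks.
function htmlMailToText(string $html): string
{
    // 1. Remove existing newlines; they carry no meaning in HTML.
    $text = str_replace(["\r", "\n"], ' ', $html);

    // 2. Turn </p> and <br>, <br/>, <br /> into newlines.
    $text = preg_replace('~</p\s*>|<br\s*/?>~i', "\n", $text);

    // 3. Strip all remaining tags.
    $text = strip_tags($text);

    // Extra tidy-up (not part of the listed steps): decode entities,
    // collapse runs of spaces, and trim each line.
    $text  = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text  = preg_replace('~[ \t]+~', ' ', $text);
    $lines = array_map('trim', explode("\n", $text));

    return trim(implode("\n", $lines));
}

$plain = htmlMailToText("<p>Hello\n<b>world</b></p><p>Line two<br/>Line three &amp; four</p>");
```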
Pulling out the HTML from an Outlook mail may seem scary at first, but it's only HTML tags - just a whole lot of them!
So if you just locate a "<" and then find the next ">", you have a tag. If it is not something you want to keep, like "</strong>", just throw it away and repeat. Simple as that.
(I have done exactly this in a spelling and grammar checker which not only pulls out plain text from Outlook and checks it - it can then push all the user's changes back into the HTML without destroying any tags. The latter was not easy, though! ;-)
