I've managed to successfully integrate BBCode, but I was wondering: if I wanted to dynamically list all the allowed/accepted BBCode, how would I be able to do that? (Writing the list out manually is tedious, and if the BBCode ever changed I'd have to update it by hand.)
I currently have a BBCode() function, which contains two arrays: one with the regexes and one with the HTML replacements. It returns a preg_replace() of the regex array with the replacement array.
Cheers and looking forward to your inputs!
Consider using a different markup language like Textile or Markdown. Simply saying that you support Markdown or Textile is decent enough; they're so widely used that users could easily look up the markup for them online.
Textile's syntax hasn't been updated since 2006, so it will likely remain very solid for years to come. Markdown's syntax hasn't been updated since 2004.
Both provide excellent PHP libraries:
http://michelf.com/projects/php-markdown/
http://textile.thresholdstate.com/
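If you would rather keep the BBCode() function from the question, one option is to store each tag once, together with a human-readable example, and derive both the preg_replace() arrays and the help list from that single definition. This is only a minimal sketch: the bbcodeDefinitions() and listBBCode() helpers, the three tags, and the htmlspecialchars() call are assumptions for illustration, not part of the original function.

```php
<?php
// Hypothetical single source of truth: one entry per supported tag.
function bbcodeDefinitions(): array
{
    return [
        ['example' => '[b]text[/b]', 'pattern' => '/\[b\](.*?)\[\/b\]/is', 'replacement' => '<strong>$1</strong>'],
        ['example' => '[i]text[/i]', 'pattern' => '/\[i\](.*?)\[\/i\]/is', 'replacement' => '<em>$1</em>'],
        ['example' => '[u]text[/u]', 'pattern' => '/\[u\](.*?)\[\/u\]/is', 'replacement' => '<u>$1</u>'],
    ];
}

// Same idea as the function in the question: preg_replace() with two arrays,
// but both arrays are derived from the definitions above.
function BBCode(string $text): string
{
    $defs = bbcodeDefinitions();
    // Escaping the raw input first is an addition made for this sketch.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return preg_replace(
        array_column($defs, 'pattern'),
        array_column($defs, 'replacement'),
        $escaped
    );
}

// The list of accepted BBCode, generated instead of written by hand.
function listBBCode(): array
{
    return array_column(bbcodeDefinitions(), 'example');
}
```

Adding or removing a tag then only means editing bbcodeDefinitions(); the help text printed from listBBCode() follows automatically.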
Right now I have a task to build a function that generates contract letters in our HRMS.
I'm already using CKEditor, but the result is very different from what I need, since CKEditor was not built for the same purpose as Microsoft Word or Google Docs.
So my idea is to create the template in Microsoft Word first and then use the PHP function str_replace to pass the data into the Word template.
My questions are:
1. Is that flow possible?
2. If so, can you give me a sample?
Many Thanks,
Hendra
There are several classes that can do at least part of what you are trying to do:
wrklst/docxmustache
openTBS – Tiny But Strong
PHPWord
docxtemplater pro (basic opensource / free version / MIT license available as of writing; image replacing is a commercial plugin)
docxpresso (commercial)
phpdocx (commercial)
The first four of these are at least partially open source, and investigating the code will help you understand the process, which is not trivial with Word. In addition, you can check out http://officeopenxml.com for the format details.
The main problem I see is proper HTML to OpenXML conversion: converting the styling from CKEditor (which is probably HTML) into the proper XML styling, which works quite differently, so a direct translation is not trivial. Check out https://github.com/wrklst/docxmustache/blob/master/src/WrkLst/DocxMustache/HtmlConversion.php to see some basic HTML conversion on single runs of bold, italic and underlined text.
To my knowledge there is no maintained open source package that delivers proper HTML to OpenXML conversion. If you need this and cannot write it yourself, you will probably have to go for one of the paid solutions.
Good luck.
Docx is a zipped format that contains some xml. If you want to build a simple replace {tag} by value system, it can already become complicated, because the {tag} is internally separated into <w:t>{</w:t><w:t>tag</w:t><w:t>}</w:t>. If you want to embed loops to iterate over an array, it becomes a real hassle.
source : https://docxtemplater.readthedocs.io/en/latest/goals.html
You could use the library I created to solve this problem: https://github.com/open-xml-templating/docxtemplater. It works with JS in the browser or with Node.js.
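For the simple str_replace flow asked about in the question, a rough PHP sketch could look like the following. It assumes the zip extension is available and that every {tag} placeholder sits inside a single <w:t> run; as noted above, Word often splits a placeholder across runs, and this sketch does not handle that case. The function names replacePlaceholders() and fillDocxTemplate() are made up for illustration.

```php
<?php
// Replace {tag} placeholders in the document XML with escaped values.
// Only works when Word stored each placeholder in one piece.
function replacePlaceholders(string $xml, array $values): string
{
    $map = [];
    foreach ($values as $tag => $value) {
        // Values must be XML-escaped, otherwise "&" or "<" corrupt the file.
        $map['{' . $tag . '}'] = htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    }

    return strtr($xml, $map);
}

// Copy the template, then rewrite word/document.xml inside the copy.
function fillDocxTemplate(string $templatePath, string $outputPath, array $values): bool
{
    if (!copy($templatePath, $outputPath)) {
        return false;
    }

    $zip = new ZipArchive();
    if ($zip->open($outputPath) !== true) {
        return false;
    }

    $xml = $zip->getFromName('word/document.xml');
    if ($xml === false) {
        $zip->close();
        return false;
    }

    $zip->addFromString('word/document.xml', replacePlaceholders($xml, $values));

    return $zip->close();
}
```

A call such as fillDocxTemplate('contract-template.docx', 'contract-hendra.docx', ['name' => 'Hendra']) would then produce the filled letter; both file names are placeholders. Headers and footers live in separate XML parts and would need the same treatment.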
I'd like to work on a BBCode filter for a PHP website. (I'm using CakePHP, so it would be a BBCode helper.)
I have some requirements.
BBCodes can be nested, so something like this is valid:
[block]
[block]
[/block]
[block]
[block]
[/block]
[/block]
[/block]
BBCodes can have 0 or more parameters.
Example:
[video: url="url", width="500", height="500"]Title[/video]
BBCodes might have multiple behaviours.
Let's say [url]text[/url] would be transformed to [url:url="text"]text[/url],
or the video BBCode would be able to choose between YouTube, Dailymotion, and so on.
I think that covers most of my needs. I've already done something with regex, but my biggest problem was matching parameters. I got nested BBCode and BBCode with 0 parameters to work, but when I added a regex match for parameters, it no longer matched nested BBCode correctly.
"\[($tag)(=.*)\"\](.*)\[\/\1\]" // It wasn't .* but the non-greedy matcher
I don't have the complete regex with me right now, but I had something that looked like the above.
So is there a way to match BBCode efficiently with regex or something else?
The only thing I can think of is to use the visitor pattern and split my text on each possible tag. That way I'd have a bit more control over the parsing, and I could probably validate the document, so that if the input text doesn't contain valid BBCode, I could notify the user with an error before saving anything.
I would use sablecc to create my text parser.
http://sablecc.org/
Any better ideas? Or anything that could lead to an efficient, flexible BBCode parser?
Thank you, and sorry for my bad English...
There are several existing libraries for parsing BBCode; it may be easier to look into those than to roll your own.
Here are a couple. I'm sure there are more if you look around:
PECL bbcode
PEAR HTML_BBCodeParser
I've been looking into BBCode parsers myself. Most of them use regex and target PHP 4, and either produce errors on PHP 5.2+ or don't work at all. PECL bbcode and PEAR HTML_BBCodeParser don't appear to be maintained any more (as of late 2012) and aren't easily installed on the shared hosting setup I have to work with. StringParser_BBCode works with some minor tweaks for 5.2+, but the method for adding new tags is clumsy, and it was last updated in 2008.
Buried on the 4th page of a Bing search (I was getting desperate) I found jBBCode, which appears to be new and requires PHP 5.3. It is MIT licensed. I have yet to try building custom tags, but so far it is the only one I've tried that works out of the box on a shared hosting account with PHP 5.3.
There are both PECL and PEAR BBCode parsing libraries. Software's hard enough without reinventing years of work on your own.
If neither of those is an option, I'd concentrate on turning the BBCode into a valid XML string and then using your favorite XML parsing routine on that. A very, very rough idea:
Run the code through htmlspecialchars to escape any entities that need escaping
Transform all [ and ] characters into < and > respectively
Don't forget to account for the colon in cases like [tagname:
If the BBCode was nested properly, you should be all set to pass this string into an XML parsing object (SimpleXML, DOMDocument, etc.)
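The steps above can be sketched roughly as follows. This is a simplified illustration that assumes parameterless tags only, treats every [ and ] in the input as a tag delimiter, and needs the SimpleXML extension; the bbcodeToXml() name and the wrapping <root> element are assumptions of the sketch. Parameters such as [video: url="..."] would need an extra rewrite step before the string is valid XML.

```php
<?php
// Turn parameterless BBCode into XML and let the XML parser check the nesting.
// Returns null when the result is not well-formed (for example, bad nesting).
function bbcodeToXml(string $bbcode): ?SimpleXMLElement
{
    // Step 1: escape &, < and > that are already in the text.
    $escaped = htmlspecialchars($bbcode, ENT_NOQUOTES, 'UTF-8');

    // Step 2: turn [ and ] into < and >.
    $xml = strtr($escaped, ['[' => '<', ']' => '>']);

    // Step 3: parse, collecting errors instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string('<root>' . $xml . '</root>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc === false ? null : $doc;
}
```

A null result can be used to reject the post before saving, which covers the validation requirement from the question.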
Responding to: "Any better idea?" (and I'm assuming that this was an invite not just for improvement over bbcode-specific suggestions)
We recently looked at going the BBCode route and decided on using HTML Purifier instead. This decision was based in part on the (admittedly probably biased) comparisons between various methods published by the HTML Purifier group, and on their discussion of BBCode.
And for the record, I think your English was very good. I'm sure it's much better than I could do in your native language.
Use preg_split() with the PREG_SPLIT_DELIM_CAPTURE flag to split the source into tags and non-tags. Then iterate over the tags while keeping a stack of open blocks: when you see an opening tag, add it to an array; when you see a closing tag, remove elements from the end of the array until the closing tag matches an opening tag.
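A minimal sketch of that approach, assuming tag names consist of letters only and that anything between the name and the closing bracket is parameters. The function name validateBbcodeNesting() and the error messages are made up for illustration; a real parser would build output while walking the tokens instead of only collecting errors.

```php
<?php
// Split text into tags and non-tags, then check nesting with a stack.
// Returns a list of error messages; an empty list means the nesting is valid.
function validateBbcodeNesting(string $text): array
{
    // The capture group makes preg_split() keep the tags in the result.
    $tokens = preg_split(
        '/(\[\/?[a-z]+[^\]]*\])/i',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $stack = [];
    $errors = [];

    foreach ($tokens as $token) {
        if (preg_match('/^\[\/([a-z]+)\]$/i', $token, $m)) {
            // Closing tag: pop open blocks until the matching one is found.
            $name = strtolower($m[1]);
            if (!in_array($name, $stack, true)) {
                $errors[] = 'Unexpected closing tag [/' . $name . ']';
                continue;
            }
            while (($open = array_pop($stack)) !== $name) {
                $errors[] = 'Unclosed tag [' . $open . ']';
            }
        } elseif (preg_match('/^\[([a-z]+)\b[^\]]*\]$/i', $token, $m)) {
            // Opening tag, with or without parameters: push it.
            $stack[] = strtolower($m[1]);
        }
        // Anything else is plain text and is ignored here.
    }

    // Whatever is still open at the end was never closed.
    foreach (array_reverse($stack) as $open) {
        $errors[] = 'Unclosed tag [' . $open . ']';
    }

    return $errors;
}
```

Because parameters are skipped by [^\]]*, a tag such as [video: url="url", width="500"] is pushed as "video" and matched by [/video] like any other block.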
I'm working on a personal project to view web pages offline. The first idea I came up with was using file_get_contents to get the contents of a specific URL, but this only gets the HTML and not the assets in that page (CSS, images, JavaScript, etc.). So I had to write regexes to get the stylesheets and images in the page:
$css_pattern = '/\S*\.css"/';
$img_src_pattern = '/src=(?:"|\')?.+\.(?:gif|jpg|png|jpeg)(?:"|\')/';
preg_match_all($css_pattern, $contents, $style_matches);
preg_match_all($img_src_pattern, $contents, $img_matches);
This works, but there are also image links inside the CSS, and I'm still thinking about how to deal with those.
There are also projects like ganon https://code.google.com/p/ganon/ and simple html parser that might make my life easier, but I prefer using regex because I want to learn more about it.
The question is: is there a better way of doing this project? The app will probably have folders in which to save the assets and HTML for each site, and it will probably become unwieldy. I've heard of things like the manifest file in HTML5, but I'm not sure that's possible if you don't own the site. Any ideas? If there's no other way to do this, then maybe you can just help me improve the regexes above. I currently have to use str_replace and foreach to get the stylesheets:
$stylesheets = array();
foreach($style_matches[0] as $match){
    $stylesheets[] = str_replace(array('href=', '"', "'"), '', $match);
}
Thanks in advance!
"I prefer using regex because I want to learn more about it."
Parsing HTML with regex is possible, albeit non-trivial. A good introduction is given in the following paper:
REX: XML Shallow Parsing with Regular Expressions
The regular expressions used in that paper (REX) are not the same flavor as the ones used in PHP (PCRE), but they are similar, so you should be able to follow it if you're willing to learn.
Following what that paper outlines and writing the regular expressions in PHP yourself, with some good test cases, should be real training for digging into regular expressions.
Besides the regular expressions, you also need to deal with character encodings, which is a field of its own, and adapt the parser to an encoding (if you do not re-encode before parsing).
If you're looking specifically for an HTML 5 compatible parser, the algorithm is specified as part of the HTML 5 "specification", but you cannot implement it precisely with regular expressions in a sane way any longer (at least as far as I know):
12.2 Parsing HTML documents — HTML Living Standard — Updated ca. daily
8.2 Parsing HTML documents — HTML5 — A vocabulary and associated APIs for HTML and XHTML W3C Candidate Recommendation 17 December 2012
To me that type of parsing looks like a large amount of overhead, but peek into the outline of the HTML 5 parser and you get an idea of everything you could take care of for HTML parsing nowadays. It seems like those guys and girls really needed to push in anything they could imagine. The following engines/browsers actually have an HTML 5 parser:
Gecko 2
Webkit
Chrome 7 (Webkit)
Opera 11.60 (Ragnarök)
IE10
In my personal experience there are not many SGML-based / "loose" / low-level / tag-soup HTML parsers in the PHP ecosystem. If I were to write one, I would also use regular expressions for string parsing; the REX shallow parsing article has some good discussion. However, I would probably only use such a low-level HTML parser to make arbitrary HTML consumable for DOMDocument or for other validation/fixing tasks, and would not use it for further parsing or document abstraction. DOMDocument is pretty powerful, especially for gathering links as you describe above.
For the rest of your question, all the elements you need to bring together are outlined in various HTTP-related RFCs, so you need to decide for yourself which link resolving algorithm you want to support and how you re-map the static CSS/image/JS files when you save them again. Normally you then rewrite the HTML as well, for which DOMDocument is really handy.
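As a rough illustration of gathering those links with DOMDocument instead of the regexes from the question, the sketch below collects stylesheet, image and script URLs. The collectAssetUrls() name and the shape of the returned array are assumptions; the URLs are returned exactly as written in the page, so relative ones still need to be resolved against the page URL.

```php
<?php
// Collect asset URLs from an HTML string using DOMDocument.
function collectAssetUrls(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parse errors silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $assets = ['stylesheets' => [], 'images' => [], 'scripts' => []];

    foreach ($doc->getElementsByTagName('link') as $link) {
        $href = $link->getAttribute('href');
        if (strtolower($link->getAttribute('rel')) === 'stylesheet' && $href !== '') {
            $assets['stylesheets'][] = $href;
        }
    }

    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->getAttribute('src') !== '') {
            $assets['images'][] = $img->getAttribute('src');
        }
    }

    foreach ($doc->getElementsByTagName('script') as $script) {
        if ($script->getAttribute('src') !== '') {
            $assets['scripts'][] = $script->getAttribute('src');
        }
    }

    return $assets;
}
```

The same loops are a natural place to call setAttribute() with the local path and then write the page back out with saveHTML().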
You should also store some HTTP headers inside the HTML file via the meta element, especially the encoding, unless you re-encode the document (which can be useful for offline reading anyway). Some of the more general Q&A suggestions for HTML authoring apply to a static cache as well.
The HTML5 manifest file is actually something different: the original server has to support it, which is likely not the case (or you need to build a parser for it as well and process it). So if you create a mirror, you might also want to point out all static resources that can be stored locally for offline usage. That is a nice idea; I have not yet seen it implemented by tools like wget, so it's probably worth playing with a little.
Instead of the HTML5 manifest file, you might also have been thinking of one of the following container formats:
Mozilla Archive Format - MAFF
MIME HTML - MHTML
Webarchive
Another one of these formats/extensions (here: the SingleFile Chrome extension) makes use of the Data URI scheme according to Wikipedia, which might also be useful in this context, although I would not favor it. I'd say it's better to have an algorithm that rewrites URLs to the local file system in a reproducible manner, so that you can dump multiple HTML files that share assets without fetching those assets multiple times.
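One possible way to make that rewriting reproducible is to derive the local file name from a hash of the absolute asset URL, so the same URL always maps to the same file no matter which page referenced it. This is only a sketch: the localAssetPath() name, the assets directory and the choice of sha1() are assumptions.

```php
<?php
// Map an absolute asset URL to a stable local path.
// The same URL always yields the same path, so shared assets are stored once.
function localAssetPath(string $url, string $baseDir = 'assets'): string
{
    // Keep the original extension when it looks harmless, so files stay recognizable.
    $path = parse_url($url, PHP_URL_PATH);
    $ext = is_string($path) ? strtolower(pathinfo($path, PATHINFO_EXTENSION)) : '';
    if (!preg_match('/^[a-z0-9]{1,5}$/', $ext)) {
        $ext = '';
    }

    return $baseDir . '/' . sha1($url) . ($ext !== '' ? '.' . $ext : '');
}
```

Before downloading an asset, a file_exists() check on the returned path is then enough to skip anything that an earlier page already fetched.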
What text to HTML converter for PHP would you recommend?
One example would be Markdown, which is used here at SO. The user just types some text into the text box with some natural formatting: Enter at the end of a line, an empty line at the end of a paragraph, asterisk-delimited bold text, etc. This syntax is then converted to HTML tags.
Simplicity is the main feature we are looking for. There don't need to be a lot of options, but the basic ones that are there should be very intuitive (automatic conversion of URLs to links, emoticons, paragraphs).
A big plus would be a WYSIWYG editor for it. A half-WYSIWYG editor like the one here at SO would be even better.
Extra points if it fits well with Zend Framework.
Take your pick at http://en.wikipedia.org/wiki/Lightweight_markup_language.
As for Markdown, there's one PHP parser that I've been using called PHP Markdown, and I especially like the Extra extension.
I have actually taken a stab at extending it with my own (undocumented) features. It's available on GitHub if you're interested (remember that it's the extra branch I've changed, not master). I've intended to make it a 'proper fork' for a while, but that's another, largely off-topic, story.
The Zend Framework has a WYSIWYG editor bundled with its Dojo integration.
http://framework.zend.com/manual/en/zend.dojo.form.html#zend.dojo.form.elements.editor
... Bring on the extra points!
There's always Textile. It is widely implemented and has a few basic similarities with Markdown. However, I have never seen a WYSIWYG editor for Textile.
You might find upflow useful.
If you want WYSIWYG, I'm a big fan of FCKeditor. It converts user input to HTML before submitting the form, not after, but has a nice PHP library for using it, and a PHP connector for handling file uploading/browsing (along with several other languages).
If you want something that can be read as plain-text but output as HTML, I vote for Markdown.
I will stick with my original idea of adopting Texy.
None of the products mentioned here actually beats it. I had a problem with Texy's syntax, but it seems to be quite standard and is present in other products too.
It is very lightweight, supports a very natural syntax and has a great "half" WYSIWYG editor, Texyla (the wiki is in Czech only).