I'm looking for a comprehensive, well-maintained wiki syntax parser for PHP. Does anybody know of one? I can find some really good parsers for Markdown and BBCode, but I'm having trouble finding a decent wiki parser.
I prefer markdown myself, but I'm writing post functions for a CMS and I'd like to give end-users a choice.
I thought about downloading a copy of MediaWiki and seeing how they do it. Any thoughts on this as an option?
Edit:
I've already looked over the PHP parsers at http://www.mediawiki.org/wiki/Alternative_parsers and none of them really does everything I want. See my comment on @middus's answer.
See the links over here, there are quite a lot of alternate parsers for MediaWiki's wiki syntax.
http://toolserver.org/~magnus/wiki2xml/w2x.php looks promising to me.
The one in DokuWiki is relatively re-usable, at least it looks so.
Related
I want to implement a commenting system for my website. I looked around and found CKEditor to be the best WYSIWYG editor available. I tried its bbcode output and it works perfectly. However, if I use bbcode output, when I want to show the comments to the users, I should use a reliable parser to parse the bbcode to HTML. If I use HTML output, I may need something to prevent XSS in the comments. Which way do you suggest for a simple commenting system? I have already integrated CKEditor into my system and would prefer a very lightweight and simple approach without much bloat (like PEAR). Also, StackOverflow seems pretty awesome. Is it possible to use something similar for my php?
I should use a reliable parser to parse the bbcode to HTML.
PHP has a pecl BBCode extension.
Also, StackOverflow seems pretty awesome. Is it possible to use something similar for my php?
SO uses Markdown. A Markdown parser in PHP is also available.
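If neither an extension nor a full library is an option, the general idea can be sketched by hand: escape the whole comment first, then convert a small whitelist of tags. This is only a minimal sketch, not the PECL extension's API; bbcodeToHtml() is a made-up helper name, the three supported tags are an arbitrary choice, and nested tags of the same kind are not handled.

```php
<?php
declare(strict_types=1);

/**
 * Convert a small whitelist of BBCode tags to HTML.
 * bbcodeToHtml() is a hypothetical helper, not part of any library.
 */
function bbcodeToHtml(string $text): string
{
    // Escape first so that any raw HTML in the comment is rendered inert.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Simple paired tags; the s flag lets a tag span several lines.
    $rules = [
        '~\[b\](.*?)\[/b\]~is' => '<strong>$1</strong>',
        '~\[i\](.*?)\[/i\]~is' => '<em>$1</em>',
        // Only http(s) URLs are accepted, which rules out javascript: links.
        '~\[url=(https?://[^\]\s]+)\](.*?)\[/url\]~is' => '<a href="$1" rel="nofollow">$2</a>',
    ];

    // preg_replace() returns null on a regex error; fall back to the escaped text.
    $html = preg_replace(array_keys($rules), array_values($rules), $html) ?? $html;

    return nl2br($html);
}

echo bbcodeToHtml('[b]hi[/b] <script>'), "\n";
```

Because escaping happens before any tag is converted, the only HTML in the output is what the rules themselves produce.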
Background
I'm looking to create a wiki-style website.
First I took a look at http://en.wikipedia.org/wiki/List_of_wiki_software
Wanting to use PHP and being sceptical about plain file storage, I narrowed the choice down to three alternatives:
Tiki Wiki CMS Groupware
PhpWiki
MediaWiki
Correct me if I'm wrong but all of these felt very heavyweight and pretty much overkill for a rather small project.
The question
My idea was then to use some kind of existing libraries and/or tools for the history, diff and markup parts but implementing the rest myself.
Do you know of any (good) libraries and/or tools like these?
Use an existing library like Markdown for marking up wiki text. Extend it if you have to. A diff algorithm for a wiki can be as trivial as you want it to be. The first result on Google for "php diff" showed an extremely simple algorithm that will probably get you started in the right direction.
PHP Diff Algorithm
PHP Markdown
Also don't forget about Github! There are all kinds of wiki projects written in PHP on there. Like this one!
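To give an idea of how small a wiki diff can be, here is a rough line-based sketch built on a longest-common-subsequence table. It is not the algorithm from the linked page; lineDiff() is a made-up name, and the quadratic table is only reasonable for page-sized texts.

```php
<?php
declare(strict_types=1);

/**
 * Line-based diff using a longest-common-subsequence table.
 * Returns a list of [op, line] pairs where op is ' ', '-' or '+'.
 * lineDiff() is a hypothetical helper name, not part of any library.
 */
function lineDiff(string $old, string $new): array
{
    $a = explode("\n", $old);
    $b = explode("\n", $new);
    $n = count($a);
    $m = count($b);

    // $lcs[$i][$j] = length of the LCS of $a[$i..] and $b[$j..].
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $lcs[$i][$j] = $a[$i] === $b[$j]
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }

    // Walk the table from the top-left, emitting kept, removed and added lines.
    $ops = [];
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            $ops[] = [' ', $a[$i]];
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $ops[] = ['-', $a[$i++]];
        } else {
            $ops[] = ['+', $b[$j++]];
        }
    }
    while ($i < $n) {
        $ops[] = ['-', $a[$i++]];
    }
    while ($j < $m) {
        $ops[] = ['+', $b[$j++]];
    }
    return $ops;
}

foreach (lineDiff("one\ntwo\nthree", "one\n2\nthree") as [$op, $line]) {
    echo $op, ' ', $line, "\n";
}
```

Storing each revision in full and computing the diff on demand keeps the history part simple too.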
I am creating a very simple cms for my site and rather than using html, I'd like to insert content in the same kind of wiki-format that's used by the Trac project.
Do you know of any open-source php scripts/classes that I can grab and use for this?
Note: I am not trying to create a wiki site. Just that formatting aspect - like how this stack exchange site accepts wiki mark-up and renders it nicely.
After doing some more research, I think I've found it.
The Forever For Now wiki-syntax-to-html parser is pretty much the same as the formatting on the Trac project.
~I have not looked at the code yet, but it's pretty likely to be cool. (like Fonzie)~
Edit - I've now looked at the code, and it's beautiful and elegant and does the job.
PHP Markdown might work for you.
I've had a look and there don't seem to be any old questions that directly address this. I also haven't found a clear solution anywhere else.
I need a way to match a tag, open to close, and return everything enclosed by the tag. The regexes I've tried have problems when tags are nested. For example, the regex <tag\b[^>]*>(.*?)</tag> will cause trouble with <tag>Some text <tag>that is nested</tag> in tags</tag>. It will match <tag>Some text <tag>that is nested</tag>.
I'm looking for a solution to this, ideally an efficient one. I've seen solutions that involve matching on start and end tags separately and keeping track of their index in the content to work out which tags go together, but that seems wildly inefficient to me (if it's the only possible way then c'est la vie).
The solution must be PHP only, as this is the language I have to work with. I'm parsing HTML snippets (think body sections from a WordPress blog and you're not too far off). If there is a better-than-regex solution, I'm all ears!
UPDATE:
Just to make it clear, I'm aware regexes are a poor solution but I have to do it somehow which is why the title specifically mentions better solutions.
FURTHER UPDATE:
I'm parsing snippets. Solutions should take this into account. If the parser only works on a full document or is going to add <head> etc... when I get the html back out, it's not an acceptable solution.
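For reference, the "match start and end tags separately and track their index" approach mentioned above does not have to be expensive: one regex pass plus a depth counter is linear in the input. The following is only a sketch of that idea, assuming reasonably clean markup; outermostTagContents() is a made-up name, and comments, self-closing tags and attribute values containing ">" are not handled.

```php
<?php
declare(strict_types=1);

/**
 * Find the contents of each outermost <$tag>...</$tag> pair by scanning
 * opening and closing tags in order and tracking the nesting depth.
 * outermostTagContents() is a hypothetical helper name.
 */
function outermostTagContents(string $html, string $tag): array
{
    $name = preg_quote($tag, '~');
    // One pass collects every opening and closing tag with its byte offset.
    preg_match_all(
        "~<(/?)$name\\b[^>]*>~i",
        $html,
        $matches,
        PREG_SET_ORDER | PREG_OFFSET_CAPTURE
    );

    $results = [];
    $depth = 0;
    $start = 0;
    foreach ($matches as $m) {
        [$full, $offset] = $m[0];
        $isClose = $m[1][0] === '/';
        if (!$isClose) {
            if ($depth === 0) {
                // Content begins right after the outermost opening tag.
                $start = $offset + strlen($full);
            }
            $depth++;
        } elseif ($depth > 0) {
            $depth--;
            if ($depth === 0) {
                $results[] = substr($html, $start, $offset - $start);
            }
        }
    }
    return $results;
}

print_r(outermostTagContents('<tag>Some text <tag>that is nested</tag> in tags</tag>', 'tag'));
```

Stray closing tags are ignored and an unclosed opening tag simply yields no result, which is a design choice of this sketch rather than anything standard.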
As always, you simply cannot parse HTML with regex because it is not a regular language. You either need to write a real HTML parser, or use a real HTML parser (that someone's already written). For reasons that should be obvious, I recommend the latter option.
Relevant questions
Robust and Mature HTML Parser for PHP
How do you parse and process HTML/XML in PHP?
Why not just use DOMDocument::loadHTML? It uses libxml under the hood which is fast and robust.
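A rough sketch of how that could look for snippets follows. The parser does wrap the input in html and body elements internally, but serializing individual nodes with saveHTML($node) returns only those nodes, so no extra markup comes back out. innerHtmlOf() is a made-up helper name, and the sketch assumes the snippet is ASCII or otherwise in an encoding libxml detects correctly.

```php
<?php
declare(strict_types=1);

/**
 * Return the inner HTML of every <$tag> element in an HTML snippet,
 * in document order (outer elements before the ones nested in them).
 * innerHtmlOf() is a hypothetical helper name.
 */
function innerHtmlOf(string $snippet, string $tag): array
{
    $doc = new DOMDocument();

    // Collect parse warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($snippet);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $results = [];
    foreach ($doc->getElementsByTagName($tag) as $element) {
        $inner = '';
        // Serialize each child on its own so the element's own tags are left out.
        foreach ($element->childNodes as $child) {
            $inner .= $doc->saveHTML($child);
        }
        $results[] = $inner;
    }
    return $results;
}

print_r(innerHtmlOf('<div>Some text <div>that is nested</div> in tags</div>', 'div'));
```

Nesting is handled by the parser itself, so the outer element's result contains the inner element intact.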
I want to use phpDocumentor classes for parsing of my own PHP documentation and working with results. I can do this manually, but I'm pretty sure that phpDocumentor could be used.
The problem is that I can't find any documentation about this. How exactly should I use phpDocumentor's classes?
Thanks for the link!
I found out that Zend_Reflection_* classes can do this job.
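For anyone without Zend Framework at hand, a similar result can be approximated with PHP's built-in reflection classes, which expose the raw docblock through getDocComment(). This is a sketch of that substitute and not the Zend_Reflection API; docTags() and the regex for tag lines are my own assumptions about what "working with results" means here.

```php
<?php
declare(strict_types=1);

/**
 * Read the @tags of a function's docblock through native reflection.
 * docTags() is a hypothetical helper name.
 */
function docTags(ReflectionFunctionAbstract $function): array
{
    $doc = $function->getDocComment();
    if ($doc === false) {
        return []; // the function has no docblock
    }

    // Match docblock lines of the form " * @name rest of line".
    preg_match_all('~^\s*\*\s*@(\w+)[ \t]*(.*?)\s*$~m', $doc, $matches, PREG_SET_ORDER);

    $tags = [];
    foreach ($matches as $m) {
        $tags[] = ['tag' => $m[1], 'value' => $m[2]];
    }
    return $tags;
}

/**
 * Adds two numbers.
 *
 * @param int $a first operand
 * @param int $b second operand
 * @return int
 */
function add(int $a, int $b): int
{
    return $a + $b;
}

print_r(docTags(new ReflectionFunction('add')));
```

The same helper accepts a ReflectionMethod, since both classes extend ReflectionFunctionAbstract.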
The phpDocumentor manual contains over 700 pages, the result of running PHPDoc on PHPDoc itself. But I only looked at it for a couple of hours a few months ago. The code is an extreme mix of tokenizer stuff and regular expressions, which makes it impossible, for example, for PHPDoc to give a formal BNF grammar for its doc strings.
If it is important, you should consider writing your own PHP parser, at least if you know a little bit about compiler construction. PHP is a pretty simple language.
phpDocumentor was written back in the PHP4 days, and I don't believe it was designed in a way to allow easy extension of itself.
Further, I'd almost guess that what you actually want to start with as a base would be the tokenizer extension in PHP, which is mostly what phpDocumentor uses internally anyway.
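As a starting point, pulling the docblocks out of a source file with the tokenizer takes only a few lines. This is a minimal sketch and says nothing about how phpDocumentor itself is structured; docblocksIn() is a made-up helper name.

```php
<?php
declare(strict_types=1);

/**
 * Collect every docblock in a PHP source string using the tokenizer extension.
 * docblocksIn() is a hypothetical helper name.
 */
function docblocksIn(string $source): array
{
    $docblocks = [];
    foreach (token_get_all($source) as $token) {
        // Tokens are either single-character strings or [id, text, line] arrays.
        if (is_array($token) && $token[0] === T_DOC_COMMENT) {
            $docblocks[] = ['line' => $token[2], 'text' => $token[1]];
        }
    }
    return $docblocks;
}

$source = <<<'SRC'
<?php
/**
 * Adds two numbers.
 * @param int $a
 */
function add($a, $b) { return $a + $b; }
SRC;

foreach (docblocksIn($source) as $block) {
    echo 'line ', $block['line'], ":\n", $block['text'], "\n";
}
```

Associating each docblock with the declaration that follows it means looking at the next non-whitespace tokens (T_FUNCTION, T_CLASS and so on), which is where most of the real work would be.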