Hi all, I have a very simple BBCode parsing system, but it currently has problems with lists inside lists.
My code:
$find = array(
'/\[list\](.*?)\[\/list\]/is',
'/\[\*\](.*?)(\n|\r\n?)/is',
'/\[ul\](.*?)\[\/ul\]/is',
'/\[li\](.*?)\[\/li\]/is'
);
$replace = array(
'<ul>$1</ul>',
'<li>$1</li>',
'<ul>$1</ul>',
'<li>$1</li>'
);
$body = preg_replace($find, $replace, $body);
The problem is that when another list is nested inside the [li] tags, the whole thing fails to parse.
[Screenshot: the broken output]
[Screenshot: how it should look]
I know my code is probably too simple for this, but how do I adjust it so it can parse a list within a list item?
Rather than using regular expressions, you have a couple of options:
- Use PHP's BBCode parsing extension.
- Do a much simpler replacement, i.e. replace [ul] with <ul> directly, and so on.
I'm not saying it can't be done with regex, just that it's not the simplest option.
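For the second option, a minimal sketch could look like this. The helper name `bbcode_tags_to_html` is hypothetical; it swaps each list tag one-for-one, so nested lists survive, but it does not check that the tags are balanced and it does not handle `[*]` items.

```php
<?php
// Hypothetical helper: swap each BBCode list tag for its HTML counterpart.
// Every tag is replaced independently, so nested lists keep their structure.
function bbcode_tags_to_html(string $body): string
{
    $map = [
        '[list]'  => '<ul>',
        '[/list]' => '</ul>',
        '[ul]'    => '<ul>',
        '[/ul]'   => '</ul>',
        '[li]'    => '<li>',
        '[/li]'   => '</li>',
    ];
    // str_replace() is case-sensitive; use str_ireplace() if [UL] should match too.
    return str_replace(array_keys($map), array_values($map), $body);
}

echo bbcode_tags_to_html('[ul][li]a[ul][li]b[/li][/ul][/li][/ul]');
// prints <ul><li>a<ul><li>b</li></ul></li></ul>
```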
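Here's a replacement that still uses regex, but converts each opening and closing tag on its own instead of matching them in pairs, so nested lists come out intact: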
$body = '[ul][li]test[/li][li]test[/li][li]test[ul][li]lol[/li][/ul][/li][li]hehe[/li][/ul]';
$find = array(
'/\[(\/?)list\]/i',
'/\[\*\](.*?)(\n|\r\n?)/i',
'/\[(\/?)ul\]/i',
'/\[(\/?)li\]/i'
);
$replace = array(
'<$1ul>',
'<li>$1</li>',
'<$1ul>',
'<$1li>'
);
$body = preg_replace($find, $replace, $body);
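To see why the question's patterns break, the sketch below runs only the [ul]/[li] rules of both versions on the sample string above. In the pair-matching version the lazy `(.*?)` stops at the first closing tag it meets, so the outer [ul] is paired with the inner [/ul]. The token-by-token version does not verify that tags are balanced, so malformed input still yields malformed HTML.

```php
<?php
$body = '[ul][li]test[/li][li]test[/li][li]test[ul][li]lol[/li][/ul][/li][li]hehe[/li][/ul]';

// Question's approach: each pattern matches an opening tag AND a closing tag.
// (.*?) ends at the FIRST [/ul], so the nested list is never converted.
$pairwise = preg_replace(
    ['/\[ul\](.*?)\[\/ul\]/is', '/\[li\](.*?)\[\/li\]/is'],
    ['<ul>$1</ul>', '<li>$1</li>'],
    $body
);

// Answer's approach: every opening/closing tag is converted on its own.
$tokenwise = preg_replace(
    ['/\[(\/?)ul\]/i', '/\[(\/?)li\]/i'],
    ['<$1ul>', '<$1li>'],
    $body
);

echo $pairwise, "\n";
// <ul><li>test</li><li>test</li><li>test[ul][li]lol</li></ul>[/li]<li>hehe</li>[/ul]
echo $tokenwise, "\n";
// <ul><li>test</li><li>test</li><li>test<ul><li>lol</li></ul></li><li>hehe</li></ul>
```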
Related
I already have a BBCode string, $mybbcode = '[b]Hello world[/b]', and with PHP I want to show it as HTML in a page,
e.g.: <div><b>hello world</b></div>
Basically it's what others have already told you, but if you search on Google you'll quickly find lots of information about it, and ready-made functions. Here is a sample:
function bbc2html($content) {
$search = array (
'/(\[b\])(.*?)(\[\/b\])/',
'/(\[i\])(.*?)(\[\/i\])/',
'/(\[u\])(.*?)(\[\/u\])/',
'/(\[ul\])(.*?)(\[\/ul\])/',
'/(\[li\])(.*?)(\[\/li\])/',
'/(\[url=)(.*?)(\])(.*?)(\[\/url\])/',
'/(\[url\])(.*?)(\[\/url\])/'
);
$replace = array (
'<strong>$2</strong>',
'<em>$2</em>',
'<u>$2</u>',
'<ul>$2</ul>',
'<li>$2</li>',
'<a href="$2">$4</a>',
'<a href="$2">$2</a>'
);
return preg_replace($search, $replace, $content);
}
Only for lazy programmers ;)
I invite you to search and decide which of the existing code suits your project best.
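If you only need the three inline tags from the question, a smaller variant is enough. This is a sketch, not the function above: the name `bbcode_inline_to_html` is hypothetical, and the `htmlspecialchars()` call is an added assumption so that HTML typed by users is shown as text rather than injected into the page.

```php
<?php
// Hypothetical minimal converter for [b], [i] and [u].
// Escape the raw text first, then turn the BBCode tags into HTML tags.
function bbcode_inline_to_html(string $bbcode): string
{
    $html = htmlspecialchars($bbcode, ENT_QUOTES, 'UTF-8');
    return preg_replace(
        ['/\[b\](.*?)\[\/b\]/is', '/\[i\](.*?)\[\/i\]/is', '/\[u\](.*?)\[\/u\]/is'],
        ['<b>$1</b>', '<i>$1</i>', '<u>$1</u>'],
        $html
    );
}

$mybbcode = '[b]Hello world[/b]';
echo '<div>' . bbcode_inline_to_html($mybbcode) . '</div>';
// prints <div><b>Hello world</b></div>
```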
You will have to use regex to convert BBCode to HTML: http://www.php.net/manual/en/ref.pcre.php
For example:
$string = preg_replace('#\[b\](.+)\[\/b\]#iUs', '<b>$1</b>', $string);
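The `U` modifier in that pattern matters: it makes quantifiers ungreedy by default, so `(.+)` stops at the first `[/b]` instead of the last one. The comparison below shows the difference on a string with two bold sections; writing `(.+?)` without `U` would behave the same as the `U` version.

```php
<?php
$string = '[b]one[/b] and [b]two[/b]';

// Without U, (.+) is greedy and runs to the LAST [/b] in the string.
echo preg_replace('#\[b\](.+)\[\/b\]#is', '<b>$1</b>', $string), "\n";
// <b>one[/b] and [b]two</b>

// With U, (.+) becomes lazy and stops at the first [/b].
echo preg_replace('#\[b\](.+)\[\/b\]#iUs', '<b>$1</b>', $string), "\n";
// <b>one</b> and <b>two</b>
```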
I've already found this code for dealing with content between tags:
$content_processed = preg_replace_callback(
'#<pre>(.+?)</pre>#s',
// create_function() was removed in PHP 8; an anonymous function does the same job
function ($matches) {
return '<pre>' . htmlentities($matches[1]) . '</pre>';
},
$content
);
But how can I get it to extract just one section of the HTML? The part I'm looking at starts with:
click here</a></p><p><span class='title'>Soups<br />
and ends at:
<div style='font-size:0.8em;'>
(The parts I've chosen are quite long because that way they are unique in the HTML.)
Do not parse HTML with regex. Bad, bad idea. Better to use an XML/HTML parser to turn it into a nested object/array; that way you will be much safer.
HOWEVER, if the page is static (i.e. its markup never changes), you can just explode() on the start delimiter to chop the page in two, and then explode() the second half again on the end delimiter.
Example:
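$html = file_get_contents('path/to/page.phtml');
// Keep everything after the start marker...
$text = explode("click here</a></p><p><span class='title'>Soups<br />", $html);
// ...then everything before the end marker (double quotes, because the marker contains single quotes)
$text = explode("<div style='font-size:0.8em;'>", $text[1]);
$text = $text[0];
echo $text;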
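If either marker is missing, the explode() version above reads an undefined `$text[1]`. A sketch using strpos()/substr() that returns null in that case could look like this; the helper name `extract_between` is hypothetical.

```php
<?php
// Hypothetical helper: return the text between two literal markers,
// or null if either marker cannot be found.
function extract_between(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);          // skip past the start marker itself
    $to = strpos($html, $end, $from); // search for the end marker after it
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

echo extract_between('<p>x</p>START-middle-END', 'START', 'END');
// prints -middle-
```

For the page in the question, it would be called with the file_get_contents() result and the two markers shown above.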
I have this replace regex (it's taken from the phpBB source code):
$match = array(
'#<!\-\- ([mw]) \-\-><a (?:class="[\w-]+" )?href="(.*?)" target\=\"_blank\">.*?</a><!\-\- \1 \-\->#',
'#<!\-\- .*? \-\->#s',
'#<.*?>#s',
);
$replace = array( '\2', '', '');
$message = preg_replace($match, $replace, $message);
If I run it on a message like this:
asdfafdsfdfdsfds
<!-- m --><a class="postlink" href="http://website.com/link-is-looooooong.txt">http://website.com/link ... oooong.txt</a><!-- m -->
asdfafdsfdfdsfds4324
It returns this:
asdfafdsfdfdsfds
http://website.com/link ... oooong.txt
asdfafdsfdfdsfds4324
However, I would like to turn it into a replace function, so that I can replace a link's title in a block of text by providing its href.
I want to provide the URL, the new URL and the new title, and run a regex with these variables:
$url = 'http://website.com/link-is-looooooong.txt';
$new_title = 'hello';
$new_url = 'http://otherwebsite.com/';
It should return the same raw message, but with the link changed:
<!-- m --><a class="postlink" href="http://otherwebsite.com/">hello</a><!-- m -->
I've tried tweaking it into something like this, but I can't get it right. I don't know how to rebuild the matched result so that it keeps the same format after the replacement.
$message = preg_replace('#<!\-\- ([mw]) \-\-><a (?:class="[\w-]+" )?href="'.preg_quote($url).'" target\=\"_blank\">(.*?)</a><!\-\- \1 \-\->#', $replace, $message);
You'll find that parsing HTML with regex can be a pain and get very complex. Your best bet is to use a DOM parser and modify the links with that instead.
You need to capture the other parts in groups as well and then reuse them in the replacement. Try something like this:
// ${1} = opening comment + <a ... href=", ${3} = closing quote (+ target) and ">", ${4} = </a> + closing comment
$replace = '${1}' . $new_url . '${3}' . $new_title . '${4}';
$reg = '#(<!-- ([mw]) --><a (?:class="[\w-]+" )?href=")'.preg_quote($url, '#').'("(?: target="_blank")?>).*?(</a><!-- \2 -->)#';
$message = preg_replace($reg, $replace, $message);
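Wrapped into the replace function the question asks for, the same pattern could look like the sketch below. The name `replace_phpbb_link` is hypothetical, and preg_replace_callback() is used instead of a replacement string (a choice beyond the answer above) so that a `$` or `\` in the new URL or title cannot be read as a back-reference.

```php
<?php
// Hypothetical wrapper around the answer's pattern.
function replace_phpbb_link(string $message, string $url, string $new_url, string $new_title): string
{
    $reg = '#(<!-- ([mw]) --><a (?:class="[\w-]+" )?href=")' . preg_quote($url, '#')
         . '("(?: target="_blank")?>).*?(</a><!-- \2 -->)#';

    return preg_replace_callback(
        $reg,
        function (array $m) use ($new_url, $new_title): string {
            // $m[1] = comment + <a ... href=", $m[3] = closing quote (+ target) and ">",
            // $m[4] = </a> + closing comment; the old URL and title are dropped.
            return $m[1] . $new_url . $m[3] . $new_title . $m[4];
        },
        $message
    );
}

$message = "asdfafdsfdfdsfds\n"
    . '<!-- m --><a class="postlink" href="http://website.com/link-is-looooooong.txt">http://website.com/link ... oooong.txt</a><!-- m -->'
    . "\nasdfafdsfdfdsfds4324";

echo replace_phpbb_link($message, 'http://website.com/link-is-looooooong.txt', 'http://otherwebsite.com/', 'hello');
```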
I'm trying to scrape a website using some regex, but the site isn't written in well-formatted HTML. In fact, the HTML is horrible and barely structured at all. Still, I've managed to tackle most of it. The problem I'm encountering now is that in some email addresses, a span is wrapped around a random part of the address, like so:
****.*******#g<span class="tournamenttext">mail.com</span>
************<span class="tournamenttext">#yahoo.com</span>
<span class="tournamenttext">**********#mail.com</span>
*******#gmail.com
Is there a way to retrieve the email addresses despite all this inconsistency?
$string ='****.*******#g<span class="tournamenttext">mail.com</span>
************<span class="tournamenttext">#yahoo.com</span>
<span class="tournamenttext">**********#mail.com</span>
*******#gmail.com';
$pattern = "/<\/?span[^>]*>/";
$string = preg_replace($pattern, "", $string);
After that, $string will contain only the email addresses:
****.*******#gmail.com
************#yahoo.com
**********#mail.com
*******#gmail.com
Your code will look like this:
// Assume $text[1]->innertext contains something like:
// <em>Local (Open)
// Tournament.</em> ****.*******#g<span class="tournamenttext">mail.com</span>

// First, clear the spans
$pattern = "/<\/?span[^>]*>/";
$text[1]->innertext = preg_replace($pattern, "", $text[1]->innertext);
// Then match the email. Just an example regex: it needs delimiters, and no ^/$ anchors
// because the address is not the whole string ("#" stands in for "@" as in the question)
$email_regex = '/[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})/i';
preg_match($email_regex, $text[1]->innertext, $theMatch);
echo '<pre>' . print_r($theMatch, true) . '</pre>';
You could simply remove all span tags by replacing </?span[^>]*> with nothing and try your favourite email address finder on the result.
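Putting both steps together, a sketch might look like this. It assumes real `@` addresses (the question masks them as `#`), uses made-up sample data, and the address pattern is deliberately loose rather than a full validator; the helper name `extract_emails` is hypothetical.

```php
<?php
// Hypothetical helper: strip every <span>/</span> tag, then collect
// anything that looks like an email address (loose pattern).
function extract_emails(string $html): array
{
    $clean = preg_replace('/<\/?span[^>]*>/i', '', $html);
    preg_match_all('/[a-z0-9._%+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}/i', $clean, $m);
    return $m[0]; // full matches only
}

// Made-up sample data mimicking the markup in the question.
$html = 'john.doe@g<span class="tournamenttext">mail.com</span>' . "\n"
      . 'jane<span class="tournamenttext">@yahoo.com</span>' . "\n"
      . '<span class="tournamenttext">bob@mail.com</span>' . "\n"
      . 'alice@gmail.com';

print_r(extract_emails($html));
```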
I have a simple problem :( I need to replace text smileys with the corresponding smiley images. OK, that's not really complex, but I have to replace only the smileys that appear outside of HTML tags. Short example:
Text:
Thats a good example :/ .. with a link inside.
I want to replace ":/" with the image of this smiley.
What's the best way to do that?
I won't try to write some super script, but think about it: smileys are almost always surrounded by spaces. So str_replace ' :/ ' with the smiley. You might say, "What about a smiley at the end of a sentence (where it would be used the most)?" Well, just check for at least one space on either the left or the right of a potential smiley.
Using the str_replace() array approach:
$smiley_array = array(
":) " => "<a href...>",
" :)" => "<a href...>",
":/ " => "<a href...>",
" :/" => "<a href...>");
$codes = array_keys($smiley_array);
$links = array_values($smiley_array);
$str = str_replace($codes, $links, $str);
If you'd rather not type everything twice, you can generate this array from a single smiley array.
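A sketch of generating that array from a single code-to-HTML map is below. The helper name `build_spaced_smiley_map` and the `frown.gif` image are hypothetical. Unlike the hand-written array above, it keeps the space in the replacement, so the smiley does not get glued to the neighbouring word.

```php
<?php
// Hypothetical helper: expand code => html into the
// "space on the right" and "space on the left" variants.
function build_spaced_smiley_map(array $smileys): array
{
    $map = [];
    foreach ($smileys as $code => $html) {
        $map[$code . ' '] = $html . ' '; // smiley followed by a space
        $map[' ' . $code] = ' ' . $html; // smiley preceded by a space
    }
    return $map;
}

$smiley_array = build_spaced_smiley_map([':/' => '<img src="frown.gif">']);
$str = 'That is odd :/ see http://example.com';
$str = str_replace(array_keys($smiley_array), array_values($smiley_array), $str);
echo $str;
// prints That is odd <img src="frown.gif"> see http://example.com
```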
Why not put some special characters around your smiley text, for example -:/- ?
This makes your smiley text unique and easy to recognize.
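Use preg_replace with a negative lookahead assertion. (An unbounded lookbehind such as (?<!<[^<>]*) is rejected by PCRE, which only accepts lookbehinds of bounded length, so the check is done after the smiley instead.) Example:
$smileys = array(
':/' => '<img src="..." alt=":/">'
);
foreach ($smileys as $smile => $img) {
// Skip the smiley if a ">" comes before the next "<", i.e. it sits inside a tag
$text = preg_replace('#' . preg_quote($smile, '#') . '(?![^<>]*>)#',
$img, $text);
}
The regex only matches smileys that are not inside angle brackets. This might be slow if you have a lot of false positives.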
I wouldn't know about the best way, only the way I would do it.
Build an array with the smiley codes as the keys and the links as the values. Then use str_replace(): pass an array of the keys (the smiley codes) as the "search" argument and an array of the values as the "replace" argument.
For instance, suppose you have something like this:
$smiley_array = array(":)" => "<a href...>",
":(" => "<a href=....>");
$codes = array_keys($smiley_array);
$links = array_values($smiley_array);
$str = str_replace($codes, $links, $str);
EDIT: Since this could accidentally replace other occurrences with smiley links (for example the ":/" in "http://"), you should consider using regexes with preg_replace() instead. Bear in mind that preg_replace() is slower than str_replace().
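For the preg_replace() route, one option (an assumption, not the answerer's code) is to replace a code only when it is not directly touching other non-whitespace characters. The helper name `replace_standalone_smileys` and the image files are hypothetical; note that a smiley followed directly by punctuation, such as ":/." is left alone by this rule.

```php
<?php
// Hypothetical helper: replace a smiley code only when it stands on its own,
// i.e. it is neither preceded nor followed by a non-whitespace character.
// This keeps the ":/" inside "http://" intact.
function replace_standalone_smileys(string $str, array $smiley_array): string
{
    $codes = array_map(
        fn (string $code): string => preg_quote($code, '/'),
        array_keys($smiley_array)
    );
    $pattern = '/(?<!\S)(?:' . implode('|', $codes) . ')(?!\S)/';

    return preg_replace_callback(
        $pattern,
        fn (array $m): string => $smiley_array[$m[0]], // look up the matched code
        $str
    );
}

$smiley_array = [':/' => '<img src="frown.gif">', ':)' => '<img src="smile.gif">'];
echo replace_standalone_smileys('Nice :) see http://example.com :/', $smiley_array);
// prints Nice <img src="smile.gif"> see http://example.com <img src="frown.gif">
```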
You can use regex, or the extra sloppy version of the above:
$smiley_array = array(":)" => "<a href...>",
":(" => "<a href=....>");
$codes = array_keys($smiley_array);
$links = array_values($smiley_array);
$str = str_replace("://", "%%QF%%", $str);
$str = str_replace($codes, $links, $str);
$str = str_replace("%%QF%%", "://", $str);
Actually, since str_replace() works through the search array in order, from left to right,
this should work:
$smiley_array = array("://" => "%%QF%%", ":)" => "<a href...>",
":(" => "<a href=....>", "%%QF%%" => "://");
$codes = array_keys($smiley_array);
$links = array_values($smiley_array);
$str = str_replace($codes, $links, $str);
Possibly overkill (more CPU load), but 99.99999999% safe:
<?php
$n = new DOMDocument();
$n->loadHTML('<p>Thats a good example :/ .. with a link inside.</p>');
$x = new DOMXPath($n);
$instances = $x->query('//text()[contains(.,\':/\')]');//or use '//*[child::text()]' for all textnodes
foreach($instances as $node){
if($node instanceof DOMText && preg_match_all('/:\//',$node->data,$matches,PREG_OFFSET_CAPTURE)){
// Walk the matches from last to first so earlier offsets stay valid after each split
// (byte offsets: fine for ASCII, needs adjusting for multibyte UTF-8 text)
foreach(array_reverse($matches[0]) as $match){
$newnode = $node->splitText($match[1]); // text from the smiley onwards
$newnode->deleteData(0,strlen($match[0])); // drop the ":/" itself
$img = $n->createElement('img');
$img->setAttribute('src','smily.gif');
$newnode->parentNode->insertBefore($img,$newnode);
}
}
}
var_dump($n->saveHTML());
?>
But in practice you don't want to do this very often: save once, show many. If you let users edit the HTML (be it in a WYSIWYG editor or otherwise), the reverse transformation (img back to text) is a lot lighter. It's up to you to extend this to different smileys: one monster regex to match them all (or several smaller ones / strstr() calls for readability), plus an array mapping each smiley to its image source (e.g. array(':/'=>'frown.gif')), would be the way to go.
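A sketch of that "one regex plus a smiley-to-src array" idea, built on the DOM approach above, is below. The function name `replace_smileys_in_html` and the image file names are hypothetical. Only text nodes are touched, so attribute values such as `href="http://..."` are never rewritten.

```php
<?php
// Hypothetical sketch: replace several smileys in DOM text nodes only.
function replace_smileys_in_html(string $html, array $smileys): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // One alternation regex built from all smiley codes.
    $codes = array_map(fn (string $c): string => preg_quote($c, '/'), array_keys($smileys));
    $pattern = '/' . implode('|', $codes) . '/';

    // The XPath result is a snapshot, so splitting nodes below does not disturb the loop.
    foreach ($xpath->query('//text()') as $node) {
        if (!preg_match_all($pattern, $node->data, $matches, PREG_OFFSET_CAPTURE)) {
            continue;
        }
        // Work backwards so earlier offsets remain valid; byte offsets, ASCII only.
        foreach (array_reverse($matches[0]) as [$code, $offset]) {
            $rest = $node->splitText($offset);  // $rest starts with the smiley code
            $rest->deleteData(0, strlen($code)); // remove the code from the text
            $img = $doc->createElement('img');
            $img->setAttribute('src', $smileys[$code]);
            $img->setAttribute('alt', $code);
            $rest->parentNode->insertBefore($img, $rest);
        }
    }
    return $doc->saveHTML();
}

echo replace_smileys_in_html(
    '<p>Thats a good example :/ .. with a <a href="http://example.com/">link</a> inside.</p>',
    [':/' => 'frown.gif', ':)' => 'smile.gif']
);
```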