Strip phpbb bbcode - php

I want to display the most recent posts from my phpBB3 forum on my website, but without the BBCode.
I'm trying to strip the BBCode, but without success.
One of the posts, for example, could be:
[quote="SimonLeBon":3pwalcod]bladie bla bla[/quote:3pwalcod]bla bla bladie bla blaffsd
fsdjhgfd dgfgdffgdfg
To strip the BBCode I use a function I found via Google; I've tried several other similar functions as well:
<?php
function stripBBCode($text_to_search) {
$pattern = '|[[\/\!]*?[^\[\]]*?]|si';
$replace = '';
return preg_replace($pattern, $replace, $text_to_search);
}
?>
This however doesn't really have any effect.

This will strip BBCode that is valid (i.e. every opening tag has a matching closing tag).
$str = preg_replace('/\[(\w+)=.*?:(.*?)\](.*?)\[\/\1:\2\]/', '$3', $str);
Reusable Function
function stripBBCode($str) {
return preg_replace('/\[(\w+)=.*?:(.*?)\](.*?)\[\/\1:\2\]/', '$3', $str);
}
Explanation
\[ Match literal [.
(\w+) Match 1 or more word characters and save in capturing group 1.
= Match literal =.
.*? Match ungreedily every character except \n between = and :.
: Match literal :.
(.*?) Match ungreedily every character except \n between : and ] and save in capturing group 2.
\] Match literal ].
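(.*?) Match ungreedily every character except \n between the opening tag and the closing tag and save in capturing group 3.
\[ Match literal [.
\/\1:\2 Match a literal /, the tag name from capturing group 1, a literal :, and the identifier from capturing group 2.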
\] Match literal ].
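As a rough sketch of how this kind of pattern behaves on the post from the question, the variant below makes the =... part optional and repeats the replacement so that nested tags are unwrapped too. The function name stripPairedBBCode, the /s flag and the loop are assumptions added for illustration; they are not part of the answer above.

```php
<?php
// Hypothetical variant: "=..." is optional, and /s lets a tag body span lines.
function stripPairedBBCode(string $str): string
{
    $pattern = '/\[(\w+)(?:=[^\]]*?)?:(\w+)\](.*?)\[\/\1:\2\]/s';
    // Repeat until nothing changes, so inner tags of nested pairs go too.
    do {
        $previous = $str;
        $str = preg_replace($pattern, '$3', $str) ?? $previous;
    } while ($str !== $previous);
    return $str;
}

$post = '[quote="SimonLeBon":3pwalcod]bladie bla bla[/quote:3pwalcod]bla bla';
echo stripPairedBBCode($post), "\n"; // bladie bla blabla bla
```

Like the original, this only removes tags that carry the :identifier suffix and have a matching closing tag; anything else is left untouched.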

Here's the one from phpBB (slightly adjusted to be standalone):
/**
* Strips all bbcode from a text and returns the plain content
*/
function strip_bbcode(&$text, $uid = '')
{
if (!$uid)
{
$uid = '[0-9a-z]{5,}';
}
$text = preg_replace("#\[\/?[a-z0-9\*\+\-]+(?:=(?:&quot;.*&quot;|[^\]]*))?(?::[a-z])?(\:$uid)\]#", ' ', $text);
$match = array(
'#<!\-\- e \-\-><a href="mailto:(.*?)">.*?</a><!\-\- e \-\->#',
'#<!\-\- l \-\-><a (?:class="[\w-]+" )?href="(.*?)(?:(&amp;|\?)sid=[0-9a-f]{32})?">.*?</a><!\-\- l \-\->#',
'#<!\-\- ([mw]) \-\-><a (?:class="[\w-]+" )?href="(.*?)">.*?</a><!\-\- \1 \-\->#',
'#<!\-\- s(.*?) \-\-><img src="\{SMILIES_PATH\}\/.*? \/><!\-\- s\1 \-\->#',
'#<!\-\- .*? \-\->#s',
'#<.*?>#s',
);
$replace = array('\1', '\1', '\2', '\1', '', '');
$text = preg_replace($match, $replace, $text);
}
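To see what the first preg_replace in that function does on its own, here is a minimal sketch that isolates the tag-removal step. The helper name strip_uid_tags is hypothetical, and unlike the function above it returns the result instead of modifying its argument by reference.

```php
<?php
// Hypothetical helper: only the tag-removal step, returned by value.
function strip_uid_tags(string $text, string $uid = ''): string
{
    if ($uid === '') {
        // Fall back to "any uid-looking suffix", as the function above does.
        $uid = '[0-9a-z]{5,}';
    }
    $pattern = "#\[\/?[a-z0-9\*\+\-]+(?:=(?:&quot;.*&quot;|[^\]]*))?(?::[a-z])?(\:$uid)\]#";
    // Each opening or closing tag is replaced by a single space.
    return preg_replace($pattern, ' ', $text);
}

$post = '[quote="SimonLeBon":3pwalcod]bladie bla bla[/quote:3pwalcod]bla bla';
echo trim(strip_uid_tags($post)), "\n"; // bladie bla bla bla bla
```

Because every tag becomes a space, the quoted text and the reply stay separated; tags without a uid suffix, such as a plain [b], are not matched.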

Why don't you use the BBCode parsing extension that is available for PHP (the PECL bbcode package)?
http://php.net/manual/en/book.bbcode.php

Nowadays, use phpbb's own function https://wiki.phpbb.com/Strip_bbcode

Related

PHP - How to modify matched pattern and replace

I have a string that contains spaces inside its HTML tags:
$mystr = "&lt; h3&gt; hello mom ?&lt; / h3&gt;";
So I wrote a regular expression to detect the spaces in it:
$pattern = '/(?<=<)\s\w+|\s\/\s\w+|\s\/(?=>)/mi';
Next, I want to modify the matches by removing the spaces from them and put the result back. Any idea how this can be done, so that I can fix my string to look like this?
"&lt;h3&gt; hello mom ?&lt;/h3&gt;"
I know there is the PHP function preg_replace, but I'm not sure how to modify the matches:
$result = preg_replace( $pattern, $replace , $mystr );
For specific tags like the ones you showed, you can use
$result = preg_replace_callback('/&lt;(?:\s*\/)?\s*\w+\s*&gt;/ui', function($m) {
return preg_replace('/\s+/u', '', $m[0]);
}, $mystr);
The regex - note the u flag to deal with Unicode chars in the string - matches
&lt; - a literal string
(?:\s*\/)? - an optional sequence of zero or more whitespaces and a / char
\s* - zero or more whitespaces
\w+ - one or more word chars
\s* - zero or more whitespaces
&gt; - a literal string.
The preg_replace('/\s+/u', '', $m[0]) line in the anonymous callback function removes all chunks of whitespaces (even those non-breaking spaces).
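Put together, the callback approach can be sketched as a small runnable script. The helper name tightenEscapedTags is an assumption for illustration; the pattern and the inner whitespace removal are the ones described above.

```php
<?php
// Hypothetical helper name; the pattern is the one discussed above.
function tightenEscapedTags(string $html): string
{
    return preg_replace_callback(
        '/&lt;(?:\s*\/)?\s*\w+\s*&gt;/ui',
        function (array $m): string {
            // Drop every run of whitespace, but only inside the matched tag.
            return preg_replace('/\s+/u', '', $m[0]);
        },
        $html
    );
}

$mystr = '&lt; h3&gt; hello mom ?&lt; / h3&gt;';
echo tightenEscapedTags($mystr), "\n";
// &lt;h3&gt; hello mom ?&lt;/h3&gt;
```

The text between the tags keeps its spaces because the callback only ever sees the matched tag itself.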
You could keep it simple and do:
$output = str_replace(['&lt; / ', '&lt; ', '&gt; '],
['&lt;/', '&lt;', '&gt;'], $input);

Looking for specific character in capture group

I need to replace all double quotes in any (variable) given string.
For example:
$text = 'data-caption="hello"world">';
$pattern = '/data-caption="[[\s\S]*?"|(")]*?">/';
$output = preg_replace($pattern, '&quot;', $text);
should result in:
"hello&quot;world"
(The above pattern is my attempt at getting it to work)
The problem is that I don't know in advance whether, or how many, double quotes are going to be in the string.
How can I replace the " with &quot;?
You may match strings between data-caption=" and "> and then replace all " inside that match with &quot; using a mere str_replace:
$text = 'data-caption="<element attribute1="wert" attribute2="wert">Name</element>">';
$pattern = '/data-caption="\K.*?(?=">)/';
$output = preg_replace_callback($pattern, function($m) {
return str_replace('"', '&quot;', $m[0]);
}, $text);
print_r($output);
// => data-caption="<element attribute1=&quot;wert&quot; attribute2=&quot;wert">Name</element>">
See the PHP demo
Details
data-caption=" - starting delimiter
\K - match reset operator
.*? - any 0+ chars other than line break chars, as few as possible
(?=">) - a positive lookahead that requires the "> substring immediately to the right of the current location.
The match is passed to the anonymous function inside preg_replace_callback (accessible via $m[0]) and that is where it is possible to replace all " symbols in a convenient way.
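A minimal sketch of the same idea on the shorter string from the question may make the role of \K easier to see. The preg_match call is only there to show what the match contains; it is not needed for the replacement itself.

```php
<?php
$text = 'data-caption="hello"world">';
$pattern = '/data-caption="\K.*?(?=">)/';

// Because of \K, the reported match starts after data-caption=".
preg_match($pattern, $text, $m);
echo $m[0], "\n"; // hello"world

// Only the matched part is handed to the callback and rewritten.
$output = preg_replace_callback($pattern, function (array $m): string {
    return str_replace('"', '&quot;', $m[0]);
}, $text);
echo $output, "\n"; // data-caption="hello&quot;world">
```

Note that the lazy .*? stops at the first "> it finds, so a caption that itself contains "> is only processed up to that point.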

Exclude link starts with a character from PREG_REPLACE

This code converts any URL into a clickable link:
$str = preg_replace('/(http[s]?:\/\/[^\s]*)/i', '<a href="$1">$1</a>', $str);
How can I make it skip a URL that is preceded by a [ character? Like this:
[http://google.com
Use a negative lookbehind:
$str = preg_replace('/(?<!\[)(http[s]?:\/\/[^\s]*)/i', '<a href="$1">$1</a>', $str);
                      ^^^^^^^
Then, the http... substring that is preceded with [ won't be matched.
You may enhance the pattern as
preg_replace('/(?<!\[)https?:\/\/\S*/i', '<a href="$0">$0</a>', $str);
that is: remove the ( and ) (the capturing group) and replace the backreferences from $1 with $0 in the replacement pattern, and mind that [^\s] = \S, but shorter. Also, [s]? = s?.
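The effect of the lookbehind can be checked with a short script. The sample sentence and the anchor markup in the replacement are assumptions chosen for illustration.

```php
<?php
$str = 'Visit http://example.com but skip [http://google.com here';

// (?<!\[) rejects a match whose first character is preceded by "[".
$linked = preg_replace(
    '/(?<!\[)https?:\/\/\S*/i',
    '<a href="$0">$0</a>',
    $str
);

echo $linked, "\n";
// Visit <a href="http://example.com">http://example.com</a> but skip [http://google.com here
```

The lookbehind consumes no characters, so the [ itself stays in the output unchanged.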

Remove contents from a string comma and bracket

I want to filter a string according to my requirements:
$string="my super city (name , result)";
I want only result as output.
Any experts?
Any help would be greatly appreciated.
To perform a replacement:
$result = preg_replace( '/.*?,\s*(\w+)\).*?/', '\1', $string );
In $result you have this:
result
If you want to match every non-comma character, use this regular expression instead:
/.*?,\s*([^,]+)\).*?/
1st pattern explanation:
/
.*? zero-or-more characters
, a comma
\s* zero-or-more spaces
(\w+) Group 1: one-or-more word characters
\) closing bracket
.*? zero-or-more characters
/
regex101 demo
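If the goal is only to read the value rather than rewrite the string, a preg_match based sketch is an alternative. The helper name lastBracketItem and the slightly stricter character class are assumptions, not part of the answer above.

```php
<?php
// Hypothetical helper: returns the item before the closing bracket, or null.
function lastBracketItem(string $string): ?string
{
    // A comma, optional spaces, then the shortest run of characters that
    // are not commas or brackets, up to the closing bracket.
    if (preg_match('/,\s*([^,()]+?)\s*\)/', $string, $m)) {
        return $m[1];
    }
    return null;
}

$string = 'my super city (name , result)';
echo lastBracketItem($string), "\n"; // result
```

Returning null when nothing matches lets the caller tell "no bracketed list" apart from an empty result.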
Try this
$full = preg_replace("/[^A-Za-z ]/", '', $string);
If you also need digits (0-9), try
$full = preg_replace("/[^A-Za-z0-9 ]/", '', $string);
EDIT:
Then
$exp = explode(" ", $full);
echo $exp[count($exp)-1];

How can I remove a specific format from string with RegEx?

I have a list of string like this
$16,500,000(#$2,500)
$34,000(#$11.00)
$214,000(#$18.00)
$12,684,000(#$3,800)
How can I remove all symbols and the (#$xxxx) part from these strings so that they look like this:
16500000
34000
214000
12684000
\(.*?\)|\$|,
Try this. Replace by empty string. See demo.
https://regex101.com/r/vD5iH9/42
$re = "/\\(.*?\\)|\\$|,/m";
$str = "\$16,500,000(#\$2,500)\n\$34,000(#\$11.00)\n\$214,000(#\$18.00)\n\$12,684,000(#\$3,800)";
$subst = "";
$result = preg_replace($re, $subst, $str);
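As a small sketch of the same pattern applied line by line, preg_replace also accepts an array as its subject and returns an array. The variable names below are assumptions for illustration.

```php
<?php
$prices = [
    '$16,500,000(#$2,500)',
    '$34,000(#$11.00)',
    '$214,000(#$18.00)',
    '$12,684,000(#$3,800)',
];

// Remove the bracketed part, every dollar sign and every comma.
$clean = preg_replace('/\(.*?\)|\$|,/', '', $prices);

print_r($clean); // 16500000, 34000, 214000, 12684000
```

Single-quoted strings are used so that PHP does not try to interpret the $ characters or the backslashes before the regex engine sees them.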
To remove the end (#$xxxx) characters, you could use the regex:
\(\#\$.+\)
and replace it with nothing:
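preg_replace('/\(\#\$.+\)/', '', $myStringToReplaceWith);
No g (global) modifier is needed: PHP's preg_replace replaces every match by default, and PHP does not accept g as a pattern modifier. Use single quotes around the pattern so that PHP does not turn \$ into a plain $ before the regex engine sees it.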
Here's a breakdown of that regex:
\( matches the ( character literally
\# matches the # character literally
\$ matches the $ character literally
.+ matches any character 1 or more times
\) matches the ) character literally
Here's a live example on regex101.com
In order to remove all of these characters:
$ , ( ) # .
From a string, you could use the regex:
\$|\,|\(|\)|#|\.
Which will match all of the characters above.
The | character above is the regex or operator, effectively making it so
$ OR , OR ( OR ) OR # OR . will be matched.
Next, you could replace it with nothing using preg_replace, which already replaces every match rather than stopping at the first one (PHP has no g modifier):
preg_replace('/\$|\,|\(|\)|#|\./', '', $myStringToReplaceWith);
Here's a live example on regex101.com
So in the end, your code could look like this:
$str = preg_replace('/\(\#\$.+\)/', '', $str);
$str = preg_replace('/\$|\,|\(|\)|#|\./', '', $str);
Although it isn't in one regex, it does not use any look-ahead, or look-behind (both of which are not bad, by the way).
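A runnable sketch of the two-step approach is shown below. It assumes single-quoted patterns and no g modifier, since PHP's preg_replace replaces all matches unless its optional fourth argument ($limit) says otherwise; the character class in the second step is an equivalent shorthand for the alternation.

```php
<?php
$str = '$16,500,000(#$2,500)';

// Step 1: drop the trailing "(#$...)" part.
$str = preg_replace('/\(#\$.+\)/', '', $str);
echo $str, "\n"; // $16,500,000

// Step 2: drop the remaining symbols; a character class replaces the "|" chain.
$digits = preg_replace('/[$,()#.]/', '', $str);
echo $digits, "\n"; // 16500000

// The optional $limit argument is what restricts the number of replacements.
$once = preg_replace('/,/', '', '1,2,3', 1);
echo $once, "\n"; // 12,3
```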
