Removing nested bbcode (quotes) in PHP [duplicate] - php

I'm trying to remove nested quoting from my bulletin board, but I'm having some issues.
Example input:
[quote author=personX link=topic=12.msg1910#msg1910 date=1282745641]
[quote author=PersonY link=topic=12.msg1795#msg1795 date=1282727068]
The message in the original quote
[/quote]
A second message quoting the first one
[/quote]
[quote author=PersonZ link=topic=1.msg1#msg1 date=1282533805]
A random third quote
[/quote]
Example output
[quote author=personX link=topic=12.msg1910#msg1910 date=1282745641]
Message in the second quote
[/quote]
[quote author=PersonZ link=topic=1.msg1#msg1 date=1282533805]
A random third quote
[/quote]
As you can see, the nested quote (the original message) is removed, along with its quote tags.
I can't seem to figure it out.
When I try
$toRemove = '(\\[)(quote)(.*?)(\\])';
$string = $txt;
$found = 0; echo preg_replace("/($toRemove)/e", '$found++ ? \'\' : \'$1\'', $string);
it removes every occurrence of the quote tag except the first one.
But when I expand the code to:
$toRemove = '(\\[)(quote)(.*?)(\\])(.*?)(\\[\\/quote\\])';
$string = $txt;
$found = 0; echo preg_replace("/($toRemove)/e", '$found++ ? \'\' : \'$1\'', $string);
it stops doing anything at all.
Any ideas on this?
Edit:
Thanks for your help, Haggi.
I keep running into trouble, though.
The while loop around
while ( $input = preg_replace_callback( '~\[quoute.*?\[/quote\]~i', 'replace_callback', $input ) ) {
// replace every occurrence
}
causes the page to loop indefinitely; when it is removed (along with the extra u in quoute), the page doesn't do anything.
I've determined that the cause is the pattern matching:
when changed to
$input = preg_replace_callback( '/\[quote(.*?)/i', 'replace_callback', $input );
the code does start working, but when changed to
$input = preg_replace_callback( '/\[quote(.*?)\[\/quote\]/i', 'replace_callback', $input );
it stops doing anything again.
Also, there is an issue with the undo_replace function: it never finds the stored hash and only gives warnings about undefined indexes. I guess the regex matching the sha1 isn't working correctly.
The complete code as I have it now:
$cache = array();
$input = $txt;
function replace_callback( $matches ) {
global $cache;
$hash = sha1( $matches[0] );
$cache["hash"] = $matches[0];
return "REPLACE:$hash";
}
// replace all quotes with placeholders
$input = preg_replace_callback( '/\[quote(.*?)\[quote\]/i', 'replace_callback', $input );
function undo_replace( $matches ) {
global $cache;
return $cache[$matches[1]];
}
// restore the outer most quotes
$input = preg_replace_callback( '~REPLACE:[a-f0-9]{40}~i', 'undo_replace', $input );
// remove the references to the inner quotes
$input = preg_replace( '~REPLACE:[a-f0-9]{40}~i', '', $input );
echo $input;
Thanks again for any ideas guys :)

Why the first one is the only one that stays is easy to see from the replacement expression:
'$found++ ? \'\' : \'$1\''
At the start $found is undefined and evaluates to false, so $1 is returned. $found is then incremented to 1 (undefined + 1 = 1), so it is greater than zero, and it is incremented further on every subsequent call. Since everything different from zero evaluates to true, you always get '' back after that.
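The /e modifier used in the question was deprecated in PHP 5.5 and removed in PHP 7.0, so the same counting behaviour is easier to observe with preg_replace_callback. A minimal sketch, assuming a shortened sample string and a closure that captures $found by reference (neither is from the original post):

```php
<?php
// Shortened sample input (an assumption, not the asker's data).
$txt = "[quote a]x[/quote] [quote b]y[/quote]";

$found = 0;
$result = preg_replace_callback(
    '/\[quote.*?\]/',                        // matches opening tags only
    function (array $m) use (&$found): string {
        // $found++ yields the old value: 0 (falsy) on the first call,
        // 1, 2, ... (truthy) on every later call.
        return $found++ ? '' : $m[0];
    },
    $txt
);

echo $result, PHP_EOL;
```

Only the first opening tag survives; every later one is replaced by an empty string, which matches the behaviour described above.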
What you want to do is something like this
$cache = array();
function replace_callback( $matches ) {
global $cache;
$hash = sha1( $matches[0] );
$cache[$hash] = $matches[0];
return "REPLACE:$hash";
}
// replace all quotes with placeholders, innermost first:
// the lookahead keeps a match from spanning another opening tag,
// and the s modifier lets . match the newlines inside a quote
$count = 0;
do {
$input = preg_replace_callback( '~\[quote\b[^\]]*\](?:(?!\[quote\b).)*?\[/quote\]~is', 'replace_callback', $input, -1, $count );
// repeat until no quote is left
} while ($count > 0);
function undo_replace( $matches ) {
global $cache;
return $cache[$matches[1]];
}
// restore the outermost quotes (the hash is captured as $matches[1])
$input = preg_replace_callback( '~REPLACE:([a-f0-9]{40})~i', 'undo_replace', $input );
// remove the references to the inner quotes
$input = preg_replace( '~REPLACE:[a-f0-9]{40}~i', '', $input );
This code is untested as I don't have PHP at hand to test it. If there are any errors you cannot fix, please just post them here and I will fix them.
Cheers, haggi
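One likely reason the patterns in the question "stop doing anything" as soon as the closing tag is added is that, by default, . does not match a newline, so .*? can never get from the opening tag to a [/quote] on a later line. A small sketch of the difference the s (dot-all) modifier makes, using a made-up two-line quote as input:

```php
<?php
// Hypothetical sample: the quote body spans several lines.
$txt = "[quote author=A]\nfirst line\n[/quote]";

// Without the s modifier the dot stops at the first newline: no match.
$withoutS = preg_match('~\[quote.*?\[/quote\]~i', $txt);

// With the s modifier the dot also matches newlines: one match.
$withS = preg_match('~\[quote.*?\[/quote\]~is', $txt);

var_dump($withoutS, $withS);
```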

I searched for a couple of solutions using preg_replace for nested quotes, but none of them worked, so I wrote my own little version to fit my requirements.
$position = strrpos($string, '[/quote:'); // this will get the position of last quote
$text = substr(strip_tags($string),$position+17); // this will get the data after the last quote used.
Hope this will help someone.

Related

Disallow * in php Search

I want to suppress Searches on a database from users inputting (for example) P*.
http://www.aircrewremembered.com/DeutscheKreuzGoldDatabase/
I can't work out how to add this to the code I already have. I'm guessing using an array in the line $trimmed = str_replace("\"","'",trim($search)); is the answer, replacing the "\"" with the array, but I can't seem to find the correct way of doing this. I can get it to work if I just replace the \ with *, but then I lose the trimming of the "\" character: does this matter?
// Retrieve query variable and pass through regular expression.
// Test for unacceptable characters such as quotes, percent signs, etc.
// Trim out whitespace. If ereg expression not passed, produce warning.
$search = @$_GET['q'];
// check if wrapped in quotes
if ( preg_match( '/^(["\']).*\1$/m', $search ) === 1 ) {
$boolean = FALSE;
}
if ( escape_data($search) ) {
//trim whitespace and additional disallowed characters from the stored variable
$trimmed = str_replace("\"","'",trim($search));
$trimmed = stripslashes(str_ireplace("'","", $trimmed));
$prehighlight = stripslashes($trimmed);
$prehighlight = str_ireplace("\"", "", $prehighlight);
$append = stripslashes(urlencode($trimmed));
} else {
$trimmed = "";
$testquery = FALSE;
}
$display = stripslashes($trimmed);
You already said it yourself, just use arrays as parameters for str_replace:
http://php.net/manual/en/function.str-replace.php
$trimmed = str_replace( array("\"", "*"), array("'", ""), trim($search) );
Every element in the first array will be replaced with the corresponding element from the second array.
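As a quick illustration of the array form, here is a minimal sketch with a made-up search term (the value of $search is an assumption; in the question it comes from $_GET['q']):

```php
<?php
// Hypothetical user input: quoted, padded with spaces, with a wildcard.
$search = '  "P*"  ';

// Each entry of the first array is replaced by the entry at the same
// index in the second array: " becomes ' and * is dropped.
$trimmed = str_replace(array("\"", "*"), array("'", ""), trim($search));

echo $trimmed, PHP_EOL;
```

The replacements are applied one after another in array order, so a later search entry also sees the output of an earlier one.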
For future validation and sanitation, you might want to read about this function too:
http://php.net/manual/en/function.filter-var.php
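Use $search = mysql_real_escape_string($search); it escapes the characters in $search that can affect your query (it does not remove them). Note that the mysql_* extension was removed in PHP 7; mysqli_real_escape_string() or prepared statements are the current equivalents.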

Concatenate variables in a regular expression string with preg_match()

I'm using the preg_match() function to decide whether some code is executed:
if( preg_match( '/^([0-9]+,){0,3}[0-9]+$/', $string ) ) { ... }
However, I have to use an integer variable and integrate it into the regular expression:
$integer = 4;
if( preg_match( '/^([0-9]+,){0,' . $integer . '}[0-9]+$/', $string ) ) { ... }
but it doesn't match when it should. How is it that I can't concatenate a variable in the regex string?
Edit:
strval($integer) has solved my problem. I had to convert the integer value into a string before concatenating it (although I don't understand why):
$integer = 4;
if( preg_match( '/^([0-9]+,){0,' . strval($integer) . '}[0-9]+$/', $string ) ) { ... }
Whenever concatenating a variable into a regex pattern, you should do so by passing the variable to the preg_quote function.
However, if the variable is, as in your example, the integer 4, that won't make any difference. The pattern you're using will be:
/^([0-9]+,){0,4}[0-9]+$/
In which case, if it doesn't work: check the $string value, and make sure the pattern matches. BTW, /^(\d+,){0,4}\d+$/ is shorter and does the same thing.
Calling strval doesn't solve anything, AFAIK... I've tested the code without strval, using the following snippet:
$string = '1234,32';
if (preg_match( '/^([0-9]+,){0,4}[0-9]+$/', $string) )
{
echo 'matches',PHP_EOL;
$count = 4;
if (preg_match( '/^([0-9]+,){0,'.$count.'}[0-9]+$/', $string ) )
echo 'matches, too',PHP_EOL;
}
The output was, as I expected:
matches
matches, too
In your case, I'd simply write:
$count = 4;
preg_match('/^(\d+,){0,'.preg_quote($count, '/').'}\d+$/', $string);
This is undeniably safer than just calling strval, because strval does not account for possible special regex chars (+[]{}/\$^?!:<=*. and the like).
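To see where preg_quote actually changes the outcome, consider a variable that holds a regex metacharacter. The sketch below is only an illustration: the $sep variable and the sample strings are assumptions, not part of the question.

```php
<?php
// Hypothetical separator supplied at runtime; "." is a regex metacharacter.
$sep = '.';

// Unquoted: the dot means "any character", so letters slip through.
$loose = '/^(\d+' . $sep . '){0,3}\d+$/';

// Quoted: preg_quote escapes the dot (and the "/" delimiter if present).
$strict = '/^(\d+' . preg_quote($sep, '/') . '){0,3}\d+$/';

var_dump(preg_match($loose, '1a2'));    // the loose pattern accepts "1a2"
var_dump(preg_match($strict, '1a2'));   // the strict pattern rejects it
var_dump(preg_match($strict, '1.2.3')); // and still accepts "1.2.3"
```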

PHP str_replace have different value each time?

I want to do a str_replace on an HTML string where the replacement value increases every time a match is found.
$link = 1;
$html = str_replace($this->link, $link, $html);
This replaces all matches at once with the same $link value. I would like $link to increase every time a match is found. Is that possible?
Thanks very much
You can use a regular expression to return how many replacements it does.
<?php
$string = "red green green blue red";
preg_replace('/\b(green)\b/i', '[removed]', $string, -1 , $results);
echo $results; // returns '2' as it replaces green twice with [removed]
?>
If I understand you correctly (you want each match replaced with growing integer), it would seem the comments on the question encouraging you to use preg_replace_callback would be correct:
$str = 'Hello World';
$cnt = 0;
function myCallback ( $matches ) {
global $cnt;
return ++$cnt;
}
// He12o Wor3d
echo preg_replace_callback( '/l/', 'myCallback', $str );
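The same idea can be written without a global by letting a closure capture the counter by reference. This is a sketch under assumptions: the function name replace_with_counter and its parameters are made up for illustration, and the search text is treated as a literal string via preg_quote.

```php
<?php
// Replace every occurrence of $needle in $haystack with an increasing
// integer, starting at $start. Names are hypothetical.
function replace_with_counter(string $needle, string $haystack, int $start = 1): string
{
    $n = $start;
    return preg_replace_callback(
        '/' . preg_quote($needle, '/') . '/',   // treat the needle literally
        function (array $m) use (&$n): string {
            return (string) $n++;               // current value, then increment
        },
        $haystack
    );
}

echo replace_with_counter('l', 'Hello World'), PHP_EOL;
```

Keeping the counter inside the function means two calls never share state, which a global $cnt would not guarantee.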

Check stock tickers in string against array

Consider the following array which holds all US stock tickers, ordered by length:
$tickers = array('AAPL', 'AA', 'BRK.A', 'BRK.B', 'BAE', 'BA'); // etc...
I want to check a string for all possible matches. Tickers are written with or without a "$" concatenated to the front:
$string = 'Check out $AAPL and BRK.A, BA and BAE.B - all going up!';
All tickers are to be labeled like: {TICKER:XX}. The expected output would be:
Check out {TICKER:AAPL} and {TICKER:BRK.A} and BAE.B - all going up!
So tickers should be checked against the $tickers array and matched both if they are followed by a space or a comma. Until now, I have been using the following:
preg_replace('/\$([a-zA-Z.]+)/', ' {TICKER:$1} ', $string);
so I didn't have to check against the $tickers array. It was assumed that all tickers started with "$", but this only appears to be the convention in about 80% of the cases. Hence, the need for an updated filter.
My question being: is there a simple way to adjust the regex to comply with the new requirement or do I need to write a new function, as I was planning first:
function match_tickers($string) {
foreach ($tickers as $ticker) {
// preg_replace with $
// preg_replace without $
}
}
Or can this be done in one go?
Just make the leading dollar sign optional, using ? (zero or 1 matches). Then you can check for legal trailing characters using the same technique. A better way to go about it would be to explode your input string and check/replace each substring against the ticker collection, then reconstruct the input string.
function match_tickers($string, $tickers) {
$replacements = array();
$words = explode( " ", $string );
foreach ($words as $word) {
// extract any ticker symbol, ignoring a leading "$" and trailing punctuation
$symbol = preg_replace( '/^\$?([A-Za-z]+(?:\.[A-Za-z]+)?)\W*$/', '$1', $word );
if (in_array($symbol, $tickers)) { // known symbol: label it
$replacements[] = preg_replace( '/^\$?([A-Za-z]+(?:\.[A-Za-z]+)?)(\W*)$/', '{TICKER:$1}$2', $word );
}
else { // not a symbol, just output it normally
$replacements[] = $word;
}
}
return implode( " ", $replacements );
}
I think just a slight change to your regex should do the trick:
\$?([a-zA-Z.]+)
I added "?" after the "\$", which means that the dollar sign can appear 0 or 1 times.
You can use a single foreach loop on your array to replace the ticker items in your string.
$tickers = array('AAPL', 'AA', 'BRK.A', 'BRK.B', 'BAE', 'BA');
$string = 'Check out $AAPL and BRK.A, BA and BAE.B - all going up!';
foreach ($tickers as $ticker) {
$string = preg_replace('/(\$?)\b('.preg_quote($ticker, '/').')\b(?!\.[A-Z])/', '{TICKER:$2}', $string);
}
echo $string;
will output
Check out {TICKER:AAPL} and {TICKER:BRK.A}, {TICKER:BA} and BAE.B - all going up!
Adding ? after the $ sign will also accept ordinary words, e.g. 'out'.
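preg_replace accepts an array as a pattern, so if you change your $tickers array to patterns that each capture the symbol:
$tickers = array('/\$?\b(AAPL)\b/', '/\$?\b(AA)\b/', '/\$?\b(BRK\.A)\b/', '/\$?\b(BRK\.B)\b/', '/\$?\b(BAE)\b/', '/\$?\b(BA)\b/');
then this should do the trick:
$string = preg_replace($tickers, '{TICKER:$1}', $string);
Without a capture group $1 would be empty, and without the word boundaries /AA/ would also match inside AAPL. Note that BAE.B still becomes {TICKER:BAE}.B with these patterns.
This is according to http://php.net/manual/en/function.preg-replace.php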
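As for doing it "in one go": one possible approach is to build a single alternation from the ticker list and run one preg_replace. The following is a sketch, not a tested production filter. The function name label_tickers is hypothetical, and the rule that a ticker must not be preceded by a letter or dot, nor followed by a letter or by ".X", is an assumption derived from the BAE.B example.

```php
<?php
// Label known tickers in $text as {TICKER:XX} in a single pass.
// Function name and boundary rules are assumptions for illustration.
function label_tickers(string $text, array $tickers): string
{
    if (count($tickers) === 0) {
        return $text;
    }
    // Longest symbols first, so "AAPL" is tried before "AA".
    usort($tickers, function (string $a, string $b): int {
        return strlen($b) - strlen($a);
    });
    // Escape regex metacharacters such as the dot in "BRK.A".
    $alternatives = array_map(function (string $t): string {
        return preg_quote($t, '/');
    }, $tickers);
    // Optional "$", then a ticker that is not part of a longer symbol.
    $pattern = '/\$?(?<![A-Za-z.])(' . implode('|', $alternatives)
             . ')(?![A-Za-z]|\.[A-Za-z])/';
    return preg_replace($pattern, '{TICKER:$1}', $text);
}

$tickers = array('AAPL', 'AA', 'BRK.A', 'BRK.B', 'BAE', 'BA');
$string  = 'Check out $AAPL and BRK.A, BA and BAE.B - all going up!';
echo label_tickers($string, $tickers), PHP_EOL;
```

Because the pattern is case-sensitive and only lists known symbols, ordinary lowercase words such as "out" are left alone even though the dollar sign is optional.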

mb_eregi_replace multiple matches get them

$string = 'test check one two test3';
$result = mb_eregi_replace ( 'test|test2|test3' , '<$1>' ,$string ,'i');
echo $result;
This should deliver: <test> check one two <test3>
Is it possible to find out that test and test3 were matched, without using another match function?
You can use preg_replace_callback instead:
$string = 'test check one two test3';
$matches = array();
$result = preg_replace_callback('/test3|test2|test/i' , function($match) use (&$matches) {
$matches[] = $match[0];
return '<'.$match[0].'>';
}, $string);
echo $result;
Here preg_replace_callback will call the passed callback function for each match of the pattern (note that its syntax differs from POSIX). In this case the callback function is an anonymous function that adds the match to the $matches array and returns the substitution string that the matches are to be replaced by.
Another approach would be to use preg_split to split the string at the matched delimiters while also capturing the delimiters:
$parts = preg_split('/(test3|test2|test)/i', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
The result is an array of alternating non-matching and matching parts.
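To make the preg_split variant concrete, here is a small sketch that rebuilds the bracketed string and collects the matched words from the alternating parts. It assumes the pattern has exactly one capturing group and lists the longer alternatives first, so that test3 is not cut short to test. The variable names $found and $result are illustrative.

```php
<?php
$string = 'test check one two test3';

// One capturing group, so every odd index holds a matched delimiter.
$parts = preg_split('/(test3|test2|test)/i', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

$found  = array();
$result = '';
foreach ($parts as $i => $part) {
    if ($i % 2 === 1) {            // a captured delimiter
        $found[] = $part;
        $result .= '<' . $part . '>';
    } else {                       // text between delimiters (may be empty)
        $result .= $part;
    }
}

echo $result, PHP_EOL;
print_r($found);
```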
As far as I know, eregi is deprecated.
You could do something like this:
<?php
$str = 'test check one two test3';
$to_match = array("test", "test2", "test3");
$rep = array();
foreach($to_match as $val){
$rep[$val] = "<$val>";
}
echo strtr($str, $rep);
?>
This too allows you to easily add more strings to replace.
The following function checks whether all of the given words occur in the string:
<?php
function searchword($string, $words)
{
$matchFound = count($words); // number of words that must be found
$tempMatch = 0;
foreach ( $words as $word )
{
preg_match('/'.preg_quote($word, '/').'/',$string,$matches);
//print_r($matches);
if(!empty($matches))
{
$tempMatch++;
}
}
if($tempMatch==$matchFound)
{
return "found";
}
else
{
return "notFound";
}
}
$string = "test check one two test3";
/*** an array of words to highlight ***/
$words = array('test', 'test3');
$string = searchword($string, $words);
echo $string;
?>
If your string is UTF-8, you could use preg_replace instead:
$string = 'test check one two test3';
$result = preg_replace('/(test|test2|test3)/ui' , '<$1>' ,$string);
echo $result;
Obviously, with this kind of data to match, the result will be suboptimal:
<test> check one two <test>3
You'll need a longer approach than a direct search and replace with regular expressions (especially if some of your patterns are prefixes of other patterns).
To begin with, the code you want to enhance does not seem to comply with its initial purpose (not at least in my computer). You can try something like this:
$string = 'test check one two test3';
$result = mb_eregi_replace('(test|test2|test3)', '<\1>', $string);
echo $result;
I've removed the i flag (which of course makes little sense here). You'd still need to make the expression greedy, though.
As for the original question, here's a little proof of concept:
function replace($match){
$GLOBALS['matches'][] = $match;
return "<$match>";
}
$string = 'test check one two test3';
$matches = array();
$result = mb_eregi_replace('(test|test2|test3)', 'replace(\'\1\')', $string, 'e');
var_dump($result, $matches);
Please note this code is horrible and potentially insecure, and the e option it relies on was removed in PHP 8.0. I'd honestly go with the preg_replace_callback() solution proposed by Gumbo.
