preg_replace does not replace everything

$content = $this->comment->getContent(true);
$bbcodes = array (
'#\[cytat=(.*?)\](.*?)\[/cytat\]#' => '<div class="cytata">\\1 napisał/a </div> <div class="cytatb">\\2</div>',
'#\[cytat\](.*?)\[/cytat\]#' => '<div class="cytata">cytat</div><div class="cytatb">\\1</div>',
);
$content = preg_replace(array_keys($bbcodes), array_values($bbcodes), $content);
This preg_replace call does not replace every tag the way it should.
For example, if there is only one tag, [cytat=author]some text[/cytat] ("cytat" means "quote" in Polish, and "napisał/a" means "wrote"), then everything is fine and the output is
<div class="cytata">author napisał/a </div> <div class="cytatb">some text</div>
But if there is more than one quote, only one tag gets replaced. For example,
<div class="cytata">o0skar napisał/a </div> <div class="cytatb">[cytat=o0skar]test nr2</div>[/cytat]
is the output for a quote nested inside another quote. Any ideas? Is something wrong?
Maybe I could put preg_replace in a while loop, but I don't know whether preg_replace returns anything I could use as the loop condition.

For the sake of regular expression awesomeness, let's look at this one. I had to change the pattern by one character: I removed one of the lazy ? quantifiers and turned this into a preg_replace_callback.
function pregcallbackfunc($matches){
$pattern = '#\[cytat=(.*?)\](.*)\[/cytat\]#';
if(preg_match($pattern, $matches[2])){
$matches[2] = preg_replace_callback($pattern,'pregcallbackfunc', $matches[2]);
}
if($matches[2]){
return '<div class="cytata">'.$matches[1].' napisał/a </div> <div class="cytatb">'.$matches[2].'</div>';
}
return '<div class="cytata">cytat</div><div class="cytatb">'.$matches[1].'</div>';
}
$content = '[cytat=o0skar][cytat=o0skar]test nr2[/cytat][/cytat]';
$content = preg_replace_callback('#\[cytat=(.*?)\](.*)\[/cytat\]#', 'pregcallbackfunc', $content);
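Making this recursive handles any level of nested quotes. Note that because (.*) is greedy, two separate quotes sitting side by side at the same level are matched as a single quote.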
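If side-by-side quotes matter as well, one possible alternative is to convert the innermost quotes first and loop until nothing is left to replace. This is only a sketch, assuming the tags are balanced; the function name cytatToHtml is made up for this illustration and the two HTML templates are merged into one callback.

```php
<?php
// Hypothetical helper (not from the original answer): converts [cytat] tags
// from the inside out, so nested and sibling quotes are both handled.
function cytatToHtml(string $content): string
{
    // Matches only a quote whose body contains no other [cytat...] or [/cytat] tag.
    $innermost = '#\[cytat(?:=([^\]]*))?\]((?:(?!\[/?cytat[=\]]).)*)\[/cytat\]#s';

    do {
        $content = preg_replace_callback(
            $innermost,
            function (array $m): string {
                // $m[1] is an empty string when the tag has no "=author" part.
                $header = $m[1] !== '' ? $m[1] . ' napisał/a ' : 'cytat';
                return '<div class="cytata">' . $header . '</div>'
                     . '<div class="cytatb">' . $m[2] . '</div>';
            },
            $content,
            -1,
            $count // number of replacements made in this pass
        );
    } while ($count > 0);

    return $content;
}

echo cytatToHtml('[cytat=a]one[/cytat] [cytat=b][cytat]deep[/cytat] two[/cytat]');
```

Each pass only touches quotes that have no quote inside them, so the loop runs once per nesting level and then stops.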

Related

preg_replace within the preg_replace

Right now I'm having trouble replacing tags inside text that has already been matched. Say I have the bbcode [b]bla[/b]; replacing [b] with <b> works. But if, for testing purposes, someone writes [b]hi [b]test[/b][/b], what comes out is "hi [b]test[/b]" with everything in bold, and the inner [b] is not replaced.
Currently this is my expression: /\[b\](.*)\[\/b\]/
Sorry, I didn't show my code at first; I'm new to this.
// Will convert string data into readable data
function ConvertStringData2ReadableData($UglyString) {
$CheckArrays = [
"QUOTE" => "/\[quote=?(.*)\](.*)\[\/quote\]/",
"BOLD" => "/\[b\](.*)\[\/b\]/",
"ITALIC" => "/\[i\](.*)\[\/i\]/",
];
$FanceString = $UglyString;
// QUOTES
do {
$FanceString = preg_replace_callback(
$CheckArrays['QUOTE'],
function($match) {
if (is_numeric($match[1])) {
$TPID = GetThreadPoster($match[1]);
$TPUN = GetUsernameS($TPID);
$statement = ('<div class="panel panel-default"><div class="panel-heading">'.$match[2].'<br>- <b>'.$TPUN.'</b></div></div>');
} elseif (!is_numeric($match[1])) {
$statement = ('<div class="panel panel-default"><div class="panel-heading">'.$match[2].'</div></div>');
}
return $statement;
},
$FanceString,
-1,
$count
);
} while ($count > 0);
// BOLD
do {
$FanceString = preg_replace($CheckArrays['BOLD'] , "<b>$1</b>" , $FanceString, -1, $count);
} while ($count > 0);
#$FanceString = preg_replace($CheckArrays['BOLD'] , "<b>$1</b>" , $FanceString, -1);
// ITALIC
do {
$FanceString = preg_replace($CheckArrays['ITALIC'] , "<i style='all: unset; font-style: italic;'>$1</i>" , $FanceString, -1, $count);
} while ($count > 0);
return($FanceString);
}

You could do something like this:
$string = '[b]hi [b]test[/b][/b]';
do {
$string = preg_replace('/\[b\](.*)\[\/b\]/', '<b>$1</b>', $string, -1, $count);
} while ($count > 0);
Or just use @Justinas' idea (from the comments on your question) if it's OK to replace all [b] with <b> and [/b] with </b>, regardless of whether they are in the right order or form pairs.
Edit: you also need to change your quote regex to this:
/\[quote(?:=(\d+))?\](.*)\[\/quote\]/s
The s flag allows . to match newlines (you probably want to add it to the other patterns too). I also fixed the part that captures the quote ID.
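To see what the revised quote pattern captures, here is a small check. The sample strings are invented for illustration; only the pattern comes from the answer.

```php
<?php
$pattern = '/\[quote(?:=(\d+))?\](.*)\[\/quote\]/s';

// With an ID and a line break inside the quote.
preg_match($pattern, "[quote=12]line one\nline two[/quote]", $m);
var_dump($m[1]);                          // string(2) "12"
var_dump($m[2] === "line one\nline two"); // bool(true)

// Without an ID: group 1 did not take part in the match, so it is an empty string.
preg_match($pattern, '[quote]hi[/quote]', $m);
var_dump($m[1]); // string(0) ""
var_dump($m[2]); // string(2) "hi"

// Same pattern without the s flag: "." stops at the newline, so nothing matches.
$noDotAll = '/\[quote(?:=(\d+))?\](.*)\[\/quote\]/';
var_dump(preg_match($noDotAll, "[quote=12]line one\nline two[/quote]")); // int(0)
```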

Because you are never going to be able to fully trust user data AND because bbcode is just as vulnerable as html to incorrect parsing by regex, you will never be 100% confident that this method will work.
Non-quote tags can just as easily be replaced by a non-regex method, so I am eliminating the pattern convolution by segmenting the logic.
I am implementing a recursive pattern for quote tags (assuming everything will be balanced) and using your do-while() technique -- I think this is the best approach. This will effectively work from outer quote inward on each iteration (while $count is positive).
Code:
function bbcodequote2html($matches){
$text=(isset($matches[2])?$matches[2]:''); // avoid Notices
if(isset($matches[1]) && ctype_digit($matches[1])){
$TPID = "#{$matches[1]}"; // GetThreadPoster($match[1]);
$TPUN = "#{$matches[1]}"; // GetUsernameS($TPID);
$quotee="<br>- <b>$TPUN</b>";
}else{
$quotee=''; // no id value or id is non-numeric default to empty string
}
return "<div class=\"panel panel-default\"><div class=\"panel-heading\">$text$quotee</div></div>";
}
$bbcode=<<<BBCODE
[quote=2]Outer Quote[b]bold [b]nested bold[/b][/b]
[i]italic [i]nested italic[/i][/i][quote]Inner Quote 1: (no id)[/quote]
[quote=bitethatapple]Inner Quote 2[quote=1]Inner Quote 3[/quote] still inner quote 2 [quote=mickmackusa]Inner Quote 4[/quote] end of inner quote 2[/quote][/quote]
BBCODE;
$converted=str_replace(
['[b]','[/b]','[i]','[/i]'],
['<b>','</b>','<i style="all:unset;font-style:italic;">','</i>'], // no backslashes needed inside single quotes
$bbcode
);
$tabs="\t";
do{
$converted=preg_replace_callback('~\[quote(?:=(.+?))?]((?:(?R)|.*?)+)\[/quote]~is','bbcodequote2html',$converted,-1,$count);
}while($count);
echo $converted;
It is difficult for me to display the output in a fashion that is easy to read. You may be best served to run my code on your server and check that the results render as desired.
Output:
<div class="panel panel-default"><div class="panel-heading">Outer Quote<b>bold <b>nested bold</b></b>
<i style="all:unset;font-style:italic;">italic <i style="all:unset;font-style:italic;">nested italic</i></i><div class="panel panel-default"><div class="panel-heading">Inner Quote 1: (no id)</div></div>
<div class="panel panel-default"><div class="panel-heading">Inner Quote 2<div class="panel panel-default"><div class="panel-heading">Inner Quote 3<br>- <b>#1</b></div></div> still inner quote 2 <div class="panel panel-default"><div class="panel-heading">Inner Quote 4</div></div> end of inner quote 2</div></div><br>- <b>#2</b></div></div>

How to replace the characters < and > with &lt; and &gt; inside a <code> </code> tag using regex?

I have a string like the one below:
<pre title="language-markup">
<code>
<div title="item_content item_view_content" itemprop="articleBody">
abc
</div>
</code>
</pre>
Inside the <code></code> tag I want to replace all < and > characters with &lt; and &gt;. How should I do it?
Example: <code> &lt;div&gt; </code>.
Please tell me if you have any ideas. Thanks, all.

Try the solution below:
$textToScan = '<pre title="language-markup">
<code>
<div title="item_content item_view_content" itemprop="articleBody">
abc
</div>
</code>
</pre>';
// the regex pattern (case-insensitive; the s flag lets . match newlines)
$search = "~<code>(.*?)</code>~is";
// first look for all CODE tags and their content
preg_match_all($search, $textToScan, $matches);
//print_r($matches);
// now replace all the CODE tags and their content with a htmlspecialchars() content
foreach($matches[1] as $match){
$replace = htmlspecialchars($match);
// now replace the previously found CODE block
$textToScan = str_replace($match, $replace, $textToScan);
}
// output result
echo $textToScan;
Output:
<pre title="language-markup">
<code>
&lt;div title=&quot;item_content item_view_content&quot; itemprop=&quot;articleBody&quot;&gt;
abc
&lt;/div&gt;
</code>
</pre>
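A possible variation on the same idea: str_replace() in the loop above replaces the matched text wherever it occurs in the string, including outside <code>. Doing the escaping inside preg_replace_callback keeps each replacement confined to its own match. This is a sketch; the function name escapeCodeBlocks is made up for illustration and it assumes <code> tags carry no attributes and are not nested.

```php
<?php
// Hypothetical helper: escapes everything between <code> and </code>.
function escapeCodeBlocks(string $html): string
{
    return preg_replace_callback(
        '~(<code>)(.*?)(</code>)~is',
        function (array $m): string {
            // Keep the tags themselves, escape only what sits between them.
            return $m[1] . htmlspecialchars($m[2], ENT_QUOTES, 'UTF-8') . $m[3];
        },
        $html
    );
}

echo escapeCodeBlocks('<pre><code><div title="x">abc</div></code></pre>');
// <pre><code>&lt;div title=&quot;x&quot;&gt;abc&lt;/div&gt;</code></pre>
```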

Don't. Use htmlspecialchars; it exists for exactly this purpose.
echo htmlspecialchars("<a href='test'>Test</a>");
Output of your HTML code
<pre title="language-markup"><code>
<div title="item_content item_view_content"
itemprop="articleBody">abc</div></code></pre>
Another example based on your comment
<code>
<?php
echo htmlspecialchars('html here');?>
</code>

Use either htmlspecialchars() or htmlentities()
$string = "<html></html>";
// Do this
$encodedString = htmlentities($string);
// or
$encodedString = htmlspecialchars($string);
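The difference between these two functions is that htmlentities() converts every character that has a named HTML entity, while htmlspecialchars() converts only the few characters that are special in HTML (&, <, > and, depending on the flags, quotes).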
Below are some quotes from PHP.net
From the PHP documentation for htmlentities:
This function is identical to htmlspecialchars() in all ways, except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities.
From the PHP documentation for htmlspecialchars:
Certain characters have special significance in HTML, and should be represented by HTML entities if they are to preserve their meanings. This function returns a string with some of these conversions made; the translations made are those most useful for everyday web programming. If you require all HTML character entities to be translated, use htmlentities() instead.
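A short comparison may make the difference concrete. The sample string is invented for illustration, and the flags and encoding are passed explicitly so the result does not depend on the PHP version's defaults.

```php
<?php
$string = "<b>caf\u{e9} & bar</b>"; // \u{e9} is "é" (this escape needs PHP 7+)

// Only the HTML-special characters are converted; "é" is left alone.
echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8'), "\n";
// &lt;b&gt;café &amp; bar&lt;/b&gt;

// Every character with a named entity is converted, including "é".
echo htmlentities($string, ENT_QUOTES, 'UTF-8'), "\n";
// &lt;b&gt;caf&eacute; &amp; bar&lt;/b&gt;
```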

OK, I kept working on my problem and managed to solve it. This is the code that fixed it. You can use my approach or Chetan Ameta's approach from the other answer:
function replaceString($string)
{
preg_match_all('/<code>(.*?)<\/code>/', $string, $matches);
$result = [];
foreach ($matches[1] as $key => $match) {
$result[$key] = str_replace(['<', '>'], ['&lt;', '&gt;'], $match);
}
return str_replace($matches[1], $result, $string);
}
$string = '<pre title="language-markup"><code><div title="item_content item_view_content" itemprop="articleBody">abc</div></code></pre>';
echo replaceString($string);
I like this place. Thanks to everyone who helped me; I'm very grateful.

Truncate Text Within Specific HTML Tag

This might not even be possible but I have quite a limited knowledge of PHP so I can't figure out if it is or not.
Basically I have a string $myText and this string outputs HTML in the following format:
<p>This is the main bit of text</p>
<small> This is some additional text</small>
My aim is to limit the number of characters displayed specifically within the <p> tag, for example 10 characters.
I have been playing around with PHP substr but I can only get this to work on all of the text, not just the text in the <p> tag.
Do you know if this is possible and if it is, do you know how to do it? Any pointers at all would be appreciated.
Thank you

The simplest solution is:
<?php
$text = '
<p>This is the main bit of text</p>
<small> This is some additional text</small>';
$pos = strpos($text,'<p>');
$pos2 = strpos($text,'</p>');
$text = '<p>' . substr($text,$pos+strlen('<p>'),10).substr($text,$pos2);
echo $text;
but it will work only for the first <p> ... </p> pair.
If you need more, you can use regular expressions:
<?php
$text = '
<p>This is the main bit of text</p>
<small> This is some additional text</small>
<p>
werwerwrewre
</p>';
preg_match_all('#<p>(.*)</p>#isU', $text, $matches);
foreach ($matches[1] as $match) {
$text = str_replace('<p>'.$match.'</p>', '<p>'.substr($match,0,10).'</p>', $text);
}
echo $text;
or even
<?php
$text = '
<p>This is the main bit of text</p>
<small> This is some additional text</small>
<p>
werwerwrewre
</p>';
$text = preg_replace_callback('#<p>(.*)</p>#isU', function($matches) {
$matches[1] = '<p>'.substr($matches[1],0,10).'</p>';
return $matches[1];
}, $text);
echo $text;
However, in all three cases whitespace counts as part of the string, so if the content of <p>...</p> starts with 3 spaces and you want to display only 3 characters, you will display just those 3 spaces and nothing more. This is easy to change, but it is worth being aware of.
One more thing: you will quite possibly need the multibyte versions of these functions to get the right result. For example, use mb_strpos() instead of strpos(), and set the encoding beforehand with mb_internal_encoding('UTF-8');
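Both caveats can be addressed in the callback version. The following is a sketch under a few assumptions: the input is valid UTF-8, the mbstring extension is loaded, and the <p> content holds plain text rather than nested tags or entities (which could otherwise be cut in half). The function name truncateParagraphs is made up for illustration.

```php
<?php
// Hypothetical helper: keeps at most $limit characters inside each <p>...</p>.
function truncateParagraphs(string $html, int $limit): string
{
    return preg_replace_callback(
        '#<p>(.*?)</p>#isu',
        function (array $m) use ($limit): string {
            $inner = trim($m[1]); // drop surrounding whitespace before counting
            // mb_substr counts characters, not bytes, so "ż" or "ó" count as one.
            return '<p>' . mb_substr($inner, 0, $limit, 'UTF-8') . '</p>';
        },
        $html
    );
}

echo truncateParagraphs("<p>  Zażółć gęślą jaźń</p>\n<small>Some additional text</small>", 6);
// <p>Zażółć</p>
// <small>Some additional text</small>
```

With plain substr() the same call would cut after 6 bytes and could split a multibyte character in the middle.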

You can achieve this quite simply:
<?php
$max_length = 5;
$input = "<b>example: </b><div align=left>this is a test</div><div>another very very long item</div>";
$elements_count = preg_match_all("|(<[^>]+>)(.*)(</[^>]+>)|U",
$input,
$out, PREG_PATTERN_ORDER);
for($i=0; $i<$elements_count; $i++){
echo $out[1][$i].substr($out[2][$i], 0, $max_length).$out[3][$i]."\n";
}
This works for any tag, with any class or attribute on it, as long as the elements are not nested.
Example input:
<b>example: </b><div align=left>this is a test</div><div>another very very long item</div>
output:
<b>examp</b>
<div align=left>this </div>
<div>anoth</div>

how do I use preg_replace_callback on unknown values?

I got some great help today with starting to understand preg_replace_callback with known values. But now I want to tackle unknown values.
$string = '<p id="keepthis"> text</p><div id="foo">text</div><div id="bar">more text</div><a id="red"href="page6.php">Page 6</a><a id="green"href="page7.php">Page 7</a>';
With that as my string, how would I use preg_replace_callback to remove the ids from all div and a tags while keeping the id on the p tag?
So from my string
<p id="keepthis"> text</p>
<div id="foo">text</div>
<div id="bar">more text</div>
<a id="red"href="page6.php">Page 6</a>
<a id="green"href="page7.php">Page 7</a>
to
<p id="keepthis"> text</p>
<div>text</div>
<div>more text</div>
<a href="page6.php">Page 6</a>
<a href="page7.php">Page 7</a>

There's no need for a callback.
$string = preg_replace('/(?<=<div|<a)( *id="[^"]+")/', ' ', $string);
However, if you do want to use preg_replace_callback:
echo preg_replace_callback(
'/(?<=<div|<a)( *id="[^"]+")/',
function ($match)
{
return " ";
},
$string
);

For your example, the following should work:
$result = preg_replace('/(<(a|div)[^>]*\s+)id="[^"]*"\s*/', '\1', $string);
In general, though, you'd better avoid parsing HTML with regular expressions and use a proper parser instead (for example, load the HTML into a DOMDocument and use the removeAttribute method). That way you can handle variations in markup and malformed HTML much better.
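The DOMDocument route might look roughly like this. It is a sketch, assuming the dom extension is available and the fragment is plain ASCII (loadHTML() needs extra care with other encodings); the function name removeIdsFromDivsAndLinks is made up for illustration.

```php
<?php
// Hypothetical helper: strips the id attribute from every <div> and <a>.
function removeIdsFromDivsAndLinks(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();

    foreach (['div', 'a'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $node->removeAttribute('id'); // no-op if the element has no id
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo removeIdsFromDivsAndLinks(
    '<p id="keepthis"> text</p><div id="foo">text</div><a id="red" href="page6.php">Page 6</a>'
);
```

Because the parser works on elements rather than text, attribute order and spacing inside the tags no longer matter.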

Stripping html tags using php

How can I strip HTML tags except for the content inside the pre tag?
Code:
$content="
<div id="wrapper">
Notes
</div>
<pre>
<div id="loginfos">asdasd</div>
</pre>
";
When I use strip_tags($content, ''), the HTML inside the pre tag is stripped too, but I don't want the HTML inside pre to be stripped.

Try :
echo strip_tags($text, '<pre>');

You may do the following:
Use preg_replace with the 'e' modifier (removed in PHP 7.0; use preg_replace_callback on current versions) to replace the contents of pre tags with placeholder strings like ###1###, ###2###, etc., while storing these contents in an array
Run strip_tags()
Run preg_replace with the 'e' modifier again to restore ###1###, etc. to the original contents.
A bit kludgy but should work.
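The three steps above can be sketched as follows. This version uses preg_replace_callback instead of the 'e' modifier and restores the placeholders with strtr(); the function name stripTagsExceptPre is made up for illustration, and the sketch assumes the text does not already contain strings like ###0###.

```php
<?php
// Hypothetical helper: strips all tags except <pre> blocks and their contents.
function stripTagsExceptPre(string $html): string
{
    $stash = [];

    // Step 1: swap every <pre>...</pre> block for a numbered placeholder.
    $masked = preg_replace_callback(
        '~<pre\b[^>]*>.*?</pre>~is',
        function (array $m) use (&$stash): string {
            $key = '###' . count($stash) . '###';
            $stash[$key] = $m[0]; // remember the whole block, tags included
            return $key;
        },
        $html
    );

    // Step 2: strip the remaining tags; the placeholders contain no markup.
    $stripped = strip_tags($masked);

    // Step 3: put the original blocks back.
    return $stash ? strtr($stripped, $stash) : $stripped;
}

echo stripTagsExceptPre(
    '<div id="wrapper">Notes</div><pre><div id="loginfos">asdasd</div></pre>'
);
// Notes<pre><div id="loginfos">asdasd</div></pre>
```

strtr() inserts the stored blocks literally, so a $ or backslash inside a pre block is not misread as a backreference the way it could be in a preg_replace replacement string.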

<?php
$document=html_entity_decode($content);
// Note: the 'e' modifier on the last pattern was removed in PHP 7.0; on current versions that entry needs preg_replace_callback instead.
$search = array ("'<script[^>]*?>.*?</script>'si","'<[/!]*?[^<>]*?>'si","'([\r\n])[\s]+'","'&(quot|#34);'i","'&(amp|#38);'i","'&(lt|#60);'i","'&(gt|#62);'i","'&(nbsp|#160);'i","'&(iexcl|#161);'i","'&(cent|#162);'i","'&(pound|#163);'i","'&(copy|#169);'i","'&#(\d+);'e");
$replace = array ("","","\\1","\"","&","<",">"," ",chr(161),chr(162),chr(163),chr(169),"chr(\\1)");
$text = preg_replace($search, $replace, $document);
echo $text;
?>

$text = 'YOUR CODE HERE';
$org_text = $text;
// hide content within pre tags
$text = preg_replace( '/(<pre[^>]*>)(.*?)(<\/pre>)/is', '$1###pre###$3', $text );
// filter content
$text = strip_tags( $text, '<pre>' );
// insert back content of pre tags
if ( preg_match_all( '/(<pre[^>]*>)(.*?)(<\/pre>)/is', $org_text, $parts ) ) {
foreach ( $parts[2] as $code ) {
$text = preg_replace( '/###pre###/', $code, $text, 1 );
}
}
print_r( $text );

OK, that leaves only one choice: regular expressions. Nobody likes them, but they get the job done. First, replace the problematic text with an unusual placeholder, like this:
$stripped = preg_replace("#<pre>(.+?)</pre>#s", "||k||", $content);
This replaces your
<pre> blah, blah, bllah....</pre>
with the placeholder. Then call
$stripped = strip_tags($stripped);
After that, you can put the original value back in place of ||k|| (or whatever placeholder you chose) and you'll get the desired result.

I think your content is not stored correctly in the $content variable: the inner double quotes end the string early.
Could you try converting the inner double quotes to single quotes?
$content="
<div id='wrapper'>
Notes
</div>
<pre>
<div id='loginfos'>asdasd</div>
</pre>
";
strip_tags($content, '<pre>');

You may do the following:
Use preg_replace with the 'e' modifier to replace the contents of pre tags with placeholder strings like ###1###, ###2###, etc., while storing these contents in an array
Run strip_tags()
Run preg_replace with the 'e' modifier again to restore ###1###, etc. to the original contents.
A bit kludgy but should work.
Could you please write the full code? I understand the idea, but something goes wrong when I try it.
