Regex Replace with Backreference modified by functions - php

I want to replace the class with the div text like this :
This: <div class="grid-flags" >FOO</div>
Becomes: <div class="iconFoo" ></div>
So the class is changed to "icon". ucfirst(strtolower(FOO)) and the text is removed
Test HTML
<div class="grid-flags" >FOO</div>
Pattern
'/class=\"grid-flags\" \>(FOO|BAR|BAZ)/e'
Replacement
'class="icon'.ucfirst(strtolower($1).'"'
This is one example of a replacement string I've tried out of seemingly hundreds. I read that the /e modifier evaluates the PHP code but I don't understand how it works in my case because I need the double quotes around the class name so I'm lost as to which way to do this.
I tried variations on the backref eg. strtolower('$1'), strtolower('\1'), strtolower('{$1}')
I've tried single and double quotes and various escaping etc and nothing has worked yet.
I even tried preg_replace_callback() with no luck
function callback($matches){
return 'class="icon"'.ucfirst(strtolower($matches[0])).'"';
}

It was difficult for me to try to work out what you meant, but I think you want something like this:
preg_replace('/class="grid-flags" \>(FOO|BAR|BAZ)/e',
'\'class="icon\'.ucfirst(strtolower("$1")).\'">\'',
$text);
Output for your example input:
<div class="iconFoo"></div>
If this isn't what you want, could you please give us some example inputs and outputs?
And I have to agree that this would be easier with an HTML parser.

Instead of using the e(valuate) option you can use preg_replace_callback().
$text = '<div class="grid-flags" >FOO</div>';
$pattern = '/class="grid-flags" >(FOO|BAR|BAZ)/';
$myCB = function($cap) {
return 'class="icon'.ucfirst($cap[1]).'" >';
};
echo preg_replace_callback($pattern, $myCB, $text);
But instead of using regular expressions you might want to consider a more suitable parser for html like simple_html_dom or php's DOM extension.

This works for me
$html = '<div class="grid-flags" >FOO</div>';
echo preg_replace_callback(
'/class *= *\"grid-flags\" *\>(FOO|BAR|BAZ)/'
, create_function( '$matches', 'return \'class="icon\' . ucfirst(strtolower($matches[1])) .\'">\'.$matches[1];' )
, $html
);
Just be aware of the problems of parsing HTML with regex.

Related

how do I use preg_replace_callback on unknown values?

I got some great help today with starting to understand preg_replace_callback with known values. But now I want to tackle unknown values.
$string = '<p id="keepthis"> text</p><div id="foo">text</div><div id="bar">more text</div><a id="red"href="page6.php">Page 6</a><a id="green"href="page7.php">Page 7</a>';
With that as my string, how would I go about using preg_replace_callback to remove all id's from divs and a tags but keeping the id in place for the p tag?
so from my string
<p id="keepthis"> text</p>
<div id="foo">text</div>
<div id="bar">more text</div>
<a id="red"href="page6.php">Page 6</a>
<a id="green"href="page7.php">Page 7</a>
to
<p id="keepthis"> text</p>
<div>text</div>
<div>more text</div>
Page 6
Page 7
There's no need of a callback.
$string = preg_replace('/(?<=<div|<a)( *id="[^"]+")/', ' ', $string);
Live demo
However in the use of preg_replace_callback:
echo preg_replace_callback(
'/(?<=<div|<a)( *id="[^"]+")/',
function ($match)
{
return " ";
},
$string
);
Demo
For your example, the following should work:
$result = preg_replace('/(<(a|div)[^>]*\s+)id="[^"]*"\s*/', '\1', $string);
Though in general you'd better avoid parsing HTML with regular expressions and use a proper parser instead (for example load the HTML into a DOMDocument and use the removeAttribute method, like in this answer). That way you can handle variations in markup and malformed HTML much better.

preg_replace id="my_id" with id=my_id

I have a HTML string containing regular HTML with ID:s and classes in the following way:
<div id="my_id"></div>
I want to use a preg_replace in PHP to minify the strings in this way:
<div id=my_id></div>
In other words I want to remove the wrapping quote characters. How do I do this?
Well, if quotes really matter, then try this:
$str = '<div id="my_id">say "good"</div><div id=\'sdafsdaf\'>la\'la</div>';
$str = preg_replace('/(<[^>]+\sid=)([\'"])([^\'"]+)\2/', '$1$3', $str);
// <div id=my_id>say "good"</div><div id=sdafsdaf>la'la</div>
var_dump($str);
But I do think save these "quotes" will just bring few benefits, 3% maybe, but compression methods, like gzip, might save 70% normally.
Try this:
preg_replace('/"/', '', $matches)

Keep all html whitespaces in php mysql

i want to know how to keep all whitespaces of a text area in php (for send to database), and then echo then back later. I want to do it like stackoverflow does, for codes, which is the best approach?
For now i using this:
$text = str_replace(' ', '&nbs p;', $text);
It keeps the ' ' whitespaces but i won't have tested it with mysql_real_escape and other "inject prevent" methods together.
For better understanding, i want to echo later from db something like:
function jack(){
var x = "blablabla";
}
Thanks for your time.
Code Blocks
If you're trying to just recreate code blocks like:
function test($param){
return TRUE;
}
Then you should be using <pre></pre> tags in your html:
<pre>
function test($param){
return TRUE;
}
</pre>
As plain html will only show one space even if multiple spaces/newlines/tabs are present. Inside of pre tags spaces will be shown as is.
At the moment your html will look something like this:
function test($param){
return TRUE;
}
Which I would suggest isn't desirable...
Escaping
When you use mysql_real_escape you will convert newlines to plain text \n or \r\n. This means that your code would output something like:
function test($param){\n return TRUE;\n}
OR
<pre>function test($param){\n return TRUE;\n}</pre>
To get around this you have to replace the \n or \r\n strings to newline characters.
Assuming that you're going to use pre tags:
echo preg_replace('#(\\\r\\\n|\\\n)#', "\n", $escapedString);
If you want to switch to html line breaks instead you'd have to switch "\n" to <br />. If this were the case you'd also want to switch out space characters with - I suggest using the pre tags.
try this, works excellently
$string = nl2br(str_replace(" ", " ", $string));
echo "$string";

Removing all links and divs of certain class

Lets say I have the following string (from a much larger string with multiple similiar strings)
$str = '<div class='testdiv remove'>randomtext</div>
<div class='testdiv'>randomtext randomtext</div>';
The class 'remove' was added through a javascript function. How would I remove all elements of the class 'remove' and all links so that the string becomes this:
$str = '<div class='testdiv'>randomtext </div>';
I can't use jquery to remove these tags since I have to feed this into a php library function. How would I remove these?
Use a dom parser http://simplehtmldom.sourceforge.net/
use regular expression :)
$pattern = "/(?:<div class='testdiv remove'>[\s\S]+?</div>|<a[^>]+>[^<]+</a>)/i"
$str = preg_replace($pattern, "", $str);

Remove all text between <hr> and <embed> tag?

<hr>I want to remove this text.<embed src="stuffinhere.html"/>
I tried using regex but nothing works.
Thanks in advance.
P.S. I tried this: $str = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $str)
You'll get a lot of advice to use an HTML parser for this kind of thing. You should do that.
The rest of this answer is for when you've decided that the HTML parser is too slow, doesn't handle ill formed (i.e. standard in the wild) HTML, or is a pain in the ass to integrate into the system you don't control. I created the following small shell script
$str = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$str = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $str);
var_dump($str);
//outputs
string(35) "<hr><embed src="stuffinhere.html"/>"
and it did remove the text, so I'd check your source documents and any other PHP code around your RegEx. You're not feeding preg_replace the string you think you are. My best guess is your source document has irregular case, or there's whitespace between the <hr /> and <embed>. Try the following regular expression instead.
$str = '<hr>I want to remove
this text.
<EMBED src="stuffinhere.html"/>';
$str = preg_replace('#(<hr>).*?(<embed)#si', '$1$2', $str);
var_dump($str);
//outputs
string(35) "<hr><EMBED src="stuffinhere.html"/>"
The "i" modifier says "make this search case insensitive". The "s" modifier says "the [.] character should also match my platform's line break/carriage return sequence"
But use a proper parser if you can. Seriously.
I think the code is self-explanatory and pretty easy to understand since it does not use regex (and it might be faster)...
$start='<hr>';
$end='<embed src="stuff...';
$str=' html here... ';
function between($t1,$t2,$page) {
$p1=stripos($page,$t1);
if($p1!==false) {
$p2=stripos($page,$t2,$p1+strlen($t1));
} else {
return false;
}
return substr($page,$p1+strlen($t1),$p2-$p1-strlen($t1));
}
$found=between($start,$end,$str);
while($found!==false) {
$str=str_replace($start.$found.$end,$start.$end,$str);
$found=between($start,$end,$str);
}
// do something with $str here...
$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$text = preg_replace('#(<hr>).*?(<embed.*?>)#', '$1$2', $text);
echo $text;
If you want to hard code src in embed tag:
$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$text = preg_replace('#(<hr>).*?(<embed src="stuffinhere.html"/>)#', '$1$2', $text);
echo $text;

Categories