I have an HTML string containing regular HTML with IDs and classes, like this:
<div id="my_id"></div>
I want to use preg_replace in PHP to minify the string like this:
<div id=my_id></div>
In other words I want to remove the wrapping quote characters. How do I do this?
Well, if quotes really matter, then try this:
$str = '<div id="my_id">say "good"</div><div id=\'sdafsdaf\'>la\'la</div>';
$str = preg_replace('/(<[^>]+\sid=)([\'"])([^\'"]+)\2/', '$1$3', $str);
// <div id=my_id>say "good"</div><div id=sdafsdaf>la'la</div>
var_dump($str);
But I think removing these quotes brings little benefit, maybe 3%, whereas a compression method like gzip normally saves around 70%.
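If you do go the regex route, one thing to watch: HTML only allows an unquoted attribute value when it is non-empty and contains no whitespace, quotes, =, <, > or backticks. A minimal sketch that leaves unsafe values quoted (unquoteIds is a made-up helper name, not something from the answer above):

```php
<?php
// Hypothetical helper: strip the quotes around id values only when that is safe.
function unquoteIds(string $html): string
{
    return preg_replace_callback(
        '/(<[^>]+\sid=)([\'"])(.*?)\2/',
        function (array $m): string {
            // An unquoted value must be non-empty and free of whitespace, quotes, =, <, > and backticks.
            $safe = preg_match('/^[^\s"\'=<>`]+$/', $m[3]) === 1;
            return $safe ? $m[1] . $m[3] : $m[0];
        },
        $html
    );
}

echo unquoteIds('<div id="my_id"></div><div id="two words"></div>');
// <div id=my_id></div><div id="two words"></div>
```

Like the pattern above, this only touches the first id attribute it finds per tag, which is normally all there is.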
Try this (note that it removes every double quote in the string, including those in text content):
$str = preg_replace('/"/', '', $str);
I got some great help today with starting to understand preg_replace_callback with known values. But now I want to tackle unknown values.
$string = '<p id="keepthis"> text</p><div id="foo">text</div><div id="bar">more text</div><a id="red"href="page6.php">Page 6</a><a id="green"href="page7.php">Page 7</a>';
With that as my string, how would I go about using preg_replace_callback to remove all ids from div and a tags while keeping the id in place for the p tag?
So from my string
<p id="keepthis"> text</p>
<div id="foo">text</div>
<div id="bar">more text</div>
<a id="red"href="page6.php">Page 6</a>
<a id="green"href="page7.php">Page 7</a>
to
<p id="keepthis"> text</p>
<div>text</div>
<div>more text</div>
<a href="page6.php">Page 6</a>
<a href="page7.php">Page 7</a>
There's no need for a callback.
$string = preg_replace('/(?<=<div|<a)( *id="[^"]+")/', ' ', $string);
However, if you do want to use preg_replace_callback:
echo preg_replace_callback(
    '/(?<=<div|<a)( *id="[^"]+")/',
    function ($match) {
        return " ";
    },
    $string
);
For your example, the following should work:
$result = preg_replace('/(<(a|div)[^>]*\s+)id="[^"]*"\s*/', '\1', $string);
Though in general you'd better avoid parsing HTML with regular expressions and use a proper parser instead (for example load the HTML into a DOMDocument and use the removeAttribute method, like in this answer). That way you can handle variations in markup and malformed HTML much better.
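A rough sketch of the DOMDocument route, assuming the markup is a fragment like the one in the question. removeIdAttributes is a hypothetical helper name, and the fragment is wrapped in a body element purely so the children can be read back afterwards:

```php
<?php
// Hypothetical helper: drop the id attribute from the given tag names only.
function removeIdAttributes(string $html, array $tags): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML('<body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($tags as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $node->removeAttribute('id');
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$string = '<p id="keepthis"> text</p><div id="foo">text</div><a id="red" href="page6.php">Page 6</a>';
echo removeIdAttributes($string, array('div', 'a'));
```

The p tag should keep its id while the div and a tags lose theirs. The serializer may insert line breaks between block elements, so don't rely on the output being byte-for-byte identical to the input minus the ids.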
How can I strip HTML tags except for the content inside the pre tag?
code
$content="
<div id="wrapper">
Notes
</div>
<pre>
<div id="loginfos">asdasd</div>
</pre>
";
When I use strip_tags($content,''), the HTML inside the pre tag is stripped too, but I don't want the HTML inside pre to be stripped.
Try :
echo strip_tags($text, '<pre>');
You may do the following:
Use preg_replace with the 'e' modifier to replace the contents of pre tags with placeholder strings like ###1###, ###2###, etc., while storing the original contents in an array.
Run strip_tags().
Run preg_replace with the 'e' modifier again to restore ###1###, etc. to the original contents.
A bit kludgy but should work.
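The steps above can be sketched as follows. This is only one way to do it: it uses preg_replace_callback rather than the 'e' modifier (which was removed in PHP 7), stripTagsExceptPre is a made-up helper name, and it assumes the input doesn't already contain text that looks like ###0###.

```php
<?php
// Hypothetical helper: strip all tags except <pre> blocks, which are kept verbatim.
function stripTagsExceptPre(string $html): string
{
    $saved = array();

    // 1. Swap each <pre>...</pre> block for a numbered placeholder and remember it.
    $masked = preg_replace_callback(
        '#<pre\b[^>]*>.*?</pre>#is',
        function (array $m) use (&$saved): string {
            $saved[] = $m[0];
            return '###' . (count($saved) - 1) . '###';
        },
        $html
    );

    // 2. Strip every remaining tag.
    $stripped = strip_tags($masked);

    // 3. Put the saved blocks back in place of their placeholders.
    return preg_replace_callback(
        '/###(\d+)###/',
        function (array $m) use ($saved): string {
            return $saved[(int) $m[1]] ?? $m[0];
        },
        $stripped
    );
}

$content = '<div id="wrapper">Notes</div><pre><div id="loginfos">asdasd</div></pre>';
echo stripTagsExceptPre($content);
// Notes<pre><div id="loginfos">asdasd</div></pre>
```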
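<?php
// Decode entities first, then strip scripts, tags, and leftover entities.
$document = html_entity_decode($content);
$search = array("'<script[^>]*?>.*?</script>'si", "'<[/!]*?[^<>]*?>'si", "'([\r\n])[\s]+'",
    "'&(quot|#34);'i", "'&(amp|#38);'i", "'&(lt|#60);'i", "'&(gt|#62);'i", "'&(nbsp|#160);'i",
    "'&(iexcl|#161);'i", "'&(cent|#162);'i", "'&(pound|#163);'i", "'&(copy|#169);'i");
$replace = array("", "", "\\1", "\"", "&", "<", ">", " ", chr(161), chr(162), chr(163), chr(169));
$text = preg_replace($search, $replace, $document);
// The /e modifier was removed in PHP 7, so numeric entities are handled with a callback instead.
$text = preg_replace_callback("'&#(\d+);'", function ($m) { return chr((int) $m[1]); }, $text);
echo $text;
?>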
$text = 'YOUR CODE HERE';
$org_text = $text;
// hide content within pre tags
$text = preg_replace( '/(<pre[^>]*>)(.*?)(<\/pre>)/is', '$1###pre###$3', $text );
// filter content
$text = strip_tags( $text, '<pre>' );
// insert back content of pre tags
if ( preg_match_all( '/(<pre[^>]*>)(.*?)(<\/pre>)/is', $org_text, $parts ) ) {
    foreach ( $parts[2] as $code ) {
        $text = preg_replace( '/###pre###/', $code, $text, 1 );
    }
}
print_r( $text );
OK, that leaves only one choice: regular expressions. Nobody likes 'em, but they sure get the job done. First, replace the problematic text with something weird, like this (the s modifier lets the match span line breaks):
$content = preg_replace("#<pre>(.+?)</pre>#s", "||k||", $content);
This will effectively change your
<pre> blah, blah, bllah....</pre>
for something else, and then call
$content = strip_tags($content);
After that, just replace ||k|| (or whatever placeholder you chose) with the original value and you'll get the desired result.
I think your content is not stored correctly in the $content variable, because the inner double quotes end the string early.
Could you try converting the inner double quotes to single quotes?
$content="
<div id='wrapper'>
Notes
</div>
<pre>
<div id='loginfos'>asdasd</div>
</pre>
";
strip_tags($content, '<pre>');
Could you please post the full code? I understand the idea, but something goes wrong when I try it.
<hr>I want to remove this text.<embed src="stuffinhere.html"/>
I tried using regex but nothing works.
Thanks in advance.
P.S. I tried this: $str = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $str)
You'll get a lot of advice to use an HTML parser for this kind of thing. You should do that.
The rest of this answer is for when you've decided that the HTML parser is too slow, doesn't handle ill-formed (i.e. standard in the wild) HTML, or is a pain in the ass to integrate into the system you don't control. I created the following small test script:
$str = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$str = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $str);
var_dump($str);
//outputs
string(35) "<hr><embed src="stuffinhere.html"/>"
and it did remove the text, so I'd check your source documents and any other PHP code around your RegEx. You're not feeding preg_replace the string you think you are. My best guess is your source document has irregular case, or there's whitespace between the <hr /> and <embed>. Try the following regular expression instead.
$str = '<hr>I want to remove
this text.
<EMBED src="stuffinhere.html"/>';
$str = preg_replace('#(<hr>).*?(<embed)#si', '$1$2', $str);
var_dump($str);
//outputs
string(35) "<hr><EMBED src="stuffinhere.html"/>"
The "i" modifier makes the search case-insensitive. The "s" modifier makes the . character also match newline characters, so the match can span several lines.
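To see that both modifiers are needed for input like this, here is a quick check with a made-up sample string. Each modifier on its own leaves the string untouched:

```php
<?php
// Sample input: line breaks inside the text and an upper-case EMBED tag.
$html = "<hr>remove\nthis\n<EMBED src=\"x.html\"/>";

$plain = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $html);   // no modifiers
$onlyI = preg_replace('#(<hr>).*?(<embed)#i', '$1$2', $html);  // . still stops at the newline
$onlyS = preg_replace('#(<hr>).*?(<embed)#s', '$1$2', $html);  // <embed does not match <EMBED
$fixed = preg_replace('#(<hr>).*?(<embed)#si', '$1$2', $html); // both: the text is removed

var_dump($plain === $html, $onlyI === $html, $onlyS === $html); // bool(true) three times
echo $fixed; // <hr><EMBED src="x.html"/>
```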
But use a proper parser if you can. Seriously.
I think the code is self-explanatory and pretty easy to understand since it does not use regex (and it might be faster)...
$start='<hr>';
$end='<embed src="stuff...';
$str=' html here... ';
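function between($t1,$t2,$page) {
    $p1=stripos($page,$t1);
    if($p1===false) {
        return false;
    }
    $p2=stripos($page,$t2,$p1+strlen($t1));
    if($p2===false) {
        return false; // no end marker after the start marker
    }
    return substr($page,$p1+strlen($t1),$p2-$p1-strlen($t1));
}
$found=between($start,$end,$str);
// Stop on an empty match too, otherwise the loop never ends once the text is gone.
while($found!==false && $found!=='') {
    // str_ireplace matches case-insensitively, like stripos above
    $str=str_ireplace($start.$found.$end,$start.$end,$str);
    $found=between($start,$end,$str);
}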
// do something with $str here...
$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$text = preg_replace('#(<hr>).*?(<embed.*?>)#', '$1$2', $text);
echo $text;
If you want to hard code src in embed tag:
$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$text = preg_replace('#(<hr>).*?(<embed src="stuffinhere.html"/>)#', '$1$2', $text);
echo $text;
I want to change
<lang class='brush:xhtml'>test</lang>
to
<pre class='brush:xhtml'>test</pre>
My code looks like this:
<?php
$content="<lang class='brush:xhtml'>test</lang>";
$pattern=array();
$replace=array();
$pattern[0]="/<lang class=([A-Za-z='\":])* </";
$replace[0]="<pre $1>";
$pattern[1]="/<lang>/";
$replace[1]="</pre>";
echo preg_replace($pattern, $replace,$content);
?>
But it's not working. What is wrong with my code, and how should I change it?
There are quite a few problems:
Pattern 0 has the * outside the group, so the group only matches one character
Pattern 0 doesn't include the class= in the group, and the replacement doesn't have it either, so there won't be a class= in the replaced string
Pattern 0 has a space after the class, but there isn't one in the content string
Pattern 1 looks for lang instead of /lang
This will work:
$pattern[0]="/<lang (class=[A-Za-z='\":]*) ?>/";
$replace[0]="<pre $1>";
$pattern[1]="/<\/lang>/";
$replace[1]="</pre>";
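Put together as a complete script, the corrected patterns look like this (same sample string as in the question, nothing else assumed):

```php
<?php
$content = "<lang class='brush:xhtml'>test</lang>";

$pattern = array(
    "/<lang (class=[A-Za-z='\":]*) ?>/", // opening tag, capturing the whole class=... attribute
    "/<\/lang>/",                        // closing tag
);
$replace = array("<pre $1>", "</pre>");

$result = preg_replace($pattern, $replace, $content);
echo $result;
// <pre class='brush:xhtml'>test</pre>
```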
How about without regex? :)
<?php
$content="<lang class='brush:xhtml'>test</lang>";
$content = html_entity_decode($content);
$content = str_replace('lang','pre',$content);
echo $content;
?>
In my benchmark, preg_replace was a lot faster than the str_replace approach above.
$str = preg_replace("/<lang class=([A-Za-z'\":]+)>(.*?)<\/lang>/", "<pre class=$1>$2</pre>", $str);
Execution time: 0.039815s
[preg_replace]
Time: 0.009518s (23.9%)
[str_replace]
Time: 0.030297s (76.1%)
Test Comparison:
[preg_replace]
compared with.........str_replace 218.31% faster
So in this test preg_replace was 218.31% faster than the str_replace method mentioned above. Each was run 1000 times.
I want to replace the class with the div text like this :
This: <div class="grid-flags" >FOO</div>
Becomes: <div class="iconFoo" ></div>
So the class is changed to "icon" followed by ucfirst(strtolower(FOO)), and the text is removed.
Test HTML
<div class="grid-flags" >FOO</div>
Pattern
'/class=\"grid-flags\" \>(FOO|BAR|BAZ)/e'
Replacement
'class="icon'.ucfirst(strtolower($1).'"'
This is one example of a replacement string I've tried out of seemingly hundreds. I read that the /e modifier evaluates the PHP code but I don't understand how it works in my case because I need the double quotes around the class name so I'm lost as to which way to do this.
I tried variations on the backref eg. strtolower('$1'), strtolower('\1'), strtolower('{$1}')
I've tried single and double quotes and various escaping etc and nothing has worked yet.
I even tried preg_replace_callback() with no luck
function callback($matches){
return 'class="icon"'.ucfirst(strtolower($matches[0])).'"';
}
It was difficult for me to work out what you meant, but I think you want something like this (note that the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0):
preg_replace('/class="grid-flags" \>(FOO|BAR|BAZ)/e',
'\'class="icon\'.ucfirst(strtolower("$1")).\'">\'',
$text);
Output for your example input:
<div class="iconFoo"></div>
If this isn't what you want, could you please give us some example inputs and outputs?
And I have to agree that this would be easier with an HTML parser.
Instead of using the e(valuate) option you can use preg_replace_callback().
$text = '<div class="grid-flags" >FOO</div>';
$pattern = '/class="grid-flags" >(FOO|BAR|BAZ)/';
$myCB = function($cap) {
    // strtolower first, so "FOO" becomes "Foo" instead of staying "FOO"
    return 'class="icon'.ucfirst(strtolower($cap[1])).'" >';
};
echo preg_replace_callback($pattern, $myCB, $text);
But instead of using regular expressions you might want to consider a more suitable parser for html like simple_html_dom or php's DOM extension.
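For comparison, a rough sketch of the same change with PHP's DOM extension. It assumes the class attribute is exactly "grid-flags" and that the fragment can be wrapped in a body element for parsing:

```php
<?php
$html = '<div class="grid-flags" >FOO</div>';

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true); // keep parser warnings quiet
$doc->loadHTML('<body>' . $html . '</body>');
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);
foreach ($xpath->query('//div[@class="grid-flags"]') as $div) {
    $flag = trim($div->textContent);
    if (!in_array($flag, array('FOO', 'BAR', 'BAZ'), true)) {
        continue; // leave unknown flags alone
    }
    $div->setAttribute('class', 'icon' . ucfirst(strtolower($flag)));
    // Remove the text (and any other children) from the div.
    while ($div->firstChild) {
        $div->removeChild($div->firstChild);
    }
}

// Serialize only the children of <body>, i.e. the original fragment.
$result = '';
foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result;
```

For the sample input, the result should contain <div class="iconFoo"></div>.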
This works for me
$html = '<div class="grid-flags" >FOO</div>';
echo preg_replace_callback(
    '/class *= *\"grid-flags\" *\>(FOO|BAR|BAZ)/',
    // create_function() was removed in PHP 8.0, so use a closure instead
    function ($matches) {
        // Return only the new class attribute, so the FOO text is dropped
        return 'class="icon' . ucfirst(strtolower($matches[1])) . '">';
    },
    $html
);
Just be aware of the problems of parsing HTML with regex.