Okay, so my question is pretty simple. I hope the answer is too.
Let's say I have the following php string:
<!DOCTYPE html>
<html>
<head>
<title>test file</title>
</head>
<body>
<div id="dynamicContent">
<myTag>PART_ONE</myTag>
<myTag>PART_TWO </myTag>
<myTag> PART_THREE</myTag>
<myTag> PART_FOUR </myTag>
</div>
</body>
</html>
Let's say this is $content.
Now, you can see I have 4 custom tags (myTag) with one word content. (PART_ONE, PART_TWO, etc.)
I want to replace those 4 with 4 different strings. Those latter 4 strings are in an array:
$replace = array("PartOne", "PartTwo", "PartThree", "PartFour");
I tried this, but it doesn't work:
$content = preg_replace("/<myTag>(.*?)<\/myTag>/s", $replace, $content);
So, I want to search for myTags (it finds 4) and replace it with one entry of the array. The first occurrence should be replaced by $replace[0], the second by $replace[1], etc.
Then, it will return the "new" content as a string (not as an array) so I can use it for further parsing.
How can I achieve this?
Something like the following should work:
$replace = array("PartOne", "PartTwo", "PartThree", "PartFour");
if (preg_match_all("/(<myTag>)(.*?)(<\/myTag>)/s", $content, $matches)) {
for ($i = 0; $i < count($matches[0]); $i++) {
$content = str_replace($matches[0][$i], $matches[1][$i] . $replace[$i] . $matches[3][$i], $content);
}
}
One approach would be to loop over each element in the array you want to replace with; replace the words myTag with myDoneTag or something for each one you finished, so you find the next one. Then you can always put back myTag at the end, and you have your string:
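for ($ii = 0; $ii < count($replace); $ii++) {
// non-greedy .*? plus a limit of 1: only the first remaining <myTag> is rewritten
$content = preg_replace("/<myTag>.*?<\/myTag>/s", "<myDoneTag>".$replace[$ii]."</myDoneTag>", $content, 1);
}
$content = preg_replace("/myDoneTag/s", "myTag", $content);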
With regexes, you could do something like this:
$replaces = array('foo','bar','foz','bax');
$callback = function($match) use ($replaces) {
static $counter = 0;
$return = $replaces[$counter % count($replaces)];
$counter++;
return $return;
};
var_dump(preg_replace_callback('/a/',$callback, 'a a a a a '));
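Applied to the markup from the question, the same counter idea might look like the sketch below. The shortened $content, the by-reference counter $i, and the choice to leave a tag untouched once the replacements run out are assumptions made for illustration, not part of the original answer.

```php
<?php
// Shortened sample input; the real $content would be the full HTML string.
$content = "<div><myTag>PART_ONE</myTag>\n<myTag> PART_TWO </myTag></div>";
$replace = array("PartOne", "PartTwo");

$i = 0; // index of the next replacement to hand out
$result = preg_replace_callback(
    '/(<myTag>).*?(<\/myTag>)/s',
    function ($m) use ($replace, &$i) {
        if (!isset($replace[$i])) {
            return $m[0]; // no replacement left: keep the tag as it is
        }
        // keep the opening and closing tag, swap only the text in between
        return $m[1] . $replace[$i++] . $m[2];
    },
    $content
);

echo $result;
```

The result is still a single string, so it can be passed on for further parsing.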
But really, when searching for tags in html or xml, you want a parser:
$html = '<!DOCTYPE html>
<html>
<head>
<title>test file</title>
</head>
<body>
<div id="dynamicContent">
<myTag>PART_ONE</myTag>
<myTag>PART_TWO </myTag>
<myTag> PART_THREE</myTag>
<myTag> PART_FOUR </myTag>
</div>
</body>
</html>';
$d = new DOMDocument();
$d->loadHTML($html);
$counter = 0;
foreach($d->getElementsByTagName('mytag') as $node){
$node->nodeValue = $replaces[$counter++ % count($replaces)];
}
echo $d->saveHTML();
This should be the syntax you're looking for:
$patterns = array('/PART_ONE/', '/PART_TWO/', '/PART_THREE/', '/PART_FOUR/');
$replaces = array('part one', 'part two', 'part three', 'part four');
$text = preg_replace($patterns, $replaces, $text);
But be warned: the patterns are applied sequentially, so if the replacement text for 'PART_ONE' contains 'PART_TWO', that will be replaced in turn by the next pattern.
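If the search terms are fixed strings rather than patterns, strtr() with an array is one way to sidestep that sequential behaviour, since it does not rescan text it has already replaced. A minimal sketch with made-up sample data:

```php
<?php
$text = 'PART_ONE and PART_TWO';

// Sequential: the output of the first pattern is fed to the second one.
$sequential = preg_replace(
    array('/PART_ONE/', '/PART_TWO/'),
    array('see PART_TWO', 'part two'),
    $text
);

// Single pass: replaced text is never searched again.
$singlePass = strtr($text, array(
    'PART_ONE' => 'see PART_TWO',
    'PART_TWO' => 'part two',
));

echo $sequential, "\n"; // see part two and part two
echo $singlePass, "\n"; // see PART_TWO and part two
```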
I have an HTML template (as a single string) that contains a varying number of keys enclosed in ### characters. For instance, these keys could be ###textItem1###, ###textItem2### and so on ...
Now, how do I find all keys that are enclosed in ### in that HTML template/string? I want to read the keys, save them in an array and then loop through the array in order to replace the keys by a proper text item (that is also represented by the same key, but in another array).
I'm working with PHP.
Thanks!
You can use regular expressions with PHP's preg_match_all function:
$pattern = '/###(.+?)###/';
$string = 'This is a text with ###textItem1### and ###textItem2### in it. It also has ###textItem3### and ###textItem4### as well';
preg_match_all($pattern, $string, $matches);
print_r($matches[1]);
PHPFiddle Link: http://phpfiddle.org/main/code/psad-tq9r
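Since the goal is to replace each key with a text item from another array, finding and replacing can also be done in one step with preg_replace_callback. A rough sketch; the $template and $items values are invented for the example, and leaving unknown keys in place is an assumption about the desired behaviour:

```php
<?php
$template = 'Hello ###name###, you have ###count### messages. ###unknown###';
$items = array('name' => 'Ann', 'count' => '3');

$out = preg_replace_callback('/###(.+?)###/', function ($m) use ($items) {
    // $m[1] is the key without the ### markers; unknown keys stay as they are
    return array_key_exists($m[1], $items) ? $items[$m[1]] : $m[0];
}, $template);

echo $out; // Hello Ann, you have 3 messages. ###unknown###
```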
This works too.
$string = 'hello, this is [#firstname], i am [#age] years old';
preg_match_all('~\[#(.+?)\]~', $string, $matches);
var_dump( $matches );
You can create a custom function like this
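function getdatabetween($string, $start, $end){
// position just after the opening delimiter
$sp = strpos($string, $start);
if ($sp === false) return '';
$sp += strlen($start);
// look for the closing delimiter only after the opening one
$ep = strpos($string, $end, $sp);
if ($ep === false) return '';
return trim(substr($string, $sp, $ep - $sp));
}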
echo getdatabetween(" ###textItem1###","###", "###");
You can do this with preg_match_all
for example this is your template code
<?php
$string = '
<html>
<head>
<title>###title###</title>
</head>
<body>
###content###
</body>
</html>
';
and this is the data that you want to replace
$data = array("title" => 'hello world', 'content' => 'Page content here ....');
you can replace it like this
function getTemplate($string, $data){
preg_match_all("/###[a-z0-9_-]+###/i", $string, $matches);
foreach ($matches[0] as $match) {
$key = str_replace('#', '', $match);
// leave the placeholder untouched when $data has no entry for it
if (isset($data[$key])) $string = str_replace($match, $data[$key], $string);
}
return $string;
}
echo getTemplate($string, $data);
output
<html>
<head>
<title>hello world</title>
</head>
<body>
Page content here ....
</body>
</html>
"Consider this as an example and extract the image name from this string" (this sentence is part of the input as well).
<img src='http://www.example.com/Imagename.jpg' height='400px' width='600px'>
------->
Work on the material provided above, including the sentence: I have a combination of plain text and a tag, and I want to extract only the filename from all of it.
How can I do this?
You can use a simple regex to get it from the string, or use SimpleXMLElement
Example
<?php
$input = "<img src='http://www.example.com/Imagename.jpg' height='400px' width='600px' />";
$element= new SimpleXMLElement($input);
var_dump(basename((string) $element->attributes()->src));
Will output the desired result: string 'Imagename.jpg' (length=13)
Example using a regular expression (I recommend DOM or SimpleXMLElement as shown above, but if you want something simpler, a regex will do):
<?php
preg_match("/src='(?P<url>([^']*?))'/", $input, $matches);
$filename = isset($matches['url']) ? basename($matches['url']) : null;
var_dump($filename);
It will produce the same output.
And a full example, if you want to scrape every image from a whole HTML document:
<?php
$html = <<<HTML
<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8">
</head>
<body>
lalala
<img src='http://www.example.com/Imagename.jpg' height='400px' width='600px' />
</body>
</html>
HTML;
function withEachImage ($html, callable $callback) {
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
foreach ($document->getElementsByTagName('img') as $img) {
call_user_func_array($callback, array($img->getAttribute('src')));
}
}
withEachImage($html, function ($src) {
echo basename($src);
});
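One caveat with calling basename() on the raw src value is that a query string would end up in the result. A small sketch of a helper that strips it first; imageFileName is a hypothetical name and the URL with ?v=2 is an invented example:

```php
<?php
// Returns the file name of an image URL, ignoring any query string or fragment.
function imageFileName($src) {
    $path = parse_url($src, PHP_URL_PATH); // null or false when there is no usable path
    return is_string($path) ? basename($path) : null;
}

echo imageFileName('http://www.example.com/Imagename.jpg?v=2'); // Imagename.jpg
```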
There are a number of ways to go about this. Assuming that the image filename could change, or that you may alternate between " and ', this is how I would do it:
function getFileName($myImage){
$myArray = explode(" ",$myImage);
$subArray = explode("/",$myArray[1]);
return substr($subArray[count($subArray)-1],0,-1);
}
$source = "<img src='http://www.example.com/Imagename.jpg' height='400px' width='600px'>";
$myImg = getFileName($source);
print $myImg;
Code has been tested and works
Output: Imagename.jpg
You could even make it check for 'src', so it doesn't matter what order the attributes are in:
function getFileName($myImage){
$myArray = explode(" ",$myImage);
$useKey = 0;
foreach($myArray as $key => $value){
if(substr($value,0,3)=="src"){
$useKey = $key;
}
}
$subArray = explode("/",$myArray[$useKey]);
return substr($subArray[count($subArray)-1],0,-1);
}
$source = "<img height='400px' src='http://www.example.com/Imagename.jpg' width='600px'>";
$myImg = getFileName($source);
print $myImg;
I am using HTML Purifier (http://htmlpurifier.org/)
I just want to remove <script> tags only.
I don't want to remove inline formatting or any other things.
How can I achieve this?
One more thing: is there any other way to remove script tags from HTML?
Because this question is tagged with regex I'm going to answer with poor man's solution in this situation:
$html = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html);
However, regular expressions are not meant for parsing HTML/XML. Even a perfect expression will break eventually, so it's not worth it. In some cases a regex is useful for quickly fixing some markup, but as with all quick fixes, forget about security. Use regex only on content/markup you trust.
Remember, anything that user inputs should be considered not safe.
Better solution here would be to use DOMDocument which is designed for this.
Here is a snippet that demonstrates how easy, clean (compared to regex), (almost) reliable and (nearly) safe it is to do the same:
<?php
$html = <<<HTML
...
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$script = $dom->getElementsByTagName('script');
$remove = [];
foreach($script as $item)
{
$remove[] = $item;
}
foreach ($remove as $item)
{
$item->parentNode->removeChild($item);
}
$html = $dom->saveHTML();
I have removed the HTML intentionally because even this can bork.
Use the PHP DOMDocument parser.
$doc = new DOMDocument();
// load the HTML string we want to strip
$doc->loadHTML($html);
// get all the script tags
$script_tags = $doc->getElementsByTagName('script');
$length = $script_tags->length;
// for each tag, remove it from the DOM
for ($i = $length - 1; $i >= 0; $i--) { // walk backwards: the node list is live and shrinks as nodes are removed
$script_tags->item($i)->parentNode->removeChild($script_tags->item($i));
}
// get the HTML string back
$no_script_html_string = $doc->saveHTML();
This worked for me using the following HTML document:
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>
hey
</title>
<script>
alert("hello");
</script>
</head>
<body>
hey
</body>
</html>
Just bear in mind that the DOMDocument parser requires PHP 5 or greater.
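One thing to watch with this approach is that getElementsByTagName() returns a live list: it shrinks as nodes are removed, so index-based loops can skip or miss elements. A minimal sketch that always removes the first remaining element instead; the sample $html is made up, and silencing libxml warnings is an optional extra:

```php
<?php
$html = '<p>a</p><script>1</script><script>2</script><p>b</p>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$scripts = $doc->getElementsByTagName('script');
// The list is live, so keep taking item(0) until nothing is left.
while ($scripts->length > 0) {
    $node = $scripts->item(0);
    $node->parentNode->removeChild($node);
}

$clean = $doc->saveHTML();
echo $clean;
```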
$html = <<<HTML
...
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$tags_to_remove = array('script','style','iframe','link');
foreach($tags_to_remove as $tag){
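$element = $dom->getElementsByTagName($tag);
// copy the live node list first; removing nodes while iterating it skips elements
foreach(iterator_to_array($element) as $item){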
$item->parentNode->removeChild($item);
}
}
$html = $dom->saveHTML();
A simple way by manipulating string.
function stripStr($str, $ini, $fin)
{
while (($pos = mb_stripos($str, $ini)) !== false) {
$aux = mb_substr($str, $pos + mb_strlen($ini));
$str = mb_substr($str, 0, $pos);
if (($pos2 = mb_stripos($aux, $fin)) !== false) {
$str .= mb_substr($aux, $pos2 + mb_strlen($fin));
}
}
return $str;
}
Shorter:
$html = preg_replace("/<script.*?\/script>/s", "", $html);
Regex calls can fail, so it's safer to do it like this:
$html = preg_replace("/<script.*?\/script>/s", "", $html) ? : $html;
That way, when the "accident" happens and preg_replace returns null, we keep the original $html.
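Note that the short ternary also falls back to the original input when the result is a legitimately empty string, for example when the input consisted of nothing but a script tag. A sketch that checks for null explicitly instead; stripScripts is a hypothetical helper name:

```php
<?php
// Removes <script>...</script> blocks; falls back to the input only on a regex error.
function stripScripts($html) {
    $result = preg_replace('/<script\b.*?<\/script>/is', '', $html);
    // preg_replace returns null on failure; preg_last_error() holds the reason
    return ($result === null) ? $html : $result;
}

var_dump(stripScripts('a<script>x</script>b')); // string(2) "ab"
var_dump(stripScripts('<script>x</script>'));   // string(0) ""
```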
this is a merge of both ClandestineCoder & Binh WPO.
the problem with the script tag arrows is that they can have more than one variant
ex. (< = &lt; = &amp;lt;) & (> = &gt; = &amp;gt;)
so instead of creating a pattern array with like a bazillion variant,
imho a better solution would be
return preg_replace('/script.*?\/script/ius', '', $text)
? preg_replace('/script.*?\/script/ius', '', $text)
: $text;
this will remove anything that look like script.../script regardless of the arrow code/variant and u can test it in here https://regex101.com/r/lK6vS8/1
Try this complete and flexible solution. It is based in part on some previous answers, but adds validation checks and gets rid of the additional HTML implied by the loadHTML(...) function. It is divided into two separate functions (the second depends on the first, so don't reorder them) so you can use it to remove several kinds of HTML tags at once (i.e. not just 'script' tags). For example, the removeAllInstancesOfTag(...) function accepts an array of tag names, or optionally just one as a string. So, without further ado, here is the code:
/* Remove all instances of a particular HTML tag (e.g. <script>...</script>) from a variable containing raw HTML data. [BEGIN] */
/* Usage Example: $scriptless_html = removeAllInstancesOfTag($html, 'script'); */
if (!function_exists('removeAllInstancesOfTag'))
{
function removeAllInstancesOfTag($html, $tag_nm)
{
if (!empty($html))
{
$html = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'); /* For UTF-8 Compatibility. */
$doc = new DOMDocument();
$doc->loadHTML($html,LIBXML_HTML_NOIMPLIED|LIBXML_HTML_NODEFDTD|LIBXML_NOWARNING);
if (!empty($tag_nm))
{
if (is_array($tag_nm))
{
$tag_nms = $tag_nm;
unset($tag_nm);
foreach ($tag_nms as $tag_nm)
{
$rmvbl_itms = $doc->getElementsByTagName(strval($tag_nm));
$rmvbl_itms_arr = [];
foreach ($rmvbl_itms as $itm)
{
$rmvbl_itms_arr[] = $itm;
}
foreach ($rmvbl_itms_arr as $itm)
{
$itm->parentNode->removeChild($itm);
}
}
}
else if (is_string($tag_nm))
{
$rmvbl_itms = $doc->getElementsByTagName($tag_nm);
$rmvbl_itms_arr = [];
foreach ($rmvbl_itms as $itm)
{
$rmvbl_itms_arr[] = $itm;
}
foreach ($rmvbl_itms_arr as $itm)
{
$itm->parentNode->removeChild($itm);
}
}
}
return $doc->saveHTML();
}
else
{
return '';
}
}
}
/* Remove all instances of a particular HTML tag (e.g. <script>...</script>) from a variable containing raw HTML data. [END] */
/* Remove all instances of dangerous and pesky <script> tags from a variable containing raw user-input HTML data. [BEGIN] */
/* Prerequisites: 'removeAllInstancesOfTag(...)' */
if (!function_exists('removeAllScriptTags'))
{
function removeAllScriptTags($html)
{
return removeAllInstancesOfTag($html, 'script');
}
}
/* Remove all instances of dangerous and pesky <script> tags from a variable containing raw user-input HTML data. [END] */
And here is a test usage example:
$html = 'This is a JavaScript retention test.<br><br><span id="chk_frst_scrpt">Congratulations! The first \'script\' tag was successfully removed!</span><br><br><span id="chk_secd_scrpt">Congratulations! The second \'script\' tag was successfully removed!</span><script>document.getElementById("chk_frst_scrpt").innerHTML = "Oops! The first \'script\' tag was NOT removed!";</script><script>document.getElementById("chk_secd_scrpt").innerHTML = "Oops! The second \'script\' tag was NOT removed!";</script>';
echo removeAllScriptTags($html);
I hope my answer really helps someone. Enjoy!
An example modifying ctf0's answer. This should only do the preg_replace once, but also check for errors and block the character codes for a forward slash.
$str = '<script> var a - 1; </script>';
$pattern = '/(script.*?(?:\/|&#47;|&#x0002F;)script)/ius';
$replace = preg_replace($pattern, '', $str);
return ($replace !== null)? $replace : $str;
If you are using php 7 you can use the null coalesce operator to simplify it even more.
$pattern = '/(script.*?(?:\/|&#47;|&#x0002F;)script)/ius';
return (preg_replace($pattern, '', $str) ?? $str);
function remove_script_tags($html){
$dom = new DOMDocument();
$dom->loadHTML($html);
$script = $dom->getElementsByTagName('script');
$remove = [];
foreach($script as $item){
$remove[] = $item;
}
foreach ($remove as $item){
$item->parentNode->removeChild($item);
}
$html = $dom->saveHTML();
$html = preg_replace('/<!DOCTYPE.*?<html>.*?<body><p>/ims', '', $html);
$html = str_replace('</p></body></html>', '', $html);
return $html;
}
Dejan's answer was good, but saveHTML() adds unnecessary doctype and body tags, this should get rid of it. See https://3v4l.org/82FNP
I would use BeautifulSoup if it's available. Makes this sort of thing very easy.
Don't try to do it with regexps. That way lies madness.
I had been struggling with this question. I discovered you only really need one function. explode('>', $html); The single common denominator to any tag is < and >. Then after that it's usually quotation marks ( " ). You can extract information so easily once you find the common denominator. This is what I came up with:
$html = file_get_contents('http://some_page.html');
$h = explode('>', $html);
foreach($h as $k => $v){
$v = trim($v);//clean it up a bit
if(preg_match('/^(<script[.*]*)/ius', $v)){//my regex here might be questionable
$counter = $k;//match opening tag and start counter for backtrace
}elseif(preg_match('/([.*]*<\/script$)/ius', $v)){//but it gets the job done
$script_length = $k - $counter;
$counter = 0;
for($i = $script_length; $i >= 0; $i--){
$h[$k-$i] = '';//backtrace and clear everything in between
}
}
}
for($i = 0; $i < count($h); $i++){
if($h[$i] != ''){
$ht[$i] = $h[$i];//clean out the blanks so when we implode it works right.
}
}
$html = implode('>', $ht);//all scripts stripped.
echo $html;
I see this really only working for script tags because you will never have nested script tags. Of course, you can easily add more code that does the same check and gather nested tags.
I call it accordion coding. implode();explode(); are the easiest ways to get your logic flowing if you have a common denominator.
This is a simplified variant of Dejan Marjanovic's answer:
function removeTags($html, $tag) {
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach (iterator_to_array($dom->getElementsByTagName($tag)) as $item) {
$item->parentNode->removeChild($item);
}
return $dom->saveHTML();
}
Can be used to remove any kind of tag, including <script>:
$scriptlessHtml = removeTags($html, 'script');
use the str_replace function to replace them with empty space or something
$query = '<script>console.log("I should be banned")</script>';
$badChar = array('<script>','</script>');
$query = str_replace($badChar, '', $query);
echo $query;
//this echoes console.log("I should be banned")
?>
Basically I want to turn a string like this:
<code> &lt;div&gt; blabla &lt;/div&gt; </code>
into this:
&lt;code&gt; <div> blabla </div> &lt;/code&gt;
How can I do it?
The use case (bc some people were curious):
A page like this with a list of allowed HTML tags and examples. For example, <code> is a allowed tag, and this would be the sample:
<code>&lt;?php echo "Hello World!"; ?&gt;</code>
I wanted a reverse function because there are many such tags with samples that I store them all into a array which I iterate in one loop, instead of handling each one individually...
My version using regular expressions:
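$string = '<code> &lt;div&gt; blabla &lt;/div&gt; </code>';
// the /e modifier was removed in PHP 7, so a callback does the evaluation instead
$new_string = preg_replace_callback(
'/(.*?)(<.*?>|$)/s',
function ($m) { return html_entity_decode($m[1]) . htmlentities($m[2]); },
$string
);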
It tries to match every tag and textnode and then apply htmlentities and html_entity_decode respectively.
There isn't an existing function, but have a look at this.
So far I've only tested it on your example, but this function should work on all htmlentities
function html_entity_invert($string) {
$matches = $store = array();
preg_match_all('/(&(#?\w){2,6};)/', $string, $matches, PREG_SET_ORDER);
foreach ($matches as $i => $match) {
$key = '__STORED_ENTITY_' . $i . '__';
$store[$key] = html_entity_decode($match[0]);
$string = str_replace($match[0], $key, $string);
}
return str_replace(array_keys($store), $store, htmlentities($string));
}
Update:
Thanks to #Mike for taking the time to test my function with other strings. I've updated my regex from /(\&(.+)\;)/ to /(\&([^\&\;]+)\;)/ which should take care of the issue he raised.
I've also added {2,6} to limit the length of each match to reduce the possibility of false positives.
Changed regex from /(\&([^\&\;]+){2,6}\;)/ to /(&([^&;]+){2,6};)/ to remove unnecessary escaping.
Whooa, brainwave! Changed the regex from /(&([^&;]+){2,6};)/ to /(&(#?\w){2,6};)/ to reduce probability of false positives even further!
Replacing alone will not be good enough for you, whether with regular expressions or simple string replacing: if you replace the &lt; &gt; signs and then the < and > signs (or vice versa), you will end up with only one encoding (all < and >, or all &lt; and &gt; signs).
So if you want to do this, you will have to parse out one set (I chose to replace with a place holder) do a replace then put them back in and do another replace.
$str = "<code> &lt;div&gt; blabla &lt;/div&gt; </code>";
$search = array("<",">",);
//place holder for < and >
$replace = array("[","]");
//first replace to sub out < and > for [ and ] respectively
$str = str_replace($search, $replace, $str);
//second replace to get rid of original &lt; and &gt;
$search = array("&lt;","&gt;");
$replace = array("<",">",);
$str = str_replace($search, $replace, $str);
//third replace to turn [ and ] into &lt; and &gt;
$search = array("[","]");
$replace = array("&lt;","&gt;");
$str = str_replace($search, $replace, $str);
echo $str;
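The placeholder trick breaks if the text itself contains [ or ]. A possible alternative, sketched here under the assumption that tags never contain a literal > inside attribute values, is to split the string into tags and text and treat each piece separately; invertEntities is a hypothetical name:

```php
<?php
// Encodes real tags and decodes entity-encoded text in between them.
function invertEntities($s) {
    // With PREG_SPLIT_DELIM_CAPTURE the tags land at odd indexes, text at even ones.
    $parts = preg_split('/(<[^>]*>)/', $s, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        $out .= ($i % 2 === 1) ? htmlentities($part) : html_entity_decode($part);
    }
    return $out;
}

echo invertEntities('<code> &lt;div&gt; blabla &lt;/div&gt; </code>');
// &lt;code&gt; <div> blabla </div> &lt;/code&gt;
```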
I think I have a small solution: why not break the HTML tags into an array, then compare and change each piece if needed?
function invertHTML($str) {
$res = array();
for ($i=0, $j=0; $i < strlen($str); $i++) {
if ($str[$i] == "<") { // square brackets: the {} string offset syntax was removed in PHP 8
if (isset($res[$j]) && strlen($res[$j]) > 0){
$j++;
$res[$j] = '';
} else {
$res[$j] = '';
}
$pos = strpos($str, ">", $i);
$res[$j] .= substr($str, $i, $pos - $i+1);
$i += ($pos - $i);
$j++;
$res[$j] = '';
continue;
}
$res[$j] .= $str[$i];
}
$newString = '';
foreach($res as $html){
$change = html_entity_decode($html);
if($change != $html){
$newString .= $change;
} else {
$newString .= htmlentities($html);
}
}
return $newString;
}
Modified .... with no errors.
So, although other people on here have recommended regular expressions, which may be the absolute right way to go ... I wanted to post this, as it is sufficient for the question you asked.
Assuming that you are always using html'esque code:
$str = '<code> &lt;div&gt; blabla &lt;/div&gt; </code>';
xml_parse_into_struct(xml_parser_create(), $str, $nodes);
$xmlArr = array();
foreach($nodes as $node) {
echo htmlentities('<' . $node['tag'] . '>') . html_entity_decode($node['value']) . htmlentities('</' . $node['tag'] . '>');
}
Gives me the following output:
&lt;CODE&gt; <div> blabla </div> &lt;/CODE&gt;
Fairly certain that this wouldn't support going backwards again .. as other solutions posted, would, in the sense of:
$orig = '<code> &lt;div&gt; blabla &lt;/div&gt; </code>';
$modified = '&lt;CODE&gt; <div> blabla </div> &lt;/CODE&gt;';
$modifiedAgain = '<code> &lt;div&gt; blabla &lt;/div&gt; </code>';
I'd recommend using a regular expression, e.g. preg_replace():
http://www.php.net/manual/en/function.preg-replace.php
http://www.webcheatsheet.com/php/regular_expressions.php
http://davebrooks.wordpress.com/2009/04/22/php-preg_replace-some-useful-regular-expressions/
Edit: It appears that I haven't fully answered your question. There is no built-in PHP function to do what you want, but you can do find and replace with regular expressions or even simple expressions: str_replace, preg_replace
Searching Stack Overflow I've found an answer for my need, but I can't figure out exactly how to use it. If someone could give me a hint, it would be appreciated!
Here's my need: I'm using WordPress and I would like to add automatic IDs to <...> tags, so I found this answer by "mario":
If you have a coherent input like
that, then you can use regular
expressions. In this case it's both
very acceptable and simple:
$html = preg_replace_callback("#<(h[1-6])>(.*?)</\\1>#", "retitle", $html);
function retitle($match) {
list($_unused, $h2, $title) = $match;
$id = strtolower(strtr($title, " .", "--"));
return "<$h2 id='$id'>$title</$h2>"; }
The id conversion needs a bit more work. And to make the regex more reliable the inner text match pattern (.*?) could be written as ([^<>]*) for example.
H2 tag auto ID in php string
So I've tried to apply this to my script, but it doesn't work well at all. Here is my code:
<?php
$html = get_the_content();
$html = preg_replace_callback("#<(h[1-6])>(.*?)</\\1>#", "retitle", $html);
function retitle($match) {
list($_unused, $h2, $title) = $match;
$id = strtolower(strtr($title, " .", "--"));
return "<$h2 id='$id'>$title</$h2>";
}
if(have_posts()) : while(have_posts()) : the_post(); // Checks that content exists
echo $html;
endwhile;
endif;
?>
Don't use regex to solve that problem. Using domdocument:
if (empty($content)) return '';
$dom = new DomDocument();
libxml_use_internal_errors(true);
$html = '<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>'.$content.'</body>
</html>';
$dom->loadHtml($html);
$hTAGs = $dom->getElementsByTagName($tag);
foreach ($hTAGs as $hTAG) {
if (!$hTAG->hasAttribute('id')) {
$title = $hTAG->nodeValue;
$id = iconv('UTF-8', 'ASCII//TRANSLIT', $title);
$id = preg_replace('/[^a-zA-Z0-9-\s]/', '', $id);
$hTAG->setAttribute('id', $id);
}
}
$content = '';
$children = $dom->getElementsByTagName('body')->item(0)->childNodes;
foreach ($children as $child) {
$content .= $dom->saveXml($child);
}
return $content;
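The id built above can still contain spaces and mixed case, since the character class keeps \s. A stricter conversion might look like this sketch; slugify is a hypothetical helper, and collapsing every run of non-alphanumeric characters into a single hyphen is just one possible convention:

```php
<?php
// Turns a heading title into a lowercase, hyphen-separated id.
function slugify($title) {
    $id = strtolower(trim(strip_tags($title)));
    // collapse every run of characters other than a-z and 0-9 into one hyphen
    $id = preg_replace('/[^a-z0-9]+/', '-', $id);
    return trim($id, '-');
}

echo slugify('Hello World. Again!'); // hello-world-again
```

The result could be passed to setAttribute('id', ...) in place of the raw title; non-ASCII titles would still need the iconv step first.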
Never, ever use RegEx for HTML, ok? Just accept this. Or read the numerous posts on here why not.
DOMDocument is ugly and evil. Use simple_html_dom instead, it's much simpler:
include 'simple_html_dom.php';
$html = str_get_html('<h2>hello</h2><h3>world</h3><h2 id="123">how r ya</h2>');
$h2s = $html->find("h2");
foreach($h2s as $h2)
{
if(!$h2->hasAttribute("id")) $h2->id = "title";
}
echo $html->save();