Php regex replace elements - php

I'm trying to change HTML elements in PHP.
The first task is to replace a textarea with an h1.
The element that needs to be replaced looks something like this:
<textarea class="head" id="hd_x">Random headline</textarea>
I'm trying to change it to this:
<h1 class="head" id="hd_x">Random headline</h1>
The random headline can be anything, for example "Dogs like cats" or "Cats don't like dogs".
The x in the id can be any number: hd_1, hd_2 and so on (but I don't think it needs to be touched, so it can be ignored).
The second task is to replace a textarea with a p. The original looks something like this:
<textarea class="text" id="txt_x">Random text</textarea>
I'm trying to change it to this:
<p class="text" id="txt_x">Random text</p>
The random text and x work the same way here as in the first case.
If you can figure out what I'm trying to do, and it is possible and short, it would be nice if you could help me with just the h1 part. I think I can figure out the <p> (second) part myself.
I tried to do it with str_replace, but the problem is that it then always replaces </textarea> with </h1>, or always with </p>.
Thank you.
My idea is that I need two separate preg_replace calls. The first one recognizes this part:
<textarea class="head"
knows that it needs to replace it with:
<h1 class="head"
Then it skips over this part:
id="hd_x">Random headline
then preg_replace recognizes this one:
</textarea>
and replaces it with:
</h1>
To keep it short, it finds this (the ???? part should be ignored and left untouched):
<textarea class="head" ??????????????????</textarea>
and replaces it with the h1 version, leaving the ????? part untouched.
I think class="head" is needed because that is how the preg_replace pattern knows that it needs to replace with h1 and not with p.
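For markup that always has exactly the shape shown above, the idea of matching the opening tag, keeping the middle, and rewriting the closing tag could be sketched with a single preg_replace and two capture groups. This is only a sketch under that assumption: the function name replaceHeadTextarea is made up for illustration, and the pattern will misbehave on nested or differently formatted markup.

```php
<?php
// Hypothetical helper (not from the original post): rewrites
// <textarea class="head" ...>...</textarea> to <h1 class="head" ...>...</h1>.
function replaceHeadTextarea(string $html): string
{
    // Group 1: the rest of the opening tag (id and so on), kept untouched.
    // Group 2: the content, matched lazily so it stops at the first </textarea>.
    $pattern = '~<textarea class="head"([^>]*)>(.*?)</textarea>~s';
    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '<h1 class="head"$1>$2</h1>', $html) ?? $html;
}

$input = '<textarea class="head" id="hd_1">Dogs like cats</textarea>'
       . '<textarea class="text" id="txt_1">Random text</textarea>';
echo replaceHeadTextarea($input), "\n";
```

Because the pattern requires class="head" in the opening tag, the class="text" textarea is left alone, which avoids the str_replace problem of rewriting every closing tag.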

You should not use RegEx to change HTML elements. The DOM recognizes the structure and xpath makes it easy to do what you want:
$html = <<<'HTML'
<html>
<body>
<textarea class="head" id="hd_x">Random headline</textarea>
<textarea class="text" id="hd_x">Random headline</textarea>
</body>
</html>
HTML;
$dom = new DOMDocument();
$dom->loadHtml($html);
$xpath = new DOMXpath($dom);
$names = array(
'head' => 'h1', 'text' => 'p'
);
$nodes = $xpath->evaluate('//textarea[@class="head" or @class="text"]');
foreach ($nodes as $node) {
  // create the new node depending on the class attribute
  $type = $node->getAttribute('class');
  $newNode = $dom->createElement($names[$type]);
  // fetch all attributes of the current node
  $attributes = $xpath->evaluate('@*', $node);
  // and append them to the new node
  foreach ($attributes as $attribute) {
    $newNode->appendChild($attribute);
  }
  // move the child nodes (the text content) to the new node
  while ($node->firstChild) {
    $newNode->appendChild($node->firstChild);
  }
  // replace the current node with the new node
  $node->parentNode->replaceChild($newNode, $node);
}
var_dump($dom->saveHtml());
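The same renaming step can be wrapped in a small reusable function. The sketch below is one possible way to do it, not the answer's own code: the name renameElement is hypothetical, and it copies attributes with setAttribute() rather than moving the attribute nodes.

```php
<?php
// Hypothetical helper: replaces $node with a new element called $name,
// carrying over its attributes and its child nodes.
function renameElement(DOMElement $node, string $name): DOMElement
{
    $newNode = $node->ownerDocument->createElement($name);
    // Copy the attributes by name and value.
    foreach ($node->attributes as $attribute) {
        $newNode->setAttribute($attribute->nodeName, $attribute->nodeValue);
    }
    // Move the children; each appendChild() detaches the child from $node.
    while ($node->firstChild !== null) {
        $newNode->appendChild($node->firstChild);
    }
    $node->parentNode->replaceChild($newNode, $node);
    return $newNode;
}

$dom = new DOMDocument();
$dom->loadHTML('<div><textarea class="head" id="hd_1">Random headline</textarea></div>');
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//textarea[@class="head"]') as $textarea) {
    $h1 = renameElement($textarea, 'h1');
    // Serialize only the new element, not the whole document.
    echo $dom->saveHTML($h1), "\n";
}
```

The XPath result is a snapshot, so replacing nodes while looping over it is safe.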

Related

How can I strip html tags except some of them?

I need to remove all HTML tags from a PHP string except:
<p>
<em>
<small>
The strip_tags() function is good, but it strips all HTML tags. How can I tell it to remove all HTML except the tags above?
You should check out the manual: Example #1 strip_tags() example
Syntax: strip_tags ( Your-string, Allowable-Tags )
If you pass the second parameter, these tags will not be stripped.
strip_tags($string, '<p><em><small>');
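A minimal sketch of that call on a made-up input string (the sample text is an assumption, not from the question):

```php
<?php
// Sample input invented for illustration.
$string = '<div><p>Keep <em>this</em> and <small>that</small>, drop <b>bold</b>.</p></div>';

// Only <p>, <em> and <small> survive; the text inside removed tags is kept.
$clean = strip_tags($string, '<p><em><small>');
echo $clean, "\n";
// prints: <p>Keep <em>this</em> and <small>that</small>, drop bold.</p>
```

Note that strip_tags() removes only the tags themselves and leaves any attributes on the allowed tags in place.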
According to your comment, you want to remove HTML elements only if they have some class or attribute. You'll need to build up a DOM then:
<?php
$data = <<<DATA
<div>
<p>These line shall stay</p>
<p class="myclass">Remove this one</p>
<p>I will be deleted as well</p>
<p>But keep this</p>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$xpath = new DOMXPath($dom);
$elements_to_be_removed = $xpath->query("//*[count(@*)>0]");
foreach ($elements_to_be_removed as $element) {
$element->parentNode->removeChild($element);
}
// just to check
echo $dom->saveHTML();
?>
To change which elements are removed, change the query. For example, to remove all elements with the class myclass, it must read "//*[@class='myclass']".
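An exact comparison such as @class='myclass' does not match elements that carry several classes. A possible variant, sketched here with invented sample markup, matches myclass as one token in a space-separated class list:

```php
<?php
// Sample markup invented for illustration.
$data = '<div><p>Keep me</p><p class="myclass">Remove me</p>'
      . '<p class="other myclass">Remove me too</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Pad the class list with spaces so that " myclass " only matches a whole class name.
$query = '//*[contains(concat(" ", normalize-space(@class), " "), " myclass ")]';
foreach ($xpath->query($query) as $element) {
    $element->parentNode->removeChild($element);
}

$result = trim($dom->saveHTML());
echo $result, "\n";
```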

PHP preg_match_all - group without returning a match

How would I get content from HTML between h3 tags inside an element that has class pricebox? For example, the following string fragment
<!-- snip a lot of other html content -->
<div class="pricebox">
<div class="misc_info">Some misc info</div>
<h3>599.99</h3>
</div>
<!-- snip a lot of other html content -->
The catch is 599.99 has to be the first match returned, that is if the function call is
preg_match_all($regex,$string,$matches)
the 599.99 has to be in $matches[0][1] (because I use the same script to get numbers from dissimilar looking strings with different $regex - the script looks for the first match).
Try using XPath; definitely NOT RegEx.
Code :
$html = new DOMDocument();
@$html->loadHtmlFile('http://www.path.to/your_html_file_html');
$xpath = new DOMXPath( $html );
$nodes = $xpath->query("//div[@class='pricebox']/h3");
foreach ($nodes as $node)
{
echo $node->nodeValue."";
}
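If the HTML is already available as a string, as in the question, a variant along these lines may be closer to what is needed. It is a sketch with an invented sample string, and it uses libxml_use_internal_errors() instead of the @ operator to silence parser warnings:

```php
<?php
// Sample fragment invented for illustration, based on the question's markup.
$string = <<<'HTML'
<div class="other"><h3>1.00</h3></div>
<div class="pricebox">
  <div class="misc_info">Some misc info</div>
  <h3>599.99</h3>
</div>
HTML;

$dom = new DOMDocument();
// Real-world HTML is often invalid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($string);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//div[@class='pricebox']/h3");

// Nodes come back in document order, so item(0) is the first match.
$price = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
echo $price, "\n"; // prints 599.99
```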

DOMXPath union extract with PHP

I'm trying to get img and the div which is coming after the div which contains that img, all in one query.
So I did this:
$nodes = $xpath->query('//div[starts-with(@id, "someid")]/img |
//div[starts-with(@id, "someid")]/following-sibling::div[@class="spec_class"][1]/text()');
Now, I'm able to get the attributes of img tag, but I can't get the text of the following sibling. If I separate the query (two queries - first for the img and second query for the sibling) it works. But how can I do this with only one query? By the way, there is no error in the syntax. But somehow the union doesn't work or maybe I'm not extracting the sibling content right.
Here's the markup (which repeats many times with different text and id="someid_%randomNumber%")
<div id="someid_1">
<img src="link_to_image.png" />
...some text...
</div>
<div>...another text...</div>
<div class="spec_class">
...Important text...
</div>
I want to get in one query both link_to_image.png and ...Important text...
Your query seems correct.
Example XML:
<div>
<div id="someid-1"><img src="foo"/></div>
<div class="spec_class">bar</div>
<div class="spec_class">baz</div>
</div>
Example PHP Code:
$dom = new DOMDocument;
$dom->loadXml($xhtml);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div…') as $node) {
echo $dom->saveXML($node);
}
Outputs (demo):
<img src="foo"/>bar
Note that you will have to iterate the DOMNodeList returned by the XPath query.
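A sketch of that iteration using the markup from the question, wrapped in an invented root element so that it parses as XML. The union returns an element node for the img and a text node for the sibling div, so each node type has to be handled separately:

```php
<?php
// The <root> wrapper is an assumption added so the fragment is well-formed XML.
$xhtml = <<<'XML'
<root>
  <div id="someid_1"><img src="link_to_image.png"/>...some text...</div>
  <div>...another text...</div>
  <div class="spec_class">...Important text...</div>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xhtml);
$xpath = new DOMXPath($dom);

$query = '//div[starts-with(@id, "someid")]/img'
       . ' | //div[starts-with(@id, "someid")]/following-sibling::div[@class="spec_class"][1]/text()';

$found = [];
foreach ($xpath->query($query) as $node) {
    if ($node instanceof DOMElement) {
        // The img element: read its src attribute.
        $found[] = $node->getAttribute('src');
    } else {
        // The text node of the following spec_class div.
        $found[] = trim($node->nodeValue);
    }
}
print_r($found);
```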

PHP DOMDocument: insertBefore, how to make it work?

I would like to place a new node element, before a given element. I'm using insertBefore for that, without success!
Here's the code,
<DIV id="maindiv">
<!-- I would like to place the new element here -->
<DIV id="child1">
<IMG />
<SPAN />
</DIV>
<DIV id="child2">
<IMG />
<SPAN />
</DIV>
//$div is a new div node element,
//The code I'm trying is the following:
$maindiv->item(0)->parentNode->insertBefore( $div, $maindiv->item(0) );
//Obs: This code actually places the new node before maindiv
//$maindiv is an object(DOMNodeList)[5], from getElementsByTagName( 'div' )
//echo $maindiv->item(0)->nodeName gives 'div'
//echo $maindiv->item(0)->nodeValue gives the correct data on that div 'some random text'
//this code actually places the new $div element before <DIV id="maindiv">
http://pastie.org/1070788
Any kind of help is appreciated, thanks!
If maindiv is from getElementsByTagName(), then $maindiv->item(0) is the div with id=maindiv. So your code is working correctly because you're asking it to place the new div before maindiv.
To make it work like you want, you need to get the children of maindiv:
$dom = new DOMDocument();
$dom->load($yoursrc);
$maindiv = $dom->getElementById('maindiv');
$items = $maindiv->getElementsByTagName('DIV');
$items->item(0)->parentNode->insertBefore($div, $items->item(0));
Note that if you don't have a DTD, PHP doesn't return anything from getElementById. For getElementById to work, you need to have a DTD or specify which attributes are IDs:
foreach ($dom->getElementsByTagName('DIV') as $node) {
$node->setIdAttribute('id', true);
}
From scratch, this seems to work too:
$str = '<DIV id="maindiv">Here is text<DIV id="child1"><IMG /><SPAN /></DIV><DIV id="child2"><IMG /><SPAN /></DIV></DIV>';
$doc = new DOMDocument();
$doc->loadHTML($str);
$divs = $doc->getElementsByTagName("div");
$divs->item(0)->appendChild($doc->createElement("div", "here is some content"));
print_r($divs->item(0)->nodeValue);
Found a solution:
$child = $maindiv->item(0);
$child->insertBefore( $div, $child->firstChild );
I don't know how much sense this makes, but well, it worked.
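That last solution works because insertBefore() is called on the parent node, and its second argument is the existing child that the new node should come before. A self-contained sketch of the same idea, using simplified markup and an XPath lookup instead of getElementById (both are choices made here for illustration):

```php
<?php
// Simplified version of the question's markup.
$str = '<div id="maindiv"><div id="child1"></div><div id="child2"></div></div>';
$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXPath($doc);

// Look the container up by its id attribute via XPath.
$maindiv = $xpath->query('//div[@id="maindiv"]')->item(0);

$div = $doc->createElement('div', 'new element');
$div->setAttribute('id', 'inserted');

// Insert the new node before the current first child of maindiv.
$maindiv->insertBefore($div, $maindiv->firstChild);

foreach ($maindiv->childNodes as $child) {
    echo $child->getAttribute('id'), "\n";
}
```

The loop prints the ids of maindiv's children, with the inserted element first.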

php: how can I work with html as xml ? how do i find specific nodes and get the text inside these nodes?

Let's say I have the following web page:
<html>
<body>
<div class="transform">
<span>1</span>
</div>
<div class="transform">
<span>2</span>
</div>
<div class="transform">
<span>3</span>
</div>
</body>
</html>
I would like to find all div elements that have the class transform and fetch the text in each div element.
I know I can do that easily with regular expressions, but I would like to know how to do it without regular expressions, by parsing the XML and finding the nodes I need.
Update:
I know that in this example I can just iterate through all the divs, but this is an example just to illustrate what I need.
In this example I need to query for divs that have the attribute class=transform.
Thanks!
Could use SimpleXML - see the example below:
$string = "<?xml version='1.0'?>
<html>
<body>
<div class='transform'>
<span>1</span>
</div>
<div>
<span>2</span>
</div>
<div class='transform'>
<span>3</span>
</div>
</body>
</html>";
$xml = simplexml_load_string($string);
$result = $xml->xpath("//div[@class = 'transform']");
foreach($result as $node) {
echo "span " . $node->span . "<br />";
}
Updated it with xpath...
You can use xpath to address the items. For that particular query, you'd use:
div[contains(concat(" ",@class," "), concat(" ","transform"," "))]
Full PHP example:
<?php
$document = new DomDocument();
$document->loadHtml($html);
$xpath = new DomXPath($document);
foreach ($xpath->query('//div[contains(concat(" ",@class," "), concat(" ","transform"," "))]') as $div) {
var_dump($div);
}
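A self-contained variant that collects the text of each matching div might look like this. The sample markup is adapted from the question, with one div given a second class to show why the expression pads the class list with spaces:

```php
<?php
// Sample markup adapted from the question; "box transform" is an added case.
$html = <<<'HTML'
<html><body>
<div class="transform"><span>1</span></div>
<div class="other"><span>2</span></div>
<div class="box transform"><span>3</span></div>
</body></html>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Matches "transform" as a whole class name, also when other classes are present.
$query = '//div[contains(concat(" ", @class, " "), " transform ")]';
$texts = [];
foreach ($xpath->query($query) as $div) {
    // textContent concatenates the text of all descendants of the div.
    $texts[] = trim($div->textContent);
}
print_r($texts);
```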
If you know CSS, here's a handy CSS-selector to XPath-expression mapping: http://plasmasturm.org/log/444/ -- You can find the above example listed there, as well as other common queries.
If you use it a lot, you might find my csslib library handy. It offers a wrapper csslib_DomCssQuery, which is similar to DomXPath, but using CSS-selectors instead.
OK, what I wanted can be easily achieved using PHP XPath.
Example:
http://ditio.net/2008/12/01/php-xpath-tutorial-advanced-xml-part-1/
