PHP DOMDocument: insertBefore, how to make it work?

I would like to place a new element node before a given element. I'm using insertBefore() for that, without success!
Here's the code:
<DIV id="maindiv">
<!-- I would like to place the new element here -->
<DIV id="child1">
<IMG />
<SPAN />
</DIV>
<DIV id="child2">
<IMG />
<SPAN />
</DIV>
</DIV>
//$div is a new div element node
//The code I'm trying is the following:
$maindiv->item(0)->parentNode->insertBefore( $div, $maindiv->item(0) );
//$maindiv is object(DOMNodeList)[5], from getElementsByTagName( 'div' )
//echo $maindiv->item(0)->nodeName gives 'div'
//echo $maindiv->item(0)->nodeValue gives the correct data on that div: 'some random text'
//Obs: this code actually places the new $div element before <DIV id="maindiv">, not inside it
http://pastie.org/1070788
Any kind of help is appreciated, thanks!

If $maindiv comes from getElementsByTagName('div'), then $maindiv->item(0) is the first div in the document, i.e. the div with id="maindiv" itself. So your code is working correctly: you're asking it to place the new div before maindiv.
To make it work the way you want, you need to get the children of maindiv:
$dom = new DOMDocument();
$dom->load($yoursrc);
$maindiv = $dom->getElementById('maindiv');
$items = $maindiv->getElementsByTagName('DIV');
$items->item(0)->parentNode->insertBefore($div, $items->item(0));
Note that if your XML has no DTD, PHP's getElementById() doesn't return anything. For getElementById() to work, you need a DTD, or you must specify which attributes are IDs before calling it:
foreach ($dom->getElementsByTagName('DIV') as $node) {
$node->setIdAttribute('id', true);
}
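A self-contained sketch of this approach, assuming the question's markup is available as a well-formed XML string (so it is parsed with loadXML()) and using a hypothetical id "newdiv" for the new element. The key point is that insertBefore() is called on the parent (maindiv), with the reference child (child1) as the second argument:

```php
<?php
// The question's markup as a string (assumption: well-formed XML, parsed with loadXML()).
$xml = '<DIV id="maindiv"><DIV id="child1"><IMG /><SPAN /></DIV>'
     . '<DIV id="child2"><IMG /><SPAN /></DIV></DIV>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// No DTD, so declare the "id" attributes as IDs; otherwise getElementById() finds nothing.
foreach ($dom->getElementsByTagName('DIV') as $node) {
    $node->setIdAttribute('id', true);
}

$maindiv = $dom->getElementById('maindiv');
$child1  = $dom->getElementById('child1');

// The new element; "newdiv" is a hypothetical id used only for this example.
$div = $dom->createElement('DIV');
$div->setAttribute('id', 'newdiv');

// Call insertBefore() on the PARENT, passing the reference child as the second argument.
$maindiv->insertBefore($div, $child1);

echo $dom->saveXML($maindiv), PHP_EOL;
```

The same call with `$maindiv->firstChild` as the reference node puts the new div at the very start of maindiv, whatever its first child happens to be.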

From scratch, this seems to work too, although appendChild() adds the new div as the last child of maindiv rather than the first:
$str = '<DIV id="maindiv">Here is text<DIV id="child1"><IMG /><SPAN /></DIV><DIV id="child2"><IMG /><SPAN /></DIV></DIV>';
$doc = new DOMDocument();
$doc->loadHTML($str);
$divs = $doc->getElementsByTagName("div");
$divs->item(0)->appendChild($doc->createElement("div", "here is some content"));
print_r($divs->item(0)->nodeValue);

Found a solution:
$child = $maindiv->item(0);
$child->insertBefore( $div, $child->firstChild );
I don't know how much sense this makes, but well, it worked.

Related

How to select 2nd element with same tag using dom xpath?

I have a layout like this:
<div class="fly">
<img src="a.png" class="badge">
<img class="aye" data-original="b.png" width="130" height="253" />
<div class="to">
<h4>Fly To The Moon</h4>
<div class="clearfix">
<div class="the">
<h4>Wow</h4>
</div>
<div class="moon">
<h4>Great</h4>
</div>
</div>
</div>
</div>
First I run an XPath query:
$a = $xpath->query("//div[@class='fly']"); //to get all elements in class fly
foreach ($a as $p) {
$t = $p->getElementsByTagName('img');
echo ($t->item(0)->getAttributes('data-original'));
}
When I run the code, it produces no result. After tracing it, I found that <img class="badge"> is processed first. How can I get the data-original value from <img class="aye"> and also get the values "Wow" and "Great" from the <h4> tags?
Thank you,
Alternatively, you could run another XPath query relative to each matched div, on top of your current code.
To get the attribute, use ->getAttribute():
$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXpath($dom);
$parent_div = $xpath->query("//div[@class='fly']"); //to get all elements in class fly
foreach($parent_div as $div) {
$aye = $xpath->query('./img[@class="aye"]', $div)->item(0)->getAttribute('data-original');
echo $aye . '<br/>'; // get the data-original
$others = $xpath->query('./div[@class="to"]/div[@class="clearfix"]', $div)->item(0);
foreach($xpath->query('./div/h4', $others) as $node) {
echo $node->nodeValue . '<br/>'; // echo the two h4 values
}
echo '<hr/>';
}
Thank you for your code!
I tried the code but it failed, and I don't know why. So I changed your code a bit and it works!
$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXpath($dom);
$parent_div = $xpath->query("//div[@class='fly']"); //to get all elements in class fly
foreach($parent_div as $div) {
$aye = $xpath->query('descendant::img[@class="aye"]', $div)->item(0)->getAttribute('data-original');
echo $aye . '<br/>'; // get the data-original
$others = $xpath->query('descendant::div[@class="to"]/div[@class="clearfix"]', $div)->item(0);
foreach($xpath->query('.//div/h4', $others) as $node) {
echo $node->nodeValue . '<br/>'; // echo the two h4 values
}
echo '<hr/>';
}
I have no idea what the difference between ./ and descendant:: is, but my code works fine using descendant::.
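The difference comes from the XPath axes: ./img is short for child::img and only matches direct children of the context node, while descendant::img (like .//img) matches at any depth. A minimal sketch, using a small made-up document in which one <img> is nested inside a <p>:

```php
<?php
// Made-up markup for illustration: one <img> is a direct child of the div,
// the other is nested one level deeper inside a <p>.
$markup = '<div class="fly"><img class="aye" src="a.png"><p><img class="aye" src="b.png"></p></div>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);
$div = $xpath->query("//div[@class='fly']")->item(0);

// "./img" means "child::img": only direct children of the context node.
$children = $xpath->query('./img', $div);
// "descendant::img" matches <img> elements at any depth below the context node.
$descendants = $xpath->query('descendant::img', $div);
// ".//img" selects the same nodes as descendant::img here.
$shorthand = $xpath->query('.//img', $div);

printf("child: %d, descendant: %d, .//: %d\n", $children->length, $descendants->length, $shorthand->length);
```

So if the real page wraps the <img class="aye"> in another element, ./img finds nothing while descendant::img still finds it.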
Given the following XML:
<div class="fly">
<img src="a.png" class="badge">
<img class="aye" data-original="b.png" width="130" height="253" />
<div class="to">
<h4>Fly To The Moon</h4>
<div class="clearfix">
<div class="the">
<h4>Wow</h4>
</div>
<div class="moon">
<h4>Great</h4>
</div>
</div>
</div>
</div>
You asked:
how can I get data-original value from <img class="aye"> and also get the value "Wow" and "Great" from <h4> tag?
With XPath you can obtain the values as strings directly:
string(//div[@class='fly']/img/@data-original)
This is the string value of the first data-original attribute of an <img> that is a child of a div with class="fly".
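string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[1])
string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[2])
These are the string values of the first and second <h4> element inside a div with class="fly" that has no following sibling element containing another <h4>. The parentheses matter: without them, [1] and [2] count the matching <h4> children of each parent separately, and since every <h4> here has a different parent, [2] would select nothing.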
The //div[@class='fly'] prefix may look like it gets in the way right now, but once you iterate over the divs it is no longer needed, because the XPath expressions can then be relative to each div:
//div[@class='fly']
string(./img/@data-original)
string((.//h4[not(following-sibling::*//h4)])[1])
string((.//h4[not(following-sibling::*//h4)])[2])
To use XPath string(...) expressions in PHP you must use DOMXPath::evaluate() instead of DOMXPath::query(). This then looks like the following:
$aye = $xpath->evaluate("string(//div[@class='fly']/img/@data-original)");
$h4_1 = $xpath->evaluate("string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[1])");
$h4_2 = $xpath->evaluate("string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[2])");
A full example with iteration and output:
// all <div> tags with class="fly"
$divs = $xpath->evaluate("//div[@class='fly']");
foreach ($divs as $div) {
// the first data-original attribute of an <img> child of $div
echo $xpath->evaluate("string(./img/@data-original)", $div), "<br/>\n";
// the <h4> tags inside $div that have no following sibling containing another <h4>
$h4s = $xpath->evaluate('.//h4[not(following-sibling::*//h4)]', $div);
foreach ($h4s as $h4) {
echo $h4->nodeValue, "<br/>\n";
}
}
As the example shows, you can use evaluate() for node-lists, too. I no longer use string() to obtain the values of the <h4> tags, as I assume there could be more than just two.
Example of the string output inside a heredoc (just exemplary):
echo <<<HTML
{$xpath->evaluate("string(//div[@class='fly']/img/@data-original)")}<br/>
{$xpath->evaluate("string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[1])")}<br/>
{$xpath->evaluate("string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[2])")}<br/>
<hr/>
HTML;
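A self-contained version of the evaluate() approach, assuming the question's markup is available as a string (with the <h4> texts written plainly as Wow and Great). The <h4> expressions wrap the node-set in parentheses so that [1] and [2] index the whole result rather than each parent's children:

```php
<?php
// The question's markup; the unclosed badge <img> is accepted by the HTML parser.
$markup = <<<'HTML'
<div class="fly">
  <img src="a.png" class="badge">
  <img class="aye" data-original="b.png" width="130" height="253" />
  <div class="to">
    <h4>Fly To The Moon</h4>
    <div class="clearfix">
      <div class="the"><h4>Wow</h4></div>
      <div class="moon"><h4>Great</h4></div>
    </div>
  </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

$results = [];
foreach ($xpath->evaluate("//div[@class='fly']") as $div) {
    // evaluate() with string(...) returns a plain PHP string, evaluated relative to $div.
    $original = $xpath->evaluate('string(./img/@data-original)', $div);
    // Parentheses: [1]/[2] index the whole node-set, not each parent's children.
    $first  = $xpath->evaluate('string((.//h4[not(following-sibling::*//h4)])[1])', $div);
    $second = $xpath->evaluate('string((.//h4[not(following-sibling::*//h4)])[2])', $div);
    $results[] = [$original, $first, $second];
    echo $original, ' | ', $first, ' | ', $second, PHP_EOL;
}
```

The "Fly To The Moon" heading is skipped because its following sibling (the clearfix div) contains other <h4> elements.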

Get div from external page, then delete another div from it

I need a little help with getting content from external web pages.
I need to get a div and then delete another div from inside it. Can someone help me?
This is the relevant portion of the HTML:
<html>
...
<body class="domain-4 page-product-detail" > ...
<div id="informacio" class="htab-fragment"> <!-- must select this -->
<h2 class="description-heading htab-name">Trip description</h2>
<div class="htab-mobile tab-content">
<p class="tab-annot">* Official description</p>
<div id="trip-detail-question"> <!-- must delete this -->
<form> ...</form>
</div>
<h3>USP</h3><p>Large, well-organized and family-friendly ...</p>
<div class="message warning-message">
<p>Prices already include all current discounts!</p>
<span class="ico"></span>
</div>
</div>
</div>
...
</body>
</html>
I need to get the div with id="informacio", and after that I need to delete the div with id="trip-detail-question" from it, including the form it contains.
This is my code, but it's not working correctly :(
function get_content($url){
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
$query = "//div[@id='informacio']";
$entries = $xpath->query($query)->item(0);
foreach($xpath->query("div[@id='trip-detail-question']", $entries) as $node)
$node->parentNode->removeChild($node);
$var = $doc->saveXML($entries);
return $var;
}
Your second XPath expression is incorrect. It only selects div elements that are direct children of the div you selected previously. In effect, you are trying to select:
//div[@id='informacio']/div[@id='trip-detail-question']
and that node does not exist. You want this node:
//div[@id='informacio']/div/div[@id='trip-detail-question']
which you can also select like this (allowing any element, not just div):
//div[@id='informacio']/*/div[@id='trip-detail-question']
or like this (allowing any number of nesting levels):
//div[@id='informacio']//div[@id='trip-detail-question']
In the context of the first div, the correct XPath expression would be:
.//div[@id='trip-detail-question']
If you change it in your code, it should work:
foreach($xpath->query(".//div[@id='trip-detail-question']", $entries) as $node)
$node->parentNode->removeChild($node);
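A runnable sketch of the corrected function, under two assumptions: the page is passed in as a string (a trimmed, hypothetical stand-in for the fetched page) instead of being loaded with loadHTMLFile(), and parser warnings are collected with libxml_use_internal_errors() rather than printed. The function name extract_without_question() is made up for this example:

```php
<?php
// Hypothetical, trimmed stand-in for the fetched page.
$html = '<html><body>'
      . '<div id="informacio" class="htab-fragment">'
      . '<div class="htab-mobile tab-content">'
      . '<div id="trip-detail-question"><form></form></div>'
      . '<h3>USP</h3>'
      . '</div></div>'
      . '</body></html>';

function extract_without_question(string $html): string
{
    $doc = new DOMDocument();
    // Real-world pages often trigger parser warnings; collect them instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $entries = $xpath->query("//div[@id='informacio']")->item(0);
    if ($entries === null) {
        return '';
    }

    // ".//" searches at any depth below $entries, not only its direct children.
    foreach ($xpath->query(".//div[@id='trip-detail-question']", $entries) as $node) {
        $node->parentNode->removeChild($node);
    }

    return $doc->saveHTML($entries);
}

echo extract_without_question($html), PHP_EOL;
```

Removing nodes inside the loop is safe here because the DOMNodeList returned by DOMXPath::query() is a snapshot, not a live list.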

Remove the last child in DomXPath

The current structure looks like this:
<div class="...">
//more html
<div class="message-right">
<div class="item1"> //more html </div>
<div class="item2"> //more html </div>
<div class="item3"> //more html </div>
</div>
//more html
</div>
I want to get the HTML content inside the element with class 'message-right' and remove its last child (in this case 'item3').
I should be left with the HTML from 'item1' and 'item2'.
So far I have:
$dom = new DomDocument();
@$dom->loadHTML($html);
$finder = new DomXPath($dom);
$classname = "message-right";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
//this is where I am stuck: I need to remove the last child, 'item3'
//this returns the html from 'message-right'
$html = $nodes->item(0)->c14n();
Fetch the last child element (XPath will make that easier) and delete it.
$delete = $finder->query("./*[last()]", $nodes->item(0))->item(0);
$delete->parentNode->removeChild($delete);
Depending on what you really need, you might want to fetch (and subsequently delete) that element directly using:
//*[contains(@class, '$classname')]/*[last()]
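A self-contained sketch of this approach on a simplified copy of the markup (the outer class "outer" and the item texts are made up). It removes the last element child and then serializes the remaining children, which is one way to get the "inner HTML" of the container:

```php
<?php
// Simplified, made-up copy of the structure from the question.
$html = '<div class="outer"><div class="message-right">'
      . '<div class="item1">one</div><div class="item2">two</div><div class="item3">three</div>'
      . '</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

$classname = 'message-right';
$container = $finder->query("//*[contains(@class, '$classname')]")->item(0);

// "*" matches element children only, so a trailing whitespace text node is never picked.
$last = $finder->query('./*[last()]', $container)->item(0);
$last->parentNode->removeChild($last);

// Serialize each remaining child to build the container's inner HTML.
$inner = '';
foreach ($container->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}
echo $inner, PHP_EOL;
```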

PHP regex replace elements

I'm trying to change HTML elements in PHP.
The first task is to replace a textarea with an h1.
The element that needs to be replaced looks something like this:
<textarea class="head" id="hd_x">Random headline</textarea>
I'm trying to change it to this:
<h1 class="head" id="hd_x">Random headline</h1>
Random headline can be anything, e.g. "Dogs like cats" or "Cats don't like dogs".
The x in the id can be a number: hd_1, hd_2 and so on (but I think it doesn't need to be touched, so it can be ignored).
The second task is to replace a textarea with a p. The original looks something like this:
<textarea class="text" id="txt_x">Random text</textarea>
I'm trying to change it to this:
<p class="text" id="txt_x">Random text</p>
Random text and x work the same way as in the first one.
If you can figure out what I'm trying to do, and it is possible and short, then it would be nice if you could help me with just the h1 part. I think I can figure out the <p> (second) part myself.
I tried to do it with str_replace, but the problem is that it then always replaces </textarea> with </h1>, or always with </p>.
Thank you
My idea is that I need two separate preg_replace calls. One of them recognizes this part:
<textarea class="head"
and knows it needs to replace it with:
<h1 class="head"
Then it skips over this part:
id="hd_x">Random headline
Then preg_replace recognizes this one:
</textarea>
and replaces it with:
</h1>
To put it briefly, it finds this (???? is the part that should be ignored and left untouched):
<textarea class="head" ??????????????????</textarea>
and replaces it with (????? is the part that was left untouched):
<h1 class="head" ??????????????????</h1>
I think class="head" is needed, because that is how the preg_replace pattern can tell that it needs to replace the element with h1 and not with p.
You should not use regular expressions to change HTML elements. The DOM understands the document structure, and XPath makes it easy to do what you want:
$html = <<<'HTML'
<html>
<body>
<textarea class="head" id="hd_x">Random headline</textarea>
<textarea class="text" id="hd_x">Random headline</textarea>
</body>
</html>
HTML;
$dom = new DOMDocument();
$dom->loadHtml($html);
$xpath = new DOMXpath($dom);
$names = array(
'head' => 'h1', 'text' => 'p'
);
$nodes = $xpath->evaluate('//textarea[@class="head" or @class="text"]');
foreach ($nodes as $node) {
// create the new node depending on the class attribute
$type = $node->getAttribute('class');
$newNode = $dom->createElement($names[$type]);
// fetch all attributes of the current node
$attributes = $xpath->evaluate('@*', $node);
// and copy them to the new node
foreach ($attributes as $attribute) {
$newNode->setAttribute($attribute->nodeName, $attribute->nodeValue);
}
// move the content (e.g. "Random headline") over to the new node
while ($node->firstChild) {
$newNode->appendChild($node->firstChild);
}
// replace the current node with the new node
$node->parentNode->replaceChild($newNode, $node);
}
var_dump($dom->saveHtml());

How to scrape html contents of one div by id using php

The page on another of my domains, from which I'd like to scrape one div, contains:
<div id="thisone">
<p>Stuff</p>
</div>
<div id="notthisone">
<p>More stuff</p>
</div>
Using this PHP...
<?php
$page = file_get_contents('http://thisite.org/source.html');
$doc = new DOMDocument();
$doc->loadHTML($page);
foreach ($doc->getElementsByTagName('div') as $node) {
echo $doc->saveHtml($node), PHP_EOL;
}
?>
...gives me all divs on http://thisite.org/source.html, with their HTML. However, I only want to pull through the div with an id of "thisone", but using:
foreach ($doc->getElementById('thisone') as $node) {
doesn't bring up anything.
$doc->getElementById('thisone');// returns the single element with id "thisone"
Try $node = $doc->getElementById('thisone'); and then print it with echo $doc->saveHtml($node);
On a side note, you can use phpQuery for a jQuery-like syntax: pq("#thisone")
$doc->getElementById('thisone') returns a single DOMElement (or null), not an array, so you can't iterate through it.
Just do:
$node = $doc->getElementById('thisone');
echo $doc->saveHtml($node), PHP_EOL;
Look at the PHP manual: http://php.net/manual/en/domdocument.getelementbyid.php
getElementById() returns an element or NULL, not an array, and therefore you can't iterate over it.
Instead, do this:
<?php
$page = file_get_contents('example.html');
$doc = new DOMDocument();
$doc->loadHTML($page);
$node = $doc->getElementById('thisone');
echo $doc->saveHtml($node), PHP_EOL;
?>
Running php edit.php gives you something like this:
<div id="thisone">
<p>Stuff</p>
</div>
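A sketch that also guards against the case where no element has the requested id, using an inline copy of the question's markup instead of file_get_contents() (an assumption made so the example runs offline):

```php
<?php
// Inline copy of the question's markup, standing in for the fetched page.
$page = '<div id="thisone"><p>Stuff</p></div><div id="notthisone"><p>More stuff</p></div>';

$doc = new DOMDocument();
$doc->loadHTML($page);

// getElementById() returns one DOMElement, or null when no element has that id.
$node = $doc->getElementById('thisone');
$html = $node === null ? '' : $doc->saveHTML($node);
echo $html, PHP_EOL;

// A missing id yields null, not an empty list, so check before using the result.
var_dump($doc->getElementById('missing'));
```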
