Remove the last child in DomXPath - php

Current structure looks like
<div class="...">
//more html
<div class="message-right">
<div class="item1"> //more html </div>
<div class="item2"> //more html </div>
<div class="item3"> //more html </div>
</div>
//more html
</div>
I want to be able to get the html content inside the class 'message-right', and remove the last child. (In this case 'item3')
I should be left with the html from 'item1' and 'item2'
So far I have
$dom = new DomDocument();
@$dom->loadHTML($html);
$finder = new DomXPath($dom);
$classname = "message-right";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
//this is where I am stuck, need to remove the last child, 'item3'
//this returns the html from 'message-right'
$html = $nodes->item(0)->c14n();

Fetch the last child element (XPath will make that easier) and delete it.
$delete = $finder->query("./*[last()]", $nodes->item(0))->item(0);
$delete->parentNode->removeChild($delete);
Depending on what you really need, you might want to fetch (and subsequently delete) that element directly using
//*[contains(@class, '$classname')]/*[last()]
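Put together, the approach above can be sketched as a self-contained script. The sample markup and the variable names $container, $last and $inner are assumptions made for illustration; the loop at the end serializes only the remaining children, which is one way to get the "inner" HTML rather than the whole 'message-right' element.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$html = '<div class="wrap"><div class="message-right">'
      . '<div class="item1">one</div>'
      . '<div class="item2">two</div>'
      . '<div class="item3">three</div>'
      . '</div></div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($dom);
$classname = 'message-right';

// Select the container, then its last element child, and remove that child.
$container = $finder->query("//*[contains(@class, '$classname')]")->item(0);
if ($container !== null) {
    $last = $finder->query('./*[last()]', $container)->item(0);
    if ($last !== null) {
        $last->parentNode->removeChild($last);
    }
}

// Serialize only the children that are left (item1 and item2).
$inner = '';
if ($container !== null) {
    foreach ($container->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
}
echo $inner, "\n";
```

Note that contains(@class, ...) also matches class names that merely contain the string, such as 'message-right-wide', so a stricter test may be needed on real pages.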

Related

Create a Container Div with DOM/XPATH

I just created a new DOMXPath object.
After a couple of operations, I stored my result with saveHTML():
$string[] = $dom->saveHTML();
Then I put the content into a file:
file_put_contents($filename, $string);
The HTML structure is something like this:
<div if="rand11">
</div>
<div if="rand24">
</div>
<div if="rand51">
</div>
There are methods for creating new divs: you can use ->createElement, and you can place the new element with ->parentNode->insertBefore. But it does not seem possible to create a container div, like this:
<div if="container-div">
<div if="rand11">
</div>
<div if="rand24">
</div>
<div if="rand51">
</div> </div>
I tried multiple ways to do it without success.
So, I have a couple of questions:
1. Is it possible to create a container div by modifying the DOM directly?
2. Is it possible to add a new HTML element to an array that contains $dom->saveHTML() data?
Sure it is. Most of it actually uses the same methods. For example, appendChild()/insertBefore() are not just used for new nodes; they can also move existing nodes.
$html = <<<'HTML'
<div if="rand11"></div>
<div if="rand24"></div>
<div if="rand51"></div>
HTML;
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXpath($document);
// fetch the nodes that should be put inside the container
$nodes = $xpath->evaluate('//div[starts-with(@if, "rand")]');
// validate that there is at least one node
if ($first = $nodes[0]) {
// create the container and insert it before the first
$container = $first->parentNode->insertBefore(
$document->createElement('div'), $first
);
$container->setAttribute('class', 'container-div');
// move all the fetched nodes into the container
foreach($nodes as $node) {
$container->appendChild($node);
}
// output formatted
$document->formatOutput = TRUE;
echo $document->saveHTML($container);
}
Output:
<div class="container-div">
<div if="rand11"></div>
<div if="rand24"></div>
<div if="rand51"></div>
</div>
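For the second question (markup that only exists as a string produced by saveHTML()), one possible approach is to parse the string into a second document and copy its nodes across with importNode(). The following is a minimal sketch under that assumption; the fragment in $saved and the names $target and $source are hypothetical.

```php
<?php
// Hypothetical saved fragment, e.g. one entry of an array filled via $dom->saveHTML().
$saved = '<div if="rand77"></div>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output

$target = new DOMDocument();
$target->loadHTML('<div class="container-div"><div if="rand11"></div></div>');

$source = new DOMDocument();
$source->loadHTML($saved);
libxml_clear_errors();

$xpath = new DOMXPath($target);
$container = $xpath->query('//div[@class="container-div"]')->item(0);

// loadHTML() wraps the fragment in <html><body>, so copy the children of <body>.
$body = $source->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child) {
    // importNode() returns a copy owned by $target; true requests a deep copy.
    $container->appendChild($target->importNode($child, true));
}

echo $target->saveHTML($container), "\n";
```

Nodes belong to the document that created them, which is why the copy has to go through importNode() before it can be appended.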

Get div from external page, then delete an another div from it

I need a little help, with getting content from external webpages.
I need to get a div, and then delete another div from inside it. This is my code, can someone help me?
This is the relevant portion of my XML code:
<html>
...
<body class="domain-4 page-product-detail" > ...
<div id="informacio" class="htab-fragment"> <!-- must select this -->
<h2 class="description-heading htab-name">Utazás leírása</h2>
<div class="htab-mobile tab-content">
<p class="tab-annot">* Hivatalos ismertető</p>
<div id="trip-detail-question"> <!-- must delete this -->
<form> ...</form>
</div>
<h3>USP</h3><p>Nagy, jól szervezett és családbarát ...</p>
<div class="message warning-message">
<p>Az árak már minden aktuális kedvezményt tartalmaznak!</p>
<span class="ico"></span>
</div>
</div>
</div>
...
</body>
</html>
I need to get the div with id="informacio", and after that I need to delete the div id="trip-detail-question" from it including the form it contains.
This is my code, but it's not working correctly :(.
function get_content($url){
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
$query = "//div[@id='informacio']";
$entries = $xpath->query($query)->item(0);
foreach($xpath->query("div[@id='trip-detail-question']", $entries) as $node)
$node->parentNode->removeChild($node);
$var = $doc->saveXML($entries);
return $var;
}
Your second XPath expression is incorrect. It tries to select a div that is a direct child of the div you selected previously. In effect, you are trying to select:
//div[@id='informacio']/div[@id='trip-detail-question']
and that node does not exist. You want this node:
//div[@id='informacio']/div/div[@id='trip-detail-question']
which you can also select like this (allowing any element, not just div):
//div[@id='informacio']/*/div[@id='trip-detail-question']
or (allowing more than one nesting level):
//div[@id='informacio']//div[@id='trip-detail-question']
In the context of the first div, the correct XPath expression would be:
.//div[@id='trip-detail-question']
If you change it in your code, it should work:
foreach($xpath->query(".//div[@id='trip-detail-question']", $entries) as $node)
$node->parentNode->removeChild($node);
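The corrected query can be tried without fetching the remote page. The sketch below uses a shortened, hypothetical copy of the markup in place of the real URL. Copying the matches into an array before removing them is a cautious choice, not a requirement stated above.

```php
<?php
// Hypothetical, shortened copy of the page markup.
$html = '<div id="informacio" class="htab-fragment">'
      . '<h2>Heading</h2>'
      . '<div class="htab-mobile tab-content">'
      . '<div id="trip-detail-question"><form><input name="q"></form></div>'
      . '<h3>USP</h3><p>Text</p>'
      . '</div></div>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$entry = $xpath->query("//div[@id='informacio']")->item(0);

// ".//" searches all descendants of the context node, not only its direct children.
$unwanted = iterator_to_array(
    $xpath->query(".//div[@id='trip-detail-question']", $entry)
);
foreach ($unwanted as $node) {
    $node->parentNode->removeChild($node);
}

$result = $doc->saveHTML($entry);
echo $result, "\n";
```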

For each div tag, take its contents

I'm trying to loop through the code of an HTML page and reformat its contents. It has a few divs within divs, which I want to extract. I've tried various forms of explode, regex and DOM, but can't find exactly how to do this.
Example:
<div class="section1">
<div class="section2">number 1</div>
</div>
<div class="section1">
<div class="section2">number 2</div>
</div>
The result I'm looking for is basically, for each section 1, get contents from section 2, so the output would be:
number 1, number 2
Does anyone know how to do something like this?
Should be pretty easy with DOMXPath:
$doc = new DOMDocument;
$doc->loadHTML(/*...*/); // load the HTML here
$xpath = new DOMXPath($doc);
$result = $xpath->query("//div[@class='section1']/div[@class='section2']/text()");
foreach ($result as $item) {
echo "$item->wholeText\n";
}
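To get the comma-separated output from the question, the matched text can be collected into an array and joined. This is a minimal sketch using the example markup from the question; the $texts array is an illustrative name.

```php
<?php
$html = '<div class="section1"><div class="section2">number 1</div></div>'
      . '<div class="section1"><div class="section2">number 2</div></div>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$texts = [];
// Select each inner div and read its text content.
foreach ($xpath->query("//div[@class='section1']/div[@class='section2']") as $div) {
    $texts[] = trim($div->textContent);
}
echo implode(', ', $texts), "\n";   // prints: number 1, number 2
```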
This is a jQuery solution, not PHP:
var result = $('.section1 .section2').map(function() {
return $(this).text();
}).get().join(', ');

PHP preg_match_all - group without returning a match

How would I get content from HTML between h3 tags inside an element that has class pricebox? For example, the following string fragment
<!-- snip a lot of other html content -->
<div class="pricebox">
<div class="misc_info">Some misc info</div>
<h3>599.99</h3>
</div>
<!-- snip a lot of other html content -->
The catch is 599.99 has to be the first match returned, that is if the function call is
preg_match_all($regex,$string,$matches)
the 599.99 has to be in $matches[0][1] (because I use the same script to get numbers from dissimilar looking strings with different $regex - the script looks for the first match).
Try using XPath; definitely NOT RegEx.
Code :
$html = new DOMDocument();
@$html->loadHtmlFile('http://www.path.to/your_html_file_html');
$xpath = new DOMXPath( $html );
$nodes = $xpath->query("//div[@class='pricebox']/h3");
foreach ($nodes as $node)
{
echo $node->nodeValue."\n";
}
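Since the question only wants the first match, the XPath string() function may be a convenient alternative to looping. The sketch below assumes the HTML is already available in a string, as in the question's preg_match_all call, and the $price variable is an illustrative name.

```php
<?php
$string = '<div class="pricebox">'
        . '<div class="misc_info">Some misc info</div>'
        . '<h3>599.99</h3>'
        . '</div>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($string);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// string(...) yields the text of the first matching node, or '' if nothing matches.
$price = $xpath->evaluate("string(//div[@class='pricebox']/h3)");
echo $price, "\n";   // prints: 599.99
```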

PHP DOMDocument: insertBefore, how to make it work?

I would like to place a new node element, before a given element. I'm using insertBefore for that, without success!
Here's the code,
<DIV id="maindiv">
<!-- I would like to place the new element here -->
<DIV id="child1">
<IMG />
<SPAN />
</DIV>
<DIV id="child2">
<IMG />
<SPAN />
</DIV>
</DIV>
//$div is a new div node element,
//The code I'm trying is the following:
$maindiv->item(0)->parentNode->insertBefore( $div, $maindiv->item(0) );
//Obs: This code actually places the new node before maindiv
//$maindiv is an object(DOMNodeList)[5], from getElementsByTagName( 'div' )
//echo $maindiv->item(0)->nodeName gives 'div'
//echo $maindiv->item(0)->nodeValue gives the correct data on that div 'some random text'
//this code actually places the new $div element before <DIV id="maindiv">
http://pastie.org/1070788
Any kind of help is appreciated, thanks!
If maindiv is from getElementsByTagName(), then $maindiv->item(0) is the div with id=maindiv. So your code is working correctly because you're asking it to place the new div before maindiv.
To make it work like you want, you need to get the children of maindiv:
$dom = new DOMDocument();
$dom->load($yoursrc);
$maindiv = $dom->getElementById('maindiv');
$items = $maindiv->getElementsByTagName('DIV');
$items->item(0)->parentNode->insertBefore($div, $items->item(0));
Note that if you don't have a DTD, PHP doesn't return anything from getElementById. For getElementById to work, you need to have a DTD or specify which attributes are IDs:
foreach ($dom->getElementsByTagName('DIV') as $node) {
$node->setIdAttribute('id', true);
}
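The effect of setIdAttribute() can be checked with a small script. This is a sketch under the assumption that the source is XML loaded without a DTD; the inline document and the variables $before and $after are made up for illustration.

```php
<?php
$dom = new DOMDocument();
// Hypothetical XML source without a DTD.
$dom->loadXML('<DIV id="maindiv"><DIV id="child1"/><DIV id="child2"/></DIV>');

// Without a DTD, "id" is an ordinary attribute, so this lookup finds nothing.
$before = $dom->getElementById('maindiv');
var_dump($before === null);

// Declare "id" as an ID-type attribute on every DIV element.
foreach ($dom->getElementsByTagName('DIV') as $node) {
    $node->setIdAttribute('id', true);
}

// The same lookup now resolves to the element.
$after = $dom->getElementById('maindiv');
echo $after->nodeName, "\n";
```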
From scratch, this seems to work too:
$str = '<DIV id="maindiv">Here is text<DIV id="child1"><IMG /><SPAN /></DIV><DIV id="child2"><IMG /><SPAN /></DIV></DIV>';
$doc = new DOMDocument();
$doc->loadHTML($str);
$divs = $doc->getElementsByTagName("div");
$divs->item(0)->appendChild($doc->createElement("div", "here is some content"));
print_r($divs->item(0)->nodeValue);
Found a solution:
$child = $maindiv->item(0);
$child->insertBefore( $div, $child->firstChild );
I don't know how much sense this makes, but well, it worked.
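The reason this works is that insertBefore() is called on the parent, and the reference node must be one of that parent's children. Passing firstChild therefore makes the new node the first child. A minimal sketch, assuming the markup is parsed with loadHTML() (which lowercases tag names) and using a made-up id 'inserted' for the new element:

```php
<?php
$src = '<DIV id="maindiv"><DIV id="child1"><IMG><SPAN></SPAN></DIV>'
     . '<DIV id="child2"><IMG><SPAN></SPAN></DIV></DIV>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($src);               // the HTML parser lowercases tag names
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$maindiv = $xpath->query('//div[@id="maindiv"]')->item(0);

$div = $dom->createElement('div', 'new element');
$div->setAttribute('id', 'inserted');

// The reference node is a child of $maindiv, so $div becomes the new first child.
// With no children, firstChild is null and insertBefore() simply appends.
$maindiv->insertBefore($div, $maindiv->firstChild);

$ids = [];
foreach ($maindiv->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $ids[] = $child->getAttribute('id');
    }
}
echo implode(', ', $ids), "\n";   // prints: inserted, child1, child2
```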
