PHP DOMDocument: Delete elements by class - php

I'm trying to delete every node with a given class.
To find the elements I use:
$xpath = new DOMXPath($dom);
foreach( $xpath->query('//div[contains(attribute::class, "foo")]') as $e ) {
// Delete this node
}
But how can I delete the elements in this foreach-loop?
Edit: By the way, how can I first check whether there is an element with the class "foo" in the DOM (before starting the loop)?
Update:
This is my HTML:
<div class="main">
<div class="delete_this" contenteditable="true">Target</div>
<div class="class1"></div>
<div class="content"><p>Anything</p></div>
</div>
This doesn't work for the example above:
$xpath = new DOMXPath($dom);
foreach( $xpath->query('//div[contains(attribute::class, "delete_this")]') as $e ) {
$e->parentNode->removeChild($e);
}

You need to use the removeChild() method of the parent element:
$xpath = new DOMXPath($dom);
foreach($xpath->query('//div[contains(attribute::class, "foo")]') as $e ) {
// Delete this node
$e->parentNode->removeChild($e);
}
Btw, about your second question: if no elements are found, the loop simply won't iterate at all.
Here comes a working example:
$html = <<<EOF
<div class="main">
<div class="delete_this" contenteditable="true">Target</div>
<div class="class1"></div>
<div class="content"><p>Anything</p></div>
</div>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($html);
$selector = new DOMXPath($doc);
foreach($selector->query('//div[contains(attribute::class, "delete_this")]') as $e ) {
$e->parentNode->removeChild($e);
}
echo $doc->saveHTML($doc->documentElement);

For the second part of the question, the result of the query has a length property which you can use to see if anything was matched:
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[contains(attribute::class, "foo")]');
printf('Removing %d nodes', $nodes->length);

The query above only removes div elements with that class.
To remove every element with the class, regardless of tag name, use * instead of div:
$selector = new \DOMXPath( $doc );
foreach ( $selector->query( '//*[contains(attribute::class, "' . $class . '")]' ) as $e ) {
$e->parentNode->removeChild( $e );
}
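Note that contains(attribute::class, "foo") is a substring test, so it also matches classes such as "foobar". A common XPath 1.0 idiom pads the class list with spaces and looks for the exact token. The following is a minimal sketch of that idea; the helper name removeByClass() is hypothetical, and it assumes $class contains no double quotes:

```php
<?php
// Hypothetical helper (not part of the answers above): removes every element
// whose class attribute contains $class as a whole, space-separated token.
function removeByClass(DOMDocument $doc, string $class): int
{
    $xpath = new DOMXPath($doc);
    // Pad the normalized class list with spaces so that " foo " matches
    // only the exact token. Assumes $class contains no double quotes.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $removed = 0;
    foreach ($xpath->query($query) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
            $removed++;
        }
    }
    return $removed;
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<div class="main"><div class="foo bar">A</div>'
    . '<p class="foobar">B</p><span class="foo">C</span></div>'
);
$removed = removeByClass($doc, 'foo');
$html = $doc->saveHTML($doc->documentElement);
echo $removed, "\n", $html, "\n";
```

The return value doubles as the "was anything found?" check from the question: it is 0 when no element carried the class.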

Related

Why does not display the attribute html via xpath php

<?php
$content = '<div class="keep-me">Keep this div</div><div class="remove-me" id="test">Remove this div</div>';
$badClasses = array('');
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($content);
libxml_clear_errors();
$xPath = new DOMXpath($dom);
foreach($badClasses as $badClass){
$domNodeList = $xPath->query('//div[@class="remove-me"]/@id');
$domElemsToRemove = ''; // container of deleted elements
foreach ( $domNodeList as $domElement ) {
$domElemsToRemove .= $dom->saveHTML($domElement); // concat them
$domElement->parentNode->removeChild($domElement); // then remove
}
}
$content = $dom->saveHTML();
echo htmlentities($domElemsToRemove);
?>
Works - //div[@class="remove-me"] or //div[@class="remove-me"]/text()
Not working - //div[@class="remove-me"]/@id
Maybe there is an easier way?
The XPath //div[@class="remove-me"]/@id is correct, but it returns attribute nodes, so you just need to loop over them and add each nodeValue to a list of matching IDs:
$xPath = new DOMXpath($dom);
$domNodeList = $xPath->query('//div[@class="remove-me"]/@id');
$ids = []; // collected id values
foreach ( $domNodeList as $domElement ) {
$ids[] = $domElement->nodeValue;
}
print_r($ids);
If the aim is to fetch the ID of every element with class "remove-me", which is how I interpret the question, then you could try something like this (untested):
.... other code before
$xp=new DOMXpath( $dom );
$col= $xp->query( '//*[@class="remove-me"]' );
if( $col->length > 0 ){
foreach($col as $node){
$id=$node->hasAttribute('id') ? $node->getAttribute('id') : 'banana';
echo $id;
}
}
However, the code in the question suggests that you want to delete nodes. In that case, build an array of nodes from the node list and iterate through it from the end to the front, i.e. backwards.
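The backwards iteration described above could look roughly like the sketch below. It selects the elements themselves rather than their id attributes, copies the node list into a plain array, records each id, and then removes the node. The variable names are illustrative. Reverse order matters most for live lists such as the one returned by getElementsByTagName(); it is used here as a defensive habit.

```php
<?php
$content = '<div class="keep-me">Keep this div</div>'
         . '<div class="remove-me" id="test">Remove this div</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Select the elements (not /@id) so that they can be removed afterwards.
$nodes = iterator_to_array($xpath->query('//div[@class="remove-me"]'), false);

$ids = [];
// Walk the array from the last node to the first.
for ($i = count($nodes) - 1; $i >= 0; $i--) {
    $node = $nodes[$i];
    $ids[] = $node->getAttribute('id'); // empty string if there is no id
    $node->parentNode->removeChild($node);
}

$result = $dom->saveHTML();
print_r($ids);
```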

DOMXPath query check for div if it exists

I have this code:
$html = '<div class="container">A<div class="wrapper">B</div>C</div>';
$dom = new DOMDocument;
@$dom->loadHTML($html);
$xp = new DOMXPath($dom);
$links = $xp->query('//div[contains(@class,"container")]');
I want the DOMXPath query to select the <div> element with class = "container", but only when <div class="wrapper"></div> doesn't exist. When <div class="wrapper"> does exist, I want it to select only <div class="wrapper">.
Thanks in advance.
First, you can count all <div class="wrapper"> elements via:
$wrapper = $xp->query('//div[contains(@class,"wrapper")]')->length;
If this returns int(0), no element with the wrapper class was found. With this information, we can easily modify your code to something like this:
$html = '<div class="container">A<div class="wrapper">B</div>C</div>';
$dom = new DOMDocument;
@$dom->loadHTML($html);
$xp = new DOMXPath($dom);
$wrapper = $xp->query('//div[contains(@class,"wrapper")]');
if($wrapper->length == 0) {
// wrapper class NOT FOUND, now we can select container class
$links = $xp->query('//div[contains(@class,"container")]');
}
else {
// 1 or MORE wrapper class FOUND, do something with your .wrapper class
}
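As an alternative to the two-step check, the same fallback can be expressed in a single XPath 1.0 expression: a union of the wrapper divs and the container divs, where the container branch only matches when no wrapper exists anywhere in the document. This is a sketch under that assumption; the function name selectWrapperOrContainer() is made up for illustration.

```php
<?php
// Hypothetical helper: returns the wrapper divs if any exist,
// otherwise the container divs. The caller must keep $dom alive.
function selectWrapperOrContainer(DOMDocument $dom): DOMNodeList
{
    $xp = new DOMXPath($dom);
    return $xp->query(
        '//div[contains(@class,"wrapper")]'
        // The not(...) predicate disables this branch when a wrapper exists.
        . ' | //div[contains(@class,"container")]'
        . '[not(//div[contains(@class,"wrapper")])]'
    );
}

$withWrapper = new DOMDocument();
$withWrapper->loadHTML('<div class="container">A<div class="wrapper">B</div>C</div>');
$a = selectWrapperOrContainer($withWrapper);

$withoutWrapper = new DOMDocument();
$withoutWrapper->loadHTML('<div class="container">A</div>');
$b = selectWrapperOrContainer($withoutWrapper);

echo $a->item(0)->getAttribute('class'), "\n";
echo $b->item(0)->getAttribute('class'), "\n";
```

The two-query version above is easier to read and debug; the single expression is mainly useful when only one query string can be passed around.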

How to find element in already parsed HTML data

Here I have a very simple code to grab all the 'div' elements with the classname 'info_block'. I am wondering how would I go about finding another element with the classname 'price' from within 'info_block' and display it instead of the whole 'info_block' element.
Main goal: find the price in each element with classname 'info_block', but do it inside the foreach, because I may need to find other elements.
<?php
$page = file_get_contents('example.com');
$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);
$div1 = $xpath->query('//div[@class="info_block"]');
foreach ($div1 as $var1){
//echo $dom->saveHTML($var1);
}
?>
There is an element in each 'info_block' with the classname 'price', and I would like to display only that element. Like so...
foreach ($div1 as $var1){
$dom2 = new DOMDocument();
$dom2->loadHTML($dom->saveHTML($var1));
$xpath2 = new DOMXPath($dom2);
$div2 = $xpath2->query('//div[@class="price"]');
$div2 = $div2->item(0);
echo $dom2->saveHTML($div2);
}
But instead of just giving me the price it returns the whole HTML for 'info_block' as it did before.
You can pass each <div class="info_block"> you find as the context node, in the second argument of ->query(), and search for <div class="price"> relative to it:
$div1 = $xpath->query('//div[@class="info_block"]');
foreach ($div1 as $var1){
$div2 = $xpath->query('./div[@class="price"]', $var1);
// $var1 (the current info_block div) is the context node
$div2 = $div2->item(0);
echo $dom->saveHTML($div2);
}
Note: You do not need to create another instance of DOM and DOMXpath.
This example assumes HTML structured like this:
<div class="info_block"> <!-- each info block -->
<div class="price">1</div> <!-- contains a price -->
</div>
<div class="info_block">
<div class="price">2</div>
</div>
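One detail worth illustrating with the HTML above: the context node only affects relative paths. A query starting with // is evaluated from the document root even when a context node is passed, while ./ or .// stays inside that node. The sketch below compares both; the variable names are illustrative.

```php
<?php
$html = '<div class="info_block"><div class="price">1</div></div>'
      . '<div class="info_block"><div class="price">2</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$prices = [];
$absoluteCounts = [];
foreach ($xpath->query('//div[@class="info_block"]') as $block) {
    // Relative query: only descendants of the current block are searched.
    $price = $xpath->query('.//div[@class="price"]', $block)->item(0);
    $prices[] = $price !== null ? trim($price->textContent) : null;

    // Absolute query: '//' starts at the document root, so the context
    // node is effectively ignored and every price in the page matches.
    $absoluteCounts[] = $xpath->query('//div[@class="price"]', $block)->length;
}

print_r($prices);
print_r($absoluteCounts);
```

This is also why re-parsing each block into a second DOMDocument is unnecessary: one DOMXPath instance with a context node is enough.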
You can combine queries in XPath with the union operator | to find all the desired elements in one go:
$xpath->query('//div[@class="info_block"]|//div[@class="price"]');
You can also pass a DOM element as the context node for relative XPath queries; it is the optional second argument of the $xpath->query() method:
<?php
$page = file_get_contents('example.com');
$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);
$div1 = $xpath->query('//div[@class="info_block"]');
foreach ($div1 as $var1){
// The leading dot makes the query relative to $var1
$div2 = $xpath->query('.//div[@class="price"]', $var1);
foreach ($div2 as $var2) {
echo $var2->nodeValue. "\n";
}
}
?>
For more details, see the DOMXPath::query documentation.

Iterate through elements with DOMDocument & DOMXPath

I am trying to iterate through every child element of the containing div:
$html = ' <div id="roothtml">
<h1>
Introduction</h1>
<p>text</p>
<h2>
text</h2>
<p>
test</p>
</div>';
And I have this PHP:
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$els = $xpath->query("/div");
print_r($els);
All I get though is DOMNodeList Object ( )
Having looked at the IBM tutorial I should be getting an array. What is it I am doing wrong?
Any help is appreciated.
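You're using the wrong query string. loadHTML() wraps the fragment in <html> and <body> elements, so /div (a div at the document root) matches nothing; use //div, which matches div elements at any depth.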
Iterate over the list like this:
$els = $xpath->query("//div");
foreach( $els as $el) {
echo $el->textContent;
}
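To visit each child element of the containing div individually, as the question asks, one option is to select the children directly with a /* step. A minimal sketch, assuming the root div keeps the id "roothtml" from the question:

```php
<?php
$html = '<div id="roothtml"><h1>Introduction</h1><p>text</p>'
      . '<h2>text</h2><p>test</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$children = [];
// '/*' selects only element children, so whitespace text nodes are skipped.
foreach ($xpath->query('//div[@id="roothtml"]/*') as $child) {
    $children[] = $child->nodeName . ': ' . trim($child->textContent);
}

print_r($children);
```

Note that query() returns a DOMNodeList, not an array, which is why print_r() on it shows so little; iterate over it with foreach or read its length property instead.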

How to get nodes in first level using PHP DOMDocument?

I'm new to the PHP DOM objects and have a problem I can't find a solution for. I have a DOMDocument with the following HTML:
<div id="header">
</div>
<div id="content">
<div id="sidebar">
</div>
<div id="info">
</div>
</div>
<div id="footer">
</div>
I need to get all nodes that are on first level (header, content, footer). hasChildNodes() does not work, because first level node may not have children (header, footer).
For now my code looks like:
$dom = new DOMDocument();
$dom -> preserveWhiteSpace = false;
$dom -> loadHTML($html);
$childs = $dom -> getElementsByTagName('div');
But this gets me all the divs. Any advice?
You may have to go outside of DOMDocument - maybe convert to SimpleXML or DOMXpath
$file = $_SERVER['DOCUMENT_ROOT'] . "/test.html";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$xpath = new DOMXpath($doc);
$elements = $xpath->query("/");
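Since loadHTML() wraps a fragment in <html> and <body>, the first-level nodes of the HTML in the question end up as direct children of <body>. A sketch of selecting exactly those with DOMXPath, assuming the markup from the question:

```php
<?php
$html = '<div id="header"></div>'
      . '<div id="content"><div id="sidebar"></div><div id="info"></div></div>'
      . '<div id="footer"></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$ids = [];
// Direct element children of <body> only; nested divs are not matched.
foreach ($xpath->query('/html/body/*') as $node) {
    $ids[] = $node->getAttribute('id');
}

print_r($ids);
```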
Here's how I grab the first-level elements (in this case, the top-level TD elements in a table row):
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadHTML( $tr_element );
$xpath = new DOMXPath( $doc );
$td = $xpath->query("//tr/td[1]")->item(0);
do{
if( $innerHTML = self::DOMinnerHTML( $td ) )
array_push( $arr, $innerHTML );
$td = $td->nextSibling;
} while( $td != null );
$arr now contains the top TD elements, but not nested table TDs which you would get from
$dom->getElementsByTagName( 'td' );
The DOMinnerHTML function is something I snagged somewhere to get the innerHTML of an element/node:
public static function DOMinnerHTML( $element, $deep=true )
{
$innerHTML = "";
$children = $element->childNodes;
foreach ($children as $child)
{
$tmp_dom = new DOMDocument();
$tmp_dom->appendChild( $tmp_dom->importNode( $child, $deep ) );
$innerHTML.=trim($tmp_dom->saveHTML());
}
return $innerHTML;
}
