PHP: "Couldn't fetch DOMElement. Node no longer exists in" using DOMXpath - php

I have some HTML that contains this:
<div class="test">
Outer
<div class="test">Inner 1</div>
<div class="test">Inner 2</div>
</div>
I'm doing str_replace() on the contents of these elements:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
foreach($xpath->query("//div[@class='test']") as $node) {
    $node->nodeValue = str_replace(" ", "X", $node->nodeValue);
}
That should replace any spaces with an "X".
But it results in this error:
Warning: Couldn't fetch DOMElement. Node no longer exists in /path/to/my/file.php on line 63
It works if there's only one nested div:
<div class="test">
Outer
<div class="test">Inner 1</div>
</div>
Why does this happen, and how can I get it working?

Try changing
foreach($xpath->query("//div[@class='test']") as $node)
to
foreach($xpath->query('//div[@class="test"]//div[@class="test"]') as $node)
Edit per comments:
Assuming there's a space in the outer element (i.e., it's "Outer 1"):
<?php
$string = <<<XML
<div class="test">
Outer 1
<div class="test">Inner 1</div>
<div class="test">Inner 2</div>
</div>
XML;
$dom = new DOMDocument();
$dom->loadHTML($string);
$xpath = new DOMXpath($dom);
foreach($xpath->query('//div[@class="test"]//text()') as $node) {
    $nnode = trim($node->nodeValue);
    echo $nnode = str_replace(" ", "X", $nnode);
}
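As for why the warning appears: assigning to nodeValue on an element replaces all of its children with a single text node, so once the outer div has been rewritten, the inner divs that the node list still points to are gone. The following is a minimal sketch, not the answerer's code, that avoids this by rewriting only the text nodes and writing the result back into the document. The sample markup and variable names are illustrative.

```php
<?php
$html = '<div class="test">Outer 1<div class="test">Inner 1</div><div class="test">Inner 2</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select the text nodes below any div.test; the elements themselves are never overwritten.
foreach ($xpath->query('//div[@class="test"]//text()') as $textNode) {
    $textNode->nodeValue = str_replace(' ', 'X', $textNode->nodeValue);
}

// Serialise only the outer div to inspect the result.
$result = $dom->saveHTML($dom->getElementsByTagName('div')->item(0));
echo $result, "\n";
```

Because an XPath node set contains each node only once, text inside the nested divs is not processed twice even though it sits below two matching ancestors.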

Related

Extracting value of a node after a certain tag

Trying to extract the value "Output" between the spans, but only if the title is "ABCD (1,2)", using PHP. Basically, find "Output" (extract Output).
Here is the section of html:
<div class="wrap">
<strong title="ABCD (1,2)" class="name">ABCD (1,2):</strong>
<div id="test1">
<div class="testclass" id="test2">
<span>Output</span>
</div>
</div>
</div>
Here is the code I would like to use:
<?php
$html = file_get_contents('test.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
//Some code needs to go here!
$tags = $dom->getElementsByTagName('strong');
?>
One way is to use XPath with a query that selects the desired element directly: find the strong element that has that title, step to the div that follows it, and descend to the span:
Example (using the markup above):
$html = '
<div class="wrap">
<strong title="ABCD (1,2)" class="name">ABCD (1,2):</strong>
<div id="test1">
<div class="testclass" id="test2">
<span>Output</span>
</div>
</div>
</div>
';
$search_string = 'ABCD (1,2)';
$dom = new DOMDocument;
@$dom->loadHTML($html);
$query = "//strong[@title = '{$search_string}']/following-sibling::div/div/span";
$xpath = new DOMXpath($dom);
$result = $xpath->query($query);
if($result->length > 0) {
    echo $result->item(0)->nodeValue;
}
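The @ operator above silences every warning that loadHTML raises. As a hedged alternative, the sketch below uses libxml_use_internal_errors() to collect parser messages instead, and guards against an empty result before reading the value. The markup is the sample from the question; the variable names $previous, $span and $output are illustrative.

```php
<?php
$html = '<div class="wrap">
<strong title="ABCD (1,2)" class="name">ABCD (1,2):</strong>
<div id="test1"><div class="testclass" id="test2"><span>Output</span></div></div>
</div>';

// Collect parser warnings internally instead of printing them (replaces the @ operator).
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$query = "//strong[@title = 'ABCD (1,2)']/following-sibling::div/div/span";
$span = $xpath->query($query)->item(0);

// item(0) returns null when nothing matched, so check before reading the value.
$output = $span !== null ? trim($span->nodeValue) : null;
var_dump($output);
```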

How can I remove DOM element tags but leave their contents?

I have PHP code which removes all nodes that have at least one attribute. Here is my code:
<?php
$data = <<<DATA
<div>
<p>These line shall stay</p>
<p class="myclass">Remove this one</p>
<p>But keep this</p>
<div style="color: red">and this</div>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$dom->removeChild($dom->doctype);
$xpath = new DOMXPath($dom);
$lines_to_be_removed = $xpath->query("//*[count(@*)>0]");
foreach ($lines_to_be_removed as $line) {
    $line->parentNode->removeChild($line);
}
// just to check
echo $dom->saveHTML();
?>
As you see in the fiddle, this is the current output of code above:
<div>
<p>These line shall stay</p>
<p>But keep this</p>
</div>
While this is desired result:
<div>
<p>These line shall stay</p>
Remove this one
<p>But keep this</p>
and this
</div>
How can I do that?
Before removing each element, move its child nodes out and place them directly after it in the parent.
Example:
$data = <<<DATA
<div>
<p>These line shall stay</p>
<p class="myclass">Remove this one</p>
<p>But keep this</p>
<div style="color: red">and this</div>
<div style="color: red">and <p>also</p> this</div>
<div style="color: red">and this <div style="color: red">too</div></div>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//*[@*]") as $node) {
    $parent = $node->parentNode;
    while ($node->hasChildNodes()) {
        $parent->insertBefore($node->lastChild, $node->nextSibling);
    }
    $parent->removeChild($node);
}
echo $dom->saveHTML();
Outputs:
<div>
<p>These line shall stay</p>
Remove this one
<p>But keep this</p>
and this
and <p>also</p> this
and this too
</div>
https://3v4l.org/9qHRM
(I added some nested elements to demonstrate the safety of this approach.)
Couple of asides:
You don't need $dom->removeChild($dom->doctype) if you load with the additional LIBXML_HTML_NODEFDTD flag.
Your xpath expression can be simplified to //*[@*]
You could use replaceChild() with the text content of that node:
foreach ($lines_to_be_removed as $line) {
    $line->parentNode->replaceChild($dom->createTextNode($line->textContent), $line);
}
// <div>
// <p>These line shall stay</p>
// Remove this one
// <p>But keep this</p>
// and this
// </div>
However, this may prove problematic with your // notation of your xpath selector and recursion.
A more manual approach is to copy the child contents of the target nodes into their parent nodes:
$data = '
<div>
<div>1A</div>
<div class="foo">1B
<div>2C</div>
<div class="foo">2D</div>
<div>2E</div>
<div class="foo">2F
<div>3G</div>
<div class="foo">3H</div>
</div>
</div>
</div>';
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$dom->removeChild($dom->doctype);
SomeFunctionName( $dom->documentElement );
$html = $dom->saveHTML();
function SomeFunctionName( $parent )
{
    $nodesToDelete = array();
    if( $parent->hasChildNodes() )
    {
        foreach( $parent->childNodes as $node )
        {
            SomeFunctionName( $node );
            if( $node->hasAttributes() and count( $node->attributes ) > 0 )
            {
                foreach( $node->childNodes as $childNode )
                {
                    $node->parentNode->insertBefore( clone $childNode, $node );
                }
                $nodesToDelete[] = $node;
            }
        }
    }
    foreach( $nodesToDelete as $delete)
    {
        $delete->parentNode->removeChild( $delete );
    }
}
// <div>
// <div>1A</div>
// 1B
// <div>2C</div>
// 2D
// <div>2E</div>
// 2F
// <div>3G</div>
// 3H
// <div>3I</div>
// 3J
// </div>
If you want to nest the child elements in a new "div" container, swap out this portion of the code:
foreach( $parent->childNodes as $node )
{
    SomeFunctionName( $node );
    if( $node->hasAttributes() and count( $node->attributes ) > 0 )
    {
        $newNode = $node->ownerDocument->createElement('div');
        foreach( $node->childNodes as $childNode )
        {
            $newNode->appendChild( clone $childNode );
        }
        $node->parentNode->insertBefore( $newNode, $node );
        $nodesToDelete[] = $node;
    }
}
// <div>
// <div>1A</div>
// <div>1B
// <div>2C</div>
// <div>2D</div>
// <div>2E</div>
// <div>2F
// <div>3G</div>
// <div>3H</div>
// <div>3I</div>
// <div>3J</div>
// </div>
// </div>
// </div>
This removes every tag that has a class or style attribute, so it is not bulletproof:
<?php
$data = <<<DATA
<div>
<p>These line shall stay</p>
<p class="myclass">Remove this one</p>
<p>But keep this</p>
<div style="color: red">and this</div>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$dom->removeChild($dom->doctype);
$xpath = new DOMXPath($dom);
$lines_to_be_removed = $xpath->query("//*[count(@class)>0 or count(@style)>0]");
foreach ($lines_to_be_removed as $line) {
    $line->parentNode->removeChild($line);
}
// just to check
echo $dom->saveHTML();
?>
Note this line:
$lines_to_be_removed = $xpath->query("//*[count(@class)>0 or count(@style)>0]");

PHP Simple HTML DOM Parser, Remove attributes from the TAG without any specific unique input

My input:
<div id='makeme' class='testme'>
<span id='whatspanID'>somthing</span>
<p class='ptagclass'></p>
</div>
My expected output
<div>
<span></span>
<p></p>
</div>
To remove the content inside the tags I can use the snippet below, but how do I remove the attributes from the tags?
$html = str_get_html($str);
foreach($html->find("text") as $ht) {
$ht->innertext = "";
}
$html->save();
Using DOM and Xpath allows you to select text and attribute nodes.
$html = <<<'HTML'
<div id='makeme' class='testme'>
<span id='whatspanID'>somthing</span>
<p class='ptagclass'></p>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHtml($html);
$xpath = new DOMXpath($dom);
$div = $xpath->evaluate('//div[@id="makeme"]')->item(0);
$nodes = $xpath->evaluate('.//text()|@*|.//*/@*', $div);
foreach ($nodes as $node) {
    if ($node instanceof DOMAttr) {
        $node->parentNode->removeAttributeNode($node);
    } else {
        $node->parentNode->removeChild($node);
    }
}
echo $dom->saveHtml($div);
Output:
<div>
<span></span><p></p>
</div>

Extract HTML data using php/DOM

I'm a newbie with DOM, so can someone tell me how to parse the following in PHP?
<div class="classname1">
<div class="description">some description</div>
<div class="classname2">
<div class="classname3">some text 1</div>
<div class="classname4">some text 2</div>
<div class="classname6">some text 4</div>
</div>
</div>
I would like to retrieve the text in the classes above. There could be more divs before and after the HTML shown. I know I should create a DOM document:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$divs = $xpath->query('//div[@class="classname1"]');
foreach ($divs as $div) {
    //...
}
I don't know how to access the data in those classname divs.
You can use getAttribute() on the DOMElement:
http://www.php.net/manual/en/domelement.getattribute.php
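As a rough illustration of that suggestion, the sketch below walks the leaf divs under classname1 and pairs each one's class (read with getAttribute) with its text. The XPath expression and the $texts array are assumptions made for this example, not part of the original answer.

```php
<?php
$html = '<div class="classname1">
<div class="description">some description</div>
<div class="classname2">
<div class="classname3">some text 1</div>
<div class="classname4">some text 2</div>
<div class="classname6">some text 4</div>
</div>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$texts = [];
// Leaf divs (those with no div children) anywhere below div.classname1.
foreach ($xpath->query('//div[@class="classname1"]//div[not(div)]') as $div) {
    // Key each text by the value of the element's class attribute.
    $texts[$div->getAttribute('class')] = trim($div->nodeValue);
}
print_r($texts);
```

The wrapper div.classname2 is skipped because it contains child divs, so only the elements that actually hold text end up in the array.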

How can I get an element's serialised HTML with PHP's DOMDocument?

This is my example script:
$html = <<<HTML
<div class="main">
<div class="text">
Capture this text 1
</div>
<div class="date">
May 2010
</div>
</div>
<div class="main">
<div class="text">
Capture this text 2
</div>
<div class="date">
June 2010
</div>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="main"]');
foreach ($tags as $tag) {
    print_r($tag->nodeValue."\n");
}
This will output:
Capture this text 1 May 2010
Capture this text 2 June 2010
But I need it output:
<div class="text">
Capture this text 2
</div>
<div class="date">
June 2010
</div>
Or at least be able to do something like this in my foreach loop:
$text = $tag->query('//div[@class="text"]')->nodeValue;
$date = $tag->query('//div[@class="date"]')->nodeValue;
Well, nodeValue gives you the node's text value. What you want is commonly called outerHTML:
echo $dom->saveXml($tag);
will output what you are looking for in an X(HT)ML compliant way.
As of PHP 5.3.6 you can also pass a node to saveHtml, which wasn't possible previously:
echo $dom->saveHtml($tag);
The latter will obey HTML4 syntax. Thanks to Artefacto for that.
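For the per-element lookups the question asks about, DOMXPath::query accepts a context node as its second argument. The sketch below, which assumes PHP 5.3.6 or later for saveHTML with a node argument, reads the text and date relative to each div.main and also builds the inner HTML by serialising the children one by one. The condensed markup and the $rows array are illustrative.

```php
<?php
$html = '<div class="main"><div class="text">Capture this text 1</div><div class="date">May 2010</div></div>'
      . '<div class="main"><div class="text">Capture this text 2</div><div class="date">June 2010</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$rows = [];
foreach ($xpath->query('//div[@class="main"]') as $tag) {
    // The leading "./" makes the expression relative to $tag, the context node.
    $text = $xpath->query('./div[@class="text"]', $tag)->item(0);
    $date = $xpath->query('./div[@class="date"]', $tag)->item(0);

    // Inner HTML: serialise each child of $tag and concatenate the pieces.
    $inner = '';
    foreach ($tag->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }

    $rows[] = [
        'text'  => trim($text->nodeValue),
        'date'  => trim($date->nodeValue),
        'inner' => $inner,
    ];
}
print_r($rows);
```

Note that an expression starting with // searches the whole document even when a context node is passed, which is why the relative form is used here.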
Try this:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="main"]');
foreach ($tags as $tag) {
    $innerHTML = '';
    $children = $tag->childNodes;
    foreach ($children as $child) {
        // Import each child into a scratch document so it can be serialised on its own.
        $tmp_doc = new DOMDocument();
        $tmp_doc->appendChild($tmp_doc->importNode($child,true));
        $innerHTML .= $tmp_doc->saveHTML();
    }
    var_dump(trim($innerHTML));
}
-Pascal MARTIN
