DOM removing selected child nodes - php

I have a DOM element whose HTML content contains some elements I'd like to remove, while keeping some tags that are OK.
I try to iterate through all child elements and delete those that need to be removed:
foreach ($node->getElementsByTagName('*') as $element)
if ($element->nodeName != 'br')
$node->removeChild($element);
But this throws a Not Found Error exception which, when not caught, causes a fatal error.
How would I solve this problem?

Use the following instead to remove the node:
$element->parentNode->removeChild($element);

getElementsByTagName('*') finds all descendant elements, not just child elements. So some of the $element nodes you want to remove are not direct children of $node, hence the failure.
I'm not 100% sure what your intention is here, but most likely you just want to remove certain immediate children. In this case, do the following:
$nodestoremove = array();
foreach ($node->childNodes as $n) {
if ($n->nodeType===XML_ELEMENT_NODE and $n->nodeName!=='br') {
$nodestoremove[] = $n;
}
}
foreach ($nodestoremove as $n) {
$node->removeChild($n);
}
unset($nodestoremove); // so nodes can be garbage-collected
echo $node->C14N(); // xml fragment after removal
Note that we make two passes: one to identify the nodes to delete, and a second pass to delete them. This is because childNodes is a live list, so we can't iterate through it forwards while we delete from it. (Although we could iterate through it backwards.)
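The backwards iteration mentioned above can be sketched as follows. This is a minimal illustration, not the original answer's code: the sample markup is an assumption, and $node stands in for the asker's element.

```php
<?php
// Assumed sample markup; $node stands in for the element from the question.
$doc = new DOMDocument();
$doc->loadXML('<div>keep <b>bold</b><br/><i>italic</i> text<br/></div>');
$node = $doc->documentElement;

// Walk the live childNodes list from the end, so a removal never shifts
// the indexes that are still to be visited.
for ($i = $node->childNodes->length - 1; $i >= 0; $i--) {
    $child = $node->childNodes->item($i);
    if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName !== 'br') {
        $node->removeChild($child);
    }
}

// Serialize the element after removal; only text nodes and <br/> remain.
echo $doc->saveXML($node), "\n";
```

This avoids the temporary array at the cost of index-based access; the two-pass version is arguably easier to read when the removal condition grows.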

Related

DOMElement does not seem to be removed from DOMNodeList

Please help!
I need to delete the 1st element from the DOMNodeList $myDivs, but the removal doesn't actually occur: the element remains.
$dom = new DOMDocument();
$dom->loadHTML($file);
$xpath = new DOMXPath($dom);
$myDivs = $xpath->query('//div[@data-name|data-price]');
usleep(1);
//Must REVERSE iterate DOMNodeList.
for ($i = $myDivs->length - 1; $i >= 0; $i--) {
//Deleting 1st element of (DOMNodeList) $myDivs, containing advertisement product
if ($i == 0) {
//Removing div element (DOMNode) from DOM? DOMNodeList? Nothing changes
$result = $myDivs->item($i)->parentNode->removeChild($myDivs->item($i));
}
//Adding formatted string of attributes of valid DOMElements (div)
$outputArr[] = printf('%1$s - %2$s, %3$s.<br>',
$myDivs->item($i)->getAttribute('data-name'),
$myDivs->item($i)->getAttribute('data-price'),
$myDivs->item($i)->getAttribute('data-currency'))
?? null;
}
The for loop iterates in reverse through $myDivs, which was fetched by XPath. On the last iteration ($i = 0, element #0) the DOMElement should be purged from everywhere (both the DOM and the DOMNodeList), judging by this note on php.net:
Keep in mind that DOMNodelists are "live" - changes to the document or node that the DOMNodelist was derived from will be reflected in the DOMNodelist.
You can modify, and even delete, nodes from a DOMNodeList if you iterate backwards
But it doesn't happen!
No errors occur, and $result equals that exact element #0 (meaning removeChild() has done its job correctly).
While debugging I get $myDivs->length = 31 at the usleep(1); line, and at the $outputArr[] = ... line $myDivs still has the same length.
So element #0 still gets appended to $outputArr, although it shouldn't be.
I can't see what I'm missing.
P.S. Of course, in such a situation one can always just use continue to skip the iteration, but what about deleting?
You're not quoting the official documentation but remarks contributed by users. DOMNodeLists returned from XPath queries aren't "live". removeChild only removes the node from the original document, not from the DOMNodeList.
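A small sketch of that difference, using assumed sample markup and hypothetical variable names: the list returned by DOMXPath::query() keeps its length after removeChild(), while a list from getElementsByTagName() shrinks. Detached nodes can be skipped by checking parentNode.

```php
<?php
// Assumed sample document: the first div plays the advertisement.
$dom = new DOMDocument();
$dom->loadXML(
    '<root><div data-name="ad"/><div data-name="apple"/><div data-name="pear"/></root>'
);
$xpath = new DOMXPath($dom);
$divs = $xpath->query('//div[@data-name]');

// Remove the first matched element from the document.
$first = $divs->item(0);
$first->parentNode->removeChild($first);

// The XPath result is a snapshot, so it still lists the removed node.
$snapshotLength = $divs->length;
// A list from getElementsByTagName() is live and reflects the removal.
$liveLength = $dom->getElementsByTagName('div')->length;

// Skip nodes that are no longer attached to the document.
$names = [];
foreach ($divs as $div) {
    if ($div->parentNode === null) {
        continue;
    }
    $names[] = $div->getAttribute('data-name');
}

var_dump($snapshotLength, $liveLength, $names);
```

Re-running the XPath query after the removal is another way to get a list without the deleted node.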

How to verify if the nth element exist with DIDOM html parser

I am using the DiDOM HTML parser library. From its documentation (https://github.com/Imangazaliev/DiDOM#verify-if-element-exists):
If you need to check if element exist and then get it:
if ($document->has('.post')) {
$elements = $document->find('.post');
// code
}
But what if I need to check the existence of the n-th element in the array of elements with the '.post' class, for example:
$elements = $document->find('.post')[1];
The code below doesn't work and throws errors:
if ($document->has('.post')[1]) {
$elements = $document->find('.post')[1];
// code
}
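I found the solution. DiDOM's has() method doesn't offer an n-th element option, so I used the nth-child(n) pseudo-class selector to check for the n-th element. Note that :nth-child(2) matches an element that is the second child of its parent, which is not necessarily the second '.post' match.
The code now looks like this: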
if ($document->find('.post:nth-child(2)')) {
$elements = $document->find('.post:nth-child(2)')[0]->text();
} else {
echo "there is no such item";
}

SimpleXML remove nodes

I've got a foreach loop that is only running once and it has me stumped.
1: I load an array of status values (either "request", "delete", or "purchased")
2: I then load an xml file and need to loop through the "code" nodes and update their status, BUT if the new code is "delete" I want to remove it before moving onto the next one
XML structure is....
<content>
.... lots of stuff
<codes>
<code date="xxx" status="request">xxxxx</code>
.. repeat ...
</codes>
</content>
and the php code is ...
$newstatus = $_POST['updates'];
$file = '../apps/templates/'.$folder.'/layout.xml';
$xml2 = simplexml_load_file($file);
foreach($xml2->codes->code as $code){
if($code['status'] == "delete") {
$dom=dom_import_simplexml($code);
$dom->parentNode->removeChild($dom);
}
}
$xml2->asXml($file);
I've temporarily removed the updating so I can debug the delete check.
This all works, BUT it only removes the 1st "delete" node and leaves all the others, even though it's a foreach loop.
Any help greatly appreciated.
Deleting multiple times in the same iteration is unstable. E.g. if you remove the second element, the third becomes the second and so on.
You can prevent that by storing the elements to delete into an array first:
$elementsToRemove = array();
foreach ($xml2->codes->code as $code) {
if ($code['status'] == "delete") {
$elementsToRemove[] = $code;
}
}
And then you remove the element based on the array which is stable while you iterate over it:
foreach ($elementsToRemove as $code) {
unset($code[0]);
}
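You could also put the if-condition into an XPath query, which returns an array directly (see the duplicate question for an example), or copy the node list into an array with iterator_to_array() (pass false as the second argument, because all the items share the same key, the element name).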
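As a minimal sketch of the XPath variant, assuming a small stand-in for the asker's layout.xml: xpath() returns a plain PHP array, so removing nodes while looping over it does not disturb the iteration.

```php
<?php
// Assumed stand-in for the question's XML file.
$xml2 = simplexml_load_string(
    '<content><codes>'
    . '<code status="request">a</code>'
    . '<code status="delete">b</code>'
    . '<code status="delete">c</code>'
    . '<code status="purchased">d</code>'
    . '</codes></content>'
);

// The predicate selects only the nodes to delete; the result is an array,
// so it stays stable while the document changes.
foreach ($xml2->xpath('codes/code[@status="delete"]') as $code) {
    unset($code[0]); // removes the element itself from the document
}

// Collect what is left to confirm both "delete" nodes are gone.
$remaining = [];
foreach ($xml2->codes->code as $code) {
    $remaining[] = (string) $code;
}
var_dump($remaining);
```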
SimpleXML node lists are live views of the document rather than snapshots, and as with deleting items while iterating forward through any list, the position pointer can get mixed up because the expected next item has disappeared.
A simple way to remove a bunch of children in SimpleXML without using an extra array is to iterate in reverse (decrementing the index), which changes the loop in your example to:
// FOR EACH NODE IN REVERSE
$elements=$xml2->xpath('codes/code');
$count=count($elements);
for($j=$count-1;$j>=0;$j--){
// IF TO DELETE
$code=$elements[$j];
if($code['status']=="delete"){
// DELETE ELEMENT
$dom=dom_import_simplexml($code);
$dom->parentNode->removeChild($dom);
}
}
Of course, if your other processing requires forward iterating through the elements, then using an array is the best.

PHP - non-recursive var_dump?

When dealing with certain PHP objects, it's possible to do a var_dump() and have PHP print values to the screen that go on and on, presumably until the PHP memory limit is reached. An example of this is dumping a Simple HTML DOM object. I assume that because you can traverse the children and parents of objects, var_dump() gives infinite results: it finds the parent of an object, recursively finds its children, then finds all those children's parents and their children, and so on.
My question is, how can you avoid this and keep PHP from recursively dumping the same things over and over? Using the Simple HTML DOM parser example, if I have a DOM object that has no children and I var_dump() it, I'd like it to just dump the object and not start traversing up the DOM tree and dumping parents, grandparents, other children, etc.
Install the Xdebug extension in your development environment. It replaces var_dump with its own version that only goes 3 levels deep by default.
https://xdebug.org/docs/display
Items 4 levels deep are displayed as an ellipsis. You can change the depth with the xdebug.var_display_max_depth ini setting.
The built-in functions var_dump, var_export, and print_r have no depth limit of their own.
Edit:
If you want to do it the hard way, you can write your own function
function print_rr($thing, $level = 0) {
    $indent = str_repeat('  ', $level);
    // Scalars and null are printed directly.
    if (!is_object($thing) && !is_array($thing)) {
        print $indent . var_export($thing, true) . "\n";
        return;
    }
    // Stop descending once the depth limit is reached.
    if ($level == 4) {
        print $indent . "...\n";
        return;
    }
    // Public properties for objects, the elements themselves for arrays.
    $vars = is_object($thing) ? get_object_vars($thing) : $thing;
    foreach ($vars as $k => $v) {
        print $indent . $k . " =>\n";
        print_rr($v, $level + 1);
    }
}
Why don't you simply run a foreach loop on your object?
From the PHP docs:
The foreach construct simply gives an easy way to iterate over arrays.
foreach works only on arrays (and objects), and will issue an error
when you try to use it on a variable with a different data type or an
uninitialized variable.
I had this problem and didn't need to see inside the objects, just the object classnames, so I wrote a simple function to replace objects with their classnames before dumping the data:
function sanitizeDumpContent($content)
{
if (is_object($content)) return "OBJECT::".get_class($content);
if (!is_array($content)) return $content;
return array_map('sanitizeDumpContent', $content);
}
Then, when you want to dump something, just do this:
var_dump(sanitizeDumpContent($recursive_content));
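A self-contained run of this helper might look like the sketch below. It is written as a plain function that recurses by name; the TreeNode class and the sample data are hypothetical and exist only to create a parent/child cycle.

```php
<?php
// Replace every object with its class name before dumping.
function sanitizeDumpContent($content)
{
    if (is_object($content)) return "OBJECT::" . get_class($content);
    if (!is_array($content)) return $content;
    return array_map('sanitizeDumpContent', $content);
}

// Hypothetical node type whose instances reference each other.
class TreeNode
{
    public $parent;
    public $children = [];
}

$root = new TreeNode();
$child = new TreeNode();
$child->parent = $root;
$root->children[] = $child;

// Mixed data: scalars survive, objects collapse to their class names.
$data = ['title' => 'page', 'node' => $child, 'list' => [1, $root]];
$clean = sanitizeDumpContent($data);
var_dump($clean);
```

The trade-off is that nothing inside the objects is shown at all; a depth-limited dump is the better fit when a few levels of detail are needed.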

Choosing Last Node in SimpleXML

I want to find the attribute of the last node in the XML file.
The following code finds the attribute of the first node. Is there a way to find the last node?
foreach ($xml->gig[0]->attributes() as $Id){
}
thanks
I'm not too familiar with PHP, but you could try the following, using an XPath query:
foreach ($xml->xpath("//gig[last()]")[0]->attributes() as $Id){
}
To get to the last gig node, as Frank Bollack noted, we could use XPath.
foreach (current($xml->xpath('/*/gig[last()]'))->attributes() as $attr) {
}
Or a little more verbose but nicer:
$attrs = array();
$nodes = $xml->xpath('/*/gig[last()]');
if (is_array($nodes) && ! empty($nodes)) {
foreach ($nodes[0]->attributes() as $attr) {
$attrs[$attr->getName()] = (string) $attr;
}
}
var_dump($attrs);
It's true that you could use XPath to get the last node (be it a <gig/> node or otherwise) but you can also mirror the same technique you used for the first node. This way:
// first <gig/>
$xml->gig[0]
// last <gig/>
$xml->gig[count($xml->gig) - 1]
Edit: I've just realized, are you simply trying to get the @id attribute of the first and the last <gig/> node? In that case, forget about attributes() and use SimpleXML's notation instead: attributes are accessed as if they were array keys.
$first_id = $xml->gig[0]['id'];
$last_id = $xml->gig[count($xml->gig) - 1]['id'];
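To check that the index-based and XPath-based approaches agree, here is a small sketch on an assumed sample document (the element and attribute names follow the question, the values are made up):

```php
<?php
// Assumed sample data with three <gig/> nodes.
$xml = simplexml_load_string('<gigs><gig id="1"/><gig id="2"/><gig id="3"/></gigs>');

// Index-based: count() works on the SimpleXML node list.
$lastByIndex = (string) $xml->gig[count($xml->gig) - 1]['id'];

// XPath-based: last() selects the final <gig/> under the root element.
$nodes = $xml->xpath('/*/gig[last()]');
$lastByXPath = $nodes ? (string) $nodes[0]['id'] : null;

var_dump($lastByIndex, $lastByXPath);
```

The index-based form assumes at least one <gig/> exists; with an empty list the index would be -1, so a count check is worth adding in real code.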
I think that this xpath expression should work
$xml->xpath('root/child[last()]');
That should retrieve the last child element that is a child of the root element.
