DOMElement does not seem to be removed from DOMNodeList - php

Please help!
I need to delete the first element from the DOMNodeList $myDivs, but the removal doesn't take effect: the element remains.
$dom = new DOMDocument();
$dom->loadHTML($file);
$xpath = new DOMXPath($dom);
$myDivs = $xpath->query('//div[@data-name|data-price]');
usleep(1);
//Must REVERSE iterate DOMNodeList.
for ($i = $myDivs->length - 1; $i >= 0; $i--) {
//Deleting 1st element of (DOMNodeList) $myDivs, containing advertisement product
if ($i == 0) {
//Removing div element (DOMNode) from DOM? DOMNodeList? Nothing changes
$result = $myDivs->item($i)->parentNode->removeChild($myDivs->item($i));
}
//Adding formatted string of attributes of valid DOMElements (div)
$outputArr[] = sprintf('%1$s - %2$s, %3$s.<br>',
$myDivs->item($i)->getAttribute('data-name'),
$myDivs->item($i)->getAttribute('data-price'),
$myDivs->item($i)->getAttribute('data-currency'));
}
The for loop iterates backwards through $myDivs, which was fetched by XPath. On the last iteration ($i = 0, element #0) the DOMElement should be purged from everywhere (the DOM and the DOMNodeList), or so it seems from php.net:
Keep in mind that DOMNodelists are "live" - changes to the document or node that the DOMNodelist was derived from will be reflected in the DOMNodelist.
You can modify, and even delete, nodes from a DOMNodeList if you iterate backwards
But it doesn't happen!
No errors occur, and $result equals that exact element #0 (meaning removeChild() has done its job correctly).
While debugging I get $myDivs->length = 31 at the usleep(1); line, and at the $outputArr[] = ... line $myDivs still has the same length.
So element #0 still gets appended to $outputArr, although it shouldn't.
I can't see what I'm missing.
P.S. Of course, in such a situation one can always just use continue to skip the iteration, but what about deleting?

You're not quoting the official documentation but remarks contributed by users. DOMNodeLists returned from XPath queries aren't "live". removeChild only removes the node from the original document, not from the DOMNodeList.
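As a minimal sketch of that behaviour (the markup and attribute values are made up for illustration and stand in for the asker's $file), the following removes the first div and then checks both the old node list and the document. A removed node has no parent any more, so testing parentNode is one possible way to skip it while building the output:

```php
<?php
// Hypothetical markup standing in for the asker's $file.
$html = '<div data-name="Ad" data-price="0" data-currency="USD"></div>'
      . '<div data-name="Book" data-price="12" data-currency="USD"></div>'
      . '<div data-name="Pen" data-price="3" data-currency="USD"></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$myDivs = $xpath->query('//div[@data-name]');
$lengthBefore = $myDivs->length;

// Remove the first matched div from the document.
$first = $myDivs->item(0);
$first->parentNode->removeChild($first);

// The XPath result is a snapshot, so its length does not change...
$lengthAfter = $myDivs->length;
// ...but a fresh query against the document no longer finds the node.
$inDocument = $xpath->query('//div[@data-name]')->length;

$outputArr = [];
foreach ($myDivs as $div) {
    // Skip nodes that have been detached from the document.
    if ($div->parentNode === null) {
        continue;
    }
    $outputArr[] = sprintf(
        '%1$s - %2$s, %3$s.',
        $div->getAttribute('data-name'),
        $div->getAttribute('data-price'),
        $div->getAttribute('data-currency')
    );
}
```

Re-running the query after the removal is the other obvious option if a list without the removed node is needed.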

Related

Check if XML element is existing in loop

For a website I'm making I need to get data from an external XML file.
I load the data like this:
$doc = new DOMDocument();
$url = 'http://myurl/results/xml/12345';
if (!$doc->load($url))
{
echo json_encode(array('error'=> 'error'));
exit;
}
$xpath = new DOMXPath($doc);
$program_date = $xpath->query('//game/date');
Then I use a foreach loop to get all the data:
if($program_date){
foreach($program_date as $node){
$programArray['program_date'][] = $node->nodeValue;
}
}
The problem I'm having is that sometimes a game doesn't have a date.
When that happens, I want to store "-" instead of the date from the XML file, but I don't know how to check whether a date is present in the data.
I tried a lot of approaches, such as isset, !isset, else, !empty and empty, with
$teamArray['program_kind'][] = "-";
but nothing works.
Can someone help me with this problem?
Thanks in advance.
You need to iterate the game elements, use them as a context and fetch the data with additional XPath expressions.
But one thing first. Use DOMXPath::evaluate(). DOMXPath::query() only supports location paths. It can only return a node list. But XPath expressions can return scalar values, too.
$xpath = new DOMXPath($doc);
$games = $xpath->evaluate('//game');
The result of //game will always be a DOMNodeList object. It can be an empty list, but you can directly iterate it. A condition like if ($games) will always be true.
foreach ($games as $game) {
Now that you have the game element node, you can use it as a context to fetch other data.
$date = $xpath->evaluate('string(date)', $game);
string() casts the first node of the location path to a string. If it cannot match a node, it returns an empty string. Use normalize-space() if you also want to trim and collapse whitespace at the same time.
You can validate if the game element has a date node using count().
$hasDate = $xpath->evaluate('count(date) > 0', $game);
The result of this XPath expression is always a boolean.
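Put together, the pieces above might look like the following sketch. The XML is a made-up sample (the real feed's structure beyond game and date is an assumption), and "-" is the placeholder the question asks for:

```php
<?php
// Made-up sample data; the second game has no date element.
$xml = <<<XML
<games>
  <game><date>2014-05-01</date></game>
  <game></game>
  <game><date> 2014-05-03 </date></game>
</games>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$programArray = ['program_date' => []];

// Iterate the game elements and use each one as the context node.
foreach ($xpath->evaluate('//game') as $game) {
    // normalize-space() yields an empty string when no date node matches.
    $date = $xpath->evaluate('normalize-space(date)', $game);
    $programArray['program_date'][] = ($date !== '') ? $date : '-';
}
```

Because every game produces exactly one entry, the dates stay aligned with any other per-game arrays built in the same loop.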

PHP DOM: How to get child elements by tag name in an elegant manner?

I'm parsing some XML with PHP DOM extension in order to store the data in some other form. Quite unsurprisingly, when I parse an element I pretty often need to obtain all children elements of some name. There is the method DOMElement::getElementsByTagName($name), but it returns all descendants with that name, not just immediate children. There is also the property DOMNode::$childNodes but (1) it contains node list, not element list, and even if I managed to turn the list items into elements (2) I'd still need to check all of them for the name. Is there really no elegant solution to get only the children of some specific name or am I missing something in the documentation?
Some illustration:
<?php
$document = new DOMDocument();
$document->loadXML(<<<EndOfXML
<a>
<b>1</b>
<b>2</b>
<c>
<b>3</b>
<b>4</b>
</c>
</a>
EndOfXML
);
$bs = $document
->getElementsByTagName('a')
->item(0)
->getElementsByTagName('b');
foreach($bs as $b){
echo $b->nodeValue . "\n";
}
// Returns:
// 1
// 2
// 3
// 4
// I'd like to obtain only:
// 1
// 2
?>
A simple iteration over the child nodes:
$parent = $p->parentNode;
foreach ( $parent->childNodes as $pp ) {
if ( $pp->nodeName == 'p' ) {
if ( strlen( $pp->nodeValue ) ) {
echo "{$pp->nodeValue}\n";
}
}
}
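Applied to the XML from the question, the same idea could be wrapped in a small helper. This is only a sketch: childElementsByTagName() is a hypothetical name, not part of the DOM extension. An XPath query relative to the parent is shown as an alternative, since a bare name test such as b only selects children:

```php
<?php
// Hypothetical helper: immediate child elements of $parent named $name.
function childElementsByTagName(DOMNode $parent, string $name): array
{
    $matches = [];
    foreach ($parent->childNodes as $child) {
        // childNodes also contains text nodes, so check the node type.
        if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === $name) {
            $matches[] = $child;
        }
    }
    return $matches;
}

$document = new DOMDocument();
$document->loadXML('<a><b>1</b><b>2</b><c><b>3</b><b>4</b></c></a>');
$a = $document->getElementsByTagName('a')->item(0);

$values = [];
foreach (childElementsByTagName($a, 'b') as $b) {
    $values[] = $b->nodeValue;
}

// Alternative: the XPath child axis, evaluated with $a as context node.
$xpath = new DOMXPath($document);
$viaXPath = [];
foreach ($xpath->query('b', $a) as $b) {
    $viaXPath[] = $b->nodeValue;
}
```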
An elegant approach I can imagine is a FilterIterator suited to the job. For example, the DOMElementFilter from Iterator Garden works on a DOMNodeList and optionally accepts a tag name to filter for:
$a = $doc->getElementsByTagName('a')->item(0);
$bs = new DOMElementFilter($a->childNodes, 'b');
foreach($bs as $b){
echo $b->nodeValue . "\n";
}
This will give the results you're looking for:
1
2
You can find DOMElementFilter in the development branch now. It might be worth allowing * for any tag name, as getElementsByTagName("*") does, but that's just commentary.
Here is a working usage example online: https://eval.in/57170
My solution, used in production:
Finds a needle (node) in a haystack (DOM):
function getAttachableNodeByAttributeName(\DOMElement $parent = null, string $elementTagName = null, string $attributeName = null, string $attributeValue = null)
{
$returnNode = null;
$needleDOMNode = $parent->getElementsByTagName($elementTagName);
$length = $needleDOMNode->length;
//traverse through each existing given node object
for ($i = $length; --$i >= 0;) {
$needle = $needleDOMNode->item($i);
//only one DOM node and no attributes specified?
if (!$attributeName && !$attributeValue && 1 === $length) return $needle;
//multiple nodes and attributes are specified
elseif ($attributeName && $attributeValue && $needle->getAttribute($attributeName) === $attributeValue) return $needle;
}
return $returnNode;
}
Usage:
$countryNode = getAttachableNodeByAttributeName($countriesNode, 'country', 'iso', 'NL');
Returns the DOM element from the parent countries node whose iso attribute equals the country ISO code 'NL', much like a real search would: finding a certain country by its name in an array or object.
Another usage example:
$productNode = getAttachableNodeByAttributeName($products, 'partner-products');
Returns DOM node element containing only single (root) node, without searching by any attribute.
Note: for this you must make sure that root nodes are unique by elements' tag name, e.g. countries->country[ISO] - countries node here is unique and parent to all child nodes.

SimpleXML remove nodes

I've got a foreach loop that is only running once and it has me stumped.
1: I load an array of status values (either "request", "delete", or "purchased")
2: I then load an xml file and need to loop through the "code" nodes and update their status, BUT if the new code is "delete" I want to remove it before moving onto the next one
XML structure is....
<content>
.... lots of stuff
<codes>
<code date="xxx" status="request">xxxxx</code>
.. repeat ...
</codes>
</content>
and the php code is ...
$newstatus = $_POST['updates'];
$file = '../apps/templates/'.$folder.'/layout.xml';
$xml2 = simplexml_load_file($file);
foreach($xml2->codes->code as $code){
if($code['status'] == "delete") {
$dom=dom_import_simplexml($code);
$dom->parentNode->removeChild($dom);
}
}
$xml2->asXml($file);
I've temporarily removed the updating so I can debug the delete check.
This all works, BUT it only removes the first delete and leaves all the other deletes, even though it's a foreach loop.
Any help greatly appreciated.
Deleting multiple times in the same iteration is unstable. E.g. if you remove the second element, the third becomes the second and so on.
You can prevent that by storing the elements to delete into an array first:
$elementsToRemove = array();
foreach ($xml2->codes->code as $code) {
if ($code['status'] == "delete") {
$elementsToRemove[] = $code;
}
}
And then you remove the element based on the array which is stable while you iterate over it:
foreach ($elementsToRemove as $code) {
unset($code[0]);
}
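As a self-contained sketch of those two passes (the XML is a shortened, made-up version of the structure in the question):

```php
<?php
// Shortened, made-up version of the layout.xml structure.
$xml2 = simplexml_load_string(
    '<content><codes>'
    . '<code status="request">a</code>'
    . '<code status="delete">b</code>'
    . '<code status="delete">c</code>'
    . '<code status="purchased">d</code>'
    . '</codes></content>'
);

// Pass 1: collect the elements to remove without touching the document.
$elementsToRemove = [];
foreach ($xml2->codes->code as $code) {
    if ((string) $code['status'] === 'delete') {
        $elementsToRemove[] = $code;
    }
}

// Pass 2: remove them; the plain PHP array is unaffected by the removals.
foreach ($elementsToRemove as $code) {
    unset($code[0]);
}

// Collect what is left to confirm both "delete" entries are gone.
$remaining = [];
foreach ($xml2->codes->code as $code) {
    $remaining[] = (string) $code['status'];
}
```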
You could also put the if-condition into an xpath query which does return the array directly (see the duplicate question for an example) or by making use of iterator_to_array().
SimpleXML node lists are plain arrays of references, and like with any deleting of items while forward iterating through an array, the array position pointer can get mixed up because the expected next item has disappeared.
The simple way to remove a bunch of children in SimpleXML without using an extra array is to iterate in reverse (decrementing the index), which turns the loop in your example into:
// FOR EACH NODE IN REVERSE
$elements=$xml2->xpath('codes/code');
$count=count($elements);
for($j=$count-1;$j>=0;$j--){
// IF TO DELETE
$code=$elements[$j];
if($code['status']=="delete"){
// DELETE ELEMENT
$dom=dom_import_simplexml($code);
$dom->parentNode->removeChild($dom);
}
}
Of course, if your other processing requires forward iterating through the elements, then using an array is the best.

DOM removing selected child nodes

I have a DOM element with HTML inside that contains some elements I'd like to remove, while keeping some tags that are OK.
I try to iterate through all child elements and delete those that need to be removed:
foreach ($node->getElementsByTagName('*') as $element)
if ($element->nodeName != 'br')
$node->removeChild($element);
But this throws a Not Found Error exception which, when not caught, causes a fatal error.
How would I solve this problem?
Use the following instead to remove the node:
$element->parentNode->removeChild($element);
getElementsByTagName('*') finds all descendent elements, not child elements. So some of the $element you want to remove are not children of $node, hence the failure.
I'm not 100% sure what your intention is here, but most likely you just want to remove certain immediate children. In this case, do the following:
$nodestoremove = array();
foreach ($node->childNodes as $n) {
if ($n->nodeType===XML_ELEMENT_NODE and $n->nodeName!=='br') {
$nodestoremove[] = $n;
}
}
foreach ($nodestoremove as $n) {
$node->removeChild($n);
}
unset($nodestoremove); // so nodes can be garbage-collected
echo $node->C14N(); // xml fragment after removal
Note that we make two passes: one to identify the nodes to delete, and a second to delete them. This is because childNodes is a live list, so we can't iterate through it forwards while we delete. (Although we could iterate through it backwards.)
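For comparison, the backwards variant mentioned in the parenthetical could look like this sketch. The fragment is made up for illustration; removing an item only shifts the indexes after it, and those have already been visited:

```php
<?php
// Made-up fragment: keep text and <br>, drop every other child element.
$doc = new DOMDocument();
$doc->loadXML('<div>text<b>bold</b><br/><i>italic</i><br/><span>x</span></div>');
$node = $doc->documentElement;

// Walk the live childNodes list from the end towards the start.
for ($i = $node->childNodes->length - 1; $i >= 0; $i--) {
    $child = $node->childNodes->item($i);
    if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName !== 'br') {
        $node->removeChild($child);
    }
}

$result = $doc->saveXML($node);
```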

DOMXpath - Get href attribute and text value of an a element

So I have a HTML string like this:
<td class="name">
<a href="...">Some Name</a>
</td>
<td class="name">
<a href="...">Some Name2</a>
</td>
Using XPath I'm able to get value of href attribute using this Xpath query:
$domXpath = new \DOMXPath($this->domPage);
$hrefs = $domXpath->query("//td[@class='name']/a/@href");
foreach($hrefs as $href) {...}
And It's even easier to get a text value, like this:
// Xpath auto. strips any html tags so we are
// left with clean text value of a element
$domXpath = new \DOMXPath($this->domPage);
$names = $domXpath->query("//td[@class='name']/a");
foreach($names as $name) {...}
Now I'm curious: how can I combine those two queries to get both values with only one query (if something like that is even possible)?
Fetch
//td[@class='name']/a
and then pluck the text with nodeValue and the attribute with getAttribute('href').
Apart from that, you can combine Xpath queries with the Union Operator | so you can use
//td[@class='name']/a/@href|//td[@class='name']
as well.
To reduce the code to a single loop, try:
$anchors = $domXpath->query("//td[@class='name']/a");
foreach($anchors as $a)
{
print $a->nodeValue." - ".$a->getAttribute("href")."<br/>";
}
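A self-contained version of that loop, as a sketch: the table markup and the href values /first and /second are invented here, since the question does not show the real links.

```php
<?php
// Invented markup; the href values are placeholders for illustration.
$html = '<table><tr>'
      . '<td class="name"><a href="/first">Some Name</a></td>'
      . '<td class="name"><a href="/second">Some Name2</a></td>'
      . '</tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$domXpath = new DOMXPath($dom);

// One query for the anchors; text and attribute come from each node.
$links = [];
foreach ($domXpath->query("//td[@class='name']/a") as $a) {
    $links[] = [
        'text' => trim($a->nodeValue),
        'href' => $a->getAttribute('href'),
    ];
}
```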
As per above :) Too slow ..
Simplest way, evaluate is for this task!
The simplest way to obtain a value is by evaluate() method:
$xp = new DOMXPath($dom);
$v = $xp->evaluate("string(/etc[1]/@stringValue)");
Note: it is important to limit the XPath result to one item (the first a in this case) and to cast the value with string(), round(), etc.
So, in a set of multiple items, using your foreach code,
$names = $domXpath->query("//td[@class='name']");
foreach($names as $contextNode) {
$text = $domXpath->evaluate("string(./a[1])",$contextNode);
$href = $domXpath->evaluate("string(./a[1]/@href)",$contextNode);
}
PS: this example only illustrates evaluate(). When the information is already available on the node, use whatever performs best: methods such as getAttribute() and saveXML(), or properties such as $nodeValue and $textContent, supplied by DOMNode. See @Gordon's answer for this particular problem. An XPath subquery (with a context node) is good for complex cases, or to simplify your code by avoiding a hasChildNodes() check plus a loop over $childNodes, which brings no significant gain in performance.
