I am currently learning different ways to iterate through XML document tags using the PHP DOMDocument object. I understand the foreach loop for iterating through the tags, but $element->item(0)->childNodes->item(0)->nodeValue is a bit unclear to me. Could somebody explain it to me in detail? Thank you.
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load('StudentData.xml');
$studentRoot = $xmlDoc->getElementsByTagName('Student');
for ($i = 0; $i < ($studentRoot->length); $i++) {
$firstNameTags = $studentRoot->item($i)->getElementsByTagName('FirstName');
echo $firstNameTags->item(0)->childNodes->item(0)->nodeValue.' <br />';
}
/* so much easier and clearer to understand! */
foreach($studentRoot as $node) {
/* For every <Student> tag as a separate node,
step into its child nodes, and for each child,
echo the text content inside */
foreach($node->childNodes as $child) {
echo $child->textContent.'<br />';
}
}
?>
$elements->item(0)->childNodes->item(0)->nodeValue
First:
$elements
The current list of elements$, as parsed and referenced. In the code example, that would be:
$firstNameTags = $studentRoot->item($i)->getElementsByTagName('FirstName');
$firstNameTags->...
Next:
->item(0)
Get a reference to the first item in the $elements node list. Since the list is zero-indexed, ->item(0) gets the first node by index.
->childNodes
Get a list of the child nodes of that first $elements node referenced by ->item(0) above. As there are no parentheses, this is not a method call but a (read-only) property of the DOMNode (here a DOMElement); its value is a DOMNodeList.
->item(0)
Again, get the first node in the list of child nodes by index.
->nodeValue
The value of the node itself. The first child of <FirstName> is normally its text node, so this is the first name as a string.
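To make each step visible, the chain can be unrolled into one variable per link. This is a minimal sketch that assumes a hypothetical StudentData.xml shape (a <Students> root with <Student>/<FirstName> children), inlined as a string so it runs without the file; the intermediate variable names are illustrative only.

```php
<?php
// Hypothetical sample data standing in for StudentData.xml (an assumption).
$xml = '<Students><Student><FirstName>Ann</FirstName></Student>'
     . '<Student><FirstName>Bob</FirstName></Student></Students>';

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xml);

$studentRoot = $xmlDoc->getElementsByTagName('Student');
$names = [];

foreach ($studentRoot as $student) {
    // $elements: a DOMNodeList of the <FirstName> elements inside this <Student>
    $elements = $student->getElementsByTagName('FirstName');

    // ->item(0): the first <FirstName> DOMElement in that list
    $firstElement = $elements->item(0);

    // ->childNodes: a DOMNodeList with the children of that element
    $children = $firstElement->childNodes;

    // ->item(0): the first child, here the DOMText node "Ann" or "Bob"
    $textNode = $children->item(0);

    // ->nodeValue: the text stored in that text node
    $names[] = $textNode->nodeValue;
}

echo implode(', ', $names), "\n"; // prints: Ann, Bob
```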
If the form of the statement alone:
$obj->method()->method()->prop
confuses you, look into method chaining, which is what this uses to put all of those calls together.
$ Note: you left off the s, but by convention the s indicates that there may be more than one. So $element would be a reference to zero or one element, while $elements might hold zero, one or more $element references in a collection.
I'm trying to modify an XML document which contains a node that I can identify by name. For example, I might want to modify the <abc>some text</abc> node in a document (which I can identify by the tag name abc).
The problem I'm facing currently is that I don't know the exact structure of this document. I don't know what the root element is called and I don't know which children might contain this <abc> node.
I tried using SimpleXML<...> but this does not allow me to read arbitrary element children:
$xml = new SimpleXMLElement($xmlString);
foreach ($xml->children() as $child) {
// code here doesn't execute
}
I'm considering building my own XML parser which would have this simple functionality, but I cannot believe that simply iterating over all child nodes of a node (possibly recursively) is not supported by PHP. Hopefully someone can tell me what I'm missing. Thanks!
Use DOMDocument
$dom = new DOMDocument();
// @ suppresses parse warnings for malformed input
@$dom->loadXML($xmlString);
foreach($dom->getElementsByTagName('item') as $item) {
if ($item->hasChildNodes()) {
foreach($item->childNodes as $i) {
// YOUR CODE HERE
}
}
}
I found the solution moments after posting, after being stuck on it for a while.
SimpleXML<...> does not have these features, but the DOMDocument and associated classes do:
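$dom = new DOMDocument();
$dom->loadXML($xmlString);
// $dom->childNodes would only contain the top-level nodes; getElementsByTagName()
// searches the whole tree, so <abc> is found at any depth
foreach($dom->getElementsByTagName('abc') as $child) {
$child->textContent = "modified text content";
}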
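The question also asks about iterating over all child nodes of a node, possibly recursively. A minimal sketch of a depth-first walk over childNodes follows; walkNodes() is a hypothetical helper, not part of PHP's DOM extension, and the sample XML is made up for illustration. Matches are collected first and modified afterwards, so the tree is not changed while it is being traversed.

```php
<?php
// Hypothetical input: <abc> elements sit at unknown depths.
$xmlString = '<root><x><y><abc>some text</abc></y></x><abc>other</abc></root>';

$dom = new DOMDocument();
$dom->loadXML($xmlString);

// Visit $node and all of its descendants depth-first, calling $visit on each.
// (walkNodes is a hypothetical helper name.)
function walkNodes(DOMNode $node, callable $visit): void
{
    $visit($node);
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            walkNodes($child, $visit);
        }
    }
}

$found = [];
walkNodes($dom->documentElement, function (DOMNode $node) use (&$found) {
    if ($node->nodeType === XML_ELEMENT_NODE && $node->nodeName === 'abc') {
        $found[] = $node;
    }
});

// Modify only after the walk has finished.
foreach ($found as $abc) {
    $abc->textContent = 'modified text content';
}

echo $dom->saveXML();
```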
Documentation for future reference, here: http://php.net/manual/en/book.dom.php
Thanks for your help.
I am loading XML data from a URL, and because the XML is very long I only want to get the values of particular nodes that match a condition.
Here is my code:
<?php
$doc = new DOMDocument('1.0', 'utf-8');
$doc->load("https://retailapi.apparel21.com/RetailAPI/products?countrycode=au");
$xpath = new DOMXpath($doc);
/*foreach ($xpath->query("/Products/Product[Code='00122']") as $node)
{
echo $node->nodeValue;
echo "Hi<br>";
}*/
echo $xpath->query("/Products/Product[Code='00122']")->item(0)->nodeValue;
?>
As you can see, I already used a foreach loop and the condition is applied successfully, but inside the loop it prints the whole data of all the child nodes of the matching Product node.
Confused? :)
OK, no worries: just open this URL: https://retailapi.apparel21.com/RetailAPI/products?countrycode=au. Click the "Proceed anyway" button and then wait for a while.
There are many Product tags. I want the data of the following nodes:
Id
Code
Name
Description
for the product whose Code is 00122, which is the first product.
With the foreach loop it printed the data of all nodes of that product. With the single statement it also printed the data of all nodes :(
Also, can't this be done with the simplexml_load_file() function?
One more thing: since I am loading a URL, the whole XML is read first. Can't the query be applied while loading, so that only the related Product tags are fetched and the loading time is reduced?
Can anyone please help?
You're nearly there. Replace DOMXpath::query() with DOMXpath::evaluate(). It allows you to use XPath expressions that return scalars such as strings. The second argument of evaluate() (or query()) is the context node, so you can iterate all nodes from one expression and fetch the details using XPath expressions relative to that context node:
$doc = new DOMDocument('1.0', 'utf-8');
$doc->load("https://retailapi.apparel21.com/RetailAPI/products?countrycode=au");
$xpath = new DOMXpath($doc);
$result = [];
foreach ($xpath->evaluate("/Products/Product[Code='00122']") as $node) {
$result[] = [
'id' => $xpath->evaluate('string(Id)', $node),
'code' => $xpath->evaluate('string(Code)', $node),
'name' => $xpath->evaluate('string(Name)', $node),
'description' => $xpath->evaluate('string(Description)', $node),
];
}
var_dump($result);
A call like $xpath->evaluate('Id', $node) would return a list with all Id element nodes that are children of $node. The XPath function string() casts the first node in this list to a string and returns it. If the list is empty, the result is an empty string.
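A self-contained sketch of the same pattern follows. The real feed could not be checked here, so the sample XML only assumes that structure (a <Products> root with <Product> children holding <Id>, <Code>, <Name> and <Description>); the Colour lookup is a made-up element name used to show the empty-string case.

```php
<?php
// Hypothetical sample shaped like the product feed (structure assumed).
$xml = <<<'XML'
<Products>
  <Product><Id>1</Id><Code>00122</Code><Name>Tee</Name><Description>Cotton tee</Description></Product>
  <Product><Id>2</Id><Code>00123</Code><Name>Cap</Name><Description>Wool cap</Description></Product>
</Products>
XML;

$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$result = [];
// The outer expression returns a node list; each $node becomes the context below.
foreach ($xpath->evaluate("/Products/Product[Code='00122']") as $node) {
    $result[] = [
        // Relative paths like "Id" are resolved against $node.
        'id'          => $xpath->evaluate('string(Id)', $node),
        'code'        => $xpath->evaluate('string(Code)', $node),
        'name'        => $xpath->evaluate('string(Name)', $node),
        'description' => $xpath->evaluate('string(Description)', $node),
        // string() of an empty node list yields ''.
        'missing'     => $xpath->evaluate('string(Colour)', $node),
    ];
}

var_dump($result);
```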
I want to extract the content of the body of an HTML page along with the tag names of its children. I have taken an example HTML page like this:
<html>
<head></head>
<body>
<h1>This is H1 tag</h1>
<h2>This is H2 tag</h2>
<h3>This is H3 tag</h3>
</body>
</html>
I have implemented the PHP code below and it's working fine.
$d=new DOMDocument();
$d->loadHTMLFile('file.html');
$l=$d->childNodes->item(1)->childNodes->item(1)->childNodes;
for($i=0;$i<$l->length;$i++)
{
echo "<".$l->item($i)->nodeName.">".$l->item($i)->nodeValue."</".$l->item($i)->nodeName.">";
}
This code works perfectly fine, but when I tried to do the same using a foreach loop instead of a for loop, the nodeName property returned '#text' in place of every actual nodeName.
Here is that code:
$l=$d->childNodes->item(1)->childNodes->item(1)->childNodes;
foreach ($l as $li) {
echo $li->childNodes->item(0)->nodeName."<br/>";
}
Why so?
When I had this problem, it was fixed by doing the following:
$xmlDoc = new DOMDocument();
$xmlDoc->preserveWhiteSpace = false; // important!
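You can print $node->nodeType to see the difference. I get 3, 1, 3 even though there is only one element child (3 is XML_TEXT_NODE, 1 is XML_ELEMENT_NODE). With whitespace preservation turned off I just get 1. Note that preserveWhiteSpace must be set before calling load() or loadXML().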
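A small sketch of that comparison, assuming a made-up <root> document indented with whitespace. The childTypes() helper is hypothetical; the second document sets preserveWhiteSpace before loadXML(), which is what makes the whitespace-only text nodes disappear.

```php
<?php
$xml = "<root>\n  <child>text</child>\n</root>";

// Collect the nodeType of every child of the document element (hypothetical helper).
function childTypes(DOMDocument $doc): array
{
    $types = [];
    foreach ($doc->documentElement->childNodes as $node) {
        $types[] = $node->nodeType; // 1 = XML_ELEMENT_NODE, 3 = XML_TEXT_NODE
    }
    return $types;
}

$keep = new DOMDocument();
$keep->loadXML($xml);               // whitespace preserved (the default)

$strip = new DOMDocument();
$strip->preserveWhiteSpace = false; // must be set before loading
$strip->loadXML($xml);

echo implode(', ', childTypes($keep)), "\n";  // 3, 1, 3
echo implode(', ', childTypes($strip)), "\n"; // 1
```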
GL.
In DOM, everything is a 'node'. Not just the elements (tags): comments and the text between elements (even if it's just whitespace or newlines, which seems to be the case in your example) are nodes, too. Since text nodes don't have an actual node name, #text is used instead to indicate that it's a special kind of node.
Text nodes are not left out by the item() method: item() and foreach see exactly the same nodes. The difference lies in what the two loops print. The for loop prints the nodeName of each child of <body>, while the foreach loop prints $li->childNodes->item(0)->nodeName, the name of each child's first child, which for <h1>This is H1 tag</h1> is its text node, hence #text.
Besides nodeName and nodeValue, a DOMNode also has a nodeType property. By checking this property against constants such as XML_ELEMENT_NODE and XML_TEXT_NODE, you can determine the type of the node and thus filter out unwanted nodes.
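As a sketch of that filtering, applied to the example page from the question (inlined as a string rather than loaded from file.html), the loop below keeps only element nodes by comparing nodeType with the XML_ELEMENT_NODE constant:

```php
<?php
$html = <<<'HTML'
<html>
<head></head>
<body>
<h1>This is H1 tag</h1>
<h2>This is H2 tag</h2>
<h3>This is H3 tag</h3>
</body>
</html>
HTML;

$d = new DOMDocument();
$d->loadHTML($html);

$body = $d->getElementsByTagName('body')->item(0);
$out = [];

foreach ($body->childNodes as $node) {
    // Skip the whitespace text nodes (nodeType 3) between the headings.
    if ($node->nodeType !== XML_ELEMENT_NODE) {
        continue;
    }
    $out[] = '<' . $node->nodeName . '>' . $node->nodeValue . '</' . $node->nodeName . '>';
}

echo implode("\n", $out), "\n";
```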
I'm coming to this a little late, but the best solution for me was different. The issue is that the text node doesn't know its name, but its parent does, so all you need to do is ask the parent for its nodeName to get the key.
$dom = new DOMDocument();
$dom->loadXML($stringXML);
$valorizador = $dom->getElementsByTagName("tagname");
foreach ($valorizador->item(0)->childNodes as $item) {
$childs = $item->childNodes;
$key = $item->nodeName;
foreach ($childs as $i) {
echo $key." => ".$i->nodeValue. "\n";
}
}
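A self-contained variant of that loop, as a sketch: the input document is hypothetical ("tagname" is kept from the answer as a placeholder element name), whitespace text nodes are skipped via nodeType, and textContent stands in for the inner loop over the text children.

```php
<?php
// Hypothetical input; "tagname" is the placeholder element name from the answer above.
$stringXML = <<<'XML'
<root>
  <tagname>
    <name>Ann</name>
    <age>30</age>
  </tagname>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($stringXML);

$pairs = [];
foreach ($dom->getElementsByTagName('tagname')->item(0)->childNodes as $item) {
    // Whitespace between the elements shows up as text nodes; skip them.
    if ($item->nodeType !== XML_ELEMENT_NODE) {
        continue;
    }
    // The element knows its own name; its text lives in its child text node.
    $pairs[$item->nodeName] = $item->textContent;
}

foreach ($pairs as $key => $value) {
    echo $key . ' => ' . $value . "\n";
}
```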
I am trying to retrieve the text from within links in the following HEREDOC.
$html = <<<EOT
<a class="details" href="/link.asp">$2,697.75</a>
<a class="details" href="/link.asp"><s>$150.00</s></a>
<a class="details" href="/link.asp"><font color="red" size="2"><b>Price: $125.00</b></font></a>
EOT;
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$prices_nodeList = $xpath->query('//child::a[@class="details"]');
//$prices_nodeList = $xpath->query('//descendant::a[@class="details"]');
//$prices_nodeList = $xpath->query('//a[@class="details"]/descendant::text()');
$prices = [];
foreach ($prices_nodeList as $price) {
$prices[] = $price->nodeValue;
}
echo("<p>prices</p>");
echo("<pre>");
print_r($prices);
echo("</pre>");
?>
The XPath query assigned to $prices_nodeList:
$prices_nodeList = $xpath->query('//child::a[@class="details"]');
seems to do what I want, but I don't think I understand how it works. As far as I can understand, it says 'get all direct child elements of links with class "details".' However, the text in the latter two links is not a direct child, so I'm not sure why I don't have to use
$prices_nodeList = $xpath->query('//descendant::a[@class="details"]');
This (i.e., the first commented-out value of $prices_nodeList) also retrieves all three values. I am wondering why they both work, and whether my query is actually the best way to do it. By contrast
$prices_nodeList = $xpath->query('//a[@class="details"]/descendant::text()');
works as well, but
$prices_nodeList = $xpath->query('//a[@class="details"]/child::text()');
only retrieves the first value ($2,697.75) and not the latter two (since the text is contained within elements).
As far as I can understand, it says 'get all direct child elements of links with class "details".'
No, it means get all links with a class "details" that are children of the current context nodes.
The context nodes are the ones selected by the previous step.
// is a shortcut for /descendant-or-self::node()/. From the specification:
// is short for /descendant-or-self::node()/. For example, //para is short for /descendant-or-self::node()/child::para and so will select any para element in the document (even a para element that is a document element will be selected by //para since the document element node is a child of the root node); div//para is short for div/descendant-or-self::node()/child::para and so will select all para descendants of div children.
/descendant-or-self::node() basically selects every node. Therefore there is no difference between looking at the child or descendant axis.
If a link is not the child of one node, it is surely the child of one of its descendants, which is selected by // as well.
In XPath, the abbreviation // (short for /descendant-or-self::node()/) is used to select all nodes of a certain type, wherever they are in the input tree. Then:
//child::a is the same as //a
and
//descendant::a is still equivalent to //a
You are always selecting all a nodes in the document, wherever they are.
While:
//a/descendant::text(), which is equal to //a//text(), means select all descendant text nodes of any a and it is different from
//a/child::text(), which is equivalent to //a/text(), means select all child text nodes of any a (only $2,697.75 is a direct child of an a; the other text nodes are deeper descendants).
In XPath, the explicit axes descendant-or-self:: and child:: are rarely used or necessary. The first is normally replaced by //. The second is applied implicitly in / or //, as /child:: or //child::.
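The equivalences above can be checked with a short script. This sketch wraps the three links from the question in a <div> and parses them with loadXML() instead of loadHTML(), so the tree is not altered by HTML parser fix-ups; it prints how many nodes each query selects.

```php
<?php
// The three links from the question, wrapped in a <div> so the fragment parses as XML.
// Single quotes keep PHP from treating "$2" etc. as variables.
$xml = '<div>'
     . '<a class="details" href="/link.asp">$2,697.75</a>'
     . '<a class="details" href="/link.asp"><s>$150.00</s></a>'
     . '<a class="details" href="/link.asp"><font color="red" size="2"><b>Price: $125.00</b></font></a>'
     . '</div>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$queries = [
    '//a[@class="details"]',
    '//child::a[@class="details"]',
    '//descendant::a[@class="details"]',
    '//a[@class="details"]/child::text()',
    '//a[@class="details"]/descendant::text()',
];

$counts = [];
foreach ($queries as $q) {
    // length is the number of nodes the query selected
    $counts[$q] = $xpath->query($q)->length;
    echo str_pad($q, 45), $counts[$q], "\n";
}
```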
I am trying to get the value (text) of a specific node from an XML document using the PHP DOM classes, but I cannot get it right because the text content of that node comes back merged with that of its descendants.
Let's suppose that I need to get the trees from this document:
<?xml version="1.0"?>
<trees>
LarchRedwoodChestnutBirch
<trimmed>Larch</trimmed>
<trimmed>Redwood</trimmed>
</trees>
And I get:
LarchRedwoodChestnutBirchLarchRedwood
You can see that I cannot remove the substring LarchRedwood made by the trimmed trees from the whole text because I would get only ChestnutBirch and it is not what I need.
Any suggestions? (Thanks)
I got it. This works:
function specificNodeValue($node, $implode = true) {
$value = array();
if ($node->childNodes) {
for ($i = 0; $i < $node->childNodes->length; $i++) {
$child = $node->childNodes->item($i);
// Only non-element children (text nodes) belong to the node's own value;
// elements such as <trimmed> are skipped.
if ($child->nodeType !== XML_ELEMENT_NODE) {
$value[] = $child->nodeValue;
}
}
}
return (is_string($implode) ? implode($implode, $value) : ($implode === true ? implode($value) : $value));
}
Treat the given node as a root: any child node without a tagName (that is, any child that is not an element) is part of the node's own text, so its value is the node's own value.
In a badly designed XML document a node can have several pieces of text interleaved with child elements, so they are all collected into an array to get the whole value of the node.
Use the function above to get a node's own value without the values of its subnodes merged in.
Parameters are:
$node (required): must be a DOMElement object.
$implode (optional): true (the default) returns a string, false returns an array of the individual pieces of value. Pass a string instead of a boolean to implode the array using that string as "glue".
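For comparison, an XPath text() step gives the same own-text-only result without walking childNodes by hand. This is an alternative technique, not the author's function: /trees/text() selects only the text nodes that are direct children of <trees>, and trim() removes the surrounding newlines.

```php
<?php
$xml = <<<'XML'
<trees>
LarchRedwoodChestnutBirch
<trimmed>Larch</trimmed>
<trimmed>Redwood</trimmed>
</trees>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// text() selects only the text nodes that are direct children of <trees>,
// so the text inside the <trimmed> elements is not included.
$pieces = [];
foreach ($xpath->query('/trees/text()') as $textNode) {
    $pieces[] = $textNode->nodeValue;
}

$ownText = trim(implode('', $pieces));
echo $ownText, "\n"; // LarchRedwoodChestnutBirch
```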
You can try this to remove the trimmed nodes:
$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadXML($xml);
$xpath = new DOMXpath($doc);
$trees = $doc->getElementsByTagName('trees')->item(0);
foreach ($xpath->query('/trees/*') as $node)
{
$trees->removeChild($node);
}
echo $trees->textContent;
echo $trees->nodeValue;
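If the original document must stay intact, the same removal idea can be applied to a copy. This sketch (a variation, not the answer's exact code) removes the element children from a deep clone made with cloneNode(true), collecting them first because removing nodes while iterating a live childNodes list would skip some of them.

```php
<?php
$xml = '<trees>LarchRedwoodChestnutBirch<trimmed>Larch</trimmed><trimmed>Redwood</trimmed></trees>';

$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadXML($xml);

$trees = $doc->documentElement;

// Work on a deep copy so the original <trimmed> elements stay in $doc.
$copy = $trees->cloneNode(true);

// Collect the element children first, then remove them.
$toRemove = [];
foreach ($copy->childNodes as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE) {
        $toRemove[] = $child;
    }
}
foreach ($toRemove as $child) {
    $copy->removeChild($child);
}

echo $copy->textContent, "\n";  // LarchRedwoodChestnutBirch
echo $trees->textContent, "\n"; // LarchRedwoodChestnutBirchLarchRedwood
```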
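Note that for a DOMElement, $node->nodeValue is not limited to the node's own text: it returns the same as $node->textContent, i.e. all text from the current node and all its descendants. To get only the node's own text, read the nodeValue of its text-node children (nodeType XML_TEXT_NODE) instead.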
Ideally, the XML should be:
<?xml version="1.0"?>
<trees>
<tree>Larch</tree>
<tree>Redwood</tree>
<tree>Chestnut</tree>
<tree>Birch</tree>
</trees>
To split "LarchRedwoodChestnutBirch" into separate words (by capital letter), you can use PHP's PCRE functions, such as preg_split():
http://www.php.net/manual/en/book.pcre.php
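For example, preg_split() with a zero-width lookahead splits before every uppercase letter. This sketch assumes each name starts with an ASCII capital and contains no other capitals; a name such as "SweetChestnut" would be split in two.

```php
<?php
$text = 'LarchRedwoodChestnutBirch';

// Split at every position that is followed by an uppercase letter (zero-width
// lookahead), dropping the empty piece produced before the first letter.
$trees = preg_split('/(?=[A-Z])/', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($trees);
```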
Hope that helps!