PHP xpath query on XML with default namespace binding - php
I have one solution to the subject problem, but it’s a hack and I’m wondering if there’s a better way to do this.
Below is a sample XML file and a PHP CLI script that executes an xpath query given as an argument. For this test case, the command line is:
./xpeg "//MainType[@ID=123]"
What seems most strange is this line, without which my approach doesn’t work:
$domdoc->loadXML($domdoc->saveXML($domdoc));
As far as I know, this simply re-parses the modified XML, and it seems to me that this shouldn’t be necessary.
Is there a better way to perform xpath queries on this XML in PHP?
XML (note the binding of the default namespace):
<?xml version="1.0" encoding="utf-8"?>
<MyRoot
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.example.com/data http://www.example.com/data/MyRoot.xsd"
xmlns="http://www.example.com/data">
<MainType ID="192" comment="Bob's site">
<Price>$0.20</Price>
<TheUrl><![CDATA[http://www.example.com/path1/]]></TheUrl>
<Validated>N</Validated>
</MainType>
<MainType ID="123" comment="Test site">
<Price>$99.95</Price>
<TheUrl><![CDATA[http://www.example.com/path2]]></TheUrl>
<Validated>N</Validated>
</MainType>
<MainType ID="922" comment="Health Insurance">
<Price>$600.00</Price>
<TheUrl><![CDATA[http://www.example.com/eg/xyz.php]]></TheUrl>
<Validated>N</Validated>
</MainType>
<MainType ID="389" comment="Used Cars">
<Price>$5000.00</Price>
<TheUrl><![CDATA[http://www.example.com/tata.php]]></TheUrl>
<Validated>N</Validated>
</MainType>
</MyRoot>
PHP CLI Script:
#!/usr/bin/php-cli
<?php
$xml = file_get_contents("xpeg.xml");
$domdoc = new DOMDocument();
$domdoc->loadXML($xml);
// remove the default namespace binding
$e = $domdoc->documentElement;
$e->removeAttributeNS($e->getAttributeNode("xmlns")->nodeValue,"");
// hack hack, cough cough, hack hack
$domdoc->loadXML($domdoc->saveXML($domdoc));
$xpath = new DOMXpath($domdoc);
$str = trim($argv[1]);
$result = $xpath->query($str);
if ($result !== FALSE) {
dump_dom_levels($result);
}
else {
echo "error\n";
}
// The following function isn't really part of the
// question. It simply provides a concise summary of
// the result.
function dump_dom_levels($node, $level = 0) {
$class = get_class($node);
if ($class == "DOMNodeList") {
echo "Level $level ($class): $node->length items\n";
foreach ($node as $child_node) {
dump_dom_levels($child_node, $level+1);
}
}
else {
$nChildren = 0;
foreach ($node->childNodes as $child_node) {
if ($child_node->hasChildNodes()) {
$nChildren++;
}
}
if ($nChildren) {
echo "Level $level ($class): $nChildren children\n";
}
foreach ($node->childNodes as $child_node) {
if ($child_node->hasChildNodes()) {
dump_dom_levels($child_node, $level+1);
}
}
}
}
?>
The solution is using the namespace, not getting rid of it.
$result = new DOMDocument();
$result->loadXML($xml);
$xpath = new DOMXpath($result);
$xpath->registerNamespace("x", trim($argv[2]));
$str = trim($argv[1]);
$result = $xpath->query($str);
Then call it like this on the command line (note the x: in the XPath expression):
./xpeg "//x:MainType[@ID=123]" "http://www.example.com/data"
You can make this more shiny by
finding out default namespaces yourself (by looking at the namespaceURI property of the document element)
supporting more than one namespace on the command line and registering them all before $xpath->query()
supporting arguments in the form of xyz=http://namespace.uri/ to create custom namespace prefixes
Bottom line is: In XPath you can't query //foo when you really mean //namespace:foo. These are fundamentally different and therefore select different nodes. The fact that XML can have a default namespace defined (and thus can drop explicit namespace usage in the document) does not mean you can drop namespace usage in XPath.
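As a rough illustration of the first suggestion, here is a minimal sketch that reads the namespace from the document element and binds it to a prefix before querying. The trimmed inline XML, the prefix "x" and the variable names are assumptions made for the example; it also assumes the root element itself lives in the default namespace.

```php
<?php
// Sketch: bind a prefix to the document's default namespace, then query with it.
// The inline XML is a shortened copy of the sample file; "x" is an arbitrary prefix.
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<MyRoot xmlns="http://www.example.com/data">
  <MainType ID="192" comment="Bob's site"><Price>$0.20</Price></MainType>
  <MainType ID="123" comment="Test site"><Price>$99.95</Price></MainType>
</MyRoot>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
// The root element is unprefixed, so its namespaceURI is the default namespace.
$xpath->registerNamespace('x', $doc->documentElement->namespaceURI);

$comments = [];
foreach ($xpath->query('//x:MainType[@ID=123]') as $node) {
    $comments[] = $node->getAttribute('comment');
}
print_r($comments);
```

With this approach the document is never modified, so no re-parsing step is needed.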
Just out of curiosity, what happens if you remove this line?
$e->removeAttributeNS($e->getAttributeNode("xmlns")->nodeValue,"");
That strikes me as the most likely to cause the need for your hack. You're basically removing the xmlns="http://www.example.com/data" part and then re-building the DOMDocument. Have you considered simply using string functions to remove that namespace?
$pieces = explode('xmlns="', $xml);
$xml = $pieces[0] . substr($pieces[1], strpos($pieces[1], '"') + 1);
Then continue on your way? It might even end up being faster.
Given the current state of the XPath language, I feel that the best answer is provided by Tomalek: to associate a prefix with the default namespace and to prefix all tag names. That’s the solution I intend to use in my current application.
When that’s not possible or practical, a better solution than my hack is to invoke a method that does the same thing as re-scanning (hopefully more efficiently): DOMDocument::normalizeDocument(). The method behaves “as if you saved and then loaded the document, putting the document in a ‘normal’ form.”
Also as a variant you may use a xpath mask:
//*[local-name(.) = 'MainType'][@ID='123']
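For comparison, a small sketch of the local-name() variant run against a shortened version of the sample document. No namespace is registered at all; the inline XML and variable names are assumptions for the example.

```php
<?php
// Sketch: match elements by local name only, ignoring whatever namespace they are in.
$doc = new DOMDocument();
$doc->loadXML(
    '<MyRoot xmlns="http://www.example.com/data">'
    . '<MainType ID="123"><Price>$99.95</Price></MainType>'
    . '<MainType ID="192"><Price>$0.20</Price></MainType>'
    . '</MyRoot>'
);

$xpath = new DOMXPath($doc);
$prices = [];
foreach ($xpath->query("//*[local-name(.) = 'MainType'][@ID='123']") as $node) {
    // The child lookup has to use local-name() as well, since Price is also namespaced.
    $prices[] = $xpath->evaluate("string(*[local-name() = 'Price'])", $node);
}
print_r($prices);
```

The trade-off is that every step of the expression needs the local-name() test, and elements with the same name from different namespaces can no longer be told apart.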
Related
How to get iTunes-specific child nodes of RSS feeds?
I'm trying to process an RSS feed using PHP, and there are some tags such as 'itunes:image' that I need to process. The code I'm using is below, and for some reason these elements are not returning any value: the resulting node list has a length of 0. How can I read these tags and get their attributes?
$f = $_REQUEST['feed'];
$feed = new DOMDocument();
$feed->load($f);
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');
foreach ($items as $key => $item) {
    $title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
    $pubDate = $item->getElementsByTagName('pubDate')->item(0)->firstChild->nodeValue;
    $description = $item->getElementsByTagName('description')->item(0)->textContent; // textContent
    $arrt = $item->getElementsByTagName('itunes:image');
    print_r($arrt);
}
getElementsByTagName is specified by DOM, and PHP is just following that. It doesn't consider namespaces. Instead, use getElementsByTagNameNS, which requires the full namespace URI (not the prefix). This appears to be http://www.itunes.com/dtds/podcast-1.0.dtd*. So:
$img = $item->getElementsByTagNameNS('http://www.itunes.com/dtds/podcast-1.0.dtd', 'image');
// Set preemptive fallback, then set value if check passes
$urlImage = '';
if ($img->length > 0) {
    // getElementsByTagNameNS returns a node list, so take the first match
    $urlImage = $img->item(0)->getAttribute('href');
}
Or put the namespace in a constant. You might be able to get away with simply removing the prefix and getting all image tags of any namespace with getElementsByTagName. Make sure to check whether a given item has an itunes:image element at all (as the length check above does); in the example podcast, some don't, and I suspect that was also giving you trouble. (If there's no href attribute, getAttribute will return either null or an empty string per the DOM spec without erroring out.)
*In case you're wondering, there is no actual DTD file hosted at that location, and there hasn't been for about ten years.
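To see the namespace-aware lookup end to end, here is a self-contained sketch. The two-item feed, its URLs and the variable names are made up for the example; only the iTunes namespace URI comes from the answer above.

```php
<?php
// Sketch: read itunes:image per item, with a fallback for items that have none.
$itunesNs = 'http://www.itunes.com/dtds/podcast-1.0.dtd';

// Hypothetical feed used only for illustration.
$rss = <<<'XML'
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <item><title>Episode 1</title><itunes:image href="http://www.example.com/ep1.jpg"/></item>
    <item><title>Episode 2</title></item>
  </channel>
</rss>
XML;

$feed = new DOMDocument();
$feed->loadXML($rss);

$images = [];
foreach ($feed->getElementsByTagName('item') as $item) {
    $title = $item->getElementsByTagName('title')->item(0)->textContent;
    $img = $item->getElementsByTagNameNS($itunesNs, 'image');
    // Empty string when the item carries no itunes:image element.
    $images[$title] = $img->length > 0 ? $img->item(0)->getAttribute('href') : '';
}
print_r($images);
```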
<?php
$rss_feed = simplexml_load_file("url link");
if ($rss_feed !== false) {
    foreach ($rss_feed->channel->item as $feed_item) {
        // Look up itunes:image on the current item, not on the feed root
        $itunes = $feed_item->children('itunes', true);
        if (isset($itunes->image)) {
            echo $itunes->image->attributes()->href;
        }
    }
}
?>
How to compare similar XMLs with PHPUnit?
So let's say I want to compare two DOMDocument objects. They have the same content but order and formatting might be off. For example, first one outputs this XML: <responses> <response id="12"> <foo>bar</foo> <lorem>ipsum</lorem> <sit>dolor</sit> </response></responses> Other one outputs: <responses> <response id="12"> <lorem>ipsum</lorem><sit>dolor</sit> <foo>bar</foo> </response> </responses> As you can see, they contain the same XML structure but some elements might be in different order and formatting is completely random. If I do: $this->assertEquals(); The test will of course fail. I don't want to test just XML structure but also contents. Any ideas?
This seems to have solved the problem: https://phpunit.de/manual/current/en/appendixes.assertions.html#appendixes.assertions.assertXmlStringEqualsXmlString
Which version of PHPUnit is this? I'm pretty sure recent versions all support DOMDocument comparisons.
Short version: use the $doc->preserveWhiteSpace setting to remove the whitespace, and then use $doc->C14N() to strip comments and get a string you can compare.
OK, here's a script you can play with. Note that the EOD; lines cannot have any trailing or leading whitespace.
$x1 = <<<EOD
<responses>
    <response id="12">
        <foo>bar</foo>
        <lorem>ipsum</lorem>
        <sit>dolor</sit>
        <!--This is a comment -->
    </response></responses>
EOD;
$x2 = <<<EOD
<responses>
    <response id="12">
        <lorem>ipsum</lorem><sit>dolor</sit>
        <foo>bar</foo>
        <!--This is another comment -->
    </response>
</responses>
EOD;
// The next block is part of the same file, I'm just making this formatting-break so that the StackOverflow syntax-highlighting system doesn't choke.
$USE_C14N = true; // Try false, just to see the difference.
$d1 = new DOMDocument('1.0'); // the version argument is a string
$d2 = new DOMDocument('1.0');
$d1->preserveWhiteSpace = false;
$d2->preserveWhiteSpace = false;
$d1->formatOutput = false; // Only useful for "pretty" output with saveXML()
$d2->formatOutput = false; // Only useful for "pretty" output with saveXML()
$d1->loadXML($x1); // Must be done AFTER preserveWhiteSpace and formatOutput are set
$d2->loadXML($x2); // Must be done AFTER preserveWhiteSpace and formatOutput are set
if ($USE_C14N) {
    $s1 = $d1->C14N(true, false);
    $s2 = $d2->C14N(true, false);
} else {
    $s1 = $d1->saveXML();
    $s2 = $d2->saveXML();
}
echo $s1 . "\n";
echo $s2 . "\n";
Output with $USE_C14N=true;
<responses><response id="12"><foo>bar</foo><lorem>ipsum</lorem><sit>dolor</sit></response></responses>
<responses><response id="12"><lorem>ipsum</lorem><sit>dolor</sit><foo>bar</foo></response></responses>
Output with $USE_C14N=false;
<?xml version="1.0"?>
<responses><response id="12"><foo>bar</foo><lorem>ipsum</lorem><sit>dolor</sit><!--This is a comment --></response></responses>
<?xml version="1.0"?>
<responses><response id="12"><lorem>ipsum</lorem><sit>dolor</sit><foo>bar</foo><!--This is another comment --></response></responses>
Note that $doc->C14N() might be slower, but it seems likely that stripping out comments is desirable. Note that all of this also assumes that whitespace in your XML isn't important, and there are probably some use cases where that assumption isn't right...
I suggest you turn the XML into DOMDocuments and then use assertEquals with those. PHPUnit already supports this, although it might not cover all your needs. You can also re-format the documents and re-load them; see "PHP XML how to output nice format":
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
Another idea is to sort the children by their tag name first; I have no idea whether that has been done before.
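The idea of sorting children by tag name could be sketched roughly as follows. This is only one possible approach, not something PHPUnit provides: the helper names sortChildrenByName() and canonicalXml() are invented for the example, and the sketch assumes element-only content (mixed text and element content would be reordered incorrectly).

```php
<?php
// Sketch: make sibling order and inter-element whitespace irrelevant, then compare strings.
function sortChildrenByName(DOMElement $element): void
{
    $children = [];
    foreach (iterator_to_array($element->childNodes) as $child) {
        if ($child instanceof DOMElement) {
            sortChildrenByName($child); // sort the subtree first
            $children[] = $child;
        } elseif ($child instanceof DOMText && trim($child->nodeValue) === '') {
            $element->removeChild($child); // drop whitespace-only text nodes
        }
    }
    // Order by tag name, then by canonical form so equal names sort deterministically.
    usort($children, function (DOMElement $a, DOMElement $b): int {
        return [$a->nodeName, $a->C14N()] <=> [$b->nodeName, $b->C14N()];
    });
    foreach ($children as $child) {
        $element->appendChild($child); // re-appending moves the node to the end
    }
}

function canonicalXml(string $xml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    sortChildrenByName($doc->documentElement);
    return $doc->C14N(); // canonical form, comments stripped by default
}

$first  = '<responses> <response id="12"> <foo>bar</foo> <lorem>ipsum</lorem> <sit>dolor</sit> </response></responses>';
$second = '<responses> <response id="12"> <lorem>ipsum</lorem><sit>dolor</sit> <foo>bar</foo> </response> </responses>';
var_dump(canonicalXml($first) === canonicalXml($second));
```

Inside a test, the two canonical strings could then be passed to a plain assertSame(), which also yields a readable string diff on failure.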
You can use PHPUnit's assertXmlFileEqualsXmlFile(), assertXmlStringEqualsXmlFile() and assertXmlStringEqualsXmlString() functions; however, they give no information about what differs and only let the test fail with "Failed asserting that two DOM documents are equal." So you might want to use PHP's XMLDiff PECL extension, or write your own recursive comparison function. If time matters, I'd recommend using SimpleXML instead of DOM because of its simpler API.
I've been playing with some of the notions presented here and figured I might as well post my end result. One of the things I wanted to be able to do was to compare two nodes or two documents. (Technically, it can compare either, as long as the first child of one document is being compared to a similar one.) Basically, if I send in a DOMDocument, it is cloned using $clone->loadXML($obj->saveXML()), but if a node is sent in, a deep copy is made with $clone->importNode($obj, true) and appended to the new document. The order of the ifs is important because DOMDocument is also an instance of DOMNode.
/**
 * @param \DOMDocument|\DOMNode $n1
 * @param \DOMDocument|\DOMNode $n2
 * @return bool
 * @throws \Exception for invalid data
 */
function compareNode($n1, $n2) {
    $nd1 = new \DOMDocument('1.0', "UTF-8");
    if ($n1 instanceof \DOMDocument) {
        $nd1 = $n1->cloneNode(true);
        $nd1->preserveWhiteSpace = false;
        $nd1->formatOutput = false;
        $nd1->loadXML($n1->saveXML());
    } elseif ($n1 instanceof \DOMNode) {
        $nd1->preserveWhiteSpace = false;
        $nd1->formatOutput = false;
        // importNode() only copies the node; it must also be appended to the document
        $nd1->appendChild($nd1->importNode($n1, true));
    } else {
        throw new \Exception(__METHOD__ . " node 1 is invalid");
    }
    $nd2 = new \DOMDocument('1.0', "UTF-8");
    if ($n2 instanceof \DOMDocument) {
        $nd2 = $n2->cloneNode(true);
        $nd2->preserveWhiteSpace = false;
        $nd2->formatOutput = false;
        $nd2->loadXML($n2->saveXML());
    } elseif ($n2 instanceof \DOMNode) {
        $nd2->preserveWhiteSpace = false;
        $nd2->formatOutput = false;
        $nd2->appendChild($nd2->importNode($n2, true));
    } else {
        throw new \Exception(__METHOD__ . " node 2 is invalid");
    }
    return ($nd1->C14N(true, false) == $nd2->C14N(true, false));
}
Use the following assertion: $this->assertXmlStringEqualsXmlString($expected, $actual);
Parsing XML with PHP (simplexml)
Firstly, may I point out that I am a newcomer to all things PHP so apologies if anything here is unclear and I'm afraid the more layman the response the better. I've been having real trouble parsing an xml file in to php to then populate an HTML table for my website. At the moment, I have been able to get the full xml feed in to a string which I can then echo and view and all seems well. I then thought I would be able to use simplexml to pick out specific elements and print their content but have been unable to do this. The xml feed will be constantly changing (structure remaining the same) and is in compressed format. From various sources I've identified the following commands to get my feed in to the right format within a string although I am still unable to print specific elements. I've tried every combination without any luck and suspect I may be barking up the wrong tree. Could someone please point me in the right direction?! $file = fopen("compress.zlib://$url", 'r'); $xmlstr = file_get_contents($url); $xml = new SimpleXMLElement($url,null,true); foreach($xml as $name) { echo "{$name->awCat}\r\n"; } Many, many thanks in advance, Chris PS The actual feed
Since no one followed my close vote, I think I can just as well put my own comments as an answer.
First of all, SimpleXML can load URIs directly, and it can do so with stream wrappers, so your three calls at the beginning can be shortened to the following (note that you are not using $file at all):
$merchantProductFeed = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
To get the values, you can either use the implicit SimpleXML API and drill down to the wanted elements (as shown multiple times elsewhere on the site):
foreach ($merchantProductFeed->merchant->prod as $prod) {
    echo $prod->cat->awCat, PHP_EOL;
}
or you can use an XPath query to get at the wanted elements directly:
$xml = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
foreach ($xml->xpath('/merchantProductFeed/merchant/prod/cat/awCat') as $awCat) {
    echo $awCat, PHP_EOL;
}
Note that fetching all awCat elements from the source XML is rather pointless though, because all of them have "Bodycare & Fitness" as their value. Of course, you can also mix XPath and the implicit API: fetch just the prod elements and then drill down to their various children.
Using XPath should be somewhat faster than iterating over the SimpleXMLElement object graph, though the difference is negligible (read 0.000x vs 0.000y) for your feed. Still, if you plan to do more XML work, it pays off to familiarize yourself with XPath, because it's quite powerful. Think of it as SQL for XML.
For additional examples see "A simple program to CRUD node and node values of xml file" and "PHP Manual - SimpleXml Basic Examples".
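Mixing XPath with the implicit API, as mentioned above, might look like the sketch below. The inline sample only imitates the element names used in this answer (merchantProductFeed, merchant, prod, text/name, cat/awCat); the product names are invented, and the real feed has many more fields.

```php
<?php
// Sketch: select the prod elements via XPath, then drill down with property access.
// Hypothetical two-product sample standing in for the real (gzip-compressed) feed.
$sample = <<<'XML'
<merchantProductFeed>
  <merchant>
    <prod><text><name>Yoga mat</name></text><cat><awCat>Bodycare &amp; Fitness</awCat></cat></prod>
    <prod><text><name>Dumbbells</name></text><cat><awCat>Bodycare &amp; Fitness</awCat></cat></prod>
  </merchant>
</merchantProductFeed>
XML;

$feed = new SimpleXMLElement($sample);

$rows = [];
foreach ($feed->xpath('/merchantProductFeed/merchant/prod') as $prod) {
    // Cast to string to get the text value rather than a SimpleXMLElement object.
    $rows[] = [(string) $prod->text->name, (string) $prod->cat->awCat];
}
print_r($rows);
```

Each row is a plain array of strings, which is convenient for filling an HTML table afterwards.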
Try this... $url = "http://datafeed.api.productserve.com/datafeed/download/apikey/58bc4442611e03a13eca07d83607f851/cid/97,98,142,144,146,129,595,539,147,149,613,626,135,163,168,159,169,161,167,170,137,171,548,174,183,178,179,175,172,623,139,614,189,194,141,205,198,206,203,208,199,204,201,61,62,72,73,71,74,75,76,77,78,79,63,80,82,64,83,84,85,65,86,87,88,90,89,91,67,92,94,33,54,53,57,58,52,603,60,56,66,128,130,133,212,207,209,210,211,68,69,213,216,217,218,219,220,221,223,70,224,225,226,227,228,229,4,5,10,11,537,13,19,15,14,18,6,551,20,21,22,23,24,25,26,7,30,29,32,619,34,8,35,618,40,38,42,43,9,45,46,651,47,49,50,634,230,231,538,235,550,240,239,241,556,245,244,242,521,576,575,577,579,281,283,554,285,555,303,304,286,282,287,288,173,193,637,639,640,642,643,644,641,650,177,379,648,181,645,384,387,646,598,611,391,393,647,395,631,602,570,600,405,187,411,412,413,414,415,416,649,418,419,420,99,100,101,107,110,111,113,114,115,116,118,121,122,127,581,624,123,594,125,421,604,599,422,530,434,532,428,474,475,476,477,423,608,437,438,440,441,442,444,446,447,607,424,451,448,453,449,452,450,425,455,457,459,460,456,458,426,616,463,464,465,466,467,427,625,597,473,469,617,470,429,430,615,483,484,485,487,488,529,596,431,432,489,490,361,633,362,366,367,368,371,369,363,372,373,374,377,375,536,535,364,378,380,381,365,383,385,386,390,392,394,396,397,399,402,404,406,407,540,542,544,546,547,246,558,247,252,559,255,248,256,265,259,632,260,261,262,557,249,266,267,268,269,612,251,277,250,272,270,271,273,561,560,347,348,354,350,352,349,355,356,357,358,359,360,586,590,592,588,591,589,328,629,330,338,493,635,495,507,563,564,567,569,568/mid/2891/columns/merchant_id,merchant_name,aw_product_id,merchant_product_id,product_name,description,category_id,category_name,merchant_category,aw_deep_link,aw_image_url,search_price,delivery_cost,merchant_deep_link,merchant_image_url/format/xml/compression/gzip/"; $zd = gzopen($url, "r"); $data = gzread($zd, 1000000); gzclose($zd); if ($data !== false) { $xml = 
simplexml_load_string($data); foreach ($xml->merchant->prod as $pr) { echo $pr->cat->awCat . "<br>"; } }
<?php
$xmlstr = file_get_contents("compress.zlib://$url");
$xml = simplexml_load_string($xmlstr);
// you can traverse the XML tree however you want
foreach ($xml->merchant->prod as $line) {
    // $line->cat->awCat -> you can use this
}
Use print_r($xml) to see the structure of the parsed XML feed. Then it becomes obvious how you would traverse it: foreach ($xml->merchant->prod as $prod) { print $prod->pId; print $prod->text->name; print $prod->cat->awCat; # <-- which is what you wanted print $prod->price->buynow; }
$url = 'your URL here';
$f = gzopen($url, 'r');
$xml = new SimpleXMLElement(fread($f, 1000000));
foreach ($xml->xpath('//prod') as $name) {
    echo (string) $name->cat->awCatId, "\r\n";
}
PHP SimpleXML get innerXML
I need to get the HTML contents of answer in this bit of XML: <qa> <question>Who are you?</question> <answer>Who who, <strong>who who</strong>, <em>me</em></answer> </qa> So I want to get the string "Who who, <strong>who who</strong>, <em>me</em>". If I have the answer as a SimpleXMLElement, I can call asXML() to get "<answer>Who who, <strong>who who</strong>, <em>me</em></answer>", but how to get the inner XML of an element without the element itself wrapped around it? I'd prefer ways that don't involve string functions, but if that's the only way, so be it.
function SimpleXMLElement_innerXML($xml) { $innerXML= ''; foreach (dom_import_simplexml($xml)->childNodes as $child) { $innerXML .= $child->ownerDocument->saveXML( $child ); } return $innerXML; };
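A short usage sketch of the DOM-based approach, applied to the XML from the question. The function is restated under the assumed name innerXml() so the snippet runs on its own.

```php
<?php
// Sketch: serialize each child node (text and elements alike) of the given element.
function innerXml(SimpleXMLElement $element): string
{
    $inner = '';
    foreach (dom_import_simplexml($element)->childNodes as $child) {
        $inner .= $child->ownerDocument->saveXML($child);
    }
    return $inner;
}

$qa = simplexml_load_string(
    '<qa><question>Who are you?</question>'
    . '<answer>Who who, <strong>who who</strong>, <em>me</em></answer></qa>'
);

$inner = innerXml($qa->answer);
echo $inner, PHP_EOL;
```

Because it walks childNodes rather than children(), text nodes between the child elements are kept.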
This works (although it seems really lame): echo (string)$qa->answer;
To the best of my knowledge, there is no built-in way to get that. I'd recommend trying SimpleDOM, which is a PHP class extending SimpleXMLElement that offers convenience methods for most of the common problems.
include 'SimpleDOM.php';
$qa = simpledom_load_string(
    '<qa>
        <question>Who are you?</question>
        <answer>Who who, <strong>who who</strong>, <em>me</em></answer>
    </qa>'
);
echo $qa->answer->innerXML();
Otherwise, I see two ways of doing that. The first would be to convert your SimpleXMLElement to a DOMNode and then loop over its childNodes to build the XML. The other would be to call asXML() and then use string functions to remove the root node. Be careful, though: asXML() may sometimes return markup that is actually outside of the node it was called from, such as the XML prolog or processing instructions.
The most straightforward solution is to implement a custom innerXML function with SimpleXML:
function simplexml_innerXML($node) {
    $content = "";
    foreach ($node->children() as $child) {
        $content .= $child->asXml();
    }
    return $content;
}
In your code, replace $body_content = $el->asXml(); with $body_content = simplexml_innerXML($el);
However, you could also switch to another API that distinguishes between innerXML (what you are looking for) and outerXML (what you get for now). The Microsoft DOM library offers this distinction, but unfortunately PHP DOM doesn't. I found that the PHP XMLReader API offers this distinction; see readInnerXML(). This API takes quite a different approach to processing XML, though. Try it.
Finally, I would stress that XML is not meant for extracting data as subtrees but rather as values. That's why you are running into trouble finding the right API. It would be more 'standard' to store an HTML subtree as a value (and escape all tags) rather than as an XML subtree. Also beware that some HTML syntax is not always XML-compatible. Anyway, in practice, your approach is definitely more convenient for editing the XML file.
I would extend the SimpleXMLElement class:
class MyXmlElement extends SimpleXMLElement {
    final public function innerXML() {
        $tag = $this->getName();
        $value = $this->__toString();
        if ('' === $value) {
            return null;
        }
        return preg_replace('!<'. $tag .'(?:[^>]*)>(.*)</'. $tag .'>!Ums', '$1', $this->asXml());
    }
}
and then use it like this:
echo $qa->answer->innerXML();
<?php
function getInnerXml($xml_text) {
    // strip the first element
    // check if the stripped tag is empty also
    $xml_text = trim($xml_text);
    $s1 = strpos($xml_text, ">");
    $s2 = trim(substr($xml_text, 0, $s1)); // get the head without ">" and trim (note that the string is indexed from 0)
    if ($s2[strlen($s2)-1] == "/") // tag is empty
        return "";
    $s3 = strrpos($xml_text, "<"); // get last closing "<"
    return substr($xml_text, $s1+1, $s3-$s1-1);
}
var_dump(getInnerXml("<xml />"));
var_dump(getInnerXml("<xml / >faf < / xml>"));
var_dump(getInnerXml("<xml >< / xml>"));
var_dump(getInnerXml("<xml>faf < / xml>"));
var_dump(getInnerXml("<xml > faf < / xml>"));
?>
After searching for a while, I found no satisfying solution, so I wrote my own function. It returns the exact inner XML content (including whitespace, of course). To use it, pass the result of asXML(), like this: getInnerXml($e->asXML()). It also works for elements with many prefixes (as in my case, since I could not find any existing method that converts all child nodes with different prefixes).
Output:
string '' (length=0)
string '' (length=0)
string '' (length=0)
string 'faf ' (length=4)
string ' faf ' (length=6)
function get_inner_xml(SimpleXMLElement $SimpleXMLElement) { $element_name = $SimpleXMLElement->getName(); $inner_xml = $SimpleXMLElement->asXML(); $inner_xml = str_replace('<'.$element_name.'>', '', $inner_xml); $inner_xml = str_replace('</'.$element_name.'>', '', $inner_xml); $inner_xml = trim($inner_xml); return $inner_xml; }
If you don't want to strip CDATA section, comment out lines 6-8. function innerXML($i){ $text=$i->asXML(); $sp=strpos($text,">"); $ep=strrpos($text,"<"); $text=trim(($sp!==false && $sp<=$ep)?substr($text,$sp+1,$ep-$sp-1):''); $sp=strpos($text,'<![CDATA['); $ep=strrpos($text,"]]>"); $text=trim(($sp==0 && $ep==strlen($text)-3)?substr($text,$sp+9,-3):$text); return($text); }
You can just use this function :) function innerXML( $node ) { $name = $node->getName(); return preg_replace( '/((<'.$name.'[^>]*>)|(<\/'.$name.'>))/UD', "", $node->asXML() ); }
Here is a very fast solution I created:
function InnerHTML($Text) {
    $PosStart = strpos($Text, '>') + 1;
    // take everything between the end of the opening tag and the start of the closing tag
    return substr($Text, $PosStart, strrpos($Text, '<') - $PosStart);
}
echo InnerHTML($yourXML->qa->answer->asXML());
Using a regex, you could do this:
preg_match('/<answer[^>]*>(.*?)<\/answer>/s', $xml, $match);
$result = $match[1]; // group 1 holds the content between the tags
print_r($result);
Not finding elements using getElementsByTagName() using DOMDocument
I'm trying to loop through multiple <LineItemInfo> products contained within a <LineItems> within XML I'm parsing to pull product Ids out and send emails and do other actions for each product. The problem is that it's not returning anything. I've verified that the XML data is valid and it does contain the necessary components. $itemListObject = $orderXML->getElementsByTagName('LineItemInfo'); var_dump($itemListObject->length); var_dump($itemListObject); The output of the var_dump is: int(0) object(DOMNodeList)#22 (0) { } This is my first time messing with this and it's taken me a couple of hours but I can't figure it out. Any advice would be awesome. EDIT: My XML looks like this... except with a lot more tags than just ProductId <LineItems> <LineItemInfo> <ProductId href='[URL_TO_PRODUCT_XML]'>149593</ProductId> </LineItemInfo> <LineItemInfo> <ProductId href='[URL_TO_PRODUCT_XML]'>149593</ProductId> </LineItemInfo> </LineItems> Executing the following code does NOT get me the ProductId $itemListObject = $orderXML->getElementsByTagName('LineItemInfo'); foreach ($itemListObject as $element) { $product = $element->getElementsByTagName('ProductId'); $productId = $product->item(0)->nodeValue; echo $productId.'-'; } EDIT #2 As a side note, calling $element->item(0)->nodeValue on $element instead of $product caused my script's execution to discontinue and not throwing any errors that were logged by the server. It's a pain to debug when you have to run a credit card to find out whether it's functioning or not.
DOMDocument stuff can be tricky to get a handle on, because functions such as print_r() and var_dump() don't necessarily perform the same as they would on normal arrays and objects (see this comment in the manual). You have to use various functions and properties of the document nodes to pull out the data. For instance, if you had the following XML: <LineItemInfo attr1="hi">This is a line item.</LineItemInfo> You could output various parts of that using: $itemListObjects = $orderXML->getElementsByTagName('LineItemInfo'); foreach($itemListObjects as $node) { echo $node->nodeValue; //echos "This is a line item." echo $node->attributes->getNamedItem('attr1')->nodeValue; //echos "hi" } If you had a nested structure, you can follow basically the same procedure using the childNodes property. For example, if you had this: <LineItemInfo attr1="hi"> <LineItem>Line 1</LineItem> <LineItem>Line 2</LineItem> </LineItemInfo> You might do something like this: $itemListObjects = $orderXML->getElementsByTagName('LineItemInfo'); foreach($itemListObjects as $node) { if ($node->hasChildNodes()) { foreach($node->childNodes as $c) { echo $c->nodeValue .","; } } } //you'll get output of "Line 1,Line 2," Hope that helps. EDIT for specific code and XML I ran the following code in a test script, and it seemed to work for me. Can you be more specific about what's not working? I used your code exactly, except for the first two lines that create the document. Are you using loadXML() over loadHTML()? Are there any errors? 
$orderXML = new DOMDocument();
$orderXML->loadXML("
<LineItems>
    <LineItemInfo>
        <ProductId href='[URL_TO_PRODUCT_XML]'>149593</ProductId>
    </LineItemInfo>
    <LineItemInfo>
        <ProductId href='[URL_TO_PRODUCT_XML]'>149593</ProductId>
    </LineItemInfo>
</LineItems>
");
$itemListObject = $orderXML->getElementsByTagName('LineItemInfo');
foreach ($itemListObject as $element) {
    $product = $element->getElementsByTagName('ProductId');
    $productId = $product->item(0)->nodeValue;
    echo $productId.'-';
}
//outputs "149593-149593-"
XML tags tend to be lower-camel-case (or just "camel-case"), i.e. "lineItemInfo", instead of "LineItemInfo" and XML is case-sensitive, so check for that.
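The case-sensitivity point is easy to check with a small sketch. The trimmed XML below is adapted from the question; the variable names are assumptions for the example.

```php
<?php
// Sketch: with loadXML(), tag-name lookups are case-sensitive.
$doc = new DOMDocument();
$doc->loadXML(
    '<LineItems>'
    . '<LineItemInfo><ProductId>149593</ProductId></LineItemInfo>'
    . '<LineItemInfo><ProductId>149593</ProductId></LineItemInfo>'
    . '</LineItems>'
);

$exactCase = $doc->getElementsByTagName('LineItemInfo')->length; // matches both elements
$wrongCase = $doc->getElementsByTagName('lineItemInfo')->length; // matches nothing

var_dump($exactCase, $wrongCase);
```

A length of 0, as in the question, is therefore worth checking against the exact spelling of the tag in the received XML.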