PHP XML - Find out the path to a known value - php

Here is an XML bit:
[11] => SimpleXMLElement Object
(
[@attributes] => Array
(
[id] => 46e8f57e67db48b29d84dda77cf0ef51
[label] => Publications
)
[section] => Array
(
[0] => SimpleXMLElement Object
(
[@attributes] => Array
(
[id] => 9a34d6b273914f18b2273e8de7c48fd6
[label] => Journal Articles
[recordId] => 1a5a5710b0e0468e92f9a2ced92906e3
)
I know the value "46e8f57e67db48b29d84dda77cf0ef51" but its location varies across files. Can I use XPath to find the path to this value? If not what could be used?
Latest trial that does not work:
$search = $xml->xpath("//text()=='047ec63e32fe450e943cb678339e8102'");
while(list( , $node) = each($search)) {
echo '047ec63e32fe450e943cb678339e8102',$node,"\n";
}

PHP's DOMNode objects have a method for that: DOMNode::getNodePath().
$xml = <<<'XML'
<root>
<child key="1">
<child key="2"/>
<child key="3"/>
</child>
</root>
XML;
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
$nodes = $xpath->evaluate('//child');
foreach ($nodes as $node) {
var_dump($node->getNodePath());
}
Output:
string(11) "/root/child"
string(20) "/root/child/child[1]"
string(20) "/root/child/child[2]"
SimpleXML is a wrapper for DOM, and there is a function that gives you the DOMNode for a SimpleXMLElement: dom_import_simplexml().
$xml = <<<'XML'
<root>
<child key="1">
<child key="2"/>
<child key="3"/>
</child>
</root>
XML;
$structure = simplexml_load_string($xml);
$elements = $structure->xpath('//child');
foreach ($elements as $element) {
$node = dom_import_simplexml($element);
var_dump($node->getNodePath());
}
To fetch an element by its attribute, XPath can be used.
Select all element nodes anywhere in the document using the wildcard:
//*
Filter them by the id attribute:
//*[@id = "46e8f57e67db48b29d84dda77cf0ef51"]
$dom = new DOMDocument();
$dom->loadXml('<node id="46e8f57e67db48b29d84dda77cf0ef51"/>');
$xpath = new DOMXpath($dom);
foreach ($xpath->evaluate('//*[@id = "46e8f57e67db48b29d84dda77cf0ef51"]') as $node) {
var_dump(
$node->getNodePath()
);
}
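If it is not certain which attribute holds the value, the two ideas above can be combined: match any element that has an attribute with the known value, then ask DOM for its path. The following is only a sketch; the sample document and its element names are made up for illustration, and the value is interpolated into the expression without escaping because it is a plain hex string.

```php
<?php
// Hypothetical sample document (element names invented for illustration).
$xml = <<<'XML'
<cv>
  <section id="a1" label="Intro"/>
  <section id="46e8f57e67db48b29d84dda77cf0ef51" label="Publications">
    <section id="9a34d6b273914f18b2273e8de7c48fd6" label="Journal Articles"/>
  </section>
</cv>
XML;

$known = '46e8f57e67db48b29d84dda77cf0ef51';
$structure = simplexml_load_string($xml);

$paths = [];
// @* = "..." is true if any attribute of the element has that value.
foreach ($structure->xpath('//*[@* = "' . $known . '"]') as $element) {
    // Switch to the DOM node to get its location path.
    $paths[] = dom_import_simplexml($element)->getNodePath();
}
print_r($paths);
```

Each entry in $paths is a location path that can be fed back into xpath() or DOMXpath::evaluate().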

Is this string always in the @id attribute? Then a valid and distinct path is always //*[@id='46e8f57e67db48b29d84dda77cf0ef51'], no matter where it is.
To construct a path to a given node, use $node->getNodePath(), which returns an XPath expression for that node. Also consider constructing the XPath expression from @id attributes, similar to what Firebug does.
For SimpleXML you will have to do everything by hand. The code below only supports element nodes; if you need attribute or other paths, you will have to add that yourself.
$results = $xml->xpath("/highways/route[66]");
foreach($results as $result) {
$path = "";
while (true) {
// Is there an @id attribute? Shorten the path.
if ($id = $result['id']) {
$path = "//".$result->getName()."[@id='".(string) $id."']".$path;
break;
}
// Determine preceding and following elements, build a position predicate from it.
$preceding = $result->xpath("preceding-sibling::".$result->getName());
$following = $result->xpath("following-sibling::".$result->getName());
$predicate = (count($preceding) + count($following)) > 0 ? "[".(count($preceding)+1)."]" : "";
$path = "/".$result->getName().$predicate.$path;
// Is there a parent node? Then go on.
$result = $result->xpath("parent::*");
if (count($result) > 0) $result = $result[0];
else break;
}
echo $path."\n";
}

Related

How to count all nodes in DOMDocument

Using PHP 7.1 I want to count the number of nodes in the root of this string:
<p>Lorem</p>
<p>Ipsum</p>
<div>Dolores</div>
<b>Amet</b>
Using following PHP:
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->loadHTML($content);
$root = $dom->documentElement;
$children = $root->childNodes;
var_dump($children);
Returns:
object(DOMNodeList)#4 (1) {
["length"]=>
int(1)
}
I don't understand why the string of HTML only returns as 1 node. Additionally, I am unable to iterate through the nodes.
After a nice conversation in chat with @bart, we found a solution.
$content = "
<p>Lorem</p>
<p>Ipsum</p>
<div>Dolores</div>
<b>Amet</b>
";
$dom = new DOMDocument;
$dom->loadHTML($content);
$allElements = $dom->getElementsByTagName('*');
echo $allElements->length;
echo "<br />";
$node = array();
foreach($allElements as $element) {
if(array_key_exists($element->tagName, $node)) {
$node[$element->tagName] += 1;
} else {
$node[$element->tagName] = 1;
}
}
print_r($node);
PS: the html and body tags are added automatically and are counted too, which increases the result by 2.
For the record (and despite the other answer being accepted), here is the correct way to list the child nodes. This includes the text nodes, which people forget are there!
<?php
$content = "
<p>Lorem</p>
<p>Ipsum</p>
<div>Dolores</div>
<b>Amet</b>
";
$dom = new DOMDocument;
$dom->loadHTML($content);
$nodes=[];
$bodyNodes = $dom->getElementsByTagName('body'); // returns DOMNodeList object
foreach($bodyNodes[0]->childNodes as $child) // assuming 1 <body> node
{
$nodes[]=$child->nodeName;
}
print_r($nodes);
Outputs this, illustrating the point...:
Array
(
[0] => p
[1] => #text
[2] => p
[3] => #text
[4] => div
[5] => #text
[6] => b
[7] => #text
)
Well I was already typing this answer up so I'll add it here anyway.
You have to iterate over the contents of a DOMNodeList object; it is not an array structure whose items show up in var_dump() and friends. When iterating with foreach, each item is a DOMNode instance. The number of items in the DOMNodeList is stored in its length property.
$content = "
<p>Lorem</p>
<p>Ipsum</p>
<div>Dolores</div>
<b>Amet</b>
";
$dom = new DomDocument();
$dom->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$allElements = $dom->getElementsByTagName('*');
echo "We found $allElements->length elements\n";
foreach ($allElements as $element) {
echo "$element->tagName = $element->nodeValue\n";
}
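If only the top-level elements of the fragment matter (not the text nodes and not the html and body wrappers that loadHTML() adds), an XPath count is another option. This is a sketch that assumes the default loadHTML() behaviour, where the fragment ends up inside /html/body:

```php
<?php
$content = "<p>Lorem</p>\n<p>Ipsum</p>\n<div>Dolores</div>\n<b>Amet</b>";

$dom = new DOMDocument();
$dom->loadHTML($content); // wraps the fragment in <html><body>...</body></html>
$xpath = new DOMXPath($dom);

// count() yields a float in PHP, so cast it to an integer.
$count = (int) $xpath->evaluate('count(/html/body/*)');

// The same expression without count() returns the elements themselves.
$names = [];
foreach ($xpath->evaluate('/html/body/*') as $element) {
    $names[] = $element->nodeName;
}

echo $count, "\n";
print_r($names);
```

The `*` step only matches element nodes, so the whitespace text nodes between the tags are not counted.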

How to sort content of an XML file loaded with SimpleXML?

There is an XML file with a content similar to the following:
<FMPDSORESULT xmlns="http://www.filemaker.com">
<ERRORCODE>0</ERRORCODE>
<DATABASE>My_Database</DATABASE>
<LAYOUT/>
<ROW MODID="1" RECORDID="1">
<Name>John</Name>
<Age>19</Age>
</ROW>
<ROW MODID="2" RECORDID="2">
<Name>Steve</Name>
<Age>25</Age>
</ROW>
<ROW MODID="3" RECORDID="3">
<Name>Adam</Name>
<Age>45</Age>
</ROW>
</FMPDSORESULT>
I tried to sort the ROW tags by the values of Name tags using array_multisort function:
$xml = simplexml_load_file( 'xml1.xml');
$xml2 = sort_xml( $xml );
print_r( $xml2 );
function sort_xml( $xml ) {
$sort_temp = array();
foreach ( $xml as $key => $node ) {
$sort_temp[ $key ] = (string) $node->Name;
}
array_multisort( $sort_temp, SORT_DESC, $xml );
return $xml;
}
But the code doesn't work as expected.
I would recommend using the DOM extension, as it is more flexible:
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$doc->load('xml1.xml');
// Get the root node
$root = $doc->getElementsByTagName('FMPDSORESULT');
if (!$root->length)
die('FMPDSORESULT node not found');
$root = $root[0];
// Pull the ROW tags from the document into an array.
$rows = [];
$nodes = $root->getElementsByTagName('ROW');
while ($row = $nodes->item(0)) {
$rows []= $root->removeChild($row);
}
// Sort the array of ROW tags
usort($rows, function ($a, $b) {
$a_name = $a->getElementsByTagName('Name');
$b_name = $b->getElementsByTagName('Name');
return ($a_name->length && $b_name->length) ?
strcmp(trim($a_name[0]->textContent), trim($b_name[0]->textContent)) : 0;
});
// Append ROW tags back into the document
foreach ($rows as $row) {
$root->appendChild($row);
}
// Output the result
echo $doc->saveXML();
Output
<?xml version="1.0"?>
<FMPDSORESULT xmlns="http://www.filemaker.com">
<ERRORCODE>0</ERRORCODE>
<DATABASE>My_Database</DATABASE>
<LAYOUT/>
<ROW MODID="3" RECORDID="3">
<Name>Adam</Name>
<Age>45</Age>
</ROW>
<ROW MODID="1" RECORDID="1">
<Name>John</Name>
<Age>19</Age>
</ROW>
<ROW MODID="2" RECORDID="2">
<Name>Steve</Name>
<Age>25</Age>
</ROW>
</FMPDSORESULT>
Regarding XPath
You can use DOMXPath for even more flexible traversing. However, in this specific problem the use of DOMXPath will not bring significant improvements, in my opinion. Anyway, I'll give examples for completeness.
Fetching the rows:
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('myns', 'http://www.filemaker.com');
$rows = [];
foreach ($xpath->query('//myns:ROW') as $row) {
$rows []= $row->parentNode->removeChild($row);
}
Appending the rows back into the document:
$root = $xpath->evaluate('/myns:FMPDSORESULT')[0];
foreach ($rows as $row) {
$root->appendChild($row);
}
Some SimpleXMLElement methods return arrays, but most return SimpleXMLElement objects, which are traversable. A var_dump() only shows part of the data in a simplified representation; it is an object structure, not a nested array.
If I understand you correctly you want to sort the ROW elements by the Name child. You can fetch them with the xpath() method, but you need to register a prefix for the namespace. It returns an array of SimpleXMLElement objects. The array can be sorted with usort.
$fResult = new SimpleXMLElement($xml);
$fResult->registerXpathNamespace('fm', 'http://www.filemaker.com');
$rows = $fResult->xpath('//fm:ROW');
usort(
$rows,
function(SimpleXMLElement $one, SimpleXMLElement $two) {
return strcasecmp((string) $one->Name, (string) $two->Name);
}
);
var_dump($rows);
In DOM that does not look much different, but DOMXpath::evaluate() returns a DOMNodeList. You can convert it into an array using iterator_to_array().
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('fm', 'http://www.filemaker.com');
$rows = iterator_to_array($xpath->evaluate('//fm:ROW'));
usort(
$rows,
function(DOMElement $one, DOMElement $two) use ($xpath) {
return strcasecmp(
$xpath->evaluate('normalize-space(fm:Name)', $one),
$xpath->evaluate('normalize-space(fm:Name)', $two)
);
}
);
var_dump($rows);
DOM has no magic methods to access children and values, but XPath can be used to fetch them. The XPath function string() converts the first node of a node list into a string; it returns an empty string if the node list is empty. normalize-space() does a little more: it also replaces each run of whitespace with a single space and strips whitespace from the start and end of the string.
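To actually reorder the document, the sorted nodes have to be appended again. The following sketch shortens the sample XML and uses the prefix fm for every element, since ROW and Name both live in the default namespace of the document. It relies on appendChild() moving a node that is already part of the document.

```php
<?php
$xml = <<<'XML'
<FMPDSORESULT xmlns="http://www.filemaker.com">
  <ROW RECORDID="1"><Name>John</Name></ROW>
  <ROW RECORDID="2"><Name>Steve</Name></ROW>
  <ROW RECORDID="3"><Name>Adam</Name></ROW>
</FMPDSORESULT>
XML;

$document = new DOMDocument();
$document->preserveWhiteSpace = false;
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$xpath->registerNamespace('fm', 'http://www.filemaker.com');

// Copy the node list into an array so it can be sorted.
$rows = iterator_to_array($xpath->evaluate('/fm:FMPDSORESULT/fm:ROW'));
usort($rows, function (DOMElement $one, DOMElement $two) use ($xpath) {
    return strcasecmp(
        $xpath->evaluate('normalize-space(fm:Name)', $one),
        $xpath->evaluate('normalize-space(fm:Name)', $two)
    );
});

// Appending an attached node moves it, so this reorders the rows in place.
foreach ($rows as $row) {
    $document->documentElement->appendChild($row);
}

$sortedNames = [];
foreach ($xpath->evaluate('/fm:FMPDSORESULT/fm:ROW/fm:Name') as $name) {
    $sortedNames[] = $name->textContent;
}
print_r($sortedNames);
```

Calling $document->saveXML() afterwards serializes the reordered document.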

Parse XML to PHP using ID value

How can I echo xml values with php by calling their "columnId" and not the position in the array ? (The array is really long)
Here is a sample of the xml :
<Data>
<Value columnId="ITEMS_SOLD">68</Value>
<Value columnId="TOTAL_SALES">682</Value>
<Value columnId="SHIPPING_READY">29</Value>
...
</Data>
The following php gives me all of the values :
$url = 'XXX';
$xml = file_get_contents($url);
$feed = simplexml_load_string($xml) or die("Error: Cannot create object");
foreach($feed->Data->Value as $key => $value){
echo $value;
}
I would like to be able to use something like that in my document :
echo $feed->Data->Value['TOTAL_SALES'];
Thank you for your help.
echo $feed->Data->Value[1];
Here is another approach: convert the XML object into an array and use that for further processing. Try this code:
<?php
$url = 'XXX';
//Read xml data, If file exist...
if (file_exists($url)) {
//Load xml file...
$xml = simplexml_load_file($url);
$arrColumn = array();//Variable initialization...
$arrFromObj = (array) $xml;//Convert object to array...
$i = 0;//Variable initialization with value...
//Loop until data...
foreach($xml AS $arrKey => $arrData) {
$columnId = (string) $arrData['columnId'][0];//array object to string...
$arrColumn[$columnId] = $arrFromObj['Value'][$i];//assign data to array...
$i++;//Incremental variable...
}
} else {//Condition if file not exist and display message...
exit('Failed to open file');
}
?>
The code above stores the result in the array $arrColumn:
Array
(
[ITEMS_SOLD] => 68
[TOTAL_SALES] => 682
[SHIPPING_READY] => 29
)
Hope this helps!
Use XPath. Both SimpleXML and DOM support it, but SimpleXML is more limited: its xpath() method can only return node lists, not scalar values.
SimpleXML
$feed = simplexml_load_string($xml);
var_dump(
(string)$feed->xpath('//Value[@columnId = "TOTAL_SALES"]')[0]
);
Output:
string(3) "682"
DOM
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
var_dump(
$xpath->evaluate('string(//Value[@columnId = "TOTAL_SALES"])')
);
Output:
string(3) "682"
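If many values are needed, running one XPath query per columnId gets repetitive. A lookup table keyed by columnId can be built in a single pass instead. This is a minimal sketch; the root element name Report is an assumption, since the question only shows the Data element.

```php
<?php
// Hypothetical wrapper element "Report"; only <Data> is shown in the question.
$xml = <<<'XML'
<Report>
  <Data>
    <Value columnId="ITEMS_SOLD">68</Value>
    <Value columnId="TOTAL_SALES">682</Value>
    <Value columnId="SHIPPING_READY">29</Value>
  </Data>
</Report>
XML;

$feed = simplexml_load_string($xml);

$values = [];
foreach ($feed->Data->Value as $value) {
    // Cast both the attribute and the element content to plain strings.
    $values[(string) $value['columnId']] = (string) $value;
}

echo $values['TOTAL_SALES'], "\n";
```

After the loop, each value is available by name, for example $values['ITEMS_SOLD'].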

Using XPath to extract XML in PHP

I have the following XML:
<root>
<level name="level1">
<!-- More children <level> -->
</level>
<level name="level2">
<!-- Some more children <level> -->
</level>
</root>
How can I extract a <level> directly under <root> so that I can run an XPath query such as $xml->xpath('//some-query') relative to the extracted <level>?
DOMXPath::query's second parameter is the context node. Just pass the DOMNode instance you have previously "found" and your query runs "relative" to that node. E.g.
<?php
$doc = new DOMDocument;
$doc->loadxml( data() );
$xpath = new DOMXPath($doc);
$nset = $xpath->query('/root/level[@name="level1"]');
if ( $nset->length < 1 ) {
die('....no such element');
}
else {
$elLevel = $nset->item(0);
foreach( $xpath->query('c', $elLevel) as $elC) {
echo $elC->nodeValue, "\r\n";
}
}
function data() {
return <<< eox
<root>
<level name="level1">
<c>C1</c>
<a>A</a>
<c>C2</c>
<b>B</b>
<c>C3</c>
</level>
<level name="level2">
<!-- Some more children <level> -->
</level>
</root>
eox;
}
But unless you have to perform multiple separate (possibly complex) subsequent queries, this is most likely not necessary:
<?php
$doc = new DOMDocument;
$doc->loadxml( data() );
$xpath = new DOMXPath($doc);
foreach( $xpath->query('/root/level[@name="level1"]/c') as $c ) {
echo $c->nodeValue, "\r\n";
}
function data() {
return <<< eox
<root>
<level name="level1">
<c>C1</c>
<a>A</a>
<c>C2</c>
<b>B</b>
<c>C3</c>
</level>
<level name="level2">
<c>Ahh</c>
<a>ouch</a>
<c>no</c>
<b>wrxl</b>
</level>
</root>
eox;
}
This has the same output, using just one query.
DOMXpath::evaluate() allows you to fetch node lists and scalar values from a DOM.
So you can fetch a value directly using an Xpath expression:
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
var_dump(
$xpath->evaluate('string(/root/level[@name="level2"]/@name)')
);
Output:
string(6) "level2"
The Xpath expression
All level element nodes in root:
/root/level
That have a specific name attribute:
/root/level[@name="level2"]
The value you would like to fetch (here the name attribute, for validation):
/root/level[@name="level2"]/@name
Cast it into a string; if no node was found, the result will be an empty string:
string(/root/level[@name="level2"]/@name)
Loop over nodes, use them as context
If you need to execute several expressions for the node, it might be better to fetch it separately and use foreach(). The second argument of DOMXpath::evaluate() is the context node.
foreach ($xpath->evaluate('/root/level[@name="level2"]') as $level) {
var_dump(
$xpath->evaluate('string(@name)', $level)
);
}
Node list length
If you need to handle the case that no node was found, you can check the DOMNodeList::$length property.
$levels = $xpath->evaluate('/root/level[@name="level2"]');
if ($levels->length > 0) {
$level = $levels->item(0);
var_dump(
$xpath->evaluate('string(@name)', $level)
);
} else {
// no level found
}
count() expression
You can also check beforehand that there are matching elements, using a count() expression.
var_dump(
$xpath->evaluate('count(/root/level[@name="level2"])')
);
Output:
float(1)
Boolean result
It is possible to make that a condition in Xpath and return the boolean value.
var_dump(
$xpath->evaluate('count(/root/level[@name="level2"]) > 0')
);
Output:
bool(true)
Using querypath for parsing XML/HTML makes this all super easy.
$qp = qp($xml) ;
$levels = $qp->find('root')->eq(0)->find('level') ;
foreach($levels as $level ){
//do whatever you want with it , get its xpath , html, attributes etc.
$level->xpath() ; //
}
Excellent beginner tutorial for Querypath
This should work:
$dom = new DOMDocument;
$dom->loadXML($xml);
$levels = $dom->getElementsByTagName('level');
foreach ($levels as $level) {
$levelname = $level->getAttribute('name');
if ($levelname == 'level1') {
//do stuff
}
}
I personally prefer the DOMNodeList class for parsing XML.
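Since the question uses $xml->xpath(), here is the SimpleXML counterpart of the context-node idea, as a sketch with a shortened sample document. Calling xpath() on an extracted element makes that element the context node, but an expression starting with // still searches the whole document, so the relative form .// is needed to stay inside the extracted level.

```php
<?php
$xml = <<<'XML'
<root>
  <level name="level1"><c>C1</c><a>A</a><c>C2</c></level>
  <level name="level2"><c>Ahh</c></level>
</root>
XML;

$root = simplexml_load_string($xml);
$matches = $root->xpath('/root/level[@name="level1"]');

$relative = [];
$absolute = [];
if (count($matches) > 0) {
    $level = $matches[0];
    // Relative expression: only descendants of the extracted <level>.
    foreach ($level->xpath('.//c') as $c) {
        $relative[] = (string) $c;
    }
    // Absolute expression: starts at the document root again.
    foreach ($level->xpath('//c') as $c) {
        $absolute[] = (string) $c;
    }
}

print_r($relative);
print_r($absolute);
```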

Simplexml get node by attribute

I've got xml file:
<?xml version="1.0" ?>
<xml>
<opis lang="en">My text</opis>
<opis lang="cz">My text2</opis>
</xml>
I want to get "My text2" - so a node where attribute lang is "cz":
$xml = simplexml_load_file($fileName);
$result = $xml->xpath('//xml/opis[@lang="cz"]');
but instead of value I get:
array(1) (
[0] => SimpleXMLElement object {
@attributes => array(1) (
[lang] => (string) cz
)
}
))
You could get the value like this:
$xml = simplexml_load_file($fileName);
$result = $xml->xpath('//xml/opis[@lang="cz"]');
foreach($result as $res) {
echo $res;
}
Try using DomDocument:
$xml = new DomDocument;
$xml->load('yourFile');
$xpath = new DomXpath($xml);
foreach ($xpath->query('//xml/opis[@lang="cz"]') as $rowNode) {
echo $rowNode->nodeValue; // will be 'My text2'
}
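If exactly one node is expected, a loop is not required: xpath() returns an array, and casting its first item to a string yields the text content. The sketch below inlines the sample document from the question and guards against an empty result.

```php
<?php
$xml = simplexml_load_string(
    '<xml><opis lang="en">My text</opis><opis lang="cz">My text2</opis></xml>'
);

$result = $xml->xpath('/xml/opis[@lang="cz"]');

// The array is empty if nothing matched, so check before casting.
$text = isset($result[0]) ? (string) $result[0] : null;

var_dump($text);
```

The cast matters because $result[0] is still a SimpleXMLElement object, which is why a dump of the raw result shows the attributes instead of the text.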
