XPath string for all children of namespaced elements - php

I'm just getting started with XPath, using its implementation in PHP's SimpleXML objects. Right now I'm using //zuq:* to create an array of SimpleXML objects for the elements with the zuq prefix in a given document. However, I'd like the SimpleXML objects to reference all descendants regardless of namespace. I tried using //child::zuq:*, but the SimpleXML trees it creates don't seem to be complete.
Essentially, the objects captured should be all the top level objects of the zuq namespace throughout the document, containing all descendant elements regardless of namespace, including zuq.
tl;dr: How can I create a SimpleXML object tree from a given document where each SimpleXML root object is the highest level document element of a given namespace (such as zuq) containing all descendants of said element regardless of the descendant namespace? XPath is not a requisite but appears to be the best choice based on my reading.
test.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:zuq="http://localhost/zuq">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<h1>Heading</h1>
<p>Paragraph</p>
<zuq:region name="myRegion">
<div class="myClass">
<h1><zuq:data name="myDataHeading" /></h1>
<p>
<zuq:data name="myDataParagraph">
<zuq:format type="trim">
<zuq:param name="length" value="200" />
<zuq:param name="append">
<span class="paragraphTrimOverflow">...</span>
</zuq:param>
</zuq:format>
</zuq:data>
</p>
</div>
</zuq:region>
</body>
</html>
$sxml = simplexml_load_file('test.html');
$sxml_zuq = $sxml->xpath('//zuq:*/descendant-or-self::node()');
print_r($sxml_zuq);
Produces:
Array
(
[0] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => myRegion
)
[div] => SimpleXMLElement Object
(
[#attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object //I don't know why these don't contain their zuq descendants
(
)
[p] => SimpleXMLElement Object
(
)
)
)
[1] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => myRegion
)
[div] => SimpleXMLElement Object
(
[#attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
)
[2] => SimpleXMLElement Object
(
[#attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
[3] => SimpleXMLElement Object
(
[#attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
[4] => SimpleXMLElement Object
(
)
[5] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => myDataHeading
)
)
[6] => SimpleXMLElement Object
(
[#attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
[7] => SimpleXMLElement Object
(
)
[8] => SimpleXMLElement Object
(
)
[9] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => myDataParagraph
)
)
[10] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => myDataParagraph
)
)
[11] => SimpleXMLElement Object
(
[#attributes] => Array
(
[type] => trim
)
)
[12] => SimpleXMLElement Object
(
[#attributes] => Array
(
[type] => trim
)
)
[13] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => length
[value] => 200
)
)
[14] => SimpleXMLElement Object
(
[#attributes] => Array
(
[type] => trim
)
)
[15] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => append
)
[span] => ...
)
[16] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => append
)
[span] => ...
)
[17] => SimpleXMLElement Object
(
[#attributes] => Array
(
[class] => paragraphTrimOverflow
)
[0] => ...
)
[18] => SimpleXMLElement Object
(
[#attributes] => Array
(
[class] => paragraphTrimOverflow
)
[0] => ...
)
[19] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => append
)
[span] => ...
)
[20] => SimpleXMLElement Object
(
[#attributes] => Array
(
[type] => trim
)
)
[21] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => myDataParagraph
)
)
[22] => SimpleXMLElement Object
(
)
[23] => SimpleXMLElement Object
(
[#attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
[24] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => myRegion
)
[div] => SimpleXMLElement Object
(
[#attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
)
)

Don't trust the output of the print_r statement ... it seems to be showing an empty object, but in my testing the children are actually still there. For example, starting with your code above:
$sxml = simplexml_load_file('test.html');
$sxml_zuq = $sxml->xpath('//zuq:*/descendant-or-self::node()');
If I subsequently try a command like this:
print_r($sxml_zuq[0]->div->h1);
I get this output:
SimpleXMLElement Object
(
)
It seems to be empty, right? But if I modify the command to look like this:
echo $sxml_zuq[0]->div->h1->asXML();
I get the resultant tree with the namespaced child:
<h1><zuq:data name="myDataHeading"/></h1>
I'm not 100% sure why this is; it probably has something to do with the print_r statement trying to flatten the simplexml object and not dealing with the namespaces properly. But when you keep to the simplexml objects themselves that are returned from your xpath call, all of the children are preserved.
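Another way to confirm that the children are still there, besides asXML(), is to ask for them by namespace. The following is a minimal sketch, assuming the zuq prefix maps to http://localhost/zuq as in test.html; the inline <h1> fragment is a cut-down stand-in for the real document. It appears that children() called without a namespace skips prefixed elements, which would be consistent with the empty-looking print_r output.

```php
<?php
// Cut-down stand-in for the <h1> element from test.html (assumption for this sketch)
$xml = '<h1 xmlns="http://www.w3.org/1999/xhtml" xmlns:zuq="http://localhost/zuq">'
     . '<zuq:data name="myDataHeading"/></h1>';
$h1 = simplexml_load_string($xml);

// Children visible when no namespace is named (prefixed elements are skipped)
$plainCount = count($h1->children());

// Children in the zuq namespace, requested by namespace URI
$zuqChildren = $h1->children('http://localhost/zuq');
$zuqCount = count($zuqChildren);

// attributes() with no argument reads the un-namespaced attributes
$dataName = (string) $zuqChildren->data->attributes()->name;

echo "plain: $plainCount, zuq: $zuqCount, name: $dataName", PHP_EOL;
```

So the zuq:data element is reachable through the object even though print_r does not list it.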
Now, with regard to your XPath itself, you probably DON'T want the "descendant-or-self" axis, because that will match not only the top-level zuq element but also all of its descendants, creating a larger array than you're actually seeking (unless I'm misunderstanding what you're asking). If you try something like this:
$sxml_zuq = $sxml->xpath('//zuq:*[not(ancestor::zuq:*)]');
then you'll get back an array of ONLY the top-level zuq-namespaced elements. (While your example XML has only one such top-level element, your actual data may have several siblings at that level.) You can then capture the content of each of these top-level elements like this:
foreach ($sxml_zuq as $zuq_node) {
    echo ($zuq_node->asXML());
}
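Put together, the top-level query can be sketched as follows. This is a hedged, self-contained example: the inline document is a shortened variant of test.html with a second region added purely for illustration, and the explicit registerXPathNamespace() call is there so the sketch does not depend on how the prefix was declared.

```php
<?php
// Shortened variant of test.html; the second zuq:region is made up for illustration
$xml = <<<XML
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:zuq="http://localhost/zuq">
  <body>
    <zuq:region name="first">
      <div><zuq:data name="inner"/></div>
    </zuq:region>
    <zuq:region name="second"/>
  </body>
</html>
XML;

$doc = simplexml_load_string($xml);
$doc->registerXPathNamespace('zuq', 'http://localhost/zuq');

// Only zuq elements that have no zuq ancestor, i.e. the top-level ones
$topLevel = $doc->xpath('//zuq:*[not(ancestor::zuq:*)]');

$names = [];
foreach ($topLevel as $node) {
    $names[] = (string) $node['name'];
    // asXML() serializes the whole subtree, including nested zuq elements
    echo $node->asXML(), PHP_EOL;
}
```

The nested zuq:data element is not returned as its own array entry, but it still appears inside the serialized subtree of the first region.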
Things get a little trickier if you want to repeat this process but search for top-level (or any) elements in the default namespace; you'd have to use the registerXPathNamespace method to give the default namespace a temporary prefix, and do the XPath search on that.
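As a rough sketch of the default-namespace case: SimpleXML's registerXPathNamespace() method binds a prefix for XPath queries only. The prefix name xhtml below is an arbitrary choice for this example, and the inline document is a trimmed stand-in for test.html.

```php
<?php
// Trimmed stand-in for test.html: every element is in the default XHTML namespace
$xml = '<html xmlns="http://www.w3.org/1999/xhtml">'
     . '<body><h1>Heading</h1><p>Paragraph</p></body></html>';
$doc = simplexml_load_string($xml);

// An unprefixed name in XPath 1.0 means "no namespace", so this finds nothing
$unprefixed = $doc->xpath('//h1');

// Bind a temporary prefix (arbitrary name) to the default namespace URI
$doc->registerXPathNamespace('xhtml', 'http://www.w3.org/1999/xhtml');
$headings = $doc->xpath('//xhtml:h1');

$firstHeading = (string) $headings[0];
echo $firstHeading, PHP_EOL;
```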

I think you're looking for //zuq:*/descendant-or-self::*. This will result in all subtrees with the root having zuq namespace prefix.
The observed behavior seems to be an artifact of SimpleXML (the XPath specification does not deal with trees in the XPath query output, only separate nodes). You can probably solve it using something like
//zuq:*[not(ancestor::zuq:*)]/descendant-or-self::*
The predicate [not(ancestor::zuq:*)] checks whether an element has an ancestor with the zuq prefix and keeps only the elements that have none. So you should get only the zuq: roots that have no zuq: ancestor, followed by their descendants.

Related

PHP simplexml reading child tag attribute

I have a xml like below,
<y>
<n>
<n id='test1'></n>
<n id='test2'></n>
</n>
</y>
and I want to read the "id" of each child "n" tag.
I use this PHP code:
$xml = simplexml_load_file("my.xml");
echo $xml->n[0]->n;
but I get this error:
Trying to get property of non-object
It should be $xml->n->n[0]; $xml->n->n is a list of elements. If you print_r($xml) you should see something like this:
SimpleXMLElement Object
(
[n] => SimpleXMLElement Object
(
[n] => Array
(
[0] => SimpleXMLElement Object
(
[#attributes] => Array
(
[id] => test1
)
)
[1] => SimpleXMLElement Object
(
[#attributes] => Array
(
[id] => test2
)
)
)
)
)
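To read every id rather than just the first, one option is to iterate over $xml->n->n and use array-style access for the attribute. A minimal sketch, loading the XML from a string instead of my.xml so that it is self-contained:

```php
<?php
// Same structure as the XML in the question, inlined for the example
$xml = simplexml_load_string(
    "<y><n><n id='test1'></n><n id='test2'></n></n></y>"
);

$ids = [];
// $xml is the root <y>; $xml->n is the outer <n>; $xml->n->n are the inner ones
foreach ($xml->n->n as $child) {
    // Attributes are read with array syntax; cast to get a plain string
    $ids[] = (string) $child['id'];
}

echo implode(', ', $ids), PHP_EOL;
```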

Accessing elements in Simplexml with complex structure

I'm new to working with XML in PHP. I have a fairly complex XML structure and am using SimpleXML in Laravel, and I'm having trouble accessing all the elements I need. I am able to loop through the large XML file, but SimpleXML is returning two objects per record and I only seem to be able to access the elements in 'header', the first object returned...
Here is part of the XML object:
SimpleXMLElement Object
(
[identifier] => RCM0635
[datestamp] => 2015-06-09
)
SimpleXMLElement Object
(
[lidoWrap] => SimpleXMLElement Object
(
[lido] => SimpleXMLElement Object
(
[lidoRecID] => RCM:1748
[descriptiveMetadata] => SimpleXMLElement Object
(
[objectClassificationWrap] => SimpleXMLElement Object
(
[objectWorkTypeWrap] => SimpleXMLElement Object
(
[objectWorkType] => SimpleXMLElement Object
(
[term] => musical instruments
)
)
[classificationWrap] => SimpleXMLElement Object
(
[classification] => Array
(
[0] => SimpleXMLElement Object
(
[term] => Cornet
)
[1] => SimpleXMLElement Object
(
[conceptID] => SimpleXMLElement Object
(
[#attributes] => Array
(
[type] => SH_Class
)
)
)
)
)
)
With the code below I can get the elements in the header, but I can't figure out how to get the other elements.
$streamer = \Prewk\XmlStringStreamer::createStringWalkerParser(public_path().'/xml/many_mimo_records.xml');
while ($node = $streamer->getNode()) {
    $simpleXmlNode = simplexml_load_string($node);
    echo (string)$simpleXmlNode->identifier;
    echo (string)$simpleXmlNode->datestamp;
}
I'd be very grateful for any advice...
I'm not sure I understand you, but in general:
You loop with while ($node = $streamer->getNode()), which means that after the first iteration you'll get this object:
SimpleXMLElement Object
(
[identifier] => RCM0635
[datestamp] => 2015-06-09
)
so the first time it's fine to read it like this:
$simpleXmlNode = simplexml_load_string($node);
echo (string)$simpleXmlNode->identifier;
echo (string)$simpleXmlNode->datestamp;
but in the second iteration you have:
SimpleXMLElement Object
(
[lidoWrap] => SimpleXMLElement Object
(
[lido] => SimpleXMLElement Object
(
[lidoRecID] => RCM:1748
[descriptiveMetadata] => SimpleXMLElement Object
(
[objectClassificationWrap] => SimpleXMLElement Object
(
[objectWorkTypeWrap] => SimpleXMLElement Object
(
[objectWorkType] => SimpleXMLElement Object
(
[term] => musical instruments
)
)
[classificationWrap] => SimpleXMLElement Object
(
[classification] => Array
(
[0] => SimpleXMLElement Object
(
[term] => Cornet
)
[1] => SimpleXMLElement Object
(
[conceptID] => SimpleXMLElement Object
(
[#attributes] => Array
(
[type] => SH_Class
)
)
)
)
)
)
so the code inside the while loop is wrong for that node.
I suggest trying something like this:
while ($node = $streamer->getNode()) {
    $simpleXmlNode = simplexml_load_string($node);
    if (!empty($simpleXmlNode->identifier)) {
        echo (string)$simpleXmlNode->identifier;
    }
    if (!empty($simpleXmlNode->datestamp)) {
        echo (string)$simpleXmlNode->datestamp;
    }
    if (!empty($simpleXmlNode->lidoWrap)) {
        $lido = $simpleXmlNode->lidoWrap->lido;
        echo (string)$lido->lidoRecID;
        // and so on, down through the nested XML node objects
    }
}
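To show what "and so on" could look like, here is a hedged, self-contained sketch that walks further into the second object. The two XML fragments are hypothetical reconstructions based on the print_r output in the question (the root element names header and metadata are guesses), and they stand in for what $streamer->getNode() would return.

```php
<?php
// Hypothetical fragments reconstructed from the print_r output in the question
$nodes = [
    '<header><identifier>RCM0635</identifier><datestamp>2015-06-09</datestamp></header>',
    '<metadata><lidoWrap><lido><lidoRecID>RCM:1748</lidoRecID>'
    . '<descriptiveMetadata><objectClassificationWrap><classificationWrap>'
    . '<classification><term>Cornet</term></classification>'
    . '<classification><conceptID type="SH_Class"/></classification>'
    . '</classificationWrap></objectClassificationWrap></descriptiveMetadata>'
    . '</lido></lidoWrap></metadata>',
];

$record = [];
foreach ($nodes as $node) {
    $sx = simplexml_load_string($node);

    // First kind of node: the header fields
    if (isset($sx->identifier)) {
        $record['identifier'] = (string) $sx->identifier;
        $record['datestamp']  = (string) $sx->datestamp;
    }

    // Second kind of node: follow the nesting shown by print_r
    if (isset($sx->lidoWrap)) {
        $lido = $sx->lidoWrap->lido;
        $record['lidoRecID'] = (string) $lido->lidoRecID;

        $classifications = $lido->descriptiveMetadata
            ->objectClassificationWrap->classificationWrap->classification;
        $record['term']        = (string) $classifications[0]->term;
        $record['conceptType'] = (string) $classifications[1]->conceptID['type'];
    }
}

print_r($record);
```

Repeated elements such as classification are indexed with [0], [1], and attributes such as type are read with array syntax on the element.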

How to display a value from a SimpleXML object (the array notation is confusing me)

I have a PHP file that uses cURL to retrieve some XML. I now want to retrieve a value from the XML but I cannot traverse to it as I am confused with the notation.
Here's my retrieved XML:
SimpleXMLElement Object
(
[#attributes] => Array
(
[uri] => /fruit/apple/xml/green/pipType
)
[result] => SimpleXMLElement Object
(
[JobOpenings] => SimpleXMLElement Object
(
[row] => Array
(
[0] => SimpleXMLElement Object
(
[#attributes] => Array
(
[no] => 1
)
[FL] => Array
(
[0] => 308343000000092052
[1] => ZR_6_JOB
)
)
[1] => SimpleXMLElement Object
(
[#attributes] => Array
(
[no] => 2
)
[FL] => Array
(
[0] => 308343000000091031
[1] => ZR_5_JOB
)
)
)
)
)
)
I have this XML stored in a variable called $xml using:
$xml = new SimpleXmlElement($data, LIBXML_NOCDATA);
Any help for how I can select the ZR_5_JOB element please?
I have tried countless times, the last effort I had was:
print_r($xml->result->JobOpenings->row[0]->FL[0]);
Could anybody please help?
(I know I will then need to do some iteration, but I'll deal with that later!)
First loop over the JobOpenings rows to get each row separately; then you can access the children of each row easily.
foreach($xml->result->JobOpenings->row as $item) {
    echo $item->FL[0] . '<br>';
}
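For the specific value asked about, ZR_5_JOB is the second FL of the second row, so it can also be reached directly by index. A minimal sketch follows; the XML is reconstructed from the print_r output, and the root element name response is an assumption, since the original markup is not shown.

```php
<?php
// Reconstructed from the print_r output; the root name "response" is a guess
$data = <<<XML
<response uri="/fruit/apple/xml/green/pipType">
  <result>
    <JobOpenings>
      <row no="1"><FL>308343000000092052</FL><FL>ZR_6_JOB</FL></row>
      <row no="2"><FL>308343000000091031</FL><FL>ZR_5_JOB</FL></row>
    </JobOpenings>
  </result>
</response>
XML;

$xml = new SimpleXMLElement($data, LIBXML_NOCDATA);

// Indexes are zero-based: row[1] is the second row, FL[1] its second FL
$job = (string) $xml->result->JobOpenings->row[1]->FL[1];
echo $job, PHP_EOL;

// Or collect the second FL of every row while iterating
$jobs = [];
foreach ($xml->result->JobOpenings->row as $item) {
    $jobs[] = (string) $item->FL[1];
}
```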

I'm trying to find an attribute using simpleXML

I have a simpleXML output of:
SimpleXMLElement Object
(
[#attributes] => Array
(
[version] => 2
)
[currentTime] => 2013-02-05 21:26:09
[result] => SimpleXMLElement Object
(
[rowset] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => characters
[key] => characterID
[columns] => name,characterID,corporationName,corporationID
)
[row] => Array
(
[0] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => Wrytha Cy
[characterID] => 209668693
[corporationName] => Deep Core Mining Inc.
[corporationID] => 1000006
)
)
[1] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => Eve Mae
[characterID] => 624980803
[corporationName] => Viziam
[corporationID] => 1000066
)
)
[2] => SimpleXMLElement Object
(
[#attributes] => Array
(
[name] => Wrytha
[characterID] => 709227913
[corporationName] => The Flying Tigers
[corporationID] => 669350666
)
)
)
)
)
[cachedUntil] => 2013-02-05 21:35:04
)
I would like to loop through with my PHP loop and get "name" and "characterID". I've tried something like:
$simpleXML = simplexml_load_string($xml);
foreach ($simpleXML->result->rowset->row as $row) {
print_r($row);
$name = $row['#attributes']['name'];
echo $name.'<br>';
}
but $name is not being set. It's gonna be something simple, just not seeing it in my haste and first time with simpleXML.
Attributes are accessed using the syntax $element['attribute_name'], so in your case, you need $row['name'].
It's important to remember that SimpleXML objects are kind of magic: the $element->child, $element_list[0] and $element['foo'] syntax overloads the normal PHP logic to be useful. Similarly, (string)$element will give you the text content directly inside an element (text inside child elements is not included), however it is broken up in the actual XML.
As such, the print_r output will not give you a "real" view of the object, so should be used with care. There are a couple of alternative debug functions I've written here which give a more accurate idea of how the object will behave.
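Applied to the loop in the question, that could look like the sketch below. The XML is a shortened reconstruction of the print_r output, and the root element name eveapi is an assumption.

```php
<?php
// Shortened reconstruction of the data; the root name "eveapi" is a guess
$xml = '<eveapi version="2"><result>'
     . '<rowset name="characters" key="characterID">'
     . '<row name="Wrytha Cy" characterID="209668693"/>'
     . '<row name="Eve Mae" characterID="624980803"/>'
     . '</rowset></result></eveapi>';

$simpleXML = simplexml_load_string($xml);

$characters = [];
foreach ($simpleXML->result->rowset->row as $row) {
    // Attribute access returns a SimpleXMLElement, so cast to string
    $characters[] = [
        'name'        => (string) $row['name'],
        'characterID' => (string) $row['characterID'],
    ];
}

foreach ($characters as $character) {
    echo $character['name'], ' (', $character['characterID'], ')<br>', PHP_EOL;
}
```

The casts matter when the values are stored or compared later, because without them each entry stays a SimpleXMLElement object.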

SimpleXML data missing

I have an XML document which I parse with PHP's SimpleXML.
The XML:
<GetOneGetAll DateTimeSystem="28-06-2011 17:19:29" RetCode="200" RetVal="1" RetMsg="User ok.">
<User Id="bc5cb4cf-19a6-4504-8e1a-f72dd97bcc66" ReferedConfirmedUsers="0" TotalRecomendations="0" DistinctRecomendations="0">
<Name>Name</Name>
<Surname>Surname</Surname>
<Gender>F</Gender>
<Email>email#email.com</Email>
<RefererCode>59286904</RefererCode>
<CustomPhotoMessage HasCustomPhoto="0" HasCustomMessage="0"/>
<ReferedConfirmedUsersList/>
</User>
</GetOneGetAll>
When I print_r the var using simpleXML I get:
SimpleXMLElement Object
(
[#attributes] => Array
(
[DateTimeSystem] => 28-06-2011 17:22:52
[RetCode] => 200
[RetVal] => 1
[RetMsg] => Login ok.
)
[User] => SimpleXMLElement Object
(
[#attributes] => Array
(
[Id] => bc5cb4cf-19a6-4504-8e1a-f72dd97bcc66
[ReferedConfirmedUsers] => 0
[TotalRecomendations] => 0
[DistinctRecomendations] => 0
)
[Name] => SimpleXMLElement Object
(
)
[Surname] => SimpleXMLElement Object
(
)
[Gender] => SimpleXMLElement Object
(
)
[Email] => SimpleXMLElement Object
(
)
[RefererCode] => SimpleXMLElement Object
(
)
[CustomPhotoMessage] => SimpleXMLElement Object
(
[#attributes] => Array
(
[HasCustomPhoto] => 0
[HasCustomMessage] => 0
)
)
[ReferedConfirmedUsersList] => SimpleXMLElement Object
(
)
)
)
Where is the data of Surname, Name, Email, Gender, etc.?
This is just a guess, but if you have Xdebug installed, the default recursion depth of var_dump output is 3. The setting is xdebug.var_display_max_depth.
You are using print_r, but a similar recursion limit could be being reached.
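Whatever the cause of the empty-looking dump, a quick way to check whether the data is really missing is to read the values directly and cast them to strings. A minimal sketch, using a trimmed copy of the XML from the question:

```php
<?php
// Trimmed copy of the XML from the question
$xml = <<<XML
<GetOneGetAll RetCode="200" RetVal="1">
  <User Id="bc5cb4cf-19a6-4504-8e1a-f72dd97bcc66">
    <Name>Name</Name>
    <Surname>Surname</Surname>
    <Gender>F</Gender>
  </User>
</GetOneGetAll>
XML;

$doc = simplexml_load_string($xml);

// Element text: cast the element to string
$name    = (string) $doc->User->Name;
$surname = (string) $doc->User->Surname;
$gender  = (string) $doc->User->Gender;

// Attribute value: array syntax, then cast
$userId = (string) $doc->User['Id'];

echo "$name $surname ($gender), id $userId", PHP_EOL;
```

If these casts return the expected values, the data is present and only the debug output is misleading.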
