Using PHP SimpleXML to access non-default namespace element attributes

I have some annoying XML from an API response that looks like:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><report><QueryResult>
<ResumptionToken>123456</ResumptionToken>
<IsFinished>true</IsFinished>
<ResultXml>
<rowset xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:saw-sql="urn:saw-sql" targetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset">
<xsd:complexType name="Row">
<xsd:sequence>
<xsd:element maxOccurs="1" minOccurs="1" name="Column0" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="0" saw-sql:displayFormula="0" saw-sql:length="4" saw-sql:precision="12" saw-sql:scale="0" saw-sql:tableHeading="" saw-sql:type="integer" type="xsd:int"/>
<xsd:element maxOccurs="1" minOccurs="0" name="Column1" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="ISBN" saw-sql:displayFormula="&quot;Bibliographic Details&quot;.&quot;ISBN&quot;" saw-sql:length="255" saw-sql:precision="255" saw-sql:scale="0" saw-sql:tableHeading="Bibliographic Details" saw-sql:type="varchar" type="xsd:string"/>
<xsd:element maxOccurs="1" minOccurs="0" name="Column2" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="ISSN" saw-sql:displayFormula="&quot;Bibliographic Details&quot;.&quot;ISSN&quot;" saw-sql:length="255" saw-sql:precision="255" saw-sql:scale="0" saw-sql:tableHeading="Bibliographic Details" saw-sql:type="varchar" type="xsd:string"/>
<xsd:element maxOccurs="1" minOccurs="0" name="Column3" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="Publication Date" saw-sql:displayFormula="&quot;Bibliographic Details&quot;.&quot;Publication Date&quot;" saw-sql:length="255" saw-sql:precision="255" saw-sql:scale="0" saw-sql:tableHeading="Bibliographic Details" saw-sql:type="varchar" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
<Row>
<Column0>0</Column0>
<Column1>55555555 444444445</Column1>
<Column3>[2019]</Column3>
</Row>
<Row>
<Column0>0</Column0>
<Column1>555555555</Column1>
<Column3>©2009.</Column3>
</Row>
I'm using PHP's SimpleXML to parse this data, but I'm struggling to access the column headings, which are stored in namespaced attributes on the xsd:element nodes. For example, I need the value of saw-sql:columnHeading="Publication Date"; the column is dynamic and isn't always "Publication Date". So I'm looking to pluck out the values of every saw-sql:columnHeading attribute.
I've tried all manner of approaches: registering the namespaces for XPath, using attributes(), and so on. The closest I got was:
$ResponseXml->registerXPathNamespace('xsd','http://www.w3.org/2001/XMLSchema');
$elements = $ResponseXml->xpath('//xsd:element[@minOccurs]');
This got me the attributes that have no namespace, but I need the ones in the saw-sql namespace, and the same approach with:
$ResponseXml->registerXPathNamespace('saw-sql','urn:saw-sql');
$elements = $ResponseXml->xpath('//saw-sql:element[@columnHeading]');
does not get me any results.

Your XPath //saw-sql:element[@columnHeading] looks for elements named element in the saw-sql namespace that have an attribute named columnHeading in no namespace. In your document it is the other way around: the elements are in the xsd namespace and the attributes are in the saw-sql namespace.
So I believe what you want is:
$ResponseXml->registerXPathNamespace('xsd','http://www.w3.org/2001/XMLSchema');
$ResponseXml->registerXPathNamespace('saw-sql','urn:saw-sql');
$elements = $ResponseXml->xpath('//xsd:element[@saw-sql:columnHeading]');
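To read the heading values themselves rather than the matching elements, the XPath can end in an attribute step. The following is only a minimal sketch: the trimmed, well-formed XML and the variable names $responseXml and $headings are assumptions made for this illustration, not part of the original response.

```php
<?php
// Trimmed, well-formed stand-in for the API response (an assumption for this sketch).
$xml = <<<'XML'
<report>
  <rowset xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:saw-sql="urn:saw-sql">
      <xsd:element name="Column1" saw-sql:columnHeading="ISBN"/>
      <xsd:element name="Column3" saw-sql:columnHeading="Publication Date"/>
    </xsd:schema>
  </rowset>
</report>
XML;

$responseXml = new SimpleXMLElement($xml);
$responseXml->registerXPathNamespace('xsd', 'http://www.w3.org/2001/XMLSchema');
$responseXml->registerXPathNamespace('saw-sql', 'urn:saw-sql');

// Select the namespaced attributes directly; each result casts to its value.
$headings = [];
foreach ($responseXml->xpath('//xsd:element/@saw-sql:columnHeading') as $attribute) {
    $headings[] = (string)$attribute;
}
print_r($headings);
```

The prefixes registered for XPath only need to map to the right namespace URIs; they do not have to match the prefixes used in the document.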

fwiw you could use DOMDocument to parse it instead of SimpleXML, for example
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><report><QueryResult>
<ResumptionToken>123456</ResumptionToken>
<IsFinished>true</IsFinished>
<ResultXml>
<rowset xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:saw-sql="urn:saw-sql" targetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset">
<xsd:complexType name="Row">
<xsd:sequence>
<xsd:element maxOccurs="1" minOccurs="1" name="Column0" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="0" saw-sql:displayFormula="0" saw-sql:length="4" saw-sql:precision="12" saw-sql:scale="0" saw-sql:tableHeading="" saw-sql:type="integer" type="xsd:int"/>
<xsd:element maxOccurs="1" minOccurs="0" name="Column1" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="ISBN" saw-sql:displayFormula="&quot;Bibliographic Details&quot;.&quot;ISBN&quot;" saw-sql:length="255" saw-sql:precision="255" saw-sql:scale="0" saw-sql:tableHeading="Bibliographic Details" saw-sql:type="varchar" type="xsd:string"/>
<xsd:element maxOccurs="1" minOccurs="0" name="Column2" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="ISSN" saw-sql:displayFormula="&quot;Bibliographic Details&quot;.&quot;ISSN&quot;" saw-sql:length="255" saw-sql:precision="255" saw-sql:scale="0" saw-sql:tableHeading="Bibliographic Details" saw-sql:type="varchar" type="xsd:string"/>
<xsd:element maxOccurs="1" minOccurs="0" name="Column3" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="Publication Date" saw-sql:displayFormula="&quot;Bibliographic Details&quot;.&quot;Publication Date&quot;" saw-sql:length="255" saw-sql:precision="255" saw-sql:scale="0" saw-sql:tableHeading="Bibliographic Details" saw-sql:type="varchar" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
<Row>
<Column0>0</Column0>
<Column1>55555555 444444445</Column1>
<Column3>[2019]</Column3>
</Row>
<Row>
<Column0>0</Column0>
<Column1>555555555</Column1>
<Column3>©2009.</Column3>
</Row>
XML;
$domd = new DOMDocument();
// @ suppresses the parser warnings triggered by the malformed input
@$domd->loadHTML($xml);
foreach ($domd->getElementsByTagName(strtolower("ResultXml")) as $resultXml) {
$keyNames1 = [];
$keyNames2 = [];
foreach ($resultXml->getElementsByTagName("rowset") as $rowset) {
foreach ($rowset->getElementsByTagName("sequence") as $sequence) {
foreach ($sequence->getElementsByTagName("element") as $element) {
$keyNames1[] = $element->getAttribute("name");
$keyNames2[] = $element->getAttribute(strtolower("saw-sql:columnHeading"));
}
}
}
$rows = [];
foreach ($resultXml->getElementsByTagName("row") as $row) {
$rowData = [];
foreach ($keyNames1 as $keyName1Key => $keyName1Name) {
$tmp = $row->getElementsByTagName(strtolower($keyName1Name));
if ($tmp->length) {
$rowData[$keyNames2[$keyName1Key]] = $tmp->item(0)->textContent;
}
}
$rows[] = $rowData;
}
var_export($rows);
}
yields
array (
0 =>
array (
0 => '0',
'ISBN' => '55555555 444444445',
'Publication Date' => '[2019]',
),
1 =>
array (
0 => '0',
'ISBN' => '555555555',
'Publication Date' => '©2009.',
),
)
I used loadHTML() instead of loadXML() because DOMDocument rejects this input as invalid XML (the sample shown is missing its closing tags). In loadHTML() mode the names come back lowercased, which is why the code wraps them in strtolower(); with loadXML() names are case-sensitive.

SimpleXMLElement::attributes() lets you access the attributes of a specific namespace by passing the namespace URI as an argument.
$value = $simpleXMLElement->attributes($namespaceURI);
But first I would suggest defining a constant (or variable) for the namespaces that you are using. This makes your code a lot more readable and helps avoid typos.
Be aware that "rowset" redefines the default namespace for itself and its descendant element nodes, so they are not in the "empty/none" namespace.
// define a dictionary for the namespaces
const XMLNS = [
// same alias as in the document
'xsd' => 'http://www.w3.org/2001/XMLSchema',
// let's use a shorter alias
'saw' => 'urn:saw-sql',
// own alias - used without alias in the document
'rowset' => 'urn:schemas-microsoft-com:xml-analysis:rowset'
];
$report = new SimpleXMLElement(getXMLString());
foreach (XMLNS as $alias => $uri) {
$report->registerXpathNamespace($alias, $uri);
}
$columns = [];
foreach($report->xpath('//xsd:complexType[@name="Row"]/xsd:sequence/xsd:element') as $element) {
$columns[] = [
// read the attribute (fallback to '' if missing), cast to string
'name'=> (string)($element['name'] ?? ''),
// read attribute with namespace
'heading'=> (string)($element->attributes(XMLNS['saw'])['columnHeading'] ?? '')
];
}
var_dump($columns);
Output:
array(4) {
[0]=>
array(2) {
["name"]=>
string(7) "Column0"
["heading"]=>
string(1) "0"
}
[1]=>
array(2) {
["name"]=>
string(7) "Column1"
["heading"]=>
string(4) "ISBN"
}
[2]=>
array(2) {
["name"]=>
string(7) "Column2"
["heading"]=>
string(4) "ISSN"
}
[3]=>
array(2) {
["name"]=>
string(7) "Column3"
["heading"]=>
string(16) "Publication Date"
}
}
The Xpath Expression
Fetch the type definitions: //xsd:complexType
Filter for "Row": //xsd:complexType[@name="Row"]
The elements inside the sequence:
//xsd:complexType[@name="Row"]/xsd:sequence/xsd:element
The part in [] is a condition on the nodes returned by the preceding location path. So //foo[@bar] would return the foo element nodes that have a bar attribute, while //foo/@bar would return the bar attributes of all foo element nodes.
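The difference between a predicate and an attribute step can be seen on a tiny document. This is a minimal sketch; the foo/bar document and the variable names are made up for the illustration.

```php
<?php
$document = new DOMDocument();
$document->loadXML('<root><foo bar="1"/><foo/><foo bar="2"/></root>');
$xpath = new DOMXPath($document);

// Predicate: returns the foo ELEMENTS that carry a bar attribute.
$names = [];
foreach ($xpath->evaluate('//foo[@bar]') as $node) {
    $names[] = $node->nodeName;
}

// Attribute step: returns the bar ATTRIBUTE nodes themselves.
$values = [];
foreach ($xpath->evaluate('//foo/@bar') as $node) {
    $values[] = $node->nodeValue;
}

var_dump($names, $values);
```

Here the first expression yields two foo elements (the one without bar is filtered out), and the second yields the two attribute nodes with the values "1" and "2".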
DOM
This solution would not look much different with DOM. The Xpath processor is a separate object, and there are specific methods for working with namespaces (suffix "NS"). DOM is more specific and more powerful than SimpleXML.
$document = new DOMDocument();
$document->loadXML(getXMLString());
$xpath = new DOMXpath($document);
foreach (XMLNS as $alias => $uri) {
$xpath->registerNamespace($alias, $uri);
}
$columns = [];
foreach($xpath->evaluate('//xsd:complexType[@name="Row"]/xsd:sequence/xsd:element') as $element) {
$columns[] = [
'name'=> $element->getAttribute('name'),
'heading'=> $element->getAttributeNS(XMLNS['saw'], 'columnHeading')
];
}
var_dump($columns);

Related

Parse XML Document recursive

I have XML documents containing information of articles, that have a kind of hierarchy:
<?xml version="1.0" encoding="UTF-8"?>
<page>
<elements>
<element>
<type>article</type>
<id>1</id>
<parentContainerID>page</parentContainerID>
<parentContainerType>page</parentContainerType>
</element>
<element>
<type>article</type>
<id>2</id>
<parentContainerID>1</parentContainerID>
<parentContainerType>article</parentContainerType>
</element>
<element>
<type>photo</type>
<id>3</id>
<parentContainerID>2</parentContainerID>
<parentContainerType>article</parentContainerType>
</element>
<... more elements ..>
</elements>
</page>
The element has the node parentContainerID and the node parentContainerType. If parentContainerType == page, this is the master element. The parentContainerID shows what's the element's master. So it should look like: 1 <- 2 <- 3
Now I need to build a new page (html) of this stuff that looks like this:
content of ID 1, content of ID 2, content of ID 3 (the IDs are not ongoing).
I guess this could be done with a recursive function. But I have no idea how to manage this?
There is no nesting/recursion in the XML. The <element/> nodes are siblings. To build the parent-child relations I would suggest looping over the XML and building two arrays: one for the relations and one referencing the elements.
$xml = file_get_contents('php://stdin');
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$relations = [];
$elements = [];
foreach ($xpath->evaluate('//element') as $element) {
$id = (int)$xpath->evaluate('string(id)', $element);
$parentId = (int)$xpath->evaluate('string(parentContainerID)', $element);
$relations[$parentId][] = $id;
$elements[$id] = $element;
}
var_dump($relations);
Output:
array(3) {
[0]=>
array(1) {
[0]=>
int(1)
}
[1]=>
array(1) {
[0]=>
int(2)
}
[2]=>
array(1) {
[0]=>
int(3)
}
}
The relations array now contains the child ids for each parent; elements without a parent are at index 0. This allows you to use a recursive function to access the elements as a tree.
function traverse(
    int $parentId, callable $callback, array $elements, array $relations, int $level = -1
) {
    // the virtual root (id 0) has no element of its own, so test with isset()
    if (isset($elements[$parentId])) {
        $callback($elements[$parentId], $parentId, $level);
    }
    if (isset($relations[$parentId]) && is_array($relations[$parentId])) {
        foreach ($relations[$parentId] as $childId) {
            // pass $level + 1 instead of ++$level so siblings stay on the same level
            traverse($childId, $callback, $elements, $relations, $level + 1);
        }
    }
}
This executes the callback for each node. The proper implementation for this would be a RecursiveIterator but the function should do for the example.
traverse(
0,
function(DOMNode $element, int $id, int $level) use ($xpath) {
echo str_repeat(' ', $level);
echo $id, ": ", $xpath->evaluate('string(type)', $element), "\n";
},
$elements,
$relations
);
Output:
1: article
 2: article
  3: photo
Notice that the $xpath object is provided as context to the callback. Because the $elements array contains the original nodes, you can use Xpath expression to fetch detailed data from the DOM related to the current element node.

Parsing XML into a returnable multidimensional array in PHP

I have tried this a few different ways, returning the SimpleXML object directly, and now running through it, turning it into a multidimensional array, but nothing seems to work, as soon as I try to return it to another function, it just blanks. I must be missing something here, but for the life of me I just cannot see what it is.
public function getSettings() {
$xml = simplexml_load_file('SETTINGS.xml');
$settings = array();
foreach ($xml->settings->page as $page) {
$settings[$page->title] = array("Navbar" => $page->navbar, "Elements" => array());
foreach ($page->element as $element){
array_push($settings[$page->title]["Elements"],
["Name" => $element->name,
"File" => $element->location,
"Style" => $element->style]);
}
}
return $settings;
}
SETTINGS.xml
<?xml version="1.0" encoding="UTF-8"?>
<settings>
<page>
<title>Login</title>
<navbar></navbar>
<element>
<name>Login</name>
<location>Login.php</location>
<style>Style.css</style>
</element>
</page>
<page>
<title>Dashboard</title>
<navbar>Navbar.php</navbar>
<element>
<name>Recent Punishments Table</name>
<location>RecentPunishments.php</location>
<style>Style.css</style>
</element>
</page>
</settings>
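A possible fix, sketched under assumptions rather than confirmed against the original project: simplexml_load_file() returns the root <settings> element itself, so the pages are reached with $xml->page rather than $xml->settings->page, and SimpleXMLElement objects are cast to strings before they are used as array keys or values. The standalone function signature and the inline XML string are assumptions made so that the example is self-contained.

```php
<?php
// Hypothetical standalone variant of getSettings() that takes the XML as a string.
function getSettings(string $xmlString): array
{
    $xml = simplexml_load_string($xmlString); // $xml is the <settings> root
    $settings = [];
    foreach ($xml->page as $page) {
        // Cast to string: a SimpleXMLElement is not a valid array key.
        $title = (string)$page->title;
        $settings[$title] = ['Navbar' => (string)$page->navbar, 'Elements' => []];
        foreach ($page->element as $element) {
            $settings[$title]['Elements'][] = [
                'Name'  => (string)$element->name,
                'File'  => (string)$element->location,
                'Style' => (string)$element->style,
            ];
        }
    }
    return $settings;
}

$settings = getSettings(
    '<settings>'
    . '<page><title>Login</title><navbar></navbar>'
    . '<element><name>Login</name><location>Login.php</location><style>Style.css</style></element></page>'
    . '<page><title>Dashboard</title><navbar>Navbar.php</navbar>'
    . '<element><name>Recent Punishments Table</name><location>RecentPunishments.php</location><style>Style.css</style></element></page>'
    . '</settings>'
);
var_export($settings);
```

Because every value is a plain string, the returned array no longer depends on the SimpleXML document it was read from.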

Get value from xml using the attribute with xpath in php

I want to get the value '23452345235' of the parameter with name="UserID" from this xml:
<?xml version="1.0" encoding="UTF-8"?>
<callout>
<parameter name="UserID">
23452345235
</parameter>
<parameter name="AccountID">
57674567567
</parameter>
<parameter name="NewUserID">
54745674566
</parameter>
</callout>
I'm using this code:
$xml = simplexml_load_string($data);
$myDataObject = $xml->xpath('//parameter[@name="UserID"]');
var_dump($myDataObject);
And I'm getting this:
array(1) {
[0] =>
class SimpleXMLElement#174 (1) {
public $@attributes =>
array(1) {
'name' =>
string(6) "UserID"
}
}
}
I actually want to get the value of '23452345235' or receive the parameter in order to get this value.
What I'm doing wrong?
You can (optionally) loop over the result, like this:
$myDataObject = $xml->xpath('//parameter[@name="UserID"]');
foreach($myDataObject as $element) {
echo $element;
}
Or directly:
echo $myDataObject[0];
It is actually quite straightforward: as your var_dump() shows, the result is an array, so access it as such.
SimpleXMLElement::xpath() can only return an array of SimpleXMLElement objects, so it generates an element and attaches the fetched attribute to it.
DOMXpath::evaluate() can return scalar values from Xpath expressions:
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
var_dump($xpath->evaluate('normalize-space(//parameter[@name="UserID"])'));
Output:
string(11) "23452345235"

php xml iterator

I have the following xml:
<user>
<section xmlns="ss">Testing</section>
<department xmlns="da">IT</department>
</user>
Now while iterating, I want the namespace information for each tag (ss for section and da for department).
With SimpleXMLIterator, I am not able to get the namespace info for every tag.
Any help would be appreciated
Use the SimpleXMLElement::getNamespaces() method to access the element's namespace(s).
$xml = '
<user>
<section xmlns="ss">Testing</section>
<department xmlns="da">IT</department>
</user>
';
$iterator = new SimpleXMLIterator($xml);
foreach ($iterator as $element) {
var_dump($element->getNamespaces());
}
Outputs (along with lots of warnings because of your broken XML):
array(1) {
[""]=>
string(2) "ss"
}
array(1) {
[""]=>
string(2) "da"
}

How can I move XML elements with PHP's SimpleXML?

How can I move an xml element elsewhere in a document? So I have this:
<outer>
<foo>
<child name="a"/>
<child name="b"/>
<child name="c"/>
</foo>
<bar />
</outer>
and want to end up with:
<outer>
<foo />
<bar>
<child name="a"/>
<child name="b"/>
<child name="c"/>
</bar>
</outer>
Using PHP's simpleXML.
Is there a function I'm missing (appendChild-like)?
You could write a recursive function that clones the attributes and children; SimpleXML itself offers no other way to move the children. The example below goes through DOM instead:
class ExSimpleXMLElement extends SimpleXMLElement {
// appends one object to another
function sxml_append(ExSimpleXMLElement $to, ExSimpleXMLElement $from) {
$toDom = dom_import_simplexml($to);
$fromDom = dom_import_simplexml($from);
$toDom->appendChild($toDom->ownerDocument->importNode($fromDom, true));
}
}
$customerXML = <<<XML
<customer>
<address_billing>
<address_book_id>10</address_book_id>
<customers_id>20</customers_id>
<telephone>0120524152</telephone>
<entry_country_id>73</entry_country_id>
</address_billing>
<countries>
<countries_id>73</countries_id>
<countries_name>France</countries_name>
<countries_iso_code_2>FR</countries_iso_code_2>
</countries>
</customer>
XML;
$customer = simplexml_load_string($customerXML, "ExSimpleXMLElement");
$customer->sxml_append($customer->address_billing, $customer->countries);
echo $customer->asXML();
<?xml version="1.0"?>
<customer>
<address_billing>
<address_book_id>10</address_book_id>
<customers_id>20</customers_id>
<telephone>0120524152</telephone>
<entry_country_id>73</entry_country_id>
<countries>
<countries_id>73</countries_id>
<countries_name>France</countries_name>
<countries_iso_code_2>FR</countries_iso_code_2>
</countries>
</address_billing>
</customer>
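For the foo/bar document from the question, a move (rather than a copy) can be sketched with the same DOM bridge. In DOM, appending a node that already belongs to the document detaches it from its old parent, whereas importNode() creates a copy, so the source node would be expected to stay in place. The variable names below are assumptions made for this illustration.

```php
<?php
$outer = simplexml_load_string(
    '<outer>
      <foo>
        <child name="a"/>
        <child name="b"/>
        <child name="c"/>
      </foo>
      <bar />
    </outer>'
);

// Get the underlying DOM elements; both belong to the same document.
$fooDom = dom_import_simplexml($outer->foo);
$barDom = dom_import_simplexml($outer->bar);

// childNodes is a live list, so take a snapshot before moving anything.
foreach (iterator_to_array($fooDom->childNodes) as $child) {
    // appendChild() on a node from the same document moves it.
    $barDom->appendChild($child);
}

echo $outer->asXML();
```

The SimpleXML object and the DOM elements share one underlying document, so the change is visible through $outer without any conversion back.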
