I have XML documents containing information about articles that form a kind of hierarchy:
<?xml version="1.0" encoding="UTF-8"?>
<page>
<elements>
<element>
<type>article</type>
<id>1</id>
<parentContainerID>page</parentContainerID>
<parentContainerType>page</parentContainerType>
</element>
<element>
<type>article</type>
<id>2</id>
<parentContainerID>1</parentContainerID>
<parentContainerType>article</parentContainerType>
</element>
<element>
<type>photo</type>
<id>3</id>
<parentContainerID>2</parentContainerID>
<parentContainerType>article</parentContainerType>
</element>
<... more elements ..>
</elements>
</page>
Each element has a parentContainerID node and a parentContainerType node. If parentContainerType == page, the element is the master (top-level) element. Otherwise parentContainerID identifies the element's parent. So the hierarchy should look like: 1 <- 2 <- 3
Now I need to build a new page (html) of this stuff that looks like this:
content of ID 1, content of ID 2, content of ID 3 (the IDs are not consecutive).
I guess this could be done with a recursive function. But I have no idea how to manage this?
There is no nesting/recursion in the XML. The <element/> nodes are siblings. To build the parent/child relations I would suggest looping over the XML and building two arrays: one for the relations and one referencing the elements.
$xml = file_get_contents('php://stdin');
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$relations = [];
$elements = [];
foreach ($xpath->evaluate('//element') as $element) {
$id = (int)$xpath->evaluate('string(id)', $element);
$parentId = (int)$xpath->evaluate('string(parentContainerID)', $element);
$relations[$parentId][] = $id;
$elements[$id] = $element;
}
var_dump($relations);
Output:
array(3) {
[0]=>
array(1) {
[0]=>
int(1)
}
[1]=>
array(1) {
[0]=>
int(2)
}
[2]=>
array(1) {
[0]=>
int(3)
}
}
The relations array now contains the child ids for each parent; elements without a parent are in index 0 (the non-numeric parent id "page" casts to 0). This allows you to use a recursive function to access the elements as a tree.
function traverse(
    int $parentId, callable $callback, array $elements, array $relations, int $level = -1
) {
    // the virtual root (index 0) has no element of its own
    if (isset($elements[$parentId])) {
        $callback($elements[$parentId], $parentId, $level);
    }
    if (isset($relations[$parentId]) && is_array($relations[$parentId])) {
        foreach ($relations[$parentId] as $childId) {
            // pass $level + 1 instead of ++$level so that siblings share a level
            traverse($childId, $callback, $elements, $relations, $level + 1);
        }
    }
}
This executes the callback for each node. The proper implementation for this would be a RecursiveIterator but the function should do for the example.
traverse(
0,
function(DOMNode $element, int $id, int $level) use ($xpath) {
echo str_repeat(' ', $level);
echo $id, ": ", $xpath->evaluate('string(type)', $element), "\n";
},
$elements,
$relations
);
Output:
1: article
 2: article
  3: photo
Notice that the $xpath object is provided as context to the callback. Because the $elements array contains the original nodes, you can use Xpath expressions to fetch detailed data from the DOM related to the current element node.
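To make the role of the level argument concrete, here is a minimal, self-contained sketch of the same idea. It is only an illustration: the function name flattenTree, the $types array and the extra element with id 4 (a sibling of 2) are assumptions added for the example and are not part of the original data.

```php
<?php
// Sample data modeled on the question's XML. Element 4 is hypothetical,
// added to show that siblings end up on the same level.
$xml = <<<'XML'
<page>
  <elements>
    <element><type>article</type><id>1</id><parentContainerID>page</parentContainerID></element>
    <element><type>article</type><id>2</id><parentContainerID>1</parentContainerID></element>
    <element><type>photo</type><id>3</id><parentContainerID>2</parentContainerID></element>
    <element><type>photo</type><id>4</id><parentContainerID>1</parentContainerID></element>
  </elements>
</page>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// Map parent id => list of child ids; "page" casts to 0, the virtual root.
$relations = [];
$types = [];
foreach ($xpath->evaluate('//element') as $element) {
    $id = (int)$xpath->evaluate('string(id)', $element);
    $parentId = (int)$xpath->evaluate('string(parentContainerID)', $element);
    $relations[$parentId][] = $id;
    $types[$id] = $xpath->evaluate('string(type)', $element);
}

// Depth-first walk returning [id, level] pairs in output order.
function flattenTree(array $relations, int $parentId = 0, int $level = 0): array
{
    $result = [];
    foreach ($relations[$parentId] ?? [] as $childId) {
        $result[] = [$childId, $level];
        // $level + 1 leaves $level untouched for the next sibling
        $result = array_merge($result, flattenTree($relations, $childId, $level + 1));
    }
    return $result;
}

$order = flattenTree($relations);
foreach ($order as [$id, $level]) {
    echo str_repeat('  ', $level), $id, ': ', $types[$id], "\n";
}
```

The flat list of [id, level] pairs can then be turned into HTML in a plain loop, which may be easier to follow than passing a callback around.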
Related
I'm trying to retrieve one specific node based on the <id> element from a huge XML file. I have used DOMDocument, but it's not ideal since it loads the whole document first. There are around 1400 <item> nodes in the document. This is a simplified version of the document:
<main>
<body>
...
<sub>
...
<items>
...
<item>
<name>Abc</name>
...
<id>123</id>
<calls>
<call>
<name>Monkey</name>
<text>Monkeys r cool</text>
...
</call>
<call>
<name>Pig</name>
<text>Pigs too!</text>
...
</call>
</calls>
<cones>
<cone>
<name>Lorem</name>
<text>Lorem ipsum</text>
...
</cone>
<cone>
<name>More</name>
<text>Placeholder</text>
...
</cone>
</cones>
<a>true</a>
</item>
<item>
<name>Def</name>
...
<id>456</id>
<calls>
<call>
<name>aa</name>
<text>aa</text>
...
</call>
<call>
<name>bb</name>
<text>bb</text>
...
</call>
</calls>
<cones>
<cone>
<name>cc</name>
<text>cc</text>
...
</cone>
<cone>
<name>dd</name>
<text>dd</text>
...
</cone>
</cones>
<a>true</a>
</item>
</items>
</sub>
</body>
</main>
So basically I'm trying to retrieve the current node and its children's data by matching the <id> element. I have tried to find tutorials on XMLReader, but can't seem to find much. This is what I've tried so far:
$xml = new XMLReader();
$xml->open('doc.xml');
while($xml->read()) {
if($xml->nodeType == XMLREADER::ELEMENT && $xml->localName == 'id') {
$xml->read();
echo $xml->value;
}
}
This finds every <id> element, but I want to find one specific element and read the data from the current node and its children. Maybe by using the example to find the node and readInnerXml() to get the data.
I'm not an expert so any help / push to the right direction is much appreciated :D
If all the item elements are siblings you can use XMLReader::read() to find the first element and XMLReader::next() to iterate them.
Then use XMLReader::expand() to load the item and its descendants into DOM, use Xpath to read data from it.
$searchForID = '123';
$reader = new XMLReader();
$reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));
$document = new DOMDocument();
$xpath = new DOMXpath($document);
// look for the first "item" element node
while (
$reader->read() && $reader->localName !== 'item'
) {
continue;
}
// iterate "item" sibling elements
while ($reader->localName === 'item') {
// expand into DOM
$item = $reader->expand($document);
// if the node has a child "id" with the searched contents
if ($xpath->evaluate("count(self::*[id = '$searchForID']) > 0", $item)) {
var_dump(
[
// fetch node text content as string
'name' => $xpath->evaluate('string(name)', $item),
// fetch list of "call" elements and map them
'calls' => array_map(
function(DOMElement $call) use ($xpath) {
return [
'name' => $xpath->evaluate('string(name)', $call),
'text' => $xpath->evaluate('string(text)', $call)
];
},
iterator_to_array(
$xpath->evaluate('calls/call', $item)
)
)
]
);
}
$reader->next('item');
}
$reader->close();
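If only a single item is needed, the loop can stop as soon as it is found, so the rest of the file is never read. The following is a minimal sketch of that variation, not a definitive implementation: the function name findItemName and the two-item sample document are assumptions made for the example.

```php
<?php
// Hypothetical, shortened sample document.
$xml = '<items>
  <item><name>Abc</name><id>123</id></item>
  <item><name>Def</name><id>456</id></item>
</items>';

// Returns the <name> of the first <item> whose <id> matches, or null.
function findItemName(string $xml, string $searchForID): ?string
{
    $reader = new XMLReader();
    // same data: URI approach as above; a file name works the same way
    $reader->open('data:text/xml;base64,' . base64_encode($xml));
    $document = new DOMDocument();
    $xpath = new DOMXPath($document);
    $found = null;

    // advance to the first "item" element
    while ($reader->read() && $reader->localName !== 'item') {
        continue;
    }
    // iterate the "item" siblings
    while ($reader->localName === 'item') {
        $item = $reader->expand($document);
        if ($xpath->evaluate('string(id)', $item) === $searchForID) {
            $found = $xpath->evaluate('string(name)', $item);
            break; // stop reading, the remaining items are not needed
        }
        $reader->next('item');
    }
    $reader->close();

    return $found;
}

var_dump(findItemName($xml, '456')); // string(3) "Def"
var_dump(findItemName($xml, '999')); // NULL
```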
XML with namespaces
If the XML uses a namespace (like the one you linked in the comments) you will have to take it into consideration.
For the XMLReader that means validating not just localName (the node name without any namespace prefix/alias) but the namespaceURI as well.
For DOM methods that would mean using the namespace aware methods (with the suffix NS) and registering your own alias/prefix for the Xpath expressions.
$searchForID = '2755';
$reader = new XMLReader();
$reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));
// the namespace uri
$xmlns_siri = 'http://www.siri.org.uk/siri';
$document = new DOMDocument();
$xpath = new DOMXpath($document);
// register an alias for the siri namespace
$xpath->registerNamespace('siri', $xmlns_siri);
// look for the first "EstimatedVehicleJourney" element node
while (
$reader->read() &&
(
$reader->localName !== 'EstimatedVehicleJourney' ||
$reader->namespaceURI !== $xmlns_siri
)
) {
continue;
}
// iterate "EstimatedVehicleJourney" sibling elements
while ($reader->localName === 'EstimatedVehicleJourney') {
// validate the namespace of the node
if ($reader->namespaceURI === $xmlns_siri) {
// expand into DOM
$item = $reader->expand($document);
// if the node has a child "VehicleRef" with the searched contents
// note the use of the registered namespace alias
if ($xpath->evaluate("count(self::*[siri:VehicleRef = '$searchForID']) > 0", $item)) {
var_dump(
[
// fetch node text content as string
'name' => $xpath->evaluate('string(siri:OriginName)', $item),
// fetch list of "call" elements and map them
'calls' => array_map(
function(DOMElement $call) use ($xpath) {
return [
'name' => $xpath->evaluate('string(siri:StopPointName)', $call),
'reference' => $xpath->evaluate('string(siri:StopPointRef)', $call)
];
},
iterator_to_array(
$xpath->evaluate('siri:RecordedCalls/siri:RecordedCall', $item)
)
)
]
);
}
}
$reader->next('EstimatedVehicleJourney');
}
$reader->close();
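The example above only shows the Xpath side of namespace handling. As a small sketch of the namespace-aware DOM methods mentioned earlier (the ones with the NS suffix), the following compares getElementsByTagNameNS() with prefixed and unprefixed Xpath expressions. The XML fragment is hypothetical; only the namespace URI is taken from the code above.

```php
<?php
// Hypothetical fragment using the siri namespace as default namespace.
$xml = <<<'XML'
<Siri xmlns="http://www.siri.org.uk/siri">
  <EstimatedVehicleJourney>
    <VehicleRef>2755</VehicleRef>
  </EstimatedVehicleJourney>
</Siri>
XML;

$xmlns_siri = 'http://www.siri.org.uk/siri';
$document = new DOMDocument();
$document->loadXML($xml);

// Namespace-aware DOM lookup: matches namespace URI + local name,
// independent of the prefix used in the document.
$refs = $document->getElementsByTagNameNS($xmlns_siri, 'VehicleRef');
$vehicleRef = $refs->item(0)->textContent;

// Xpath 1.0 treats an unprefixed name as "no namespace", so the
// expression needs a registered prefix to match.
$xpath = new DOMXPath($document);
$xpath->registerNamespace('siri', $xmlns_siri);
$withPrefix = $xpath->evaluate('string(//siri:VehicleRef)');
$withoutPrefix = $xpath->evaluate('string(//VehicleRef)');

var_dump($vehicleRef, $withPrefix, $withoutPrefix);
// string(4) "2755", string(4) "2755", string(0) ""
```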
I'm not sure if this is the expected behavior or if I'm doing something wrong:
<?php
$xml = '<?xml version="1.0"?>
<foobar>
<foo>
<nested>
<img src="example1.png"/>
</nested>
</foo>
<foo>
<nested>
<img src="example2.png"/>
</nested>
</foo>
</foobar>';
$dom = new DOMDocument();
$dom->loadXML($xml);
$node = $dom->getElementsByTagName('foo')[0];
$simplexml = simplexml_import_dom($node);
echo $simplexml->asXML() . "\n";
echo " === With // ====\n";
var_dump($simplexml->xpath('//img'));
echo " === With .// ====\n";
var_dump($simplexml->xpath('.//img'));
Even though I only imported a specific DomNode, and asXml() returns only that part, the xpath() still seems to operate on the whole document.
I can prevent that by using .//img, but that seemed rather strange to me.
Result:
<foo>
<nested>
<img src="example1.png"/>
</nested>
</foo>
=== With // ====
array(2) {
[0] =>
class SimpleXMLElement#4 (1) {
public $@attributes =>
array(1) {
'src' =>
string(12) "example1.png"
}
}
[1] =>
class SimpleXMLElement#5 (1) {
public $@attributes =>
array(1) {
'src' =>
string(12) "example2.png"
}
}
}
=== With .// ====
array(1) {
[0] =>
class SimpleXMLElement#5 (1) {
public $@attributes =>
array(1) {
'src' =>
string(12) "example1.png"
}
}
}
It is expected behavior. You're importing a DOM element node into a SimpleXMLElement. This does not modify the XML document in the background - the node keeps its context.
There are Xpath expressions that go up (parent::, ancestor::) or to siblings (preceding-sibling::, following-sibling::).
Location paths starting with a / are always relative to the document, not the context node. An explicit reference to the current node with the . avoids that. .//img is short for self::node()/descendant-or-self::node()/child::img - an alternative would be descendant::img.
However you don't need to convert the DOM node into a SimpleXMLElement to use Xpath.
$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXpath($document);
foreach ($xpath->evaluate('//foo[1]') as $foo) {
var_dump(
$xpath->evaluate('string(.//img/@src)', $foo)
);
}
Output:
string(12) "example1.png"
//foo[1] fetches the first foo element node in the document. If there is no matching element in the document it will return an empty list. Using foreach avoids an error in that case: the loop body runs once or never.
string(.//img/@src) fetches the src attribute of descendant img elements and casts the first one into a string. If there is no matching node the return value will be an empty string. The second argument to DOMXpath::evaluate() is the context node.
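To see the difference between document-relative and context-relative location paths without SimpleXML, here is a short sketch that counts the matches of all three variants with the first foo element as context node. The variable names are only illustrative.

```php
<?php
$xml = '<foobar>
  <foo><nested><img src="example1.png"/></nested></foo>
  <foo><nested><img src="example2.png"/></nested></foo>
</foobar>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// use the first "foo" element as context node for all expressions
$foo = $document->getElementsByTagName('foo')->item(0);

// starts at the document, the context node is ignored
$absolute = $xpath->evaluate('count(//img)', $foo);
// starts at the context node
$relative = $xpath->evaluate('count(.//img)', $foo);
// explicit axis, also relative to the context node
$axis = $xpath->evaluate('count(descendant::img)', $foo);

var_dump($absolute, $relative, $axis);
// float(2), float(1), float(1)
```

Note that count() is an Xpath number, so DOMXpath::evaluate() returns it as a float.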
I want to know if passing an XML node and then calling upon a method to access it is legal syntax in PHP. I tried converting to string, but that didn't work.
What am I doing wrong?
What would be the best/simplest alternative?
XML
<user>
<widgets>
<widget>Widget 1</widget>
<stuff>
<morestuff>Things</morestuff>
</stuff>
<stuff>
<morestuff>Things</morestuff>
</stuff>
<widget>Widget 2</widget>
</widgets>
</user>
PHP
<?php
$xmlfile = 'widgets/widgets_files/widgets.xml';
$widgets = array();
$user = new SimpleXMLElement($xmlfile, NULL, true);
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom = dom_import_simplexml($user)->ownerDocument;
foreach ($user->widgets->widget as $widget) {
$new_widget = new Widget($widget); //Where the node gets passed
array_push($widgets, $new_widget);
}
//For example
$new_widget[0]->set_subnodes();
$new_widget[0]->get_subnodes();
class Widget {
private $widget;
private $stuffArray = array();
public function __construct($widget) {
$this->widget = $widget;
}
public function set_subnodes() {
foreach ($this->widget->stuff->morestuff as $morestuff => $value) {
$this->stuffArray[$morestuff] = $value;
}
}
public function get_subnodes() {
foreach ($this->stuffArray as $stuff) {
echo$stuff;
}
}
}
It is indeed possible to pass XML objects as parameters to objects and to call methods on them, but there are a number of errors in your code which are stopping it from working. In particular, the XML that you are using isn't the structure that you think it is--the stuff and morestuff nodes are not children of widget, so none of the actions that you're trying to perform with them will work. Here's a corrected version of the XML and some PHP code that does what I think you're trying to do above:
$widgets = array();
# you can load your code from a file, obviously--for the purposes of the example,
# I'm loading mine using a function.
$sxe = simplexml_load_string( get_my_xml() );
foreach ($sxe->widgets->widget as $widget) {
$new_widget = new Widget($widget); // Where the node gets passed
array_push($widgets, $new_widget);
}
// For example
foreach ($widgets as $w) {
$w->set_subnodes();
$w->get_subnodes();
}
function get_my_xml() {
return <<<XML
<user>
<widgets>
<widget>Widget 1
<stuff>
<morestuff>Things</morestuff>
</stuff>
<stuff>
<morestuff>Other Things</morestuff>
</stuff>
</widget>
<widget>Widget 2
<stuff>
<morestuff>Widget Two's Things</morestuff>
</stuff>
<stuff>
<morestuff>Widget Two's Other Things</morestuff>
</stuff>
</widget>
</widgets>
</user>
XML;
}
The Widget object:
class Widget {
private $widget;
private $stuffArray = array();
public function __construct($widget) {
$this->widget = $widget;
}
public function set_subnodes() {
# put all the "morestuff" nodes into the stuffArray
foreach ($this->widget->xpath("stuff/morestuff") as $ms) {
print "pushing $ms on to array" . PHP_EOL;
array_push($this->stuffArray, $ms);
}
}
public function get_subnodes() {
foreach ($this->stuffArray as $stuff) {
print "Running get_subnodes: got $stuff" . PHP_EOL;
}
}
}
Output:
pushing Things on to array
pushing Other Things on to array
Running get_subnodes: got Things
Running get_subnodes: got Other Things
pushing Widget Two's Things on to array
pushing Widget Two's Other Things on to array
Running get_subnodes: got Widget Two's Things
Running get_subnodes: got Widget Two's Other Things
I want to get the value '23452345235' of the parameter with name="UserID" from this xml:
<?xml version="1.0" encoding="UTF-8"?>
<callout>
<parameter name="UserID">
23452345235
</parameter>
<parameter name="AccountID">
57674567567
</parameter>
<parameter name="NewUserID">
54745674566
</parameter>
</callout>
I'm using this code:
$xml = simplexml_load_string($data);
$myDataObject = $xml->xpath('//parameter[@name="UserID"]');
var_dump($myDataObject);
And I'm getting this:
array(1) {
[0] =>
class SimpleXMLElement#174 (1) {
public $@attributes =>
array(1) {
'name' =>
string(6) "UserID"
}
}
}
I actually want to get the value of '23452345235' or receive the parameter in order to get this value.
What am I doing wrong?
Well you can (optionally) put it under a loop. Like this:
$myDataObject = $xml->xpath('//parameter[@name="UserID"]');
foreach($myDataObject as $element) {
echo $element;
}
Or directly:
echo $myDataObject[0];
Actually, it is quite straightforward: as your var_dump() shows, the result is an array, so access it as such.
SimpleXMLElement::xpath() can only return an array of SimpleXMLElement objects, so it generates an element and attaches the fetched attribute to it.
DOMXpath::evaluate() can return scalar values from Xpath expressions:
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
var_dump($xpath->evaluate('normalize-space(//parameter[@name="UserID"])'));
Output:
string(11) "23452345235"
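If you prefer to stay with SimpleXML, the element can be cast to a string to get its text content. The text in the sample XML is surrounded by line breaks and indentation, so it needs trimming. This is a minimal sketch; the shortened $data string and the variable names $matches and $userId are assumptions for the example.

```php
<?php
// Shortened version of the XML from the question.
$data = '<callout>
  <parameter name="UserID">
    23452345235
  </parameter>
  <parameter name="AccountID">
    57674567567
  </parameter>
</callout>';

$xml = simplexml_load_string($data);
$matches = $xml->xpath('//parameter[@name="UserID"]');

// xpath() returns an array (empty if nothing matched), so guard the index.
// The string cast yields the text content, trim() removes the whitespace.
$userId = isset($matches[0]) ? trim((string)$matches[0]) : null;

var_dump($userId); // string(11) "23452345235"
```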
This seems obvious, but most of what I found was about how to manipulate existing XML, and now I wish to build one from scratch. The source is a database result converted into an array. The root is a single "menu" and all child elements are called "item". The structure is defined by the value of the "parent" property and the "code" property.
item[0] ("code"=>"first", "somevar"=>"somevalue")
item[1] ("code"=>"second", "parent"=>"first", "somevar"=>"othervalue")
Means item[1] is a child of item[0].
<menu>
<item code="first" somevar="somevalue">
<item code="second" somevar="othervalue" />
</item>
</menu>
There will be only two levels of items this time, maybe later I'll expand the capabilities to "n" levels...
I tried with SimpleXML, but it seems it is too simple. So I tried with DOMDocument, but I'm stuck creating new elements...
$domMenu = new DOMDocument();
$domMenu->createElement("menu");
... creating the $domItem as a DOMElement with attributes ...
$domMenu->menu->appendChild($domItem);
This generates an error; it seems "menu" is not seen as a DOMElement. Should I use the getElements methods, or is there a better way to build this XML?
You did not append the menu element to the DOM. And DOM does not map element names to object properties like SimpleXML. The root element is accessible using the DOMDocument::$documentElement property.
$domMenu = new DOMDocument();
$domMenu->appendChild(
$menuNode = $domMenu->createElement("menu")
);
... creating the $domItem as a DOMElement with attributes ...
$menuNode->appendChild($domItem);
In your case I would suggest using Xpath to find the parent node for the item node and, if it is not found, letting the function call itself (recursion) to append the parent element first. If there is no parent item, append the node to the document element.
$data = [
["code"=>"second", "parent"=>"first", "somevar"=>"othervalue"],
["code"=>"first", "somevar"=>"somevalue"]
];
function appendItem($xpath, $items, $item) {
// create the new item node
$itemNode = $xpath->document->createElement('item');
$itemNode->setAttribute('code', $item['code']);
$itemNode->setAttribute('somevar', $item['somevar']);
$parentCode = isset($item['parent']) ? $item['parent'] : NULL;
// does it have a parent, and does this parent exist in the $items array?
if (isset($parentCode) && isset($items[$parentCode])) {
// fetch the existing parent
$nodes = $xpath->evaluate('//item[@code = "'.$parentCode.'"]');
if ($nodes->length > 0) {
$parentNode = $nodes->item(0);
} else {
// parent node not found create it
$parentNode = appendItem($xpath, $items, $items[$parentCode]);
}
} else {
$parentNode = $xpath->document->documentElement;
}
$parentNode->appendChild($itemNode);
return $itemNode;
}
$dom = new DOMDocument();
$xpath = new DOMXpath($dom);
$dom->appendChild(
$dom->createElement("menu")
);
// build an indexed list using the "code" values
$items = [];
foreach ($data as $item) {
$items[$item['code']] = $item;
}
foreach ($items as $item) {
// check if the item has already been added
if ($xpath->evaluate('count(//item[@code = "'.$item['code'].'"])') == 0) {
// add it
appendItem($xpath, $items, $item);
}
}
$dom->formatOutput = TRUE;
echo $dom->saveXml();
Output:
<?xml version="1.0"?>
<menu>
<item code="first" somevar="somevalue">
<item code="second" somevar="othervalue"/>
</item>
</menu>
$xml = new SimpleXMLElement('<menu/>');
array_walk_recursive($array, array ($xml, 'addChild'));
print $xml->asXML();