I have a series of arbitrary XML Documents that I need to parse and perform some string manipulation on each element within the document.
For example:
<sample>
<example>
<name>David</name>
<age>21</age>
</example>
</sample>
For the nodes name and age I might want to run it through a function such as strtoupper to change the case.
I am struggling to do this in a generic way. I have been trying to use RecursiveIteratorIterator with SimpleXMLIterator to achieve this but I am unable to get the parent key to update the xml document:
$iterator = new RecursiveIteratorIterator(new SimpleXMLIterator($xml->asXML()));
foreach ($iterator as $k=> $v) {
$iterator->$k = strtoupper($v);
}
This fails because $k in this example is 'name' so it's trying to do:
$xml->name = strtoupper($value);
When it needs to be
$xml->example->name = strtoupper($value);
As the schema of the documents change I want to use something generic to process them all but I don't know how to get the key.
Is this possible with Spl iterators and simplexml?
You are most likely looking for a technique that I once named the SimpleXML self-reference. It works here, too.
And yes, Simplexml has support for SPL and RecursiveIteratorIterator.
First of all, you can make $xml support tree traversal directly by loading the original XML with SimpleXMLIterator as the class name:
$buffer = <<<BUFFER
<sample>
<example>
<name>David</name>
<age>21</age>
</example>
</sample>
BUFFER;
$xml = simplexml_load_string($buffer, 'SimpleXMLIterator');
// the second argument ('SimpleXMLIterator') is the important part
That allows you to do all the standard modifications (a SimpleXMLIterator is also a SimpleXMLElement) as well as the recursive tree traversal needed to modify each leaf node:
$iterator = new RecursiveIteratorIterator($xml);
foreach ($iterator as $node) {
$node[0] = strtoupper($node);
// assigning to $node[0] is the self-reference
}
This example of a recursive iteration over all leaf nodes shows how to set the self-reference: the key is to assign to $node[0], as outlined in the above link.
All that is left is to output:
$xml->asXML('php://output');
Which then simply gives:
<?xml version="1.0"?>
<sample>
<example>
<name>DAVID</name>
<age>21</age>
</example>
</sample>
And that's the whole example and it should also answer your question.
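The same self-reference assignment can also be packaged into a reusable function. The following is only a sketch under assumptions: mapLeafText() is a hypothetical helper name, and it selects leaf elements with the XPath expression //*[not(*)] instead of the RecursiveIteratorIterator used above, which is a different way of reaching the same nodes.

```php
<?php
// Hypothetical helper (not part of SimpleXML): apply $fn to the text of
// every leaf element, whatever the document structure looks like.
function mapLeafText(SimpleXMLElement $xml, callable $fn): void
{
    // "//*[not(*)]" selects every element that has no child elements.
    foreach ($xml->xpath('//*[not(*)]') as $leaf) {
        // Assigning to $leaf[0] (the self-reference) writes the new text
        // back into the original document.
        $leaf[0] = $fn((string) $leaf);
    }
}

$xml = simplexml_load_string(
    '<sample><example><name>David</name><age>21</age></example></sample>'
);
mapLeafText($xml, 'strtoupper');
echo $xml->asXML();
```

Because the callback is passed in, the same function can be reused for any string manipulation without knowing the schema in advance.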
I honestly tried to find a solution for php, but a lot of threads sound similar, but are not applicable for me or are for completely different languages.
I want to split an xml file based on nodes. Ideally multiple nodes, but of course one is enough and could be applied multiple times.
e.g. I want to split this by the tag <thingy> and <othernode>:
<root>
<stuff />
<thingy><othernode>one</othernode></thingy>
<thingy><othernode>two</othernode></thingy>
<thingy>
<othernode>three</othernode>
<othernode>four</othernode>
</thingy>
<some other data/>
</root>
Ideally I want to have 4 xmlstrings of type:
<root>
<stuff />
<thingy><othernode>CONTENT</othernode></thingy>
<some other data/>
</root>
With CONTENT being one, two, three and four. Plot twist: CONTENT can also be a whole subtree. Of course, it can all also be filled with various namespaces and tag prefixes (like <q1:node/>). Formatting is irrelevant for me.
I tried SimpleXml, but it lacks the possibility to write into the DOM easily.
I tried DomDocument, but everything I do seems to destroy some links/relations between parent/child nodes in some way.
I tried XmlReader/Writer, but that is extremely hard to maintain and combine (at least for me).
So far my best guess is something with DomDocument, node cloning and removing everything but one node?
Interesting question.
If I get it right, it is given that <othernode> is always a child of <thingy> and the split is for each <othernode> at the place of the first <thingy> in the original document.
DOMDocument appeared useful in this case, as it makes it easy to move nodes around, including all their children.
Given the split on a node-list (from getElementsByTagName()):
echo "---\n";
foreach ($split($doc->getElementsByTagName('othernode')) as $doc) {
echo $doc->saveXML(), "---\n";
}
The $split function moves all <othernode> elements into a DOMDocumentFragment of its own, removing each <thingy> parent element once it is emptied (except the first one, which serves as the anchor), and then temporarily brings each <othernode> back into the DOMDocument, one at a time:
$split = static function (DOMNodeList $nodes): Generator {
while (($element = $nodes->item(0)) && $element instanceof DOMElement) {
$doc ??= $element->ownerDocument;
$basin ??= $doc->createDocumentFragment();
$anchor ??= $element->parentNode;
[$parent] = [$element->parentNode, $basin->appendChild($element)];
$parent->childElementCount || $parent === $anchor || $parent->parentNode->removeChild($parent);
}
if (empty($anchor)) {
return;
}
assert(isset($basin, $doc));
while ($element = $basin->childNodes->item(0)) {
$element = $anchor->appendChild($element);
yield $doc;
$anchor->removeChild($element);
}
};
This results in the following split:
---
<?xml version="1.0"?>
<root>
<stuff/>
<thingy><othernode>one</othernode></thingy>
<some other="data"/>
</root>
---
<?xml version="1.0"?>
<root>
<stuff/>
<thingy><othernode>two</othernode></thingy>
<some other="data"/>
</root>
---
<?xml version="1.0"?>
<root>
<stuff/>
<thingy><othernode>three</othernode></thingy>
<some other="data"/>
</root>
---
<?xml version="1.0"?>
<root>
<stuff/>
<thingy><othernode>four</othernode></thingy>
<some other="data"/>
</root>
---
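For comparison, the approach guessed at in the question (clone, then remove everything but one node) can be sketched as follows. This is not the method used above: splitByClone() is a hypothetical helper, it keeps each node under its own <thingy> parent instead of moving it to the first one, and it assumes that elements with the split tag are not nested inside each other.

```php
<?php
// Hypothetical helper: return one XML string per <$tag> element,
// each built from a deep clone of the original document.
function splitByClone(DOMDocument $doc, string $tag): array
{
    $parts = [];
    $count = $doc->getElementsByTagName($tag)->length;

    for ($keep = 0; $keep < $count; $keep++) {
        $copy = $doc->cloneNode(true); // deep copy of the whole document

        // Snapshot the live node list before removing anything from it.
        $nodes = iterator_to_array($copy->getElementsByTagName($tag));

        foreach ($nodes as $i => $node) {
            if ($i === $keep) {
                continue; // this is the one node that stays
            }
            $parent = $node->parentNode;
            $parent->removeChild($node);

            // Drop a wrapper that has no element children left,
            // but never the root element itself.
            if ($parent->childElementCount === 0
                && $parent->parentNode instanceof DOMElement) {
                $parent->parentNode->removeChild($parent);
            }
        }
        $parts[] = $copy->saveXML();
    }

    return $parts;
}

$doc = new DOMDocument();
$doc->loadXML(
    '<root><stuff/>'
    . '<thingy><othernode>one</othernode></thingy>'
    . '<thingy><othernode>two</othernode></thingy>'
    . '<thingy><othernode>three</othernode><othernode>four</othernode></thingy>'
    . '<some other="data"/></root>'
);

$parts = splitByClone($doc, 'othernode');
foreach ($parts as $part) {
    echo $part, "---\n";
}
```

Cloning the whole document once per node is simpler to reason about but costs more memory on large inputs than moving nodes in and out of a single document.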
Trying to make an API for currency conversion,
Need to select a specific currency and delete it from the xml file...
XML file looks like this:
<currencies>
<currency>
<ccode>CAD</ccode>
<cname>Canadian Dollar</cname>
<cntry>Canada</cntry>
</currency>
<currency>
<ccode>CHF</ccode>
<cname>Swiss Franc</cname>
<cntry>Liechtenstein, Switzerland</cntry>
</currency>
<currency>
<ccode>CNY</ccode>
<cname>Yuan Renminbi</cname>
<cntry>China</cntry>
</currency>
...etc
I need to use php to select and delete the specific currency, at the moment trying this:
<?php
$dom = new DOMDocument("1.0", "utf-8");
$dom->load('data/ccodes.xml');
$nodes = $dom->getElementsByTagName("currencies");
foreach ($nodes as $n){
if($n->getAttribute("ccode") == "CAD") {
$parent = $n->parentNode;
$parent->removeChild($n);
}
}
echo $dom->saveXML();
?>
But It's not working.... I'm pretty sure it's really simple but I have no idea what I'm doing with coding... :/
Need to make it so I can just change CAD to whatever to delete any currency I need to...
You're iterating the root node currencies, but I think you meant to iterate the currency nodes. Also, ccode is not an attribute node but a child element node. Even if you iterated the currency nodes with the correct condition, it would still not fully work.
getElementsByTagName() returns a live result: when you modify the DOM inside the loop, the list is modified as well, so a plain foreach can skip nodes. You could use a for loop to iterate it backwards, use iterator_to_array() to materialize the node list into an array, or use XPath. DOMXpath::evaluate() returns a node list, but it is not a live result, so the list will not change if you modify the document.
$document = new DOMDocument();
//$document->load('data/ccodes.xml');
$document->loadXml($xml);
$xpath = new DOMXpath($document);
foreach ($xpath->evaluate('/currencies/currency[ccode="CAD"]') as $node) {
$node->parentNode->removeChild($node);
}
echo $document->saveXML();
Output:
<?xml version="1.0"?>
<currencies>
<currency>
<ccode>CHF</ccode>
<cname>Swiss Franc</cname>
<cntry>Liechtenstein, Switzerland</cntry>
</currency>
<currency>
<ccode>CNY</ccode>
<cname>Yuan Renminbi</cname>
<cntry>China</cntry>
</currency>
</currencies>
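The iterator_to_array() alternative mentioned above can be sketched like this, with the currency code as a parameter so that any currency can be removed. The function name removeCurrency() and the shortened sample XML are assumptions made for illustration.

```php
<?php
// Hypothetical helper: remove every <currency> whose <ccode> child
// matches $code and return how many elements were removed.
function removeCurrency(DOMDocument $doc, string $code): int
{
    $removed = 0;

    // iterator_to_array() turns the live node list into a plain array,
    // so removing nodes inside the loop does not disturb the iteration.
    foreach (iterator_to_array($doc->getElementsByTagName('currency')) as $currency) {
        // ccode is a child element, not an attribute.
        $ccode = $currency->getElementsByTagName('ccode')->item(0);

        if ($ccode !== null && trim($ccode->textContent) === $code) {
            $currency->parentNode->removeChild($currency);
            $removed++;
        }
    }

    return $removed;
}

$doc = new DOMDocument();
$doc->loadXML(
    '<currencies>'
    . '<currency><ccode>CAD</ccode><cname>Canadian Dollar</cname></currency>'
    . '<currency><ccode>CHF</ccode><cname>Swiss Franc</cname></currency>'
    . '</currencies>'
);

$removed = removeCurrency($doc, 'CAD');
echo $doc->saveXML();
```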
I've been battling with this all day :(
Although I found answers for similar questions they don't update an existing XML, they create a new XML.
Any help would be very much appreciated.
This is the XML I'm loading and trying to sort just the images->image nodes:
<?xml version="1.0"?>
<stuff>
<other_nodes>
</other_nodes>
<images>
<image><sorted_number><![CDATA[1]]></sorted_number></image>
<image><sorted_number><![CDATA[3]]></sorted_number></image>
<image><sorted_number><![CDATA[2]]></sorted_number></image>
</images>
</stuff>
//load the xml into a var
$theXML = //load the xml from the database
$imageNode = $theXML->images;
//sort the images into sorted order
$d = $imageNode;
// turn into array
$e = array();
foreach ($d->image as $image) {
$e[] = $image;
}
// sort the array
usort($e, function($a, $b) {
return $a->sorted_number - $b->sorted_number;
});
//now update the xml in the correct order
foreach ($e as $node) {
//???unsure how to update the images node in my XML
}
SimpleXML is too simple for your task: it has no easy way to reorder nodes. After your sorting routine you would have to reconstruct the <image> nodes, but they contain CDATA, and SimpleXML can't add a CDATA value directly.
If you want to try it this way, there is a cool SimpleXML class extension that adds CDATA support, but that solution uses DOMDocument as well.
IMHO, since every solution requires DOM, the best way is to use DOMDocument directly and, if needed, (re)load the XML with SimpleXML after the transformation:
$dom = new DOMDocument();
$dom->loadXML( $xml, LIBXML_NOBLANKS );
$dom->formatOutput = True;
$images = $dom->getElementsByTagName( 'image' );
/* This is the same as your array conversion: */
$sorted = iterator_to_array( $images );
/* This is your sorting routine adapted to DOMDocument: */
usort( $sorted, function( $a, $b )
{
return
$a->getElementsByTagName('sorted_number')->item(0)->nodeValue
-
$b->getElementsByTagName('sorted_number')->item(0)->nodeValue;
});
/* This is the core loop to “replace” old nodes: */
foreach( $sorted as $node ) $images->item(0)->parentNode->appendChild( $node );
echo $dom->saveXML();
The main routine appends the sorted nodes as children of the existing <images> node. Note that there is no need to remove the old children first: since we refer to the same objects, appending a node in fact removes it from its previous position.
If you want to obtain a SimpleXML object, you can append this line at the end of the code above:
$xml = simplexml_load_string( $dom->saveXML() );
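The claim that appending an existing node moves it rather than copying it can be checked with a tiny document. This is only an illustration; the <list> and <item> element names are made up for the demo.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<list><item>a</item><item>b</item><item>c</item></list>');

$list  = $doc->documentElement;
$first = $list->firstChild; // the <item>a</item> element

// The node already belongs to this document, so appendChild() detaches it
// from its current position and re-attaches it at the end.
$list->appendChild($first);

echo $doc->saveXML($list), "\n";
// prints: <list><item>b</item><item>c</item><item>a</item></list>
```

The list still holds three items afterwards, which is why the sorting loop does not need a separate removal step.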
Consider an XSLT solution using its <xsl:sort>. For background, XSLT (whose scripts are well-formed XML files) is a declarative, special-purpose programming language (the same type as SQL) designed to transform XML documents, and sorting is one such transformation. Although it is often used as a stylesheet to render XML content into HTML, XSLT is a full language.
Most general-purpose languages, including PHP (xsl extension), Python (lxml module), Java (javax.xml), Perl (libxml), C# (System.Xml), and VB (MSXML), maintain XSLT 1.0 processors. External processors such as Xalan and Saxon (the latter can run XSLT 2.0 and, more recently, 3.0) are also available, and PHP can call them with exec(). The example below embeds the XSLT as a string variable, but it can just as easily be loaded from an external .xsl or .xslt file.
// Load the XML source and XSLT file
$doc = new DOMDocument();
$doc->loadXML($xml);
$xsl = new DOMDocument;
$xslstr = '<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes"
cdata-section-elements="sorted_number" />
<xsl:strip-space elements="*"/>
<!-- IDENTITY TRANSFORM (COPIES ALL CONTENT AS IS) -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- SORT IMAGE CHILDREN IN EACH IMAGES NODE -->
<xsl:template match="images">
<xsl:copy>
<xsl:apply-templates select="image">
<xsl:sort select="sorted_number" order="ascending" data-type="number"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:transform>';
$xsl->loadXML($xslstr);
// Configure the processor
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);
// Transform XML source
$newXml = $proc->transformToXML($doc);
echo $newXml;
Result (notice the <![CDATA[]]> sections being preserved):
<?xml version="1.0" encoding="UTF-8"?>
<stuff>
<other_nodes/>
<images>
<image>
<sorted_number><![CDATA[1]]></sorted_number>
</image>
<image>
<sorted_number><![CDATA[2]]></sorted_number>
</image>
<image>
<sorted_number><![CDATA[3]]></sorted_number>
</image>
</images>
</stuff>
Before going deeper, is a save of the sorted state really necessary? Like in a database, you can always sort items when retrieving them, same here with the code you have already written.
That said, "updating" in your case means delete all <image> nodes and add them back in order.
Update:
See fusion3k's answer: it is not necessary to delete the nodes first, appending them is enough. I'd suggest going with that solution.
You are using SimpleXml, which does not provide methods for copying nodes. You will need to re-create every single node, child-node, attribute.
Your XML looks simple, but I guess it is an example and your real XML is more complex. Then rather use DOM and its importNode() method, which can copy complex nodes, including all their attributes and children.
On the other hand, SimpleXml to me feels much easier, so I combine both:
$xml = simplexml_load_string($x); // assume XML in $x
$images = $xml->xpath("/stuff/images/image");
usort($images, function ($a, $b){
return strnatcmp($a->sorted_number, $b->sorted_number);
});
Comments:
xpath() is a quick way to get all items into an array of SimpleXml objects.
$images is sorted now, but we can't delete the original nodes, because $images holds references to these nodes.
This is why we need to save $images to a new, temporary document.
$tmp = new DOMDocument('1.0', 'utf-8');
$tmp->loadXML("<images />");
// add image to $tmp, then delete it from $xml
foreach($images as $image) {
$node = dom_import_simplexml($image); // make DOM from SimpleXml
$node = $tmp->importNode($node, TRUE); // import and append in $tmp
$tmp->getElementsByTagName("images")->item(0)->appendChild($node);
unset($image[0]); // delete image from $xml
}
Comments:
using DOM now, because I can copy nodes with importNode()
at this point, $tmp has all the <image> nodes in the desired order, $xml has none.
To copy nodes back from $tmp to $xml, we need to import $xml into DOM:
$xml = dom_import_simplexml($xml)->ownerDocument;
foreach($tmp->getElementsByTagName('image') as $image) {
$node = $xml->importNode($image, TRUE);
$xml->getElementsByTagName("images")->item(0)->appendChild($node);
}
// output...
echo $xml->saveXML();
see it in action: https://eval.in/535800
I have an XML file that I need to parse through and get values from. Below is a snippet of the XML:
<?xml version="1.0"?>
<mobile>
<userInfo>
</userInfo>
<CATALOG>
<s0>
<SUB0>
<DESCR>Paranormal Studies</DESCR>
<SUBJECT>147</SUBJECT>
</SUB0>
</s0>
<sA>
<SUB0>
<DESCR>Accounting</DESCR>
<SUBJECT>ACCT</SUBJECT>
</SUB0>
<SUB1>
<DESCR>Accounting</DESCR>
<SUBJECT>ACCTG</SUBJECT>
</SUB1>
<SUB2>
<DESCR>Anatomy</DESCR>
<SUBJECT>ANATOMY</SUBJECT>
</SUB2>
<SUB3>
<DESCR>Anthropology</DESCR>
<SUBJECT>ANTHRO</SUBJECT>
</SUB3>
<SUB4>
<DESCR>Art</DESCR>
<SUBJECT>ART</SUBJECT>
</SUB4>
<SUB5>
<DESCR>Art History</DESCR>
<SUBJECT>ARTHIST</SUBJECT>
</SUB5>
</sA>
So, I need to grab all the child elements of <sA> and then there are more elements called <sB> etc
But I do not know how to get all of the child elements with <sA>, <sB>, etc.
How about this:
$xmlstr = LoadTheXMLFromSomewhere();
$xml = simplexml_load_string($xmlstr); // a function call, so no "new"
$result = $xml->xpath('//sA');
foreach ($result as $node){
//do something with node
}
PHP has a nice class for accessing XML, called SimpleXML for a reason. Consider using it heavily if your code only needs to access part of the XML (that is, query it). Also consider running the queries with XPath, which is usually the most convenient way to do it.
Notice that I did the example with sA nodes only, but you can configure your code for other node types really easily.
Hope I can help!
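To cover <sA>, <sB> and so on without naming each group, one option is to loop over whatever children <CATALOG> has. The following is a sketch under assumptions: the sample XML is a shortened version of the snippet in the question, and the $subjects array layout is made up for illustration.

```php
<?php
$xmlstr = <<<XML
<mobile>
  <CATALOG>
    <s0>
      <SUB0><DESCR>Paranormal Studies</DESCR><SUBJECT>147</SUBJECT></SUB0>
    </s0>
    <sA>
      <SUB0><DESCR>Accounting</DESCR><SUBJECT>ACCT</SUBJECT></SUB0>
      <SUB1><DESCR>Anatomy</DESCR><SUBJECT>ANATOMY</SUBJECT></SUB1>
    </sA>
  </CATALOG>
</mobile>
XML;

$xml = simplexml_load_string($xmlstr); // $xml is the root <mobile> element

$subjects = [];
// The outer loop visits s0, sA, sB, ... whatever groups exist;
// the key is the element name.
foreach ($xml->CATALOG->children() as $groupName => $group) {
    // The inner loop visits SUB0, SUB1, ... inside that group.
    foreach ($group->children() as $sub) {
        $subjects[$groupName][] = [
            'descr'   => (string) $sub->DESCR,
            'subject' => (string) $sub->SUBJECT,
        ];
    }
}

print_r($subjects);
```

If the grouping is not needed, $xml->xpath('/mobile/CATALOG/*/*') returns all SUB elements as one flat array.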
You should look into simplexml_load_string(), as I'm pretty sure it would make your life a lot easier. It returns a SimpleXMLElement that you can use like so:
$xml = simplexml_load_string($xmlString); // $xmlString holds your XML
foreach ($xml->CATALOG->sA->children() as $value){ // $xml is already the root element
// do things with sA children
}
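$xml = new DOMDocument();
$xml->load('path_to_xml');
$htp = $xml->documentElement; // the root element (<mobile> in the snippet above)
$catalog = $htp->getElementsByTagName('CATALOG')->item(0);
// getElementsByTagName() returns a list, so pick the first <sA> before reading childNodes
$nodes = $catalog->getElementsByTagName('sA')->item(0)->childNodes;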
I have an XML file loaded into a DOM document,
I wish to iterate through all 'foo' tags, getting values from every tag below it. I know I can get values via
$element = $dom->getElementsByTagName('foo')->item(0);
foreach($element->childNodes as $node){
$data[$node->nodeName] = $node->nodeValue;
}
However, what I'm trying to do, is from an XML like,
<stuff>
<foo>
<bar></bar>
<value/>
<pub></pub>
</foo>
<foo>
<bar></bar>
<pub></pub>
</foo>
<foo>
<bar></bar>
<pub></pub>
</foo>
</stuff>
iterate over every foo tag, and get specific bar or pub, and get values from there.
Now, how do I iterate over foo so that I can still access specific child nodes by name?
Not tested, but what about:
$elements = $dom->getElementsByTagName('foo');
$data = array();
foreach($elements as $node){
foreach($node->childNodes as $child) {
$data[] = array($child->nodeName => $child->nodeValue);
}
}
It's generally much better to use XPath to query a document than it is to write code that depends on knowledge of the document's structure. There are two reasons. First, there's a lot less code to test and debug. Second, if the document's structure changes it's a lot easier to change an XPath query than it is to change a bunch of code.
Of course, you have to learn XPath, but (most of) XPath isn't rocket science.
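PHP's DOM extension runs XPath queries through the DOMXPath class, using its query() and evaluate() methods (xpath_eval belonged to the older PHP 4 DOM XML extension). It's documented in the PHP manual, and the user notes include some pretty good examples.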
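As a minimal sketch of the XPath approach with DOMXPath, the loop below visits every foo element and reads specific children by name. The sample values (b1, p1, ...) are made up, since the XML in the question has empty elements.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<stuff>'
    . '<foo><bar>b1</bar><value>v1</value><pub>p1</pub></foo>'
    . '<foo><bar>b2</bar><pub>p2</pub></foo>'
    . '</stuff>'
);

$xpath = new DOMXPath($dom);
$data  = [];

// query() returns a node list with all <foo> elements.
foreach ($xpath->query('/stuff/foo') as $foo) {
    // The second argument is the context node, so "bar" and "pub"
    // are looked up relative to the current <foo>.
    $data[] = [
        'bar' => $xpath->evaluate('string(bar)', $foo),
        'pub' => $xpath->evaluate('string(pub)', $foo),
    ];
}

print_r($data);
```

With string(), a missing child simply yields an empty string, so optional elements such as <value/> need no special handling.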
Here's another (lazy) way to do it.
$data[][$node->nodeName] = $node->nodeValue;
With FluidXML you can query and iterate XML very easily.
$data = [];
$store_child = function($i, $fooChild) use (&$data) {
$data[] = [ $fooChild->nodeName => $fooChild->nodeValue ];
};
fluidxml($dom)->query('//foo/*')->each($store_child);
https://github.com/servo-php/fluidxml