I honestly tried to find a solution for PHP, but many threads that sound similar either don't apply to my case or are for completely different languages.
I want to split an xml file based on nodes. Ideally multiple nodes, but of course one is enough and could be applied multiple times.
E.g. I want to split this by the tags <thingy> and <othernode>:
<root>
<stuff />
<thingy><othernode>one</othernode></thingy>
<thingy><othernode>two</othernode></thingy>
<thingy>
<othernode>three</othernode>
<othernode>four</othernode>
</thingy>
<some other="data"/>
</root>
Ideally I want to have 4 xmlstrings of type:
<root>
<stuff />
<thingy><othernode>CONTENT</othernode></thingy>
<some other="data"/>
</root>
With CONTENT being one, two, three and four. Plot twist: CONTENT can also be a whole subtree. Of course, it can all also be filled with various namespaces and tag prefixes (like <q1:node/>). Formatting is irrelevant for me.
I tried SimpleXml, but it lacks the possibility to write into the DOM easily.
I tried DomDocument, but everything I do seems to destroy some links/relations between parent and child nodes in some way.
I tried XmlReader/Writer, but that is extremely hard to maintain and combine (at least for me).
So far my best guess is something with DomDocument, node cloning and removing everything but one node?
Interesting question.
If I get it right, it is a given that <othernode> is always a child of <thingy>, and the split is made for each <othernode> at the place of the first <thingy> in the original document.
DOMDocument appeared useful in this case, as it makes it easy to move nodes around, including all their children.
Given the split on a node-list (from getElementsByTagName()):
echo "---\n";
foreach ($split($doc->getElementsByTagName('othernode')) as $doc) {
echo $doc->saveXML(), "---\n";
}
The $split function moves all <othernode> elements into a DOMDocumentFragment of their own, cleaning up <thingy> parent elements once they are emptied (except the first one, which serves as the anchor), and then temporarily brings each of them back into the DOMDocument:
$split = static function (DOMNodeList $nodes): Generator {
while (($element = $nodes->item(0)) && $element instanceof DOMElement) {
$doc ??= $element->ownerDocument;
$basin ??= $doc->createDocumentFragment();
$anchor ??= $element->parentNode;
[$parent] = [$element->parentNode, $basin->appendChild($element)];
$parent->childElementCount || $parent === $anchor || $parent->parentNode->removeChild($parent);
}
if (empty($anchor)) {
return;
}
assert(isset($basin, $doc));
while ($element = $basin->childNodes->item(0)) {
$element = $anchor->appendChild($element);
yield $doc;
$anchor->removeChild($element);
}
};
This results in the following split:
---
<?xml version="1.0"?>
<root>
<stuff/>
<thingy><othernode>one</othernode></thingy>
<some other="data"/>
</root>
---
<?xml version="1.0"?>
<root>
<stuff/>
<thingy><othernode>two</othernode></thingy>
<some other="data"/>
</root>
---
<?xml version="1.0"?>
<root>
<stuff/>
<thingy><othernode>three</othernode></thingy>
<some other="data"/>
</root>
---
<?xml version="1.0"?>
<root>
<stuff/>
<thingy><othernode>four</othernode></thingy>
<some other="data"/>
</root>
---
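For comparison, the asker's own guess (clone the document, then remove everything but one node) could be sketched roughly as follows. This is only a minimal sketch under assumptions: splitByClone() is a hypothetical helper name, the copy is made by re-parsing the serialized document, matching elements are assumed not to be nested inside each other, and a parent left without element children is simply dropped. Unlike the generator above, each kept <othernode> stays inside its original <thingy>.

```php
<?php
// Hypothetical helper, not part of the answer above: returns one XML string
// per matching element, each with all other matching elements removed.
function splitByClone(DOMDocument $doc, string $tag): array
{
    $source = $doc->saveXML();
    $count  = $doc->getElementsByTagName($tag)->length;
    $parts  = [];
    for ($keep = 0; $keep < $count; $keep++) {
        // Fresh copy of the whole document for every split.
        $copy = new DOMDocument();
        $copy->loadXML($source);
        // Snapshot the live node list so removals do not shift the indexes.
        $nodes = iterator_to_array($copy->getElementsByTagName($tag), false);
        foreach ($nodes as $i => $node) {
            if ($i === $keep) {
                continue;
            }
            $parent = $node->parentNode;
            $parent->removeChild($node);
            // Drop a parent element that has no element children left.
            if ($parent->childElementCount === 0 && $parent->parentNode instanceof DOMElement) {
                $parent->parentNode->removeChild($parent);
            }
        }
        $parts[] = $copy->saveXML();
    }
    return $parts;
}

// Assumed sample input, shortened from the question.
$doc = new DOMDocument();
$doc->loadXML('<root><stuff/><thingy><othernode>one</othernode></thingy>'
    . '<thingy><othernode>two</othernode><othernode>three</othernode></thingy></root>');
$parts = splitByClone($doc, 'othernode');
```

Re-parsing the document for every split is simple but costs time on large inputs, which is one reason the move-and-yield approach above may be preferable.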
I've been battling with this all day :(
Although I found answers for similar questions they don't update an existing XML, they create a new XML.
Any help would be very much appreciated.
This is the XML I'm loading and trying to sort just the images->image nodes:
<?xml version="1.0"?>
<stuff>
<other_nodes>
</other_nodes>
<images>
<image><sorted_number><![CDATA[1]]></sorted_number></image>
<image><sorted_number><![CDATA[3]]></sorted_number></image>
<image><sorted_number><![CDATA[2]]></sorted_number></image>
</images>
</stuff>
//load the xml into a var
$theXML = //load the xml from the database
$imageNode = $theXML->images;
//sort the images into sorted order
$d = $imageNode;
// turn into array
$e = array();
foreach ($d->image as $image) {
$e[] = $image;
}
// sort the array
usort($e, function($a, $b) {
return $a->sorted_number - $b->sorted_number;
});
//now update the xml in the correct order
foreach ($e as $node) {
//???unsure how to update the images node in my XML
}
SimpleXML is too simple for your task. There is no easy way to reorder nodes. Basically, after your sorting routine, you have to reconstruct the <image> nodes, but you have CDATA inside, and SimpleXML can't directly add a CDATA value.
If you want to try it this way, here you can find a cool SimpleXML class extension that adds a CDATA property, but that solution also uses DOMDocument.
Basically, IMHO, since every solution requires DOM, the best way is to use DOMDocument directly and, if needed, (re)load the XML with SimpleXML after the transformation:
$dom = new DOMDocument();
$dom->loadXML( $xml, LIBXML_NOBLANKS );
$dom->formatOutput = True;
$images = $dom->getElementsByTagName( 'image' );
/* This is the same as your array conversion: */
$sorted = iterator_to_array( $images );
/* This is your sorting routine adapted to DOMDocument: */
usort( $sorted, function( $a, $b )
{
return
$a->getElementsByTagName('sorted_number')->item(0)->nodeValue
-
$b->getElementsByTagName('sorted_number')->item(0)->nodeValue;
});
/* This is the core loop to “replace” old nodes: */
foreach( $sorted as $node ) $images->item(0)->parentNode->appendChild( $node );
echo $dom->saveXML();
The main routine adds the sorted nodes as children of the existing <images> node. Please note that there is no need to remove the old children first: since we refer to the same objects, appending a node in fact removes it from its previous position.
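The move-on-append behaviour can be checked in isolation. A minimal sketch, assuming a made-up two-element document (the id attributes are only there to tell the nodes apart):

```php
<?php
// Assumed sample document, not the asker's XML.
$dom = new DOMDocument();
$dom->loadXML('<images><image id="a"/><image id="b"/></images>');

$images = $dom->documentElement;
$first  = $images->firstChild;   // the <image id="a"/> element

// Appending a node that is already in the tree moves it; no copy is made.
$images->appendChild($first);

$result = $dom->saveXML($images);
echo $result, "\n";  // <images><image id="b"/><image id="a"/></images>
```

The element count stays at two, which is why the sorting loop does not need a separate removal step.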
If you want to obtain a SimpleXML object, you can append this line at the end of the code above:
$xml = simplexml_load_string( $dom->saveXML() );
Consider an XSLT solution using its <xsl:sort>. For information, XSLT (whose script is a well-formed XML file) is a declarative, special-purpose programming language (the same type as SQL), used specifically to manipulate XML documents, and sorting is one type of manipulation. Although it is often used as a stylesheet to render XML content into HTML, XSLT is actually a language.
Most general-purpose languages, including PHP (xsl extension), Python (lxml module), Java (javax.xml), Perl (libxml), C# (System.Xml), and VB (MSXML), maintain XSLT 1.0 processors. Various external executable processors such as Xalan and Saxon (the latter of which can run XSLT 2.0 and, recently, 3.0) are also available, which of course PHP can call with exec(). The code below embeds the XSLT as a string variable, but it can very easily be loaded from an external .xsl or .xslt file.
// Load the XML source and XSLT file
$doc = new DOMDocument();
$doc->loadXML($xml);
$xsl = new DOMDocument;
$xslstr = '<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes"
cdata-section-elements="sorted_number" />
<xsl:strip-space elements="*"/>
<!-- IDENTITY TRANSFORM (COPIES ALL CONTENT AS IS) -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- SORT IMAGE CHILDREN IN EACH IMAGES NODE -->
<xsl:template match="images">
<xsl:copy>
<xsl:apply-templates select="image">
<xsl:sort select="sorted_number" order="ascending" data-type="number"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:transform>';
$xsl->loadXML($xslstr);
// Configure the processor
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);
// Transform XML source
$newXml = $proc->transformToXML($doc);
echo $newXml;
Result (notice that <![CDATA[]]> is preserved):
<?xml version="1.0" encoding="UTF-8"?>
<stuff>
<other_nodes/>
<images>
<image>
<sorted_number><![CDATA[1]]></sorted_number>
</image>
<image>
<sorted_number><![CDATA[2]]></sorted_number>
</image>
<image>
<sorted_number><![CDATA[3]]></sorted_number>
</image>
</images>
</stuff>
Before going deeper, is a save of the sorted state really necessary? Like in a database, you can always sort items when retrieving them, same here with the code you have already written.
That said, "updating" in your case means delete all <image> nodes and add them back in order.
Update:
See fusion3k's answer: it is not necessary to delete the nodes, you can just append them. I'd suggest going with his solution.
You are using SimpleXml, which does not provide methods for copying nodes. You will need to re-create every single node, child-node, attribute.
Your XML looks simple, but I guess it is an example and your real XML is more complex. Then rather use DOM and its importNode() method, which can copy complex nodes, including all their attributes and children.
On the other hand, SimpleXml to me feels much easier, so I combine both:
$xml = simplexml_load_string($x); // assume XML in $x
$images = $xml->xpath("/stuff/images/image");
usort($images, function ($a, $b){
return strnatcmp($a->sorted_number, $b->sorted_number);
});
Comments:
xpath() is a quick way to get all items into an array of SimpleXml objects.
$images is sorted now, but we can't delete the original nodes, because $images holds references to these nodes.
This is why we need to save $images to a new, temporary document.
$tmp = new DOMDocument('1.0', 'utf-8');
$tmp->loadXML("<images />");
// add image to $tmp, then delete it from $xml
foreach($images as $image) {
$node = dom_import_simplexml($image); // make DOM from SimpleXml
$node = $tmp->importNode($node, TRUE); // import and append in $tmp
$tmp->getElementsByTagName("images")->item(0)->appendChild($node);
unset($image[0]); // delete image from $xml
}
Comments:
using DOM now, because I can copy nodes with importNode()
at this point, $tmp has all the <image> nodes in the desired order, $xml has none.
To copy nodes back from $tmp to $xml, we need to import $xml into DOM:
$xml = dom_import_simplexml($xml)->ownerDocument;
foreach($tmp->getElementsByTagName('image') as $image) {
$node = $xml->importNode($image, TRUE);
$xml->getElementsByTagName("images")->item(0)->appendChild($node);
}
// output...
echo $xml->saveXML();
see it in action: https://eval.in/535800
I have the following code which works but seems like the incorrect way to implement this. Disregard the "...." that is all extra stuff we need not be concerned by. The issue I was having was that Sup was an array some of the time and other times it was just a value (or so print_r claimed, I thought/hoped it would have just been a one element array).
$users is a simpleXMLElement.
foreach($users as $user) {
if ($user->InstSup->Sup[1] == '') {
foreach($user->InstSup as $affid) {
....
} else {
foreach($user->InstSup->Sup as $affid) {
Here are the varying instances...
<Users>
<User>
<InstSup><Sup>1</Sup></InstSup>
</User>
<User>
<InstSup><Sup>2</Sup><Sup>3</Sup><Sup>4</Sup><Sup>5</Sup></InstSup>
</User>
</Users>
Thanks.
First of all, don't trust the output of print_r (or var_dump) when you deal with SimpleXMLElements. It does not show the whole picture; better to look at the XML as-is, for example with the asXML() method.
Now to the problem you've got. When you want just the list (an "array", so to speak) of the <Sup> elements that are children of <InstSup>, you are better off querying the document with XPath. It's fairly straightforward and gives you the array you want:
$users = new SimpleXMLElement($buffer);
$sups = $users->xpath('/Users/User/InstSup/Sup');
foreach ($sups as $index => $sup) {
printf("#%d: %s (%s)\n", $index, $sup, $sup->asXML());
}
This creates the following output:
#0: 1 (<Sup>1</Sup>)
#1: 2 (<Sup>2</Sup>)
#2: 3 (<Sup>3</Sup>)
#3: 4 (<Sup>4</Sup>)
#4: 5 (<Sup>5</Sup>)
And this is the $buffer to complete the example:
$buffer = <<<XML
<Users>
<User>
<InstSup><Sup>1</Sup></InstSup>
</User>
<User>
<InstSup><Sup>2</Sup><Sup>3</Sup><Sup>4</Sup><Sup>5</Sup></InstSup>
</User>
</Users>
XML;
As the line-up in the output shows, even though the <Sup> elements are inside (same-named but) different parent elements, the XPath query expression
/Users/User/InstSup/Sup
returns all the elements in that path from the document.
So hopefully you now better understand that print_r is not that useful because it doesn't show the whole picture, and also that by understanding how the document has its nodes ordered, you can query the data even more easily with an XPath expression.
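If the <Sup> values are needed per user rather than as one flat list, plain SimpleXML iteration should be enough, because foreach over $user->InstSup->Sup visits every same-named element whether there is one or several. A minimal sketch, reusing the sample XML from the question ($byUser is just an illustrative variable name):

```php
<?php
$buffer = <<<XML
<Users>
<User>
<InstSup><Sup>1</Sup></InstSup>
</User>
<User>
<InstSup><Sup>2</Sup><Sup>3</Sup><Sup>4</Sup><Sup>5</Sup></InstSup>
</User>
</Users>
XML;

$users  = new SimpleXMLElement($buffer);
$byUser = [];
foreach ($users->User as $user) {
    $sups = [];
    // Works the same for a single <Sup> and for several of them.
    foreach ($user->InstSup->Sup as $sup) {
        $sups[] = (string) $sup;
    }
    $byUser[] = $sups;
}
// $byUser is now [['1'], ['2', '3', '4', '5']]
```

With this, the special case for Sup[1] in the question's code should no longer be needed.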
I am having problems dealing with XML in PHP. What I want is to remove an element that is no longer needed and replace it with another one.
Let's say the XML looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<software>
<info>
<version>1.2</version>
<lasterror>1386680712</lasterror>
</info>
<decryption>
<funcgroup siglength="86">
<func>
<name>Mk</name>
<args>$a</args>
<code>XXXX</code>
</func>
<func>
<name>Nk</name>
<args>$a,$b</args>
<code>XXXX</code>
</func>
</funcgroup>
</decryption>
</software>
PHP Code:
$domtree = new DOMDocument('1.0', 'UTF-8');
$domtree->loadXML(file_get_contents('test.xml'));
$thedocument = $domtree->documentElement;
$list = $thedocument->getElementsByTagName('funcgroup');
foreach ($list as $domElement) {
$sig_length = $domElement->getAttribute('siglength');
if($sig_length == $signature_length) {
$domElement->parentNode->removeChild($domElement);
break;
}
}
$some_stuff = $domtree->getElementsByTagName('software');
$some_stuff = $domtree->getElementsByTagName('decryption');
$funcgroup = $domtree->appendChild($domtree->createElement('funcgroup'));
$funcgroup->setAttribute('siglength', $signature_length);
$func = $funcgroup->appendChild($domtree->createElement('func'));
$func->appendChild($domtree->createElement('name', $outer_element[0]));
$func->appendChild($domtree->createElement('args', $outer_element[1]));
$code = $func->appendChild($domtree->createElement('code'));
$code->appendChild($domtree->createTextNode($outer_element[2]));
Note: I removed some stuff otherwise it would get too complicated i guess. The above code shows what i do, but without some other loops and variables which are not needed in that question. Every variable (and array) is defined. So don't worry about that.
What i want is to remove the whole <funcgroup siglength="86"> in order to replace it with a different one.
The script works fine, but there is one problem in the output XML. It looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<software>
<info>
<version>6.3</version>
<lasterror>1386680712</lasterror>
</info>
<decryption/>
</software>
<funcgroup siglength="86">
<func>
<name>Nk</name>
<args>$a</args>
<code>YYYYY</code>
</func>
<func>
<name>Ok</name>
<args>$a,$b</args>
<code>YYYY</code>
</func>
</funcgroup>
As you can see, the closing software and decryption tags are in the wrong place now.
How can I fix that? I spent hours but can't find a working solution.
The problem seems to be caused by removeChild(), since it works fine if I do not remove anything.
You are adding your new child node to the document itself (instead of to the decryption node), which is not what you want:
$domtree->appendChild
Instead you should:
$decryption = $domtree->getElementsByTagName('decryption')->item(0);
$funcgroup = $decryption->appendChild($domtree->createElement('funcgroup'));
Edit:
You can edit the text value of the lasterror node by doing:
$domtree->getElementsByTagName('lasterror')->item(0)->firstChild->nodeValue = "New value";
Consult the documentation of the DOMNodeList and DOMNode class to see what else you can do with it.
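Since the goal is to swap one <funcgroup> for another, replaceChild() may be a shorter route than removing and re-appending. A minimal sketch, assuming a trimmed-down version of the XML from the question and made-up replacement values:

```php
<?php
// Assumed, shortened input; the real document has more elements.
$domtree = new DOMDocument();
$domtree->loadXML(
    '<software><decryption><funcgroup siglength="86">'
    . '<func><name>Mk</name></func></funcgroup></decryption></software>'
);
$signature_length = '86';

// Build the replacement first; it is not part of the tree yet.
$new = $domtree->createElement('funcgroup');
$new->setAttribute('siglength', $signature_length);
$func = $new->appendChild($domtree->createElement('func'));
$func->appendChild($domtree->createElement('name', 'Nk'));

foreach ($domtree->getElementsByTagName('funcgroup') as $old) {
    if ($old->getAttribute('siglength') === $signature_length) {
        // The new node takes the exact position of the old one.
        $old->parentNode->replaceChild($new, $old);
        break;
    }
}

$result = $domtree->saveXML($domtree->documentElement);
```

Because the new element lands where the old one was, it stays inside <decryption> without looking that node up separately.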
I have xml with the following structure:
<?xml version="1.0"?>
<ONIXMessage xmlns="http://test.com/test">
...data...
</ONIXMessage>
I need to change xmlns attribute with my own value. How can I do it? Preferably with DOMDocument class.
This by design is not possible. Every DOMDocument has a single root/document element.
In your example XML that root element is:
{http://test.com/test}ONIXMessage
I write the element name as an expanded name, using the convention of putting the namespace URI in front, enclosed in curly brackets.
Writing the element name in a form that shows its entire expanded name also demonstrates that you do not only want to change the value of an attribute here, but the namespace URI of a specific element. So you want to change the element name, and probably also the name of any child element it contains if the child is in the same namespace.
As the xmlns attribute only reflects the namespace URI of the element itself, you can not change it. Once it is set in DOMDocument, you can not change it.
You can replace the whole element, but the namespace of the children is not changed either then. Here an example with an XML similar to yours with only textnode children (which aren't namespaced):
$xml = <<<EOD
<?xml version="1.0"?>
<ONIXMessage xmlns="uri:old">
...data...
</ONIXMessage>
EOD;
$doc = new DOMDocument();
$doc->loadXML($xml);
$newNode = $doc->createElementNS('uri:new', $doc->documentElement->tagName);
$oldNode = $doc->replaceChild($newNode, $doc->documentElement);
foreach(iterator_to_array($oldNode->childNodes, true) as $child) {
$doc->documentElement->appendChild($child);
}
Resulting XML output is:
<?xml version="1.0"?>
<ONIXMessage xmlns="uri:new">
...data...
</ONIXMessage>
Changing the input XML now to something that contains children like
<?xml version="1.0"?>
<ONIXMessage xmlns="uri:old">
<data>
...data...
</data>
</ONIXMessage>
Will then create the following output, take note of the old namespace URI that pops up now again:
<?xml version="1.0"?>
<ONIXMessage xmlns="uri:new">
<default:data xmlns:default="uri:old">
...data...
</default:data>
</ONIXMessage>
As you can see, DOMDocument does not provide functionality to replace namespace URIs for existing elements out of the box. But hopefully, with the information provided in this answer so far, it is clearer why exactly it is not possible to change that attribute's value if it already exists.
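If staying with DOMDocument is preferred, one possible workaround is to rebuild the tree into a new document and create every element with createElementNS() under the replacement URI. This is only a sketch under assumptions: copyWithNamespace() is a hypothetical helper, namespaced attributes keep their original namespace, and processing instructions or a doctype outside the root element are not carried over.

```php
<?php
// Hypothetical helper: recursively copies $node into $target, creating
// elements in $newUri wherever the source element was in $oldUri.
function copyWithNamespace(DOMNode $node, DOMDocument $target, string $oldUri, string $newUri): DOMNode
{
    if (!$node instanceof DOMElement) {
        // Text, comments and CDATA sections: a plain deep import is enough.
        return $target->importNode($node, true);
    }
    $uri  = $node->namespaceURI === $oldUri ? $newUri : $node->namespaceURI;
    $copy = $target->createElementNS($uri, $node->nodeName);
    foreach ($node->attributes as $attr) {
        if ($attr->namespaceURI === null) {
            $copy->setAttribute($attr->nodeName, $attr->nodeValue);
        } else {
            $copy->setAttributeNS($attr->namespaceURI, $attr->nodeName, $attr->nodeValue);
        }
    }
    foreach ($node->childNodes as $child) {
        $copy->appendChild(copyWithNamespace($child, $target, $oldUri, $newUri));
    }
    return $copy;
}

$doc = new DOMDocument();
$doc->loadXML('<ONIXMessage xmlns="uri:old"><data>...data...</data></ONIXMessage>');

$result = new DOMDocument();
$result->appendChild(copyWithNamespace($doc->documentElement, $result, 'uri:old', 'uri:new'));
echo $result->saveXML();
```

The original document is left untouched; only the copy carries the new namespace URI.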
The expat-style parser in PHP's libxml-based XML Parser extension does allow you to "change" existing attribute values, regardless of whether it is an xmlns* attribute or not, because it just parses the data and you can process it on the fly.
A working example is:
$xml = <<<EOD
<?xml version="1.0" encoding="utf-8"?>
<ONIXMessage xmlns="uri:old">
<data>
...data...
</data>
</ONIXMessage>
EOD;
$uriReplace = [
'uri:old' => 'uri:new',
];
$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_default_handler($parser, function ($parser, $data) {
echo $data;
});
xml_set_element_handler($parser, function ($parser, $name, $attribs) use ($xml, $uriReplace) {
$selfClosing = '/>' === substr($xml, xml_get_current_byte_index($parser), 2);
echo '<', $name;
foreach ($attribs as $name => $value) {
if (substr($name, 0, 5) === 'xmlns' && isset($uriReplace[$value])) {
$value = $uriReplace[$value];
}
printf(' %s="%s"', $name, htmlspecialchars($value, ENT_COMPAT | ENT_XML1));
}
echo $selfClosing ? '/>' : '>';
}, function ($parser, $name) use ($xml) {
$selfClosing = '/>' === substr($xml, xml_get_current_byte_index($parser) - 2, 2);
if ($selfClosing) return;
echo '</', $name, '>';
});
xml_parse($parser, $xml, true);
xml_parser_free($parser);
The output then has transparently changed the namespace URI from uri:old to uri:new:
<ONIXMessage xmlns="uri:new">
<data>
...data...
</data>
</ONIXMessage>
As this example shows, each XML feature you make use of in your XML needs to be handled with the parser. For example, the XML declaration is missing. However, these can be added by implementing the missing handler callbacks (e.g. for CDATA sections) or by writing out the missing output (e.g. for the "missing" XML declaration). I hope this is helpful and shows you an alternative way to change even those values that are not intended to change.
I was testing a simple example of how to display XML in the browser using PHP and found this example, which works well:
<?php
$xml = new DOMDocument("1.0");
$root = $xml->createElement("data");
$xml->appendChild($root);
$id = $xml->createElement("id");
$idText = $xml->createTextNode('1');
$id->appendChild($idText);
$title = $xml->createElement("title");
$titleText = $xml->createTextNode('Valid');
$title->appendChild($titleText);
$book = $xml->createElement("book");
$book->appendChild($id);
$book->appendChild($title);
$root->appendChild($book);
$xml->formatOutput = true;
echo "<xmp>". $xml->saveXML() ."</xmp>";
$xml->save("mybooks.xml") or die("Error");
?>
It produces the following output:
<?xml version="1.0"?>
<data>
<book>
<id>1</id>
<title>Valid</title>
</book>
</data>
Now I have two questions about how the output should look.
The first line of the XML file, <?xml version="1.0"?>, should not be displayed; that is, it should be hidden.
How can I display the text node on the next line? In total I am expecting output like this:
<data>
<book>
<id>1</id>
<title>
Valid
</title>
</book>
</data>
Is it possible to get the desired output, and if so, how can I accomplish that?
Thanks
To skip the XML declaration you can use the result of saveXML on the root node:
$xml_content = $xml->saveXML($root);
file_put_contents("mybooks.xml", $xml_content) or die("cannot save XML");
Please note that saveXML(node) has a different output from saveXML().
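The difference between the two calls can be seen on a tiny document. A minimal sketch, assuming a shortened version of the asker's XML and without formatOutput:

```php
<?php
$xml  = new DOMDocument('1.0');
$root = $xml->appendChild($xml->createElement('data'));
$root->appendChild($xml->createElement('id', '1'));

// Whole document: starts with the XML declaration.
$withDeclaration = $xml->saveXML();

// Only the given node: no declaration and no trailing line break.
$withoutDeclaration = $xml->saveXML($root);

echo $withoutDeclaration, "\n";  // <data><id>1</id></data>
```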
First question:
Here is my post where all usable threads with answers are listed: How do you exclude the XML prolog from output?
Second question:
I don't know of any PHP function that outputs text nodes like that.
You could:
read the XML using DomDocument and save each node as a string
iterate through the nodes
detect text nodes and add new lines to the XML string manually
At the end you would have the same XML with the text node values on a new line:
<node>
some text data
</node>
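A variation on these steps that stays inside DOM is to pad every non-blank text node with line breaks before saving. This is only a sketch under the assumption that the extra whitespace is acceptable, because it changes the actual text content of the nodes and not just the formatting:

```php
<?php
// Assumed, shortened sample document.
$xml = new DOMDocument();
$xml->loadXML('<data><title>Valid</title></data>');

$xpath = new DOMXPath($xml);
// Select only text nodes that contain something other than whitespace.
foreach ($xpath->query('//text()[normalize-space()]') as $text) {
    $text->nodeValue = "\n" . trim($text->nodeValue) . "\n";
}

$out = $xml->saveXML($xml->documentElement);
echo $out, "\n";
```

Anything that later reads the <title> value would have to trim it again, so this only suits output meant for display.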