How do I clone distinct XML structures without data in PHP?

I have an XML document that looks like this:
<root>
<node/>
<node>
<sub>more</sub>
</node>
<node>
<sub>another</sub>
</node>
<node>value</node>
</root>
Here's my pseudo-code:
import xml.
create empty-xml.
foreach child of imported-xml-root-node,
recursively clone node structure without data.
if clone does not match one already in empty-xml,
then add clone to empty-xml.
I'm trying to get a result that looks like this:
<root>
<node/>
<node>
<sub/>
</node>
</root>
Note that my piddly example data is only 3 nodes deep. In production, there will be an unknown number of descendants, so an acceptable answer needs to handle variable node depths.
Failed Approaches
I have reviewed The DOMNode class which has a cloneNode method with a recursive option that I would like to use, although it would take some extra work to purge the data. But while the class contains a hasChildNodes function which returns a boolean, I can't find a way to actually return the collection of children.
$doc = new DOMDocument();
$doc->loadXML($xml);
$root_node = $doc->documentElement;
if ( $root_node->hasChildNodes() ) {
// looking for something like this:
// foreach ($root_node->children() as $child)
// $doppel = $child->cloneNode(true);
}
Secondly, I have tried my hand with the SimpleXMLElement class, which does have an awesome children method. Although it lacks a recursive option, I built a simple function to surmount that. But the class is missing a clone/copyNode method, and my function is bloating into something nasty to compensate. Now I'm considering combining the two classes so that I have access to both SimpleXMLElement::children and DOMNode::cloneNode, but I can tell this is not going cleanly, and surely this problem can be solved better.
$sxe = new SimpleXMLElement($xml);
$indentation = 0;
function getNamesRecursive( $xml, &$indentation )
{
$indentation++;
foreach($xml->children() as $child) {
for($i=0;$i<$indentation;$i++)
echo "\t";
echo $child->getName() . "\n";
getNamesRecursive($child,$indentation);
}
$indentation--;
}
getNamesRecursive($sxe,$indentation);

Consider XSLT, the special-purpose language designed to transform XML files. PHP ships an XSLT 1.0 processor (the xsl extension). For the sample document, you simply need to keep the first two elements at each level and copy only the elements, not their text.
XSLT (save as .xsl file to use below in php)
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" omit-xml-declaration="yes" />
<xsl:strip-space elements="*"/>
<!-- Identity Transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- Remove any nodes position greater than 2 -->
<xsl:template match="*[position() > 2]"/>
<!-- Copy only tags -->
<xsl:template match="/*/*/*">
<xsl:copy/>
</xsl:template>
</xsl:transform>
PHP
// LOAD XML AND XSL FILES
$xml = new DOMDocument('1.0', 'UTF-8');
$xml->load('Input.xml');
$xslfile = new DOMDocument('1.0', 'UTF-8');
$xslfile->load('Script.xsl');
// TRANSFORM XML with XSLT
$proc = new XSLTProcessor;
$proc->importStyleSheet($xslfile);
$newXml = $proc->transformToXML($xml);
// ECHO OUTPUT STRING
echo $newXml;
# <root>
# <node/>
# <node>
# <sub/>
# </node>
# </root>
// NEW DOM OBJECT
$final = new DOMDocument('1.0', 'UTF-8');
$final->loadXML($newXml);

Well, here's my stinky solution. Suggestions for improvements or completely new, better answers are still very welcome.
$xml = '
<root>
<node/>
<node>
<sub>more</sub>
</node>
<node>
<sub>another</sub>
</node>
<node>value</node>
</root>
';
$doc = new DOMDocument();
$doc->loadXML($xml);
// clone without data
$empty_xml = new DOMDocument();
$empty_xml->appendChild($empty_xml->importNode($doc->documentElement));
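// Recursively copy child elements only; text nodes and other data are skipped.
// Objects are passed by handle in PHP, so no reference parameters are needed.
function clone_without_data(DOMNode $orig, DOMNode $clone, DOMDocument $clonedoc){
    foreach ($orig->childNodes as $child){
        if ($child instanceof DOMElement) {
            // shallow import: the element itself, without its children
            $new_node = $clone->appendChild($clonedoc->importNode($child));
            clone_without_data($child, $new_node, $clonedoc);
        }
    }
}
clone_without_data($doc->documentElement, $empty_xml->documentElement, $empty_xml);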
// remove all duplicates
$distinct_structure = new DOMDocument();
$distinct_structure->appendChild($distinct_structure->importNode($doc->documentElement));
foreach ($empty_xml->documentElement->childNodes as $child){
$match = false;
foreach ($distinct_structure->documentElement->childNodes as $i => $element){
if ($distinct_structure->saveXML($element) === $empty_xml->saveXML($child)) {
$match = true;
break;
}
}
if (!$match)
$distinct_structure->documentElement->appendChild($distinct_structure->importNode($child,true));
}
$distinct_structure->formatOutput = true;
echo $distinct_structure->saveXML();
Which results in this output:
<?xml version="1.0"?>
<root>
<node/>
<node>
<sub/>
</node>
</root>
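The clone-then-deduplicate idea above can also be done in a single recursive pass. The following is only a sketch under some assumptions: the helper name cloneStructure is made up for illustration, attributes and namespaces are deliberately dropped, and two structures count as identical when their serialized element-only copies are the same string. Duplicates are removed among siblings at every depth, not just directly under the root.

```php
<?php
// Hypothetical helper (not part of the DOM extension): rebuild $source's
// element tree inside $target, skipping text, comments and attributes,
// and keeping only one copy of each distinct child structure.
function cloneStructure(DOMElement $source, DOMDocument $target): DOMElement
{
    $copy = $target->createElement($source->nodeName);
    $seen = [];
    foreach ($source->childNodes as $child) {
        if (!$child instanceof DOMElement) {
            continue; // ignore text nodes, comments, CDATA
        }
        $childCopy = cloneStructure($child, $target);
        // The serialized, data-free subtree acts as the dedupe key.
        $signature = $target->saveXML($childCopy);
        if (!isset($seen[$signature])) {
            $seen[$signature] = true;
            $copy->appendChild($childCopy);
        }
    }
    return $copy;
}

$doc = new DOMDocument();
$doc->loadXML('<root><node/><node><sub>more</sub></node><node><sub>another</sub></node><node>value</node></root>');

$out = new DOMDocument();
$out->appendChild(cloneStructure($doc->documentElement, $out));
$result = $out->saveXML($out->documentElement);
echo $result, "\n";
```

Because the recursion carries no depth limit, this should cope with an unknown number of descendants. If attributes matter for what counts as "distinct", they would have to be copied onto the new element before the signature is computed.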

Related

How to merge two xml files by ID (in first as subnode value, in second as attribute)

I have two XML files with this structure:
first.xml
<items>
<item>
<id>foo</id>
<desc>lorem ipsum</desc>
</item>
<item>
<id>boo</id>
<desc>lorem ipsum</desc>
</item>
</items>
second.xml
<item_list>
<item id="foo">
<stock_quantity>20</stock_quantity>
</item>
<item id="boo">
<stock_quantity>11</stock_quantity>
</item>
</item_list>
and I need to combine them by the id so the output file would look like this:
output.xml
<items>
<item>
<id>foo</id>
<desc>lorem ipsum</desc>
<stock_quantity>20</stock_quantity>
</item>
<item>
<id>boo</id>
<desc>lorem ipsum</desc>
<stock_quantity>11</stock_quantity>
</item>
</items>
I need to use PHP and XML DOMDocument. Do you have any idea how to do this?
You can use simplexml library to achieve that,
// loading xml to object from file
$xml1 = simplexml_load_file("first.xml") or die("Error: Cannot create object");
$xml2 = simplexml_load_file("second.xml") or die("Error: Cannot create object");
// its core xml iterator for simplexml library
foreach ($xml1->children() as $items1) {
$id = trim($items1->id); // trim to check with id matched in 2.xml
foreach ($xml2->children() as $items2) { // iterating children of 2.xml
if ($items2[0]['id'] == $id) { // simply checking attribute of id in 2.xml with 1.xml's id value
foreach ($items2 as $key => $value) {
$items1->addChild($key, (string) ($value)); // adding children to 1.xml object
}
}
}
}
$xml1->asXml('output.xml'); // generating https://www.php.net/manual/en/simplexmlelement.asxml.php
Using DOMDocument and its ability to copy nodes from one document to another allows you to insert the node from the stock file directly into the main XML.
Rather than looping to find the matching record, this uses XPath to search for it. The expression //item[@id='boo']/stock_quantity says: find the <stock_quantity> element in the <item> element whose id attribute is 'boo'.
$main = new DOMDocument();
$main->load("main.xml");
$add = new DOMDocument();
$add->load("stock.xml");
$searchAdd = new DOMXPath($add);
// Find the list of items
$items = $main->getElementsByTagName("item");
foreach ( $items as $item ) {
// Extract the value of the id node
$id = $item->getElementsByTagName("id")[0]->nodeValue;
// Find the corresponding node in the stock file
$stockQty = $searchAdd->evaluate("//item[@id='{$id}']/stock_quantity");
// Import the <stock_quantity> node (and all contents)
$copy = $main->importNode($stockQty[0], true);
// Add the imported node
$item->appendChild($copy);
}
echo $main->saveXML();
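One thing the loop above does not cover is an item that has no counterpart in the stock file; importNode() would then receive null. A minimal sketch of a guarded variant follows, assuming inline sample strings instead of files (the second item with id "zzz" is invented for the demonstration) and assuming ids never contain a single quote, since the value is spliced into the XPath expression.

```php
<?php
$main = new DOMDocument();
$main->loadXML('<items><item><id>foo</id><desc>lorem ipsum</desc></item><item><id>zzz</id><desc>no stock entry</desc></item></items>');

$add = new DOMDocument();
$add->loadXML('<item_list><item id="foo"><stock_quantity>20</stock_quantity></item></item_list>');
$searchAdd = new DOMXPath($add);

foreach ($main->getElementsByTagName('item') as $item) {
    $id = $item->getElementsByTagName('id')->item(0)->nodeValue;
    // Assumption: $id contains no single quote, otherwise the expression breaks.
    $matches = $searchAdd->query("//item[@id='{$id}']/stock_quantity");
    if ($matches->length === 0) {
        continue; // no stock record for this id, leave the item unchanged
    }
    // Deep-import the matching node into the main document, then attach it.
    $item->appendChild($main->importNode($matches->item(0), true));
}

$merged = $main->saveXML($main->documentElement);
echo $merged, "\n";
```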
Consider XSLT, the special-purpose language (like SQL) designed to transform XML files, which fits your end-use needs. Like many general-purpose languages, PHP can run XSLT 1.0 through a dedicated library, namely the php-xsl extension (it must be enabled in the .ini file).
XSLT (save as .xsl file, a special .xml file; below assumes second XML in same directory)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- IDENTITY TRANSFORM -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- ADD NODE BY CORRESPONDING id VALUE -->
<xsl:template match="item">
<xsl:copy>
<xsl:variable name="curr_id" select="id"/>
<xsl:apply-templates select="@*|node()"/>
<xsl:copy-of select="document('second.xml')/item_list/item[@id = $curr_id]/*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
PHP (reference only first XML)
// Load the XML source and XSLT file
$xml = new DOMDocument;
$xml->load('first.xml');
$xsl = new DOMDocument;
$xsl->load('XSLTScript.xsl');
// Configure transformer
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);
// Transform XML source
$newXML = $proc->transformToXML($xml); // returns the result as a string
echo $newXML;
// Save output to file
$xmlfile = 'output.xml';
file_put_contents($xmlfile, $newXML);

PHP: remove node from xml by attribute

suppose I have an xml like this:
<products>
<product id="1">
<name>aaa</name>
<producturl>aaa</producturl>
<bigimage>aaa</bigimage>
<description>aaa</description>
<price>aaa</price>
<categoryid1>aaa</categoryid1>
<instock>aaa</instock>
</product>
<product id="2">
<name>aaa</name>
<producturl>aaa</producturl>
<bigimage>aaa</bigimage>
<description>aaa</description>
<price>aaa</price>
<categoryid1>aaa</categoryid1>
<instock>aaa</instock>
</product>
</products>
and I need to delete certain node depending on the id attribute, if this attribute is in an array.
I've tried different ways, but the xml is outputted always as the original one!
My code so far:
<?php header("Content-type: text/xml");
$url="http://www.aaa.it/aaa.xml";
$url=file_get_contents($url);
$array=array("1","4","5");
$doc=new SimpleXMLElement($url);
foreach($doc->product as $product){
if(!in_array($product['id'],$array)){
$dom=dom_import_simplexml($product);
$dom->parentNode->removeChild($dom);
// unset($doc->product->$product);
}
}
echo $doc->asXml(); ?>
Thanks a lot everyone.
Consider a partly XPath and partly XSLT solution; both are siblings in the Extensible Stylesheet Language (XSL) family. XPath first retrieves all current product ids, which are compared against the array of ids to keep using array_diff. An XSLT stylesheet is then built for each unmatched id to remove the corresponding node. Removing a node in XSLT simply requires an empty template match.
// Load the XML source
header("Content-type: text/xml");
$url="http://www.aaa.it/aaa.xml";
$url=file_get_contents($url);
$doc=new SimpleXMLElement($url);
// Retrieve all XML product ids with XPath
$xpath = $doc->xpath("//product/@id");
$xmlids = [];
foreach($xpath as $item => $value){ $xmlids[] = (string)$value; }
// Compare difference with $array
$array = array("1","4","5");
$removeids = array_diff($xmlids, $array);
// Dynamically build XSLT string for each resulting id
foreach($removeids as $id){
$xslstr='<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="product[@id=\''.$id.'\']"/>
</xsl:transform>';
$xsl = new SimpleXMLElement($xslstr);
// Configure the transformer and run
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);
$newXML = $proc->transformToXML($doc);
// Adjust $doc object with each loop
$doc = new SimpleXMLElement($newXML);
}
// Echo Output
echo $doc->asXML();
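For comparison, the same removal can be sketched with DOMDocument and DOMXPath alone, without XSLT. This is a different technique from the answer above, shown under a few assumptions: the sample XML is inlined rather than fetched from the URL, and $removeIds is a hypothetical list of ids to delete (invert the in_array test if the list should instead name the ids to keep). The matching nodes are copied into a plain array first, so removing them cannot disturb the iteration.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<products><product id="1"><name>a</name></product><product id="2"><name>b</name></product><product id="4"><name>c</name></product></products>');

// Hypothetical list of ids whose <product> nodes should be deleted.
$removeIds = ['1', '4', '5'];

$xpath = new DOMXPath($doc);
// Snapshot the result list into an array before modifying the tree.
$products = iterator_to_array($xpath->query('//product[@id]'));

foreach ($products as $product) {
    if (in_array($product->getAttribute('id'), $removeIds, true)) {
        $product->parentNode->removeChild($product);
    }
}

$remaining = $doc->saveXML($doc->documentElement);
echo $remaining, "\n";
```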

XML Clone node in PHP

I have to clone an XML node and its children and append it to a new XML document inside a specific tag.
Ie:
Source XML:
<root>
<element>
<back>
<item1>ABC</item1>
<item2>DEF</item2>
<more>
<moreitem>GHI</moreitem>
</more>
</back>
</element>
</root>
Destination XML:
<root>
<base1>
<item1>FOO</item1>
<item2>BAR</item2>
<base2>
**<back>From source XML and all its childs here</back>**
</base2>
</base1>
</root>
DOMXpath::evaluate() allows you to fetch nodes using XPath expressions. DOMDocument::importNode() duplicates a node and imports the copy into a target document. DOMNode::cloneNode() creates a duplicate of a node to add to the same document. DOMNode::appendChild() allows you to append the imported/cloned node.
$source = <<<'XML'
<root>
<element>
<back>
<item1>ABC</item1>
<item2>DEF</item2>
<more>
<moreitem>GHI</moreitem>
</more>
</back>
</element>
</root>
XML;
$target = <<<'XML'
<root>
<base1>
<item1>FOO</item1>
<item2>BAR</item2>
<base2>
</base2>
</base1>
</root>
XML;
$sourceDocument = new DOMDocument();
$sourceDocument->loadXml($source);
$sourceXpath = new DOMXpath($sourceDocument);
$targetDocument = new DOMDocument();
$targetDocument->loadXml($target);
$targetXpath = new DOMXpath($targetDocument);
foreach ($targetXpath->evaluate('/root/base1/base2[1]') as $targetNode) {
foreach ($sourceXpath->evaluate('/root/element/back') as $backNode) {
$targetNode->appendChild($targetDocument->importNode($backNode, TRUE));
}
}
echo $targetDocument->saveXml();
Output:
<?xml version="1.0"?>
<root>
<base1>
<item1>FOO</item1>
<item2>BAR</item2>
<base2>
<back>
<item1>ABC</item1>
<item2>DEF</item2>
<more>
<moreitem>GHI</moreitem>
</more>
</back>
</base2>
</base1>
</root>
Of course, you can use XSLT, the native programming language for restructuring XML documents to any nuanced needs. Specifically here, you need to pull XML content from an external source XML file. PHP, like other general-purpose languages (Java, C#, Python, VB), maintains libraries for XSLT processing.
XSLT (save as .xsl or .xslt file to be used in PHP below and be sure Source and Destination XML files are in same directory)
<?xml version="1.0" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*" />
<!-- Identity Transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="back">
<back>
<xsl:copy-of select="document('Source.xml')"/>
</back>
</xsl:template>
</xsl:transform>
PHP (loading XML and XSL files externally but can be embedded as string)
$doc1 = new DOMDocument();
$doc1->load('Destination.xml');
$xsl = new DOMDocument;
$xsl->load('XSLTScript.xsl');
// Configure the transformer
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);
// Transform XML source
$newXml = $proc->transformToXML($doc1);
// Save output to file
$xmlfile = 'FinalOutput.xml';
file_put_contents($xmlfile, $newXml);
OUTPUT (using your above posted Source and Destination xml)
<?xml version="1.0" encoding="UTF-8"?>
<root>
<base1>
<item1>FOO</item1>
<item2>BAR</item2>
<base2>
<back>
<root>
<element>
<back>
<item1>ABC</item1>
<item2>DEF</item2>
<more>
<moreitem>GHI</moreitem>
</more>
</back>
</element>
</root>
</back>
</base2>
</base1>
</root>
This is an easy way to do this:
$src = new DOMDocument();
$dst = new DOMDocument();
$src->loadXML($src_xml);
$dst->loadXML($dst_xml);
$back = $src->getElementsByTagName('back')->item(0);
$base = $dst->getElementsByTagName('base2')->item(0);
$base->appendChild( $dst->importNode( $back, true ) );
echo $dst->saveXML();

PHP Move DOMDocument nodes to a new parent

I have an XML file from a client which is not quite what I want, so I have to rewrite it.
This is what I have:
<artikel>
<kop>
<titel>Artikel 2.</titel>
</kop>
<lid>
<lidnr>1</lidnr>
<al>content</al>
</lid>
<lid>
<lidnr>2</lidnr>
<al>content</al>
</lid>
</artikel>
and this is what I need:
<artikel>
<kop>
<titel>Artikel 2.</titel>
</kop>
<leden>
<lid>
<lidnr>1</lidnr>
<al>content</al>
</lid>
<lid>
<lidnr>2</lidnr>
<al>content</al>
</lid>
</leden>
</artikel>
I do not know XML very well, so I have a problem. I think this is what needs to be done:
1) create a new_parent_node "leden"
2) per "lid": add "lid" to "leden" node and remove from "artikel" node
3) add new node "leden" after "kop" node
This is what I have so far:
$dom->load($publicatieurl_xml);
$artikels = $dom->getElementsByTagName('artikel');
foreach ($artikels as $key => $artikel) {
$lidNodes = $artikel->getElementsByTagName('lid');
if ( $lidNodes->length !== 0 ) {
$new_parent_node = $dom->createElement('leden');
foreach ( $lidNodes as $key => $lid ) {
$new_parent_node->appendChild( $lid );
}
echo ($new_parent_node->ownerDocument->saveXML($new_parent_node));
}
}
This line does not work: $new_parent_node->appendChild( $lid );
because it is an object.
So what I need to know is:
1) How can I add the already existing XML element "$lid" to my "leden" node?
2) How do I remove the "lid" nodes? Yet another foreach loop? I cannot remove them in the loop where I append $lid, because that ruins the foreach iteration...
I would use XSLT for that. First create the stylesheet document:
translate.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<artikel>
<xsl:copy-of select="/artikel/kop" />
<leden>
<xsl:copy-of select="/artikel/lid" />
</leden>
</artikel>
</xsl:template>
</xsl:stylesheet>
Now comes the PHP code:
// Load input from customer. (Can be an http:// url if desired)
$input = new DOMDocument();
$input->load('input.xml');
// Load the stylesheet document
$xsl = new DOMDocument();
$xsl->load('translate.xsl');
$xsltproc = new XSLTProcessor();
$xsltproc->importStylesheet($xsl);
// transformToXML() returns the translated xml as a string
echo $xsltproc->transformToXML($input);
// ... or transformToDoc() can be used if you need to
// further process the translated xml.
$newdoc = $xsltproc->transformToDoc($input);
Btw, if you don't want to store the XSL in a separate file, you can use DOMDocument::loadXML() to load it:
$xsl = new DOMDocument();
$xsl->loadXML(<<<EOF
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<artikel>
<xsl:copy-of select="/artikel/kop" />
<leden>
<xsl:copy-of select="/artikel/lid" />
</leden>
</artikel>
</xsl:template>
</xsl:stylesheet>
EOF
);
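The three steps listed in the question can also be done directly in the DOM, without XSLT. The sketch below assumes the <lid> elements are direct children of <artikel> and uses an inline sample string instead of the client's file. Two details make it work: appendChild() on a node that already sits in the same document moves it rather than copying it, and the live node list is copied into an array first so that moving nodes does not upset the loop.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<artikel><kop><titel>Artikel 2.</titel></kop><lid><lidnr>1</lidnr><al>content</al></lid><lid><lidnr>2</lidnr><al>content</al></lid></artikel>');

foreach ($dom->getElementsByTagName('artikel') as $artikel) {
    // getElementsByTagName() returns a live list; snapshot it before moving nodes.
    $lids = iterator_to_array($artikel->getElementsByTagName('lid'));
    if (count($lids) === 0) {
        continue;
    }
    $leden = $dom->createElement('leden');
    // Put the wrapper where the first <lid> currently sits (right after <kop> here).
    $lids[0]->parentNode->insertBefore($leden, $lids[0]);
    foreach ($lids as $lid) {
        // Appending an attached node detaches it from its old parent first.
        $leden->appendChild($lid);
    }
}

$moved = $dom->saveXML($dom->documentElement);
echo $moved, "\n";
```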

How to distinguish between empty element and null-size string in DOMDocument?

I am having trouble loading an XML document into the DOM while preserving the difference between empty tags and null-size strings. Here is the example:
$doc = new DOMDocument("1.0", "utf-8");
$root = $doc->createElement("root");
$doc->appendChild($root);
$element = $doc->createElement("element");
$root->appendChild($element);
echo $doc->saveXML();
produces following XML:
<?xml version="1.0" encoding="utf-8"?>
<root><element/></root>
Empty element, exactly as expected. Now let's add empty text node into element.
$doc = new DOMDocument("1.0", "utf-8");
$root = $doc->createElement("root");
$doc->appendChild($root);
$element = $doc->createElement("element");
$element->appendChild($doc->createTextNode(""));
$root->appendChild($element);
echo $doc->saveXML();
produces following XML:
<?xml version="1.0" encoding="utf-8"?>
<root><element></element></root>
Non-empty element with null-size string. Good! But when I am trying to do:
$doc = new DOMDocument();
$doc->loadXML($xml);
echo $doc->saveXML($doc);
on these XML documents I always get
<?xml version="1.0" encoding="utf-8"?>
<root><element/></root>
i.e. the null-size string is removed and just an empty element is loaded. I believe it happens in loadXML(). Is there any way to convince DOMDocument loadXML() not to convert a null-size string into an empty element? It would be preferable if the DOM had a text node with a null-size string as the element's child.
The solution needs to be in PHP DOM because of how the loaded data is processed further.
The problem with distinguishing between those two is that when DOMDocument loads the serialized XML document, it simply follows the specs.
By the book, in <element></element> there is no empty text node in that element, which is what others have commented already as well.
However, DOMDocument is perfectly fine if you insert an empty text node there on your own. Then you can easily distinguish between a self-closing tag (no children) and an empty element (having one child, an empty text node).
So how do you enter those empty text nodes? For example, by using the XMLReader-based XMLReaderIterator library, specifically its DOMReadingIteration, which is able to build up the document while offering each current XMLReader node for interaction:
$doc = new DOMDocument();
$iterator = new DOMReadingIteration($doc, $reader);
foreach ($iterator as $index => $value) {
// Preserve empty elements as non-self-closing by making them non-empty with a
// single text-node child that has zero-length text
if ($iterator->isEndElementOfEmptyElement()) {
$iterator->getLastNode()->appendChild(new DOMText(''));
}
}
echo $doc->saveXML();
That gives for your input:
<?xml version="1.0" encoding="utf-8"?>
<root><element></element></root>
This output:
<?xml version="1.0"?>
<root><element></element></root>
No strings attached. A finely built DOMDocument. The example is from examples/read-into-dom.php and is a fine proof that it is no problem when you load the document via XMLReader and deal with that single special case you have.
There is no difference for the XML parser on loading. The DOM is exactly the same.
If you load/save an XML format that has a problem with empty tags, you can use an option to avoid the empty tags on save:
$dom = new DOMDocument();
$dom->appendChild($dom->createElement('foo'));
echo $dom->saveXml();
echo "\n";
echo $dom->saveXml(NULL, LIBXML_NOEMPTYTAG);
Output:
<?xml version="1.0"?>
<foo/>
<?xml version="1.0"?>
<foo></foo>
You can trick XSLT processors into not using self-closing elements by having an xsl:value-of insert a value that is just the empty string ''.
Input:
<?xml version="1.0" encoding="utf-8"?>
<root>
<foo>
<bar some="value"></bar>
<self-closing attr="foobar" val="3.5"/>
</foo>
<goo>
<gle>
<nope/>
</gle>
</goo>
</root>
Stylesheet:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[not(node())]">
<xsl:copy>
<xsl:for-each select="@*">
<xsl:attribute name="{name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:for-each>
<xsl:value-of select="''"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output:
<?xml version="1.0" encoding="utf-8"?>
<root>
<foo>
<bar some="value"></bar>
<self-closing attr="foobar" val="3.5"></self-closing>
</foo>
<goo>
<gle>
<nope></nope>
</gle>
</goo>
</root>
To solve this in PHP without the use of a XSLT processor, I can only think of adding empty text nodes to all elements with no children (like you do in the creation of the XML).
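That last idea might look roughly like the sketch below, which assumes a small inline sample and uses the XPath expression //*[not(node())] to find every element without child nodes. Note the limitation: once the document has been loaded, <element/> and <element></element> are already indistinguishable, so this marks all childless elements alike rather than restoring the original distinction.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><element/><other>text</other></root>');

$xpath = new DOMXPath($doc);
// Select every element that has no child nodes at all.
foreach ($xpath->query('//*[not(node())]') as $empty) {
    // A zero-length text node makes the element non-empty for the serializer.
    $empty->appendChild($doc->createTextNode(''));
}

$out = $doc->saveXML($doc->documentElement);
echo $out, "\n";
```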
