Combining several XML documents sequentially with PHP

I am trying to merge xml-documents from folder "files" into one DOMDocument, and create a table of contents.
The documents have the following structure:
<chapter title="This is first chapter">
<section title="This is the first section">
<paragraph title="This is the first paragraph">This is the paragraph content</paragraph>
</section>
</chapter>
The following code is used for merging the XML-files:
foreach(glob("files/*xml") as $filename) {
$count++;
if ($count == 1)
{
$first = new DOMDocument("1.0", 'UTF-8');
$first->formatOutput = true;
$first->load($filename);
$xml = new DOMDocument("1.0", 'UTF-8');
$xml->formatOutput = true;
}
else {
$second = new DOMDocument("1.0", 'UTF-8');
$second->formatOutput = true;
$second->load($filename);
$second = $second->documentElement;
foreach($second->childNodes as $node)
{
$importNode = $first->importNode($node,TRUE);
$first->documentElement->appendChild($importNode);
}
$first->saveXML();
$xml->appendChild($xml->importNode($first->documentElement,true));
}
}
print $xml->saveXML();
Everything seems to work OK, except for a problem with the <chapter> elements. This is what happens when two documents (let's say two near-identical versions of the XML I presented at the beginning of my question) are merged:
<chapter title="This is first chapter">
<section title="This is the first section">
<paragraph title="This is the first paragraph">This is the paragraph content</paragraph>
</section>
<chapter title="This is second chapter">
<section title="This is the first section">
<paragraph title="This is the first paragraph">This is the paragraph content</paragraph>
</section>
</chapter>
</chapter>
I think the reason for this problem is that there is no root element for the merged documents. So, is there a way to add, for example, a <doc> tag around the merged XML?

Look at it from another viewpoint: you are creating a new document that combines all the chapters of your book. So create a book element and import the chapters into it.
// create a new document
$dom = new DOMDocument();
// and add the root element
$dom->appendChild($dom->createElement('book'));
// collect the files to add (same folder as in the question)
$chapters = glob('files/*.xml');
// for each document/xml to add
foreach ($chapters as $chapter) {
// create a dom
$addDom = new DOMDocument();
// load the chapter
$addDom->load($chapter);
// if there is a root node in the loaded xml
if ($addDom->documentElement) {
// append to the result dom
$dom->documentElement->appendChild(
// after importing the document element to the result dom
$dom->importNode($addDom->documentElement, TRUE)
);
}
}
echo $dom->saveXml();
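Since the question also asks for a table of contents, here is a minimal sketch of how one might be built from the merged document. It assumes the chapters are available as in-memory strings (the hypothetical $sources array stands in for the files in "files") and that every chapter carries a title attribute, as in the question's sample.

```php
<?php
// Hypothetical in-memory chapters standing in for the files in "files/".
$sources = [
    '<chapter title="This is first chapter"><section title="This is the first section"/></chapter>',
    '<chapter title="This is second chapter"><section title="This is the first section"/></chapter>',
];

// Result document with a <book> root element.
$book = new DOMDocument('1.0', 'UTF-8');
$book->appendChild($book->createElement('book'));

foreach ($sources as $source) {
    $chapter = new DOMDocument();
    // loadXML() returns false on malformed input; such entries are skipped.
    if ($chapter->loadXML($source) && $chapter->documentElement) {
        $book->documentElement->appendChild(
            $book->importNode($chapter->documentElement, true)
        );
    }
}

// Collect the chapter titles, in document order, for the table of contents.
$xpath = new DOMXPath($book);
$toc = [];
foreach ($xpath->query('/book/chapter/@title') as $title) {
    $toc[] = $title->nodeValue;
}

print_r($toc);
```

Because each chapter is imported as a child of <book>, the chapters stay siblings and the XPath /book/chapter/@title only has to look one level down.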

Related

php xml XPath foreach returning all nodes instead of iterating

I have an XML file like this:
<Articles>
<Article>
<ArticleTitle>
First Book
</ArticleTitle>
</Article>
<Article>
<ArticleTitle>
Second Book
</ArticleTitle>
</Article>
</Articles>
And I am using this kind of PHP script to retrieve the contents of ArticleTitle iteratively:
foreach ($xml->Article as $Article){
$title = $Article->xpath('//ArticleTitle');
echo $title[0];
}
But this displays
First Book
First Book
Instead of
First Book
Second Book
I assumed that foreach ($xml->Article as $Article) would grab each Article node in turn, so that I could access the contents of that node, but that is not what happens. What am I doing wrong?
Your issue is that the XPath you have used is an absolute path (it starts with /), so calling it from a child node has no effect. Use a relative path instead: either simply ArticleTitle, or .//ArticleTitle to allow for other nodes between Article and ArticleTitle. For example:
foreach ($xml->Article as $Article){
$title = $Article->xpath('ArticleTitle');
echo $title[0];
}
foreach ($xml->Article as $Article){
$title = $Article->xpath('.//ArticleTitle');
echo $title[0];
}
Output in both cases is:
First Book
Second Book
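The difference between the absolute and the relative expression can be checked side by side. This is a small self-contained sketch; the inline XML string is a stand-in for the question's file.

```php
<?php
$xml = simplexml_load_string(
    '<Articles>'
    . '<Article><ArticleTitle>First Book</ArticleTitle></Article>'
    . '<Article><ArticleTitle>Second Book</ArticleTitle></Article>'
    . '</Articles>'
);

$absolute = [];
$relative = [];
foreach ($xml->Article as $article) {
    // '//' searches from the document root, whatever the context node is.
    $absolute[] = (string) $article->xpath('//ArticleTitle')[0];
    // './/' searches only below the current <Article>.
    $relative[] = (string) $article->xpath('.//ArticleTitle')[0];
}

echo implode(', ', $absolute), "\n"; // First Book, First Book
echo implode(', ', $relative), "\n"; // First Book, Second Book
```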
This should also work with your original XPath expression, because here it is evaluated once against the whole document:
<?php
$xml = <<<'XML'
<Articles>
<Article>
<ArticleTitle>First Book</ArticleTitle>
</Article>
<Article>
<ArticleTitle>Second Book</ArticleTitle>
</Article>
</Articles>
XML;
$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXpath($document);
$elements = $xpath->query('//ArticleTitle');
foreach($elements as $element)
echo ($element->nodeValue), "\n";
?>
Output :
First Book
Second Book

Remove child from XML with PHP DOM

I want to remove the first video element (video src=time.mp4) from this XML (filename.xml) and save the result as filename4.smil:
<?xml version="1.0" encoding="utf-8"?>
<smil>
<stream name="mysq"/>
<playlist name="Default" playOnStream="mysq" repeat="true" scheduled="2010-01-01 01:01:00">
<video src="time.mp4" start="0" length="-1"> </video>
<video src="sample.mp4" start="0" length="-1"> </video>
</playlist>
</smil>
I am using this code, but it is not working:
<?php
$doc = new DOMDocument;
$doc->load("filename.xml");
$thedocument = $doc->documentElement;
//this gives you a list of the messages
$list0 = $thedocument->getElementsByTagName('playlist');
$list = $list0->item(0);
$nodeToRemove = null;
foreach ($list as $domElement){
$videos = $domElement->getElementsByTagName( 'video' );
$video = $videos->item(0);
$attrValue = $video->getAttribute('src');
if ($attrValue == 'time.mp4') {
$nodeToRemove = $videos; //will only remember last one- but this is just an example :)
}
}
//Now remove it.
if ($nodeToRemove != null)
$thedocument->removeChild($nodeToRemove);
$doc->save('filename4.smil');
?>
Assuming that there is only one playlist item and you want to remove the first video element from it, here are two methods.
This one uses getElementsByTagName() as you do in your code, but simply picks the first item from each list and then removes it (you have to go through parentNode to remove the child node).
$playlist = $doc->getElementsByTagName('playlist')->item(0);
$video = $playlist->getElementsByTagName( 'video' )->item(0);
$video->parentNode->removeChild($video);
This version uses XPath, which is more flexible. It looks for playlist elements with a video element somewhere inside, then again takes the first match and removes it:
$xp = new DOMXPath($doc);
$video = $xp->query('//playlist//video')->item(0);
$video->parentNode->removeChild($video);
The problem with
$thedocument->removeChild($nodeToRemove);
is that you are trying to remove the node from the document element, which is not its direct parent. Because the node is nested deeper in the hierarchy, the call fails; you need to remove it from its direct parent.
Using Xpath expressions you can fetch video nodes with a specific src attribute, iterate them and remove them.
$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXpath($document);
$expression = '/smil/playlist/video[@src="time.mp4"]';
foreach ($xpath->evaluate($expression) as $video) {
$video->parentNode->removeChild($video);
}
var_dump($document->saveXML());
It is possible to fetch nodes by position as well: /smil/playlist/video[1].
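A positional lookup could look like the following sketch. The inline SMIL string is a shortened stand-in for filename.xml, and the $remaining array exists only to show what is left afterwards.

```php
<?php
$smil = <<<'XML'
<smil>
  <playlist name="Default">
    <video src="time.mp4" start="0" length="-1"/>
    <video src="sample.mp4" start="0" length="-1"/>
  </playlist>
</smil>
XML;

$document = new DOMDocument();
$document->loadXML($smil);
$xpath = new DOMXPath($document);

// video[1] is the first <video> child of a <playlist>; XPath positions start at 1.
$first = $xpath->query('/smil/playlist/video[1]')->item(0);
if ($first !== null) {
    // Always remove through the direct parent.
    $first->parentNode->removeChild($first);
}

// List the src attributes of the videos that are left.
$remaining = [];
foreach ($xpath->query('/smil/playlist/video') as $video) {
    $remaining[] = $video->getAttribute('src');
}
print_r($remaining);
```

To write the result to disk as in the question, $document->save('filename4.smil') would follow the removal.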

How to add nodes to a multi-level XML from an array?

I have an array $arr=array("A-B-C-D","A-B-E","A-B-C-F") and my expected output of the XML should be
<root>
<A>
<B>
<C>
<D></D>
<F></F>
</C>
<E></E>
</B>
</A>
</root>
I have already done the code that creates a new node of the XML for the first array element i.e. A-B-C-D. But when I move to the second element I need to check how many nodes are already created (A-B) and then add the new node based on that in the proper position.
So how do I traverse the XML and find the exact position where the new node should be attached?
My current code looks like this:
$arr=explode("-",$input);
$doc = new DomDocument();
$doc->formatOutput=true;
$doc->LoadXML('<root/>');
$root = $doc->documentElement;
$comm = $doc->createElement('comm');
$root->appendChild($comm);
foreach($arr as $a2) {
$newcomm = $doc->createElement($a2);
$community->appendChild($newcomm);
$community=$newcomm;
}
Should I use XPath, or would some other method be easier?
To stick with using DOMDocument, I've added an extra loop to allow you to add all of the original array items in. The main thing is before adding a new item in, check if it's already there...
<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
$set=array("A-B-C-D","A-B-E","A-B-C-F-G", "A-B-G-Q");
$doc = new DomDocument();
$doc->formatOutput=true;
$doc->LoadXML('<root/>');
foreach ( $set as $input ) {
$arr=explode("-",$input);
$base = $doc->documentElement;
foreach($arr as $a2) {
$newcomm = null;
// Decide if the element already exists.
foreach ( $base->childNodes as $nextElement ) {
if ( $nextElement instanceof DOMElement
&& $nextElement->tagName == $a2 ) {
$newcomm = $nextElement;
}
}
if ( $newcomm == null ) {
$newcomm = $doc->createElement($a2);
$base->appendChild($newcomm);
}
$base=$newcomm;
}
}
echo $doc->saveXML();
As there is no quick way (as far as I know) to check for a child with a specific tag name, the code just looks through all of the child nodes for a DOMElement with the same name.
I started out using getElementsByTagName(), but this finds any descendant with that name, not just children at the current level.
The output from above is...
<?xml version="1.0"?>
<root>
<A>
<B>
<C>
<D/>
<F>
<G/>
</F>
</C>
<E/>
<G>
<Q/>
</G>
</B>
</A>
</root>
I added a few other items in to show that it adds things in at the right place.
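As the question asks about XPath: one possible alternative to the inner loop is a relative XPath query with the current element as context node, which matches direct children only. This is a sketch, not a replacement for the code above; it assumes every array part is a valid XML element name, since the name is embedded in the expression as is.

```php
<?php
$set = ['A-B-C-D', 'A-B-E', 'A-B-C-F'];

$doc = new DOMDocument();
$doc->loadXML('<root/>');
$xpath = new DOMXPath($doc);

foreach ($set as $input) {
    $base = $doc->documentElement;
    foreach (explode('-', $input) as $name) {
        // "./name" matches only direct children of the context node $base.
        // Assumes $name is a valid XML element name, so it is safe to embed.
        $child = $xpath->query('./' . $name, $base)->item(0);
        if ($child === null) {
            // Not there yet: create it; appendChild() returns the appended node.
            $child = $base->appendChild($doc->createElement($name));
        }
        $base = $child;
    }
}

echo $doc->saveXML($doc->documentElement), "\n";
```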

Remove empty elements from XML in php

Say I have this XML and I need to remove empty elements (elements that don't contain data at all) such as:
...
<data>
<!-- keep oneDay -->
<oneDay>
<startDate>1450288800000</startDate>
<endDate>1449086400000</endDate>
</oneDay>
<!-- remove range entirely -->
<range>
<startDate/>
<endDate/>
</range>
<!-- remove deadline entirely -->
<deadline>
<date/>
</deadline>
</data>
...
The output then should be
...
<oneDay>
<startDate>1450288800000</startDate>
<endDate>1449086400000</endDate>
</oneDay>
...
I'm looking for a dynamic solution that would work on any cases like this regardless of the literal name of the element.
SOLUTION (UPDATED)
It turns out that //*[not(normalize-space())] returns all elements that have no non-whitespace text content, so no recursion is needed.
foreach($xpath->query('//*[not(normalize-space())]') as $node ) {
$node->parentNode->removeChild($node);
}
Check out @har07's solution for more details.
SOLUTION
The XPath approach provided by @manuelbc works, but only on leaf elements: the empty children are removed, but their parent nodes stay behind, now empty themselves.
However, the following repeats the query until the document has no empty nodes left.
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadxml('<XML STRING GOES HERE>');
$xpath = new DOMXPath($doc);
while (($notNodes = $xpath->query('//*[not(node())]')) && ($notNodes->length)) {
foreach($notNodes as $node) {
$node->parentNode->removeChild($node);
}
}
$doc->formatOutput = true;
echo $doc->saveXML();
You can do it with XPath
<?php
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadxml('<data>
<!-- keep oneDay -->
<oneDay>
<startDate>1450288800000</startDate>
<endDate>1449086400000</endDate>
</oneDay>
<!-- remove range entirely -->
<range>
<startDate/>
<endDate/>
</range>
<!-- remove deadline entirely -->
<deadline>
<date/>
</deadline>
</data>');
$xpath = new DOMXPath($doc);
foreach( $xpath->query('//*[not(node())]') as $node ) {
$node->parentNode->removeChild($node);
}
$doc->formatOutput = true;
echo $doc->savexml();
See original solution here:
Remove empty tags from a XML with PHP
The XPath in the other answer only returns elements that are empty in the sense that they have no child node of any kind (no element node, no text node, nothing). To get all empty elements according to your definition, that is, elements without non-empty text content, try the following XPath instead:
//*[not(normalize-space())]
output :
<?xml version="1.0"?>
<data>
<!-- keep oneDay -->
<oneDay>
<startDate>1450288800000</startDate>
<endDate>1449086400000</endDate>
</oneDay>
<!-- remove range entirely -->
<!-- remove deadline entirely -->
</data>
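To see how the two expressions differ, the following sketch runs both against a shortened version of the sample and then removes the matches of the second one. The $names helper is hypothetical and only there to make the node lists easy to print.

```php
<?php
$source = '<data><oneDay><startDate>1450288800000</startDate></oneDay>'
    . '<range><startDate/><endDate/></range>'
    . '<deadline><date/></deadline></data>';

$doc = new DOMDocument();
$doc->loadXML($source);
$xpath = new DOMXPath($doc);

// Hypothetical helper: turn a node list into a list of element names.
$names = function (DOMNodeList $nodes) {
    $out = [];
    foreach ($nodes as $node) {
        $out[] = $node->nodeName;
    }
    return $out;
};

// Elements with no child nodes at all: only the leaves.
$noChildren = $names($xpath->query('//*[not(node())]'));
// Elements whose text content is empty: the leaves and their text-less ancestors.
$noText = $names($xpath->query('//*[not(normalize-space())]'));

print_r($noChildren);
print_r($noText);

// Removing the second set in one pass also removes <range> and <deadline>.
foreach ($xpath->query('//*[not(normalize-space())]') as $node) {
    $node->parentNode->removeChild($node);
}
echo $doc->saveXML($doc->documentElement), "\n";
```

The first list contains only startDate, endDate and date, which is why that expression needs the while loop shown earlier, while the second list already includes range and deadline.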

appendChild using DOMDocument/PHP/XML

I am trying to update my XML file based on an HTML form processed by PHP but the new XML snippet I am trying to append to specific areas of my current XML just keeps getting added to the end of my document.
$specific_node = "0"; //this is normally set by a select input from the form.
$doc = new DOMDocument();
$doc->load( 'rfp_files.xml' );
$doc->formatOutput = true;
//below is where my issue is: the variable '$specific_node' can be one of three options (0, 1, 2), and what I am trying to do is find the matching child of content_sets, i.e. the first, second or third child element; that is where I want to add my new bit of XML
$r = $doc->getElementsByTagname('content_sets')->item($specific_node);
//This is where I build out my new XML to append
$fileName = $doc->createElement("file_name");
$fileName->appendChild(
$doc->createTextNode( $Document_Array["url"] )
);
$b->appendChild( $fileName );
//this is where I add the new XML to the child node mentioned earlier in the script.
$r->appendChild( $b );
XML Example:
<?xml version="1.0" encoding="UTF-8"?>
<content_sets>
<doc_types>
<article>
<doc_name>Additional</doc_name>
<file_name>Additional.docx</file_name>
<doc_description>Test Word document. Please remove when live.</doc_description>
<doc_tags>word document,test,rfp template,template,rfp</doc_tags>
<last_update>01/26/2013 23:07</last_update>
</article>
</doc_types>
<video_types>
<article>
<doc_name>Test Video</doc_name>
<file_name>test_video.avi</file_name>
<doc_description>Test video. Please remove when live.</doc_description>
<doc_tags>test video,video, avi,xvid,svid avi</doc_tags>
<last_update>01/26/2013 23:07</last_update>
</article>
</video_types>
<image_types>
<article>
<doc_name>Test Image</doc_name>
<file_name>logo.png</file_name>
<doc_description>Logo transparent background. Please remove when live.</doc_description>
<doc_tags>png,logo,logo png,no background,graphic,hi res</doc_tags>
<last_update>01/26/2013 23:07</last_update>
</article>
</image_types>
</content_sets>
This is getting the root element:
$specific_node = "0";
$r = $doc->getElementsByTagname('content_sets')->item($specific_node);
So you are appending a child to the root element, which is why it always ends up at the end of the document. You need to get the children of the root element like this:
$children = $doc->documentElement->childNodes;
This can return several types of node, but you are only interested in 'element' type nodes. It's not very elegant, but the only way I've found to get a child element by position is looping like this...
$j = 0;
foreach ($doc->documentElement->childNodes as $r) {
if ($r->nodeType === XML_ELEMENT_NODE && $j++ == $specific_node) {
break;
}
}
if ($j <= $specific_node) {
// handle situation where $specific_node is more than number of elements
}
You could use getElementsByTagName() if you can pass the name of the node required instead of the ordinal position, or change the XML so that the child elements all have the same name and use an attribute to differentiate them.
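Another option, not mentioned above, is to let XPath pick the child element by position. The sketch below assumes a stripped-down version of the question's XML and an <article> element as the node to append; note that XPath positions are 1-based while $specific_node is 0-based.

```php
<?php
$source = '<content_sets><doc_types/><video_types/><image_types/></content_sets>';
$specific_node = "1"; // zero-based index, as delivered by the form in the question

$doc = new DOMDocument();
$doc->loadXML($source);
$xpath = new DOMXPath($doc);

// "*" matches element children only, so text and comment nodes are skipped.
// XPath positions are 1-based, hence the +1.
$position = (int) $specific_node + 1;
$r = $xpath->query('/*/*[' . $position . ']')->item(0);

if ($r === null) {
    // $specific_node is larger than the number of child elements.
    echo "No child element at index $specific_node\n";
} else {
    $r->appendChild($doc->createElement('article'));
    echo $r->nodeName, "\n"; // video_types
}
```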
