Remove empty elements from XML in php - php

Say I have this XML and I need to remove empty elements (elements that don't contain data at all) such as:
...
<data>
<!-- keep oneDay -->
<oneDay>
<startDate>1450288800000</startDate>
<endDate>1449086400000</endDate>
</oneDay>
<!-- remove range entirely -->
<range>
<startDate/>
<endDate/>
</range>
<!-- remove deadline entirely -->
<deadline>
<date/>
</deadline>
</data>
...
The output then should be
...
<oneDay>
<startDate>1450288800000</startDate>
<endDate>1449086400000</endDate>
</oneDay>
...
I'm looking for a dynamic solution that would work on any cases like this regardless of the literal name of the element.
SOLUTION (UPDATED)
It turns out that using //*[not(normalize-space())] returns all elements without non-empty text content (no need for recursion).
foreach($xpath->query('//*[not(normalize-space())]') as $node ) {
$node->parentNode->removeChild($node);
}
Check out @har07's solution for more details.
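As a rough, self-contained sketch of the updated solution: the function name removeEmptyElements and the sample input below are made up for illustration, and the parentNode check is only a defensive assumption. Since DOMXPath::query() returns a static list of nodes, removing them inside the loop should be safe.

```php
<?php
// Hypothetical helper: removes every element whose text content is empty
// or whitespace-only, then returns the serialized root element.
function removeEmptyElements(string $xml): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);
    // The query result is a snapshot, so removing nodes while iterating is fine.
    foreach ($xpath->query('//*[not(normalize-space())]') as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    return $doc->saveXML($doc->documentElement);
}

// Sample input (illustrative, no whitespace between the tags).
$xml = '<data><oneDay><startDate>1450288800000</startDate></oneDay>'
     . '<range><startDate/><endDate/></range><deadline><date/></deadline></data>';

echo removeEmptyElements($xml), "\n";
```

Here range and deadline are matched together with their empty children, so a single pass is enough.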
SOLUTION
The XPath approach provided by @manuelbc works but only on child elements (meaning that the children will be gone but the parent nodes of those will stay... empty as well).
However, this will work recursively until the XML document is out of empty nodes.
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadXML('<XML STRING GOES HERE>');
$xpath = new DOMXPath($doc);
while (($notNodes = $xpath->query('//*[not(node())]')) && ($notNodes->length)) {
foreach($notNodes as $node) {
$node->parentNode->removeChild($node);
}
}
$doc->formatOutput = true;
echo $doc->saveXML();

You can do it with XPath
<?php
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadXML('<data>
<!-- keep oneDay -->
<oneDay>
<startDate>1450288800000</startDate>
<endDate>1449086400000</endDate>
</oneDay>
<!-- remove range entirely -->
<range>
<startDate/>
<endDate/>
</range>
<!-- remove deadline entirely -->
<deadline>
<date/>
</deadline>
</data>');
$xpath = new DOMXPath($doc);
foreach( $xpath->query('//*[not(node())]') as $node ) {
$node->parentNode->removeChild($node);
}
$doc->formatOutput = true;
echo $doc->saveXML();
See original solution here:
Remove empty tags from a XML with PHP

The XPath in the other answer only returns elements that are empty in the sense that they have no child node of any kind (no element node, no text node, nothing). To get all empty elements according to your definition, that is, elements without non-empty text content, try the following XPath instead:
//*[not(normalize-space())]
eval.in demo
Output:
<?xml version="1.0"?>
<data>
<!-- keep oneDay -->
<oneDay>
<startDate>1450288800000</startDate>
<endDate>1449086400000</endDate>
</oneDay>
<!-- remove range entirely -->
<!-- remove deadline entirely -->
</data>
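To make the difference between the two expressions concrete, here is a small sketch on a made-up document (the element names a, b, c and the $names helper are assumptions for illustration only). It collects the names of the elements each expression matches.

```php
<?php
$doc = new DOMDocument();
// Illustrative input: <b> contains only an empty child element.
$doc->loadXML('<data><a>text</a><b><c/></b></data>');
$xpath = new DOMXPath($doc);

// Hypothetical helper: names of the elements matched by an XPath expression.
$names = function (string $expr) use ($xpath): array {
    $out = [];
    foreach ($xpath->query($expr) as $node) {
        $out[] = $node->nodeName;
    }
    return $out;
};

// Elements with no child node of any kind: only <c>.
$noChildren = $names('//*[not(node())]');

// Elements without non-empty text content: <b> and <c>.
$noText = $names('//*[not(normalize-space())]');

echo implode(', ', $noChildren), "\n";
echo implode(', ', $noText), "\n";
```

The parent b is caught by the second expression because its string value, the concatenated text of all descendants, is empty.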

Related

how could i import xml list with same name using php domdocument

Using PHP DOMDocument to import an XML file, I can't get the list of "tags".
I have tried multiple ways, but I can't get it to work.
XML document:
<resource>
<title>hello world</title>
<tags>
<resource>great</resource>
<resource>fun</resource>
<resource>omg</resource>
</tags>
</resource>
php :
<?php
$url='test.xml';
$doc = new DOMDocument();
$doc->load($url);
$feed = $doc->getElementsByTagName("resource");
foreach($feed as $entry) {
echo $entry->getElementsByTagName("title")->item(0)->nodeValue;
echo '<br>';
echo $entry->getElementsByTagName("tags")->item(0)->nodeValue;
echo '<br>';
}
I expect the output to be a list like this:
hello world
great
fun
omg
But the actual output is NOT a list; the result is a sentence without spaces:
hello world greatfunomg
DOMDocument::getElementsByTagName() returns all descendant element nodes with the specified name. DOMElement::$nodeValue will return the text content of an element node including all its descendants.
In your case, echo $entry->getElementsByTagName("tags")->item(0)->nodeValue fetches all tags elements, accesses the first node of that list, and outputs its text content. That is greatfunomg.
Using the DOM methods to access nodes is verbose: it requires a lot of code and, if you want to do it right, a lot of conditions. It is a lot easier if you use XPath expressions. They allow you to fetch scalar values and lists of nodes from a DOM.
$xml = <<<'XML'
<_>
<resource>
<title>hello world</title>
<tags>
<resource>great</resource>
<resource>fun</resource>
<resource>omg</resource>
</tags>
</resource>
</_>
XML;
$document = new DOMDocument();
$document->loadXML($xml);
// create an Xpath instance for the document
$xpath = new DOMXpath($document);
// fetch resource nodes that are direct children of the document element
$entries = $xpath->evaluate('/*/resource');
foreach($entries as $entry) {
// fetch the title node of the current entry as a string
echo $xpath->evaluate('string(title)', $entry), "\n";
// fetch resource nodes that are children of the tags node
// and map them into an array of strings
$tags = array_map(
function(\DOMElement $node) {
return $node->textContent;
},
iterator_to_array($xpath->evaluate('tags/resource', $entry))
);
echo implode(', ', $tags), "\n";
}
Output:
hello world
great, fun, omg
If you just need to output the first piece of text for each <resource> element, wherever it is, then use XPath to select the elements (making sure you ignore whitespace on load), pick the first child of each one, and output its node value.
Ignoring the whitespace on load is important, because otherwise the whitespace creates text nodes for all the padding around each element, and so the first child of each <resource> element may just be a new line or tab.
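A quick way to see the effect of the whitespace setting is to check the node type of the first child under both settings. This is only a sketch: the helper name firstChildType and the indented sample string are assumptions made up for the illustration.

```php
<?php
// Illustrative input with indentation between the elements.
$xml = "<root>\n  <resource>\n    <title>hello world</title>\n  </resource>\n</root>";

// Hypothetical helper: node type of the first child of the first <resource>.
function firstChildType(string $xml, bool $preserve): int
{
    $doc = new DOMDocument();
    // Must be set before loading, it affects how the parser builds the tree.
    $doc->preserveWhiteSpace = $preserve;
    $doc->loadXML($xml);

    return $doc->getElementsByTagName('resource')->item(0)->firstChild->nodeType;
}

// With whitespace preserved, the first child is the indentation text node.
var_dump(firstChildType($xml, true) === XML_TEXT_NODE);

// With whitespace ignored, the first child is the <title> element.
var_dump(firstChildType($xml, false) === XML_ELEMENT_NODE);
```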
$xml = '<root>
<resource>
<title>hello world</title>
<tags>
<resource>great</resource>
<resource>fun</resource>
<resource>omg</resource>
</tags>
</resource>
</root>';
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);
// $doc->load($filename); // If loading from a file
$xpath = new DOMXpath($doc);
$resources = $xpath->query("//resource");
foreach ( $resources as $resource ){
echo $resource->firstChild->nodeValue.PHP_EOL;
}
The output of which is
hello world
great
fun
omg
Or without using XPath...
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);
//$doc->load($filename);
$resources = $doc->getElementsByTagName("resource");
foreach ( $resources as $resource ){
echo $resource->firstChild->nodeValue.PHP_EOL;
}

PHP DOM Cut xml in to pieces and save each child with parent separately

I have next type of XML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test SYSTEM "dtd">
<root>
<tag1>
<1>Name</1>
<2>Num1</2>
<3>NumOrder</3>
<4>test</5>
<6>line</6>
<7>HTTP </7>
<8>1</8>
<9></9>
</tag1>
<tag2>
<1>Name</1>
<2>Num1</2>
<3>NumOrder</3>
<4>test</5>
<6>line</6>
<7>HTTP </7>
<8>1</8>
<9></9>
</tag2>
...
<tagN>
<1>Name</1>
<2>Num1</2>
<3>NumOrder</3>
<4>test</5>
<6>line</6>
<7>HTTP </7>
<8>1</8>
<9></9>
</tagN>
</root>
And I need to get the root with each child element separately, saved as HTML in an array:
array = [rootwithchild1,rootwithchild2...N];
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test SYSTEM "dtd">
<root>
<tagN>
<1>Name</1>
<2>Num1</2>
<3>NumOrder</3>
<4>test</5>
<6>line</6>
<7>HTTP </7>
<8>1</8>
<9></9>
</tagN>
</root>
For now I create two DOM documents: in one I get all the children separately, and in the other I have deleted all the children and left only the root. At this step I wanted to add each child to the root, save as HTML, delete the child, and so on for each child, but this doesn't work.
$bodyNode = $copydoc->getElementsByTagName('root')->item(0);
foreach ($mini as $value) {
$bodyNode->appendChild($value);
$result[] = $copydoc->saveHTML();
$bodyNode->removeChild($value);
}
The error occurs on $bodyNode->appendChild($value);
$mini is the array of extracted children.
Lib: $doc = new DOMDocument();
Can anyone advise how to do this right? Maybe it is better to use XPath or something else?
Thanks
I would simply create a new document that contains only the root element and a “fake” initial child:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test SYSTEM "dtd">
<root>
<fakechild />
</root>
After that, loop over the child elements of the original document – and for each of those perform the following steps:
import the child node from the original document into the new document using DOMDocument::importNode
replace the current child node of the root element of the new document with the imported node using DOMNode::replaceChild with the firstChild of the root element as second parameter
save the new document
(Having the <fakechild /> in the root element to begin with is not technically necessary, a simple whitespace text node should do as well – but with an empty root element this would not work in such a straight fashion, because the firstChild would give you NULL in the first loop iteration, so you would not have a node to feed to DOMNode::replaceChild as second parameter. Of course you could do additional checks for that and use appendChild instead of replaceChild for the first item … but why complicate stuff more than necessary.)
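The steps above could be sketched roughly as follows. The sample input uses made-up, valid element names (tag1, tag2, name), since the names in the question are placeholders, and the variable names are assumptions for illustration.

```php
<?php
// Illustrative source document.
$source = new DOMDocument();
$source->loadXML('<root><tag1><name>A</name></tag1><tag2><name>B</name></tag2></root>');

// New document that holds only the root element and a placeholder child.
$target = new DOMDocument();
$target->loadXML('<root><fakechild/></root>');
$root = $target->documentElement;

$result = [];
foreach ($source->documentElement->childNodes as $child) {
    if ($child->nodeType !== XML_ELEMENT_NODE) {
        continue; // skip whitespace text nodes and comments
    }
    // Deep-copy the child into the new document, then swap it in
    // for whatever currently sits inside the root element.
    $imported = $target->importNode($child, true);
    $root->replaceChild($imported, $root->firstChild);
    $result[] = $target->saveXML($root);
}

print_r($result);
```

The source document is never modified, so iterating its childNodes directly is not a problem here.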
DOMDocument::getElementsByTagName() returns a live result. So if you remove a node from the DOM, it is removed from the node list as well.
You can iterate the list backwards...
for ($i = $nodes->length - 1; $i >= 0; $i--) {
$node = $nodes->item($i);
...
}
... or copy it to an array:
foreach (iterator_to_array($nodes) as $node) {
...
}
Node lists from DOMXpath::evaluate() are not affected that way. XPath allows a more specific selection of nodes, too.
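A minimal sketch of that difference, using a made-up three-item document: after one node is removed, the list from getElementsByTagName() shrinks, while the list obtained from DOMXpath::evaluate() keeps its original length.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><item/><item/><item/></root>'); // illustrative input

$live = $doc->getElementsByTagName('item');
$xpath = new DOMXpath($doc);
$static = $xpath->evaluate('//item');

// Remove the first <item> from the document.
$first = $live->item(0);
$first->parentNode->removeChild($first);

$liveCount = $live->length;     // reflects the current tree
$staticCount = $static->length; // still the nodes found when evaluate() ran

echo $liveCount, ' ', $staticCount, "\n";
```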
$xpath = new DOMXpath($domDocument);
$nodes = $xpath->evaluate('/root/*');
foreach (iterator_to_array($nodes) as $node) {
...
}
But I wonder why are you modifying (destroying) the original XML source?
I would create a new document to act as a template, never removing nodes, only creating new documents and importing nodes into them:
// load the original source
$source= new DOMDocument();
$source->loadXml($xml);
$xpath = new DOMXpath($source);
// create a template dom
$template = new DOMDocument();
$parent = $template;
// add a node and all its ancestors to the template
foreach ($xpath->evaluate('/root/part[1]/ancestor-or-self::*') as $node) {
$parent = $parent->appendChild($template->importNode($node, FALSE));
}
// for each of the child element nodes
foreach ($xpath->evaluate('/root/part/*') as $node) {
// create a new target
$target = new DOMDocument();
// import the nodes from the template
$target->appendChild($target->importNode($template->documentElement, TRUE));
// find the first element node that has no child element nodes
$targetXpath = new DOMXpath($target);
$targetNode = $targetXpath->evaluate('//*[count(*) = 0]')->item(0);
// append the child node from the original xml
$targetNode->appendChild($target->importNode($node, TRUE));
echo $target->saveXml(), "\n\n";
}
Demo: https://eval.in/191304

Remove empty XML tags with PHP but ignore tags with attributes

I know it's possible to remove empty XML tags using XPath (as seen here - Remove empty tags from a XML with PHP)
$xpath = new DOMXPath($doc);
foreach( $xpath->query('//*[not(node())]') as $node ) {
$node->parentNode->removeChild($node);
}
$doc->formatOutput = true;
echo $doc->saveXML();
But is it possible to use a similar method that still removes empty tags but keeps the ones that have attributes?
e.g.
<range starts_at="2012-11-22" ends_at="2012-11-26"></range>
Try with this XPath
'//*[not(node()) and not(@*)]'
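For illustration, a short sketch on a made-up document (the elements empty and name are assumptions). In XPath, @* selects the attributes of an element, so not(@*) restricts the match to elements without any attribute.

```php
<?php
$doc = new DOMDocument();
// Illustrative input: one empty element with attributes, one without.
$doc->loadXML(
    '<root><range starts_at="2012-11-22" ends_at="2012-11-26"></range>'
    . '<empty/><name>x</name></root>'
);

$xpath = new DOMXPath($doc);
// Remove elements that have neither child nodes nor attributes.
foreach ($xpath->query('//*[not(node()) and not(@*)]') as $node) {
    $node->parentNode->removeChild($node);
}

// Collect what is left under the root element.
$remaining = [];
foreach ($doc->documentElement->childNodes as $child) {
    $remaining[] = $child->nodeName;
}

echo implode(', ', $remaining), "\n";
```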

appendChild using DOMDocument/PHP/XML

I am trying to update my XML file based on an HTML form processed by PHP but the new XML snippet I am trying to append to specific areas of my current XML just keeps getting added to the end of my document.
$specific_node = "0"; //this is normally set by a select input from the form.
$doc = new DOMDocument();
$doc->load( 'rfp_files.xml' );
$doc->formatOutput = true;
// Below is where I have the problem. The variable $specific_node can be one of three options (0, 1, 2), and I am trying to find the corresponding child of content_sets, i.e. the first, second or third child element. That is where I will add my new bit of XML.
$r = $doc->getElementsByTagName('content_sets')->item($specific_node);
//This is where I build out my new XML to append
$fileName = $doc->createElement("file_name");
$fileName->appendChild(
$doc->createTextNode( $Document_Array["url"] )
);
$b->appendChild( $fileName );
// This is where I add the new XML to the child node mentioned earlier in the script.
$r->appendChild( $b );
XML Example:
<?xml version="1.0" encoding="UTF-8"?>
<content_sets>
<doc_types>
<article>
<doc_name>Additional</doc_name>
<file_name>Additional.docx</file_name>
<doc_description>Test Word document. Please remove when live.</doc_description>
<doc_tags>word document,test,rfp template,template,rfp</doc_tags>
<last_update>01/26/2013 23:07</last_update>
</article>
</doc_types>
<video_types>
<article>
<doc_name>Test Video</doc_name>
<file_name>test_video.avi</file_name>
<doc_description>Test video. Please remove when live.</doc_description>
<doc_tags>test video,video, avi,xvid,svid avi</doc_tags>
<last_update>01/26/2013 23:07</last_update>
</article>
</video_types>
<image_types>
<article>
<doc_name>Test Image</doc_name>
<file_name>logo.png</file_name>
<doc_description>Logo transparent background. Please remove when live.</doc_description>
<doc_tags>png,logo,logo png,no background,graphic,hi res</doc_tags>
<last_update>01/26/2013 23:07</last_update>
</article>
</image_types>
</content_sets>
This is getting the root element:
$specific_node = "0";
$r = $doc->getElementsByTagname('content_sets')->item($specific_node);
So you are appending a child onto the root which is why you always see it added near the end of the document. You need to get the children of the root element like this:
$children = $doc->documentElement->childNodes;
This can return several types of node, but you are only interested in 'element' type nodes. It's not very elegant, but the only way I've found to get a child element by position is looping like this...
$j = 0;
$r = null;
foreach ($doc->documentElement->childNodes as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE && $j++ == $specific_node) {
        $r = $child;
        break;
    }
}
if ($r === null) {
    // handle situation where $specific_node is more than number of elements
}
You could use getElementsByTagName() if you can pass the name of the node required instead of the ordinal position, or change the XML so that the child elements all have the same name and use an attribute to differentiate them.
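As a possible alternative to the loop (this is a different technique, not the approach described above), an XPath positional predicate can pick the n-th child element of the root directly. The sketch below uses a stripped-down version of the XML and a made-up file name; note that XPath positions start at 1.

```php
<?php
$doc = new DOMDocument();
// Stripped-down, illustrative version of the XML from the question.
$doc->loadXML('<content_sets><doc_types/><video_types/><image_types/></content_sets>');
$specific_node = "1"; // zero-based position, normally taken from the form

$xpath = new DOMXPath($doc);
// /*/* selects only element children of the root; +1 converts to a 1-based position.
$r = $xpath->query('/*/*[' . ((int) $specific_node + 1) . ']')->item(0);

if ($r === null) {
    throw new OutOfRangeException('No child element at that position');
}

// Append a new element to the selected child (example.docx is a placeholder value).
$fileName = $doc->createElement('file_name');
$fileName->appendChild($doc->createTextNode('example.docx'));
$r->appendChild($fileName);

echo $doc->saveXML($r), "\n";
```

Because /*/* only matches elements, whitespace text nodes and comments between the children do not shift the position.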

update XML using php issues with getElementsByTagName and identifying the correct childnode

How do I identify the correct XML node based on a $_POST variable from a user-submitted form? Below is my current XML with a note on where I want the new XML data to be placed, and the PHP that takes the form data and prepares it to be inserted into the XML document.
XML:
<?xml version="1.0" encoding="UTF-8"?>
<content_sets>
<!-- The new article node will be placed inside of one of the content_sets child nodes. Either doc_types, video_types, image_types. -->
<doc_types>
<article>
<doc_name>Test Proposal</doc_name>
<file_name>tes_prop.docx</file_name>
<doc_description>Test Word document. Please remove when live.</doc_description>
<doc_tags>word document,test,rfp template,template,rfp</doc_tags>
<last_update>01/26/2013 23:07</last_update>
</article>
</doc_types>
<video_types>
<article>
<doc_name>Test Video</doc_name>
<file_name>test_video.avi</file_name>
<doc_description>Test video. Please remove when live.</doc_description>
<doc_tags>test video,video, avi,xvid,svid avi</doc_tags>
<last_update>01/26/2013 23:07</last_update>
</article>
</video_types>
<image_types>
<article>
<doc_name>Test Image</doc_name>
<file_name>logo.png</file_name>
<doc_description>Logo transparent background. Please remove when live.</doc_description>
<doc_tags>png,logo,logo png,no background,graphic,hi res</doc_tags>
<last_update>01/26/2013 23:07</last_update>
</article>
</image_types>
</content_sets>
PHP on submit:
$file_type = $_POST['file_type'];
//This is where the node name comes from
$doc = new DOMDocument();
$doc->load( 'rfp_files.xml' );
$doc->formatOutput = true;
$r = $doc->getElementsByTagName("content_sets")->getElementsByTagName($file_type);
// The line above is where my issue is coming from. I am not identifying the child node of content_sets correctly.
$b = $doc->createElement("article");
$titleName = $doc->createElement("doc_name");
$titleName->appendChild(
$doc->createTextNode( $Document_Array["name"] )
);
$b->appendChild( $titleName );
$r->appendChild( $b );
$doc->save("rfp_files.xml");
I did not show the form or the rest of article's child nodes. If needed I can post more of my code.
When using getElementsByTagName(), you need to use the item() method to retrieve a specific node from the node list. Even if there is only one item in the node list, you still have to do this.
getElementsByTagName() always returns a DOMNodeList, so you either have to loop through the list or retrieve a specific item via the item() method. There is an example here: http://php.net/manual/en/domnodelist.item.php
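Applied to the code from the question, that could look roughly like the sketch below. The XML is a stripped-down stand-in, the hard-coded $file_type replaces $_POST['file_type'], and the null check is an added assumption about how a missing node might be handled.

```php
<?php
$doc = new DOMDocument();
// Stripped-down, illustrative version of rfp_files.xml.
$doc->loadXML('<content_sets><doc_types/><video_types/></content_sets>');
$file_type = 'video_types'; // stands in for $_POST['file_type']

// item(0) picks the first matching element from the node list, or null if none exists.
$r = $doc->getElementsByTagName($file_type)->item(0);
if ($r === null) {
    throw new InvalidArgumentException("Unknown file type: $file_type");
}

// Build the new article and append it to the selected node.
$b = $doc->createElement('article');
$titleName = $doc->createElement('doc_name');
$titleName->appendChild($doc->createTextNode('Example name'));
$b->appendChild($titleName);
$r->appendChild($b);

echo $doc->saveXML($r), "\n";
```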
