Retrieving a subset of XML nodes with PHP - php

Using PHP, how do I get an entire subset of nodes from an XML document? I can retrieve something like:
<?xml version="1.0" encoding="utf-8"?>
<people>
<certain>
<name>Jane Doe</name>
<age>21</age>
</certain>
<certain>
<name>John Smith</name>
<age>34</age>
</certain>
</people>
But what if I only want to return the child nodes of <people>, like this?
<certain>
<name>Jane Doe</name>
<age>21</age>
</certain>
<certain>
<name>John Smith</name>
<age>34</age>
</certain>
EDIT: I'm trying to get a subset of XML and pass that directly, not an object like simplexml would give me. I am basically trying to get PHP to do what .NET's OuterXml does... return literally the above subset of XML as is... no interpreting or converting or creating a new XML file or anything... just extract those nodes in situ and pass them on. Am I going to have to get the XML file, parse out what I need and then rebuild it as a new XML file? If so then I need to get rid of the <?xml version="1.0" encoding="utf-8"?> bit... ugh.

The answer would be to use XPath.
$people = simplexml_load_string(
'<?xml version="1.0" encoding="utf-8"?>
<people>
<certain>
<name>Jane Doe</name>
<age>21</age>
</certain>
<certain>
<name>John Smith</name>
<age>34</age>
</certain>
</people>'
);
// get all <certain/> nodes
$people->xpath('//certain');
// get all <certain/> nodes whose <name/> is "John Smith"
print_r($people->xpath('//certain[name = "John Smith"]'));
// get all <certain/> nodes whose <age/> child's value is greater than 21
print_r($people->xpath('//certain[age > 21]'));
Take 2
So apparently you want to copy some nodes from a document into another document? SimpleXML doesn't support that. DOM has methods for that but they're kind of annoying to use. Which one are you using? Here's what I use: SimpleDOM. In fact, it's really SimpleXML augmented with DOM's methods.
include 'SimpleDOM.php';
$results = simpledom_load_string('<results/>');
foreach ($people->xpath('//certain') as $certain)
{
$results->appendChild($certain);
}
That routine finds all <certain/> nodes via XPath, then appends them to the new document.
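For comparison, the same copy can be sketched with plain DOM, which is what the answer calls "kind of annoying": each node has to be imported into the target document before it can be appended. This is only a minimal sketch; the inline XML string and the <results/> wrapper mirror the example above, and the variable names are made up for illustration.

```php
<?php
// Source document (same data as the example above, inlined for brevity).
$source = new DOMDocument();
$source->loadXML(
    '<people>'
    . '<certain><name>Jane Doe</name><age>21</age></certain>'
    . '<certain><name>John Smith</name><age>34</age></certain>'
    . '</people>'
);

// Target document with a <results/> root element.
$target = new DOMDocument();
$results = $target->appendChild($target->createElement('results'));

// A node belongs to the document that created it, so each match
// is deep-copied into the target with importNode() before appending.
$xpath = new DOMXPath($source);
foreach ($xpath->query('//certain') as $certain) {
    $results->appendChild($target->importNode($certain, true));
}

$copiedXml = $target->saveXML();
echo $copiedXml;
```

The second argument of importNode() requests a deep copy, so <name> and <age> travel with each <certain>.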

You could use DOMDocument::getElementsByTagName(), or you could use XPath:
<?php
$xml = simplexml_load_file("test.xml");
$result = $xml->xpath("//certain");
print_r($result);
?>

Use DOM and XPath. XPath allows you to select nodes (and values) from an XML DOM.
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
$result = '';
foreach ($xpath->evaluate('/people/certain') as $node) {
$result .= $dom->saveXml($node);
}
echo $result;
Demo: https://eval.in/162149
DOMDocument::saveXml() accepts a context node as its argument. If one is provided, only that node is saved as XML, much like .NET's OuterXml. PHP can also register your own classes for the DOM nodes, so it is even possible to add an outerXml() method to element nodes.
class MyDomElement extends DOMElement {
public function outerXml() {
return $this->ownerDocument->saveXml($this);
}
}
class MyDomDocument extends DOMDocument {
public function __construct($version = '1.0', $encoding = 'utf-8') {
parent::__construct($version, $encoding);
$this->registerNodeClass('DOMElement', 'MyDomElement');
}
}
$dom = new MyDomDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
$result = '';
foreach ($xpath->evaluate('/people/certain') as $node) {
$result .= $node->outerXml();
}
echo $result;
Demo: https://eval.in/162157

See http://www.php.net/manual/en/domdocument.getelementsbytagname.php

The answer turned out to be a combination of the xpath suggestion and outputting with asXML().
Using the example given by Josh Davis:
$people = simplexml_load_string(
'<?xml version="1.0" encoding="utf-8"?>
<people>
<certain>
<name>Jane Doe</name>
<age>21</age>
</certain>
<certain>
<name>John Smith</name>
<age>34</age>
</certain>
</people>'
);
// get all <certain/> nodes
$nodes = $people->xpath('/people/certain');
$result = ''; // initialise before appending to avoid an undefined-variable notice
foreach ( $nodes as $node ) {
$result .= $node->asXML()."\n";
}
echo $result;

Related

PHP - Remove leading and trailing spaces from XML tags

How can I remove the leading and trailing whitespace between opening and closing XML tags?
$sampleXML = '<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<PersonName>
<GivenName> David </GivenName>
<MiddleName> Raj</MiddleName>
<Affix>JR</Affix>
</PersonName>
<Aliases>
<PersonName>
<GivenName></GivenName>
<MiddleName></MiddleName>
<FamilyName></FamilyName>
</PersonName>
</Aliases>
<DemographicDetail>
<GovernmentId countryCode="US">testIDs data </GovernmentId>
<DateOfBirth>2000-12-12</DateOfBirth>
</DemographicDetail>
</note>
<anothertag>
<data type="credit">
<Vendor score="yes"> vendor name </Vendor>
</data>
</anothertag>';
$doc = new DOMDocument;
$doc->loadXML($sampleXML);
foreach ($doc->documentElement->childNodes as $node) {
}
$xpath = new DOMXpath($doc);
$xml = $doc->saveXML($doc, LIBXML_NOEMPTYTAG);
I have tried using getElementsByTagName, but the tag names are dynamic, so it doesn't work for me in this case.
Is there any built-in PHP class for this?
Expected XML:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<PersonName>
<GivenName>David</GivenName>
<MiddleName>Raj</MiddleName>
<Affix>JR</Affix>
</PersonName>
<Aliases>
<PersonName>
<GivenName></GivenName>
<MiddleName></MiddleName>
<FamilyName></FamilyName>
</PersonName>
</Aliases>
<DemographicDetail>
<GovernmentId countryCode="US">testIDs data</GovernmentId>
<DateOfBirth>2000-12-12</DateOfBirth>
</DemographicDetail>
</note>
<anothertag>
<data type="credit">
<Vendor score="yes">vendor name</Vendor>
</data>
</anothertag>
Thanks in advance.
You can traverse all the nodes and trim each node value. For this you need a recursive function:
function trimNodes(DOMNode $node) {
foreach ($node->childNodes as $child){
if($child->hasChildNodes()) {
trimNodes($child);
} else{
$child->nodeValue = trim($child->nodeValue);
}
}
}
Call this function with $doc as its argument, and you will get your expected XML.
Please note that your XML needs a single root node. You currently have two (note and anothertag); wrap them in one root node.
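As a variation on the recursive traversal, the same trimming can be sketched with XPath, which selects only text nodes and so never touches elements. This is a hedged sketch rather than the answerer's code: the shortened sample string and the <root> wrapper are assumptions made for illustration, following the advice about a single root node.

```php
<?php
// Shortened sample, wrapped in one <root> element so it is well-formed.
$sampleXML = '<root>'
    . '<note><GivenName> David </GivenName><MiddleName> Raj</MiddleName></note>'
    . '<anothertag><Vendor score="yes"> vendor name </Vendor></anothertag>'
    . '</root>';

$doc = new DOMDocument();
$doc->loadXML($sampleXML);

// //text() matches every text node, whatever the (dynamic) tag names are.
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//text()') as $text) {
    $text->nodeValue = trim($text->nodeValue);
}

$trimmedXml = $doc->saveXML();
echo $trimmedXml;
```

Note that on an indented document this also empties the whitespace-only text nodes between elements, so the indentation is lost in the output.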
If $sampleXML is just a string, then I think a regular expression could give you what you are looking for:
$pattern = '/(?<=\>)\s+(?=[a-zA-Z0-9,\.\_])|(?<=[a-zA-Z0-9,\.\_])\s+(?=\<)/';
$sampleXML = preg_replace($pattern, '', $sampleXML);

PHP DOM XML: loop through elements and preserve children

I have an XML file like this:
<cars>
<id>1</id>
<photos>
<img>http://sit.com/img.jpg</img>
<img>http://sit.com/img.jpg</img>
<img>http://sit.com/img.jpg</img>
<img>http://sit.com/img.jpg</img>
</photos>
</cars>
I need to rename all the tags to alternative names, to get something like this:
<cars>
<ex_id>1</ex_id>
<images>
<photo>http://sit.com/img.jpg</photo>
<photo>http://sit.com/img.jpg</photo>
<photo>http://sit.com/img.jpg</photo>
<photo>http://sit.com/img.jpg</photo>
</images>
</cars>
My code is
foreach ($dom->getElementsByTagName('cars') as $item) {
for ($i = 0; $i < $item->childNodes->length; ++$i) {
$car = $item->childNodes->item($i);
$NewElement = $dom->createElement($newName,$value);
$car->parentNode->replaceChild($NewElement->cloneNode(TRUE), $car);
}
}
It produces something like this:
<cars>
<ex_id>1</ex_id>
<images/>
</cars>
So it cuts out all the children of <photos>. My question is how to preserve the children and also change their tags from <img> to <photo>.
There are several issues:
1. getElementsByTagName() and $childNodes return 'live' lists; they change when you change the DOM. You can use iterator_to_array() to copy them into an array.
2. There are not only element nodes. Comments, CDATA sections and text (even text containing only whitespace) are nodes, too. If you iterate $childNodes you have to check DOMNode::$nodeType.
3. Do not use the second argument of DOMDocument::createElement(). Its escaping is broken. Create a text node and append it.
Issues 1 and 2 go away if you use XPath to fetch the nodes.
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXPath($dom);
// the source document keeps its images in /cars/photos/img
foreach ($xpath->evaluate('/cars/photos/img') as $photo) {
$newNode = $dom->createElement('photo');
$newNode->appendChild($dom->createTextNode($photo->textContent));
$photo->parentNode->replaceChild($newNode, $photo);
}
echo $dom->saveXml();
Output:
<?xml version="1.0"?>
<cars>
<id>1</id>
<photos>
<photo>http://sit.com/img.jpg</photo>
<photo>http://sit.com/img.jpg</photo>
<photo>http://sit.com/img.jpg</photo>
<photo>http://sit.com/img.jpg</photo>
</photos>
</cars>
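If you would rather keep getElementsByTagName(), the live-list issue mentioned above can be sidestepped by copying the list into an array first. The following is a small sketch under that assumption; the two-image document and the file names in it are made up for illustration.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<cars><id>1</id><photos><img>a.jpg</img><img>b.jpg</img></photos></cars>');

// Snapshot the live DOMNodeList so that replacing nodes
// does not change the list while it is being iterated.
$images = iterator_to_array($dom->getElementsByTagName('img'));

foreach ($images as $img) {
    $photo = $dom->createElement('photo');
    // Append a text node instead of using createElement()'s second argument.
    $photo->appendChild($dom->createTextNode($img->textContent));
    $img->parentNode->replaceChild($photo, $img);
}

$renamedXml = $dom->saveXML();
echo $renamedXml;
```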
Changing a DOM document in place is often a bad idea. It is easier to extract data from a source document and build a new target document:
$source = new DOMDocument();
$source->loadXml($xml);
$xpath = new DOMXPath($source);
$target = new DOMDocument();
$target->formatOutput = TRUE;
$cars = $target->appendChild($target->createElement('cars'));
$cars
->appendChild($target->createElement('ex_id'))
->appendChild(
$target->createTextNode(
$xpath->evaluate('string(/cars/id)')
)
);
$images = $cars->appendChild($target->createElement('images'));
foreach ($xpath->evaluate('/cars/photos/img') as $photo) {
$images
->appendChild($target->createElement('photo'))
->appendChild($target->createTextNode($photo->textContent));
}
echo $target->saveXml();
Output:
<?xml version="1.0"?>
<cars>
<ex_id>1</ex_id>
<images>
<photo>http://sit.com/img.jpg</photo>
<photo>http://sit.com/img.jpg</photo>
<photo>http://sit.com/img.jpg</photo>
<photo>http://sit.com/img.jpg</photo>
</images>
</cars>
There is also a language dedicated to transforming XML: XSLT. XSLT is supported by PHP's ext/xsl.
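To give an idea of what the XSLT route might look like, here is a minimal sketch using XSLTProcessor from ext/xsl. The stylesheet is not part of the original answer; it is an assumed example built on the usual identity template plus one renaming rule per tag, and the sample document is shortened to two images.

```php
<?php
// Requires ext/xsl, which provides the XSLTProcessor class.
$xml = <<<'XML'
<cars>
  <id>1</id>
  <photos>
    <img>http://sit.com/img.jpg</img>
    <img>http://sit.com/img.jpg</img>
  </photos>
</cars>
XML;

$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- identity template: copy anything without a more specific rule -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- renaming rules -->
  <xsl:template match="cars/id">
    <ex_id><xsl:apply-templates select="@*|node()"/></ex_id>
  </xsl:template>
  <xsl:template match="photos">
    <images><xsl:apply-templates select="@*|node()"/></images>
  </xsl:template>
  <xsl:template match="img">
    <photo><xsl:apply-templates select="@*|node()"/></photo>
  </xsl:template>
</xsl:stylesheet>
XSL;

$source = new DOMDocument();
$source->loadXML($xml);
$stylesheet = new DOMDocument();
$stylesheet->loadXML($xsl);

$processor = new XSLTProcessor();
$processor->importStylesheet($stylesheet);
$transformed = $processor->transformToXml($source);
echo $transformed;
```

The appeal of this design is that the renaming rules live in the stylesheet, so adding another tag mapping means adding one template rather than more PHP.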

Prepending raw XML using PHP's SimpleXML

Given a base $xml and a file containing a <something> tag with attributes, children and children of its children, I would like to append it as first child and all of its children as raw XML.
Original XML:
<root>
<people>
<person>
<name>John Doe</name>
<age>47</age>
</person>
<person>
<name>James Johnson</name>
<age>13</age>
</person>
</people>
</root>
XML in file:
<something someval="x" otherthing="y">
<child attr="val" ..> { some children and values ... }</child>
<child attr="val2" ..> { some children and values ... }</child>
...
</something>
Result XML:
<root>
<something someval="x" otherthing="y">
<child attr="val" ..> { some children and values ... }</child>
<child attr="val2" ..> { some children and values ... }</child>
...
</something>
<people>
<person>
<name>John Doe</name>
<age>47</age>
</person>
<person>
<name>James Johnson</name>
<age>13</age>
</person>
</people>
</root>
This tag would contain several children both direct and recursively, so it would not be practical to build the XML via the SimpleXML operations. Besides, keeping it in a file would result in lower maintenance costs.
Technically it would simply be prepending one child. The problem is that this child would have other children and so on.
On the PHP addChild page there's a comment that says:
$x = new SimpleXMLElement('<root name="toplevel"></root>');
$f1 = new SimpleXMLElement('<child pos="1">alpha</child>');
$x->{$f1->getName()} = $f1; // adds $f1 to $x
However, this does not seem to treat my XML as raw XML, so the tags appear escaped as &lt; and &gt;. Several warnings concerning namespaces seem to appear as well.
I suppose I could do a quick replace of such tags but I am not sure whether it could cause future problems and it certainly does not feel right.
Manually hacking the XML is not an option and neither is adding children one by one. Choosing a different library could be.
Any clues on how to get this working?
Thanks!
I'm really not sure if that will work. Try this or downvote this, but I hope it helps. Using DOMDocument (Reference)
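<?php
$xml = new DOMDocument();
// the input is XML, so parse it with loadXML() rather than loadHTML()
$xml->loadXML($yourOriginalXML);
// createElement() expects a tag name, not markup;
// a document fragment can parse the raw XML string instead
$newNode = $xml->createDocumentFragment();
$newNode->appendXML($someXMLtoPrepend);
$nodeRoot = $xml->getElementsByTagName('root')->item(0);
$nodeOriginal = $xml->getElementsByTagName('people')->item(0);
// insert the fragment's content in front of <people>
$nodeRoot->insertBefore($newNode, $nodeOriginal);
$finalXmlAsString = $xml->saveXML();
?>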
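Sometimes the encoding can cause problems. DOM expects UTF-8 unless the XML declares otherwise, so if your strings are in another encoding and carry no encoding declaration, convert them first (ISO-8859-1 below is only an assumed example):
<?php
$xml = new DOMDocument();
$xml->loadXML(mb_convert_encoding($yourOriginalXML, 'UTF-8', 'ISO-8859-1'));
// parse the raw XML into a fragment; createElement() only accepts a tag name
$newNode = $xml->createDocumentFragment();
$newNode->appendXML(mb_convert_encoding($someXMLtoPrepend, 'UTF-8', 'ISO-8859-1'));
$nodeRoot = $xml->getElementsByTagName('root')->item(0);
$nodeOriginal = $xml->getElementsByTagName('people')->item(0);
$nodeRoot->insertBefore($newNode, $nodeOriginal);
$finalXmlAsString = $xml->saveXML();
?>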
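Since the question starts from SimpleXML, it may help to see that the two APIs can be mixed: dom_import_simplexml() gives a DOM view of the same tree, so a DOM insertion shows up in the SimpleXML object. The sketch below is an illustration under assumptions: the $snippet string stands in for the contents of the file, and its child values are made up.

```php
<?php
$base = new SimpleXMLElement(
    '<root><people><person><name>John Doe</name><age>47</age></person></people></root>'
);

// Stand-in for file_get_contents() on the file holding the <something> tag.
$snippet = '<something someval="x" otherthing="y">'
    . '<child attr="val">a</child><child attr="val2">b</child>'
    . '</something>';

// DOM view of the SimpleXML root; both objects share one underlying tree.
$root = dom_import_simplexml($base);

// Parse the snippet on its own, then deep-copy it into the base document.
$snippetDoc = new DOMDocument();
$snippetDoc->loadXML($snippet);
$imported = $root->ownerDocument->importNode($snippetDoc->documentElement, true);

// Prepend: insert before whatever is currently the first child of <root>.
$root->insertBefore($imported, $root->firstChild);

$resultXml = $base->asXML();
echo $resultXml;
```

Because the snippet is parsed as XML rather than assigned as a string, nothing is escaped and its attributes and nested children are kept as they are.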

XML Split Node on . value

I've used SO for many years and always found an answer but this time I have got myself well and truly lost.
I have an XML file, and I would like to split the Compatbility element into well-formed XML:
<product>
<item>
<partno>abc123</partno>
<Compatbility>model1: 110C, 115C, 117C. model2: 1835C, 1840C. model3: 210C, 215C, 3240C.</Compatbility>
</item>
</product>
In Compatbility, the word "model" changes with each item entry, although the : after the model name is always there, as is the . after each model group.
Should I use SimpleXML, DOM or XPath to get the following result?
<product>
<item>
<partno>abc123</partno>
<Compatbility>
<model>model1: 110C, 115C, 117C.</model>
<model>model2: 1835C, 1840C.</model>
<model>model3: 210C, 215C, 3240C.</model>
</Compatbility>
</item>
</product>
Thanks
For simplexml, you can run a regular expression matching on the text-value of an element.
You can then remove all inner text and add the parsed result as new child elements.
This can be done with all you said: DOMDocument, SimpleXMLElement - both with or without xpath.
Here is a commented example in SimpleXML (online demo):
<?php
/**
* @link http://stackoverflow.com/q/24304095/367456
* @link https://eval.in/164934
*/
$buffer = <<<XML
<product>
<item>
<partno>abc123</partno>
<Compatbility>model1: 110C, 115C, 117C. model2: 1835C, 1840C. model3: 210C, 215C, 3240C.</Compatbility>
</item>
</product>
XML;
# load the xml string
$xml = simplexml_load_string($buffer);
# obtain the element in question
$compatbility = $xml->item->Compatbility;
# parse its inner text-value for the models by a regex
$pattern = '~(model\\d?: [^.]+\\.) ?~u';
$result = preg_match_all($pattern, $compatbility, $matches);
# remove the text (so called simplexml self-reference)
$compatbility->{0} = '';
# add the parsed models as new model elements
foreach ($matches[1] as $model) {
$compatbility->model[] = $model;
}
# output the xml
$xml->asXML('php://output');
The output it gives is:
<?xml version="1.0"?>
<product>
<item>
<partno>abc123</partno>
<Compatbility><model>model1: 110C, 115C, 117C.</model><model>model2: 1835C, 1840C.</model><model>model3: 210C, 215C, 3240C.</model></Compatbility>
</item>
</product>
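The same split can also be sketched with DOMDocument and XPath, as the answer says is possible. This is an assumed variant, not the answerer's code: since the question states that the word before the colon changes per item, the pattern here accepts any label up to a colon and then everything up to the next full stop.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<product><item><partno>abc123</partno>'
    . '<Compatbility>model1: 110C, 115C, 117C. model2: 1835C, 1840C. model3: 210C, 215C, 3240C.</Compatbility>'
    . '</item></product>'
);

$xpath = new DOMXPath($doc);
foreach ($xpath->query('//Compatbility') as $element) {
    // One group = a label, a colon, then everything up to the next full stop.
    preg_match_all('~\s*([^:.]+:[^.]+\.)~', $element->textContent, $matches);

    // Remove the original text before adding the new child elements.
    while ($element->firstChild) {
        $element->removeChild($element->firstChild);
    }

    foreach ($matches[1] as $group) {
        $model = $doc->createElement('model');
        $model->appendChild($doc->createTextNode($group));
        $element->appendChild($model);
    }
}

$splitXml = $doc->saveXML();
echo $splitXml;
```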
First, of course, you need to convert the XML into something that you can manipulate (arrays). Then do the usual parsing (using explode). Finally, you need to create a new XML document. Consider this example:
$xml_string = '<product><item><partno>abc123</partno><Compatbility>model1: 110C, 115C, 117C. model2: 1835C, 1840C. model3: 210C, 215C, 3240C.</Compatbility></item></product>';
$original_xml = simplexml_load_string($xml_string);
$data = json_decode(json_encode($original_xml), true);
$compatbility = $data['item']['Compatbility']; // get all compatibility values
// explode values
$compatbility = array_filter(array_map('trim', explode('.', $compatbility)));
$new_xml = new SimpleXMLElement('<product/>'); // initialize new xml
// add necessary values
$new_xml->addChild('item')->addChild('partno', $data['item']['partno']);
$new_xml->item->addChild('Compatbility');
// loop the values and add them as children
foreach($compatbility as $value) {
$value = trim(preg_replace('/(\w+):/', '', $value));
$new_xml->item->Compatbility->addChild('model', $value);
}
echo $new_xml->asXML(); // output as xml

Using insertBefore method in php

I have actually asked this before, but alas the PC got nicked that I had the solution on, and I no longer can get the previous solution to work.
I'm trying to add a new element to the XML below:
<?xml version="1.0" encoding="ISO-8859-1"?>
<data>
<comments>
<comment>
<date>20120509</date>
<time>10:21:05</time>
<name>Lucy</name>
<text>Hello etc</text>
</comment>
<comment> ...etc
The PHP code I'm using is:
$xml = new DOMDocument('1.0', 'utf-8');
$xml->load('filename.xml');
$parent = $xml->firstChild;
$refnode = $parent->firstChild;
$new = $parent->insertBefore($xml->createElement('comment'), $refnode);
However, this adds a new "comment" immediately after the "data" tag, and if I try to add children (such as "date", "time" etc...) with $new->addChild(tag, value), I get an "undefined method" error. I've tried all manner of permutations, but nothing works.
The desired result would be:
<?xml version="1.0" encoding="ISO-8859-1"?>
<data>
<comments>
*<comment>
<date>20140225</date>
<time>17:39:05</time>
<name>Derek</name>
<text>New comment text</text>
</comment>*
<comment>
<date>20120509</date>
<time>10:21:05</time>
<name>Lucy</name>
<text>Hello etc</text>
</comment>
<comment> ...etc
Your XML file most likely contains whitespace between the tags, as in your example. That whitespace is parsed as text nodes, which makes firstChild useless for obtaining the elements you want.
You have to iterate over the children instead and take the first one that is actually a DOMElement. You also have to go one level deeper than you did. See the source code below, which outputs the result you want.
<?php
$xml = new DOMDocument();
$xml->loadXML('<?xml version="1.0" encoding="UTF-8"?>
<data>
<comments>
<comment>
<date>20120509</date>
<time>10:21:05</time>
<name>Lucy</name>
<text>Hello etc</text>
</comment>
</comments>
</data>');
$parent = $xml->firstChild;
foreach ($parent->childNodes as $c) {
if ($c instanceof DOMElement) {
$refnode = $c;
break;
}
}
foreach ($refnode->childNodes as $c) {
if ($c instanceof DOMElement) {
$refnode2 = $c;
break;
}
}
$insert = $xml->createElement('comment', 'test');
$refnode->insertBefore($insert, $refnode2);
echo $xml->saveXML();
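The "undefined method" error from the question comes from addChild(), which belongs to SimpleXML; DOM nodes use createElement() and appendChild() instead. A minimal sketch of building the full new comment and inserting it first, using XPath to skip the whitespace text nodes, could look like this (the field values are taken from the desired result in the question; the variable names are made up):

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<data><comments><comment><date>20120509</date><time>10:21:05</time>'
    . '<name>Lucy</name><text>Hello etc</text></comment></comments></data>'
);

$xpath = new DOMXPath($doc);
$comments = $xpath->query('/data/comments')->item(0);
// First existing <comment>; null if there is none yet.
$firstComment = $xpath->query('comment[1]', $comments)->item(0);

// Build the new <comment> with its children.
$new = $doc->createElement('comment');
$fields = [
    'date' => '20140225',
    'time' => '17:39:05',
    'name' => 'Derek',
    'text' => 'New comment text',
];
foreach ($fields as $tag => $value) {
    $child = $doc->createElement($tag);
    $child->appendChild($doc->createTextNode($value));
    $new->appendChild($child);
}

// With a null reference node, insertBefore() simply appends.
$comments->insertBefore($new, $firstComment);

$updatedXml = $doc->saveXML();
echo $updatedXml;
```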
