PHP SimpleXML recursive function to list children and attributes

I need some help with the SimpleXML calls for a recursive function that lists element names and attributes. I'm making an XML config file system, but each script will have its own config file as well as its own naming convention. So what I need is an easy way to map out all the elements that have attributes. In Example 1, for instance, I need a simple way to get all the processes, but I don't know how to do this without hard-coding the element's name in the function call. Is there a way to recursively call a function to match a child element name? I did see the XPath functionality, but I don't see how to use it for attributes.
Also, does the XML in the examples look correct? Can I structure my XML like this?
Example 1:
<application>
<processes>
<process id="123" name="run batch A" />
<process id="122" name="run batch B" />
<process id="129" name="run batch C" />
</processes>
<connections>
<databases>
<database usr="test" pss="test" hst="test" dbn="test" />
</databases>
<shells>
<ssh usr="test" pss="test" hst="test-2" />
<ssh usr="test" pss="test" hst="test-1" />
</shells>
</connections>
</application>
Example 2:
<config>
<queues>
<queue id="1" name="test" />
<queue id="2" name="production" />
<queue id="3" name="error" />
</queues>
</config>
Pseudo code:
// Would return matching process id
getProcess($process_id) {
return the process attributes as array that are in the XML
}
// Would return matching DBN (database name)
getDatabase($database_name) {
return the database attributes as array that are in the XML
}
// Would return matching SSH Host
getSSHHost($ssh_host) {
return the ssh attributes as array that are in the XML
}
// Would return matching SSH User
getSSHUser($ssh_user) {
return the ssh attributes as array that are in the XML
}
// Would return matching Queue
getQueue($queue_id) {
return the queue attributes as array that are in the XML
}
EDIT:
Can I pass two params to the first method you suggested, @Gordon?
I just got it, thanks; see below:
public function findProcessById($id, $name)
{
$attr = false;
$el = $this->xml->xpath("//process[@id='$id'][@name='$name']"); // chained predicates filter by id AND name
if($el && count($el) === 1) {
$attr = (array) $el[0]->attributes();
$attr = $attr['@attributes'];
}
return $attr;
}

The XML looks good to me. The only thing I wouldn't do is make name an attribute of process, because it contains spaces; in my opinion it should be a text node instead. But since SimpleXML does not complain about it, I guess it boils down to personal preference.
I'd likely approach this with a DataFinder class, encapsulating XPath queries, e.g.
class XmlFinder
{
protected $xml;
public function __construct($xml)
{
$this->xml = new SimpleXMLElement($xml);
}
public function findProcessById($id)
{
$attr = false;
$el = $this->xml->xpath("//process[@id='$id']"); // $id is interpolated into the query, so pass trusted values only
if($el && count($el) === 1) {
$attr = (array) $el[0]->attributes();
$attr = $attr['@attributes'];
}
return $attr;
}
// ... other methods ...
}
and then use it with
$finder = new XmlFinder($xml);
print_r( $finder->findProcessById(122) );
Output:
Array
(
[id] => 122
[name] => run batch B
)
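To cover all of the pseudocode functions in the question (getProcess(), getDatabase(), getSSHHost(), getSSHUser(), getQueue()) without one method per element, the same XPath idea can be generalized. The following is a minimal sketch, not the answerer's code: find_by_attribute() is a hypothetical helper name, it assumes the element name is a trusted literal, and it compares the attribute value in PHP so that the value never ends up inside the XPath string.

```php
<?php
// Minimal sketch: one generic lookup standing in for getProcess(), getDatabase(),
// getSSHHost(), getSSHUser() and getQueue().
// find_by_attribute() is a hypothetical helper, not part of SimpleXML.
function find_by_attribute(SimpleXMLElement $xml, string $element, string $attr, string $value): array
{
    $matches = [];
    // Select elements by name only (assumes $element is a trusted, literal name)
    // and compare the attribute in PHP, so $value never goes into the XPath query.
    foreach ($xml->xpath('//' . $element) ?: [] as $el) {
        if ((string) $el[$attr] !== $value) {
            continue;
        }
        $row = [];
        foreach ($el->attributes() as $name => $attrValue) {
            $row[$name] = (string) $attrValue;
        }
        $matches[] = $row;
    }
    return $matches;
}

$xml = new SimpleXMLElement(
    '<application><processes>'
    . '<process id="122" name="run batch B" /><process id="129" name="run batch C" />'
    . '</processes><connections><shells>'
    . '<ssh usr="test" pss="test" hst="test-2" /><ssh usr="test" pss="test" hst="test-1" />'
    . '</shells></connections></application>'
);

print_r(find_by_attribute($xml, 'process', 'id', '122')); // one match
print_r(find_by_attribute($xml, 'ssh', 'usr', 'test'));   // both <ssh> elements
```

Returning a list of rows rather than a single row is a deliberate choice here: lookups such as getSSHUser() can legitimately match several elements (both <ssh> entries share usr="test").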
XPath tutorial:
http://www.w3schools.com/XPath/default.asp
An alternative would be to use SimpleXmlIterator to iterate over the elements. Iterators can be decorated with other Iterators, so you can do:
class XmlFilterIterator extends FilterIterator
{
protected $filterElement;
public function setFilterElement($name)
{
$this->filterElement = $name;
}
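public function accept(): bool
{
return ($this->current()->getName() === $this->filterElement);
}
}
$sxi = new XmlFilterIterator(
new RecursiveIteratorIterator(
new SimpleXmlIterator($xml),
RecursiveIteratorIterator::SELF_FIRST)); // SELF_FIRST also visits elements that have children, not just leaves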
$sxi->setFilterElement('process');
foreach($sxi as $el) {
var_dump( $el ); // will only give process elements
}
You would have to add some more methods to have the filter work for attributes, but this is a rather trivial task.
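One way the attribute part might look, as a sketch under the assumption that a single attribute/value condition is enough: XmlAttributeFilterIterator and setFilterAttribute() are hypothetical names extending the XmlFilterIterator idea above, and the element is only accepted when both its name and the chosen attribute match.

```php
<?php
// Sketch: the element-name filter from above plus an optional attribute condition.
// XmlAttributeFilterIterator and setFilterAttribute() are hypothetical names.
class XmlAttributeFilterIterator extends FilterIterator
{
    protected $filterElement;
    protected $filterAttribute = null;
    protected $filterValue = null;

    public function setFilterElement(string $name): void
    {
        $this->filterElement = $name;
    }

    public function setFilterAttribute(string $name, string $value): void
    {
        $this->filterAttribute = $name;
        $this->filterValue = $value;
    }

    public function accept(): bool
    {
        $el = $this->current();
        if ($el->getName() !== $this->filterElement) {
            return false;
        }
        if ($this->filterAttribute === null) {
            return true; // no attribute condition configured
        }
        // A missing attribute casts to '' and therefore does not match.
        return (string) $el[$this->filterAttribute] === $this->filterValue;
    }
}

$xml = '<connections><shells>'
     . '<ssh usr="test" hst="test-2" /><ssh usr="test" hst="test-1" />'
     . '</shells></connections>';

$it = new XmlAttributeFilterIterator(
    new RecursiveIteratorIterator(new SimpleXMLIterator($xml)));
$it->setFilterElement('ssh');
$it->setFilterAttribute('hst', 'test-1');

$hosts = [];
foreach ($it as $el) {
    $hosts[] = (string) $el['hst'];
}
print_r($hosts);
```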
Introduction to SplIterators:
http://www.phpro.org/tutorials/Introduction-to-SPL.html

Related

PHP DOM: How do I replace the DocumentType node in my XML document?

I'm writing a script to clean up the so-called HTML document that MS Word creates when you Save As "Web Page, Filtered". I want the resulting document to be valid XHTML1.
The first thing I want to do is to change the !DOCTYPE so it will be XHTML 1.0 Strict instead of ...4.0 Transitional.
I wrote code that looked as if it should work, but when I run it I get a Segmentation fault from PHP. At first, I thought this was occurring in the save function, but after adding some echo statements for debugging I now think that the problem is at the places marked {{{1}}} and {{{2}}} in the code (below).
Here's what I think is going on: at {{{1}}} I am iterating through the DOMNodeList, treating it as if it were an ordinary array that I can traverse with foreach.
But at {{{2}}} I change the parent's subnode list. I suspect this breaks my foreach: either the DOMNodeList or my foreach pointer becomes invalid.
So what is the "right" way to make changes to a DOM tree while you're traversing it? I came up with two possible options:
Copy the DOMNodeList into an ordinary array:
$nodelist = [];
foreach ($node->childNodes as $subnode) {
$nodelist[] = $subnode;
// Or perhaps an object that contains the appropriate code and parameters for the change I want to make
}
foreach ($nodelist as $subnode) {
// make the appropriate change
}
Traverse the DOM tree, but do not make any changes. Instead, create an array of all the places where I want to make changes. When finished, go through that array and make the changes.
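Option 1 can be written compactly with iterator_to_array(), which copies the live DOMNodeList into a plain array before the loop starts, so replacing a child inside the loop cannot invalidate it. This is a sketch on ordinary elements (renaming <b> to <strong>), not a verified fix for the DocumentType segfault; traverse_snapshot() is a hypothetical helper, and the callback convention (return the node to descend into) is an assumption of this sketch.

```php
<?php
// Sketch of option 1: snapshot the children before mutating the tree.
// traverse_snapshot() is a hypothetical helper. $visit may replace the node and
// returns the node whose children should be visited next (or null to stop).
function traverse_snapshot(DOMNode $node, callable $visit): void
{
    $current = $visit($node);
    if ($current === null || !$current->hasChildNodes()) {
        return;
    }
    // iterator_to_array() copies the live DOMNodeList into a plain array.
    foreach (iterator_to_array($current->childNodes) as $child) {
        traverse_snapshot($child, $visit);
    }
}

$dom = new DOMDocument();
$dom->loadXML('<root><b>one</b><i>two</i><b>three</b></root>');

// Example mutation: replace every <b> with <strong> in the middle of the traversal.
traverse_snapshot($dom->documentElement, function (DOMNode $n) use ($dom) {
    if ($n instanceof DOMElement && $n->nodeName === 'b') {
        $strong = $dom->createElement('strong');
        // Snapshot again: appendChild() moves nodes out of $n->childNodes.
        foreach (iterator_to_array($n->childNodes) as $c) {
            $strong->appendChild($c);
        }
        $n->parentNode->replaceChild($strong, $n);
        return $strong;
    }
    return $n;
});

echo $dom->saveXML($dom->documentElement), PHP_EOL;
```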
Maybe there's some "official" way of doing this?
The relevant parts of my code below:
<?php
$dom = new DOMDocument();
$dom->loadHTMLFile($htmFName);
$trav = new DOMTraverser($dom);
$storyParms = new StoryParams("some string");
$callback = new StoryDocCallback($htmFName);
$trav->traverse($callback, $storyParms);
$dom->save("y");
class DOMTraverser
{
private $docNode;
private $callback;
private $param;
public function __construct(DOMNode $node)
{
$this->docNode = $node;
}
public function traverse(GeneralCallBack $cb, $param)
{
$this->callback = $cb;
$this->param = $param;
$this->traverseNode($this->docNode);
}
public function traverseNode($node)
{
$this->callback->callBefore($node, $this->param);
if ($node->hasChildNodes()) {
foreach ($node->childNodes as $subnode) { // {{{1}}}
if($subnode != null) {
$this->traverseNode($subnode);
}
}
}
}
}
class StoryDocCallback implements GeneralCallback
{
public function callbefore($node, $param)
{
$name = $node->nodeName;
if (is_a($node, "DOMDocumentType")) {
$this->repairDocType($node);
return;
}
...
}
protected function repairDocType(DOMNode $node)
{
$impl = new DomImplementation();
$rootName = "html";
$pubID = "-//W3C//DTD XHTML 1.0 Strict//EN";
$sysID = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";
$newDocType = $impl->createDocumentType($rootName, $pubID, $sysID);
$parent = $node->parentNode;
$rc = $parent->replaceChild($newDocType, $node); // {{{2}}}
assert($rc != false);
}
...
}
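A different route for the doctype itself, offered as an assumption rather than a confirmed fix for the segfault: instead of replacing the DocumentType node in place, create a new document that already carries the desired doctype via DOMImplementation::createDocument(), then import the old root element into it with importNode(). The input HTML below is a made-up stand-in for the Word output.

```php
<?php
// Sketch (assumption, not a confirmed fix): build a new document that starts with
// the XHTML 1.0 Strict doctype and import the old root element into it.
$old = new DOMDocument();
$old->loadHTML(
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">'
    . '<html><body><p>Hello</p></body></html>'
);

$impl = new DOMImplementation();
$doctype = $impl->createDocumentType(
    'html',
    '-//W3C//DTD XHTML 1.0 Strict//EN',
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'
);
// createDocument() creates the doctype plus an empty <html> root element.
$new = $impl->createDocument(null, 'html', $doctype);

// Deep-copy the old <html> element into the new document and swap it in.
$imported = $new->importNode($old->documentElement, true);
$new->replaceChild($imported, $new->documentElement);

echo $new->saveXML();
```

This sketch does not add the XHTML namespace (xmlns) that a valid XHTML 1.0 Strict document requires; that would still need handling on top of it.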

How to remove unwanted HTML tags from user input but keep text inside the tags in PHP using DOMDocument

I have about 2 million stored HTML pages in S3 that contain various HTML. I'm trying to extract only the content from those stored pages, but I wish to retain the HTML structure with certain constraints. This HTML is all user-supplied input and should be considered unsafe. So for display purposes, I want to retain only some of the HTML tags, with constraints on attributes and attribute values, but still retain all of the properly encoded text content, even inside disallowed tags.
For example, I'd like to allow only specific tags like <p>, <h1>, <h2>, <h3>, <ul>, <ol>, <li>, etc. But I also want to keep whatever text is found between disallowed tags and maintain its structure. I also want to be able to restrict attributes in each tag or force certain attributes to be applied to specific tags.
For example, in the following HTML...
<div id="content">
Some text...
<p class="someclass">Hello <span style="color: purple;">PHP</span>!</p>
</div>
I'd like the result to be...
Some text...
<p>Hello PHP!</p>
Thus stripping out the unwanted <div> and <span> tags, the unwanted attributes of all tags, and still maintaining the text inside <div> and <span>.
Simply using strip_tags() won't work here, so I tried doing the following with DOMDocument.
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach($dom->childNodes as $node) {
if ($node->nodeName != "p") { // only allow paragraph tags
$text = $node->nodeValue;
$node->parentNode->nodeValue .= $text;
$node->parentNode->removeChild($node);
}
}
echo $dom->saveHTML();
Which would work on simple cases where there aren't nested tags, but obviously fails when the HTML is complex.
I can't simply call this function recursively on each node's child nodes, because if I delete the node I lose all of its nested children. Even if I defer node deletion until after the recursion, the order of text insertion becomes tricky: I go deep and return all valid nodes, then start concatenating the values of the invalid child nodes together, and the result is really messy.
For example, let's say I want to allow <p> and <em> in the following HTML
<p>Hello <strong>there <em>PHP</em>!</strong></p>
But I don't want to allow <strong>. If the <strong> contains a nested <em>, my approach gets really confusing, because I'd get something like...
<p>Hello there !<em>PHP</em></p>
Which is obviously wrong. I realized getting the entire nodeValue is a bad way of doing this, so I started digging into other ways to go through the entire tree one node at a time. I'm just finding it very difficult to generalize this solution so that it works sanely every time.
Update
A solution using strip_tags(), or the answer provided here, isn't helpful for my use case: the former does not let me control the attributes, and the latter removes any tag that has attributes. I don't want to remove every tag that has an attribute. I want to explicitly allow certain tags but still have extensible control over which attributes can be kept or modified in the HTML.
It seems this problem needs to be broken down into two smaller steps in order to generalize the solution.
First, Walking the DOM Tree
In order to get to a working solution I found I need to have a sensible way to traverse every node in the DOM tree and inspect it in order to determine if it should be kept as-is or modified.
So I wrote the following method as a simple generator in a class extending DOMDocument.
class HTMLFixer extends DOMDocument {
public function walk(DOMNode $node, $skipParent = false) {
if (!$skipParent) {
yield $node;
}
if ($node->hasChildNodes()) {
foreach ($node->childNodes as $n) {
yield from $this->walk($n);
}
}
}
}
This way, something like foreach($dom->walk($dom) as $node) gives me a simple loop to traverse the entire tree. Of course, this is a PHP 7+ only solution because of the yield from syntax, but I'm OK with that.
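To see the visiting order the generator produces, here is the same pre-order walk as a free function so it runs without the rest of HTMLFixer; walk_dom() is a hypothetical name used only for this sketch.

```php
<?php
// Sketch: the same pre-order generator as HTMLFixer::walk(), as a free function.
// walk_dom() is a hypothetical name used only for illustration.
function walk_dom(DOMNode $node, bool $skipParent = false): Generator
{
    if (!$skipParent) {
        yield $node;
    }
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            yield from walk_dom($child);
        }
    }
}

$dom = new DOMDocument();
$dom->loadXML('<root><p>Hi <em>there</em></p></root>');

$names = [];
foreach (walk_dom($dom->documentElement) as $n) {
    $names[] = $n->nodeName;
}
echo implode(', ', $names), PHP_EOL; // root, p, #text, em, #text
```

One caveat with yield from: it passes through the inner generators' keys, so keys repeat. If you collect the nodes with iterator_to_array(), pass false as the second argument, otherwise entries with duplicate keys overwrite each other.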
Second, Removing Tags but Keeping their Text
The tricky part was figuring out how to keep the text but not the tag while making modifications inside the loop. After struggling with a few different approaches, I found the simplest way was to build a list of tags to be removed inside the loop and remove them afterwards, first using DOMNode::insertBefore() to move their child nodes (such as text nodes) up the tree. That way, removing those nodes later has no side effects.
So I added another generalized stripTags method to this child class for DOMDocument.
public function stripTags(DOMNode $node) {
$change = $remove = [];
/* Walk the entire tree to build a list of things that need to be removed */
foreach($this->walk($node) as $n) {
if ($n instanceof DOMText || $n instanceof DOMDocument) {
continue;
}
$this->stripAttributes($n); // strips all node attributes not allowed
$this->forceAttributes($n); // forces any required attributes
if (!in_array($n->nodeName, $this->allowedTags, true)) {
// track the disallowed node for removal
$remove[] = $n;
// we take all of its child nodes for modification later
foreach($n->childNodes as $child) {
$change[] = [$child, $n];
}
}
}
/* Go through the list of changes first so we don't break the
referential integrity of the tree */
foreach($change as list($a, $b)) {
$b->parentNode->insertBefore($a, $b);
}
/* Now we can safely remove the old nodes */
foreach($remove as $a) {
if ($a->parentNode) {
$a->parentNode->removeChild($a);
}
}
}
The trick here is that insertBefore() moves (it does not copy) the child nodes of a disallowed tag, such as its text nodes, up to the parent tag, so doing that while we are still walking the tree could easily break it. This confused me a lot at first, but looking at the way the method works, it makes sense. Deferring the moves makes sure we don't break a parentNode reference when, for example, a deeper node is allowed but its parent is not in the allowed tags list.
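An alternative to deferring the moves, sketched here for the unwrapping part only (no attribute filtering): recurse bottom-up, so every subtree is already clean before its parent decides whether to unwrap it, and snapshot the children before moving anything. unwrap_disallowed() is a hypothetical helper, not part of the answer's HTMLFixer class. On the <strong>/<em> example from the question it keeps the text in its original order.

```php
<?php
// Sketch: unwrap disallowed elements bottom-up, keeping their children in place.
// unwrap_disallowed() is a hypothetical helper name, not part of PHP's DOM API.
function unwrap_disallowed(DOMNode $node, array $allowed): void
{
    // Snapshot the children, because nodes are moved and removed below.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if (!$child instanceof DOMElement) {
            continue; // text and comment nodes stay where they are
        }
        // Clean the subtree first (post-order), so nested tags are handled.
        unwrap_disallowed($child, $allowed);
        if (!in_array($child->nodeName, $allowed, true)) {
            // Move each child in front of the disallowed element, preserving order.
            while ($child->firstChild !== null) {
                $node->insertBefore($child->firstChild, $child);
            }
            $node->removeChild($child);
        }
    }
}

$dom = new DOMDocument();
$dom->loadXML('<root><p>Hello <strong>there <em>PHP</em>!</strong></p></root>');
unwrap_disallowed($dom->documentElement, ['p', 'em']);
echo $dom->saveXML($dom->documentElement), PHP_EOL;
// <root><p>Hello there <em>PHP</em>!</p></root>
```

The trade-off is that this variant mutates the tree during recursion, which only stays safe because each level iterates over its own snapshot array.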
Complete Solution
Here's the complete solution I came up with to solve this problem more generally. I'm including it in my answer because I struggled to find many of the edge cases of doing this with DOMDocument elsewhere. It lets you specify which tags to allow; all other tags are removed. It also lets you specify which attributes are allowed; all other attributes are removed (and you can even force certain attributes on certain tags).
class HTMLFixer extends DOMDocument {
protected static $defaultAllowedTags = [
'p',
'h1',
'h2',
'h3',
'h4',
'h5',
'h6',
'pre',
'code',
'blockquote',
'q',
'strong',
'em',
'del',
'img',
'a',
'table',
'thead',
'tbody',
'tfoot',
'tr',
'th',
'td',
'ul',
'ol',
'li',
];
protected static $defaultAllowedAttributes = [
'a' => ['href'],
'img' => ['src'],
'pre' => ['class'],
];
protected static $defaultForceAttributes = [
'a' => ['target' => '_blank'],
];
protected $allowedTags = [];
protected $allowedAttributes = [];
protected $forceAttributes = [];
public function __construct($version = null, $encoding = null, $allowedTags = [],
$allowedAttributes = [], $forceAttributes = []) {
$this->setAllowedTags($allowedTags ?: static::$defaultAllowedTags);
$this->setAllowedAttributes($allowedAttributes ?: static::$defaultAllowedAttributes);
$this->setForceAttributes($forceAttributes ?: static::$defaultForceAttributes);
parent::__construct($version ?? '1.0', $encoding ?? ''); // passing null to these string parameters is deprecated since PHP 8.1
}
public function setAllowedTags(Array $tags) {
$this->allowedTags = $tags;
}
public function setAllowedAttributes(Array $attributes) {
$this->allowedAttributes = $attributes;
}
public function setForceAttributes(Array $attributes) {
$this->forceAttributes = $attributes;
}
public function getAllowedTags() {
return $this->allowedTags;
}
public function getAllowedAttributes() {
return $this->allowedAttributes;
}
public function getForceAttributes() {
return $this->forceAttributes;
}
public function saveHTML(DOMNode $node = null) {
if (!$node) {
$node = $this;
}
$this->stripTags($node);
return parent::saveHTML($node);
}
protected function stripTags(DOMNode $node) {
$change = $remove = [];
foreach($this->walk($node) as $n) {
if ($n instanceof DOMText || $n instanceof DOMDocument) {
continue;
}
$this->stripAttributes($n);
$this->forceAttributes($n);
if (!in_array($n->nodeName, $this->allowedTags, true)) {
$remove[] = $n;
foreach($n->childNodes as $child) {
$change[] = [$child, $n];
}
}
}
foreach($change as list($a, $b)) {
$b->parentNode->insertBefore($a, $b);
}
foreach($remove as $a) {
if ($a->parentNode) {
$a->parentNode->removeChild($a);
}
}
}
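protected function stripAttributes(DOMNode $node) {
if (!$node instanceof DOMElement) {
return; // only elements carry attributes; $node->attributes is null for e.g. comments
}
$attributes = $node->attributes;
$len = $attributes->length;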
for ($i = $len - 1; $i >= 0; $i--) {
$attr = $attributes->item($i);
if (!isset($this->allowedAttributes[$node->nodeName]) ||
!in_array($attr->name, $this->allowedAttributes[$node->nodeName], true)) {
$node->removeAttributeNode($attr);
}
}
}
protected function forceAttributes(DOMNode $node) {
if (isset($this->forceAttributes[$node->nodeName])) {
foreach ($this->forceAttributes[$node->nodeName] as $attribute => $value) {
$node->setAttribute($attribute, $value);
}
}
}
protected function walk(DOMNode $node, $skipParent = false) {
if (!$skipParent) {
yield $node;
}
if ($node->hasChildNodes()) {
foreach ($node->childNodes as $n) {
yield from $this->walk($n);
}
}
}
}
So if we have the following HTML
<div id="content">
Some text...
<p class="someclass">Hello <span style="color: purple;">P<em>H</em>P</span>!</p>
</div>
And we only want to allow <p>, and <em>.
$html = <<<'HTML'
<div id="content">
Some text...
<p class="someclass">Hello <span style="color: purple;">P<em>H</em>P</span>!</p>
</div>
HTML;
$dom = new HTMLFixer(null, null, ['p', 'em']);
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
echo $dom->saveHTML($dom);
We'd get something like this...
Some text...
<p>Hello P<em>H</em>P!</p>
Since you can also limit this to a specific subtree of the DOM, the solution can be generalized even further.
You can use strip_tags() like this:
$html = '<div id="content">
Some text...
<p class="someclass">Hello <span style="color: purple;">PHP</span>!</p>
</div>';
$updatedHTML = strip_tags($html, "<p><h1><h2><h3><ul><ol><li>");
// the second parameter lists the HTML tags to keep
You can get more information here: http://php.net/manual/en/function.strip-tags.php
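Since PHP 7.4, strip_tags() also accepts the allowed tags as an array. The sketch below (input condensed to one line) also shows why the question's update rules this approach out: allowed tags keep all of their attributes, so class="someclass" survives.

```php
<?php
// strip_tags() with an array of allowed tags (PHP 7.4+).
$html = '<div id="content">Some text... <p class="someclass">Hello <span style="color: purple;">PHP</span>!</p></div>';

$kept = strip_tags($html, ['p']);
echo $kept, PHP_EOL;
// Some text... <p class="someclass">Hello PHP!</p>
```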

Pass XML node as parameter on object instantiation and then calling subnodes from it

I want to know if passing an XML node and then calling upon a method to access it is legal syntax in PHP. I tried converting to string, but that didn't work.
What am I doing wrong?
What would be the best/simplest alternative?
XML
<user>
<widgets>
<widget>Widget 1</widget>
<stuff>
<morestuff>Things</morestuff>
</stuff>
<stuff>
<morestuff>Things</morestuff>
</stuff>
<widget>Widget 2</widget>
</widgets>
</user>
PHP
<?php
$xmlfile = 'widgets/widgets_files/widgets.xml';
$widgets = array();
$user = new SimpleXMLElement($xmlfile, NULL, true);
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom = dom_import_simplexml($user)->ownerDocument;
foreach ($user->widgets->widget as $widget) {
$new_widget = new Widget($widget); //Where the node gets passed
array_push($widgets, $new_widget);
}
//For example
$new_widget[0]->set_subnodes();
$new_widget[0]->get_subnodes();
class Widget {
private $widget;
private $stuffArray = array();
public function __construct($widget) {
$this->widget = $widget;
}
public function set_subnodes() {
foreach ($this->widget->stuff->morestuff as $morestuff => $value) {
$this->stuffArray[$morestuff] = $value;
}
}
public function get_subnodes() {
foreach ($this->stuffArray as $stuff) {
echo$stuff;
}
}
}
It is indeed possible to pass XML objects as parameters to objects and to call methods on them, but there are a number of errors in your code that are stopping it from working. In particular, the XML you are using doesn't have the structure you think it does: the stuff and morestuff nodes are not children of widget, so none of the actions you're trying to perform on them will work. Here's a corrected version of the XML and some PHP code that does what I think you're trying to do above:
$widgets = array();
# you can load your XML from a file, obviously; for the purposes of the example,
# I'm loading mine from a function.
$sxe = simplexml_load_string( get_my_xml() );
foreach ($sxe->widgets->widget as $widget) {
$new_widget = new Widget($widget); // Where the node gets passed
array_push($widgets, $new_widget);
}
// For example
foreach ($widgets as $w) {
$w->set_subnodes();
$w->get_subnodes();
}
function get_my_xml() {
return <<<XML
<user>
<widgets>
<widget>Widget 1
<stuff>
<morestuff>Things</morestuff>
</stuff>
<stuff>
<morestuff>Other Things</morestuff>
</stuff>
</widget>
<widget>Widget 2
<stuff>
<morestuff>Widget Two's Things</morestuff>
</stuff>
<stuff>
<morestuff>Widget Two's Other Things</morestuff>
</stuff>
</widget>
</widgets>
</user>
XML;
}
The Widget class:
class Widget {
private $widget;
private $stuffArray = array();
public function __construct($widget) {
$this->widget = $widget;
}
public function set_subnodes() {
# put all the "morestuff" nodes into the stuffArray
foreach ($this->widget->xpath("stuff/morestuff") as $ms) {
print "pushing $ms on to array" . PHP_EOL;
array_push($this->stuffArray, $ms);
}
}
public function get_subnodes() {
foreach ($this->stuffArray as $stuff) {
print "Running get_subnodes: got $stuff" . PHP_EOL;
}
}
}
Output:
pushing Things on to array
pushing Other Things on to array
Running get_subnodes: got Things
Running get_subnodes: got Other Things
pushing Widget Two's Things on to array
pushing Widget Two's Other Things on to array
Running get_subnodes: got Widget Two's Things
Running get_subnodes: got Widget Two's Other Things
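The same collection also works without XPath by iterating the repeated <stuff> children directly; this is a sketch with a condensed copy of the corrected XML. It also points at a bug in the question's set_subnodes(): when you foreach over SimpleXML elements the key is the element name, so $this->stuffArray[$morestuff] = $value keeps overwriting the same 'morestuff' key. Appending and casting to string avoids both that and storing SimpleXMLElement objects.

```php
<?php
// Sketch: collect each widget's <morestuff> text by iterating <stuff> directly.
$xml = simplexml_load_string(
    '<user><widgets>'
    . '<widget>Widget 1<stuff><morestuff>Things</morestuff></stuff>'
    . '<stuff><morestuff>Other Things</morestuff></stuff></widget>'
    . '<widget>Widget 2<stuff><morestuff>More</morestuff></stuff></widget>'
    . '</widgets></user>'
);

$result = [];
foreach ($xml->widgets->widget as $widget) {
    $things = [];
    // $widget->stuff iterates over every <stuff> child, not just the first one.
    foreach ($widget->stuff as $stuff) {
        // Cast to string so the array holds plain text, not SimpleXMLElement objects.
        $things[] = (string) $stuff->morestuff;
    }
    $result[] = $things;
}
print_r($result);
```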

Pass container object in XSLTProcessor

Is there any way to pass or bind a container object and call a service object's method from XSLTProcessor? Something like:
$proc->registerPHPFunctions(); // in the PHP file
in the XSLT stylesheet:
<xslt:value-of select="php:function('serviceobject::serviceObjectMethod',string($xsltProcessingVariable))"/>
In "normal" php code you can do something like
<?php
class Foo {
private $prefix; // declared explicitly; dynamic properties are deprecated since PHP 8.2
public function __construct($prefix) {
$this->prefix = $prefix;
}
public function myMethod($id) {
return sprintf('%s#%s', $this->prefix, $id);
}
}
$fooA = new Foo('A');
$fooB = new Foo('B');
echo call_user_func_array( array($fooA, 'myMethod'), array('id1') ), "\r\n";
echo call_user_func_array( array($fooB, 'myMethod'), array('id1') ), "\r\n";
I.e. instead of giving call_user_func_array just the name of the function, you pass array($obj, 'methodName') to invoke an instance method.
Unfortunately, that doesn't seem to work with php:function(...), and I haven't found another easy/clean way to do it.
But you could register your objects in a lookup table string_id->object and then use something like
select="php:function('invoke', 'obj1', 'myMethod', string(@param1), string(@param2))"
in your stylesheet. function invoke($objectId, $methodName) now has to find the object that has been registered under $objectId and then invoke the method like in the previous example.
func_get_args() lets you retrieve all parameters passed to a function, even those that are not declared in the function signature. Cut off the first two elements (i.e. $objectId and $methodName) and pass the remaining array as arguments to call_user_func_array.
self-contained example:
<?php
class Foo {
private $prefix; // declared explicitly; dynamic properties are deprecated since PHP 8.2
public function __construct($prefix) {
$this->prefix = $prefix;
}
public function myMethod($id) {
return sprintf('%s#%s', $this->prefix, $id);
}
}
function invoke($objectId, $methodname)
{
static $lookup = array();
$args = func_get_args();
if ( is_null($methodname) ) {
$lookup[$objectId] = $args[2];
}
else {
$args = array_slice($args, 2);
return call_user_func_array( array($lookup[$objectId], $methodname), $args);
}
}
// second parameter null -> register object
// sorry, it's just a quick hack
// don't do this in production code, no one will remember after two weeks
invoke('obj1', null, new Foo('A'));
invoke('obj2', null, new Foo('B'));
$proc = new XSLTProcessor();
$proc->registerPHPFunctions();
$proc->importStyleSheet(new SimpleXMLElement( style() ));
echo $proc->transformToXML(new SimpleXMLElement( document() ));
function document() {
return <<<EOB
<doc>
<element id="id1" />
<element id="id2" />
</doc>
EOB;
}
function style() {
return <<<EOB
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:php="http://php.net/xsl">
<xsl:output method="text"/>
<xsl:template match="element">
Obj1-<xsl:value-of select="php:function('invoke', 'obj1', 'myMethod', string(@id))"/>
|||
Obj2-<xsl:value-of select="php:function('invoke', 'obj2', 'myMethod', string(@id))"/>
</xsl:template>
</xsl:stylesheet>
EOB;
}
prints
Obj1-A#id1
|||
Obj2-B#id1
Obj1-A#id2
|||
Obj2-B#id2
By the way: don't implement your invoke() function the way I did in this example. I just failed to come up with a better way to implement register()/invoke() functionality for this quick example ;-)
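One way to split the "null means register" hack into separate register/invoke steps is sketched below. ObjectRegistry and xsl_invoke() are hypothetical names; the entry point stays a plain function, on the assumption that the stylesheet keeps calling it exactly like invoke() above, e.g. php:function('xsl_invoke', 'obj1', 'myMethod', string(@id)). The XSLT part is unchanged and omitted here.

```php
<?php
// Sketch: explicit register/lookup instead of overloading invoke() with a null method.
// ObjectRegistry and xsl_invoke() are hypothetical names chosen for illustration.
final class ObjectRegistry
{
    /** @var array<string, object> */
    private static $objects = [];

    public static function register(string $id, object $obj): void
    {
        self::$objects[$id] = $obj;
    }

    public static function get(string $id): object
    {
        if (!isset(self::$objects[$id])) {
            throw new InvalidArgumentException("No object registered under '$id'");
        }
        return self::$objects[$id];
    }
}

// Plain function as the XSLT entry point; extra arguments are forwarded to the method.
function xsl_invoke(string $objectId, string $methodName, ...$args)
{
    return call_user_func_array([ObjectRegistry::get($objectId), $methodName], $args);
}

class Foo
{
    private $prefix;

    public function __construct($prefix)
    {
        $this->prefix = $prefix;
    }

    public function myMethod($id)
    {
        return sprintf('%s#%s', $this->prefix, $id);
    }
}

ObjectRegistry::register('obj1', new Foo('A'));
ObjectRegistry::register('obj2', new Foo('B'));

// The same calls the stylesheet would make through php:function('xsl_invoke', ...).
echo xsl_invoke('obj1', 'myMethod', 'id1'), PHP_EOL; // A#id1
echo xsl_invoke('obj2', 'myMethod', 'id2'), PHP_EOL; // B#id2
```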

How do I dynamically create a PHP SimpleXMLElement Object while keeping current properties?

I am reading in an XML file, which gives me a SimpleXMLElement object representation of the XML. I am going to take an array and feed its values into that object. I don't know in advance what is going to be in that array.
If I were to brute-force this, I would do something like this:
//Solution 1: Brute Force
//Just creating an array and value for purposes of demonstration.
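$arOfData = array(0 => "theFirstNode", 1 => "theSecondNode", 2 => "theThirdNode");
$value = "The XML Node Value";
// Since PHP 7 the braces are required: $simpleXml->$arOfData[0] would mean ($simpleXml->$arOfData)[0]
$simpleXml->{$arOfData[0]}->{$arOfData[1]}->{$arOfData[2]} = $value;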
//The next best thing I can think of doing is something like this.
//Solution 2: Semi-brute force
//
foreach($this->arrayData as $key => $value) {
$xmlNodes = explode( '-', $key);
$numNodes = count($xmlNodes);
switch($numNodes) {
case 1:
$simpleXml->{$xmlNodes[0]} = $value;
break;
case 2:
$simpleXml->{$xmlNodes[0]}->{$xmlNodes[1]} = $value;
break;
case 3:
$simpleXml->{$xmlNodes[0]}->{$xmlNodes[1]}->{$xmlNodes[2]} = $value;
break;
case 4:
$simpleXml->{$xmlNodes[0]}->{$xmlNodes[1]}->{$xmlNodes[2]}->{$xmlNodes[3]} = $value;
break;
case 5:
$simpleXml->{$xmlNodes[0]}->{$xmlNodes[1]}->{$xmlNodes[2]}->{$xmlNodes[3]}->{$xmlNodes[4]} = $value;
break;
}
}
Note: this solution uses the array key, explodes it into an array delimited by a dash, and then uses the array value as the new XML value. So don't let that distract you.
The problem with solution #2 is: what happens when we get an XML node that is deeper than 5 levels? It's not going to be stuffed into the new object we are creating. Uh oh. It's also not very elegant ;). I am not sure how to do this in a more recursive manner.
Like you already wrote in your question, you need to have this dynamically because you do not know about the number of parent elements.
You need to dig a little deeper into how SimpleXML works to get this done.
But first, let me suggest a different notation: instead of the minus sign, use a slash like in a path.
first/second/third
This is also common in XPath, and I think it speaks for itself. Also, the minus sign can be part of an element name, but the slash cannot, so this is just a bit better.
Before I show you how you can easily access that <third> element node to set its value, let's first look at some assignment basics in SimpleXML.
To access and set this element-node in a SimpleXMLElement see the following example:
$xml = new SimpleXMLElement('<root><first><second><third/></second></first></root>');
$element = $xml->first->second->third;
$element[0] = "value";
This is pretty straightforward, but you can see two things here:
The <third> element already exists in the document.
The code uses the SimpleXML self-reference ([0]), which lets you set the XML value of the element the variable refers to (rather than overwriting the variable itself). This is specific to how SimpleXMLElement works.
The second point also contains the solution to the problem of how to deal with non-existent elements: $element[0] is NULL if the element does not exist:
$xml = new SimpleXMLElement('<root><first><second/></first></root>');
$element = $xml->first->second->third;
var_dump($element[0]); # NULL
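To make the difference between assigning to the variable and assigning through [0] concrete, here is a small self-contained check; the <fourth> element is made up for the example and deliberately absent.

```php
<?php
$xml = new SimpleXMLElement('<root><first><second><third/></second></first></root>');

// Assigning to the variable only rebinds the PHP variable; the XML is untouched.
$element = $xml->first->second->third;
$element = 'not written';
$before = (string) $xml->first->second->third;

// Assigning through the self-reference [0] writes into the element itself.
$element = $xml->first->second->third;
$element[0] = 'value';
$after = (string) $xml->first->second->third;

// A missing element yields NULL at [0], which is the test for creating it.
$missing = $xml->first->second->fourth[0];

var_dump($before, $after, $missing);
```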
So let's try to conditionally add the third element in case it does not exist:
if ($xml->first->second->third[0] === NULL) {
$xml->first->second->third = "";
}
This does solve that problem. So the only thing left to do is to do that in an iterative fashion for all parts of the path:
first/second/third
To keep this easy, create a function for this:
/**
* Modify an element's value specified by a string path.
*
* @param SimpleXMLElement $parent
* @param string $path
* @param string $value (optional)
*
* @return SimpleXMLElement the modified element node
*/
function simplexml_deep_set(SimpleXMLElement $parent, $path, $value = '')
{
### <mocked> to be removed later: ###
if ($parent->first->second->third[0] === NULL) {
$parent->first->second->third = "";
}
$element = $parent->first->second->third;
### </mocked> ###
$element[0] = $value;
return $element;
}
Because the function is mocked, it can be used directly:
$xml = new SimpleXMLElement('<root><first><second/></first></root>');
simplexml_deep_set($xml, "first/second/third", "The XML Node Value");
$xml->asXML('php://output');
And this works:
<?xml version="1.0"?>
<root><first><second><third>The XML Node Value</third></second></first></root>
So now let's remove the mock. First, use explode() like you do as well. Then all that needs to be done is to go along each step of the path and create the element conditionally if it does not exist yet. In the end, $element will be the element to modify:
$steps = explode('/', $path);
$element = $parent;
foreach ($steps as $step)
{
if ($element->{$step}[0] === NULL) {
$element->$step = '';
}
$element = $element->$step;
}
This foreach is needed to replace the mock with a working version. Compare with the full function definition at a glance:
function simplexml_deep_set(SimpleXMLElement $parent, $path, $value = '')
{
$steps = explode('/', $path);
$element = $parent;
foreach ($steps as $step)
{
if ($element->{$step}[0] === NULL) {
$element->$step = "";
}
$element = $element->$step;
}
$element[0] = $value;
return $element;
}
Let's modify some crazier things to test it out:
$xml = new SimpleXMLElement('<root><first><second/></first></root>');
simplexml_deep_set($xml, "first/second/third", "The XML Node Value");
simplexml_deep_set(
$xml, "How/do/I/dynamically/create/a/php/simplexml/object/while/keeping/current/properties"
, "The other XML Node Value"
);
$xml->asXML('php://output');
Example-Output (beautified):
<?xml version="1.0"?>
<root>
<first>
<second>
<third>The XML Node Value</third>
</second>
</first>
<How>
<do>
<I>
<dynamically>
<create>
<a>
<php>
<simplexml>
<object>
<while>
<keeping>
<current>
<properties>The other XML Node Value</properties>
</current>
</keeping>
</while>
</object>
</simplexml>
</php>
</a>
</create>
</dynamically>
</I>
</do>
</How>
</root>
