How can I get the values inside <![CDATA[values]] > using PHP DOM?
Here is a snippet of my XML:
<Destinations>
<Destination>
<![CDATA[Aghia Paraskevi, Skiatos, Greece]]>
<CountryCode>GR</CountryCode>
</Destination>
<Destination>
<![CDATA[Amettla, Spain]]>
<CountryCode>ES</CountryCode>
</Destination>
<Destination>
<![CDATA[Amoliani, Greece]]>
<CountryCode>GR</CountryCode>
</Destination>
<Destination>
<![CDATA[Boblingen, Germany]]>
<CountryCode>DE</CountryCode>
</Destination>
</Destinations>
Working with PHP DOM is fairly straightforward, and is very similar to JavaScript's DOM.
Here are the important classes:
DOMNode — The base class for anything that can be traversed inside an XML/HTML document, including text nodes, comment nodes, and CDATA nodes
DOMElement — The base class for tags.
DOMDocument — The base class for documents. Contains the methods to load/save XML, as well as normal DOM document methods (see below).
There are a few staple methods and properties:
DOMDocument->load() — After creating a new DOMDocument, use this method on that object to load from a file.
DOMDocument->getElementsByTagName() — this method returns a node list of all elements in the document with the given tag name. Then you can iterate (foreach) on this list.
DOMNode->childNodes — A node list of all children of a node. (Remember, a CDATA section is a node!)
DOMNode->nodeType — Get the type of a node. CDATA nodes have type XML_CDATA_SECTION_NODE, which is a constant with the value 4.
DOMNode->textContent — get the text content of any node.
Note: Your CDATA sections are malformed. I don't know why there is an extra ]] in the first one, or an unclosed CDATA section at the end of the line, but I think it should simply be:
<![CDATA[Aghia Paraskevi, Skiatos, Greece]]>
Putting this all together we:
Create a new document object and load the XML
Get all Destination elements by tag name and iterate over the list
Iterate over all child nodes of each Destination element
Check if the node type is XML_CDATA_SECTION_NODE
If it is, echo the textContent of that node.
Code:
$doc = new DOMDocument();
$doc->load('test.xml');
$destinations = $doc->getElementsByTagName("Destination");
foreach ($destinations as $destination) {
    foreach ($destination->childNodes as $child) {
        if ($child->nodeType == XML_CDATA_SECTION_NODE) {
            echo $child->textContent . "<br/>";
        }
    }
}
Result:
Aghia Paraskevi, Skiatos, Greece
Amettla, Spain
Amoliani, Greece
Boblingen, Germany
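As an alternative to the nested loops, the same selection can be sketched with XPath. This is only an illustration: the XML is inlined as sample data rather than loaded from test.xml, and the variable names are my own. It relies on the XPath text() test also matching CDATA sections in PHP's DOM.

```php
<?php
// Sample data mirroring the question's structure (inlined so the sketch is self-contained).
$xml = <<<'XML'
<Destinations>
  <Destination>
    <![CDATA[Amettla, Spain]]>
    <CountryCode>ES</CountryCode>
  </Destination>
  <Destination>
    <![CDATA[Amoliani, Greece]]>
    <CountryCode>GR</CountryCode>
  </Destination>
</Destinations>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$names = [];
// text() selects the direct character-data children of each Destination;
// [normalize-space()] drops the whitespace-only nodes between the tags.
foreach ($xpath->query('//Destination/text()[normalize-space()]') as $node) {
    $names[] = trim($node->nodeValue);
}

print_r($names);
```

The CountryCode values are not included, because text() only looks at direct children of Destination.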
Use this:
$parseFile = simplexml_load_file($myXML, 'SimpleXMLElement', LIBXML_NOCDATA);
and next :
foreach ($parseFile->yourNode as $node ){
etc...
}
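A fuller sketch of the SimpleXML approach, applied to the Destinations data from the question. The inlined sample XML and the $rows array are assumptions made for illustration; casting an element to string yields its own character data, and trim() removes the surrounding line breaks.

```php
<?php
// Sample data in the shape of the question's XML (inlined for a self-contained sketch).
$xmlData = <<<'XML'
<Destinations>
  <Destination>
    <![CDATA[Aghia Paraskevi, Skiatos, Greece]]>
    <CountryCode>GR</CountryCode>
  </Destination>
  <Destination>
    <![CDATA[Amettla, Spain]]>
    <CountryCode>ES</CountryCode>
  </Destination>
</Destinations>
XML;

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes while parsing.
$xml = simplexml_load_string($xmlData, 'SimpleXMLElement', LIBXML_NOCDATA);

$rows = [];
foreach ($xml->Destination as $destination) {
    $rows[] = [
        // The string cast returns the element's own text, without the CountryCode child.
        'name'    => trim((string) $destination),
        'country' => (string) $destination->CountryCode,
    ];
}

print_r($rows);
```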
The best and easiest way:
$xml = simplexml_load_string($xmlData, 'SimpleXMLElement', LIBXML_NOCDATA);
$xmlJson = json_encode($xml);
$xmlArr = json_decode($xmlJson, 1); // Returns associative array
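Strip the CDATA markers before parsing with PHP DOM; after that you can get the inner XML or inner HTML. Be aware that the remaining text must still be well-formed XML once the markers are gone:
$xml = str_replace(array('<![CDATA[', ']]>'), '', $xml);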
I use the following code.
It not only reads all XML data with
<![CDATA[values]] >
but also converts the XML object to a PHP associative array, so we can loop over the data.
$xml_file_data = json_decode(json_encode(simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA)), true);
Hope this works for you.
function inBetweenOf(string $here, string $there, string $content) : string {
    $left_over = strlen(substr($content, strpos($content, $there)));
    return substr($content, strpos($content, $here) + strlen($here), -$left_over);
}
Iterate over "Destination" tags and then call inBetweenOf on each iteration.
$doc = inBetweenOf('<![CDATA[', ']]>', $xml);
Related
I've got an xml like this:
<father>
<son>Text with <b>HTML</b>.</son>
</father>
I'm using simplexml_load_string to parse it into SimpleXmlElement. Then I get my node like this
$xml->father->son->__toString(); //output: "Text with .", but expected "Text with <b>HTML</b>."
I need to handle simple HTML such as:
<b>text</b> or <br/> inside the xml which is sent by many users.
My problem is that I can't just ask them to use CDATA, because they won't be able to handle it properly and they are already used to working without it.
Also, if possible, I don't want the file to be edited, because the information needs to be exactly what the user sent.
The function simplexml_load_string simply erases the HTML node itself and anything inside it.
How can I keep the information?
SOLUTION
To handle the problem I used asXml as explained by @ThW:
$tmp = $xml->father->son->asXml(); //<son>Text with <b>HTML</b>.</son>
I just added a preg_match to erase the node.
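The stripping step is not shown in the post, so the following is only a guess at what it could look like. It uses preg_replace rather than preg_match, and it wraps the sample in a hypothetical <root> element so that the $xml->father->son access path from the post resolves.

```php
<?php
// Hypothetical wrapper element <root> so that $xml->father->son is a valid path.
$xml = simplexml_load_string(
    '<root><father><son>Text with <b>HTML</b>.</son></father></root>'
);

// asXML() on a non-root element returns the element including its own tags.
$outer = $xml->father->son->asXML();

// Remove the opening <son ...> tag at the start and the closing </son> at the end.
$inner = preg_replace('~^<son\b[^>]*>|</son>$~', '', $outer);

echo $inner;
```

A regular expression is enough here because only the outermost tag is removed; the DOM-based approaches in the answers avoid string handling altogether.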
A CDATA section is a character node, just like a text node. But it does less encoding/decoding. This is mostly a downside, actually. On the upside something in a CDATA section might be more readable for a human and it allows for some BC in special cases. (Think HTML script tags.)
For an XML API they are nearly the same. Here is a small DOM example (SimpleXML abstracts too much).
$document = new DOMDocument();
$father = $document->appendChild(
$document->createElement('father')
);
$son = $father->appendChild(
$document->createElement('son')
);
$son->appendChild(
$document->createTextNode('With <b>HTML</b><br>It\'s so nice.')
);
$son = $father->appendChild(
$document->createElement('son')
);
$son->appendChild(
$document->createCDataSection('With <b>HTML</b><br>It\'s so nice.')
);
$document->formatOutput = TRUE;
echo $document->saveXml();
Output:
<?xml version="1.0"?>
<father>
<son>With &lt;b&gt;HTML&lt;/b&gt;&lt;br&gt;It's so nice.</son>
<son><![CDATA[With <b>HTML</b><br>It's so nice.]]></son>
</father>
As you can see they are serialized very differently - but from the API view they are basically exchangeable. If you're using an XML parser the value you get back should be the same in both cases.
So the first possibility is just letting the HTML fragment be stored in a character node. It is just a string value for the outer XML document itself.
The other way would be using XHTML. XHTML is XML-compatible HTML. You can mix and match different XML formats, so you could add the XHTML fragment as part of the outer XML.
That seems to be what you're receiving. But SimpleXML has some problems with mixed nodes. So here is an example how you can read it in DOM.
$xml = <<<'XML'
<father>
<son>With <b>HTML</b><br/>It's so nice.</son>
</father>
XML;
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$result = '';
foreach ($xpath->evaluate('/father/son[1]/node()') as $child) {
$result .= $document->saveXml($child);
}
echo $result;
Output:
With <b>HTML</b><br/>It's so nice.
Basically you need to save each child of the son element as XML.
SimpleXML is based on the same DOM library internally. That allows you to convert a SimpleXMLElement into a DOM node. From there you can again save each child as XML.
$father = new SimpleXMLElement($xml);
$sonNode = dom_import_simplexml($father->son);
$document = $sonNode->ownerDocument;
$result = '';
foreach ($sonNode->childNodes as $child) {
$result .= $document->saveXml($child);
}
echo $result;
I'm using SimpleXML & PHP to parse an XML element in the following form:
<element>
random text with <inlinetag src="http://url.com/">inline</inlinetag> XML to parse
</element>
I know I can reach inlinetag using $element->inlinetag, but I don't know how to reach it in such a way that I can replace the inlinetag with a link built from its src attribute, without relying on its location in the text. The result would basically have to look like this:
here is a random text with <a href="http://url.com/">inline</a> XML
This may be a stupid question; I hope someone here can help! :)
I found a way to do this using DOMElement.
One way to replace the element is by cloning it with a different name and attributes. Here is a way to do this, using the accepted answer given on How do you rename a tag in SimpleXML through a DOM object?
function clonishNode(DOMNode $oldNode, $newName, $replaceAttrs = [])
{
    $newNode = $oldNode->ownerDocument->createElement($newName);
    foreach ($oldNode->attributes as $attr) {
        if (isset($replaceAttrs[$attr->name])) {
            // rename the attribute, keeping its value
            $newNode->setAttribute($replaceAttrs[$attr->name], $attr->value);
        } else {
            $newNode->appendChild($attr->cloneNode());
        }
    }
    foreach ($oldNode->childNodes as $child) {
        $newNode->appendChild($child->cloneNode(true));
    }
    $oldNode->parentNode->replaceChild($newNode, $oldNode);
}
Now, we use this function to clone the inline element with a new element and attribute name. Here comes the tricky part: iterating over all the nodes will not work as expected. The length of the selected nodes will change as you clone them, as the original node is removed. Therefore, we only select the first element until there are no elements left to clone.
$xml = '<element>
random text with <inlinetag src="http://url.com/">inline</inlinetag> XML to parse
</element>';
$dom = new DOMDocument;
$dom->loadXML($xml);
$nodes= $dom->getElementsByTagName('inlinetag');
echo $dom->saveXML(); //<element>random text with <inlinetag src="http://url.com/">inline</inlinetag> XML to parse</element>
while ($nodes->length > 0) {
    clonishNode($nodes->item(0), 'a', ['src' => 'href']);
}
echo $dom->saveXML(); //<element>random text with <a href="http://url.com/">inline</a> XML to parse</element>
That's it! All that's left to do is getting the content of the element tag.
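One way that last step might look is to serialize each child of the element and concatenate the pieces. This is a sketch, not part of the original answer; it starts from a document that already contains the renamed <a> element so that it runs on its own.

```php
<?php
// Start from the already-converted markup so the sketch is self-contained.
$dom = new DOMDocument();
$dom->loadXML(
    '<element>random text with <a href="http://url.com/">inline</a> XML to parse</element>'
);

// Serialize every child node (text and elements) of the root, leaving out <element> itself.
$inner = '';
foreach ($dom->documentElement->childNodes as $child) {
    $inner .= $dom->saveXML($child);
}

echo $inner;
```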
Is this the result you want to achieve?
<?php
$data = '<element>
random text with
<inlinetag src="http://url.com/">inline
</inlinetag> XML to parse
</element>';
$xml = simplexml_load_string($data);
foreach($xml->inlinetag as $resource)
{
echo 'Your SRC attribute = '. $resource->attributes()->src; // e.g. name, price, symbol
}
?>
I am trying to get the value (text) of a specific node from an xml document using php DOM classes but I cannot do it right because I get the text content of that node merged with its descendants.
Let's suppose that I need to get the trees from this document:
<?xml version="1.0"?>
<trees>
LarchRedwoodChestnutBirch
<trimmed>Larch</trimmed>
<trimmed>Redwood</trimmed>
</trees>
And I get:
LarchRedwoodChestnutBirchLarchRedwood
You can see that I cannot remove the substring LarchRedwood made by the trimmed trees from the whole text because I would get only ChestnutBirch and it is not what I need.
Any suggestions? Thanks.
I got it. This works:
function specificNodeValue($node, $implode = true) {
    $value = array();
    if ($node->childNodes) {
        for ($i = 0; $i < $node->childNodes->length; $i++) {
            // children without a tagName are text nodes, not elements
            if (!(@$node->childNodes->item($i)->tagName)) {
                $value[] = $node->childNodes->item($i)->nodeValue;
            }
        }
    }
    return (is_string($implode) ? implode($implode, $value) : ($implode === true ? implode($value) : $value));
}
The given node acts like a root: if a child node has no tagName when you walk its child nodes, it is a text node belonging to the node itself, so that child's value is part of the node's own value.
In a badly formed XML document a node can have several pieces of value; the function collects them all into an array to get the whole value of the node.
Use the function above to get a node's value without the values of its subnodes merged in.
Parameters:
$node (required) must be a DOMElement object
$implode (optional) controls whether you get a string (true, the default) or an array (false) of the pieces of value. Pass a string instead of a boolean to implode the array using that string as the "glue".
You can try this to remove the trimmed node
$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadXML($xml);
$xpath = new DOMXpath($doc);
$trees = $doc->getElementsByTagName('trees')->item(0);
foreach ($xpath->query('/trees/*') as $node)
{
$trees->removeChild($node);
}
echo $trees->textContent;
echo $trees->nodeValue;
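Note that for an element, $node->nodeValue and $node->textContent return the same thing in PHP: the text of the current node together with the text of all its descendants. To get only the node's own text, read the nodeValue of its direct text-node children.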
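To read only the element's own text without removing anything from the document, one option is the XPath text() test, which selects just the direct text children. A minimal sketch using the trees document from the question (the variable names are illustrative):

```php
<?php
$xml = <<<'XML'
<?xml version="1.0"?>
<trees>
  LarchRedwoodChestnutBirch
  <trimmed>Larch</trimmed>
  <trimmed>Redwood</trimmed>
</trees>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Concatenate only the text nodes that are direct children of <trees>.
$ownText = '';
foreach ($xpath->query('/trees/text()') as $textNode) {
    $ownText .= $textNode->nodeValue;
}
$ownText = trim($ownText); // drop the indentation and line breaks

echo $ownText;
```

Unlike the removeChild approach, this leaves the <trimmed> elements in place.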
Ideally, the XML should be:
<?xml version="1.0"?>
<trees>
<tree>Larch</tree>
<tree>Redwood</tree>
<tree>Chestnut</tree>
<tree>Birch</tree>
</trees>
To split "LarchRedwoodChestnutBirch" into separate words (by capital letter), you'll need to use PHP's "PCRE" functions:
http://www.php.net/manual/en/book.pcre.php
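For instance, a split on capital letters could be sketched with preg_split and a lookahead. This assumes every tree name starts with exactly one uppercase ASCII letter:

```php
<?php
$text = 'LarchRedwoodChestnutBirch';

// Split at every position that is followed by an uppercase letter.
// PREG_SPLIT_NO_EMPTY discards the empty piece before the first capital.
$trees = preg_split('/(?=[A-Z])/', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($trees);
```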
Hope that helps!
I'm using SimpleXML to add a child node to one of my XML documents. When I do a print_r on my SimpleXML object, the < is still displayed as a < in the view source. However, after I save this object back to XML using DOMDocument, the < is converted to &lt; and the > is converted to &gt;.
Any ideas on how to change this behavior? I've tried adding $dom->substituteEntities = false;, but this did no good.
//Convert SimpleXML element to DOM and save
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = false;
$dom->substituteEntities = false;
$dom->loadXML($xml->asXML());
$dom->save($filename);
Here is where I'm using the <:
$new_hint = '<![CDATA[' . $value[0] . ']]>';
$PrintQuestion->content->multichoice->feedback->hint->Passage->Paragraph->addChild('TextFragment', $new_hint);
The problem is that I'm using SimpleXML to iterate through certain nodes in the XML document, and if an attribute matches a given ID, a specific child node is added with CDATA. Then, after all processing, I save the XML back to file using DOMDocument, which is where the < is converted to &lt;, etc.
Here is a link to my entire class file, so you can get a better idea on what I'm trying to accomplish. Specifically refer to the hint_insert() method at the bottom.
http://pastie.org/1079562
SimpleXML and php5's DOM module use the same internal representation of the document (facilitated by libxml). You can switch between both apis without having to re-parse the document via simplexml_import_dom() and dom_import_simplexml().
I.e. if you really want/have to perform the iteration with the SimpleXML api once you've found your element you can switch to the DOM api and create the CData section within the same document.
<?php
$doc = new SimpleXMLElement('<a>
<b id="id1">a</b>
<b id="id2">b</b>
<b id="id3">c</b>
</a>');
foreach ($doc->xpath('b[@id="id2"]') as $b) {
    $b = dom_import_simplexml($b);
    $cdata = $b->ownerDocument->createCDataSection('0<>1');
    $b->appendChild($cdata);
    unset($b);
}
echo $doc->asxml();
prints
<?xml version="1.0"?>
<a>
<b id="id1">a</b>
<b id="id2">b<![CDATA[0<>1]]></b>
<b id="id3">c</b>
</a>
The problem is that you're likely adding that as a string, instead of as an element.
So, instead of:
$simple->addChild('foo', '<something/>');
which will be treated as text, use:
$child = $simple->addChild('foo');
$child->addChild('something');
You can't have a literal < in the body of the XML document unless it's the opening of a tag.
Edit: After what you describe in the comments, I think you're after:
DOMDocument::createCDATASection()
$child = $dom->createCDataSection('your < cdata > body ');
$dom->appendChild($child);
Edit2: After reading your edit, there's only one thing I can say:
You're doing it wrong... You can't add elements as a string value for another element. Sorry, you just can't. That's why it's escaping things, because DOM and SimpleXML are there to make sure you always create valid XML. You need to create the element as an object... So, if you want to create the CDATA child, you'd have to do something like this:
$child = $PrintQuestion.....->addChild('TextFragment');
$domNode = dom_import_simplexml($child);
$cdata = $domNode->ownerDocument->createCDataSection($value[0]);
$domNode->appendChild($cdata);
That's all there should be to it...
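Put together as a runnable sketch: the helper name addCDataChild and the sample <Paragraph> document are hypothetical, introduced only for illustration. Because SimpleXML and DOM share the same underlying document, the result can be saved straight from the owner document without re-parsing asXML().

```php
<?php
/**
 * Hypothetical helper: add a child element whose content is a CDATA section.
 */
function addCDataChild(SimpleXMLElement $parent, string $name, string $text): SimpleXMLElement
{
    $child = $parent->addChild($name);          // empty element, still SimpleXML
    $node  = dom_import_simplexml($child);      // same node, seen through DOM
    $node->appendChild($node->ownerDocument->createCDATASection($text));
    return $child;
}

$paragraph = new SimpleXMLElement('<Paragraph/>');
addCDataChild($paragraph, 'TextFragment', 'a < b & c');

// The DOM document behind the SimpleXML object already contains the CDATA section.
$dom = dom_import_simplexml($paragraph)->ownerDocument;
$output = $dom->saveXML();

echo $output;
```

To write to a file instead, $dom->save($filename) can be called on the same document.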
I have a SimpleXMLElement object $child, and a SimpleXMLElement object $parent.
How can I add $child as a child of $parent? Is there any way of doing this without converting to DOM and back?
The addChild() method only seems to allow me to create a new, empty element, but that doesn't help when the element I want to add $child also has children. I'm thinking I might need recursion here.
Unfortunately SimpleXMLElement does not offer anything to bring two elements together. As @nickf wrote, it's more fitting for reading than for manipulation. However, the sister extension DOMDocument is for editing and you can bring both together via dom_import_simplexml(). And @salathe shows in a related answer how this works for specific SimpleXMLElements.
The following shows how this works with input checking and some more options. I do it with two examples. The first example is a function to insert an XML string:
/**
* Insert XML into a SimpleXMLElement
*
* @param SimpleXMLElement $parent
* @param string $xml
* @param bool $before
* @return bool XML string added
*/
function simplexml_import_xml(SimpleXMLElement $parent, $xml, $before = false)
{
    $xml = (string)$xml;
    // check if there is something to add
    if ($nodata = !strlen($xml) or $parent[0] == NULL) {
        return $nodata;
    }
    // add the XML
    $node = dom_import_simplexml($parent);
    $fragment = $node->ownerDocument->createDocumentFragment();
    $fragment->appendXML($xml);
    if ($before) {
        return (bool)$node->parentNode->insertBefore($fragment, $node);
    }
    return (bool)$node->appendChild($fragment);
}
This example function lets you append XML or insert it before a certain element, including the root element. After checking whether there is something to add, it uses DOMDocument functions and methods to insert the XML as a document fragment; this is also outlined in How to import XML string in a PHP DOMDocument. The usage example:
$parent = new SimpleXMLElement('<parent/>');
// insert some XML
simplexml_import_xml($parent, "\n <test><this>now</this></test>\n");
// insert some XML before a certain element, here the first <test> element
// that was just added
simplexml_import_xml($parent->test, "<!-- leave a comment -->\n ", $before = true);
// you can place comments above the root element
simplexml_import_xml($parent, "<!-- this works, too -->", $before = true);
// but take care, you can produce invalid XML, too:
// simplexml_import_xml($parent, "<warn><but>take care!</but> you can produce invalid XML, too</warn>", $before = true);
echo $parent->asXML();
This gives the following output:
<?xml version="1.0"?>
<!-- this works, too -->
<parent>
<!-- leave a comment -->
<test><this>now</this></test>
</parent>
The second example is inserting a SimpleXMLElement. It makes use of the first function if needed. It basically checks if there is something to do at all and which kind of element is to be imported. If it is an attribute, it will just add it, if it is an element, it will be serialized into XML and then added to the parent element as XML:
/**
* Insert SimpleXMLElement into SimpleXMLElement
*
* @param SimpleXMLElement $parent
* @param SimpleXMLElement $child
* @param bool $before
* @return bool SimpleXMLElement added
*/
function simplexml_import_simplexml(SimpleXMLElement $parent, SimpleXMLElement $child, $before = false)
{
    // check if there is something to add
    if ($child[0] == NULL) {
        return true;
    }
    // if it is a list of SimpleXMLElements default to the first one
    $child = $child[0];
    // insert attribute
    if ($child->xpath('.') != array($child)) {
        $parent[$child->getName()] = (string)$child;
        return true;
    }
    $xml = $child->asXML();
    // remove the XML declaration on document elements
    if ($child->xpath('/*') == array($child)) {
        $pos = strpos($xml, "\n");
        $xml = substr($xml, $pos + 1);
    }
    return simplexml_import_xml($parent, $xml, $before);
}
This example function normalizes lists of elements and attributes, as is common in SimpleXML. You might want to change it to insert multiple SimpleXMLElements at once, but as the usage example below shows, my example does not support that (see the attributes example):
// append the element itself to itself
simplexml_import_simplexml($parent, $parent);
// insert <this> before the first child element (<test>)
simplexml_import_simplexml($parent->children(), $parent->test->this, true);
// add an attribute to the document element
$test = new SimpleXMLElement('<test attribute="value" />');
simplexml_import_simplexml($parent, $test->attributes());
echo $parent->asXML();
This is a continuation of the first usage-example. Therefore the output now is:
<?xml version="1.0"?>
<!-- this works, too -->
<parent attribute="value">
<!-- leave a comment -->
<this>now</this><test><this>now</this></test>
<!-- this works, too -->
<parent>
<!-- leave a comment -->
<test><this>now</this></test>
</parent>
</parent>
I hope this is helpful. You can find the code in a gist and as online demo / PHP version overview.
I know this isn't the most helpful answer, but especially since you're creating/modifying XML, I'd switch over to using the DOM functions. SimpleXML's good for accessing simple documents, but pretty poor at changing them.
If SimpleXML is treating you kindly in all other places and you want to stick with it, you still have the option of jumping over to the DOM functions temporarily to perform what you need to and then jump back again, using dom_import_simplexml() and simplexml_import_dom(). I'm not sure how efficient this is, but it might help you out.
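A minimal sketch of that round trip, assuming $child comes from a different document than $parent. The function name appendSimpleXml and the sample elements are made up for illustration.

```php
<?php
/**
 * Hypothetical helper: append a copy of $child (with all its descendants) to $parent.
 */
function appendSimpleXml(SimpleXMLElement $parent, SimpleXMLElement $child): void
{
    $parentNode = dom_import_simplexml($parent);
    $childNode  = dom_import_simplexml($child);

    // importNode() makes a deep copy that belongs to the parent's document.
    $imported = $parentNode->ownerDocument->importNode($childNode, true);
    $parentNode->appendChild($imported);
}

$parent = new SimpleXMLElement('<parent><a/></parent>');
$child  = new SimpleXMLElement('<child><grandchild>x</grandchild></child>');

appendSimpleXml($parent, $child);

// No explicit conversion back is needed: $parent reflects the change immediately.
echo $parent->asXML();
```

Since both APIs work on the same underlying tree, the cost is mainly the deep copy made by importNode().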
Actually, it's possible (dynamically) if you look carefully at how addChild() is defined. I used this technique to convert any array into XML using recursion and pass-by-reference.
addChild() returns the SimpleXMLElement of the added child.
To add a leaf node, use $xml->addChild($nodeName, $nodeValue).
To add a node that may have subnodes or a value, use $xml->addChild($nodeName) with no value passed to addChild(). This returns a subnode of type SimpleXMLElement, not a string.
target XML
<root>
<node>xyz</node>
<node>
<node>aaa</node>
<node>bbb</node>
</node>
</root>
Code:
$root = new SimpleXMLElement('<root />');
//add child with name and string value.
$root->addChild('node', 'xyz');
//adds child with name as root of new SimpleXMLElement
$sub = $root->addChild('node');
$sub->addChild('node', 'aaa');
$sub->addChild('node', 'bbb');
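The array-to-XML recursion mentioned above is not shown in the answer, so here is one way it might look. The function name arrayToXml and the sample array are assumptions; since PHP array keys must be unique, the sample uses distinct element names instead of repeated node elements.

```php
<?php
/**
 * Hypothetical sketch: recursively copy a nested array into a SimpleXMLElement.
 * Objects are passed by handle, so the caller's element is modified in place.
 */
function arrayToXml(array $data, SimpleXMLElement $xml): void
{
    foreach ($data as $name => $value) {
        if (is_array($value)) {
            // No value passed: addChild() returns a SimpleXMLElement to recurse into.
            arrayToXml($value, $xml->addChild($name));
        } else {
            // addChild() is known not to escape "&" in values, so escape first.
            $xml->addChild($name, htmlspecialchars((string) $value, ENT_XML1 | ENT_NOQUOTES));
        }
    }
}

$root = new SimpleXMLElement('<root/>');
arrayToXml(
    ['title' => 'xyz', 'group' => ['first' => 'aaa', 'second' => 'bbb']],
    $root
);

echo $root->asXML();
```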
Leaving this here as I just stumbled upon this page and found that SimpleXML now supports this functionality through the ::addChild method.
You can use this method to do add any cascading elements as well:
$xml->addChild('parent');
$xml->parent->addChild('child');
$xml->parent->child->addChild('child_id','12345');