Getting cdata content while parsing xml file - php

I have an xml file
<?xml version="1.0" encoding="utf-8"?>
<xml>
<events date="01-10-2009" color="0x99CC00" selected="true">
<event>
<title>You can use HTML and CSS</title>
<description><![CDATA[This is the description ]]></description>
</event>
</events>
</xml>
I used xpath and and xquery for parsing the xml.
$xml_str = file_get_contents('xmlfile');
$xml = simplexml_load_string($xml_str);
if(!empty($xml))
{
$nodes = $xml->xpath('//xml/events');
}
i am getting the title properly, but iam not getting description.How i can get data inside
the cdata

SimpleXML has a bit of a problem with CDATA, so use:
$xml = simplexml_load_file('xmlfile', 'SimpleXMLElement', LIBXML_NOCDATA);
if(!empty($xml))
{
$nodes = $xml->xpath('//xml/events');
}
print_r( $nodes );
This will give you:
Array
(
[0] => SimpleXMLElement Object
(
[#attributes] => Array
(
[date] => 01-10-2009
[color] => 0x99CC00
[selected] => true
)
[event] => SimpleXMLElement Object
(
[title] => You can use HTML and CSS
[description] => This is the description
)
)
)

You are probably being misled into thinking that the CDATA is missing by using print_r or one of the other "normal" PHP debugging functions. These cannot see the full content of a SimpleXML object, as it is not a "real" PHP object.
If you run echo $nodes[0]->Description, you'll find your CDATA comes out fine. What's happening is that PHP knows that echo expects a string, so asks SimpleXML for one; SimpleXML responds with all the string content, including CDATA.
To get at the full string content reliably, simply tell PHP that what you want is a string using the (string) cast operator, e.g. $description = (string)$nodes[0]->Description.
To debug SimpleXML objects and not be fooled by quirks like this, use a dedicated debugging function such as one of these: https://github.com/IMSoP/simplexml_debug

This could also be another viable option, which would remove that code and make life a little easier.
$xml = str_replace("<![CDATA[", "", $xml);
$xml = str_replace("]]>", "", $xml);

Related

Namespaces and XPath

I'm exploring XML and PHP, mostly XPath and other parsers.
Here be the xml:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:foo="http://www.foo.org/" xmlns:bar="http://www.bar.org">
<actors>
<actor id="1">Christian Bale</actor>
<actor id="2">Liam Neeson</actor>
<actor id="3">Michael Caine</actor>
</actors>
<foo:singers>
<foo:singer id="4">Tom Waits</foo:singer>
<foo:singer id="5">B.B. King</foo:singer>
<foo:singer id="6">Ray Charles</foo:singer>
</foo:singers>
<items>
<item id="7">Pizza</item>
<item id="8">Cheese</item>
<item id="9">Cane</item>
</items>
</root>
Here be my path & code:
$xml = simplexml_load_file('xpath.xml');
$result = $xml -> xpath('/root/actors');
echo '<pre>'.print_r($result,1).'</pre>';
Now, said path returns:
Array
(
[0] => SimpleXMLElement Object
(
[actor] => Array
(
[0] => Christian Bale
[1] => Liam Neeson
[2] => Michael Caine
)
)
)
Whereas a seemingly similar line of code, which I would have though would result in the singers, doesnt. Meaning:
$result = $xml -> xpath('/root/foo:singers');
Results in:
Array
(
[0] => SimpleXMLElement Object
(
)
)
Now I would've thought the foo: namespace in this case is a non-issue and both paths should result in the same sort of array of singers/actors respectively? How come that is not the case?
Thank-you!
Note: As you can probably gather I'm quite new to xml so please be gentle.
Edit: When I go /root/foo:singers/foo:singer I get results, but not before. Also with just /root I only get actors and items as results, foo:singers are completely omitted.
SimpleXML is, for a number of reasons, simply a bad API.
For most purposes I suggest PHP's DOM extension. (Or for very large documents a combination of it along with XMLReader.)
For using namespaces in xpath you'll want to register those you'd like to use, and the prefix you want to use them with, with your xpath processor.
Example:
$dom = new DOMDocument();
$dom->load('xpath.xml');
$xpath = new DOMXPath($dom);
// The prefix *can* match that used in the document, but it's not necessary.
$xpath->registerNamespace("ns", "http://www.foo.org/");
foreach ($xpath->query("/root/ns:singers") as $node) {
echo $dom->saveXML($node);
}
Output:
<foo:singers>
<foo:singer id="4">Tom Waits</foo:singer>
<foo:singer id="5">B.B. King</foo:singer>
<foo:singer id="6">Ray Charles</foo:singer>
</foo:singers>
DOMXPath::query returns a DOMNodeList containing matched nodes. You can work with it essentially the same way you would in any other language with a DOM implementation.
You can use // expression like:
$xml -> xpath( '//foo:singer' );
to select all foo:singer elements no matter where they are.
EDIT:
SimpleXMLElement is selected, you just can't see the child nodes with print_r(). Use SimpleXMLElement methods like SimpleXMLElement::children to access them.
// example 1
$result = $xml->xpath( '/root/foo:singers' );
foreach( $result as $value ) {
print_r( $value->children( 'foo', TRUE ) );
}
// example 2
print_r( $result[0]->children( 'foo', TRUE )->singer );

Json Encode or Serialize an XML

I have some xml, this is a simple version of it.
<xml>
<items>
<item abc="123">item one</item>
<item abc="456">item two</item>
</items>
</xml>
Using SimpleXML on the content,
$obj = simplexml_load_string( $xml );
I can use $obj->xpath( '//items/item' ); and get access to the #attributes.
I need an array result, so I have tried the json_decode(json_encode($obj),true) trick, but that looks to be removing access to the #attributes (ie. abc="123").
Is there another way of doing this, that provides access to the attributes and leaves me with an array?
You need to call attributes() function.
Sample code:
$xmlString = '<xml>
<items>
<item abc="123">item one</item>
<item abc="456">item two</item>
</items>
</xml>';
$xml = new SimpleXMLElement($xmlString);
foreach( $xml->items->item as $value){
$my_array[] = strval($value->attributes());
}
print_r($my_array);
Eval
You can go the route with json_encode and json_decode and you can add the stuff you're missing because that json_encode-ing follows some specific rules with SimpleXMLElement.
If you're interested into the rules and their details, I have written two blog-posts about it:
SimpleXML and JSON Encode in PHP – Part I
SimpleXML and JSON Encode in PHP – Part II
For you perhaps more interesing is the third part which shows how you can modify the json serialization and provide your own format (e.g. to preserve the attributes):
SimpleXML and JSON Encode in PHP – Part III and End
It ships with a full blown example, here is an excerpt in code:
$xml = '<xml>
<items>
<item abc="123">item one</item>
<item abc="456">item two</item>
</items>
</xml>';
$obj = simplexml_load_string($xml, 'JsonXMLElement');
echo $json = json_encode($obj, JSON_PRETTY_PRINT), "\n";
print_r(json_decode($json, TRUE));
Output of JSON and the array is as following, note that the attributes are part of it:
{
"items": {
"item": [
{
"#attributes": {
"abc": "123"
},
"#text": "item one"
},
{
"#attributes": {
"abc": "456"
},
"#text": "item two"
}
]
}
}
Array
(
[items] => Array
(
[item] => Array
(
[0] => Array
(
[#attributes] => Array
(
[abc] => 123
)
[#text] => item one
)
[1] => Array
(
[#attributes] => Array
(
[abc] => 456
)
[#text] => item two
)
)
)
)
$xml = new SimpleXMLElement($xmlString);
$xml is now an object. To get the value of an attribute:
$xml->something['id'];
Where 'id' is the name of the attribute.
While it's theoretically possible to write a generic conversion from XML to PHP or JSON structures, it is very hard to capture all the subtleties that might be present - the distinction between child elements and attributes, text content alongside attributes (as you have here) or even alongside child elements, multiple child nodes with the same name, whether order of child elements and text nodes is important (e.g. in XHTML or DocBook), etc, etc.
If you have a specific format you need to produce, it will generally be much easier to use an API - like SimpleXML - to loop over the XML and produce the structure you need.
You don't specify the structure you want to achieve, but the general approach given your input would be to loop over each item, and either access known attributes, or loop over each attribute:
$sxml = simplexml_load_string( $xml );
$final_array = array();
foreach ( $sxml->items->item as $xml_item )
{
$formatted_item = array();
// Text content of item
$formatted_item['content'] = (string)$xml_item;
// Specifically get 'abc' attribute
$formatted_item['abc'] = (string)$xml_item['abc'];
// Maybe one of the attributes is an integer
$formatted_item['foo_id'] = (int)$xml_item['foo_id'];
// Or maybe you want to loop over lots of possible attributes
foreach ( $xml_item->attributes() as $attr_name => $attr_value )
{
$formatted_item['attrib:' . $attr_name] = (string)$attr_value;
}
// Add it to a final list
$final_array[] = $formatted_item;
// Or maybe you want that array to be keyed on one of the attributes
$final_array[ (string)$xml_item['key'] ] = $formatted_item;
}
Here is a class I've found that is able to process XML into array very nicely: http://outlandish.com/blog/xml-to-json/ (backup). Converting to json is a matter of a json_encode() call.

PHP SimpleXML Element parsing issue

I've come across a weird but apparently valid XML string that I'm being returned by an API. I've been parsing XML with SimpleXML because it's really easy to pass it to a function and convert it into a handy array.
The following is parsed incorrectly by SimpleXML:
<?xml version="1.0" standalone="yes"?>
<Response>
<CustomsID>010912-1
<IsApproved>NO</IsApproved>
<ErrorMsg>Electronic refunds...</ErrorMsg>
</CustomsID>
</Response>
Simple XML results in:
SimpleXMLElement Object ( [CustomsID] => 010912-1 )
Is there a way to parse this in XML? Or another XML library that returns an object that reflects the XML structure?
That is an odd response with the text along with other nodes. If you manually traverse it (not as an array, but as an object) you should be able to get inside:
<?php
$xml = '<?xml version="1.0" standalone="yes"?>
<Response>
<CustomsID>010912-1
<IsApproved>NO</IsApproved>
<ErrorMsg>Electronic refunds...</ErrorMsg>
</CustomsID>
</Response>';
$sObj = new SimpleXMLElement( $xml );
var_dump( $sObj->CustomsID );
exit;
?>
Results in second object:
object(SimpleXMLElement)#2 (2) {
["IsApproved"]=>
string(2) "NO"
["ErrorMsg"]=>
string(21) "Electronic refunds..."
}
You already parse the XML with SimpleXML. I guess you want to parse it into a handy array which you not further define.
The problem with the XML you have is that it's structure is not very distinct. In case it does not change much, you can convert it into an array using a SimpleXMLIterator instead of a SimpleXMLElement:
$it = new SimpleXMLIterator($xml);
$mode = RecursiveIteratorIterator::SELF_FIRST;
$rit = new RecursiveIteratorIterator($it, $mode);
$array = array_map('trim', iterator_to_array($rit));
print_r($array);
For the XML-string in question this gives:
Array
(
[CustomsID] => 010912-1
[IsApproved] => NO
[ErrorMsg] => Electronic refunds...
)
See as well the online demo and How to parse and process HTML/XML with PHP?.

file_get_contents() warning when load xml data

what i want to do is : i have an CRM API which produce data in xml format,
i am using DOMDocument() to fetch the data into pieces. The problem is when i tried to load the data like : $dom->loadXML($data);
i got the error :
DOMDocument::loadXML() [domdocument.loadxml]: Input is not proper UTF-8
i google i got an alternative solution to use with file_get_contents() but still i am getting warning like:
here is the code that i am using to fetch the xml data:
$text = file_get_contents(stripslashes(utf8_decode($data)));
$doc = new DOMDocument();
$doc->loadXML($text);
and here i put my XML return data, not complete but just a small piece:
<?xml version="1.0" encoding="utf-8"?>
<whmcsapi version="5.0.3">
<action>getticket</action>
<result>success</result>
<ticketid>5767</ticketid>
<tid>409865</tid>
<c>NwLldOG6</c>
<deptid>2</deptid>
<deptname>Technical</deptname>
<userid>27476</userid>
<name>shirley b broyles (home)</name>
<email>sbroyles1#stx.rr.com</email>
<cc></cc>
<date>2012-08-27 19:52:12</date>
<subject>printer not working</subject>
<status>Customer-Reply</status>
<priority>High</priority>
<admin></admin>
<lastreply>2012-08-28 23:34:17</lastreply>
<flag>0</flag>
<service></service>
<replies>
<reply>
can anyone please tell me where is the problem? or any alternative ??
Okay, so let's strip down the problem. If $data contains invalid UTF-8, you should be thinking how to make it valid; one way (not sure if that works for you) is with utf8_encode() rather than utf8_decode() (which is used to turn UTF-8 into ISO-8859-1):
$doc = new DOMDocument();
$doc->loadXML(utf8_encode($data));
Otherwise you will need to find out which part of the text includes the bad input. I'll see what I can come up with.
Resources
Ensuring valid utf-8 in PHP
you can try simplexml_load_file php function..
$xml = simplexml_load_file("addinsrss.xml");
echo "<pre>";print_r($xml);echo "</pre>";
If you already have the xml source then dont pass it to FGC. Pass it directly to domDocument.
Or you can also use simplexml_load_string(),
<?php
$xml = '<?xml version="1.0" encoding="utf-8"?>
<whmcsapi version="5.0.3">
<action>getticket</action>
<result>success</result>
<ticketid>5767</ticketid>
<tid>409865</tid>
<c>NwLldOG6</c>
<deptid>2</deptid>
<deptname>Technical</deptname>
<userid>27476</userid>
</whmcsapi>
';
$xml = simplexml_load_string($xml);
print_r($xml);
/*
SimpleXMLElement Object
(
[#attributes] => Array
(
[version] => 5.0.3
)
[action] => getticket
[result] => success
[ticketid] => 5767
[tid] => 409865
[c] => NwLldOG6
[deptid] => 2
[deptname] => Technical
[userid] => 27476
...
...
)
*/
echo $xml->tid; //409865
?>
Or directly use $xml = simplexml_load_file('http://example.com/somexml.xml'); to load from an external url.

Breaking up an XML file

I've got an XML output that produces code such as:
<loadavg>
<one>0.00</one>
<five>0.02</five>
<fifteen>0.02</fifteen>
</loadavg>
<!-- whostmgrd -->
I would like to know how I can use PHP to parse that file, and grab the contents between <one>,<five> and <fifteen>. It would be useful if it were to be stored as $loadavg[1] an array.
Thanks
Yep, SimpleXML can do it:
$xml = <<<XML
<loadavg>
<one>0.00</one>
<five>0.02</five>
<fifteen>0.02</fifteen>
</loadavg>
XML;
$root = new SimpleXMLElement($xml);
$loadavg = array((string) $root->one,
(string) $root->five,
(string) $root->fifteen);
print_r($loadavg);
prints
Array (
[0] => 0.00
[1] => 0.02
[2] => 0.02
)
The easiest way to parse an XML file is using SimpleXML. It should be easy to load the XML as a SimpleXML object and get the data using that. If you could post a sample of a complete XML file I could offer some sample code.

Categories