I have some xml, this is a simple version of it.
<xml>
<items>
<item abc="123">item one</item>
<item abc="456">item two</item>
</items>
</xml>
Using SimpleXML on the content,
$obj = simplexml_load_string( $xml );
I can use $obj->xpath( '//items/item' ); and get access to the #attributes.
I need an array result, so I have tried the json_decode(json_encode($obj),true) trick, but that looks to be removing access to the #attributes (ie. abc="123").
Is there another way of doing this, that provides access to the attributes and leaves me with an array?
You need to call attributes() function.
Sample code:
$xmlString = '<xml>
<items>
<item abc="123">item one</item>
<item abc="456">item two</item>
</items>
</xml>';
$xml = new SimpleXMLElement($xmlString);
foreach( $xml->items->item as $value){
$my_array[] = strval($value->attributes());
}
print_r($my_array);
Eval
You can go the route with json_encode and json_decode and you can add the stuff you're missing because that json_encode-ing follows some specific rules with SimpleXMLElement.
If you're interested into the rules and their details, I have written two blog-posts about it:
SimpleXML and JSON Encode in PHP – Part I
SimpleXML and JSON Encode in PHP – Part II
For you perhaps more interesing is the third part which shows how you can modify the json serialization and provide your own format (e.g. to preserve the attributes):
SimpleXML and JSON Encode in PHP – Part III and End
It ships with a full blown example, here is an excerpt in code:
$xml = '<xml>
<items>
<item abc="123">item one</item>
<item abc="456">item two</item>
</items>
</xml>';
$obj = simplexml_load_string($xml, 'JsonXMLElement');
echo $json = json_encode($obj, JSON_PRETTY_PRINT), "\n";
print_r(json_decode($json, TRUE));
Output of JSON and the array is as following, note that the attributes are part of it:
{
"items": {
"item": [
{
"#attributes": {
"abc": "123"
},
"#text": "item one"
},
{
"#attributes": {
"abc": "456"
},
"#text": "item two"
}
]
}
}
Array
(
[items] => Array
(
[item] => Array
(
[0] => Array
(
[#attributes] => Array
(
[abc] => 123
)
[#text] => item one
)
[1] => Array
(
[#attributes] => Array
(
[abc] => 456
)
[#text] => item two
)
)
)
)
$xml = new SimpleXMLElement($xmlString);
$xml is now an object. To get the value of an attribute:
$xml->something['id'];
Where 'id' is the name of the attribute.
While it's theoretically possible to write a generic conversion from XML to PHP or JSON structures, it is very hard to capture all the subtleties that might be present - the distinction between child elements and attributes, text content alongside attributes (as you have here) or even alongside child elements, multiple child nodes with the same name, whether order of child elements and text nodes is important (e.g. in XHTML or DocBook), etc, etc.
If you have a specific format you need to produce, it will generally be much easier to use an API - like SimpleXML - to loop over the XML and produce the structure you need.
You don't specify the structure you want to achieve, but the general approach given your input would be to loop over each item, and either access known attributes, or loop over each attribute:
$sxml = simplexml_load_string( $xml );
$final_array = array();
foreach ( $sxml->items->item as $xml_item )
{
$formatted_item = array();
// Text content of item
$formatted_item['content'] = (string)$xml_item;
// Specifically get 'abc' attribute
$formatted_item['abc'] = (string)$xml_item['abc'];
// Maybe one of the attributes is an integer
$formatted_item['foo_id'] = (int)$xml_item['foo_id'];
// Or maybe you want to loop over lots of possible attributes
foreach ( $xml_item->attributes() as $attr_name => $attr_value )
{
$formatted_item['attrib:' . $attr_name] = (string)$attr_value;
}
// Add it to a final list
$final_array[] = $formatted_item;
// Or maybe you want that array to be keyed on one of the attributes
$final_array[ (string)$xml_item['key'] ] = $formatted_item;
}
Here is a class I've found that is able to process XML into array very nicely: http://outlandish.com/blog/xml-to-json/ (backup). Converting to json is a matter of a json_encode() call.
Related
We are using SimpleXML to try and convert XML to JSON, and in turn convert to a PHP object, so that we can compare out Soap API with our Rest API. We have a request that returns quite a lot of data, but the part in question is where we have a nested array.
The array is returned with the tag in XML, however we do not want this translated into the JSON.
The XML that we get is as follows:
<apns>
<item>
<apn>apn</apn>
</item>
</apns>
So when it is translated into JSON it looks like this:
{"apns":{"item":{"apn":"apn"}}
In reality, we want SimpleXML to convert to the same JSON as in our Rest API, which looks like the following:
{"apns":[{"apn":"apn"}]}
The array could contain more than one thing, for example:
<apns>
<item>
<apn>apn</apn>
</item>
<item>
<apn>apn2</apn>
</item>
</apns>
Which I'm assuming will just error in JSON or have the first one overwritten.
I'd expect SimpleXML to be able to handle this natively, but if not has anyone got a fix that doesn't involve janky string manipulation?
TIA :)
A generic conversion has no possibility to know that a single element should be an array in JSON.
SimpleXMLElement properties can be treated as an Iterable to traverse sibling with the same name. They can be treated as an list or a single value.
This allows you to build up your own array/object structure and serialize it to JSON.
$xml = <<<'XML'
<apns>
<item>
<apn>apn1</apn>
</item>
<item>
<apn>apn2</apn>
</item>
</apns>
XML;
$apns = new SimpleXMLElement($xml);
$json = [
'apns' => []
];
foreach ($apns->item as $item) {
$json['apns'][] = ['apn' => (string)$item->apn];
}
echo json_encode($json, JSON_PRETTY_PRINT);
This still allows you to read/convert parts in a general way. Take a more in deep look at the SimpleXMLElement class. Here are method to iterate over all children or to get the name of the current node.
I hope this code is useful as a template to what your after, the problem is that it's difficult to know if this is the only instance of what your trying to do...
What this does is first looks for any nodes which have a item/apn structure underneath using XPath (//*[item/apn] says any node //* with the following nodes underneath).
Then it loops through these items and adds new <apn> nodes underneath the start node (the <apns> node in this case) from each <item> with the value ($list->addChild("apn", (string)$item->apn);.
Once the nodes are copied it removes all of the <item> nodes (unset($list->item);).
$input = '<apns>
<item>
<apn>apn</apn>
</item>
<item>
<apn>apn2</apn>
</item>
</apns>';
$xml = simplexml_load_string($input);
$itemList = $xml->xpath("//*[item/apn]");
foreach ( $itemList as $list ) {
foreach ( $list->item as $item ) {
$list->addChild("apn", (string)$item->apn);
}
unset($list->item);
}
echo $xml->asXML();
gives...
<?xml version="1.0"?>
<apns>
<apn>apn</apn><apn>apn2</apn></apns>
and
echo json_encode($xml);
gives...
{"apn":["apn","apn2"]}
If you just want the last value, then you can just keep track of the last value and set the new element outside the inner loop...
$itemList = $xml->xpath("//*[item/apn]");
foreach ( $itemList as $list ) {
foreach ( $list->item as $item ) {
$apn = (string)$item->apn;
}
$list->addChild("apn", $apn);
unset($list->item);
}
I'm exploring XML and PHP, mostly XPath and other parsers.
Here be the xml:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:foo="http://www.foo.org/" xmlns:bar="http://www.bar.org">
<actors>
<actor id="1">Christian Bale</actor>
<actor id="2">Liam Neeson</actor>
<actor id="3">Michael Caine</actor>
</actors>
<foo:singers>
<foo:singer id="4">Tom Waits</foo:singer>
<foo:singer id="5">B.B. King</foo:singer>
<foo:singer id="6">Ray Charles</foo:singer>
</foo:singers>
<items>
<item id="7">Pizza</item>
<item id="8">Cheese</item>
<item id="9">Cane</item>
</items>
</root>
Here be my path & code:
$xml = simplexml_load_file('xpath.xml');
$result = $xml -> xpath('/root/actors');
echo '<pre>'.print_r($result,1).'</pre>';
Now, said path returns:
Array
(
[0] => SimpleXMLElement Object
(
[actor] => Array
(
[0] => Christian Bale
[1] => Liam Neeson
[2] => Michael Caine
)
)
)
Whereas a seemingly similar line of code, which I would have though would result in the singers, doesnt. Meaning:
$result = $xml -> xpath('/root/foo:singers');
Results in:
Array
(
[0] => SimpleXMLElement Object
(
)
)
Now I would've thought the foo: namespace in this case is a non-issue and both paths should result in the same sort of array of singers/actors respectively? How come that is not the case?
Thank-you!
Note: As you can probably gather I'm quite new to xml so please be gentle.
Edit: When I go /root/foo:singers/foo:singer I get results, but not before. Also with just /root I only get actors and items as results, foo:singers are completely omitted.
SimpleXML is, for a number of reasons, simply a bad API.
For most purposes I suggest PHP's DOM extension. (Or for very large documents a combination of it along with XMLReader.)
For using namespaces in xpath you'll want to register those you'd like to use, and the prefix you want to use them with, with your xpath processor.
Example:
$dom = new DOMDocument();
$dom->load('xpath.xml');
$xpath = new DOMXPath($dom);
// The prefix *can* match that used in the document, but it's not necessary.
$xpath->registerNamespace("ns", "http://www.foo.org/");
foreach ($xpath->query("/root/ns:singers") as $node) {
echo $dom->saveXML($node);
}
Output:
<foo:singers>
<foo:singer id="4">Tom Waits</foo:singer>
<foo:singer id="5">B.B. King</foo:singer>
<foo:singer id="6">Ray Charles</foo:singer>
</foo:singers>
DOMXPath::query returns a DOMNodeList containing matched nodes. You can work with it essentially the same way you would in any other language with a DOM implementation.
You can use // expression like:
$xml -> xpath( '//foo:singer' );
to select all foo:singer elements no matter where they are.
EDIT:
SimpleXMLElement is selected, you just can't see the child nodes with print_r(). Use SimpleXMLElement methods like SimpleXMLElement::children to access them.
// example 1
$result = $xml->xpath( '/root/foo:singers' );
foreach( $result as $value ) {
print_r( $value->children( 'foo', TRUE ) );
}
// example 2
print_r( $result[0]->children( 'foo', TRUE )->singer );
I am parsing XML strings using simplexml_load_string(), but I noticed that i don't get the name of the very first tag.
For example, I have these two xml strings:
$s = '<?xml version="1.0" encoding="UTF-8"?>
<ParentTypeABC>
<chidren1>
<children2>1000</children2>
</chidren1>
</ParentTypeABC>
';
$t = '<?xml version="1.0" encoding="UTF-8"?>
<ParentTypeDEF>
<chidren1>
<children2>1000</children2>
</chidren1>
</ParentTypeDEF>
';
NOTICE that they are nearly identical, the only difference being that one has the first node as <ParentTypeABC> and the other as <ParentTypeDEF>
then I just convert them to SimpleXML objects:
$o = simplexml_load_string($s);
$p = simplexml_load_string($t);
but then i have two equal objects, none of them having the "top" node's name appearing, either ParentTypeABC or ParentTypeDEF (I examine the objects using print_r()):
// with top node "ParentTypeABC"
SimpleXMLElement Object
(
[chidren1] => SimpleXMLElement Object
(
[children2] => 1000
)
)
// with top node "ParentTypeDEF"
SimpleXMLElement Object
(
[chidren1] => SimpleXMLElement Object
(
[children2] => 1000
)
)
So how I am supposed to know the top node's name? If I parse unknown XMLs and I need to know what's the top node name, what can I do?
Is there an option in simplexml_load_string() I could use?
I know there are MANY ways to parse XML's with PHP, but I'd like it to be as simple as posible, and to get a simple object or array I could navigate easily.
I made a simple example here to fiddle with.
SimpleXML has a getName() method.
echo $xml->getName();
This should return the name of the respective node, no matter if root or not.
http://php.net/manual/en/simplexmlelement.getname.php
I've come across a weird but apparently valid XML string that I'm being returned by an API. I've been parsing XML with SimpleXML because it's really easy to pass it to a function and convert it into a handy array.
The following is parsed incorrectly by SimpleXML:
<?xml version="1.0" standalone="yes"?>
<Response>
<CustomsID>010912-1
<IsApproved>NO</IsApproved>
<ErrorMsg>Electronic refunds...</ErrorMsg>
</CustomsID>
</Response>
Simple XML results in:
SimpleXMLElement Object ( [CustomsID] => 010912-1 )
Is there a way to parse this in XML? Or another XML library that returns an object that reflects the XML structure?
That is an odd response with the text along with other nodes. If you manually traverse it (not as an array, but as an object) you should be able to get inside:
<?php
$xml = '<?xml version="1.0" standalone="yes"?>
<Response>
<CustomsID>010912-1
<IsApproved>NO</IsApproved>
<ErrorMsg>Electronic refunds...</ErrorMsg>
</CustomsID>
</Response>';
$sObj = new SimpleXMLElement( $xml );
var_dump( $sObj->CustomsID );
exit;
?>
Results in second object:
object(SimpleXMLElement)#2 (2) {
["IsApproved"]=>
string(2) "NO"
["ErrorMsg"]=>
string(21) "Electronic refunds..."
}
You already parse the XML with SimpleXML. I guess you want to parse it into a handy array which you not further define.
The problem with the XML you have is that it's structure is not very distinct. In case it does not change much, you can convert it into an array using a SimpleXMLIterator instead of a SimpleXMLElement:
$it = new SimpleXMLIterator($xml);
$mode = RecursiveIteratorIterator::SELF_FIRST;
$rit = new RecursiveIteratorIterator($it, $mode);
$array = array_map('trim', iterator_to_array($rit));
print_r($array);
For the XML-string in question this gives:
Array
(
[CustomsID] => 010912-1
[IsApproved] => NO
[ErrorMsg] => Electronic refunds...
)
See as well the online demo and How to parse and process HTML/XML with PHP?.
I have an xml file
<?xml version="1.0" encoding="utf-8"?>
<xml>
<events date="01-10-2009" color="0x99CC00" selected="true">
<event>
<title>You can use HTML and CSS</title>
<description><![CDATA[This is the description ]]></description>
</event>
</events>
</xml>
I used xpath and and xquery for parsing the xml.
$xml_str = file_get_contents('xmlfile');
$xml = simplexml_load_string($xml_str);
if(!empty($xml))
{
$nodes = $xml->xpath('//xml/events');
}
i am getting the title properly, but iam not getting description.How i can get data inside
the cdata
SimpleXML has a bit of a problem with CDATA, so use:
$xml = simplexml_load_file('xmlfile', 'SimpleXMLElement', LIBXML_NOCDATA);
if(!empty($xml))
{
$nodes = $xml->xpath('//xml/events');
}
print_r( $nodes );
This will give you:
Array
(
[0] => SimpleXMLElement Object
(
[#attributes] => Array
(
[date] => 01-10-2009
[color] => 0x99CC00
[selected] => true
)
[event] => SimpleXMLElement Object
(
[title] => You can use HTML and CSS
[description] => This is the description
)
)
)
You are probably being misled into thinking that the CDATA is missing by using print_r or one of the other "normal" PHP debugging functions. These cannot see the full content of a SimpleXML object, as it is not a "real" PHP object.
If you run echo $nodes[0]->Description, you'll find your CDATA comes out fine. What's happening is that PHP knows that echo expects a string, so asks SimpleXML for one; SimpleXML responds with all the string content, including CDATA.
To get at the full string content reliably, simply tell PHP that what you want is a string using the (string) cast operator, e.g. $description = (string)$nodes[0]->Description.
To debug SimpleXML objects and not be fooled by quirks like this, use a dedicated debugging function such as one of these: https://github.com/IMSoP/simplexml_debug
This could also be another viable option, which would remove that code and make life a little easier.
$xml = str_replace("<![CDATA[", "", $xml);
$xml = str_replace("]]>", "", $xml);