Resolve namespaces with SimpleXML regardless of structure or namespace - php

I got a Google Shopping feed like this (extract):
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
...
<g:id><![CDATA[Blah]]></g:id>
<title><![CDATA[Blah]]></title>
<description><![CDATA[Blah]]></description>
<g:product_type><![CDATA[Blah]]></g:product_type>
Now, SimpleXML can read the "title" and "description" tags but it can't read the tags with "g:" prefix.
There are solutions on stackoverflow for this specific case, using the "children" function.
But I don't only want to read Google Shopping XMLs, I need it to be undependend from structure or namespace, I don't know anything about the file (I recursively loop through the nodes as an multidimensional array).
Is there a way to do it with SimpleXML? I could replace the colons, but I want to be able to store the array and reassemble the XML (in this case specifically for Google Shopping) so I do not want to lose information.

You want to use SimpleXMLElement to extract data from XML and convert it into an array.
This is generally possible but comes with some caveats. Before XML Namespaces your XML comes with CDATA. For XML to array conversion with Simplexml you need to convert CDATA to text when you load the XML string. This is done with the LIBXML_NOCDATA flag. Example:
$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);
print_r($xml); // print_r shows how SimpleXMLElement does array conversion
This gives you the following output:
SimpleXMLElement Object
(
[#attributes] => Array
(
[version] => 2.0
)
[title] => Blah
[description] => Blah
)
As you can already see, there is no nice form to present the attributes in an array, therefore Simplexml by convention puts these into the #attributes key.
The other problem you have is to handle those multiple XML namespaces. In the previous example no specific namespace was used. That is the default namespace. When you convert a SimpleXMLElement to an array, the namespace of the SimpleXMLElement is used. As none was explicitly specified, the default namespace has been taken.
But if you specify a namespace when you create the array, that namespace is taken.
Example:
$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA, "http://base.google.com/ns/1.0");
print_r($xml);
This gives you the following output:
SimpleXMLElement Object
(
[id] => Blah
[product_type] => Blah
)
As you can see, this time the namespace that has been specified when the SimpleXMLElement was created is used in the array conversion: http://base.google.com/ns/1.0.
As you write you want to take all namespaces from the document into account, you need to obtain those first - including the default one:
$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);
$namespaces = [null] + $xml->getDocNamespaces(true);
Then you can iterate over all namespaces and recursively merge them into the same array shown below:
$array = [];
foreach ($namespaces as $namespace) {
$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA, $namespace);
$array = array_merge_recursive($array, (array) $xml);
}
print_r($array);
This then finally should create and output the array of your choice:
Array
(
[#attributes] => Array
(
[version] => 2.0
)
[title] => Blah
[description] => Blah
[id] => Blah
[product_type] => Blah
)
As you can see, this is perfectly possible with SimpleXMLElement. However it's important you understand how SimpleXMLElement converts into an array (or serializes to JSON which does follow the same rules). To simulate the SimpleXMLElement-to-array conversion, you can make use of print_r for a quick output.
Note that not all XML constructs can be equally well converted into an array. That's not specifically a limitation of Simplexml but lies in the nature of which structures XML can represent and which structures an array can represent.
Therefore it is most often better to keep the XML inside an object like SimpleXMLElement (or DOMDocument) to access and deal with the data - and not with an array.
However it's perfectly fine to convert data into an array as long as you know what you do and you don't need to write much code to access members deeper down the tree in the structure. Otherwise SimpleXMLElement is to be favored over an array because it allows dedicated access not only to many of the XML feature but also querying like a database with the SimpleXMLElement::xpath method. You would need to write many lines of own code to access data inside the XML tree that comfortable on an array.
To get the best of both worlds, you can extend SimpleXMLElement for your specific conversion needs:
$buffer = <<<BUFFER
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
...
<g:id><![CDATA[Blah]]></g:id>
<title><![CDATA[Blah]]></title>
<description><![CDATA[Blah]]></description>
<g:product_type><![CDATA[Blah]]></g:product_type>
</rss>
BUFFER;
$feed = new Feed($buffer, LIBXML_NOCDATA);
print_r($feed->toArray());
Which does output:
Array
(
[#attributes] => stdClass Object
(
[version] => 2.0
)
[title] => Blah
[description] => Blah
[id] => Blah
[product_type] => Blah
[#text] => ...
)
For the underlying implementation:
class Feed extends SimpleXMLElement implements JsonSerializable
{
public function jsonSerialize()
{
$array = array();
// json encode attributes if any.
if ($attributes = $this->attributes()) {
$array['#attributes'] = iterator_to_array($attributes);
}
$namespaces = [null] + $this->getDocNamespaces(true);
// json encode child elements if any. group on duplicate names as an array.
foreach ($namespaces as $namespace) {
foreach ($this->children($namespace) as $name => $element) {
if (isset($array[$name])) {
if (!is_array($array[$name])) {
$array[$name] = [$array[$name]];
}
$array[$name][] = $element;
} else {
$array[$name] = $element;
}
}
}
// json encode non-whitespace element simplexml text values.
$text = trim($this);
if (strlen($text)) {
if ($array) {
$array['#text'] = $text;
} else {
$array = $text;
}
}
// return empty elements as NULL (self-closing or empty tags)
if (!$array) {
$array = NULL;
}
return $array;
}
public function toArray() {
return (array) json_decode(json_encode($this));
}
}
Which is an adoption with namespaces of the Changing JSON Encoding Rules example given in SimpleXML and JSON Encode in PHP – Part III and End.

The answer given by hakre was well written and exactly what I was looking for, especially the Feed class he provided at the end. But it was incomplete in a couple of ways, so I modified his class to be more generically useful and wanted to share the changes:
One of the most important issues which was missed in the original is that Attributes may also have namespaces, and without taking that into account, you are quite likely to miss attributes on elements.
The other bit which is important is that when converting to an array, if you have something which may contain elements of the same name but different namespaces, there is no way to tell which namespace the element was from. (Yes, it's a really rare situation... but I ran into it with a government standard based on NIEM...) So I added a static option which will cause the namespace prefix to be added to all keys in the final array that belong to a namespace. To use it, set
Feed::$withPrefix = true; before calling toArray()
Finally, more for my own preferences, I added an option to toArray() to return the final array as associative instead of using objects.
Here's the updated class:
class Feed extends \SimpleXMLElement implements \JsonSerializable
{
public static $withPrefix = false;
public function jsonSerialize()
{
$array = array();
$attributes = array();
$namespaces = [null] + $this->getDocNamespaces(true);
// json encode child elements if any. group on duplicate names as an array.
foreach ($namespaces as $prefix => $namespace) {
foreach ($this->attributes($namespace) as $name => $attribute) {
if (static::$withPrefix && !empty($namespace)) {
$name = $prefix . ":" . $name;
}
$attributes[$name] = $attribute;
}
foreach ($this->children($namespace) as $name => $element) {
if (static::$withPrefix && !empty($namespace)) {
$name = $prefix . ":" . $name;
}
if (isset($array[$name])) {
if (!is_array($array[$name])) {
$array[$name] = [$array[$name]];
}
$array[$name][] = $element;
} else {
$array[$name] = $element;
}
}
}
if (!empty($attributes)) {
$array['#attributes'] = $attributes;
}
// json encode non-whitespace element simplexml text values.
$text = trim($this);
if (strlen($text)) {
if ($array) {
$array['#text'] = $text;
} else {
$array = $text;
}
}
// return empty elements as NULL (self-closing or empty tags)
if (!$array) {
$array = NULL;
}
return $array;
}
public function toArray($assoc=false) {
return (array) json_decode(json_encode($this), $assoc);
}
}

Related

In the second foreach loop strpos not working proberly [duplicate]

Let's say I have some XML like this
<channel>
<item>
<title>This is title 1</title>
</item>
</channel>
The code below does what I want in that it outputs the title as a string
$xml = simplexml_load_string($xmlstring);
echo $xml->channel->item->title;
Here's my problem. The code below doesn't treat the title as a string in that context so I end up with a SimpleXML object in the array instead of a string.
$foo = array( $xml->channel->item->title );
I've been working around it like this
$foo = array( sprintf("%s",$xml->channel->item->title) );
but that seems ugly.
What's the best way to force a SimpleXML object to a string, regardless of context?
Typecast the SimpleXMLObject to a string:
$foo = array( (string) $xml->channel->item->title );
The above code internally calls __toString() on the SimpleXMLObject. This method is not publicly available, as it interferes with the mapping scheme of the SimpleXMLObject, but it can still be invoked in the above manner.
You can use the PHP function
strval();
This function returns the string values of the parameter passed to it.
There is native SimpleXML method SimpleXMLElement::asXML
Depending on parameter it writes SimpleXMLElement to xml 1.0 file or just to a string:
$xml = new SimpleXMLElement($string);
$validfilename = '/temp/mylist.xml';
$xml->asXML($validfilename); // to a file
echo $xml->asXML(); // to a string
Another ugly way to do it:
$foo = array( $xml->channel->item->title."" );
It works, but it's not pretty.
The accepted answer actually returns an array containing a string, which isn't exactly what OP requested (a string).
To expand on that answer, use:
$foo = [ (string) $xml->channel->item->title ][0];
Which returns the single element of the array, a string.
To get XML data into a php array you do this:
// this gets all the outer levels into an associative php array
$header = array();
foreach($xml->children() as $child)
{
$header[$child->getName()] = sprintf("%s", $child);
}
echo "<pre>\n";
print_r($header);
echo "</pre>";
To get a childs child then just do this:
$data = array();
foreach($xml->data->children() as $child)
{
$header[$child->getName()] = sprintf("%s", $child);
}
echo "<pre>\n";
print_r($data);
echo "</pre>";
You can expand $xml-> through each level until you get what you want
You can also put all the nodes into one array without the levels or
just about any other way you want it.
Not sure if they changed the visibility of the __toString() method since the accepted answer was written but at this time it works fine for me:
var_dump($xml->channel->item->title->__toString());
OUTPUT:
string(15) "This is title 1"
Try strval($xml->channel->item->title)
There is native SimpleXML method SimpleXMLElement::asXML Depending on parameter it writes SimpleXMLElement to xml 1.0 file, Yes
$get_file= read file from path;
$itrate1=$get_file->node;
$html = $itrate1->richcontent->html;
echo $itrate1->richcontent->html->body->asXML();
print_r((string) $itrate1->richcontent->html->body->asXML());
Just put the ''. before any variable, it will convert into string.
$foo = array( ''. $xml->channel->item->title );
The following is a recursive function that will typecast all single-child elements to a String:
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// FUNCTION - CLEAN SIMPLE XML OBJECT
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
function cleanSimpleXML($xmlObject = ''){
// LOOP CHILDREN
foreach ($xmlObject->children() as $child) {
// IF CONTAINS MULTIPLE CHILDREN
if(count($child->children()) > 1 ){
// RECURSE
$child = cleanSimpleXML($child);
}else{
// CAST
$child = (string)$child;
}
}
// RETURN CLEAN OBJECT
return $xmlObject;
} // END FUNCTION

php simplexmlelement directly access element

I have a simple XML structure like, that when parse with simplexml_load_string generates this:
SimpleXMLElement Object
(
[#attributes] => Array
(
[token] => rs2rglql9c8ztem
)
[attachments] => SimpleXMLElement Object
(
[attachment] => 112979696
)
)
the XML structure:
<uploads token="vwl3u75llktsdzi">
<attachments>
<attachment>123456789</attachment>
</attachments>
</uploads>
I can get to the only actually important value "123456789" through iteration but that is a faf. Is there a way I can access it directly, ideally using the names of the elements.
I need to able to get attributes to ideally.
The simplest way to store the textual node value of a SimpleXMLElement in its own variable is to cast the element to a string:
$xml = simplexml_load_string($str);
$var = (string) $xml->attachments->attachment;
echo $var;
UPDATE
In accordance with the further question in your comment, the SimpleXMLElement::attributesdocs method will also return a SimpleXMLElement object which can be accessed in the same manner as the above solution. Consider:
$str = '<uploads token="vwl3u75llktsdzi">
<attachments>
<attachment myattr="attribute value">123456789</attachment>
</attachments>
</uploads>';
$xml = simplexml_load_string($str);
$attr = (string) $xml->attachments->attachment->attributes()->myattr;
echo $attr; // outputs: attribute value
Yes you can through the {} sytax here it goes:
$xmlString = '<uploads token="vwl3u75llktsdzi">
<attachments>
<attachment myattr="attribute value">123456789</attachment>
</attachments>
</uploads>';
$xml = simplexml_load_string($xmlString);
echo "attachment attribute: " . $xml->{"attachments"}->attributes()["myattr"] " \n";
echo " uploads attribute: " . $xml->{"uploads"}->attributes()["token"] . "\n";
you can replace "attachments" with $myVar or something
remember attributes() returns an associative array so you can get the keys with php array_keys() or do a foreach cycle.

PHP Array generated from Xpath Query - Trying to extract data

I am currently attempting to extract values from an array that was generated by an SIMPLEXML/XPATH Query. I have had no success with my attempts if anyone can take look would be greatly appreciated.
I am looking just to extract ReclaimDate. I have tried a few functions and some of the advice on this post with no luck.
Array
(
[0] => Array
(
[0] => SimpleXMLElement Object
(
[#attributes] => Array
(
[ReclaimDate] => 05/15/2008
[ReclaimPrice] => 555555555
[_Owner] => ownername
)
)
)
If I just had to take a stab, I'd agree that what #Frank Farmer said should work:
// if $myVar is what you print_r'ed
echo (string)$myVar[0][0]['ReclaimDate'];
or this
echo (string)$myVar[0][0]->attributes('ReclaimDate');
http://www.php.net/manual/en/simplexml.examples-basic.php#example-4587
This is resolved thanks to Frank Farmer and Dan Beam.
This worked : echo (string)$check_reo[0][0]['ReclaimDate']
For anyone that is looking to use SimpleXML and XPATH to extract and write some basic logic from an XML file this is what worked for me.
$xmlstring = <<<XML <?xml version='1.0' standalone='yes'?> <YOURXMLGOESHERE>TEST</YOURXMLGOESHERE> XML;
$xpathcount = simplexml_load_string($xmlstring); // Load XML for XPATH Node Counts
$doc = new DOMDocument(); // Create new DOM Instance for Parsing
$xpathcountstr = $xpathcount->asXML(); // Xpath Query
$doc->loadXML($xpathcountstr); // Load Query Results
$xpathquery = array($xpathcount->xpath("//XMLNODEA[1]/XMLNODEB/*[name()='KEYWORDTOCHECKIFXMLCEXISTS']"));
print_r ($xpathquery) // CHECK Array that is returned from the XPATH query
`Array
(
[0] => Array
(
[0] => SimpleXMLElement Object
(
[#attributes] => Array
(
[ReclaimDate] => 05/15/2008
[ReclaimPrice] => 555555555
[_Owner] => ownername
)
)
) // Array RETURNED`
echo (string)$xpathquery[0][0]['ReclaimDate'] // EXTRACT THE VALUE FROM THE ARRAY COLUMN;
This site helped me receive a better understanding on how XPATH can search XML very easily with a lot more features than what I had previously known.
http://zvon.org/xxl/XPathTutorial/Output/examples.html
Here is the simple XML Function that worked for me
$xmlread = simplexml_load_string($xmlstring, "simple_xml_extended");
`class simple_xml_extended extends SimpleXMLElement { // Read XML and get attribute
public function Attribute($name){
foreach($this->Attributes() as $key=>$val) {
if($key == $name)
return (string)$val;
}
}
}`
Here is the Function action when extracting Single Values with an attribute based XML results
$GETVAR1 = $xmlread->XMLNODE1->XMLNODE2->XMLNODE3->XMLNODE4->XMLNODE5[0]->Attribute('XMLNODE5ATTRIBUTE');
This might not be the most efficient or best method, but its what ended working our for me. Hope this helps someone out who is still unclear about SIMPLEXML and XPATH.
An additional link for further insight: http://www.phpfreaks.com/tutorial/handling-xml-data

Why am I getting an array of SimpleXMLElement Objects here?

I have some code that pulls HTML from an external source:
$doc = new DOMDocument();
#$doc->loadHTML($html);
$xml = #simplexml_import_dom($doc); // just to make xpath more simple
$images = $xml->xpath('//img');
$sources = array();
Then, if I add all of the sources with this code:
foreach ($images as $i) {
array_push($sources, $i['src']);
}
echo "<pre>";
print_r($sources);
die();
I get this result:
Array
(
[0] => SimpleXMLElement Object
(
[0] => /images/someimage.gif
)
[1] => SimpleXMLElement Object
(
[0] => /images/en/someother.jpg
)
....
)
But when I use this code:
foreach ($images as $i) {
$sources[] = (string)$i['src'];
}
I get this result (which is what is desired):
Array
(
[0] => /images/someimage.gif
[1] => /images/en/someother.jpg
...
)
What is causing this difference?
What is so different about array_push()?
Thanks,
EDIT: While I realize the answers match what I am asking (I've awarded), I more wanted to know why whether using array_push or other notation adds the SimpleXMLElement Object and not a string when both arent casted. I knew when explicitly casting to a string I'd get a string. See follow up question here:Why aren't these values being added to my array as strings?
The difference is not caused by array_push() -- but by the type-cast you are using in the second case.
In your first loop, you are using :
array_push($sources, $i['src']);
Which means you are adding SimpleXMLElement objects to your array.
While, in the second loop, you are using :
$sources[] = (string)$i['src'];
Which means (thanks to the cast to string), that you are adding strings to your array -- and not SimpleXMLElement objects anymore.
As a reference : relevant section of the manual : Type Casting.
Sorry, just noticed better answers above, but the regex itself is still valid.
Are you trying to get all images in HTML markup?
I know you are using PHP, but you can convert use this C# example of where to go:
List<string> links = new List<string>();
if (!string.IsNullOrEmpty(htmlSource))
{
string regexImgSrc = #"<img[^>]*?src\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";
MatchCollection matchesImgSrc = Regex.Matches(htmlSource, regexImgSrc, RegexOptions.IgnoreCase | RegexOptions.Singleline);
foreach (Match m in matchesImgSrc)
{
string href = m.Groups[1].Value;
links.Add(href);
}
}
In your first example, you should:
array_push($sources, (string) $i['src']);
Your second example gives an array of strings because you are converting the SimpleXMLElements to strings using the (string) cast. In your first example you are not, so you get an array of SimpleXMLElements instead.

Referencing a dynamic associative array position

I'm requesting data from an online source which I then decode into json StdClass objects (using php). Once I've done this I have the following (see below). I'm trying to extract the elements in 'otherstuff' by doing echo $response->stuff->WHAT GOES HERE?->otherstuff
However I cant hard code the [2010-12] because its a date, is there any way I can call e.g. $response->stuff->nextsibling->stuff
I hope this makes sense to someone :D Currently i'm bastardising this with a $key => $value for loop and extracting the key value and using it in my $response->stuff->$key->stuff call.
stdClass Object
(
[commentary] =>
[stuff] => stdClass Object
(
**[2010-12]** => stdClass Object
(
[otherstuff] => stdClass Object
(
[otherstuffrate] => 1
[otherstufflevel] => 1
[otherstufftotal] => 1
)
)
)
)
StdClass instances can be used with some Array Functions, among them
current — Return the current element in an array and
key — Fetch a key from an array
So you can do (codepad)
$obj = new StdClass;
$obj->{"2012-10"} = 'foo';
echo current($obj); // foo
echo key($obj); // 2012-10
On a sidenote, object properties should not start with a number and they may not contain dashes, so instead of working with StdClass objects, pass in TRUE as the second argument to json_decode. Returned objects will be converted into associative arrays then.
The date key must be a string, otherwise PHP breaks ;).
echo $response->stuff['2010-12']->otherstuff
Retrieve it using a string.
Edited again: added object code also
json decode it as associative array, and use key fetched through array_keys . See it work here : http://codepad.org/X8HCubIO
<?php
$str = '{
"commentary" : null,
"stuff" : {
"ANYDATE" : {
"otherstuff": {
"otherstuffrate" : 1,
"otherstufflevel" : 1,
"otherstufftotal" : 1
}
}
}
}';
$obj = json_decode($str,true);
$reqKey = array_keys($obj["stuff"]);
$req = $obj["stuff"][$reqKey[0]]["otherstuff"];
print_r($req);
print "====================as object ============\n";
$obj = json_decode($str);
$req = current($obj->stuff)->otherstuff;
print_r($req);
?>

Categories