Trouble with SimpleXMLElement Namespaces - php

I'm having trouble parsing XML with Namespaces using SimpleXMLElement.
I've tried looping through the XML and also tried using XPath, without success.
$data_url="http://isni.oclc.nl/sru/0000000123121970?query=pica.isn+%3D+%220000000123121970%22&version=1.1&operation=searchRetrieve&stylesheet=http%3A%2F%2Fisni.oclc.nl%2Fsru%2FDB%3D1.2%2F%3Fxsl%3DsearchRetrieveResponse&recordSchema=isni-b&maximumRecords=10&startRecord=1&recordPacking=xml&sortKeys=none&x-info-5-mg-requestGroupings=none";
$data = file_get_contents($data_url);
$xml = simplexml_load_string($data);
$org_names = $xml->children('srw', true)->records->children('srw', true)->record->children('srw', true)->recordData->responseRecord->isniassigned->isnimetadata->identity->organisation->organisationnamevariant->mainname;
foreach($org_names as $a)
{
echo "a: $a\n";
}
I'm expecting to get a list of organisationnamevariant->mainname items:
Academia lugduno-batava
Leiden university
Leidse universiteit
etc.
However, I'm getting this error: Trying to get property of non-object

Having such a deep hierarchy is difficult to navigate using the normal -> structure, but you also have to be careful when changing namespace. You only need to do the ->children('srw', true) once and then all of the child nodes will be for that namespace. BUT you also have to switch back at <responseRecord> by using ->children().
You also need to be careful that you use the proper case for each tag name...
$org_names = $xml->children('srw', true)->records->record->recordData->children()->
responseRecord->ISNIAssigned->ISNIMetadata->identity->organisation->
organisationNameVariant->mainName;
echo (string)$org_names;
An alternative is to use XPath (as xpath() returns a list of matches, I use [0] to only use the first one)...
$org_names = $xml->xpath("//organisationNameVariant/mainName");
echo (string)$org_names[0];
I know that echo casts the value to a string, but if you use this in any other scenario, you may end up with a SimpleXMLElement instead, so I tend to add the cast to string just to make the point.
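To make the namespace switching concrete, here is a minimal sketch that runs both approaches against a small inline document. The XML below is a hypothetical, heavily trimmed stand-in for the SRU response (the real feed has many more elements, and its namespace declarations may differ), so treat it as an illustration of the technique rather than a drop-in fix.

```php
<?php
// Hypothetical, trimmed-down stand-in for the SRU response; not the real feed.
$data = <<<XML
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:records>
    <srw:record>
      <srw:recordData>
        <responseRecord>
          <ISNIAssigned>
            <ISNIMetadata>
              <identity>
                <organisation>
                  <organisationNameVariant><mainName>Academia lugduno-batava</mainName></organisationNameVariant>
                  <organisationNameVariant><mainName>Leiden university</mainName></organisationNameVariant>
                </organisation>
              </identity>
            </ISNIMetadata>
          </ISNIAssigned>
        </responseRecord>
      </srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
XML;

$xml = simplexml_load_string($data);

// 1) Property access: enter the srw namespace once, then switch back to the
//    un-namespaced children at <srw:recordData> with ->children().
$organisation = $xml->children('srw', true)->records->record->recordData
    ->children()->responseRecord->ISNIAssigned->ISNIMetadata->identity->organisation;

$viaProperties = [];
foreach ($organisation->organisationNameVariant as $variant) {
    $viaProperties[] = (string) $variant->mainName;
}

// 2) XPath: in this sample the un-prefixed elements are in no namespace,
//    so the query needs no namespace registration.
$viaXPath = array_map(
    function ($node) { return (string) $node; },
    $xml->xpath('//organisationNameVariant/mainName')
);

print_r($viaProperties);
```

Looping over organisationNameVariant, rather than echoing a single mainName, is what yields the full list of names the question asks for.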

Related

PHP SimpleXMLElement: can I get innertext with saveXML(), or outerHTML with (string)?

I have an application where the user writes XPath queries to use as source data from a given document. Sometimes they need just the contents of an element, sometimes they need the whole element itself. To my understanding they should be able to specify either text() or node() at the end of their query to choose which behavior.
But it seems like the way I get a string out of the SimpleXMLElement determines the behavior, regardless of the query.
When I cast the query to (string), it ALWAYS only returns inner XML.
(string) $xml->xpath('//document/head/Keywords')[0] ===
(string) $xml->xpath('//document/head/Keywords/node()')[0] ===
(string) $xml->xpath('//document/head/Keywords/text()')[0] ===
'17';
If I use ->saveXML(), it ALWAYS returns the entire tag.
$xml->xpath('//document/head/Keywords')[0]->asXML() ===
$xml->xpath('//document/head/Keywords/node()')[0]->asXML() ===
$xml->xpath('//document/head/Keywords/text()')[0]->asXML() ===
'<Keywords topic="611x27keqj">17</Keywords>';
Is there a single way that I can get a string, which allows my users to specify inner vs outer XML as a part of their XPath query?
The SimpleXML xpath() method always returns SimpleXMLElement objects representing either an element or an attribute, never text. The methods you show in the question are the correct way to use that object to get text content or full XML.
If you want richer (but less simple) XPath functionality, you will have to use the DOM, and specifically the DOMXPath class. Note that you can freely mix SimpleXML and DOM using simplexml_import_dom and dom_import_simplexml; the internal representation is the same, so you can switch between the two "wrappers" with minimal cost.
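As a rough sketch of that DOM route, the snippet below lets the query itself decide between inner text and the whole element. The helper name evaluateToStrings() is hypothetical, and the sample document is reconstructed from the output shown in the question.

```php
<?php
// Sample document reconstructed from the question's expected output.
$xml = simplexml_load_string(
    '<document><head><Keywords topic="611x27keqj">17</Keywords></head></document>'
);

// Switch from the SimpleXML wrapper to the DOM wrapper of the same document.
$dom   = dom_import_simplexml($xml)->ownerDocument;
$xpath = new DOMXPath($dom);

// Hypothetical helper: serialises whatever nodes the query selected.
// An element serialises as outer XML, a text node as just its text.
function evaluateToStrings(DOMXPath $xpath, $query)
{
    $strings = [];
    foreach ($xpath->query($query) as $node) {
        $strings[] = $node->ownerDocument->saveXML($node);
    }
    return $strings;
}

$inner = evaluateToStrings($xpath, '//document/head/Keywords/text()');
$outer = evaluateToStrings($xpath, '//document/head/Keywords');

print_r($inner);
print_r($outer);
```

Because DOMXPath can return text nodes, a query ending in text() and one ending at the element produce different strings from the same serialisation call.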

How does a fluent interface return both $this and a value?

I'm relatively new to OOP so browsing through the documentation of Simple HTML DOM I was wondering how its methods use both method chaining and the regular behaviour of returning a value/object.
For example I can do:
$html = new simple_html_dom();
$html -> find('something'); // Returns object or array of objects
but I can also do:
$html -> find('something') -> find('something_else');
which, if I understand method chaining properly, implies that find() returns $this i.e. itself.
Also it's my understanding using method chaining you return $this, after which you use a getter method to actually return a value that you can use/want.
For example:
$object -> add(1) -> add(2) -> getNumber();
What am I missing here?
Thanks in advance!
Actually, it does not make sense to call find on the return value of find since that return value is supposed to be an array.
You could eventually add a second parameter to your find(string, array&) so you could do :
$result1 = [];
$result2 = [];
$html -> find('something', $result1)
-> find('something else', $result2);
var_dump($result1, $result2);
With :
public function find($search, & $output) {
$output = ...
return $this;
}
Depends on your taste.
Chained calls are only useful if you want to call multiple methods (that return nothing in particular) on the same object, to avoid rewriting the variable name each time and so make the code shorter.
EDIT:
If you want to do something like :
" $html -> find('div#results') -> find('li a'); "
You have a problem because $html represents a DOM, while the value returned by find is not a DOM but a set of results. Your find function could create and return a smaller DOM with div#results as its root (this would then be an object of the same class as $html, not an array), and calling find on it would then search from that new root instead of the document's root, but that looks a bit unintuitive.
The kind of chaining you are referring to there involves returning not $this, but a new object representing the data found. The trick is to make that object usable as though it were an array or scalar, using "magic methods" and pre-defined interfaces.
The SimpleXML extension makes extensive use of this concept so that every object can simultaneously be used in multiple ways:
__toString(), so that casting to a string, or using in an unambiguous string context like echo gives you the text content of an XML node
ArrayAccess so that you can use $a['href'] to access attributes, and $li[42] to access one of multiple matching nodes
Iterator so that you can foreach over multiple matches
__get() and __set() to search for and over-write child elements by tag name
(Actually, SimpleXML is a binary extension, so cheats a bit, but most of its functionality could theoretically be implemented in plain PHP using the above.)
So for instance this statement:
echo $simplexml_element->foo[42]->bar['baz'];
appears to contain arrays, hashes, and strings, but is actually a whole chain of object calls, something like this:
echo $simplexml_element->__get('foo')->offsetGet(42)->__get('bar')->offsetGet('baz')->__toString();
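A minimal plain-PHP sketch of the same idea follows, assuming PHP 8.0 or later for the typed interface signatures. The Matches class and its sample data are hypothetical and far simpler than SimpleXML or Simple HTML DOM; the point is only that find() returns a new object that still behaves like an array and a string.

```php
<?php
// Hypothetical result-set class: find() returns a NEW object (not $this),
// and that object can be indexed, counted, iterated and echoed.
final class Matches implements ArrayAccess, IteratorAggregate, Countable
{
    private array $items;

    public function __construct(array $items)
    {
        $this->items = array_values($items);
    }

    // Chainable search: narrows the current matches and wraps them again.
    public function find(string $needle): self
    {
        return new self(array_filter(
            $this->items,
            fn ($item) => strpos($item, $needle) !== false
        ));
    }

    // String context yields the first match (or an empty string).
    public function __toString(): string
    {
        return $this->items[0] ?? '';
    }

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->items[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->items[$offset] ?? null;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new LogicException('Matches is read-only');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('Matches is read-only');
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->items);
    }

    public function count(): int
    {
        return count($this->items);
    }
}

$links = new Matches(['/news/php', '/news/xml', '/about']);
$news  = $links->find('/news')->find('xml');

echo $news, "\n";                        // used as a string
echo count($links->find('/news')), "\n"; // used as a countable list
```

Each call in the chain works on the previous result rather than on the original object, which is why no getter is needed at the end.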

Reading values of a multi-dimensional array populated by xpath

I am a bit of a newbie to PHP and I'm not sure what I'm missing here. I have a multidimensional array that I've created from an XML file using XPath. I'm able to move through the array and retrieve almost all values, but I am getting stuck on one section.
Example of XML structure:
MasterNode
SubNodeItem1
SubNodeItem2
SubNodeItem3
SubNodeItemList
SubListItem
SubItemProperty1
SubItemProperty2
SubItemProperty3
SubItemList
SubItemProperty1
SubItemProperty2
SubItemProperty3
SubNodeItem4
SubNodeItem5
I am able to retrieve the value of any of the SubNode values by using the following syntax:
$val=$XML[$i]->SubNodeItem1;
however, I can not for the life of me figure out how to retrieve the values of SubListItemProperty.
I figured this would be the logical syntax:
$SubItemPropVal=$XML[$i]->SubNodeItemList->SubListItem[$i]->SubItemProperty1;
I have searched other forums and topics related to PHP multi arrays and have not been able to find the proper way to do this.
I am getting a "Trying to get property of non-object" error when I run the code. I'm pretty sure that's the indication that I'm not pointing the node correctly.
My recommendation would be to keep the XML file, which apparently works fine already, and use it.
Transferring its elements into an array does not make much sense to me.
EDIT: The OP does not actually use an array, but a SimpleXML object.
XPath is extremely flexible and powerful in selecting the needed bits from an XML document:
$doc = new DOMDocument();
$doc->loadXML($your_xml);
$xp = new DOMXPath($doc);
// for example
$result = $xp->query("//SubListItem[2]/SubItemProperty1");
if ($result->length)
{
echo $result->item(0)->textContent;
}
SimpleXML would also work:
$xml = simplexml_load_string($your_xml);
// either this ($node will be an array of matches, or FALSE)
$node = $xml->xpath("//SubNodeItemList/SubListItem[1]/SubItemProperty1");
// or this (unless you add a number, [0] will be assumed)
$node = $xml->SubNodeItemList->SubListItem->SubItemProperty1;
Important: Array notation counts from 0, while XPath always counts from 1.
Note that the second option (array notation) will throw run-time errors when the structure of the document is not what your code expects.
With XPath there would simply be no return value, which is easier to handle (no try/catch block necessary, an if ($node) { ... } suffices).
Also note that with SimpleXML, the document element (<MasterNode>) becomes the document. So you would not use $xml->MasterNode->SubNodeItemList, but $xml->SubNodeItemList.
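The indexing difference is easy to trip over, so here is a small sketch using a hypothetical two-item document shaped like the structure in the question. It reads the second SubListItem once with array notation and once with XPath.

```php
<?php
// Hypothetical sample shaped like the structure in the question.
$xml = simplexml_load_string(
    '<MasterNode><SubNodeItemList>'
    . '<SubListItem><SubItemProperty1>first</SubItemProperty1></SubListItem>'
    . '<SubListItem><SubItemProperty1>second</SubItemProperty1></SubListItem>'
    . '</SubNodeItemList></MasterNode>'
);

// Array notation is 0-based, and $xml already IS <MasterNode>.
$viaArray = (string) $xml->SubNodeItemList->SubListItem[1]->SubItemProperty1;

// XPath positions are 1-based; an empty result is simply falsy.
$matches  = $xml->xpath('//SubNodeItemList/SubListItem[2]/SubItemProperty1');
$viaXPath = $matches ? (string) $matches[0] : null;

echo $viaArray, "\n", $viaXPath, "\n";
```

Both lines select the same element, even though one uses index 1 and the other position 2.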

PHP node not passing by reference

I have a bunch of dom manipulation functions within a class.
One of those functions assigns unique ids to specific nodes.
$resource_info_node->setAttribute('id', 'resource_'.$this->ids);
$details['id'] = 'resource_'.$this->ids;
$details['node'] = $resource_info_node;
$this->resource_nodes['resource_'.$this->ids] = $details;
$this->ids += 1;
later I want to look up and modify those nodes.
I have tried :
$current_node = $this->resource_nodes[$id]['node'];
When I print_r() I find that this node is a duplicate of the original node.
It has the original node's attributes but is not a part of the DOM tree.
I get the same results with :
$this->content->getElementById($id);
I suppose I based this whole thing on storing node references in an array. I thought that was a fine thing to do. Even if not, after that using getElementByID() should have returned the node within the dom.
I thought that, in PHP all objects were passed by reference. Including DOM nodes.
Any ideas on how I can test what is actually going on.
EDIT :
Well I used :
$this->xpath->query('//*[@id]');
That returned the right number of items with ids. The node is just not in the DOM tree when I edit it.
and
$current_node = &$this->resource_nodes[$id]['node'];
Using the reference syntax had no effect.
The strangest part is that getElementById() is not returning a node in the DOM. It has all the right attributes except no parentNode.
FIX - not answer :
I just used xpath instead of my reference or getElementById().
Use the reference explicitly:
$current_node = &$this->resource_nodes[$id]['node'];
And modify $current_node
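One way to test what is actually going on is a small isolated sketch like the one below. It assumes a hypothetical two-element document and a plain $registry array in place of the class property. PHP passes objects around as handles, so in this sketch the node stored in the array is the same node that lives in the tree, with or without the & operator.

```php
<?php
// Hypothetical minimal document standing in for the real one.
$doc = new DOMDocument();
$doc->loadXML('<root><item/></root>');

$item = $doc->documentElement->firstChild;
$item->setAttribute('id', 'resource_0');
// Mark the attribute as an ID so getElementById() can find the element.
$item->setIdAttribute('id', true);

// Store the node in an array, as in the question (no & needed for objects).
$registry = ['resource_0' => ['id' => 'resource_0', 'node' => $item]];

// Fetch it back and modify it.
$current = $registry['resource_0']['node'];
$current->setAttribute('class', 'changed');

// Check that all three routes lead to the same node inside the tree.
$sameNode  = $current->isSameNode($doc->getElementById('resource_0'));
$hasParent = $current->parentNode !== null;
$output    = $doc->saveXML($doc->documentElement);

var_dump($sameNode, $hasParent);
echo $output, "\n";
```

If the equivalent checks fail in the real class, the stored nodes probably belong to a different document object than the one being inspected later, for example because the document was reloaded or cloned in between.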

Some beginner questions about PHP SimpleXML and xpath

I am learning PHP SimpleXML and I have some questions.
I have been experimenting with fetching code from a page on my work intranet. I need generic code wherever possible, since the page could change at any time.
In my example I select a div tag and all its children.
...
<div class="cabTabs">
<ul>
<li>Info1</li>
<li>Info2</li>
<li>Info3</li>
</ul>
</div>
...
//Get all web content:
$b = new sfWebBrowser(); //using symfony 1.4.17 sfWebBrowser to get a SimpleXML object.
$b->get('http://intranetwebexample'); //returns a sfWebBrowser object.
$xml = $b->getResponseXML(); //returns a SimpleXMLElement
//[Eclipse xdebug Watch - $xml]
"$xml" SimpleXMLElement
@attributes Array [3]
head SimpleXMLElement
body SimpleXMLElement
//Get the div class="cabTabs".
$result = $xml->xpath('//descendant::div[@class="cabTabs"]');
//[Eclipse xdebug Watch - $result]
"$result" Array [1]
0 SimpleXMLElement
@attributes Array [1]
class cabTabs
ul SimpleXMLElement
li Array [6]
Questions:
1. The use of the descendant:: prefix:
I have read in other Stack Overflow topics that the descendant:: prefix is not recommended.
What is the right way to select a tag and all its content?
I'm using the code above, but I don't know if it's the right way to do it.
2. Some questions from checking the Eclipse xdebug variable Watch:
2.1 Sometimes I can't expand the SimpleXML tree more than one or two levels. In the example above, I can't expand the "li" node to see its children.
Could it be a limitation of the xdebug debugger with SimpleXML objects, or maybe a limitation of the Eclipse Watch?
I can expand and see the "li" node without problems when I access it through its parent with the usual loop: foreach($ul->li as $li).
It's not a critical bug, but it would be nice to see the node directly, and I'd like to report it in the proper forum.
2.2 I don't understand the result of $xml->xpath at all:
If we take a look at the Eclipse Watch, the "div" tag has been converted to a 0 index key, but the "ul" and "li" tags kept their original names. Why?
2.3 How to access/loop over xpath content with generic code:
I'm using the following non-generic code to access it:
foreach ($result as $record) {
foreach($record->ul as $ul) {
foreach($ul->li as $li) {
foreach($li->a as $a) {
echo ' ' . $a->name;
}
}
}
}
The above code works, but only if we write the right tag names (->ul, ->li, ->a...).
What is the generic way to loop through all its content without having to specify the child names each time (->ul, ->li, ->a...)?
Also, I would prefer not to have to convert it to an array, unless that's the right way.
I have been trying the children() method, but it doesn't work; it stops and crashes on this line: foreach ($result->children() as $ul)
Thank you a lot in advance for taking your time to read my questions. Any help is really welcome :)
System info:
symfony 1.4.17 with sfWebBrowserPlugin, cURL adapter.
PHP 5.4.0 with cURL support enabled, cURL Information 7.24.0
1. I don't know; I've never used it myself.
2. I don't know; I usually use Zend Debug, but I don't understand your question anyway... I think you left out some words :-)
2.1 Probably xdebug/Eclipse. I'd check the preferences; there's probably a setting that limits the amount of recursion to help manage memory.
2.2 SimpleXMLElement::xpath() always returns an array of matched nodes. That's why your result is an integer-indexed array. So if you query //someelement, you get an array of all someelement tags. You can then access their descendants in the normal fashion, like $someelement->itschildelement.
2.3 $result->children() is a good way to get at things in a generic sense. If Xdebug is crashing, that's just Xdebug. Either turn it off, ignore it, or find a different debugger :-) Xdebug is just a tool and shouldn't dictate how you implement things.
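As a sketch of what generic traversal might look like, the snippet below walks every descendant of each XPath match without naming any tags. The collectNames() helper and the sample markup are hypothetical. Note that children() is called on each SimpleXMLElement inside the result array, not on the array itself, and that a plain //div[...] query is used here in place of //descendant::div[...], which selects the same nodes in this sample.

```php
<?php
// Hypothetical sample markup resembling the snippet in the question.
$xml = simplexml_load_string(
    '<body><div class="cabTabs"><ul>'
    . '<li><a name="one"/></li>'
    . '<li><a name="two"/></li>'
    . '</ul></div></body>'
);

// Hypothetical helper: records the tag name of a node and of all its descendants.
function collectNames(SimpleXMLElement $node, array &$names)
{
    $names[] = $node->getName();
    foreach ($node->children() as $child) {
        collectNames($child, $names);
    }
}

$names = [];
// xpath() gives back a plain array, so loop over it first.
foreach ($xml->xpath('//div[@class="cabTabs"]') as $div) {
    collectNames($div, $names);
}

echo implode(' ', $names), "\n"; // div ul li a li a
```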
I think I now fully understand problems 2.2 and 2.3.
Since xpath() returns an Array[1], as you explained, and not a SimpleXML object, I can never use $result->children(), because a PHP array doesn't have a children() method, hehe. (I'm a bit of an idiot, lol.)
The solution is simple, as you explained: loop over the elements of the array, then loop again using children() on each element, if it's a SimpleXML object. I'll add the right code below.
I will also submit the point 1 problem of the Eclipse Watch or xdebug to their forums, in order to find out what the real problem is.
Thank you prodigitalson, very useful answer :)
Worked like a charm, hehe.
Here I am adding a complete function which searches for a substring in all attributes of a node and its subnodes recursively, and returns the full string where it was found.
In my case it's perfect for searching for values like href= and other dynamically generated tag values.
It also shows the implementation of what we discussed above. It can probably be improved, and more safety checks can be added.
/* public function bSimpleXMLfindfullstringwithsubstring($node, $sSearchforsubstring, &$sFullstringfound, &$bfoundsubstring)
* Recursive function to search for the first substring in a list of SimpleXML objects, looking in all its children, in all their attributes.
* Returns true if the substring has been found.
* Parameter return:
* $sFullstringfound: returns the full string where the substring has been found.
* $bfoundsubstring: returns true if the substring has been found.
*/
public function bSimpleXMLfindfullstringwithsubstring($node, $sSearchforsubstring, &$sFullstringfound, &$bfoundsubstring=false)
{
$bRet = false;
if ((isset($node) && ($bfoundsubstring == false)))
{
//If the node has attributes
if ($node->attributes()->count() > 0)
{
//Search the string in all the elements of the current SimpleXML object.
foreach ($node->attributes() AS $name => $attribute) //[$name = class , (string)$attribute = cabTabs, $attribute = SimpleXML object]
{
//(Take care of charset if necessary).
if (stripos((string)$attribute, $sSearchforsubstring) !== false)
{
//substring found in one of the attributes.
$sFullstringfound = (string)$attribute;
$bfoundsubstring = true;
$bRet = true;
break;
}
}
}
//If the node has childrens (subnodes)
if (($node->count() > 0) && ($bfoundsubstring == false))
{
foreach ($node->children() as $nodechildren)
{
if ($bfoundsubstring == false)
{
//Search in the next child and propagate its result to the caller.
$bRet = self::bSimpleXMLfindfullstringwithsubstring($nodechildren, $sSearchforsubstring, $sFullstringfound, $bfoundsubstring);
}
else
{
break;
}
}
}
}
return $bRet;
}
How to call it:
$b = new sfWebBrowser();
$b->get('http://www.example.com/example.html');
$xml = $b->getResponseXMLfixed();
$result = $xml->xpath('//descendant::div[@class="cabTabs"]'); //example
$sFullString = "";
$bfoundsubstring = false;
foreach ($result as $record)
{
self::bSimpleXMLfindfullstringwithsubstring($record, "/substring/tosearch", $sFullString, $bfoundsubstring);
}
