Some beginner questions about PHP SimpleXML and xpath - php

I am learning PHP SimpleXML and I have some questions.
I have been experimenting with fetching markup from a page on my work intranet. I need generic code wherever possible, since the markup could change at any time.
In my example I select a div tag and all its children.
...
<div class="cabTabs">
<ul>
<li>Info1</li>
<li>Info2</li>
<li>Info3</li>
</ul>
</div>
...
//Get all web content:
$b = new sfWebBrowser(); //using symfony 1.4.17 sfWebBrowser to get a SimpleXML object.
$b->get('http://intranetwebexample'); //returns a sfWebBrowser object.
$xml = $b->getResponseXML(); //returns a SimpleXMLElement
//[Eclipse xdebug Watch - $xml]
"$xml" SimpleXMLElement
@attributes Array [3]
head SimpleXMLElement
body SimpleXMLElement
//Get the div class="cabTabs".
$result = $xml->xpath('//descendant::div[@class="cabTabs"]');
//[Eclipse xdebug Watch - $result]
"$result" Array [1]
0 SimpleXMLElement
@attributes Array [1]
class cabTabs
ul SimpleXMLElement
li Array [6]
Questions:
1. The use of the descendant:: prefix:
I have read in other Stack Overflow topics that the descendant:: prefix is not recommended.
In order to select a tag and all its content, what is the right way to do it?
I'm using the above code, but I don't know if it's the right way to do it.
2. Some questions about the Eclipse xdebug variable Watch:
2.1 Sometimes I can't expand the SimpleXML tree more than one or two levels. In the example above, I can't expand the "li" node to see its children.
Could it be a limitation of the xdebug debugger with SimpleXML objects, or maybe a limitation of the Eclipse Watch?
I can expand/see the "li" node perfectly well when I reach it through its parent with the usual loop: foreach($ul->li as $li).
It's not a critical bug, but it would be ideal to see it directly, and I would like to report it in the proper forum.
2.2 I don't understand the result of $xml->xpath at all:
If we take a look at the Eclipse Watch, the "div" tag has been converted to a 0 index key, but the "ul" and "li" tags kept their original names. Why?
2.3 How to access/loop over the xpath content with generic code:
I'm using the following non-generic code to access it:
foreach ($result as $record) {
foreach($record->ul as $ul) {
foreach($ul->li as $li) {
foreach($li->a as $a) {
echo ' ' . $a->name;
}
}
}
}
The above code works, but only if we write the right tag names (->ul, ->li, ->a...).
What is the generic way to loop through all its content without having to specify the child names each time (->ul, ->li, ->a...)?
Also, I would prefer not to have to convert it to an array, unless that's the right way.
I have been trying with the children() method, but it doesn't work; it stops and crashes on this line: foreach ($result->children() as $ul)
Thank you very much in advance for taking the time to read my questions. Any help is really welcome :)
System info:
symfony 1.4.17 with sfWebBrowserPlugin, cURL adapter.
PHP 5.4.0 with cURL support enabled, cURL Information 7.24.0

I don't know; I've never used it myself.
Don't know, I usually use Zend Debug, but I don't understand your question anyway... I think you left out some words :-)
2.1 Probably xdebug/Eclipse. I'd check the preferences; there's probably a setting that limits the depth of recursion to help manage memory.
2.2 SimpleXMLElement::xpath() always returns an array of matched nodes. That's why your result is an integer-indexed array. So if you do //someelement you get an array of all someelement tags. You can then access their descendants in the normal fashion, like $someelement->itschildelement.
2.3 children() is a good way to get at things in a generic sense; just call it on each SimpleXMLElement inside $result, since $result itself is an array. If Xdebug is crashing, that's just Xdebug. Either turn it off, ignore it, or find a different debugger :-) Xdebug is just a tool and shouldn't dictate how you implement things.
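As a rough illustration of points 2.2 and 2.3, here is a minimal sketch of walking an xpath() result generically. The markup and the walkNode() helper are made up for the example and are not part of the question's code. The //div[...] path should select the same nodes here as the //descendant::div[...] form used in the question.

```php
<?php
// Hypothetical markup standing in for the intranet page.
$html = <<<XML
<html><body>
  <div class="cabTabs">
    <ul>
      <li><a href="/info1">Info1</a></li>
      <li><a href="/info2">Info2</a></li>
    </ul>
  </div>
</body></html>
XML;

/**
 * Hypothetical helper: record "path = text" for every descendant of $node
 * that has text of its own, without naming any child tag.
 */
function walkNode(SimpleXMLElement $node, $path, array &$found)
{
    foreach ($node->children() as $child) {
        $childPath = $path . '/' . $child->getName();
        $text = trim((string) $child);
        if ($text !== '') {
            $found[] = $childPath . ' = ' . $text;
        }
        walkNode($child, $childPath, $found);
    }
}

$xml = simplexml_load_string($html);

// xpath() returns a plain PHP array of SimpleXMLElement objects,
// so loop over the array first and only then call children().
$result = $xml->xpath('//div[@class="cabTabs"]');

$found = array();
foreach ($result as $record) {
    walkNode($record, $record->getName(), $found);
}
print_r($found);
```

The outer foreach handles the integer-indexed array; the recursion handles everything below each match, whatever the tags happen to be called.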

I think now I perfectly understand problem 2.2 and 2.3.
Since xpath is returning an Array[1], as you explained, and not a SimpleXML object, I can never use $result->children(), because a PHP array doesn't have a children() method hehe. (I'm a bit of an idiot lol).
The solution is simple, as you have explained: loop over the elements of the array and then, for each one that is a SimpleXML object, loop again using the children() method. I'll add the right code below.
I will also submit the Eclipse Watch/xdebug problem (point 2.1) to their forums in order to find out what the real problem is.
Thank you prodigitalson, very useful answer :)

Worked like a charm hehe.
Here I am adding a complete function which searches recursively for a substring in all attributes of a node and its subnodes, and returns the full string in which it was found.
In my case it's perfect for searching for values like href= and other dynamically generated tag values.
It also shows the implementation of what we have discussed above. It can probably be improved, and more safety checks can be added.
/* public function bSimpleXMLfindfullstringwithsubstring($node, $sSearchforsubstring, &$sFullstringfound, &$bfoundsubstring)
 * Recursive function to search for the first substring in a list of SimpleXML objects, looking in all its children, in all their attributes.
 * Returns true if the substring has been found.
 * Parameter return:
 * $sFullstringfound: returns the full string where the substring has been found.
 * $bfoundsubstring: returns true if the substring has been found.
 */
public function bSimpleXMLfindfullstringwithsubstring($node, $sSearchforsubstring, &$sFullstringfound, &$bfoundsubstring=false)
{
    $bRet = false;
    if ((isset($node) && ($bfoundsubstring == false)))
    {
        //If the node has attributes
        if ($node->attributes()->count() > 0)
        {
            //Search the string in all the attributes of the current SimpleXML object.
            foreach ($node->attributes() AS $name => $attribute) //[$name = class , (string)$attribute = cabTabs, $attribute = SimpleXML object]
            {
                //(Take care of charset if necessary).
                if (stripos((string)$attribute, $sSearchforsubstring) !== false)
                {
                    //substring found in one of the attributes.
                    $sFullstringfound = (string)$attribute;
                    $bfoundsubstring = true;
                    $bRet = true;
                    break;
                }
            }
        }
        //If the node has children (subnodes)
        if (($node->count() > 0) && ($bfoundsubstring == false))
        {
            foreach ($node->children() as $nodechildren)
            {
                if ($bfoundsubstring == false)
                {
                    //Search in the next child; keep its result so a match found in a subnode is also returned.
                    $bRet = self::bSimpleXMLfindfullstringwithsubstring($nodechildren, $sSearchforsubstring, $sFullstringfound, $bfoundsubstring);
                }
                else
                {
                    break;
                }
            }
        }
    }
    return $bRet;
}
How to call it:
$b = new sfWebBrowser();
$b->get('http://www.example.com/example.html');
$xml = $b->getResponseXMLfixed();
$result = $xml->xpath('//descendant::div[@class="cabTabs"]'); //example
$sFullString = "";
$bfoundsubstring = false;
foreach ($result as $record)
{
self::bSimpleXMLfindfullstringwithsubstring($record, "/substring/tosearch", $sFullString, $bfoundsubstring);
}

Related

Trouble with SimpleXMLElement Namespaces

I'm having trouble parsing XML with Namespaces using SimpleXMLElement.
I've tried looping through the XML and also tried using xpath, without success.
$data_url="http://isni.oclc.nl/sru/0000000123121970?query=pica.isn+%3D+%220000000123121970%22&version=1.1&operation=searchRetrieve&stylesheet=http%3A%2F%2Fisni.oclc.nl%2Fsru%2FDB%3D1.2%2F%3Fxsl%3DsearchRetrieveResponse&recordSchema=isni-b&maximumRecords=10&startRecord=1&recordPacking=xml&sortKeys=none&x-info-5-mg-requestGroupings=none";
$data = file_get_contents($data_url);
$xml = simplexml_load_string($data);
$org_names = $xml->children('srw', true)->records->children('srw', true)->record->children('srw', true)->recordData->responseRecord->isniassigned->isnimetadata->identity->organisation->organisationnamevariant->mainname;
foreach($org_names as $a)
{
echo "a: $a\n";
}
I'm expecting to get a list of organisationnamevariant->mainname items:
Academia lugduno-batava
Leiden university
Leidse universiteit
etc.
However, I'm getting this error: Trying to get property of non-object
Having such a deep hierarchy is difficult to navigate using the normal -> structure, and you also have to be careful when changing namespace. You only need to call ->children('srw', true) once, and then all of the child nodes will be looked up in that namespace. BUT you also have to switch back at <responseRecord> by using ->children().
You also need to be careful that you use the proper case for each tag name...
$org_names = $xml->children('srw', true)->records->record->recordData->children()->
responseRecord->ISNIAssigned->ISNIMetadata->identity->organisation->
organisationNameVariant->mainName;
echo (string)$org_names;
An alternative is to use XPath (as xpath() returns a list of matches, I use [0] to only use the first one)...
$org_names = $xml->xpath("//organisationNameVariant/mainName");
echo (string)$org_names[0];
I know that echo casts the value to a string, but if you use this in any other scenario, you may end up with a SimpleXMLElement instead, so I tend to add the cast to string just to make the point.
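To make the namespace switching concrete, here is a small self-contained sketch. The XML is a heavily trimmed, made-up stand-in for the real response (the namespace URI and the two names are assumptions for illustration), but the navigation follows the same pattern as above.

```php
<?php
// Hypothetical, trimmed response; only the srw: wrapper elements are namespaced.
$data = <<<XML
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:records>
    <srw:record>
      <srw:recordData>
        <responseRecord>
          <organisationNameVariant><mainName>Leiden university</mainName></organisationNameVariant>
          <organisationNameVariant><mainName>Leidse universiteit</mainName></organisationNameVariant>
        </responseRecord>
      </srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
XML;

$xml = simplexml_load_string($data);

// Enter the srw namespace once; the following -> steps stay in it.
$recordData = $xml->children('srw', true)->records->record->recordData;

// Switch back to un-namespaced elements with a bare children() call.
$names = array();
foreach ($recordData->children()->responseRecord->organisationNameVariant as $variant) {
    $names[] = (string) $variant->mainName;
}

// The XPath route: un-namespaced elements need no prefix registration.
$viaXpath = array();
foreach ($xml->xpath('//organisationNameVariant/mainName') as $match) {
    $viaXpath[] = (string) $match;
}

print_r($names);
```

Looping over organisationNameVariant rather than echoing the chain directly collects every variant instead of only the first one.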

Reading values of a multi-dimensional array populated by xpath

I am a bit of a newbie to PHP and I'm not sure what I'm missing here. I have a multidimensional array that I've created from an XML file using XPath. I'm able to move through the array and retrieve almost all values, but I am getting stuck on one section.
Example of XML structure:
MasterNode
    SubNodeItem1
    SubNodeItem2
    SubNodeItem3
    SubNodeItemList
        SubListItem
            SubItemProperty1
            SubItemProperty2
            SubItemProperty3
        SubItemList
            SubItemProperty1
            SubItemProperty2
            SubItemProperty3
    SubNodeItem4
    SubNodeItem5
I am able to retrieve the value of any of the SubNode values by using the following syntax:
$val=$XML[$i]->SubNodeItem1;
However, I cannot for the life of me figure out how to retrieve the values of SubListItemProperty.
I figured this would be the logical syntax:
$SubItemPropVal=$XML[$i]->SubNodeItemList->SubListItem[$i]->SubItemProperty1;
I have searched other forums and topics related to PHP multi arrays and have not been able to find the proper way to do this.
I am getting a "Trying to get property of non-object" error when I run the code. I'm pretty sure that's an indication that I'm not pointing to the node correctly.
My recommendation would be to keep the XML file, which apparently works fine already, and use it.
Transferring its elements into an array does not make much sense to me.
EDIT: The OP does not actually use an array, but a SimpleXML object.
XPath is extremely flexible and powerful in selecting the needed bits from an XML document:
$doc = new DOMDocument();
$doc->loadXML($your_xml);
$xp = new DOMXPath($doc);
// for example
$result = $xp->query("//SubListItem[2]/SubItemProperty1");
if ($result->length)
{
echo $result->item(0)->textContent;
}
SimpleXML would also work:
$xml = simplexml_load_string($your_xml);
// either this ($node will be an array of matches, or FALSE)
$node = $xml->xpath("//SubNodeItemList/SubListItem[1]/SubItemProperty1");
// or this (unless you add a number, [0] will be assumed)
$node = $xml->SubNodeItemList->SubListItem->SubItemProperty1;
Important: Array notation counts from 0, while XPath always counts from 1.
Note that the second option (array notation) will throw run-time errors when the structure of the document is not what your code expects.
With XPath there would simply be no return value, which is easier to handle (no try/catch block necessary, an if ($node) { ... } suffices).
Also note that with SimpleXML, the document element (<MasterNode>) becomes the document. So you would not use $xml->MasterNode->SubNodeItemList, but $xml->SubNodeItemList.
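The 0-based versus 1-based difference is easy to trip over, so here is a minimal sketch of both access styles side by side. The document content is invented for the example; only the element names come from the question.

```php
<?php
// Hypothetical document mirroring the structure sketched in the question.
$your_xml = <<<XML
<MasterNode>
  <SubNodeItem1>one</SubNodeItem1>
  <SubNodeItemList>
    <SubListItem><SubItemProperty1>first</SubItemProperty1></SubListItem>
    <SubListItem><SubItemProperty1>second</SubItemProperty1></SubListItem>
  </SubNodeItemList>
</MasterNode>
XML;

// $xml is already <MasterNode>, so there is no ->MasterNode step.
$xml = simplexml_load_string($your_xml);

// Array notation is 0-based: [1] is the second <SubListItem>.
$viaArray = (string) $xml->SubNodeItemList->SubListItem[1]->SubItemProperty1;

// XPath is 1-based: [2] is that same second <SubListItem>.
$matches = $xml->xpath('//SubNodeItemList/SubListItem[2]/SubItemProperty1');
$viaXpath = $matches ? (string) $matches[0] : null;

// A path that matches nothing gives an empty result, so a plain if () is enough.
$missing = $xml->xpath('//SubNodeItemList/SubListItem[9]/SubItemProperty1');
if (!$missing) {
    echo "no ninth SubListItem\n";
}
```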

PHP node not passing by reference

I have a bunch of dom manipulation functions within a class.
One of those functions assigns unique ids to specific nodes.
$resource_info_node->setAttribute('id', 'resource_'.$this->ids);
$details['id'] = 'resource_'.$this->ids;
$details['node'] = $resource_info_node;
$this->resource_nodes['resource_'.$this->ids] = $details;
$this->ids += 1;
Later I want to look up and modify those nodes.
I have tried :
$current_node = $this->resource_nodes[$id]['node'];
When I print_r() I find that this node is a duplicate of the original node.
It has the original node's attributes but is not a part of the DOM tree.
I get the same results with :
$this->content->getElementById($id);
I suppose I based this whole thing on storing node references in an array. I thought that was a fine thing to do. Even if not, using getElementById() after that should have returned the node within the DOM.
I thought that in PHP all objects were passed by reference, including DOM nodes.
Any ideas on how I can test what is actually going on?
EDIT :
Well I used :
$this->xpath->query('//*[@id]');
That returned the right number of items with ids. The node is just not in the DOM tree when I edit it.
and
$current_node = &$this->resource_nodes[$id]['node'];
Using the reference syntax had no effect.
The strangest part is that getElementById() is not returning a node in the DOM. It has all the right attributes but no parentNode.
FIX - not answer :
I just used xpath instead of my reference or getElementById().
Use a reference explicitly:
$current_node = &$this->resource_nodes[$id]['node'];
And modify $current_node
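One way to test what is going on is to reproduce the storage pattern in isolation. The sketch below is not the asker's class, and the element names are invented. In this reduced form the node pulled back out of the array is the very same object and is still attached to the tree, even without the & reference syntax, which suggests the duplicate comes from some part of the code the question doesn't show. It also assumes the document has no DTD, in which case an attribute named id has to be marked as an ID before getElementById() can find it.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><resource/></root>');

$node = $doc->getElementsByTagName('resource')->item(0);
$node->setAttribute('id', 'resource_0');
// Without a DTD, "id" is just an ordinary attribute; flag it as an ID explicitly.
$node->setIdAttribute('id', true);

// Store the object in an array, as in the question (no & needed for objects).
$resource_nodes = array(
    'resource_0' => array('id' => 'resource_0', 'node' => $node),
);

// Pull it back out and modify it.
$stored = $resource_nodes['resource_0']['node'];
$stored->setAttribute('title', 'changed');

// Checks: same object, still in the tree, and reachable by ID.
$sameObject  = ($stored === $node);
$stillInTree = ($stored->parentNode !== null && $stored->parentNode->nodeName === 'root');
$byId        = $doc->getElementById('resource_0');

echo $doc->saveXML();
```

Comparing with === and checking parentNode at each step of the real code should show where the node stops being the one in the tree.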

PHP search function clarification

I found this function in another question on Stack Overflow, but I would like a clarification on something:
function sort_comments($ar)
{
$comments = array();
foreach($ar as $item)
{
if(is_null($item['parent_id'])) $comments[] = $item;
else
{
$parent_array = array_search_key($item['parent_id'],$comments,'id');
if($parent_array !== false) $comments[$parent_array]['replies'][] = $item;
}
}
return $comments;
}
Could someone explain the arguments passed to array_search_key()? I searched for this function on php.net but did not find it. Again, I'm a bit confused about the arguments, especially why the $comments array is passed to it.
First, this is not a PHP core function. It's a WordPress function built specially to sort comments when displaying them.
But there's a simple explanation, as I understand it:
First argument: the ID to search (the query)
Second argument: the array to search in (the data)
Third argument: the column to search in (in the array)
As I understand it, it's this.
If you linked the relevant Stack Overflow thread it would put things in context. My guess is that it's this implementation, or similar. The function is not native to PHP, and without knowing its source it's impossible to answer.
I don't think it is the WordPress function being used in this case - the only place I can find this function in the WordPress codebase only accepts two parameters.
I rather think that they are referring to this other function I found on pastebin. Unfortunately, the author hasn't provided comments describing the parameters but we have:
$needle - a value to match inside the array of arrays being searched
$haystack - an array of arrays being searched
$haystackKey - the key within the inner arrays which we want to find in the array of arrays
$strict - if set to true (default false) then type matching is enforced
So, the function returns true if the key and data pair can be located in at least one of the inner arrays, and false if not.
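Since the original source isn't available, here is a purely hypothetical stand-in with the four parameters listed above. One assumption to flag: sort_comments() uses the return value as an index into $comments, so this sketch returns the index of the matching inner array (or false) rather than just true/false.

```php
<?php
/**
 * Hypothetical stand-in for array_search_key(); not a PHP core or WordPress function.
 * Returns the index of the first inner array whose $haystackKey entry matches
 * $needle, or false when there is none.
 */
function array_search_key($needle, array $haystack, $haystackKey, $strict = false)
{
    foreach ($haystack as $index => $row) {
        if (!is_array($row) || !array_key_exists($haystackKey, $row)) {
            continue; // skip rows that lack the key we are looking in
        }
        $matches = $strict
            ? ($row[$haystackKey] === $needle)
            : ($row[$haystackKey] == $needle);
        if ($matches) {
            return $index;
        }
    }
    return false;
}

// Top-level comments collected so far, as in sort_comments().
$comments = array(
    array('id' => 10, 'parent_id' => null),
    array('id' => 11, 'parent_id' => null),
);

$index = array_search_key(11, $comments, 'id'); // position of the parent comment
$none  = array_search_key(99, $comments, 'id'); // no comment with that id

if ($index !== false) {
    $comments[$index]['replies'][] = array('id' => 12, 'parent_id' => 11);
}
```

This also shows why $comments is passed in: it is the list of parents already collected, and the reply is attached to whichever one has the matching id. The !== false check matters because a match at index 0 would otherwise look like a miss.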

drupal--why $node->taxonomy is an array

Someone wrote this code:
foreach ($node->taxonomy as $term) {
$tids[] = 't.tid = %d';
$args[] = $term->tid;
}
How does he know that "$node->taxonomy" in the foreach is an array? And when I loop over it,
foreach ($node->taxonomy as $term) {
}
the output that I get will be the $term's value. I don't know how it is changed into the 't.tid = %d' and $term->tid. Thank you.
In Drupal-related code, a $node is almost always an object produced by the node_load() function. Since every module has the opportunity to add its own properties to this object, it's very hard to find a central documentation of these properties.
By experience and by variable inspection, seasoned Drupal developers know that, when set, $node->taxonomy is always an array of term objects (as returned by the taxonomy_get_term() function) indexed by their respective ids (named tids, for Term ID). This array is set by the taxonomy_nodeapi() function when $op == 'load' and is produced by the taxonomy_get_terms() function.
The question gives little information, but we can guess that the loop is meant to build two arrays used to generate a database query that filters on the tid column matching those of the $node object. Because the terms' data is already stored in the items of $node->taxonomy, let's hope that this query is not used to re-load the terms to display some of their names and/or descriptions. Collecting 't.tid = %d' is probably a bad idea; the query would be better built with a single "tid IN (". db_placeholders($args) .")" WHERE clause after collecting all the tids in $args.
The question is very unclear. All items under the node object are arrays. You can check it yourself by using:
print_r($node);
die;
Or using any PHP debugger.
As for the foreach, it is a very simple foreach... I don't understand what the problem is with that.
't.tid = %d' is simply a fragment of an SQL query: t.tid is the column and %d is a placeholder that is later filled from $args[], which holds each $term->tid. It's similar in structure to placeholders in PDO prepared statements.
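To see what the loop actually produces, here is a plain-PHP sketch that runs without Drupal. The $node object is a hand-built stand-in with made-up term ids, and joining the fragments with OR is an assumption about how $tids is used later; the question doesn't show that part.

```php
<?php
// Hypothetical stand-in for a loaded node: taxonomy is an array of term objects keyed by tid.
$node = new stdClass();
$node->taxonomy = array();
foreach (array(3, 7, 12) as $tid) {
    $term = new stdClass();
    $term->tid = $tid;
    $node->taxonomy[$tid] = $term;
}

// Same loop as in the question: one SQL fragment and one argument per term.
$tids = array();
$args = array();
foreach ($node->taxonomy as $term) {
    $tids[] = 't.tid = %d';  // the same literal string every time
    $args[] = $term->tid;    // the value that will replace that %d
}
$whereOr = implode(' OR ', $tids);

// Alternative: a single IN (...) clause with one %d per collected argument.
$whereIn = 't.tid IN (' . implode(',', array_fill(0, count($args), '%d')) . ')';

echo $whereOr, "\n", $whereIn, "\n";
print_r($args);
```

So $term is never "changed into" anything: the loop just appends a fixed string to one array and the term's id to another, and the database layer pairs them up later.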
