XPath query is sometimes not showing the right elements

I am using XPath, and this is my query:
$elements = $xpath->query('//div/div/div/div/div/div[@id="con1"]/table/tr/td');
And everything works fine.
Then I change the condition in the div, and the query is like this:
$elements = $xpath->query('//div/div/div/div/div/div[@id="con2"]/table/tr/td');
And I see what I expect to see.
But later, if I do this:
$elements = $xpath->query('//div/div/div/div/div/div[@id="con1" or @id="con2"]/table/tr/td');
I again see only the elements of con1. Why is that?
The full code is below:
$elements = $xpath->query('//div/div/div/div/div/div[@id="con1" or @id="con2"]/table/tr/td');
foreach ($elements as $element) {
    $str1 = $element->getAttribute('class');
    $str2 = "first-td";
    $str3 = "status";
    // Cells with class "first-td": dump their text content
    if (strcmp($str1, $str2) == 0) {
        var_dump($element->nodeValue);
    }
    // Cells with class "status": print the class of their first child
    if (strcmp($str1, $str3) == 0) {
        echo $element->childNodes->item(0)->getAttribute('class') . "<br />";
    }
}
To sum up: if my condition is only con1, I see the correct results. If it's only con2, I see the correct results. The problem comes when I use the or. In that case, I see the results only from con1. It's as if it stops after fulfilling the first condition. Both divs are at the same level of the DOM tree.

What you are trying to do is to retrieve <div id="con1"> and <div id="con2"> in the same expression, but what you are actually doing is to retrieve a div which either has an attribute id="con1" or id="con2". The first expression of the condition returns true and then you get the <div id="con1"> node. It makes sense.
To get both nodes you need something like:
//div[@id="con1"]|//div[@id="con2"]
Note: //div[@id="con1"] finds the <div id="con1"> node wherever it is in the tree, and an id has to be unique within a document, so it's not necessary to spell out the whole path down to it.
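A minimal sketch of the union expression, assuming a small hypothetical document in which both divs sit at the same level. The markup and the cell values A and B are made up for illustration, and the shortened //div[@id=...] form from the note is used in place of the full path:

```php
<?php
// Hypothetical markup standing in for the asker's page.
$xml = '<div>'
     . '<div id="con1"><table><tr><td class="first-td">A</td></tr></table></div>'
     . '<div id="con2"><table><tr><td class="first-td">B</td></tr></table></div>'
     . '</div>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// The union operator | merges the node sets selected by both paths.
$elements = $xpath->query('//div[@id="con1"]/table/tr/td | //div[@id="con2"]/table/tr/td');

$values = array();
foreach ($elements as $element) {
    $values[] = $element->nodeValue;
}
print_r($values);
```

The nodes come back in document order, so the cells of con1 are listed before those of con2.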

Related

Replace a foreach loop with XPath expression using DOMXPath

I want to replace a foreach loop with an XPath expression, but I need a DOMXPath object to return more than one list.
I have the following XML (simplified) and I am using DOMDocument and DOMXPath to iterate over it:
<a:RoomsType>
    <a:Rooms>
        <a:Room>
            <a:RPH>0</a:RPH>
        </a:Room>
        <a:Room>
            <a:RPH>1</a:RPH>
        </a:Room>
        <a:Room>
            <a:RPH>2</a:RPH>
        </a:Room>
        <a:Room>
            <a:RPH>0</a:RPH>
        </a:Room>
    </a:Rooms>
</a:RoomsType>
I want to split the rooms by the RPH number, creating a list of rooms for each RPH number. Currently, I'm using the following code:
// $this->xpath is a DOMXPath object
$roomsToIterate = $this->xpath->query("//a:RoomsType/a:Rooms/a:Room");
$roomList = array();
foreach ($roomsToIterate as $room) {
    $rphCandidate = $room->getElementsByTagName("RPH")->item(0)->nodeValue;
    if (!isset($roomList[$rphCandidate])) {
        $roomList[$rphCandidate] = array();
    }
    $roomList[$rphCandidate][] = $room;
}
This is working for now, but I want to replace the foreach loop with an XPath expression. I can use the expression $rooms = $this->xpath->query("//a:RoomsType/a:Rooms/a:Room[a:RPH='{$rph}']"); with $rph being a number, but how can I do it if I don't know the RPH (it could be anything between 0 and 99)? Is it possible?
In short, is there any way to replace my foreach loop using XPath?
I was thinking about using registerPhpFunctions and a custom function, but I am concerned about the performance of this approach compared with the foreach loop.

An XPath 1.0 expression returns a list of nodes. It can, to some extent, flatten an existing structure if you use an axis like descendant or ancestor, but the result is still a list of nodes. It cannot group or aggregate them.
You could fetch a list of nodes with a specific RPH value, but you would need to do this for each value, so the result would be another loop. That means fetching all RPH values, making them unique, iterating over them and executing an XPath expression for each value.
Your current solution is fine.
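For comparison, the per-value approach described above could be sketched roughly as follows. The namespace URI urn:example:rooms is an assumption (the question does not show the real one), and the sketch mainly shows that a loop is still needed:

```php
<?php
// Sample data from the question; the namespace URI is hypothetical.
$xml = '<a:RoomsType xmlns:a="urn:example:rooms"><a:Rooms>'
     . '<a:Room><a:RPH>0</a:RPH></a:Room>'
     . '<a:Room><a:RPH>1</a:RPH></a:Room>'
     . '<a:Room><a:RPH>2</a:RPH></a:Room>'
     . '<a:Room><a:RPH>0</a:RPH></a:Room>'
     . '</a:Rooms></a:RoomsType>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('a', 'urn:example:rooms');

// Step 1: collect the distinct RPH values (array keys make them unique).
$rphValues = array();
foreach ($xpath->query('//a:RoomsType/a:Rooms/a:Room/a:RPH') as $rphNode) {
    $rphValues[trim($rphNode->nodeValue)] = true;
}

// Step 2: run one XPath query per distinct value. This is still a loop.
$roomList = array();
foreach (array_keys($rphValues) as $rph) {
    $rooms = $xpath->query("//a:RoomsType/a:Rooms/a:Room[a:RPH='{$rph}']");
    $roomList[$rph] = iterator_to_array($rooms);
}
```

This runs one query per distinct RPH value on top of the initial query, whereas the original foreach needs a single pass over the rooms.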

Check if XML element is existing in loop

For a website I'm making, I need to get data from an external XML file.
I load the data like this:
$doc = new DOMDocument();
$url = 'http://myurl/results/xml/12345';
if (!$doc->load($url)) {
    echo json_encode(array('error' => 'error'));
    exit;
}
$xpath = new DOMXPath($doc);
$program_date = $xpath->query('//game/date');
Then I use a foreach loop to get all the data:
if ($program_date) {
    foreach ($program_date as $node) {
        $programArray['program_date'][] = $node->nodeValue;
    }
}
The problem I'm having is that sometimes a certain game doesn't have a date.
So when a game doesn't have a date, I just want it to put "-" instead of the date from the XML file. My problem is that I don't know how to check whether a date is present in the data.
I tried a lot of approaches like isset, !isset, else, !empty, empty
$teamArray['program_kind'][] = "-";
but nothing works...
Can someone help me with this problem?
Thanks in advance

You need to iterate the game elements, use them as a context and fetch the data with additional XPath expressions.
But one thing first. Use DOMXPath::evaluate(). DOMXPath::query() only supports location paths. It can only return a node list. But XPath expressions can return scalar values, too.
$xpath = new DOMXPath($doc);
$games = $xpath->evaluate('//game');
The result of //game will always be a DOMNodeList object. It can be an empty list, but you can directly iterate it. A condition like if ($games) will always be true.
foreach ($games as $game) {
Now that you have the game element node, you can use it as a context to fetch other data.
$date = $xpath->evaluate('string(date)', $game);
string() casts the first node of the location path into a string. If it cannot match a node, it returns an empty string. Look at normalize-space() if you want to strip surrounding whitespace at the same time.
You can validate if the game element has a date node using count().
$hasDate = $xpath->evaluate('count(date) > 0', $game);
The result of this XPath expression is always a boolean.
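Put together, the steps above might look like the following sketch. The inline XML is a hypothetical stand-in for the external file, and "-" is the placeholder the question asks for:

```php
<?php
// Hypothetical data: the second game has no date element.
$xml = '<games>'
     . '<game><date>2015-01-01</date></game>'
     . '<game></game>'
     . '<game><date> 2015-02-03 </date></game>'
     . '</games>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$programArray = array('program_date' => array());
foreach ($xpath->evaluate('//game') as $game) {
    // normalize-space() yields an empty string when no date node matches.
    $date = $xpath->evaluate('normalize-space(date)', $game);
    $programArray['program_date'][] = ($date !== '') ? $date : '-';
}
print_r($programArray['program_date']);
```

Because every game contributes exactly one entry, the array stays aligned with the list of games even when a date is missing.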

echo out just first foreach array

I am using this piece of code with Simple HTML DOM:
$google = "http://www.google.com/something.";
$html = file_get_html($google);
foreach ($html->find('span[class=st]') as $element)
    echo $element->innertext;
But I just want to echo the first $element->innertext.
How can I echo only the first one?
The above code echoes all elements.
Is there any way to stop Simple HTML DOM from searching once the first matching element has been found?
I mean, we don't need ALL of the elements, just the first one, so it wastes time to collect all the elements and then pick out the first one.
It would be better if Simple HTML DOM stopped looking for new items as soon as the first one is found.

Don't use iteration if you don't need it.
$elements = $html->find('span[class=st]');
echo $elements[0]->innertext;
You can also use the :first modifier in the selector to make it more efficient.

Use break after the first iteration.
foreach ($html->find('span[class=st]') as $element) {
    echo $element->innertext;
    break;
}
You can read more about break in the PHP.net documentation: http://php.net/manual/en/control-structures.break.php
But I'd use this method to get the first element of the array instead:
echo $html->find('span[class=st]', 0)->innertext;
No need to loop.
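For readers using PHP's built-in DOM extension rather than Simple HTML DOM, a comparable way to read only the first match is an XPath position predicate. This is a different technique from the answers above, shown on hypothetical markup:

```php
<?php
// Hypothetical markup with two matching spans.
$xml = '<div><span class="st">first</span><span class="st">second</span></div>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// (…)[1] keeps only the first node of the whole result set;
// string() returns its text, or an empty string if nothing matched.
$first = $xpath->evaluate('string((//span[@class="st"])[1])');
echo $first;
```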

Accounting for missing array keys, within PHP foreach loop

I'm parsing a document for several different values, with PHP and Xpath. I'm throwing the results/matches of my Xpath queries into an array. So for example, I build my $prices array like this:
$prices = array();
$result = $xpath->query("//div[@class='the-price']");
foreach ($result as $object) {
    $prices[] = $object->nodeValue;
}
Once I have my array built, I loop through and throw the values into some HTML like this:
$i = 0;
foreach ($links as $link) {
echo <<<EOF
<div class="the-product">
<div class="the-name"><a title="{$names[$i]}" href="{$link}" target="blank">{$names[$i]}</a></div>
<br />
<div class="the-image"><a title="{$names[$i]}" href="{$link}" target="blank"><img src="{$images[$i]}" /></a></div>
<br />
<div class="the-current-price">Price is: <br> {$prices[$i]}</div>
</div>
EOF;
$i++; }
The problem is, some items in the original document that I'm parsing don't have a price, as in, they don't even contain <div class='the-price'>, so my Xpath isn't finding a value, and isn't inserting a value into the $prices array. I end up returning 20 products, and an array which contains only 17 keys/values, leading to Notice: Undefined offset errors all over the place.
So my question is, how can I account for items that are missing key values and throwing off my arrays? Can I insert dummy values into the array for these items? I've tried as many different solutions as I can think of. Mainly, IF statements within my foreach loops, but nothing seems to work.
Thank you

I suggest you look for an element inside your html which is always present in your "price"-loop. After you find this object you start looking for the "price" element, if there is none, you insert an empty string, etc. into your array.

Instead of directly looking for the the-price elements, look for the containing the-product. Loop on those, then do a subquery using those nodes as the starting context. That way you get all of the the-product nodes, plus the prices for those that have them.
e.g.
$prices = array();
$products = $xpath->query("//div[@class='the-product']");
foreach ($products as $product) {
    // The leading "." keeps the search inside the current product node.
    $price = $xpath->query(".//div[@class='the-price']", $product);
    // Store an empty string when the product has no price.
    $prices[] = ($price->length > 0) ? $price->item(0)->nodeValue : '';
}
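A self-contained sketch of this context-node approach, assuming hypothetical product markup in which the second product has no price, and using "-" as an arbitrary placeholder:

```php
<?php
// Hypothetical markup: product B has no the-price element.
$xml = '<div>'
     . '<div class="the-product"><div class="the-name">A</div><div class="the-price">10</div></div>'
     . '<div class="the-product"><div class="the-name">B</div></div>'
     . '<div class="the-product"><div class="the-name">C</div><div class="the-price">30</div></div>'
     . '</div>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$prices = array();
foreach ($xpath->query("//div[@class='the-product']") as $product) {
    // Relative path (leading ".") so only this product is searched.
    $price = $xpath->query(".//div[@class='the-price']", $product);
    $prices[] = ($price->length > 0) ? $price->item(0)->nodeValue : '-';
}
print_r($prices);
```

Each product yields exactly one entry, so $prices stays index-aligned with the other per-product arrays.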

If you don't want to show the products that don't have a price attached to them you could check if $prices[$i] is set first.
$i = 0;
foreach ($links as $link) {
    if (isset($prices[$i])) {
        // echo content
    }
    $i++;
}
Or, if you wanted to fill it with dummy values, you could say
$prices = array_merge($prices,
    array_fill(count($prices), count($links) - count($prices), 0));
And that would insert 0 as a dummy value for any remaining values. array_fill first takes the index to start at (count($prices), the first index after the existing keys in $prices), then how many values to fill (the number of entries in $links minus the number in $prices), and finally the dummy value 0.
Alternatively you could use the same logic in the first example and just apply that by saying:
echo isset($prices[$i]) ? $prices[$i] : '0';

Hard to understand the relation between $links and $prices with the code shown. Since you are building the $prices array without any relation to the $links array, I don't see how you would do this.
Is $links also built via xpath? If so, is 'the-price' div always nested within the DOM element used to populate $links?
If it is you could nest your xpath query to find the price within the query used to find the links and use a counter to match the two.
i.e.
$links_result = $xpath->query('path-to-link');
$i = 0;
foreach ($links_result as $link_object) {
    $links[$i] = $link_object->nodeValue;
    // pass $link_object as context reference to xpath query looking for price
    $price_result = $xpath->query('path-to-price-within-link-node', $link_object);
    if (false !== $price_result && $price_result->length > 0) {
        $prices[$i] = $price_result->item(0)->nodeValue;
    } else {
        $prices[$i] = 0; // or whatever value you want to show to indicate that no price was available.
    }
    $i++;
}
Obviously, there could be additional handling in there to verify that only one price value exists per link node and so forth, but that is the basic idea.

DOMXpath - Get href attribute and text value of an a element

So I have a HTML string like this:
<td class="name">
Some Name
</td>
<td class="name">
Some Name2
</td>
Using XPath I'm able to get value of href attribute using this Xpath query:
$domXpath = new \DOMXPath($this->domPage);
$hrefs = $domXpath->query("//td[@class='name']/a/@href");
foreach($hrefs as $href) {...}
And it's even easier to get a text value, like this:
// XPath automatically strips any HTML tags, so we are
// left with the clean text value of the a element
$domXpath = new \DOMXPath($this->domPage);
$names = $domXpath->query("//td[@class='name']");
foreach($names as $name) {...}
Now I'm curious to know how I can combine those two queries to get both values with only one query (if something like that is even possible).

Fetch
//td[@class='name']/a
and then pluck the text with nodeValue and the attribute with getAttribute('href').
Apart from that, you can combine Xpath queries with the Union Operator | so you can use
//td[@class='name']/a/@href|//td[@class='name']
as well.

To reduce the code to a single loop, try:
$anchors = $domXpath->query("//td[@class='name']/a");
foreach ($anchors as $a) {
    print $a->nodeValue . " - " . $a->getAttribute("href") . "<br/>";
}
As per above :) Too slow ..

Simplest way, evaluate is for this task!
The simplest way to obtain a value is by evaluate() method:
$xp = new DOMXPath($dom);
$v = $xp->evaluate("string(/etc[1]/@stringValue)");
Note: it is important to limit what the XPath returns to 1 item (the first a in this case), and to cast the value with string(), round(), etc.
So, in a set of multiple items, using your foreach code,
$names = $domXpath->query("//td[@class='name']");
foreach ($names as $contextNode) {
    $text = $domXpath->evaluate("string(./a[1])", $contextNode);
    $href = $domXpath->evaluate("string(./a[1]/@href)", $contextNode);
}
PS: this example only illustrates evaluate(). When the information is already available on the node, use whatever performs best, such as the methods getAttribute(), saveXML(), etc. and the properties nodeValue, textContent, etc. supplied by DOMNode. See @Gordon's answer for this particular problem. An XPath subquery (with a context node) is good for complex cases, or to simplify your code by avoiding hasChildNodes() checks and loops over childNodes, which offer no significant gain in performance.
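A runnable sketch of the context-node evaluate() calls above. The a elements and their href values (/page1, /page2) are hypothetical, since the markup in the question does not show them:

```php
<?php
// Hypothetical markup; the link targets are invented for illustration.
$xml = '<table><tr>'
     . '<td class="name"><a href="/page1">Some Name</a></td>'
     . '<td class="name"><a href="/page2">Some Name2</a></td>'
     . '</tr></table>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$rows = array();
foreach ($xpath->query("//td[@class='name']") as $td) {
    // Both expressions are evaluated relative to the current td.
    $rows[] = array(
        'text' => $xpath->evaluate('string(./a[1])', $td),
        'href' => $xpath->evaluate('string(./a[1]/@href)', $td),
    );
}
print_r($rows);
```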
