PHP DOMXPath content match & select expression


I'm trying to match and select a number of columns of a table but can't get it working. Here's a simplified table:
<table>
<tr>
<td>Foo</td>
<td>Bar</td>
<td>Arb</td>
<td>...</td>
</tr>
<tr>
<td>Foo</td>
<td>Rab</td>
</tr>
</table>
So I want to get the TDs which contain Bar, Arb and the others, but not Foo, and nothing from the second TR block. Does anyone know whether this is possible with an XPath expression?
Note: nothing in there is static. The only way to find the correct columns is to match the content of the first TD.

I don't know whether my answer helps, but an adaptable XPath could look like:
//table/tr[1]/td[x] | //table/tr[1]/td[y]
where x and y depend on which columns you want to target. The | operator returns the union of the two node-sets.
I'd also suggest installing the XPath Checker Firefox add-on ( https://addons.mozilla.org/en-US/firefox/addon/xpath-checker/ ), which lets you experiment with XPath syntax to target a particular DOM element of a website.
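Both ideas can be tried with PHP's DOMXPath. The sketch below assumes the simplified table from the question is available as a string; the union uses x = 2 and y = 3, and the second expression is an alternative that matches on content, under the assumption that the wanted row is the first row whose first cell is exactly "Foo". The texts() helper is hypothetical, only there to collect results.

```php
<?php
// The simplified table from the question; it is well-formed XML, so loadXML() is enough.
$xml = <<<'XML'
<table>
<tr>
<td>Foo</td>
<td>Bar</td>
<td>Arb</td>
<td>...</td>
</tr>
<tr>
<td>Foo</td>
<td>Rab</td>
</tr>
</table>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Hypothetical helper: collects the text of every node returned by a query.
function texts(DOMXPath $xpath, string $query): array
{
    $out = [];
    foreach ($xpath->query($query) as $node) {
        $out[] = $node->textContent;
    }
    return $out;
}

// Union of two fixed columns of the first row (x = 2, y = 3), returned in document order.
$union = texts($xpath, '//table/tr[1]/td[2] | //table/tr[1]/td[3]');

// Content match: the first row whose first cell is exactly "Foo", then every cell after it.
$byContent = texts($xpath, "(//table/tr[td[1] = 'Foo'])[1]/td[position() > 1]");

echo implode(', ', $union), "\n";     // Bar, Arb
echo implode(', ', $byContent), "\n"; // Bar, Arb, ...
```

The parentheses around the row selection matter: they pick the first matching row in the whole document, so the second "Foo" row is never touched.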

Related

Returning specific DIV via specific TAG - PHP XPATH [duplicate]

I'm using Html Agility Pack to run xpath queries on a web page. I want to find the rows in a table which contain a certain interesting element. In the example below, I want to fetch the second row.
<table name="important">
<tr>
<td>Stuff I'm NOT interested in</td>
</tr>
<tr>
<td>Stuff I'm interested in</td>
<td><interestingtag/></td>
<td>More stuff I'm interested in</td>
</tr>
<tr>
<td>Stuff I'm NOT interested in</td>
</tr>
<tr>
<td>Stuff I'm NOT interested in</td>
</tr>
</table>
I'm looking to do something like this:
//table[@name='important']/tr[has a descendant named interestingtag]
Except with valid xpath syntax. ;-)
I suppose I could just find the interesting element itself and then work my way up the parent chain from the node that's returned, but it seemed like there ought to be a way to do this in one step and I'm just being dense.

"Has a descendant named interestingtag" is spelled .//interestingtag in XPath, so the expression you are looking for is:
//table[@name='important']/tr[.//interestingtag]

Actually, you need to look for a descendant, not a child:
//table[@name='important']/tr[descendant::interestingtag]
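A minimal DOMXPath check of the two predicates above, run against the sample table loaded as XML. It also tries a plain child test, tr[interestingtag], which finds nothing because <interestingtag/> sits inside a td rather than directly under the tr.

```php
<?php
// The sample table from the question, loaded as XML.
$xml = <<<'XML'
<table name="important">
<tr><td>Stuff I'm NOT interested in</td></tr>
<tr>
<td>Stuff I'm interested in</td>
<td><interestingtag/></td>
<td>More stuff I'm interested in</td>
</tr>
<tr><td>Stuff I'm NOT interested in</td></tr>
<tr><td>Stuff I'm NOT interested in</td></tr>
</table>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Both predicates look for an interestingtag anywhere below the row.
$short = $xpath->query("//table[@name='important']/tr[.//interestingtag]");
$axis  = $xpath->query("//table[@name='important']/tr[descendant::interestingtag]");

// A child test only looks one level down, so it misses the tag inside the td.
$child = $xpath->query("//table[@name='important']/tr[interestingtag]");

echo $short->length, ' ', $axis->length, ' ', $child->length, "\n"; // 1 1 0
```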

I know this isn't what the OP was asking, but if you wanted to find an element that had a descendant with a particular attribute, you could do something like this:
//table[@name='important']/tr[.//*[@attr='value']]

I know this is a late answer, but why not go the other way around: find all <interestingtag/> elements and then select their ancestor <tr>.
//interestingtag/ancestor::tr
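The same sample table, queried from the other direction with DOMXPath. ancestor::tr returns every enclosing row, so with nested tables it could return more than one; ancestor is a reverse axis, so ancestor::tr[1] is the nearest enclosing row.

```php
<?php
$xml = <<<'XML'
<table name="important">
<tr><td>Stuff I'm NOT interested in</td></tr>
<tr>
<td>Stuff I'm interested in</td>
<td><interestingtag/></td>
<td>More stuff I'm interested in</td>
</tr>
<tr><td>Stuff I'm NOT interested in</td></tr>
<tr><td>Stuff I'm NOT interested in</td></tr>
</table>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Start from the interesting element and walk up to the enclosing row(s).
$rows = $xpath->query('//interestingtag/ancestor::tr');

// Only the closest enclosing row, which matters if tables are nested.
$nearest = $xpath->query('//interestingtag/ancestor::tr[1]');

echo $rows->item(0)->getElementsByTagName('td')->item(0)->textContent, "\n"; // Stuff I'm interested in
```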

Php_simple_html_dom on a table

I would like to extract data from a website whose markup looks like this:
...
<tr>
<td class="something1"><a class="whatever" href="#">NAME</a> </td>
<td class="something2">DATA</td>
<td class="something3">NUMERIC DATA</td>
</tr>
...
In particular, I have my list of NAMEs in my MySQL database, and if one of my NAMEs is equal to the NAME on this website, I want to print the corresponding NUMERIC DATA on my website.
I know I can do something with php_simple_html_dom, but I can't get it to work. Can you please help me?
Thanks!

So you want to read NAME first and, if it is relevant, read the rest? You can fetch a website's HTML as explained here: How do I get the HTML code of a web page in PHP?
$html = file_get_contents('http://pathToTheWebsite.com/thePage');
Now let's parse $html with a regex (you can use that library too; its documentation explains how). The pattern uses ~ as its delimiter, because a / delimiter would clash with the slashes in </a> and </td>:
preg_match('~<td class="something1"><a class="whatever" href="#">(?<name>[^<]+)</a>\s*</td>~', $html, $matches);
Now $matches['name'] will contain the NAME. You can do the same for the other cells; this regex is just an example and could be cleaned up a little.
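As an alternative to the regex, a sketch using PHP's built-in DOMDocument and DOMXPath instead of php_simple_html_dom. It assumes rows shaped like the markup in the question; the names Alice, Bob and Carol, their values, and the numericDataFor() helper are hypothetical. In practice $html would be the page fetched with file_get_contents() and the names would come from the MySQL table. The name is embedded in an XPath string literal, so it must not contain an apostrophe.

```php
<?php
// Hypothetical page fragment following the structure in the question.
$html = <<<'HTML'
<table>
<tr>
<td class="something1"><a class="whatever" href="#">Alice</a> </td>
<td class="something2">DATA</td>
<td class="something3">42</td>
</tr>
<tr>
<td class="something1"><a class="whatever" href="#">Bob</a> </td>
<td class="something2">DATA</td>
<td class="something3">7</td>
</tr>
</table>
HTML;

/**
 * Hypothetical helper: returns the NUMERIC DATA cell for $name, or null if no row matches.
 * Assumes $name contains no apostrophe, since it is embedded in an XPath string literal.
 */
function numericDataFor(DOMXPath $xpath, string $name): ?string
{
    $query = "//tr[td[@class='something1']/a[normalize-space(.) = '$name']]"
           . "/td[@class='something3']";
    $nodes = $xpath->query($query);
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// In practice these names would come from the MySQL query.
foreach (['Alice', 'Carol'] as $name) {
    echo $name, ': ', numericDataFor($xpath, $name) ?? 'not found', "\n";
}
```

Unlike the regex, this does not depend on the exact attribute order or whitespace inside the cells.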

Testing for a string in a particular element node

In PHPUnit (using Selenium Server) I'm trying to check whether a particular element node located by XPath has a certain string value, for instance:
<table>
<thead></thead>
<tbody>
<tr>
<th>1</th>
<td>value 1</td>
<td>value 2</td>
<td>value c</td>
</tr>
<tr>
<th>2</th>
<td> </td>
<td> </td>
<td> </td>
</tr>
</tbody>
</table>
With the above code, using the XPath
isElementPresent('//table/tbody/tr[last()-0]/td[last()-1]');
it returns true; if I change tr[last()-0] to tr[last()-1], it still returns true.
Naturally, isElementPresent would run in a loop, with the XPath generated in the loop as well (substituting $i and $j from the for() loop for the integers), and as it is, that would be fine. However, what I want to check is that the td has nothing in it.
Using the same HTML as above, if I change the XPath to
isElementPresent('//table/tbody/tr[last()-$i]/td[last()-1 and text()="${nbsp}"]');
you would think that it would return true at //table/tbody/tr[last()-0]/td[last()-1 and text()="${nbsp}"] and false at //table/tbody/tr[last()-1]/td[last()-1 and text()="${nbsp}"]. However, here is the kicker:
when I check the XPath with the Selenium IDE 1.10.0 plugin for Firefox (putting it in the Target box and hitting Find, to make sure it locates the element when it should), //table/tbody/tr[last()-0]/td[last()-1 and text()="${nbsp}"] doesn't highlight the second-to-last td in the last tr. It highlights the first td in the last tr, as if the XPath were //table/tbody/tr[last()-0]/td[last()-2].
From my experiments, it seems to treat the XPath like //table/tbody/tr[last()-0]/td[text()="${nbsp}"], which would only find the FIRST td whose text is a blank space. That is not good if the second tr were like
<tr>
<th>2</th>
<td>cows are my friends</td>
<td>let's go to my room pig!</td>
<td> </td>
</tr>
and I used isElementPresent('//table/tbody/tr[last()-0]/td[last()-1 and text()="${nbsp}"]'), it would still return true, because it is not looking at last()-1 but at last()-0.
So my question is: how can I check whether a particular element node has a certain string?
NOTE 1: I use last()-# because this page http://www.w3schools.com/xpath/xpath_syntax.asp says that in IE5 and later [0] is the first node, not [1] as in Firefox or Chrome, which in a sense makes sense since that's how array indexes work. For full compatibility, I start from the last node and work backwards: last() works in IE5 and later, and the logic of moving backwards through the nodes should be the same (unless Microsoft wants to redefine that logic).
NOTE 2: I am well aware that a simple fix is to add title or id attributes to the table; however, the page I'm writing the test for was made by someone else, and I would like to avoid modifying the page just to suit test cases.
NOTE 3: the table I'm testing is populated from a JSON string, so my test checks that when there is no data for the table, it is blank; if not, the JSON string is adding data to the table when it shouldn't.
EDIT 1: it seems ${nbsp} doesn't work in PHP; only the Selenium IDE 1.10.0 plugin recognizes it. However, inserting a non-breaking space by holding Alt and typing 0160 worked just as well.
EDIT 2: for the time being I have added id attributes to the tags to get this working, and it works perfectly with checking @id=[VALUE] and text()=[VALUE]. It would still be good to get this question answered, though: while I add id, title and/or class attributes to all my HTML tags, the person who originally made the table I was testing obviously didn't, and as I said in NOTE 2, 'I would like to avoid modifying the page just to suit test cases'.
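The behaviour described above follows from how XPath 1.0 evaluates predicates: a predicate is a position test only when its whole value is a number. In td[last()-1 and text()="..."] the value is the boolean result of and, so last()-1 is simply converted to true (any non-zero number is true) and the predicate means "any td whose text is a non-breaking space". A small DOMXPath sketch, using the "cows are my friends" row from the question as the last row (tr[last()] is the same as tr[last()-0]; the non-breaking space is written as "\u{00A0}", which needs PHP 7 or later):

```php
<?php
const NBSP = "\u{00A0}"; // non-breaking space, UTF-8 encoded

$xml = '<table><tbody>'
     . '<tr><th>1</th><td>value 1</td><td>value 2</td><td>value c</td></tr>'
     . '<tr><th>2</th><td>cows are my friends</td><td>let\'s go to my room pig!</td><td>' . NBSP . '</td></tr>'
     . '</tbody></table>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Boolean predicate: last()-1 just becomes true, so this matches the LAST td (the one holding NBSP).
$wrong = $xpath->query('//table/tbody/tr[last()]/td[last()-1 and text()="' . NBSP . '"]');

// Explicit position test: only the second-to-last td is checked, and it is not empty.
$right = $xpath->query('//table/tbody/tr[last()]/td[position() = last()-1 and text()="' . NBSP . '"]');

// Equivalent: a purely numeric predicate first, then the text test in a second predicate.
$chained = $xpath->query('//table/tbody/tr[last()]/td[last()-1][text()="' . NBSP . '"]');

echo $wrong->length, ' ', $right->length, ' ', $chained->length, "\n"; // 1 0 0
```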

@Memor-X Use SimpleXML to load the XML; then it should be easy to test in PHPUnit. If I were you, I would build in dependency tests to validate the XML structure before testing the contents.
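Following that suggestion, a sketch with SimpleXML, assuming the table markup is available as a string (for example from the page source). rowIsBlank() is a hypothetical helper; it treats a non-breaking space as blank and uses a purely numeric predicate, tr[N], so the position test is never mixed into a boolean expression. In a PHPUnit test its result could be passed to $this->assertTrue().

```php
<?php
// Markup in the shape of the question's table: the last row has only NBSP cells.
$xml = '<table><tbody>'
     . "<tr><th>1</th><td>value 1</td><td>value 2</td><td>value c</td></tr>"
     . "<tr><th>2</th><td>\u{00A0}</td><td>\u{00A0}</td><td>\u{00A0}</td></tr>"
     . '</tbody></table>';

$table = simplexml_load_string($xml);

/**
 * Hypothetical helper: true if every td of row $row (1-based) is blank,
 * treating a non-breaking space as blank. A missing row counts as not blank.
 */
function rowIsBlank(SimpleXMLElement $table, int $row): bool
{
    $cells = $table->xpath("/table/tbody/tr[{$row}]/td");
    if (!$cells) {
        return false; // no such row (or an invalid query)
    }
    foreach ($cells as $td) {
        if (trim(str_replace("\u{00A0}", ' ', (string) $td)) !== '') {
            return false;
        }
    }
    return true;
}

var_dump(rowIsBlank($table, 1)); // bool(false)
var_dump(rowIsBlank($table, 2)); // bool(true)
```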

Access child on a table using Xpath

I am trying to access a specific element of the Dom using XPath
Here is an example
<table>
<tbody>
<tr>
<td>
<b>1</b> data<br>
<b>2</b> data<br>
<b>3</b> data<br>
</td>
</tr>
</tbody>
</table>
I want to target "table td" so my query in Xpath is something like
$finder->query('//table/td');
but this doesn't return the td, because it is not a direct child of table (tbody and tr sit in between); direct access would be done using
$finder->query('//tr/td');
Is there a better way to write the query which would allow me to use something like the first example ignoring the elements in-between and return the TD?

Is there a better way to write the query which would allow me to use something like the first example ignoring the elements in-between and return the TD?
You can write:
//table//td
However, is this really "better"?
In many cases the evaluation of the XPath pseudo-operator // can result in significant inefficiency as it causes the whole subtree rooted in the context-node to be traversed.
Whenever the path to the wanted nodes is statically known, it may be more efficient to replace any // with the specific, known path, thus avoiding the complete subtree traversal.
For the provided XML document, such an expression is:
/*/*/tr/td
If there is more than one table element, each a child of the top element, and we want to select only the tds of the first table, a good, specific expression is:
/*/table[1]/*/tr/td
If we want to select only the first td of the first table in the same document, a good way to do this would be:
(/*/table[1]/*/tr//td)[1]
Or if we want to select the first td in the XML document (not knowing its structure in advance), then we could specify this:
(//td)[1]
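A DOMXPath sketch of the expressions discussed above. The markup is the question's table written as XML (<br/> instead of <br>), with a second table added and both wrapped in a hypothetical <root> element so that the /*/table[1] paths apply. The last query shows why the parentheses in (//td)[1] matter.

```php
<?php
$xml = <<<'XML'
<root>
<table>
<tbody>
<tr><td><b>1</b> data<br/><b>2</b> data<br/><b>3</b> data<br/></td></tr>
</tbody>
</table>
<table>
<tbody>
<tr><td>second table</td></tr>
</tbody>
</table>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$finder = new DOMXPath($doc);

$counts = [];
foreach ([
    '//table/td',          // td is not a child of table: no match
    '//table//td',         // any td anywhere below any table
    '/*/table[1]/*/tr/td', // explicit path into the first table only
    '(//td)[1]',           // the first td in the whole document
    '//td[1]',             // the first td child of EACH tr, which is not the same thing
] as $query) {
    $counts[$query] = $finder->query($query)->length;
    printf("%-22s %d\n", $query, $counts[$query]);
}
```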

What you are looking for is:
$finder->query('//table//td');

Oh boy oh boy, there's something not seen often.
As for your first XPath query, you can return what you want if you use a double // before the tag name (//table//td).
But I don't see why you don't just get the tds by tag name...

You can also write it this way:
$finder->query('//td');

Selenium, xpath to match a table row containing multiple elements

I'm trying to find a Selenium/PHP XPath for matching a table row that contains multiple elements (text and form elements).
Example:
<table class="foo">
<tr>
<td>Car</td><td>123</td><td><input type="submit" name="s1" value="go"></td>
</tr>
</table>
This works for a single text element:
$this->isElementPresent( "//table/tbody/tr/td[contains(text(), 'Car')]" );
while this does NOT (omitting the /td locator):
$this->isElementPresent( "//table/tbody/tr[contains(text(), 'Car')]" );
and thus, this obviously won't work either for multiple elements:
$this->isElementPresent( "//table/tbody/tr[contains(text(), 'Car')][contains(text(), '123')]" );
Another way to do this would be with getTable( "xpath=//table[@class='foo'].x.y") for each and every row x, column y. Cumbersome, but it worked... mostly. It does NOT return the <input> tag! It will return an empty string for that cell :(
Any ideas?

This XPath expression:
/html/body/table[descendant::td[contains(.,'Car')]]
Note: if you know your schema, don't start with the // operator. Use the string value (.) instead of the text node (text()); this way you get the concatenation of all descendant text nodes.
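A DOMXPath sketch of why the tr version in the question fails: contains() converts its first argument to a string, and for a node-set that is the string value of the first node only. On a tr, text() selects the whitespace text nodes between the cells, so the first one is just a newline, while . gives the concatenated text of all descendants. The markup is loaded as XML here, so no tbody is inserted; the browser DOM that Selenium sees normally has one.

```php
<?php
$xml = <<<'XML'
<table class="foo">
<tr>
<td>Car</td><td>123</td><td><input type="submit" name="s1" value="go"/></td>
</tr>
</table>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// text() on tr: contains() only sees the FIRST text child of tr, the newline before <td>.
$byText = $xpath->query("//table[@class='foo']/tr[contains(text(), 'Car')]");

// . on tr: the string value concatenates all descendant text ("Car123"), so this matches.
$byValue = $xpath->query("//table[@class='foo']/tr[contains(., 'Car') and contains(., '123')]");

echo $byText->length, ' ', $byValue->length, "\n"; // 0 1
```

Testing the whole row's string value can match text that spans two cells; testing each td separately, as in tr[td[contains(., 'Car')] and td[contains(., '123')]], avoids that.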

Several paths can be combined with the | operator.
Tweak this:
//tr/td[contains(text(), 'Car')]/text() | //tr/td/input[@name="s1"]/@value

you might want to use
//td[contains(., 'Car')]/ancestor::tr[td[contains(., '123')]]
This selects the tr that has a td containing 'Car' and a td containing '123'.
Try the View XPath plugin for Firefox; it is very useful.
Learn more about Axes in Xpath: http://www.w3schools.com/xpath/xpath_axes.asp

Thanks to knb for some syntax hints.
This is slightly off-topic, but relevant to the search that led me here...
I had a table with [ name | value ] cells. I needed to get the value from the row whose 'name' cell precedes it.
(fake example, but every link I was looking for had the same text and no IDs - the point is that the context information was in a neighboring cell)
<table id="options"><tbody>
<tr>
<td>other</td>
<td>edit</td>
</tr>
<tr>
<td>this label</td>
<td>edit</td> <!-- I want this button -->
</tr>
<tr>
<td>other</td>
<td>edit</td>
</tr>
</tbody></table>
I could retrieve the button I wanted like this, using nested [[]] conditions:
//table[@id='options']/tbody/tr[td[contains(text(), 'this label')]]/td[2]/a
"Get the <a> that is in a row containing another cell with the text I'm looking for."
I think this sort of task might be a common case, so I'm posting it here FYI
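A DOMXPath sketch of this lookup. The example markup shows only the text "edit" in the second cell, while the expression ends in /a, so the links and their href values here are assumptions. A second expression anchors on the label cell itself and steps to the neighbouring cell with following-sibling::td[1].

```php
<?php
// The example table; the <a> elements and hrefs are assumed, implied by the trailing /a.
$xml = <<<'XML'
<table id="options"><tbody>
<tr><td>other</td><td><a href="#edit-1">edit</a></td></tr>
<tr><td>this label</td><td><a href="#edit-2">edit</a></td></tr>
<tr><td>other</td><td><a href="#edit-3">edit</a></td></tr>
</tbody></table>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Pick the row whose label cell matches, then take the link in its second cell.
$links = $xpath->query("//table[@id='options']/tbody/tr[td[contains(text(), 'this label')]]/td[2]/a");

// Same target, anchored on the label cell and stepping to the next sibling cell.
$viaSibling = $xpath->query("//table[@id='options']//td[. = 'this label']/following-sibling::td[1]/a");

echo $links->item(0)->getAttribute('href'), "\n"; // #edit-2
```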

In my problem, I had a list of products, each identified by a unique SKU/catalog combination. To add a product to the cart, I selected it by SKU and catalog.
Using foob.ar's example:
//table[@class='foo']/tr[td[contains(text(), 'Car')] and td[contains(., '123')]]
You can combine it with dman's solution for choosing a specific element/column within that row
//table[@class='foo']/tr[td[contains(text(), 'Car')] and td[contains(., '123')]]//input[@name='s1']
Edit:
The solution above works if you are only looking for those two values in any of the columns. To find a value in a specific column, I had to modify it a bit:
//table[@class='foo']/tr[td[position()=1 and contains(text(), 'Car')] and td[position()=2 and contains(text(), '123')]]//input[@name='s1']
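A DOMXPath sketch contrasting the two versions, using two hypothetical rows with the same values in swapped columns. The markup is parsed as XML, so tr is a direct child of table as the expressions expect; in a browser DOM a tbody is usually inserted, so /tr may need to become /tbody/tr or //tr there.

```php
<?php
// Hypothetical rows: the second one has the same values in swapped columns.
$xml = <<<'XML'
<table class="foo">
<tr><td>Car</td><td>123</td><td><input type="submit" name="s1" value="go"/></td></tr>
<tr><td>123</td><td>Car</td><td><input type="submit" name="s2" value="go"/></td></tr>
</table>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Values anywhere in the row: both rows match.
$any = $xpath->query("//table[@class='foo']/tr[td[contains(text(), 'Car')] and td[contains(., '123')]]//input");

// Values pinned to columns 1 and 2: only the first row matches.
$fixed = $xpath->query("//table[@class='foo']/tr[td[position()=1 and contains(text(), 'Car')]"
    . " and td[position()=2 and contains(text(), '123')]]//input");

echo $any->length, ' ', $fixed->length, ' ', $fixed->item(0)->getAttribute('name'), "\n"; // 2 1 s1
```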
