I am trying to access a specific element of the DOM using XPath.
Here is an example:
<table>
<tbody>
<tr>
<td>
<b>1</b> data<br>
<b>2</b> data<br>
<b>3</b> data<br>
</td>
</tr>
</tbody>
</table>
I want to target "table td", so my query in XPath is something like
$finder->query('//table/td');
only this doesn't return the td, as it's not a direct child of table; direct access would be done using
$finder->query('//tr/td');
Is there a better way to write the query which would allow me to use something like the first example ignoring the elements in-between and return the TD?
You can write:
//table//td
However, is this really "better"?
In many cases the evaluation of the XPath pseudo-operator // can result in significant inefficiency as it causes the whole subtree rooted in the context-node to be traversed.
Whenever the path to the wanted nodes is statically known, it may be more efficient to replace any // with the specific, known path, thus avoiding the complete subtree traversal.
For the provided XML document, such expression is:
/*/*/tr/td
If there is more than one table element, each a child of the top element, and we want to select only the tds of the first table, a good, specific expression is:
/*/table[1]/*/tr/td
If we want to select only the first td of the first table in the same document, a good way to do this would be:
(/*/table[1]/*/tr//td)[1]
Or if we want to select the first td in the XML document (not knowing its structure in advance), then we could specify this:
(//td)[1]
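For comparison, here is a minimal sketch of the same trade-off in Python's standard-library xml.etree.ElementTree. That library is my own choice for illustration (the question uses PHP), it supports only a subset of XPath, and its paths are relative to the element they are called on, so the leading "/" forms above do not carry over literally. The <html> wrapper element is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample markup modelled on the question, made well-formed (<br/>) and
# wrapped in a hypothetical <html> root so the table is not the root itself.
doc = ET.fromstring(
    "<html><table><tbody><tr><td>"
    "<b>1</b> data<br/><b>2</b> data<br/><b>3</b> data<br/>"
    "</td></tr></tbody></table></html>"
)

# td is not a direct child of table, so this matches nothing.
direct = doc.findall(".//table/td")

# The descendant step skips tbody and tr, at the cost of walking the subtree.
anywhere = doc.findall(".//table//td")

# The fully spelled-out path visits only the elements it names.
explicit = doc.findall("./table/tbody/tr/td")

# Rough counterpart of (//td)[1]: the first td in document order.
first_td = doc.find(".//td")

print(len(direct), len(anywhere), len(explicit))
```

Both the descendant form and the explicit path select the same td here; the explicit one simply avoids searching parts of the tree that cannot contain it.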
What you are looking for is:
$finder->query('//table//td');
Oh boy oh boy, there's something not seen often.
As for your first XPath query, you can return what you want, but use a double slash (//) before the tag names.
But, I don't see why you don't just want to get the td's by tagname...
You can write this way too:-
$finder->query('//td');
Related
I'm using Html Agility Pack to run xpath queries on a web page. I want to find the rows in a table which contain a certain interesting element. In the example below, I want to fetch the second row.
<table name="important">
<tr>
<td>Stuff I'm NOT interested in</td>
</tr>
<tr>
<td>Stuff I'm interested in</td>
<td><interestingtag/></td>
<td>More stuff I'm interested in</td>
</tr>
<tr>
<td>Stuff I'm NOT interested in</td>
</tr>
<tr>
<td>Stuff I'm NOT interested in</td>
</tr>
</table>
I'm looking to do something like this:
//table[@name='important']/tr[has a descendant named interestingtag]
Except with valid xpath syntax. ;-)
I suppose I could just find the interesting element itself and then work my way up the parent chain from the node that's returned, but it seemed like there ought to be a way to do this in one step and I'm just being dense.
"has a descendant named interestingtag" is spelled .//interestingtag in XPath, so the expression you are looking for is:
//table[@name='important']/tr[.//interestingtag]
Actually, you need to look for a descendant, not a child:
//table[@name='important']/tr[descendant::interestingtag]
I know this isn't what the OP was asking, but if you wanted to find an element that had a descendant with a particular attribute, you could do something like this:
//table[@name='important']/tr[.//*[@attr='value']]
I know it is a late answer, but why not go the other way around? Find all <interestingtag/> tags and then select the parent <tr> tag.
//interestingtag/ancestor::tr
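As a rough illustration of what the predicate does, here is a sketch in Python's standard-library xml.etree.ElementTree (my assumption; the question uses Html Agility Pack). ElementTree understands the attribute predicate but not a nested path inside [...], so the "has a descendant" test is written as an ordinary filter. The <html> wrapper is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<html>
  <table name="important">
    <tr><td>Stuff I'm NOT interested in</td></tr>
    <tr>
      <td>Stuff I'm interested in</td>
      <td><interestingtag/></td>
      <td>More stuff I'm interested in</td>
    </tr>
    <tr><td>Stuff I'm NOT interested in</td></tr>
  </table>
</html>
""")

# ElementTree handles the attribute predicate and the child step...
rows = doc.findall(".//table[@name='important']/tr")

# ...but not a nested path inside [...], so the descendant test is done in Python.
matching = [tr for tr in rows if tr.find(".//interestingtag") is not None]

for tr in matching:
    print([td.text for td in tr.findall("td")])
```

A full XPath 1.0 engine does the same filtering in one expression, which is what the answers above rely on.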
I am trying to parse a html file/strings for two things using php and xpath.
<DIV STYLE="top:110px; left:1280px; width:88px" Class="S0">Aug30</DIV>
I tried to look for an unknown value (here: Aug30) with knowing the style top and left value (here: 110px and 1280px).
And the other way. I know the value Aug30 but want to get its values of top and left.
Perhaps XPATH is not the best way to do this. Any idea on how to solve my problem?
Thanks in advance for your help!
To filter <div> element by style attribute value in XPath you can do something like this :
//div[contains(@style, 'top:110px') and contains(@style, 'left:1280px')]
The XPath above will search for a <div> node whose style attribute value contains both of the specified strings.
The other requirement isn't supported in XPath 1.0 as far as I can see. We can get the entire value of style attribute, but getting part of it is a dead end. There are some string functions we can use, even though returning a function's result isn't supported.
You'll need to do that using XPath 2.0 or using the host programming language (PHP in this case).
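To make the "host programming language" route concrete, here is a minimal sketch in Python (an assumption on my part; the question uses PHP, where the same string splitting applies). It parses the style attribute into a dictionary and looks things up in both directions. The sample markup, the lowercase attribute names, and the helper names parse_style, text_at and position_of are all hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_style(style):
    """Turn 'top:110px; left:1280px' into {'top': '110px', 'left': '1280px'}."""
    pairs = (item.split(":", 1) for item in style.split(";") if ":" in item)
    return {name.strip().lower(): value.strip() for name, value in pairs}

# Hypothetical well-formed sample; real-world HTML would need an HTML parser.
doc = ET.fromstring(
    '<body>'
    '<div style="top:110px; left:1280px; width:88px" class="S0">Aug30</div>'
    '<div style="top:130px; left:1280px; width:88px" class="S0">Sep02</div>'
    '</body>'
)

def text_at(root, top, left):
    """Known position -> text of the first matching div, or None."""
    for div in root.iter("div"):
        style = parse_style(div.get("style", ""))
        if style.get("top") == top and style.get("left") == left:
            return div.text
    return None

def position_of(root, text):
    """Known text -> (top, left) of the first matching div, or None."""
    for div in root.iter("div"):
        if (div.text or "").strip() == text:
            style = parse_style(div.get("style", ""))
            return style.get("top"), style.get("left")
    return None

print(text_at(doc, "110px", "1280px"))
print(position_of(doc, "Aug30"))
```

Comparing parsed property values exactly also avoids a pitfall of contains(): a substring test for 'top:110px' would match a property such as 'margin-top:110px' as well.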
in PHP Unit (using Selenium Server) i'm trying to check if a particular element node in xpath has a certain string value, for instance
<table>
<thead></thead>
<tbody>
<tr>
<th>1</th>
<td>value 1</td>
<td>value 2</td>
<td>value c</td>
</tr>
<tr>
<th>2</th>
<td> </td>
<td> </td>
<td> </td>
</tr>
</tbody>
</table>
with the above code, using the xpath
isElementPresent('//table/tbody/tr[last()-0]/td[last()-1]');
it would return true, if i changed tr[last()-0] to tr[last()-1] it would still return true,
naturally, the isElementPresent would be in a loop, with the xpath generated in the loop as well (substituting $i and $j from the for() loop for the integers), and as it is, that would be fine. however, what i want to check is that the td has nothing in it
using the same html code above, if i change the xpath
isElementPresent('//table/tbody/tr[last()-$i]/td[last()-1 and text()="${nbsp}"]');
you would think that it would return true at //table/tbody/tr[last()-0]/td[last()-1 and text()="${nbsp}"] and false at //table/tbody/tr[last()-1]/td[last()-1 and text()="${nbsp}"] however here is the kicker
using the Selenium IDE 1.10.0 plugin for Firefox to check the xpath by putting it in the Target box and hitting Find (to check that it will locate the xpath when it should), //table/tbody/tr[last()-0]/td[last()-1 and text()="${nbsp}"] doesn't highlight the 2nd last td in the last tr; it highlights the first td in the last tr, as if the xpath was //table/tbody/tr[last()-0]/td[last()-2]
from my experiments, it seems to be treating the xpath like //table/tbody/tr[last()-0]/td[text()="${nbsp}"], which would only be the FIRST instance in which the text is a blank space; not good if the 2nd tr was like
<tr>
<th>2</th>
<td>cows are my friends</td>
<td>let's go to my room pig!</td>
<td> </td>
</tr>
and i was to use isElementPresent('//table/tbody/tr[last()-0]/td[last()-1 and text()="${nbsp}"]'); it would still return true as its not looking at last()-1 but last()-0
so my question is, how can i check if a particular element node has a certain string
NOTE 1: i use last()-# because this page http://www.w3schools.com/xpath/xpath_syntax.asp says that in IE5 and later [0] is the first node, not [1] as in Firefox or Chrome, which in a sense makes sense since that's how an array's index works. for full compatibility, i start from the last node and work backwards: last() works with IE5 and later, and the logic of moving backwards through the nodes should be the same (unless microsoft wants to redefine that logic)
NOTE 2: i am well aware that a simple fix is to add title or id attributes to the table however the page i'm making the test for was done by someone else, i would like to avoid modifying the page just to suit test cases
NOTE 3: the table i'm testing is populated using a JSON string so my test is when there is no data for the table, is it blank, if not, than the JSON string is adding data to the table when it shouldn't
EDIT 1: seems like ${nbsp} doesn't work in php, only in the Selenium IDE 1.10.0 Plugin seemed to recognize it however inserting a space by holding Alt and typing in 0160 worked just as fine
EDIT 2: for the time being i have added id attributes to the tags to get this working and it works perfectly fine with checking @id=[VALUE] and text()=[VALUE] but it would still be good to get this question answered as while i add id, title and/or class attributes to all my html tags the person who originally made the table i was testing obviously didn't and as i said in NOTE 2, 'i would like to avoid modifying the page just to suit test cases'
@Memor-X Use SimpleXML to load the XML. Then it should be easy to test in PHPUnit. If I were you, I would build in dependency tests to validate the XML structure before testing the contents.
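On the XPath itself: in XPath 1.0, a predicate is treated as a position only when it evaluates to a number. In td[last()-1 and text()="..."], the "and" converts both operands to booleans, so last()-1 becomes a plain truth test and every td with the matching text is selected, which fits the behaviour described in the question. Stacking the predicates, as in td[last()-1][text()="..."], keeps the positional meaning. The structure-then-content idea can be sketched as follows, using Python's standard-library ElementTree rather than SimpleXML (my substitution); the helper names cell_from_end and is_blank are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical copy of the table from the question; \u00a0 is a non-breaking space.
table = ET.fromstring(
    "<table><thead/><tbody>"
    "<tr><th>1</th><td>value 1</td><td>value 2</td><td>value c</td></tr>"
    "<tr><th>2</th><td>\u00a0</td><td>\u00a0</td><td>\u00a0</td></tr>"
    "</tbody></table>"
)

def cell_from_end(table, row_back, cell_back):
    """Return the td that tr[last()-row_back]/td[last()-cell_back] would select."""
    rows = table.findall("./tbody/tr")
    cells = rows[-1 - row_back].findall("td")
    return cells[-1 - cell_back]

def is_blank(cell):
    """True when the cell holds only whitespace; str.strip() also removes U+00A0."""
    return "".join(cell.itertext()).strip() == ""

# Structure first, then content.
assert len(table.findall("./tbody/tr")) == 2
print(is_blank(cell_from_end(table, 0, 1)))  # last row, second-to-last cell
print(is_blank(cell_from_end(table, 1, 1)))  # first row, second-to-last cell
```

Indexing from the end of a Python list mirrors the last()-N convention used in the question without depending on how a browser numbers nodes.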
i am new to Python and to Beautiful Soup also! I heard about BS. It is said to be a great tool to parse and extract content. So here i am...:
I want to take the content of the first td of a table in a html
document. For example, i have this table
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>
<td>
This is the second sample text
</td>
</tr>
</table>
How can i use beautifulsoup to take the text "This is a sample text"?
I use soup.findAll('table' ,attrs={'class':'bp_ergebnis_tab_info'}) to get
the whole table.
Thanks... or should i try to get the whole stuff with Perl ... which i am not so familiar with. Another solution would be a regex in PHP.
See the target [1]: http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323
Note; since the html is a bit invalid - i think that we have to do some cleaning. That can cause a lot of PHP code - since we want to solve the job in PHP. Perl would be a good solution too.
Many thanks for some hints and ideas for a starting point
First find the table (as you are doing). Using find rather than findAll returns the first match (rather than a list of all matches, in which case we'd have to add an extra [0] to take the first element of the list):
table = soup.find('table' ,attrs={'class':'bp_ergebnis_tab_info'})
Then use find again to find the first td:
first_td = table.find('td')
Then use renderContents() to extract the textual contents:
text = first_td.renderContents()
... and the job is done (though you may also want to use strip() to remove leading and trailing spaces):
trimmed_text = text.strip()
This should give:
>>> print trimmed_text
This is a sample text
>>>
as desired.
Use "text" to get text between "td"
1) First read table DOM using tag or ID
soup = BeautifulSoup(self.driver.page_source, "html.parser")
htnm_migration_table = soup.find("table", {'id':'htnm_migration_table'})
2) Read tbody
tbody = htnm_migration_table.find('tbody')
3) Read all tr from tbody tag
trs = tbody.find_all('tr')
4) get all tds using tr
for tr in trs:
    tds = tr.find_all('td')
    for td in tds:
        print(td.text)
I find Beautiful Soup a very efficient tool, so keep learning it :-) It is able to parse a page with invalid markup, so it should be able to handle the page you refer to. You may want to use BeautifulSoup(html).prettify() if you want to get a reformatted page source with valid markup.
As for your question, soup.findAll(...) returns a list of matches, which cannot itself be searched. soup.find(...) returns the first matching tag, which is also a Beautiful Soup object, so you can make a second search in it, like this:
table_soup = soup.find('table', attrs={'class':'bp_ergebnis_tab_info'})
your_sample_text = table_soup.find("td").renderContents().strip()
print(your_sample_text)
I'm trying to find a Selenium/PHP XPath for matching a table row that contains multiple elements (text, and form elements).
Example:
<table class="foo">
<tr>
<td>Car</td><td>123</td><td><input type="submit" name="s1" value="go"></td>
</tr>
</table>
This works for a single text element:
$this->isElementPresent( "//table/tbody/tr/td[contains(text(), 'Car')]" );
while this does NOT (omitting the /td locator):
$this->isElementPresent( "//table/tbody/tr[contains(text(), 'Car')]" );
and thus, this obviously won't work either for multiple elements:
$this->isElementPresent( "//table/tbody/tr[contains(text(), 'Car')][contains(text(), '123')]" );
Another way to do this would be with getTable( "xpath=//table[@class='foo'].x.y") for each and every row x, column y. Cumbersome, but it worked... mostly. It does NOT return the <input> tag! It will return an empty string for that cell :(
Any ideas?
This XPath expression:
/html/body/table[descendant::td[contains(.,'Car')]]
Note: If you know your schema, don't use a starting // operator. Use string value instead of text node (this way you get the concatenation of all descendant text nodes).
Several paths can be combined with | separator.
Tweak this:
//tr/td[contains(text(), 'Car')]/text() | //tr/td/input[@value="s1"]/@name
you might want to use
//tr[td[contains(., 'Car')] and td[contains(., '123')]]
that will select the tr that contains td cells matching both contains() conditions
Try to use View Xpath Plugin in firefox, very useful plugin.
Learn more about Axes in Xpath: http://www.w3schools.com/xpath/xpath_axes.asp
Thanks to knb for some syntax hints.
This is slightly off-topic, but relevant to the search that led me here...
I had a table with [ name | value ] cells. I needed to get value from the row with 'name' preceding it.
(fake example, but every link I was looking for had the same text and no IDs - the point is that the context information was in a neighboring cell)
<table id="options"><tbody>
<tr>
<td>other</td>
<td>edit</td>
</tr>
<tr>
<td>this label</td>
<td>edit</td> <!-- I want this button -->
</tr>
<tr>
<td>other</td>
<td>edit</td>
</tr>
</tbody></table>
I could retrieve the button I wanted like this, using nested [[]] conditions:
//table[@id='options']/tbody/tr[td[contains(text(), 'this label')]]/td[2]/a
"get the "a" that is in a row that contains another cell with the text I'm looking for"
I think this sort of task might be a common case, so I'm posting it here FYI
In my problem, I had a list of products where it was identified by a unique SKU/catalog combination. If I wanted to add that product to a cart, I chose it by SKU and catalog.
Using foob.ar's example:
//table[@class='foo']/tr[td[contains(text(), 'Car')] and td[contains(., '123')]]
You can combine it with dman's solution for choosing a specific element/column within that row
//table[@class='foo']/tr[td[contains(text(), 'Car')] and td[contains(., '123')]]//input[@name='s1']
Edit:
The solution above works if I was only looking for those two values in any of the columns. If you want to find a value relative to a specific column, I had to modify it a bit
//table[@class='foo']/tr[td[position()=1 and contains(text(), 'Car')] and td[position()=2 and contains(text(), '123')]]//input[@name='s1']
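The column-specific match can also be spelled out step by step. This is only a sketch in Python's standard-library ElementTree (my assumption; the thread uses Selenium with PHP), with a hypothetical second row added to the sample so the filter has something to reject. The helper names cell_text and find_row are made up for illustration.

```python
import xml.etree.ElementTree as ET

table = ET.fromstring(
    '<table class="foo">'
    '<tr><td>Bike</td><td>123</td><td><input type="submit" name="s0" value="go"/></td></tr>'
    '<tr><td>Car</td><td>123</td><td><input type="submit" name="s1" value="go"/></td></tr>'
    '</table>'
)

def cell_text(td):
    """String value of a cell: all descendant text joined, like contains(., ...)."""
    return "".join(td.itertext())

def find_row(table, first, second):
    """First tr whose 1st td contains `first` and whose 2nd td contains `second`."""
    for tr in table.findall("tr"):
        cells = tr.findall("td")
        if len(cells) >= 2 and first in cell_text(cells[0]) and second in cell_text(cells[1]):
            return tr
    return None

row = find_row(table, "Car", "123")
button = row.find(".//input[@name='s1']") if row is not None else None
print(button.get("name") if button is not None else "no match")
```

Checking each column separately is what the position()=1 and position()=2 tests in the last expression do; without them, a value appearing in any column would satisfy the condition.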