Selenium, xpath to match a table row containing multiple elements - php

I'm trying to find a Selenium/PHP XPath for matching a table row that contains multiple elements (text, and form elements).
Example:
<table class="foo">
<tr>
<td>Car</td><td>123</td><td><input type="submit" name="s1" value="go"></td>
</tr>
</table>
This works for a single text element:
$this->isElementPresent( "//table/tbody/tr/td[contains(text(), 'Car')]" );
while this does NOT (omitting the /td locator):
$this->isElementPresent( "//table/tbody/tr[contains(text(), 'Car')]" );
and thus, this obviously won't work either for multiple elements:
$this->isElementPresent( "//table/tbody/tr[contains(text(), 'Car')][contains(text(), '123')]" );
Another way to do this would be with getTable( "xpath=//table[@class='foo'].x.y") for each and every row x, column y. Cumbersome, but it worked... mostly. It does NOT return the <input> tag! It will return an empty string for that cell :(
Any ideas?

This XPath expression:
/html/body/table[descendant::td[contains(.,'Car')]]
Note: If you know your schema, don't use a starting // operator. Use string value instead of text node (this way you get the concatenation of all descendant text nodes).

Several paths can be combined with | separator.
Tweak this:
//tr/td[contains(text(), 'Car')]/text() | //tr/td/input[@value="s1"]/@name

You might want to use
//td[contains(., 'Car')]/ancestor::tr[td[contains(., '123')]]
That will select the tr that contains td elements matching both contains() arguments.
Try to use View Xpath Plugin in firefox, very useful plugin.
Learn more about Axes in Xpath: http://www.w3schools.com/xpath/xpath_axes.asp

Thanks to knb for some syntax hints.
This is slightly off-topic, but relevant to the search that led me here...
I had a table with [ name | value ] cells. I needed to get value from the row with 'name' preceding it.
(fake example, but every link I was looking for had the same text and no IDs - the point is that the context information was in a neighboring cell)
<table id="options"><tbody>
<tr>
<td>other</td>
<td>edit</td>
</tr>
<tr>
<td>this label</td>
<td>edit</td> <!-- I want this button -->
</tr>
<tr>
<td>other</td>
<td>edit</td>
</tr>
</tbody></table>
I could retrieve the button I wanted like this, using nested [[]] conditions:
//table[@id='options']/tbody/tr[td[contains(text(), 'this label')]]/td[2]/a
"get the "a" that is in a row that contains another cell with the text I'm looking for"
I think this sort of task might be a common case, so I'm posting it here FYI

In my problem, I had a list of products where it was identified by a unique SKU/catalog combination. If I wanted to add that product to a cart, I chose it by SKU and catalog.
Using foob.ar's example:
//table[@class='foo']/tr[td[contains(text(), 'Car')] and td[contains(., '123')]]
You can combine it with dman's solution for choosing a specific element/column within that row
//table[@class='foo']/tr[td[contains(text(), 'Car')] and td[contains(., '123')]]//input[@name='s1']
Edit:
The solution above works if I was only looking for those two values in any of the columns. If you want to find a value relative to a specific column, I had to modify it a bit
//table[@class='foo']/tr[td[position()=1 and contains(text(), 'Car')] and td[position()=2 and contains(text(), '123')]]//input[@name='s1']
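To check the row-matching logic outside Selenium, here is a minimal Python sketch (an illustration only, not the Selenium/PHP API). It assumes well-formed markup, so the <input> tag is self-closed, and it uses only the standard library. Because xml.etree.ElementTree implements only a subset of XPath (there is no contains()), the substring tests are done in Python. The helper name find_row and the extra "Bike" row are made up for this example.

```python
import xml.etree.ElementTree as ET

# Sample markup based on the question; <input> is self-closed so it parses as XML.
HTML = """
<table class="foo">
  <tr>
    <td>Car</td><td>123</td><td><input type="submit" name="s1" value="go"/></td>
  </tr>
  <tr>
    <td>Bike</td><td>123</td><td><input type="submit" name="s2" value="go"/></td>
  </tr>
</table>
"""


def find_row(table, first, second):
    """Return the first <tr> whose 1st cell contains `first` and 2nd cell contains `second`."""
    for row in table.iter("tr"):
        cells = row.findall("td")
        if len(cells) < 2:
            continue
        # itertext() concatenates all descendant text, like the XPath string value "."
        texts = ["".join(cell.itertext()) for cell in cells]
        if first in texts[0] and second in texts[1]:
            return row
    return None


table = ET.fromstring(HTML)
row = find_row(table, "Car", "123")
button = row.find(".//input[@name='s1']") if row is not None else None
print(button.get("value") if button is not None else "not found")
```

Checking each cell by position mirrors the position()=1 / position()=2 variant above, so a "123" in the first column would not count as a match.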

Related

Splitting a Value from a String and Appending it to the end of that Value in PHP

I don't know if this is possible as I'm unable to find how I can do this and I'm very new to PHP but here's an overview of my issue:
I have a script which reads a CSV file. One of the columns contains cells which contain HTML tables. At varying positions within all of the tables there exists a table row which contains <td>Retail</td> and then the price such as <td>$300</td> for example. An example is below which I have formatted so that it's easier for you to read, but this is returned as a continuous string from the CSV file normally:
<table>
<tr>
<td>Designer</td>
<td>Hermes</td>
</tr>
<tr>
<td>Size inch</td>
<td>5.9 x 4.3 x 2.4</td>
</tr>
<tr>
<td>Material</td>
<td>Cotton</td>
</tr>
<tr>
<td>Retail</td>
<td>$300.00</td>
</tr>
<tr>
<td>Made in</td>
<td>France</td>
</tr>
</table>
These tables are then required to have the CAD [Canadian Dollars] retail price added to them. Example below of the desired end result:
<table>
<tr>
<td>Designer</td>
<td>Hermes</td>
</tr>
<tr>
<td>Size inch</td>
<td>5.9 x 4.3 x 2.4</td>
</tr>
<tr>
<td>Material</td>
<td>Cotton</td>
</tr>
<tr>
<td>Retail USD</td>
<td>$300.00</td>
</tr>
<tr>
<td>Retail CAD</td>
<td>$410.00</td>
</tr>
<tr>
<td>Made in</td>
<td>France</td>
</tr>
</table>
I have looked at using substr() but it looks as though you need to specify the length of characters that will be ignored from the start of the string which isn't possible for me here as the data varies.
So therefore my question is whether it's at all possible to specifically split the price out from the string and then append it back in after the </tr> so that the result is as above. If you could point me in the right direction of the functions that I would need to use to achieve this then I would really appreciate it. Please bear in mind I am already using str_replace() to rename Retail to Retail USD and I already have a variable created ready to convert USD price to a CAD price which uses a finance API.
Thank you in advance for any insight you can offer me here.
I have looked at using substr() but it looks as though you need to specify the length of characters that will be ignored from the start of the string which isn't possible for me here as the data varies.
So use stripos to find the start of the string you want to replace.
However, the more I dig into this, the more quickly it becomes a mess. It would be better to edit the CSV generator rather than trying to mutate your CSV. In an ideal world it would also be better if your CSV contained only data and not HTML.
Apologies the following became a large and probably unwieldy answer:
However to do it, you need to isolate this CSV column, into a variable $csvData. Then work with it directly:
$csvData = "<table data from your question>";
$csvData = str_replace("</td>","*!*</td>",$csvData);
//remove all the HTML junk
$csvDataClean = strip_tags($csvData);
// Form an array.
$csvDataArray = explode("*!*",$csvDataClean);
// trim contents of the array.
$csvDataArray = array_map('trim', $csvDataArray);
// remove empty array values.
$csvDataArray = array_filter($csvDataArray);
// build new contents array.
$csvArray = [];
foreach($csvDataArray as $key=>$value){
    if($key%2 == 0){
        //even index (0, 2, 4, ...): this is a content header.
        $value = str_replace(" ","_",$value);
        $lastHeader = preg_replace("/[^a-z0-9-_]/i","",$value);
    }
    else {
        //odd index: this is the value for the preceding header.
        $csvArray[$lastHeader] = $value;
    }
}
//tidy up.
unset($key,$value,$lastHeader,$csvDataArray,$csvDataClean);
print_r($csvArray);
This will now output an array of headers and values from your HTML table. You can then easily reference values from this array and recompile them into an HTML table as necessary.
Using phpsandbox I can output:
Array
(
[Designer] => Hermes
[Size_inch] => 5.9 x 4.3 x 2.4
[Material] => Cotton
[Retail] => $300.00
[Made_in] => France
)
So you can then take $csvArray['Retail'] and process this value to get the other currency values, and add them to this array. Then you can run this array through another process to rebuild a table, to save into the CSV (although this isn't recommended; it's better to save the array as a CSV itself, but I don't know your requirements).
So:
//whatever system you currently use to get conversion.
$csvArray['Retail_CAD'] = convert_currency($csvArray['Retail']);
$csvArray['Retail_USD'] = convert_currency($csvArray['Retail']);
And now rebuild the HTML table:
$csvOutput = "";
foreach($csvArray as $key=>$value){
    $csvOutput .= "<tr><td>".str_replace("_"," ",$key)."</td><td>".$value."</td></tr>\n";
}
unset($key,$value);
$csvOutput = "<table>".$csvOutput."</table>";
print_r($csvOutput);
You can also manually delete and readd the Made_in array key if you want to maintain this as the final array value:
//whatever system you currently use to get conversion.
$csvArray['Retail_CAD'] = convert_currency($csvArray['Retail']);
$csvArray['Retail_USD'] = convert_currency($csvArray['Retail']);
....
$value = $csvArray['Made_in'];
unset($csvArray['Made_in']);
$csvArray['Made_in'] = $value;
This is a hacky but quick way of keeping the "made in" column after the new Retail columns added above.
What you pasted here is a html table, not csv.
Anyway, there are several ways to manipulate strings. str_replace() is one of the most basic ones, so you got that already. In your case, you're probably best off using regular expressions. It's like str_replace but much more powerful. There are plenty of tutorials out there.
If you want to do a lot and more complex manipulation of html or xml data, you may want to have a look at XSLT.
I had to deal with a similar scenario once, what I would do is:
1.- First form your desired output block in a variable $output_block, i.e.:
<td>Retail USD</td><td>$300.00</td></tr><tr><td>Retail CAD</td><td>$410.00</td>
Note: you don't need the first opening tr tag or the last closing one, because you already have those in your original output.
2.- Find the position of <td>Retail</td>
(use strpos)
3.- Save the substring before that position in a variable, i.e.: $first_part
4.- Find the position of <td>Made in</td>
5.- Save the substring from the end of the Retail row onwards (starting at the </tr> that precedes <td>Made in</td>) in a variable: $last_part
6.- Your final output: $final_output = $first_part . $output_block . $last_part;
easy cake... ;)
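The same splice can be sketched with a regular expression, here in Python rather than PHP. This is an illustration under stated assumptions: the Retail row is assumed to look like <tr><td>Retail</td><td>$300.00</td></tr> (whitespace between tags allowed), the function name add_cad_row is made up, and USD_TO_CAD is a placeholder rate, not a value from any finance API.

```python
import re

# Hypothetical fixed exchange rate for illustration only; the question uses a finance API.
USD_TO_CAD = 1.25

# Matches the whole Retail row and captures the numeric part of the price.
ROW = re.compile(
    r"<tr>\s*<td>Retail</td>\s*<td>\$([0-9][0-9,]*(?:\.[0-9]+)?)</td>\s*</tr>"
)


def add_cad_row(html, rate=USD_TO_CAD):
    """Rename the Retail row to 'Retail USD' and insert a 'Retail CAD' row after it."""
    def replace(match):
        usd = float(match.group(1).replace(",", ""))
        cad = usd * rate
        return (
            f"<tr><td>Retail USD</td><td>${usd:,.2f}</td></tr>"
            f"<tr><td>Retail CAD</td><td>${cad:,.2f}</td></tr>"
        )
    return ROW.sub(replace, html)


table = (
    "<table><tr><td>Material</td><td>Cotton</td></tr>"
    "<tr><td>Retail</td><td>$300.00</td></tr>"
    "<tr><td>Made in</td><td>France</td></tr></table>"
)
result = add_cad_row(table)
print(result)
```

Matching the whole row means the position of the Retail row within the table does not matter, which was the obstacle with substr().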

Testing for a string in a particular element node

in PHP Unit (using Selenium Server) i'm trying to check if a particular element node in xpath has a certain string value, for instance
<table>
<thead></thead>
<tbody>
<tr>
<th>1</th>
<td>value 1</td>
<td>value 2</td>
<td>value c</td>
</tr>
<tr>
<th>2</th>
<td> </td>
<td> </td>
<td> </td>
</tr>
</tbody>
with the above code, using the xpath
isElementPresent('//table/tbody/tr[last()-0]/td[last()-1]');
it would return true, if i changed tr[last()-0] to tr[last()-1] it would still return true,
Naturally, isElementPresent would be called in a loop, with the XPath generated in the loop as well (substituting $i and $j from the for() loop for the integers). As it is, that would be fine; however, what I want to check is that the <td> has nothing in it.
using the same html code above, if i change the xpath
isElementPresent('//table/tbody/tr[last()-$i]/td[last()-1 and text()="${nbsp}"]');
you would think that it would return true at //table/tbody/tr[last()-0]/td[last()-1 and text()="${nbsp}"] and false at //table/tbody/tr[last()-1]/td[last()-1 and text()="${nbsp}"] however here is the kicker
Using the Selenium IDE 1.10.0 plugin for Firefox to check the XPath by putting it in the Target box and hitting Find (to check that it locates the XPath when it should), //table/tbody/tr[last()-0]/td[last()-1 and text()="${nbsp}"] doesn't highlight the 2nd last td in the last tr; it highlights the first td in the last tr, as if the XPath was //table/tbody/tr[last()-0]/td[last()-2]
From my experiments, it seems to be treating the XPath like //table/tbody/tr[last()-0]/td[text()="${nbsp}"], which would only match the FIRST instance in which the text is a blank space. That's not good if the 2nd tr was like
<tr>
<th>2</th>
<td>cows are my friends</td>
<td>let's go to my room pig!</td>
<td> </td>
</tr>
and i was to use isElementPresent('//table/tbody/tr[last()-0]/td[last()-1 and text()="${nbsp}"]'); it would still return true as its not looking at last()-1 but last()-0
so my question is, how can i check if a particular element node has a certain string
NOTE 1: I use last()-# because this page http://www.w3schools.com/xpath/xpath_syntax.asp says that IE5 and later treat [0] as the first node, not [1] as Firefox and Chrome do (which in a sense makes sense, since that's how an array index works). For full compatibility I start from the last node and work backwards: last() works in IE5 and later, and the logic of moving backwards through the nodes should be the same (unless Microsoft wants to redefine that logic).
NOTE 2: i am well aware that a simple fix is to add title or id attributes to the table however the page i'm making the test for was done by someone else, i would like to avoid modifying the page just to suit test cases
NOTE 3: the table i'm testing is populated using a JSON string so my test is when there is no data for the table, is it blank, if not, than the JSON string is adding data to the table when it shouldn't
EDIT 1: seems like ${nbsp} doesn't work in php, only in the Selenium IDE 1.10.0 Plugin seemed to recognize it however inserting a space by holding Alt and typing in 0160 worked just as fine
EDIT 2: For the time being I have added id attributes to the tags to get this working, and it works perfectly fine when checking @id=[VALUE] and text()=[VALUE]. It would still be good to get this question answered: while I add id, title and/or class attributes to all my HTML tags, the person who originally made the table I was testing obviously didn't, and as I said in NOTE 2, 'i would like to avoid modifying the page just to suit test cases'.
@Memor-X Use SimpleXML to load the XML. Then it should be easy to test in PHPUnit. If I were you, I would build in dependency tests to validate the XML structure before testing the contents.
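A likely explanation for the behavior in the question: in XPath 1.0 the predicate [last()-1 and text()="..."] is a single boolean expression, so last()-1 is converted to a boolean (true whenever it is non-zero) instead of being compared with the position. Writing the position test explicitly, as td[position()=last()-1 and text()="..."] or td[last()-1][text()="..."], should avoid that. The sketch below shows the same check done in Python with the standard library, outside Selenium. It assumes the page is well-formed markup, the function name cell_is_blank is made up, and the blank cells are assumed to hold a non-breaking space.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"

# Table from the question, with a non-breaking space in each "empty" cell.
HTML = f"""
<table>
  <thead></thead>
  <tbody>
    <tr><th>1</th><td>value 1</td><td>value 2</td><td>value c</td></tr>
    <tr><th>2</th><td>{NBSP}</td><td>{NBSP}</td><td>{NBSP}</td></tr>
  </tbody>
</table>
"""


def cell_is_blank(table, row_from_end, cell_from_end):
    """True if the chosen <td> holds only whitespace (including non-breaking spaces).

    Offsets count back from the end: 0 is the last row or cell, 1 the one before it.
    """
    rows = table.findall("./tbody/tr")
    cells = rows[-1 - row_from_end].findall("td")
    text = "".join(cells[-1 - cell_from_end].itertext())
    # str.strip() with no arguments also removes U+00A0.
    return text.strip() == ""


table = ET.fromstring(HTML)
print(cell_is_blank(table, 0, 1))  # second-to-last cell of the last row
print(cell_is_blank(table, 1, 1))  # second-to-last cell of the first row
```

Selecting the cell by position first and testing its text afterwards keeps the two conditions separate, which is what the combined predicate in the question failed to do.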

Access child on a table using Xpath

I am trying to access a specific element of the Dom using XPath
Here is an example
<table>
<tbody>
<tr>
<td>
<b>1</b> data<br>
<b>2</b> data<br>
<b>3</b> data<br>
</td>
</tr>
</tbody>
</table>
I want to target "table td" so my query in Xpath is something like
$finder->query('//table/td');
only this doesn't return the td as its a sub child and direct access would be done using
$finder->query('//tr/td');
Is there a better way to write the query which would allow me to use something like the first example ignoring the elements in-between and return the TD?
Is there a better way to write the query which would allow me to use
something like the first example ignoring the elements in-between and
return the TD?
You can write:
//table//td
However, is this really "better"?
In many cases the evaluation of the XPath pseudo-operator // can result in significant inefficiency as it causes the whole subtree rooted in the context-node to be traversed.
Whenever the path to the wanted nodes is statically known, it may be more efficient to replace any // with the specific, known path, thus avoiding the complete subtree traversal.
For the provided XML document, such expression is:
/*/*/tr/td
If there is more than one table element, each a child of the top element, and we want to select only the tds of the first table, a good, specific expression is:
/*/table[1]/*/tr/td
If we want to select only the first td of the first table in the same document, a good way to do this would be:
(/*/table[1]/*/tr//td)[1]
Or if we want to select the first td in the XML document (not knowing its structure in advance), then we could specify this:
(//td)[1]
What you are looking for is:
$finder->query('//table//td');
Oh boy oh boy, there's something not seen often.
As for your first XPath query, you can return what you want by using a double slash (//) before the tag names.
But, I don't see why you don't just want to get the td's by tagname...
You can write this way too:-
$finder->query('//td');
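The difference between the child step (/) and the descendant step (//) can be seen in a small Python sketch. It uses the standard library's xml.etree.ElementTree instead of PHP's DOMXPath, and it assumes the sample markup is made well-formed by self-closing the <br> tags.

```python
import xml.etree.ElementTree as ET

# Sample from the question, with <br/> self-closed so it parses as XML.
HTML = """
<table>
  <tbody>
    <tr>
      <td><b>1</b> data<br/><b>2</b> data<br/><b>3</b> data<br/></td>
    </tr>
  </tbody>
</table>
"""

table = ET.fromstring(HTML)

direct = table.findall("td")              # child step: <td> is not a direct child of <table>
nested = table.findall(".//td")           # descendant step: searches the whole subtree
explicit = table.findall("tbody/tr/td")   # fully spelled-out path, no subtree search

print(len(direct), len(nested), len(explicit))
```

The explicit path only matches when the intermediate elements are exactly tbody and tr, which is the trade-off against the more forgiving descendant step.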

PHP DOMXPath content match & select expression

I'm trying to match and select a bunch of cols of a table but don't get it working. Here's a simplified table:
<table>
<tr>
<td>Foo</td>
<td>Bar</td>
<td>Arb</td>
<td>...</td>
</tr>
<tr>
<td>Foo</td>
<td>Rab</td>
</tr>
</table>
So I want to get the TDs which contain Bar and Arb and others, but not Foo, and nothing from the 2nd TR block. Does anyone know if this is possible with an XPath expression?
Note: There's nothing static in there. The only way to get the correct cols is to match the first TDs content.
Don't know if my answer could help you out, but an adaptable XPath could look like:
//table/tr[1]/td[x] | //table/tr[1]/td[y]
where x and y depend on which columns you'd like to target. The | operator computes the union of two node-sets.
I'd also like to suggest you to install the XPath Checker Firefox AddOn ( https://addons.mozilla.org/en-US/firefox/addon/xpath-checker/ ) where you're able to play around with the xPath syntax, in order to trigger a particular DOM Element of a website.
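If the column positions are not known in advance, one reading of the question is "take the first row whose first cell matches, then return every cell after it". In XPath 1.0 that could be written as //table/tr[td[1]='Foo'][1]/td[position() > 1]. The Python sketch below does the same thing with the standard library; it is an illustration only, the helper name cells_after_label is made up, and it assumes the rows are direct children of the table as in the simplified example.

```python
import xml.etree.ElementTree as ET

HTML = """
<table>
  <tr><td>Foo</td><td>Bar</td><td>Arb</td><td>...</td></tr>
  <tr><td>Foo</td><td>Rab</td></tr>
</table>
"""


def cells_after_label(table, label):
    """Text of the cells that follow `label` in the first row whose first cell equals it."""
    for row in table.findall("tr"):
        cells = row.findall("td")
        if cells and (cells[0].text or "").strip() == label:
            return [(cell.text or "").strip() for cell in cells[1:]]
    return []


table = ET.fromstring(HTML)
print(cells_after_label(table, "Foo"))
```

Returning after the first matching row is what keeps the second "Foo" row out of the result.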

Beautiful Soup [Python] and the extracting of text in a table

I am new to Python and to Beautiful Soup as well! I heard about BS. It is said to be a great tool to parse and extract content. So here I am...:
I want to take the content of the first td of a table in a html
document. For example, i have this table
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>
<td>
This is the second sample text
</td>
</tr>
</table>
How can i use beautifulsoup to take the text "This is a sample text"?
I use soup.findAll('table' ,attrs={'class':'bp_ergebnis_tab_info'}) to get
the whole table.
Thanks... or should I try to get the whole stuff with Perl ... which I am not so familiar with. Another solution would be a regex in PHP.
See the target [1]: http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323
Note; since the html is a bit invalid - i think that we have to do some cleaning. That can cause a lot of PHP code - since we want to solve the job in PHP. Perl would be a good solution too.
Many thanks for some hints and ideas for a starting point
zero
First find the table (as you are doing). Using find rather than findall returns the first item in the list (rather than returning a list of all finds - in which case we'd have to add an extra [0] to take the first element of the list):
table = soup.find('table' ,attrs={'class':'bp_ergebnis_tab_info'})
Then use find again to find the first td:
first_td = table.find('td')
Then use renderContents() to extract the textual contents:
text = first_td.renderContents()
... and the job is done (though you may also want to use strip() to remove leading and trailing spaces):
trimmed_text = text.strip()
This should give:
>>> print trimmed_text
This is a sample text
>>>
as desired.
Use "text" to get text between "td"
1) First read table DOM using tag or ID
soup = BeautifulSoup(self.driver.page_source, "html.parser")
htnm_migration_table = soup.find("table", {'id':'htnm_migration_table'})
2) Read tbody
tbody = htnm_migration_table.find('tbody')
3) Read all tr from tbody tag
trs = tbody.find_all('tr')
4) get all tds using tr
for tr in trs:
    tds = tr.find_all('td')
    for td in tds:
        print(td.text)
I find Beautiful Soup a very efficient tool, so keep learning it :-) It can parse a page with invalid markup, so it should be able to handle the page you refer to. You may want to use BeautifulSoup(html).prettify() if you want a reformatted page source with valid markup.
As for your question, each table that Beautiful Soup finds is itself a searchable object, so you can run a second search inside it. Note that findAll returns a list of matches, so use find (or index the list) to get a single table:
table_soup = soup.find('table' ,attrs={'class':'bp_ergebnis_tab_info'})
your_sample_text = table_soup.find("td").renderContents().strip()
print your_sample_text
