I'm using Behat and the Mink framework for BDD (in PHP).
I was looping through td elements to verify the values and text. Since all the elements returned are span elements, I could use getText() on the elements and it was all good.
$page = $this->getSession()->getPage();
$expected_values = ["Tuesday", 22, 22, 22, 22];
$actual_rows = $page->findAll('css', 'table.admin-table tbody tr td');
$actual_Values = array();
foreach($actual_rows as $row) {
$actual_Values[] = $row->getText();
}
assertEquals($expected_values, $actual_Values);
But recently the design of the page changed, and some of the span elements have been replaced by input elements. getText() returns null for those, so I replaced it with getValue(). But since the first td element still contains a span, getValue() returns null for it.
Is there any way I can skip the first td within the loop from the code snippet above?
Update:
Here is what I put in comments which hasn't rendered properly:
$actual_Values[] = $row->find('css', 'input')->getValue();
I misread and thought you wanted to get the text regardless of whether the cell holds an input or a span. To skip the first value you can just do:
foreach($actual_rows as $row) {
if ($row !== reset($actual_rows)) {
$actual_Values[] = $row->getText();
}
}
Or a better approach would be to array_shift($actual_rows) to remove the first value and do the normal loop then.
Or another approach would be to change your selector to $page->findAll('css', 'table.admin-table tbody tr td:not(:first-child)'); to select everything but the first td child.
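As a rough, self-contained sketch of the two ideas above (reading the input value when a cell has one, and dropping the first cell afterwards), the following uses PHP's built-in DOM extension rather than Mink. The markup and the variable names are made up for illustration; with Mink you would keep the findAll() call from the question instead.

```php
<?php
// Hypothetical markup standing in for the redesigned page:
// the first cell holds a span, the remaining cells hold inputs.
$html = '<table class="admin-table"><tbody><tr>'
      . '<td><span>Tuesday</span></td>'
      . '<td><input value="22"></td>'
      . '<td><input value="22"></td>'
      . '</tr></tbody></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$actualValues = [];
foreach ($xpath->query('//table[@class="admin-table"]//tr/td') as $cell) {
    $input = $xpath->query('.//input', $cell)->item(0);
    // Use the input's value when the cell contains one, otherwise the cell text.
    $actualValues[] = $input !== null
        ? $input->getAttribute('value')
        : trim($cell->textContent);
}

// array_slice() drops the first cell without changing the loop itself.
$withoutFirst = array_slice($actualValues, 1);
```

Note that values read from the page are strings, so an expected array of integers would need a loose comparison or a cast.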
I am scraping a table with XPath and matching the tds of each TR, but there is a problem: some TRs have only one td, so I need to eliminate those. That elimination is giving me quite a problem.
For example:
$getTR = $path->query("//table[@class='bgc_line']/tr");
foreach($getTR as $tr){
if ($tr->length == 2) {
$route = $path->query("//table[@class='bgc_line']/tr/td[1]");
foreach ($route as $td1) {
$property[] = trim($td1->nodeValue);
}
$route = $path->query("//table[@class='bgc_line']/tr/td[2]");
foreach ($route as $td2) {
$value[] = trim($td2->nodeValue);
}
}
}
So my usage of if isn't exactly right. Is there another way to do this? I have two expressions, and the first XPath's count is different than the second's, so I can't match the data with each other. You can see the table here.
You can use
//table[@class='bgc_line']/tr[count(td) > 1]
to get only the table rows that have more than one child td.
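A minimal sketch of how that predicate might be used with DOMXPath, pairing the two cells of each matching row so the property and value lists cannot drift apart. The sample table content is invented for illustration; only the class name comes from the question.

```php
<?php
// Invented sample table: the first row has a single td and should be skipped.
$html = '<table class="bgc_line">'
      . '<tr><td>Heading only</td></tr>'
      . '<tr><td>Color</td><td>Red</td></tr>'
      . '<tr><td>Size</td><td>Large</td></tr>'
      . '</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$path = new DOMXPath($doc);

$pairs = [];
// Select only rows with more than one td, then read both cells relative to the row.
foreach ($path->query("//table[@class='bgc_line']//tr[count(td) > 1]") as $tr) {
    $property = trim($path->query('td[1]', $tr)->item(0)->nodeValue);
    $value    = trim($path->query('td[2]', $tr)->item(0)->nodeValue);
    $pairs[$property] = $value;
}
```

Querying td[1] and td[2] relative to each row, instead of across the whole table, is what keeps each property aligned with its value.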
A few elements in my HTML page have the same value for the class attribute. I want to remove all of them (elements) except the first one.
I wrote the following SSCCE. Two loops are executed: the first one changes the attribute value of the first element and breaks out of the loop, and the second one then removes the elements that still have the original attribute value.
So the question is: is there a shorter, less costly (in terms of memory, speed, etc.) or more straightforward way to do this? Maybe it can be done in a single loop or something like that? I feel I am making it unnecessarily long.
<?php
require_once("E:\\simple_html_dom.php");
$haystack = '<div>
<div class="removable" style="background-color:pink; width:100%; height:50px;">aa</div>
<div style="background-color:brown; width:100%; height:50px;">ss</div>
<div class="removable" style="background-color:grey; width:100%; height:50px;">dd</div>
<div class="removable" style="background-color:green; width:100%; height:50px;">gg</div>
<div style="background-color:blue; width:100%; height:50px;">hh</div>
<div class="removable" style="background-color:purple; width:100%; height:50px;">jj</div>
</div>';
$html_haystack = str_get_html($haystack);
//echo $html_haystack; //check
foreach ($html_haystack->find('div[class=removable]') as $removable) {
$removable->class='removable_first';
//$removable->style='background-color:black; width=100%; height=50px;'; //check
break;
}
foreach($html_haystack->find('div[class=removable]') as $removable) {
$removable->outertext= '';
}
$haystack = $html_haystack->save();
echo $haystack;
The find function returns an array, so the first element has index 0. There is no need for the first loop:
// Get all nodes
$array = $html_haystack->find('div[class=removable]');
// Edit the 1st => maybe you won't need this line if you're doing so only to skip the 1st node
$array[0]->class='removable_first';
// Remove the 1st from the array
unset($array[0]);
// Loop through the other nodes
foreach($array as $removable) {
$removable->outertext= '';
}
$html->find('.removable', 0)->class = 'removable_first';
foreach($html->find('.removable') as $removable){
$removable->outertext = '';
}
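For comparison, here is a sketch of the same "keep the first, remove the rest" idea using PHP's built-in DOM extension instead of simple_html_dom. This is a swapped-in technique, not the library from the question; the wrapper id "wrap" and the shortened markup are assumptions made for the example.

```php
<?php
// Shortened, hypothetical version of the markup from the question.
$html = '<div id="wrap">'
      . '<div class="removable">aa</div>'
      . '<div>ss</div>'
      . '<div class="removable">dd</div>'
      . '<div class="removable">gg</div>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The parenthesised expression numbers all matches in document order,
// so position() > 1 selects every match except the first one.
$extras = $xpath->query('(//div[@class="removable"])[position() > 1]');

// Copy the node list into an array before removing anything.
foreach (iterator_to_array($extras) as $node) {
    $node->parentNode->removeChild($node);
}

$remaining = [];
foreach ($xpath->query('//div[@id="wrap"]/div') as $div) {
    $remaining[] = $div->textContent;
}
```

With this approach the first element does not need to be renamed just to protect it from the removal loop.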
I've got a foreach loop that is only running once and it has me stumped.
1: I load an array of status values (either "request", "delete", or "purchased")
2: I then load an xml file and need to loop through the "code" nodes and update their status, BUT if the new code is "delete" I want to remove it before moving onto the next one
XML structure is....
<content>
.... lots of stuff
<codes>
<code date="xxx" status="request">xxxxx</code>
.. repeat ...
</codes>
</content>
and the php code is ...
$newstatus = $_POST['updates'];
$file = '../apps/templates/'.$folder.'/layout.xml';
$xml2 = simplexml_load_file($file);
foreach($xml2->codes->code as $code){
if($code['status'] == "delete") {
$dom=dom_import_simplexml($code);
$dom->parentNode->removeChild($dom);
}
}
$xml2->asXml($file);
I've temporarily removed the updating so I can debug the delete check.
This all works BUT it only removes the 1st delete and leaves all the other deletes even though it's a foreach loop??.
Any help greatly appreciated.
Deleting multiple times in the same iteration is unstable. E.g. if you remove the second element, the third becomes the second and so on.
You can prevent that by storing the elements to delete into an array first:
$elementsToRemove = array();
foreach ($xml2->codes->code as $code) {
if ($code['status'] == "delete") {
$elementsToRemove[] = $code;
}
}
And then you remove the element based on the array which is stable while you iterate over it:
foreach ($elementsToRemove as $code) {
unset($code[0]);
}
You could also put the if-condition into an xpath query which does return the array directly (see the duplicate question for an example) or by making use of iterator_to_array().
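A small sketch of the xpath variant, assuming an XML document shaped like the one in the question (the date values and text contents are placeholders). SimpleXMLElement::xpath() returns an ordinary PHP array, so nodes can be removed while looping over it.

```php
<?php
// Placeholder document following the structure shown in the question.
$xml2 = simplexml_load_string(
    '<content><codes>'
    . '<code date="d1" status="request">A</code>'
    . '<code date="d2" status="delete">B</code>'
    . '<code date="d3" status="delete">C</code>'
    . '<code date="d4" status="purchased">D</code>'
    . '</codes></content>'
);

// The predicate does the filtering; the result is a plain array of elements.
foreach ($xml2->xpath('codes/code[@status="delete"]') as $code) {
    unset($code[0]); // removes the element itself from the document
}

// Collect what is left to check the result.
$left = [];
foreach ($xml2->codes->code as $code) {
    $left[] = (string) $code;
}
```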
SimpleXML child lists are iterated live against the underlying document, and, as with any deletion of items while iterating forward, the iteration position can get mixed up because the expected next item has disappeared.
The simple way to remove a bunch of children in SimpleXML without using an extra array is to iterate in reverse (=decrementing the index), taking the looping in your example to:
// FOR EACH NODE IN REVERSE
$elements=$xml2->xpath('codes/code');
$count=count($elements);
for($j=$count-1;$j>=0;$j--){
// IF TO DELETE
$code=$elements[$j];
if($code['status']=="delete"){
// DELETE ELEMENT
$dom=dom_import_simplexml($code);
$dom->parentNode->removeChild($dom);
}
}
Of course, if your other processing requires forward iterating through the elements, then using an array is the best.
I would like to find all <tr> elements starting from the second, but I don't know how to get it right.
$items = $html->find('tr');
That piece of code gets all trs, but I want every one except the first because that one contains <th>.
Just cut off the first element.
$items = array_slice($html->find('tr'), 1);
When you get your list with $html->find('tr');, write a loop that ignores the first index/row.
If Simple HTML DOM works like jQuery, try something like this:
$items = $html->find('tr:not(:has(th))');
As PoulsQ suggests you CAN do it like
$firstTr = true;
foreach($html->find('tr') as $tr) {
if(!$firstTr) {
// YOUR LOGIC FOR A TR HERE
}
else {
$firstTr = false;
}
}
But I think it would be nicer code if you query the DOM to ignore the first element.
You can get all trs with $html->find('tr'); and then add a condition in the loop: if the tr's child element is a "th" tag, ignore that tr.
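To illustrate the "query the DOM so the header row is never returned" idea without relying on selector support in Simple HTML DOM, here is a sketch using PHP's built-in DOMXPath. This is a different library from the one in the question, and the table content is invented.

```php
<?php
// Invented table: the first row contains th cells, the others contain td cells.
$html = '<table>'
      . '<tr><th>Name</th><th>Qty</th></tr>'
      . '<tr><td>Apples</td><td>3</td></tr>'
      . '<tr><td>Pears</td><td>5</td></tr>'
      . '</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$names = [];
// not(th) keeps only rows that have no th child, wherever the header row sits.
foreach ($xpath->query('//tr[not(th)]') as $tr) {
    $names[] = $xpath->query('td[1]', $tr)->item(0)->textContent;
}
```

Filtering on the presence of th, rather than on position, also keeps working if the header row is not the first row.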
I am using XPath, and this is my query:
$elements = $xpath->query('//div/div/div/div/div/div[@id="con1"]/table/tr/td');
And everything works fine.
Then I change the condition in the div, and the query is like this:
$elements = $xpath->query('//div/div/div/div/div/div[@id="con2"]/table/tr/td');
And I see what I expect to see.
But later, if I do this:
$elements = $xpath->query('//div/div/div/div/div/div[@id="con1" or @id="con2"]/table/tr/td');
I see again only the elements of con1. Why is that?
The full code is below:
$elements = $xpath->query('//div/div/div/div/div/div[@id="con1" or @id="con2"]/table/tr/td');
foreach ( $elements as $element ) {
$str1=$element->getAttribute('class');
$str2="first-td";
$str3="status";
if (strcmp($str1,$str2)==0) {
var_dump( $element->nodeValue);
}
if (strcmp($str1,$str3)==0) {
echo $element->childNodes->item(0)->getAttribute('class'). "<br />";
}
}
To sum up: if my condition is only con1, I see the correct results. If it's only con2, I see the correct results. The problem comes when I use the or. In that case, I see the results only from con1. It's as if it stops after fulfilling the first condition. Both divs are at the same level of the DOM tree.
What you are trying to do is to retrieve <div id="con1"> and <div id="con2"> in the same expression, but what you are actually doing is to retrieve a div which either has an attribute id="con1" or id="con2". The first expression of the condition returns true and then you get the <div id="con1"> node. It makes sense.
To get both nodes you need something like:
//div[@id="con1"]|//div[@id="con2"]
Note: //div[@id="con1"] finds any <div id="con1"> node in the tree, and an id has to be unique within a document, so it's not necessary to spell out the whole path down to it.
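A brief sketch of the union expression with DOMXPath, assuming a simplified document in which both containers hold a table; the cell contents are made up. The union operator returns the nodes of both sub-expressions in document order.

```php
<?php
// Simplified, hypothetical page with the two containers from the question.
$html = '<div>'
      . '<div id="con1"><table><tr><td class="first-td">one</td></tr></table></div>'
      . '<div id="con2"><table><tr><td class="first-td">two</td></tr></table></div>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$values = [];
// "|" merges the node sets selected by the two paths.
foreach ($xpath->query('//div[@id="con1"]//td | //div[@id="con2"]//td') as $td) {
    $values[] = $td->nodeValue;
}
```

Using //td under each container, rather than a fixed chain of div steps, avoids missing cells when the two containers are nested at different depths.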