I would like to find all <tr> elements starting from the second, but I don't know how to get it right.
$items = $html->find('tr');
That piece of code gets all trs, but I want every one except the first, because that one contains <th>.
Just cut off the first element.
$items = array_slice($html->find('tr'), 1);
When you get your list with $html->find('tr'), loop over it and skip the first index/row.
If Simple HTML DOM works like jQuery, try something like this:
$items = $html->find('tr:not(:has(th))');
As PoulsQ suggests, you can do it like this:
$firstTr = true;
foreach($html->find('tr') as $tr) {
if(!$firstTr) {
// YOUR LOGIC FOR A TR HERE
}
else {
$firstTr = false;
}
}
But I think it would be nicer code if you query the DOM to ignore the first element.
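One way to let the query itself skip the header row, sketched with PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM (that swap, and the sample table, are assumptions for illustration only):

```php
<?php
// Hypothetical sample markup: one header row followed by two data rows.
$markup = '<table>
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td>Apple</td><td>3</td></tr>
  <tr><td>Pear</td><td>5</td></tr>
</table>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Select only rows that have no <th> child, which leaves out the header row.
$rows = $xpath->query('//tr[not(th)]');

$names = [];
foreach ($rows as $tr) {
    // Read the first <td> of each remaining row, relative to that row.
    $names[] = trim($xpath->query('td[1]', $tr)->item(0)->textContent);
}

print_r($names);
```

If the header is always the first row, '//tr[position() > 1]' would be an alternative predicate; filtering on the missing <th> does not depend on row order.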
You can get all trs with $html->find('tr'), then add a condition inside the loop: if the first child element of a tr is a th tag, skip that tr.
Related
I use the code below to iterate over all divs on a page with id = news using PHPScraper. Is it possible to take only the first div it finds, so that the array contains only one entry? I was thinking of maybe (if possible) taking only one in the foreach loop, like you can do in C# (myList.Take(1)).
$dom = file_get_html('http://localhost/test.html');
//collect all news entries into an array
$myArray = array();
if(!empty($dom)) {
$divClass = $title = '';
foreach($dom->find("div[id*=news]") as $divClass) {
You can use break to stop the loop from continuing after you've added the first div.
Something like this:
foreach($dom->find("div[id*=news]") as $divClass) {
$myArray[] = $divClass; // Just assuming you're doing something like this
break;
}
Side note: The code $divClass = $title = ''; before the loop doesn't serve any purpose in your posted code. The variable $divClass will be completely overwritten on each iteration of your foreach.
I'm guessing you're using PHP Simple HTML DOM Parser.
To grab only one element, you can simply pass 0 as the second argument of find:
$firstDiv = $dom->find('div[id*=news]', 0);
foreach($dom->find("div[id*=news]") as $divClass) {
/// work here
break;
}
The break; statement stops the loop from processing any further. If you put it directly in the loop body, the loop executes only once.
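For comparison, a minimal sketch of the same "first match only" idea without any loop, using PHP's built-in DOMXPath instead of Simple HTML DOM (the library swap and the sample markup are assumptions, not the asker's setup):

```php
<?php
// Hypothetical sample markup with two matching divs and one non-matching div.
$markup = '<div id="news-1">First</div><div id="news-2">Second</div><div id="other">Other</div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// The parentheses apply [1] to the whole result set, so at most one node is returned.
$first = $xpath->query('(//div[contains(@id, "news")])[1]')->item(0);

// Keep the array shape from the question: zero or one entry.
$myArray = $first !== null ? [trim($first->textContent)] : [];

print_r($myArray);
```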
I am scraping a table with XPath and matching the tds of each TR, but I have a problem: some of the TRs have only one td, so I need to eliminate those. That elimination is giving me quite a problem.
For example:
$getTR = $path->query("//table[@class='bgc_line']/tr");
foreach($getTR as $tr){
if ($tr->length == 2) {
$route = $path->query("//table[@class='bgc_line']/tr/td[1]");
foreach ($route as $td1) {
$property[] = trim($td1->nodeValue);
}
$route = $path->query("//table[@class='bgc_line']/tr/td[2]");
foreach ($route as $td2) {
$value[] = trim($td2->nodeValue);
}
}
}
So my usage of if isn't exactly right. But is there another way to do this? I have two expressions, and the first XPath's count is different than the second's. That's why I can't match the data with each other. You can see the table here.
You can use
table[@class='bgc_line']/tr[count(td) > 1]
to get table rows only if they have more than one td child.
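A minimal sketch of how that predicate could be used to keep each property paired with its value, assuming PHP's DOMDocument/DOMXPath and a made-up table (the sample rows and the $data array are illustrative, not from the question):

```php
<?php
// Hypothetical sample table: one single-cell row that should be skipped.
$markup = '<table class="bgc_line">
  <tr><td>Section heading</td></tr>
  <tr><td>Color</td><td>Red</td></tr>
  <tr><td>Size</td><td>Large</td></tr>
</table>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

$data = [];
// Only rows with more than one <td> child are returned by this query.
foreach ($xpath->query("//table[@class='bgc_line']/tr[count(td) > 1]") as $tr) {
    // Query the cells relative to the current row so property and value stay aligned.
    $cells = $xpath->query('td', $tr);
    $data[trim($cells->item(0)->nodeValue)] = trim($cells->item(1)->nodeValue);
}

print_r($data);
```

Reading both cells from the same row avoids running two separate document-wide queries whose result counts can drift apart.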
I'm using Behat and Mink framework for BDD (using PHP)
I was looping through td elements to verify the values and text. Since all the elements returned are span elements, I could use getText() on the elements and it was all good.
$page = $this->getSession()->getPage();
$expected_values = ["Tuesday", 22, 22, 22, 22];
$actual_rows = $page->findAll('css', 'table.admin-table tbody tr td');
$actual_Values = array();
foreach($actual_rows as $row) {
$actual_Values[] = $row->getText();
}
assertEquals($expected_values, $actual_Values);
But recently, the design of the page has changed, and some of the span elements have been replaced by input elements. getText() returns null for those, so I replaced it with getValue(). But since the first td element is still a span, getValue() returns null for it.
Is there any way I can skip the first td within the loop in the code snippet above?
Update:
Here is what I put in comments which hasn't rendered properly:
$actual_Values[] = $row->find('css', 'input')->getValue();
I misread and thought you wanted to get the text regardless of whether there is an input or a span. To skip the first value you can just:
foreach($actual_rows as $row) {
if ($row !== reset($actual_rows)) {
$actual_Values[] = $row->getText();
}
}
Or a better approach would be to array_shift($actual_rows) to remove the first value and do the normal loop then.
Or another approach would be to change your selector to $page->findAll('css', 'table.admin-table tbody tr td:not(:first-child)'); to select everything but the first td child.
I am using this piece of code with Simple HTML DOM:
$google = "http://www.google.com/something.";
$html = file_get_html($google);
foreach ($html->find('span[class=st]') as $element)
echo $element->innertext;
But I just want to echo out the first $element->innertext.
How can I echo out only the first one?
The above code echoes all elements.
Is there any way to stop Simple HTML DOM from searching once the first element has been found?
I mean, we don't need to get ALL of the elements, we just need the first one, so it's a waste of time to collect all the elements and then pick the first one!
It would be better if, once the first one is found, Simple HTML DOM stopped looking for more items.
Don't use iteration if you don't need it.
$elements = $html->find('span[class=st]');
echo $elements[0]->innertext;
You can also use the :first modifier in the selector to make it more efficient.
Use break after the first iteration.
foreach ($html->find('span[class=st]') as $element){
echo $element->innertext;
break;
}
You can read more about break in this documentation from PHP.net: http://php.net/manual/en/control-structures.break.php
But I'd use this method to get the first element directly instead:
echo $html->find('span[class=st]', 0)->innertext;
No need to loop.
I am using XPath, and this is my query:
$elements = $xpath->query('//div/div/div/div/div/div[@id="con1"]/table/tr/td');
And everything works fine.
Then I change the condition in the div, and the query is like this:
$elements = $xpath->query('//div/div/div/div/div/div[@id="con2"]/table/tr/td');
And I do see what I must see.
But later, if I do this:
$elements = $xpath->query('//div/div/div/div/div/div[@id="con1" or @id="con2"]/table/tr/td');
I see again only the elements of con1. Why is that?
The full code is below:
$elements = $xpath->query('//div/div/div/div/div/div[@id="con1" or @id="con2"]/table/tr/td');
foreach ( $elements as $element ) {
$str1=$element->getAttribute('class');
$str2="first-td";
$str3="status";
if (strcmp($str1,$str2)==0) {
var_dump( $element->nodeValue);
}
if (strcmp($str1,$str3)==0) {
echo $element->childNodes->item(0)->getAttribute('class'). "<br />";
}
}
To sum up: if my condition is only con1, I see the correct results. If it's only con2, I see the correct results. The problem comes when I use the or. In that case, I see only the results from con1. It's as if it stops after fulfilling the first condition. They are at the same level of the DOM tree.
What you are trying to do is to retrieve <div id="con1"> and <div id="con2"> in the same expression, but what you are actually doing is to retrieve a div which either has an attribute id="con1" or id="con2". The first expression of the condition returns true and then you get the <div id="con1"> node. It makes sense.
To get both nodes you need something like:
//div[@id="con1"]|//div[@id="con2"]
Note: //div[@id="con1"] finds any <div id="con1"> node anywhere in the tree, and an id has to be unique within a document, so it's not necessary to spell out the whole path down to it.
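A minimal sketch of the union form applied to the table cells, assuming PHP's DOMDocument/DOMXPath and a made-up document with both containers (the markup and the $values array are illustrative only):

```php
<?php
// Hypothetical sample markup: two sibling containers, each holding a one-cell table.
$markup = '<div>
  <div id="con1"><table><tr><td class="first-td">A</td></tr></table></div>
  <div id="con2"><table><tr><td class="first-td">B</td></tr></table></div>
</div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// The | operator merges both node sets; results come back in document order.
$cells = $xpath->query('//div[@id="con1"]/table/tr/td | //div[@id="con2"]/table/tr/td');

$values = [];
foreach ($cells as $td) {
    $values[] = trim($td->nodeValue);
}

print_r($values);
```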