php domdocument parse nested tables - php

I got a table which looks like this: http://pastebin.com/jjZxeNHF
I got it as a PHP-DOMDocument.
Now I want to "parse" this table.
If I am correct, something like the following is not going to work because $superTable->getElementsByTagName('tr') is not only going to get outer tr's but also the inner ones.
foreach ($superTable->getElementsByTagName('tr') as $superRow) {
    foreach ($superRow->getElementsByTagName('td') as $superCol) {
        foreach ($superCol->getElementsByTagName('table') as $table) {
            foreach ($table->getElementsByTagName('tr') as $row) {
                foreach ($row->getElementsByTagName('td') as $col) {
                }
            }
        }
    }
}
How can I go through all the tables, field by field, as described in the second snippet?

This is my solution:
foreach ($raumplan->getElementsByTagName('tr') as $superRow) {
    if ($superRow->getElementsByTagName('table')->length > 0) {
        foreach ($superRow->getElementsByTagName('td') as $superCol) {
            if ($superCol->getElementsByTagName('table')->length > 0) {
                foreach ($superCol->getElementsByTagName('table') as $table) {
                    foreach ($table->getElementsByTagName('tr') as $row) {
                        foreach ($row->getElementsByTagName('td') as $col) {
                        }
                    }
                }
            }
        }
    }
}
It checks whether you are in the outer table by testing whether the element contains a table.

You could use XPath to eliminate a lot of the blatantly low-level iteration and reduce the apparent complexity of all this...
$xpath = new DOMXPath($document);
foreach ($xpath->query('//selector/for/superTable//table') as $table) {
    // in case you really wanted them...
    $superCol = $table->parentNode;
    $superRow = $superCol->parentNode;
    foreach ($table->getElementsByTagName('td') as $col) {
        $row = $col->parentNode;
        // do your thing with each cell here
    }
}
You could drill down further than this, if you wanted -- if you just wanted every cell in the inner tables, you could reduce it to one loop over //selector/for/superTable//table//td.
Of course, if you're dealing with valid HTML, then you could just loop over each element's children as well. It all depends on what the HTML will look like, and exactly what you need from it.
Edit: If you can't use XPath for some reason, you could do something like
// I assume you've found $superTable already
foreach ($superTable->getElementsByTagName('table') as $table) {
    $superCol = $table->parentNode;
    $superRow = $superCol->parentNode;
    foreach ($table->getElementsByTagName('td') as $col) {
        $row = $col->parentNode;
        // do your thing here
    }
}
Note that neither solution bothers to iterate over the rows etc. That's a big part of what obviates the need to get only rows in the current table. You're only looking for tables within the table, which by definition (1) will be the sub-tables and (2) will be within a column within a row within the main table, and you can get the parent row and column from the table element itself.
Of course, both solutions assume you're only nesting tables one level deep. If it's more than that, you're going to want to look at a recursive solution and DOMElement's childNodes property. Or, a more narrowly focused XPath query.
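For deeper nesting, a recursive walk over childNodes could look roughly like the sketch below. The helper names (childElements, ownRows, walkTable) and the sample markup are made up for illustration, and it assumes each nested table is a direct child of its td; a table wrapped in a div or similar would need an extra descent step.

```php
<?php
// Hypothetical helper: direct child elements of $node with the given tag name.
function childElements(DOMNode $node, string $name): array
{
    $found = [];
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement && strtolower($child->nodeName) === $name) {
            $found[] = $child;
        }
    }
    return $found;
}

// Hypothetical helper: rows that belong to this table only, whether they sit
// directly under <table> or inside a <thead>/<tbody>/<tfoot> wrapper.
function ownRows(DOMElement $table): array
{
    $rows = childElements($table, 'tr');
    foreach (['thead', 'tbody', 'tfoot'] as $section) {
        foreach (childElements($table, $section) as $group) {
            $rows = array_merge($rows, childElements($group, 'tr'));
        }
    }
    return $rows;
}

// Collects [depth, text] for every leaf cell, recursing into nested tables.
function walkTable(DOMElement $table, int $depth = 0): array
{
    $cells = [];
    foreach (ownRows($table) as $row) {
        foreach (childElements($row, 'td') as $col) {
            $nested = childElements($col, 'table');
            if ($nested === []) {
                $cells[] = [$depth, trim($col->textContent)];
            }
            foreach ($nested as $inner) {
                $cells = array_merge($cells, walkTable($inner, $depth + 1));
            }
        }
    }
    return $cells;
}

// Sample markup, invented for this example.
$doc = new DOMDocument();
$doc->loadHTML(
    '<table><tr><td>outer</td>'
    . '<td><table><tr><td>inner</td></tr></table></td></tr></table>'
);
$superTable = $doc->getElementsByTagName('table')->item(0);

foreach (walkTable($superTable) as [$depth, $text]) {
    echo str_repeat('  ', $depth), $text, PHP_EOL;
}
```

Because only direct children are inspected at each level, rows and cells of an inner table are never mistaken for rows and cells of the table around it.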

Related

Using Xpath expression and checking element's count

I am scraping a table with XPath and matching the tds of each tr. Some trs have only one td, so I need to eliminate those, but that elimination is giving me quite a problem.
For example:
$getTR = $path->query("//table[@class='bgc_line']/tr");
foreach ($getTR as $tr) {
    if ($tr->length == 2) {
        $route = $path->query("//table[@class='bgc_line']/tr/td[1]");
        foreach ($route as $td1) {
            $property[] = trim($td1->nodeValue);
        }
        $route = $path->query("//table[@class='bgc_line']/tr/td[2]");
        foreach ($route as $td2) {
            $value[] = trim($td2->nodeValue);
        }
    }
}
So my usage of if isn't exactly right. But is there another way to do this? I have two expressions, and the first XPath's count is different than the second's, which is why I can't match the data with each other. You can see the table here.
You can use
table[@class='bgc_line']/tr[count(td) > 1]
To get table rows only if they have more than one child td
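As a rough sketch of how that predicate might be used, the example below filters the rows first and then reads both cells relative to each matching row, so the property and value always come from the same tr. The sample HTML is invented for illustration; the variable names follow the question.

```php
<?php
// Sample markup, invented for this example: one single-cell row to skip.
$html = '<table class="bgc_line">'
      . '<tr><td>Heading only</td></tr>'
      . '<tr><td>Color</td><td>Red</td></tr>'
      . '<tr><td>Size</td><td>Large</td></tr>'
      . '</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$path = new DOMXPath($doc);

$property = [];
$value = [];
foreach ($path->query("//table[@class='bgc_line']/tr[count(td) > 1]") as $tr) {
    // The second argument makes the query relative to the current row.
    $property[] = trim($path->query('td[1]', $tr)->item(0)->nodeValue);
    $value[]    = trim($path->query('td[2]', $tr)->item(0)->nodeValue);
}

print_r(array_combine($property, $value));
```

If the real page wraps its rows in a tbody, the path would need to be //table[@class='bgc_line']/tbody/tr[count(td) > 1] instead.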

How to fetch one single row of an excel sheet with Box\Spout

I'm trying out Box\Spout and rewriting some code which was formerly using PHPExcel. Iterating through rows is clear but in a few cases I need to address directly one specific row. I cannot find this in the documentation.
Something like:
$row = $sheet->getRow(8);
You can't access rows directly. If you need the 8th row, you'll need to read the first 8 rows... This is because Spout does not load the entire spreadsheet in memory so it reads the data row by row.
However, you can do something like this:
foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $rowIndex => $row) {
        if ($rowIndex !== 8) {
            continue;
        }
        // do something with row 8
    }
}
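Since rows are read sequentially, it may be worth stopping as soon as the wanted row has been seen rather than continuing through the rest of the sheet. The sketch below shows the idea with a plain PHP generator standing in for the row iterator; nthRow and fakeRows are hypothetical names, not part of Spout, and it assumes the iterator keys are the row numbers, as in the loop above.

```php
<?php
// Hypothetical helper: returns the row whose key equals $wanted, or null.
// Works with any iterable whose keys are row numbers.
function nthRow(iterable $rows, int $wanted)
{
    foreach ($rows as $rowIndex => $row) {
        if ($rowIndex === $wanted) {
            return $row; // stop here; later rows are never read
        }
    }
    return null;
}

// Stand-in for a streaming row iterator (an assumption for this example).
function fakeRows(): Generator
{
    for ($i = 1; $i <= 1000; $i++) {
        yield $i => ["row $i"];
    }
}

$row = nthRow(fakeRows(), 8);
print_r($row);
```

With a real reader you would pass $sheet->getRowIterator() in place of fakeRows().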
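The iterator's key() method takes no argument; it returns the index of the current row, so you can check it while iterating:
$it = $sheet->getRowIterator();
foreach ($it as $row) {
    if ($it->key() === 8) { /* $row is row 8 */ break; }
}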

(Why) should I test if an array is empty prior to a for loop?

Given an empty array $items = array();
Why should I use the following code (which I've seen used before):
if (count($items) > 0) {
    foreach ($items as $item) { /* do stuff here */ }
}
instead of just
foreach ($items as $item) { /* do stuff here */ }
If count($items) === 0 the loop won't execute anyway??
You don't generally need to do the check. I see lots of code with that check, but it's often unnecessary, and I think it's usually due to ignorance on the part of the programmer. A similar pattern that I frequently see is after database queries:
if (mysqli_num_rows($stmt) > 0) {
    while ($row = mysqli_fetch_assoc($stmt)) {
        ...
    }
}
This test is also unnecessary; if no rows were found, the first fetch will return false, and the loop will stop.
The only case where it's useful is if you want to tell the user that there were no results, instead of just displaying an empty table. So you might do:
if (count($items) > 0) {
    echo "<table>";
    foreach ($items as $item) {
        // display table row
    }
    echo "</table>";
} else {
    echo "<b>No data found!</b>";
}
It actually depends on what you want to do. As the comments have pointed out, you may want an empty array to be a special case that you handle differently. Otherwise, foreach on an empty array throws no warning; the loop body just never runs. A check that is worth doing is whether $items is an array at all, because foreach over a non-array does raise a warning. Alternatively, cast $items to an array: a scalar is then converted into an array of one value, i.e. the value $items had at that point.
e.g.
$items = 2;
foreach ((array)$items as $item) {
    print $item; // will print 2 alright
}
or
if (is_array($items)) {
    foreach ($items as $item) {
        print $item;
    }
} else {
    // do something here
}

PHP Comparing two multidimensional arrays with foreach

I'm attempting to compare non-matching values within two multidimensional arrays ($allSessions, my master array and $userSessions, my inner array...everything in it should be within $allSessions, but structured differently) and my approach was to use a foreach within a foreach loop.
This works in most situations except one (when $userSessions only contains one item).
I'm wondering if the bug is caused by this loop within a loop? When it is buggy because $userSessions only contains 1 item, the returned $unregistered array contains multiples of each item...
$allSessions = $this->getAllUpcoming();
$unregistered = array();
$userSessions = $this->getUserSessions($userID);
foreach ($allSessions as $session) {
    foreach ($userSessions as $user) {
        if ($user["entry_data"]["session-participant-session"]["id"] !== $session["id"]) {
            array_push($unregistered, $session);
        }
    }
}
The way you have it, you push a session once for every user session it does not match.
Let's say you have a perfect match of a,b,c in $allSessions and a,b,c in $userSessions. In the first pass of the outer loop you have 'a'. The inner loop compares it against 'b' and 'c', which don't match, so 'a' is added to $unregistered twice. Then you go on to 'b' in your outer loop and add it twice as well (once for 'a' and once for 'c'). And so on.
I'm pretty sure you're going to have to structure it differently. You've got to check every element in $userSessions and add the session from $allSessions only if you don't find any match:
foreach ($allSessions as $session) {
    foreach ($userSessions as $user) {
        if ($user["entry_data"]["session-participant-session"]["id"] === $session["id"]) {
            continue 2; // this goes to the next element in $allSessions
        }
    }
    array_push($unregistered, $session);
}
A slightly more readable form if you are not familiar with continue:
foreach ($allSessions as $session) {
    $found = false;
    foreach ($userSessions as $user) {
        if ($user["entry_data"]["session-participant-session"]["id"] === $session["id"]) {
            $found = true;
            break; // an optimization - not strictly necessary
        }
    }
    if (!$found) {
        array_push($unregistered, $session);
    }
}
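If the arrays get large, another option is to collect the registered ids into a lookup array first and then make a single pass over $allSessions. This is only a sketch with invented sample data; note that array keys compare loosely (the string "2" and the integer 2 become the same key), which differs from the strict === used above.

```php
<?php
// Sample data, invented for this example, shaped like the arrays in the question.
$allSessions = [['id' => 1], ['id' => 2], ['id' => 3]];
$userSessions = [
    ['entry_data' => ['session-participant-session' => ['id' => 2]]],
];

// Build a set of registered session ids (the ids become array keys).
$registeredIds = [];
foreach ($userSessions as $user) {
    $registeredIds[$user['entry_data']['session-participant-session']['id']] = true;
}

// Keep every session whose id is not in the set.
$unregistered = [];
foreach ($allSessions as $session) {
    if (!isset($registeredIds[$session['id']])) {
        $unregistered[] = $session;
    }
}

print_r($unregistered);
```

Each session is checked exactly once, so it can never end up in $unregistered more than once, whatever the size of $userSessions.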

PHP Simple Dom - Loop through one class at a time?

Is there any way to do this? Currently I'm using this method to go through the td's of every table in a Wikipedia entry that has a class of wikitable.
foreach ($html->find('table.wikitable td') as $key => $info)
{
    $text = $info->innertext;
}
However, what I want to do is have separate loops for each table that shares the class name wikitable. I can't figure out how to do this.
Is there some kind of syntax? I want to do something like this
$first_table = $html->find('table.wikitable td', 0); // return all the td's for the first table
$second_table = $html->find('table.wikitable td', 1); // second one
I might not fully understand your question but it seems that $html->find simply returns an array, in your case an array of tables:
$tables = $html->find('table.wikitable');
You can then loop through your tables and find the td's in each table:
foreach( $tables as $table )
{
    $tds = $table->find('td');
    foreach( $tds as $td )
    {
        ...
    }
}
If you only want to target the second table you can use:
$table = $tables[1];
Or something like that.
