First, here is my table HTML:
<table class="xyz">
<caption>Outcomes</caption>
<thead>
<tr class="head">
<th title="a" class="left" nowrap="nowrap">A1</th>
<th title="a" class="left" nowrap="nowrap">A2</th>
<th title="result" class="left" nowrap="nowrap">Result</th>
<th title="margin" class="left" nowrap="nowrap">Margin</th>
<th title="area" class="left" nowrap="nowrap">Area</th>
<th title="date" nowrap="nowrap">Date</th>
<th title="link" nowrap="nowrap">Link</th>
</tr>
</thead>
<tbody>
<tr class="data1">
<td class="left" nowrap="nowrap">56546</td>
<td class="left" nowrap="nowrap">75666</td>
<td class="left" nowrap="nowrap">Lower</td>
<td class="left" nowrap="nowrap">High</td>
<td class="left">Area 3</td>
<td nowrap="nowrap">Jan 2 2016</td>
<td nowrap="nowrap">http://localhost/545436</td>
</tr>
<tr class="data1">
<td class="left" nowrap="nowrap">55546</td>
<td class="left" nowrap="nowrap">71666</td>
<td class="left" nowrap="nowrap">Lower</td>
<td class="left" nowrap="nowrap">High</td>
<td class="left">Area 4</td>
<td nowrap="nowrap">Jan 3 2016</td>
<td nowrap="nowrap">http://localhost/545437</td>
</tr>
...
And there are many more <tr> after that.
I am using this PHP code:
$html = file_get_contents('http://localhost/outcomes');
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);
$xpath->registerNamespace('', 'http://www.w3.org/1999/xhtml');
$elements = $xpath->query("//table[@class='xyz']");
How can I, now that I have the table as the first element in $elements, get the values of each <td>?
Ideally I want to get arrays like:
array(56546, 75666, 'Lower', 'High', 'Area 3', 'Jan 2 2016', 'http://localhost/545436'),
array(55546, 71666, 'Lower', 'High', 'Area 4', 'Jan 3 2016', 'http://localhost/545437'),
...
But I'm not sure how I can dig that deeply into the table code.
Thank you for any advice.
First, get all the table rows in the <tbody>
$rows = $xpath->query('//table[@class="xyz"]/tbody/tr');
Then, you can iterate over that collection and query for each <td>
foreach ($rows as $row) {
$cells = $row->getElementsByTagName('td');
// alt $cells = $xpath->query('td', $row)
$cellData = [];
foreach ($cells as $cell) {
$cellData[] = $cell->nodeValue;
}
var_dump($cellData);
}
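For reference, here is a minimal self-contained sketch of the same approach that collects the rows into one nested array instead of dumping each row. The inline sample markup is a trimmed copy of the table from the question, and the `$table` variable name is an assumption made for illustration.

```php
<?php
// Trimmed sample of the question's table; a real script would fetch the page instead.
$html = <<<HTML
<table class="xyz">
<thead><tr class="head"><th>A1</th><th>A2</th><th>Result</th></tr></thead>
<tbody>
<tr class="data1"><td>56546</td><td>75666</td><td>Lower</td></tr>
<tr class="data1"><td>55546</td><td>71666</td><td>Lower</td></tr>
</tbody>
</table>
HTML;

$document = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);

$table = [];
foreach ($xpath->query('//table[@class="xyz"]/tbody/tr') as $row) {
    $cellData = [];
    // Relative query: only the <td> children of the current row.
    foreach ($xpath->query('td', $row) as $cell) {
        $cellData[] = trim($cell->nodeValue);
    }
    $table[] = $cellData;
}

print_r($table);
```

Every value comes back as a string, so the numeric columns would need an explicit `(int)` cast if integers are wanted.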
I'm trying to get multiple hrefs from a table like this:
<table class="table table-bordered table-hover">
<thead>
<tr>
<th class="text-center">No</th>
<th>TITLE</th>
<th>DESCRIPTION</th>
<th class="text-center"><span class="glyphicon glyphicon-download-alt"></span></th>
</tr>
</thead>
<tbody>
<tr data-key="11e44c4ebff985d08ca5313231363233">
<td class="text-center" style="width: 50px;">181</td>
<td style="width:auto; white-space: normal;">Link 1</td>
<td style="width:auto; white-space: normal;">Lorem ipsum dolor 1</td>
<td class="text-center" style="width: 50px;"><img src="https://example.com/img/pdf.png" width="15" height="20" alt="myImage"></td>
</tr>
<tr data-key="11e44c4e4222d630bdd2313231323532">
<td class="text-center" style="width: 50px;">180</td>
<td style="width:auto; white-space: normal;">Link 2</td>
<td style="width:auto; white-space: normal;">Lorem ipsum dolor 2</td>
<td class="text-center" style="width: 50px;"><img src="https://example.com/img/pdf.png" width="15" height="20" alt="myImage"></td>
</tr>
</tbody>
</table>
I tried PHP DOM like this:
<?php
$html = file_get_contents('data2.html');
$htmlDom = new DOMDocument;
$htmlDom->preserveWhiteSpace = false;
$htmlDom->loadHTML($html);
$tables = $htmlDom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row)
{
$cols = $row->getElementsByTagName('td');
echo @$cols->item(0)->nodeValue.'<br />';
echo @$cols->item(1)->nodeValue.'<br />';
echo trim($cols->item(1)->getElementsByTagName('a')->item(0)->getAttribute('href')).'<br />';
echo @$cols->item(2)->nodeValue.'<br />';
echo trim($cols->item(3)->getElementsByTagName('a')->item(0)->getAttribute('href')).'<br />';
}
?>
I get this error
Fatal error: Uncaught Error: Call to a member function getElementsByTagName() on null
getAttribute causes the error
Could someone help me out here, please? Thanks.
Your $rows holds every <tr> within the <table>. That includes not only the rows in the table body but also the row in the table head, which has no <td> in it. Hence, when reading that row, $cols->item(0) and $cols->item(1) both return NULL.
The missing ->nodeValue on those items was already a hint that something was wrong (which is why you added the @ sign to suppress the warning).
Try to change this:
$rows = $tables->item(0)->getElementsByTagName('tr');
into this:
$rows = $tables
->item(0)->getElementsByTagName('tbody')
->item(0)->getElementsByTagName('tr');
Now it is searching the <tr> within your <tbody> and should fix your issue with this particular HTML.
For more robust code, check the variables before acting on them. A type check or a count check would do.
As the previous accesses to $cols all use @ to suppress errors, this is the first one that complains.
A simple fix would be to just skip the rest of the code if no <td> elements are found (such as the header row)...
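foreach ($rows as $row)
{
$cols = $row->getElementsByTagName('td');
// Skip rows without <td> cells, such as the header row.
if ( $cols->length == 0 ) {
continue;
}
// ... rest of the loop body as before ...
}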
You could alternatively use XPath and only select <tr> tags which contain <td> tags.
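A rough sketch of that XPath variant follows. The predicate `tr[td]` matches only rows that have at least one `<td>` child, so the header row never enters the loop. The sample markup is cut down from the question, and the `href` value in it is a made-up placeholder, since the original links are not shown.

```php
<?php
// Cut-down sample; the href below is a hypothetical placeholder.
$html = <<<HTML
<table>
<thead><tr><th>No</th><th>TITLE</th></tr></thead>
<tbody>
<tr><td>181</td><td><a href="https://example.com/doc/1">Link 1</a></td></tr>
<tr><td>180</td><td>Link 2</td></tr>
</tbody>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$links = [];
// tr[td] selects only rows containing <td> cells, which skips the <th> header row.
foreach ($xpath->query('//table//tr[td]') as $row) {
    $number = trim($xpath->evaluate('string(td[1])', $row));
    // string() yields '' when nothing matches, so a cell without <a> cannot cause a fatal error.
    $href = $xpath->evaluate('string(td[2]//a/@href)', $row);
    $links[] = ['no' => $number, 'href' => $href !== '' ? $href : null];
}

print_r($links);
```

Using `evaluate()` with `string(...)` avoids chaining `->item(0)->getAttribute()` on a node that may not exist, which is what triggered the original error.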
I have this html page:
<div class="table_container p402_hide " id="div_Summer">
<table class=" stats_table" id="Summer">
<colgroup><col><col><col><col><col><col><col><col><col></colgroup>
<thead>
<tr class="">
<th data-stat="year" align="right" class=" sort_default_asc" >Year</th>
<th data-stat="city" align="left" class=" sort_default_asc" >City</th>
<th data-stat="country" align="left" class=" sort_default_asc" >Country</th>
<th data-stat="countries" align="right" class="" >Countries</th>
<th data-stat="participants" align="right" class="" >Participants</th>
<th data-stat="participants_men" align="right" class="" >Men</th>
<th data-stat="participants_women" align="right" class="" >Women</th>
<th data-stat="sports" align="right" class="" >Sports</th>
<th data-stat="events" align="right" class="" >Events</th>
</tr>
</thead>
<tbody>
<tr class="">
<td align="right" >2012</td>
<td align="left" csk="London:2012">London</td>
<td align="left" csk="Great Britain:2012">Great Britain</td>
<td align="right" >205</td>
<td align="right" >10,519</td>
<td align="right" >5,864</td>
<td align="right" >4,655</td>
<td align="right" >32</td>
<td align="right" >302</td>
</tr>
To extract the text I used this code written in PHP 7:
<?php
$html = file_get_contents('http://www.sports-reference.com/olympics/summer/');
error_reporting(E_ERROR | E_PARSE);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$result = $xpath->query('//div[@id="div_Summer"]');
var_dump($result->item(0)->nodeValue);
?>
In this way I get this result:
string(2148) "
Year
City
Country
Countries
Participants
Men
Women
Sports
Events
2012
London
Great Britain
205
10,519
5,864
4,655
32
302
"
I would like only this text: "2012" and "London". How could I extract this information from $result?
Have you tried to query the td(s) you're interested in directly?
Try using a more specific xpath expression, like this:
$result = $xpath->query('(//div[@id="div_Summer"]//tbody//tr//td[position() >= 1 and position() <= 2])');
And then processing them through a simple loop:
<?php
foreach ($result as $element) {
var_dump($element->nodeValue);
}
?>
Full example, based on your code:
<?php
$html = file_get_contents('http://www.sports-reference.com/olympics/summer/');
error_reporting(E_ERROR | E_PARSE);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$result = $xpath->query('(//div[@id="div_Summer"]//tbody//tr//td[position() >= 1 and position() <= 2])');
foreach ($result as $element) {
var_dump($element->nodeValue);
}
?>
Output (truncated):
string(4) "2012"
string(6) "London"
string(4) "2008"
string(7) "Beijing"
string(4) "2004"
[..]
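The query above returns one flat list of cells. If the year and city should stay paired per row, one possible variation is to loop over the rows and read the two cells relative to each row. This is only a sketch against a shortened inline copy of the table; the `$games` name and the array keys are assumptions.

```php
<?php
// Shortened inline copy of the table so the example runs without a network request.
$html = <<<HTML
<div id="div_Summer">
<table id="Summer">
<thead><tr><th>Year</th><th>City</th><th>Country</th></tr></thead>
<tbody>
<tr><td>2012</td><td>London</td><td>Great Britain</td></tr>
<tr><td>2008</td><td>Beijing</td><td>China</td></tr>
</tbody>
</table>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$games = [];
foreach ($xpath->query('//div[@id="div_Summer"]//tbody/tr') as $row) {
    // td[1] and td[2] are evaluated relative to the current row.
    $games[] = [
        'year' => trim($xpath->evaluate('string(td[1])', $row)),
        'city' => trim($xpath->evaluate('string(td[2])', $row)),
    ];
}

print_r($games);
```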
I have only ever used things like contains() in my assertions, so I'm not sure how I'd go about something as complex as this.
Let's say I have an array of expected answers - in this case it's YES, YES, NO.
So that means effectively, for the first and second question I'd expect to see <span class="glyphicon glyphicon-ok"></span> inside the third <td> and for the third question I'd expect to see it inside the fourth <td>.
Here is my HTML code:
<table class="table table-curved">
<tr>
<th width="10%">Item</th>
<th width="60%">Description</th>
<th width="10%">YES</th>
<th width="10%">NO</th>
<th width="10%">NOT APPLICABLE</th>
</tr>
<tr>
<td class="report-table-inner report-centre">1</td>
<td class="report-table-inner">Check cargo is secure and undamaged.</td>
<td class="report-centre success"><span class="glyphicon glyphicon-ok"></span></td>
<td class="report-centre"></td>
<td class="report-centre"></td>
</tr>
<tr>
<td class="report-table-inner report-centre">2</td>
<td class="report-table-inner">Is all cargo accounted for.</td>
<td class="report-centre success"><span class="glyphicon glyphicon-ok"></span></td>
<td class="report-centre"></td>
<td class="report-centre"></td>
</tr>
<tr>
<td class="report-table-inner report-centre">3</td>
<td class="report-table-inner">Is all cargo checked by customs.</td>
<td class="report-centre"></td>
<td class="report-centre danger"><span class="glyphicon glyphicon-ok"></span></td>
<td class="report-centre"></td>
</tr>
...
How should I go about writing a test for this? Is it hard to iterate through the <tr>s programmatically?
Thank you
I think you should look at the documentation pages on Testing and the DomCrawler component:
Testing
The DomCrawler Component
There are very simple methods which can filter html or xml content.
References :
http://symfony.com/doc/current/book/testing.html#your-first-functional-test
http://symfony.com/doc/current/components/dom_crawler.html#node-traversing
<?php
use Symfony\Bundle\FrameworkBundle\Test\WebTestCase;
class PageTest extends WebTestCase
{
public function testPage()
{
// create a client to get the content of the page
$client = static::createClient();
$crawler = $client->request('GET', '/page');
// retrieve table rows
$rows = $crawler->filter('.table-curved tr');
$statesColumnIndex = array(
// 0 indexed
'ok' => 2,
'ko' => 3,
'na' => 4,
);
$expectedValues = array(
// 0 indexed, row index => [$values]
1 => ['identifier' => 1, 'state' => 'ok'],
2 => ['identifier' => 2, 'state' => 'ok'],
3 => ['identifier' => 3, 'state' => 'ko'],
);
foreach ($expectedValues as $rowIndex => $values) {
// retrieve columns for row
$columns = $rows->eq($rowIndex)->filter('td');
// check item identifier
$identifierColumn = $columns->eq(0);
$this->assertEquals(
(string) $values['identifier'],
trim($identifierColumn->text())
);
// check state
$stateColumn = $columns->eq($statesColumnIndex[$values['state']]);
$this->assertEquals(1, $stateColumn->filter('.glyphicon-ok')->count());
}
}
}
Note that I don't use Symfony at all, but here's an answer that uses pure PHP DOM; it needs $values as an array with either 'pass' (to skip this <tr>) or an index of which column should have the glyphicon-ok class on it:
<?php
$data = <<<DATA
<table class="table table-curved">
<tr>
<th width="10%">Item</th>
<th width="60%">Description</th>
<th width="10%">YES</th>
<th width="10%">NO</th>
<th width="10%">NOT APPLICABLE</th>
</tr>
<tr>
<td class="report-table-inner report-centre">1</td>
<td class="report-table-inner">Check cargo is secure and undamaged.</td>
<td class="report-centre success"><span class="glyphicon glyphicon-ok"></span></td>
<td class="report-centre"></td>
<td class="report-centre"></td>
</tr>
<tr>
<td class="report-table-inner report-centre">2</td>
<td class="report-table-inner">Is all cargo accounted for.</td>
<td class="report-centre success"><span class="glyphicon glyphicon-ok"></span></td>
<td class="report-centre"></td>
<td class="report-centre"></td>
</tr>
<tr>
<td class="report-table-inner report-centre">3</td>
<td class="report-table-inner">Is all cargo checked by customs.</td>
<td class="report-centre"></td>
<td class="report-centre danger"><span class="glyphicon glyphicon-ok"></span></td>
<td class="report-centre"></td>
</tr>
</table>
DATA;
$dom = new DOMDocument();
$dom->loadXML($data);
$xpath = new DOMXPath($dom);
$values = ['pass', 2, 2, 3];
$idx = 0;
foreach($xpath->query('//tr') as $tr) {
if ($values[$idx] != 'pass') {
$tds = $tr->getElementsByTagName('td');
$td = $tds->item($values[$idx]);
if ($td instanceof DOMNode && $td->hasChildNodes()) {
if (FALSE !== strpos($td->firstChild->getAttribute('class'), 'glyphicon-ok')) {
echo "Matched on ", $tds->item(1)->textContent, "\n";
} else {
echo "Not matched on ", $tds->item(1)->textContent, "\n";
}
}
}
++$idx;
}
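As a variation on the same pure-DOM idea, the ticked column can also be looked up with XPath and translated back into its header label, so the result can be compared directly with an expected list such as YES, YES, NO. This is a sketch only; the shortened markup and the `$labels` and `$answers` names are assumptions.

```php
<?php
// Shortened copy of the report table (rows 1 and 3 only).
$data = <<<DATA
<table class="table table-curved">
<tr><th>Item</th><th>Description</th><th>YES</th><th>NO</th><th>NOT APPLICABLE</th></tr>
<tr><td>1</td><td>Check cargo is secure and undamaged.</td><td><span class="glyphicon glyphicon-ok"></span></td><td></td><td></td></tr>
<tr><td>3</td><td>Is all cargo checked by customs.</td><td></td><td><span class="glyphicon glyphicon-ok"></span></td><td></td></tr>
</table>
DATA;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($data);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Header labels, indexed by 0-based column position.
$labels = [];
foreach ($xpath->query('//table/tr[th]/th') as $th) {
    $labels[] = trim($th->textContent);
}

// Matches the <td> whose <span> carries the glyphicon-ok class token.
$tickQuery = 'td[span[contains(concat(" ", normalize-space(@class), " "), " glyphicon-ok ")]]';

$answers = [];
foreach ($xpath->query('//table/tr[td]') as $tr) {
    $tick = $xpath->query($tickQuery, $tr)->item(0);
    if ($tick === null) {
        $answers[] = null; // no tick found in this row
        continue;
    }
    // The number of preceding <td> siblings is the 0-based column index.
    $index = (int) $xpath->evaluate('count(preceding-sibling::td)', $tick);
    $answers[] = $labels[$index];
}

print_r($answers);
```

In a test, `$answers` could then be compared against the expected array with a single equality assertion.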
I use regex for HTML parsing but I need your help to parse the following table:
<table class="resultstable" width="100%" align="center">
<tr>
<th width="10">#</th>
<th width="10"></th>
<th width="100">External Volume</th>
</tr>
<tr class='odd'>
<td align="center">1</td>
<td align="left">
http://xyz.com
</td>
<td align="right">210,779,783<br />(939,265 / 499,584)</td>
</tr>
<tr class='even'>
<td align="center">2</td>
<td align="left">
http://abc.com
</td>
<td align="right">57,450,834<br />(288,915 / 62,935)</td>
</tr>
</table>
I want to get all domains with their volume(in array or var) for example
http://xyz.com - 210,779,783
Should I use regex or HTML dom in this case. I don't know how to parse large table, can you please help, thanks.
Here's an XPath example that happens to parse the HTML from the question.
<?php
$dom = new DOMDocument();
$dom->loadHTMLFile("./input.html");
$xpath = new DOMXPath($dom);
$trs = $xpath->query("//table[@class='resultstable'][1]/tr");
foreach ($trs as $tr) {
$tdList = $xpath->query("td[2]/a", $tr);
if ($tdList->length == 0) continue;
$name = $tdList->item(0)->nodeValue;
$tdList = $xpath->query("td[3]", $tr);
$vol = $tdList->item(0)->childNodes->item(0)->nodeValue;
echo "name: {$name}, vol: {$vol}\n";
}
?>
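The example above expects the domain inside an `<a>` element. For the markup exactly as it appears in the question, where the domain is plain text in the second cell, a sketch along these lines might work instead. It keeps only the text before the `<br />` as the volume; the `$volumes` name is an assumption.

```php
<?php
// The table as posted in the question, inlined so the example is self-contained.
$html = <<<HTML
<table class="resultstable" width="100%" align="center">
<tr><th width="10">#</th><th width="10"></th><th width="100">External Volume</th></tr>
<tr class='odd'>
<td align="center">1</td>
<td align="left">
http://xyz.com
</td>
<td align="right">210,779,783<br />(939,265 / 499,584)</td>
</tr>
<tr class='even'>
<td align="center">2</td>
<td align="left">
http://abc.com
</td>
<td align="right">57,450,834<br />(288,915 / 62,935)</td>
</tr>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$volumes = [];
foreach ($xpath->query("//table[@class='resultstable']//tr[td]") as $tr) {
    // The cell text is surrounded by newlines, so trim it.
    $domain = trim($xpath->evaluate('string(td[2])', $tr));
    // text()[1] is the first text node of the cell, i.e. the part before <br />.
    $volume = trim($xpath->evaluate('string(td[3]/text()[1])', $tr));
    $volumes[$domain] = $volume;
}

foreach ($volumes as $domain => $volume) {
    echo "{$domain} - {$volume}\n";
}
```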
I have a spinner, and when the form is submitted it should display the word "quest" as many times as the number in the spinner. E.g. if the number in the spinner is 3, then it should display "quest" 3 times in the table.
The problem is displaying it in the table.
At the moment with my current code it is displaying it like this:
quest
quest
quest
Question Id, Option Type, Duration .... These are table headings
It is displaying the word "quest" outside the table.
Instead I want the word "quest" to be displayed in the Question Id column like this:
Question Id, Option Type, Duration...
quest
quest
quest
How can I get it to display it like the example above?
Below is the code:
<table border=1 id="qandatbl" align="center">
<tr>
<th class="col1">Question No</th>
<th class="col2">Option Type</th>
<th class="col1">Duration</th>
<th class="col2">Weight(%)</th>
<th class="col1">Answer</th>
<th class="col2">Video</th>
<th class="col1">Audio</th>
<th class="col2">Image</th>
</tr>
<?php
$spinnerCount = $_POST['txtQuestion'];
if($spinnerCount > 0) {
for($i = 1; $i <= $spinnerCount; $i++) {
echo "<tr>quest";
}
}
?>
<td class='qid'></td>
<td class="options"></td>
<td class="duration"></td>
<td class="weight"></td>
<td class="answer"></td>
<td class="video"></td>
<td class="audio"></td>
<td class="image"></td>
</tr>
</table>
I did try echo "<td class='qid'></td>"; but this completely failed as well.
Try this:
<table border=1 id="qandatbl" align="center">
<tr>
<th class="col1">Question No</th>
<th class="col2">Option Type</th>
<th class="col1">Duration</th>
<th class="col2">Weight(%)</th>
<th class="col1">Answer</th>
<th class="col2">Video</th>
<th class="col1">Audio</th>
<th class="col2">Image</th>
</tr>
<?php
$spinnerCount = $_POST['txtQuestion'];
if($spinnerCount > 0) {
for($i = 1; $i <= $spinnerCount; $i++) {
?>
<tr>
<td class='qid'><?php echo 'quest'; ?></td>
<td class="options"></td>
<td class="duration"></td>
<td class="weight"></td>
<td class="answer"></td>
<td class="video"></td>
<td class="audio"></td>
<td class="image"></td>
</tr>
<?php
} // For
} // If
?>
</table>
Is this what you want to do? Display "quest" in the first column?
<table border=1 id="qandatbl" align="center">
<tr>
<th class="col1">Question No</th>
<th class="col2">Option Type</th>
<th class="col1">Duration</th>
<th class="col2">Weight(%)</th>
<th class="col1">Answer</th>
<th class="col2">Video</th>
<th class="col1">Audio</th>
<th class="col2">Image</th>
</tr>
<?php
$spinnerCount = $_POST['txtQuestion'];
if($spinnerCount > 0) {
for($i = 1; $i <= $spinnerCount; $i++) { ?>
<tr>
<td class='qid'>quest</td>
<td class="options"></td>
<td class="duration"></td>
<td class="weight"></td>
<td class="answer"></td>
<td class="video"></td>
<td class="audio"></td>
<td class="image"></td>
</tr>
<?php
}
}
?>
</table>
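The row generation can also be moved into a small function, which keeps the template short and makes it easy to validate the submitted count first. This is a sketch, not a drop-in replacement: `renderQuestionRows` is a hypothetical helper name, and the upper limit of 100 rows is an arbitrary assumption.

```php
<?php
// Hypothetical helper: builds $count table rows with "quest" in the first column.
function renderQuestionRows(int $count): string
{
    // Classes of the remaining (empty) cells, in the same order as in the question.
    $classes = ['options', 'duration', 'weight', 'answer', 'video', 'audio', 'image'];
    $rows = '';
    for ($i = 1; $i <= $count; $i++) {
        $rows .= "<tr>\n<td class=\"qid\">quest</td>\n";
        foreach ($classes as $class) {
            $rows .= "<td class=\"{$class}\"></td>\n";
        }
        $rows .= "</tr>\n";
    }
    return $rows;
}

// Missing or non-numeric input becomes 0; the cap of 100 is an assumed limit.
$spinnerCount = max(0, min(100, (int) ($_POST['txtQuestion'] ?? 0)));
echo renderQuestionRows($spinnerCount);
```

Inside the page, the `echo` line would sit between the header row and the closing `</table>` tag.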