Symfony2 functional test - check table contents - php

I have only ever used things like contains() in my assertions, so I'm not sure how I'd go about something as complex as this.
Let's say I have an array of expected answers - in this case it's YES, YES, NO.
So that means effectively, for the first and second question I'd expect to see <span class="glyphicon glyphicon-ok"></span> inside the third <td> and for the third question I'd expect to see it inside the fourth <td>.
Here is my HTML code:
<table class="table table-curved">
<tr>
<th width="10%">Item</th>
<th width="60%">Description</th>
<th width="10%">YES</th>
<th width="10%">NO</th>
<th width="10%">NOT APPLICABLE</th>
</tr>
<tr>
<td class="report-table-inner report-centre">1</td>
<td class="report-table-inner">Check cargo is secure and undamaged.</td>
<td class="report-centre success"><span class="glyphicon glyphicon-ok"></span></td>
<td class="report-centre"></td>
<td class="report-centre"></td>
</tr>
<tr>
<td class="report-table-inner report-centre">2</td>
<td class="report-table-inner">Is all cargo accounted for.</td>
<td class="report-centre success"><span class="glyphicon glyphicon-ok"></span></td>
<td class="report-centre"></td>
<td class="report-centre"></td>
</tr>
<tr>
<td class="report-table-inner report-centre">3</td>
<td class="report-table-inner">Is all cargo checked by customs.</td>
<td class="report-centre"></td>
<td class="report-centre danger"><span class="glyphicon glyphicon-ok"></span></td>
<td class="report-centre"></td>
</tr>
...
How should I go about writing a test for this? Is it hard to iterate through the <tr>s programmatically?
Thank you

I think you should look at the documentation pages on testing and on the DomCrawler component. They provide very simple methods for filtering HTML or XML content.

References:
Testing: http://symfony.com/doc/current/book/testing.html#your-first-functional-test
The DomCrawler Component: http://symfony.com/doc/current/components/dom_crawler.html#node-traversing
<?php
use Symfony\Bundle\FrameworkBundle\Test\WebTestCase;
class PageTest extends WebTestCase
{
public function testPage()
{
// create a client to get the content of the page
$client = static::createClient();
$crawler = $client->request('GET', '/page');
// retrieve table rows
$rows = $crawler->filter('.table-curved tr');
$statesColumnIndex = array(
// 0 indexed
'ok' => 2,
'ko' => 3,
'na' => 4,
);
$expectedValues = array(
// 0 indexed, row index => [$values]
1 => ['identifier' => 1, 'state' => 'ok'],
2 => ['identifier' => 2, 'state' => 'ok'],
3 => ['identifier' => 3, 'state' => 'ko'],
);
foreach ($expectedValues as $rowIndex => $values) {
// retrieve columns for row
$columns = $rows->eq($rowIndex)->filter('td');
// check item identifier
$identifierColumn = $columns->eq(0);
$this->assertEquals(
(string) $values['identifier'],
trim($identifierColumn->text())
);
// check state
$stateColumn = $columns->eq($statesColumnIndex[$values['state']]);
$this->assertEquals(1, $stateColumn->filter('.glyphicon-ok')->count());
}
}
}

Note that I don't use Symfony at all, but here's an answer that uses pure PHP DOM. It needs $values as an array with one entry per <tr>: either 'pass' (to skip that <tr>) or the 0-based index of the <td> that should contain the glyphicon-ok span:
<?php
$data = <<<DATA
<table class="table table-curved">
<tr>
<th width="10%">Item</th>
<th width="60%">Description</th>
<th width="10%">YES</th>
<th width="10%">NO</th>
<th width="10%">NOT APPLICABLE</th>
</tr>
<tr>
<td class="report-table-inner report-centre">1</td>
<td class="report-table-inner">Check cargo is secure and undamaged.</td>
<td class="report-centre success"><span class="glyphicon glyphicon-ok"></span></td>
<td class="report-centre"></td>
<td class="report-centre"></td>
</tr>
<tr>
<td class="report-table-inner report-centre">2</td>
<td class="report-table-inner">Is all cargo accounted for.</td>
<td class="report-centre success"><span class="glyphicon glyphicon-ok"></span></td>
<td class="report-centre"></td>
<td class="report-centre"></td>
</tr>
<tr>
<td class="report-table-inner report-centre">3</td>
<td class="report-table-inner">Is all cargo checked by customs.</td>
<td class="report-centre"></td>
<td class="report-centre danger"><span class="glyphicon glyphicon-ok"></span></td>
<td class="report-centre"></td>
</tr>
</table>
DATA;
$dom = new DOMDocument();
$dom->loadXML($data);
$xpath = new DOMXPath($dom);
$values = ['pass', 2, 2, 3];
$idx = 0;
foreach($xpath->query('//tr') as $tr) {
if ($values[$idx] !== 'pass') { // strict comparison, so an index of 0 is never mistaken for 'pass'
$tds = $tr->getElementsByTagName('td');
$td = $tds->item($values[$idx]);
if ($td instanceof DOMNode && $td->hasChildNodes()) {
if (FALSE !== strpos($td->firstChild->getAttribute('class'), 'glyphicon-ok')) {
echo "Matched on ", $tds->item(1)->textContent, "\n";
} else {
echo "Not matched on ", $tds->item(1)->textContent, "\n";
}
}
}
++$idx;
}
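If you would rather compare the whole table against an expected array in one assertion, the same idea can be wrapped in a function. This is only a sketch: tickedColumns() is a name I made up, it uses loadHTML() instead of loadXML() so that markup which is not well-formed XML is tolerated, and it assumes the tick is always a <span> carrying the glyphicon-ok class.

```php
<?php
// Hypothetical helper (not part of Symfony or PHPUnit): for every row that
// has <td> cells, return the 0-based index of the cell containing a
// glyphicon-ok span, or null when no cell in that row is ticked.
function tickedColumns(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $tick = './/span[contains(concat(" ", normalize-space(@class), " "), " glyphicon-ok ")]';

    $result = [];
    // tr[td] skips the header row, which only contains <th> cells.
    foreach ($xpath->query('//tr[td]') as $tr) {
        $ticked = null;
        $index = 0;
        foreach ($xpath->query('td', $tr) as $td) {
            if ($xpath->query($tick, $td)->length > 0) {
                $ticked = $index;
                break;
            }
            $index++;
        }
        $result[] = $ticked;
    }

    return $result;
}
```

Inside a PHPUnit test you could then write something like $this->assertSame([2, 2, 3], tickedColumns($html)); where $html is the response body, 2 is the YES column and 3 is the NO column.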


Extracting Site data through Web Crawler outputs error due to mis-match of Array Index

I have been trying to extract the text of a table, along with its links, from site1.com into my PHP page using a web crawler.
Unfortunately, because of an incorrect array index in the PHP code, the output is an error.
The table on site1.com:
<table border="0" cellpadding="0" cellspacing="0" width="100%" class="Table2">
<tbody><tr>
<td width="1%" valign="top" class="Title2"> </td>
<td width="65%" valign="top" class="Title2">Subject</td>
<td width="1%" valign="top" class="Title2"> </td>
<td width="14%" valign="top" align="Center" class="Title2">Last Update</td>
<td width="1%" valign="top" class="Title2"> </td>
<td width="8%" valign="top" align="Center" class="Title2">Replies</td>
<td width="1%" valign="top" class="Title2"> </td>
<td width="9%" valign="top" align="Center" class="Title2">Views</td>
</tr>
<tr>
<td width="1%" height="25"> </td>
<td width="64%" height="25" class="FootNotes2">Serious dedicated study partner for U World - step12013</td>
<td width="1%" height="25"> </td>
<td width="14%" height="25" class="FootNotes2" align="center">02/11/17 01:50</td>
<td width="1%" height="25"> </td>
<td width="8%" height="25" align="Center" class="FootNotes2">10</td>
<td width="1%" height="25"> </td>
<td width="9%" height="25" align="Center" class="FootNotes2">318</td>
</tr>
</tbody>
</table>
The PHP web crawler:
<?php
function get_data($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL,$url);
$result=curl_exec($ch);
curl_close($ch);
return $result;
}
$returned_content = get_data('http://www.usmleforum.com/forum/index.php?forum=1');
$first_step = explode( '<table class="Table2">' , $returned_content );
$second_step = explode('</table>', $first_step[0]);
$third_step = explode('<tr>', $second_step[1]);
// print_r($third_step);
foreach ($third_step as $key=>$element) {
$child_first = explode( '<td class="FootNotes2"' , $element );
$child_second = explode( '</td>' , $child_first[1] );
$child_third = explode( '<a href=' , $child_second[0] );
$child_fourth = explode( '</a>' , $child_third[0] );
$final = "<a href=".$child_fourth[0]."</a></br>";
?>
<li target="_blank" class="itemtitle">
<?php echo $final?>
</li>
<?php
if($key==10){
break;
}
}
?>
I suspect the array indexes in the PHP code above are the culprit.
If so, can someone please explain how to make this work?
My final requirement for this code is to get the text of the second cell above together with the link associated with it.
Any help is appreciated.
Instead of writing your own parser solution you could use an existing one like Symfony's DomCrawler component: http://symfony.com/doc/current/components/dom_crawler.html
use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler($returned_content);
$linkTexts = $crawler->filterXPath('//a')->each(function (Crawler $node, $i) {
return $node->text();
});
Or if you want to traverse the DOM tree yourself you can use DOMDocument's loadHTML
http://php.net/manual/en/domdocument.loadhtml.php
$document = new DOMDocument();
$document->loadHTML($returned_content);
foreach ($document->getElementsByTagName('a') as $link) {
$text = $link->nodeValue;
}
EDIT:
The following code gets the links you want. It assumes you have a $returned_content variable holding the HTML you want to parse.
// creating a new instance of DOMDocument (DOM = Document Object Model)
$domDocument = new DOMDocument();
// save previous libxml error reporting and set error reporting to internal
// to be able to parse not well formed HTML doc
$previousErrorReporting = libxml_use_internal_errors(true);
$domDocument->loadHTML($returned_content);
libxml_use_internal_errors($previousErrorReporting);
$links = [];
/** @var DOMElement $node */
// getting all <a> element from the HTML
foreach ($domDocument->getElementsByTagName('a') as $node) {
$parentNode = $node->parentNode;
// checking if the <a> is under a <td> that has class="FootNotes2"
$isChildOfAFootNotesTd = $parentNode->nodeName === 'td' && $parentNode->getAttribute('class') === 'FootNotes2';
// checking if the <a> has class="Links2"
$isLinkOfLink2Class = $node->getAttribute('class') == 'Links2';
// as I assumed you wanted links from the <td> this check makes sure that both of the above conditions are fulfilled
if ($isChildOfAFootNotesTd && $isLinkOfLink2Class) {
$links[] = [
'href' => $node->getAttribute('href'),
'text' => $parentNode->textContent,
];
}
}
print_r($links);
This produces an array similar to:
Array
(
[0] => Array
(
[href] => /files/forum/2017/1/837242.php
[text] => Q#Q Drill Time ① - cardio69
)
[1] => Array
(
[href] => /files/forum/2017/1/837356.php
[text] => study partner in Houston - lacy
)
[2] => Array
(
[href] => /files/forum/2017/1/837110.php
[text] => Serious dedicated study partner for U World - step12013
)
...
Using the Simple HTML DOM Parser library, you can use the following code:
<?php
require('simple_html_dom.php'); // you might need to change this, depending on where you saved the library file.
$html = file_get_html('http://www.usmleforum.com/forum/index.php?forum=1');
foreach($html->find('td.FootNotes2 a') as $element) { // find all <a>-elements inside a <td class="FootNotes2">-element
$element->href = "http://www.usmleforum.com" . $element->href; // you can also access only certain attributes of the elements (e.g. the url).
echo $element.'<br>'; // do something with the elements.
}
?>
I tried the same code on another site, and it works.
Please take a look at it:
<?php
function get_data($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL,$url);
$result=curl_exec($ch);
curl_close($ch);
return $result;
}
$returned_content = get_data('http://www.usmle-forums.com/usmle-step-1-forum/');
$first_step = explode( '<tbody id="threadbits_forum_26">' , $returned_content );
$second_step = explode('</tbody>', $first_step[1]);
$third_step = explode('<tr>', $second_step[0]);
// print_r($third_step);
foreach ($third_step as $element) {
$child_first = explode( '<td class="alt1"' , $element );
$child_second = explode( '</td>' , $child_first[1] );
$child_third = explode( '<a href=' , $child_second[0] );
$child_fourth = explode( '</a>' , $child_third[1] );
echo $final = "<a href=".$child_fourth[0]."</a></br>";
}
?>
I know it's too much to ask, but can you please combine these two into code that makes the crawler work?
@jkmak
Chopping at HTML with string functions or regex is not a reliable method. DOMDocument and XPath do a nice job.
Code:
$dom=new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->evaluate("//td[@class = 'FootNotes2']/a") as $node) { // target a tags that have <td class="FootNotes2"> as parent
$result[]=['href' => $node->getAttribute('href'), 'text' => $node->nodeValue]; // extract/store the href and text values
if (sizeof($result) == 10) { break; } // set a limit of 10 rows of data
}
if (isset($result)) {
echo "<ul>\n";
foreach ($result as $data) {
echo "\t<li class=\"itemtitle\">{$data['text']}</li>\n";
}
echo "</ul>";
}
Sample Input:
$html = <<<HTML
<table border="0" cellpadding="0" cellspacing="0" width="100%" class="Table2">
<tbody><tr>
<td width="1%" valign="top" class="Title2"> </td>
<td width="65%" valign="top" class="Title2">Subject</td>
<td width="1%" valign="top" class="Title2"> </td>
<td width="14%" valign="top" align="Center" class="Title2">Last Update</td>
<td width="1%" valign="top" class="Title2"> </td>
<td width="8%" valign="top" align="Center" class="Title2">Replies</td>
<td width="1%" valign="top" class="Title2"> </td>
<td width="9%" valign="top" align="Center" class="Title2">Views</td>
</tr>
<tr>
<td width="1%" height="25"> </td>
<td width="64%" height="25" class="FootNotes2">Serious dedicated study partner for U World - step12013</td>
<td width="1%" height="25"> </td>
<td width="14%" height="25" class="FootNotes2" align="center">02/11/17 01:50</td>
<td width="1%" height="25"> </td>
<td width="8%" height="25" align="Center" class="FootNotes2">10</td>
<td width="1%" height="25"> </td>
<td width="9%" height="25" align="Center" class="FootNotes2">318</td>
</tr>
<tr>
<td width="1%" height="25"> </td>
<td width="64%" height="25" class="FootNotes2">some text - step12013</td>
<td width="1%" height="25"> </td>
<td width="14%" height="25" class="FootNotes2" align="center">02/11/17 01:50</td>
<td width="1%" height="25"> </td>
<td width="8%" height="25" align="Center" class="FootNotes2">10</td>
<td width="1%" height="25"> </td>
<td width="9%" height="25" align="Center" class="FootNotes2">318</td>
</tr>
</tbody>
</table>
HTML;
Output:
<ul>
<li class="itemtitle">Serious dedicated study partner for U World</li>
<li class="itemtitle">some text</li>
</ul>
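Since the question asks for the text together with its link, here is a hedged variation on the code above that keeps the href in the output. The function names footnoteLinks() and renderLinks() are made up for this sketch, and it assumes the hrefs are site-relative paths (as in the print_r output shown earlier) that need the http://www.usmleforum.com prefix.

```php
<?php
// Hypothetical helper: collect up to $limit links that sit directly inside
// a <td class="FootNotes2"> cell.
function footnoteLinks(string $html, int $limit = 10): array
{
    $dom = new DOMDocument();
    // Real-world pages are rarely valid, so keep libxml warnings internal.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query("//td[@class='FootNotes2']/a[@href]") as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
        if (count($links) >= $limit) {
            break;
        }
    }

    return $links;
}

// Hypothetical helper: render the links as a list, escaping both values.
function renderLinks(array $links, string $baseUrl): string
{
    $items = '';
    foreach ($links as $link) {
        $items .= sprintf(
            "<li class=\"itemtitle\"><a href=\"%s\" target=\"_blank\">%s</a></li>\n",
            htmlspecialchars($baseUrl . $link['href'], ENT_QUOTES),
            htmlspecialchars($link['text'], ENT_QUOTES)
        );
    }

    return "<ul>\n" . $items . "</ul>";
}
```

Usage would be along the lines of echo renderLinks(footnoteLinks($returned_content), 'http://www.usmleforum.com');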

PHP XPath to parse table

Firstly here is my table HTML:
<table class="xyz">
<caption>Outcomes</caption>
<thead>
<tr class="head">
<th title="a" class="left" nowrap="nowrap">A1</th>
<th title="a" class="left" nowrap="nowrap">A2</th>
<th title="result" class="left" nowrap="nowrap">Result</th>
<th title="margin" class="left" nowrap="nowrap">Margin</th>
<th title="area" class="left" nowrap="nowrap">Area</th>
<th title="date" nowrap="nowrap">Date</th>
<th title="link" nowrap="nowrap">Link</th>
</tr>
</thead>
<tbody>
<tr class="data1">
<td class="left" nowrap="nowrap">56546</td>
<td class="left" nowrap="nowrap">75666</td>
<td class="left" nowrap="nowrap">Lower</td>
<td class="left" nowrap="nowrap">High</td>
<td class="left">Area 3</td>
<td nowrap="nowrap">Jan 2 2016</td>
<td nowrap="nowrap">http://localhost/545436</td>
</tr>
<tr class="data1">
<td class="left" nowrap="nowrap">55546</td>
<td class="left" nowrap="nowrap">71666</td>
<td class="left" nowrap="nowrap">Lower</td>
<td class="left" nowrap="nowrap">High</td>
<td class="left">Area 4</td>
<td nowrap="nowrap">Jan 3 2016</td>
<td nowrap="nowrap">http://localhost/545437</td>
</tr>
...
And there are many more <tr> after that.
I am using this PHP code:
$html = file_get_contents('http://localhost/outcomes');
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);
$xpath->registerNamespace('', 'http://www.w3.org/1999/xhtml');
$elements = $xpath->query("//table[@class='xyz']");
How can I, now that I have the table as the first element in $elements, get the values of each <td>?
Ideally I want to get arrays like:
array(56546, 75666, 'Lower', 'High', 'Area 3', 'Jan 2 2016', 'http://localhost/545436'),
array(55546, 71666, 'Lower', 'High', 'Area 4', 'Jan 3 2016', 'http://localhost/545437'),
...
But I'm not sure how I can dig that deeply into the table code.
Thank you for any advice.
First, get all the table rows in the <tbody>
$rows = $xpath->query('//table[@class="xyz"]/tbody/tr');
Then, you can iterate over that collection and query for each <td>
foreach ($rows as $row) {
$cells = $row->getElementsByTagName('td');
// alt $cells = $xpath->query('td', $row)
$cellData = [];
foreach ($cells as $cell) {
$cellData[] = $cell->nodeValue;
}
var_dump($cellData);
}
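Put together, the two steps above could look roughly like this self-contained sketch. The function name extractRows() is made up, the table class is hard-coded to xyz as in the question, and every cell comes back as a trimmed string (so 56546 is '56546' unless you cast it yourself).

```php
<?php
// Hypothetical helper: return one array of trimmed cell texts per <tr>
// found in the <tbody> of <table class="xyz">.
function extractRows(string $html): array
{
    $document = new DOMDocument();
    // Keep libxml warnings about imperfect HTML internal.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $rows = [];
    foreach ($xpath->query('//table[@class="xyz"]/tbody/tr') as $tr) {
        $cells = [];
        // The second argument makes the query relative to the current row.
        foreach ($xpath->query('td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }

    return $rows;
}
```

Calling extractRows(file_get_contents('http://localhost/outcomes')) would then give one array per data row, while the <thead> row is left out because only tbody/tr is queried.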

redirect to error page in yii2

I'm still new to Yii2.
I made a simple project using MVC in Yii2 that outputs examination results.
What is bugging me is how to redirect to the main page when no data is found in the database.
Also, I get "Undefined offset: 0", which, according to what I found on Google, means an array mismatch and that the data is not NULL.
Anyhow, here is the code:
controller : StudentController.php
public function actionCall()
{
$result = $_POST['semester'];
$result_explode = explode('|', $result);
$sem = $result_explode[0];
$tahun = $result_explode[0]." ".$result_explode[1];
$send = array(
'id' => $_POST['id'],
'semester' => $sem ,
'tahun' => $tahun);
$model = new Student();
if(!$data['result']= $model->getDetails($send))
{
return $this->render('detail', $data);
}
else
{
return $this->render('detail');
}
}
public function actionSearch()
{
return $this->render('searchstudent2');
}
model : Student.php
public function getDetails($send)
{
$student = student::find()
->select('s.student_name,al.level_matric_no,al.level_semester,al.level_id,s.student_mykad, s.student_address,s.student_postcode,
s.student_state,ss.subject_code,ss.subject_name,ss.subject_credit_hour,c.course_name,st.taken_session,
g.Grade_symbol,g.Grade_value,sr.semester_gpa,sr.semester_cgpa,sr.total_point,
sr.total_credit, sr.semester_count')
->from('student AS s')
->leftJoin('a_level AS al', '`s`.`student_id` = `al`.`student_id`')
->leftJoin('subject_taken AS st', '`al`.`level_id` = `st`.`level_id`')
->leftJoin('semester_result AS sr', '`al`.`level_id` = `sr`.`level_id`')
->leftJoin('grade AS g', '`g`.`grade_id` = `st`.`grade_id`')
->leftJoin('course AS c', '`al`.`level_course_offered` = `c`.`course_id`')
->leftJoin('subject AS ss', '`ss`.`subject_id` = `st`.`subject_id`')
->where(['al.level_id'=>$send['id']])
->andWhere(['sr.semester_count'=>$send['semester']])
->andWhere(['st.taken_session'=>$send['tahun']])
->asArray()
->all();
return $student;
}
view : searchstudent2.php
<div class="container">
<div class="row">
<h2>Stylish Search Box</h2>
<div id="custom-search-input">
<?php $form = ActiveForm::begin(['action' => Url::to(['student/call']),'options' => ['method' => 'post']]) ?>
<div class="input-group col-md-12">
<select name="semester">
<option value="1|2012/2013">sem1</option>
<option value="2|2013/2014">sem2</option>
<option value="3|2014/2015">sem3</option>
<option value="4|2015/2016">sem4</option>
</select>
<br><br>
<input type="text" name="id" class="search-query form-control" placeholder="Search" />
<span class="input-group-btn">
<button class="btn btn-danger" type="submit">
<span class=" glyphicon glyphicon-search"></span>
</button>
</span>
</div>
<?php ActiveForm::end() ?>
</div>
</div>
detail.php
table 1 in detail.php
<table style="width:100%">
<tr>
<th class="tg-yw4l">Nama</th>
<th class="tg-baqh" colspan="7"><?php echo $result[0]['student_name']; ?></th>
</tr>
<tr>
<td class="tg-yw4l">Alamat</td>
<td class="tg-baqh"><?php echo $result[0]['student_address']; echo " ". $result[0]['student_postcode']; echo " ".$result[0]['student_state'];?></td>
<td class="tg-yw4l" rowspan="4"></td>
<td class="tg-yw4l">Kemasukan</td>
<td class="tg-baqh"></td>
<td class="tg-yw4l" rowspan="4"></td>
<td class="tg-yw4l" colspan="2"></td>
</tr>
<tr>
<td class="tg-yw4l">No. KP</td>
<td class="tg-baqh"><?php echo $result[0]['student_mykad']; ?></td>
<td class="tg-yw4l">Sesi</td>
<td class="tg-baqh"><?php echo $result[0]['taken_session']; ?></td>
<td class="tg-yw4l">Tahun Akademik</td>
<td class="tg-yw4l"><?php echo $result[0]['taken_session']; ?></td>
</tr>
<tr>
<td class="tg-yw4l">No. Matrik</td>
<td class="tg-baqh"><?php echo $result[0]['level_matric_no']; ?></td>
<td class="tg-yw4l">Fakulti</td>
<td class="tg-baqh"></td>
<td class="tg-yw4l" colspan="2" rowspan="2"></td>
</tr>
<tr>
<td class="tg-yw4l">Program</td>
<td class="tg-baqh"><?php echo $result[0]['course_name']; ?></td>
<td class="tg-yw4l">Semester</td>
<td class="tg-baqh"></td>
</tr>
<tr>
<td class="tg-yw4l">Pinjaman</td>
<td class="tg-baqh" colspan="7"></td>
</tr>
table 2 in detail.php
<tr>
<th class="tg-031e">BIL</th>
<th class="tg-031e">KOD</th>
<th class="tg-031e">SUBJEK</th>
<th class="tg-yw4l">KREDIT</th>
<th class="tg-yw4l">GRED</th>
<th class="tg-yw4l">MATA</th>
<!-- <th class="tg-yw4l">GPA/CGPA</th> -->
</tr>
<?php
$bil=0;
foreach ($result as $details) {
$bil++;
?>
<tr>
<td class="tg-031e"><?=$bil?></td>
<td class="tg-031e"><?php echo $details['subject_code']; ?></td>
<td class="tg-031e"><?php echo $details['subject_name']; ?></td>
<td class="tg-yw4l"><?php echo $details['subject_credit_hour']; ?></td>
<td class="tg-yw4l"><?php echo $details['Grade_symbol']; ?></td>
<td class="tg-yw4l"><?php echo $details['Grade_value']; ?></td>
</tr>
<?php } ?>
<tr>
<td class="tg-031e" colspan="2"></td>
<td class="tg-031e">TOTAL KREDIT</td>
<td class="tg-yw4l"><?php echo $details['total_point']; ?></td>
<td class="tg-yw4l">JUMLAH JAM KREDIT</td>
<td class="tg-yw4l" colspan="3"><?php echo $details['total_credit']; ?></td>
</tr>
Try the following steps one by one:
1. Check $_POST['semester'] and make sure it is not empty.
2. Check whether $model->getDetails($send) returns any result.
Note: find()->...->asArray()->all() returns an empty array if no result is found, so check for an empty result before rendering the view.
if( !empty($data=$model->getDetails($send)) )
{
return $this->render('detail', ['result' => $data]); // detail.php reads $result
}
else
{
return $this->render('_another_view');
// or redirect to some page or do whatever
}
Note: if you pass an empty $data array to the view and then read $data[0]['anything'], you will get an "Undefined offset: 0" error.
So if the result is empty, render another view. In your case you render the same 'detail' view both with and without $data, so there is a high chance of getting the "Undefined offset: 0" error.
Thanks
If getDetails() returns no data, redirect to the main page:
if ( ($data = $model->getDetails($send)) != null ) {
return $this->render('detail', ['result' => $data]);
} else {
// redirect no data
return $this->redirect(['site/index']);
}
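For the first check suggested above (making sure $_POST['semester'] is usable before building $send), a small framework-independent helper might look like this. It is only a sketch: parseSemester() and the 'session' key are names I invented, and it assumes the value always has the "number|session" shape used in the form's <option> values.

```php
<?php
// Hypothetical helper: split a value such as "1|2012/2013" into its two
// parts, returning null when the value is missing or malformed.
function parseSemester(?string $raw): ?array
{
    if ($raw === null || $raw === '') {
        return null;
    }

    // Limit of 2 keeps everything after the first "|" in the second part.
    $parts = explode('|', $raw, 2);
    if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
        return null;
    }

    return ['semester' => $parts[0], 'session' => $parts[1]];
}
```

In the action you could call it on the posted value and redirect straight away when it returns null, before the model is queried at all.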

Fetching td data from a table using regexp, table is in a variable [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 8 years ago.
I have the HTML of a webpage in a PHP string that contains, among other things, a table. The table has many rows, and I want to extract information from two kinds of row: those with class "day" and those with class "odd". Each kind is repeated multiple times.
I want to get the information into arrays (or maybe an associative array), something like:
$array1 = array(17=>'', 18=>'', 19=>150, 20=>145, 21=>175)
$array2 = array(17=>'', 18=>'', 19=>90, 20=>75, 21=>120)
$array3 = array(...)
$array4 = array(...)
I was wondering how I can achieve that. Can anyone help with a suggestion?
Thanks a lot.
<table class="matrix">
<tr ...></tr>
<tr ...></tr>
<tr class="day">
<td class="fill_row" colspan="2"></td>
<td class="past">17</td>
<td class="past">18</td>
<td>19</td>
<td>20</td>
<td>21</td>
</tr>
<tr ...></tr>
<tr ...></tr>
<tr class="odd">
<td class="fill_row" colspan="2"></td>
<td class="past"> </td>
<td class="past"> </td>
<td>150</td>
<td>145</td>
<td>175</td>
</tr>
<tr ...></tr>
<tr ...></tr>
<tr class="day">
<td class="fill_row" colspan="2"></td>
<td class="past">17</td>
<td class="past">18</td>
<td>19</td>
<td>20</td>
<td>21</td>
</tr>
<tr ...></tr>
<tr ...></tr>
<tr class="odd">
<td class="fill_row" colspan="2"></td>
<td class="past"> </td>
<td class="past"> </td>
<td>90</td>
<td>75</td>
<td>120</td>
</tr>
</table>
It's not perfect, but it works ^^
<?php
$strRegEx = '#<tr class="(day|odd)">.{1,10}<td .{28}>([0-9]*)</td>.{1,10}<td .{12}>(.{1,5})</td>.{1,10}<td .{12}>(.{1,5})</td>.{1,10}<td>([0-9]*)</td>.{1,10}<td>([0-9]*)</td>.{1,10}<td>([0-9]*)</td>.{1,10}</tr>#s';
$regEx = preg_match_all($strRegEx , $strYourHtml, $arrTable);
if ($regEx) {
$arrResults = array();
foreach($arrTable[1] as $strKey => $arrResult){
$arrResults[$strKey]["name"] = $arrResult;
$arrResults[$strKey]["value_1"] = $arrTable[2][$strKey];
$arrResults[$strKey]["value_2"] = $arrTable[3][$strKey];
$arrResults[$strKey]["value_3"] = $arrTable[4][$strKey];
$arrResults[$strKey]["value_4"] = $arrTable[5][$strKey];
$arrResults[$strKey]["value_5"] = $arrTable[6][$strKey];
$arrResults[$strKey]["value_6"] = $arrTable[7][$strKey];
}
} else {
$arrResults = false;
}
print_r($arrResults);
?>
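For comparison, here is a DOM-based sketch that builds the day => value arrays asked for in the question without depending on exact attribute lengths. It swaps the regex for DOMDocument and DOMXPath, the function name dayTables() is made up, and it assumes each "odd" row belongs to the closest preceding "day" row and that the blank cells hold a non-breaking space.

```php
<?php
// Hypothetical helper: pair each <tr class="day"> with the next
// <tr class="odd"> and return one day => value array per pair.
function dayTables(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($dom);

    // Texts of all cells in a row except the leading fill_row cell.
    $cellTexts = function (DOMNode $tr) use ($xpath): array {
        $texts = [];
        foreach ($xpath->query('td[not(@class="fill_row")]', $tr) as $td) {
            // Turn non-breaking spaces into plain spaces so trim() removes them.
            $texts[] = trim(str_replace("\u{00A0}", ' ', $td->textContent));
        }
        return $texts;
    };

    $tables = [];
    $days = null;
    $query = '//table[@class="matrix"]//tr[@class="day" or @class="odd"]';
    foreach ($xpath->query($query) as $tr) {
        if ($tr->getAttribute('class') === 'day') {
            $days = $cellTexts($tr);
        } elseif ($days !== null) {
            $values = $cellTexts($tr);
            // Only combine when both rows have the same number of cells.
            if (count($values) === count($days)) {
                $tables[] = array_combine($days, $values);
            }
            $days = null;
        }
    }

    return $tables;
}
```

Note that the values come back as strings ('150' rather than 150) and the blank cells as empty strings; cast them if you need integers.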

Parsing Wikipedia Page tables issue

Hi, I'm trying to parse a Wikipedia page that contains a table with the class "infobox biota", structured as shown below. I'm trying to get the table data and classes for the following characteristics:
Kingdom:
Phylum:
Subphylum:
Class:
Order:
Family:
<table class="infobox biota" style="text-align: left; width: 200px; font-size: 100%">
<tbody><tr>
<th colspan="2" style="text-align: center; background-color: rgb(211,211,164)">Rabbit</th>
</tr>
<tr>
<td colspan="2" style="text-align: center"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/3/3b/Rabbit_in_montana.jpg/250px-Rabbit_in_montana.jpg" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/3/3b/Rabbit_in_montana.jpg/375px-Rabbit_in_montana.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/3/3b/Rabbit_in_montana.jpg/500px-Rabbit_in_montana.jpg 2x" height="222" width="250"></td>
</tr>
<tr>
<th colspan="2" style="text-align: center; background-color: rgb(211,211,164)">Scientific classification</th>
</tr>
<tr>
<td>Kingdom:</td>
<td><span class="kingdom" style="white-space:nowrap;">Animalia</span></td>
</tr>
<tr>
<td>Phylum:</td>
<td><span class="phylum" style="white-space:nowrap;">Chordata</span></td>
</tr>
<tr>
<td>Subphylum:</td>
<td><span class="subphylum" style="white-space:nowrap;">Vertebrata</span></td>
</tr>
<tr>
<td>Class:</td>
<td><span class="class" style="white-space:nowrap;">Mammalia</span></td>
</tr>
<tr>
<td>Order:</td>
<td><span class="order" style="white-space:nowrap;">Lagomorpha</span></td>
</tr>
<tr>
<td>Family:</td>
<td><span class="family" style="white-space:nowrap;">Leporidae<br>
<small>in part</small></span></td>
</tr>
<tr>
<th colspan="2" style="text-align: center; background-color: rgb(211,211,164)">Genera</th>
</tr>
<tr>
<td colspan="2" style="text-align: left">
<div>
<table style="background-color:transparent;table-layout:fixed;" border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody><tr valign="top">
<td>
<div style="margin-right:20px;">
<p><i>Pentalagus</i><br>
<i>Bunolagus</i><br>
<i>Nesolagus</i><br>
<i>Romerolagus</i></p>
</div>
</td>
<td>
<div style="margin-right: 20px;">
<p><i>Brachylagus</i><br>
<i>Sylvilagus</i><br>
<i>Oryctolagus</i><br>
<i>Poelagus</i></p>
</div>
</td>
</tr>
</tbody></table>
</div>
</td>
</tr>
</tbody></table>
Here is my attempt to parse the table and obtain the kingdom, phylum, subphylum, class, order and family of a rabbit. However, I get the following: Array ( [Kingdom:] => [Phylum:] => [Subphylum:] => [Class:] => [Order:] => [Family:] => [
Pentalagus
Bunolagus
Nesolagus
Romerolagus
] => )
It doesn't fill the array with the data for the rabbit. It also gives me a parse error on the line marked below. What could be wrong?
<?php
//require"mydb.php";
header('Content-type: text/html; charset=utf-8'); // this just makes sure encoding is right
include('simple_html_dom.php'); // the parser library
$html = file_get_html('http://en.wikipedia.org/wiki/Rabbit');
$table = $html->find('table.infobox');
$data = array();
foreach($table[0]->find('tr') as $row)
{
$td = $row->find('> td');
if (count($td) == 2)
{
$name = $td[0]->innertext;
$text = $td[1]->find('a')[0]->innertext; // PARSE ERROR IS GIVEN HERE, after find('a')[0]; removing the [0] removes the error but gives me no results
$data[$name] = $text;
}
}
print_r($data);
?>
$text = $td[1]->find('a')[0]->innertext;
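In this line you are indexing the array returned by a function call directly (function array dereferencing). That syntax is only available in PHP 5.4 or later. Try this instead:
$links = $td[1]->find('a'); // store the result first, then index it
$text = $links[0]->innertext;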
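As for the empty values: in the table as posted, the values sit inside <span> elements and the nested "Genera" table also has rows with two cells, which would explain the stray Pentalagus entry. A hedged alternative that uses PHP's built-in DOM extension instead of simple_html_dom could look like this. The function name infoboxFields() is made up, and it assumes every wanted row has exactly two <td> cells with a <span> directly inside the second one.

```php
<?php
// Hypothetical helper: map each infobox label (without its trailing colon)
// to the text of the neighbouring value cell.
function infoboxFields(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($dom);

    // Collapse runs of whitespace (including line breaks) into one space.
    $clean = function (string $text): string {
        return trim(preg_replace('/\s+/u', ' ', $text));
    };

    // Rows with two cells whose second cell holds a <span>; this skips the
    // nested table, whose cells contain <div> elements instead.
    $query = '//table[contains(concat(" ", normalize-space(@class), " "), " infobox ")]'
        . '//tr[count(td) = 2 and td[2]/span]';

    $data = [];
    foreach ($xpath->query($query) as $tr) {
        $cells = $xpath->query('td', $tr);
        $label = rtrim($clean($cells->item(0)->textContent), ':');
        $data[$label] = $clean($cells->item(1)->textContent);
    }

    return $data;
}
```

The live Wikipedia markup may differ from the snippet in the question (for example, values wrapped in links), so treat the XPath expression as something to adjust rather than as a fixed recipe.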
