I am getting a PHP notice when using Simple HTML DOM to scrape a website. Two notices are displayed, but everything rendered underneath looks correct when I display it with print_r().
The website table structure is as follows:
<table class=data schedTbl>
<thead>
<tr>
<th>DATA</th>
<th>DATA</th>
<th>DATA</th>
etc....
</tr>
</thead>
<tbody>
<tr>
<td>
<div class="class1">DATA</div>
<div class="class2">SAME DATA AS PREVIOUS DIV</div>
</td>
<td>DATA</td>
<td>DATA</td>
etc....
</tr>
<tr>
<td>
<div class="class1">DATA</div>
<div class="class2">SAME DATA AS PREVIOUS DIV</div>
</td>
<td>DATA</td>
<td>DATA</td>
etc....
</tr>
<tr>
<td>
<div class="class1">DATA</div>
<div class="class2">SAME DATA AS PREVIOUS DIV</div>
</td>
<td>DATA</td>
<td>DATA</td>
etc....
</tr>
etc....
</tbody>
</table>
The code below is used to find all tr in table[class=data schedTbl]. I have a tbody selector in there, but it seems to pay no attention to this selector as it still selects the tr in the thead.
include('simple_html_dom.php');
$articles = array();
getArticles('www.somesite.com');
function getArticles($page) {
global $articles;
$html = new simple_html_dom();
$html->load_file($page);
$items = $html->find('table[class=data schedTbl] tbody tr');
foreach($items as $post) {
$articles[] = array($post->children(0)->first_child(0)->plaintext,//0 -- GAME DATE
$post->children(1)->plaintext,//1 -- AWAY TEAM
$post->children(2)->plaintext);//2 -- HOME TEAM
}
}
I believe the notices come from the tr in the thead, because I am calling first_child() on the first cell of each row, and in the header row that cell contains only text. There are two notices because the page actually contains two tables with the same structure.
I see two ways of solving this:
1) PROBABLY THE EASIEST: fix the find selector so that tbody is honored and only the tr elements inside the tbodies are selected.
2) Figure out a way to skip the first_child call when it is not needed.
Please let me know if you would like a snapshot of the print_r($articles) output I am receiving.
Thanks in advance for any help provided!
Sincerely,
Bill C.
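Just comment out this line in simple_html_dom.php (line #695 here; the number may differ between versions):
if ($m[1]==='tbody') continue;
That line makes the selector parser skip tbody. With it commented out, the tbody part of your selector should be honored.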
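If editing the library is not an option, the same rows can be selected with PHP's built-in DOM extension. The following is only a sketch under assumptions: the sample markup, its team names and dates, and the function name getScheduleRows are hypothetical, and it assumes the real table carries schedTbl as a class inside a quoted class attribute. It covers both ideas from the question: only tr elements inside tbody are matched, and the first div is read only when it exists.

```php
<?php
// Hypothetical sample markup mirroring the structure described in the question.
$html = <<<'HTML'
<table class="data schedTbl">
<thead><tr><th>Date</th><th>Away</th><th>Home</th></tr></thead>
<tbody>
<tr><td><div class="class1">Oct 1</div><div class="class2">Oct 1</div></td><td>Lions</td><td>Bears</td></tr>
<tr><td><div class="class1">Oct 2</div><div class="class2">Oct 2</div></td><td>Jets</td><td>Colts</td></tr>
</tbody>
</table>
HTML;

// getScheduleRows is an illustrative name, not part of any library.
function getScheduleRows(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // keep warnings about sloppy markup quiet
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Match only rows that sit directly inside a tbody of the schedule table.
    $rows = $xpath->query(
        "//table[contains(concat(' ', normalize-space(@class), ' '), ' schedTbl ')]/tbody/tr"
    );

    $articles = [];
    foreach ($rows as $row) {
        $cells = $xpath->query('td', $row);
        if ($cells->length < 3) {
            continue;                   // not a data row, skip it
        }
        // Use the first div when present, otherwise fall back to the cell text.
        $firstDiv = $xpath->query('div', $cells->item(0))->item(0);
        $date = $firstDiv !== null ? $firstDiv->textContent : $cells->item(0)->textContent;
        $articles[] = [
            trim($date),                         // 0 -- GAME DATE
            trim($cells->item(1)->textContent),  // 1 -- AWAY TEAM
            trim($cells->item(2)->textContent),  // 2 -- HOME TEAM
        ];
    }
    return $articles;
}

$articles = getScheduleRows($html);
print_r($articles);
```

For a live page, the $html string would come from something like file_get_contents() on the page URL instead of the inline sample.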
I'm trying to add a table footer (<tfoot></tfoot>) with mPDF so that it appears as the last row at the bottom of every page,
but it only appears on the last page. The header is repeated, but the footer is not. Example code below:
View
<table>
<th>Name</th>
<th>Position</th>
<tbody>
@foreach($datas as $data)
<tr>
<td>{{ $data->name }}</td>
<td>{{ $data->position }}</td>
</tr>
@endforeach
</tbody>
<!-- I want this to be on every page -->
<tfoot> Footer </tfoot>
</table>
Controller
$htmlString = \View::make('test_pdf',['datas'=>$datas]);
$htmlString = $htmlString->render();
$mpdf->SetFooter('Report_Name||{PAGENO}');
$mpdf->WriteHTML($htmlString);
return $mpdf->Output('test.pdf','D');
Thank you in advance!
I'm trying to achieve a table with expandable rows.
This is the code I have so far.
<table id="something" class="table table-responsive table-condensed table-striped">
<thead>
<tr>
<th>Name</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<?php
foreach ($info as $var) {
?>
<tr data-toggle="collapse" data-target="#accordion<?php echo $var['id'] ?>" class="clickable">
<td><h4><?php echo $name ?> </h4></td>
<td class="<?php echo $colors[array_rand($colors)] ?>">Status</td>
</tr>
<tr>
<td colspan="2">
<div id="accordion<?php echo $var['id'] ?>" class = "collapse">
</div>
</td>
</tr>
<?php
}
?>
</tbody>
The table loads without any problem.
My goal is to sort either by Status or Name, and use a few more options of DataTables.
As soon as I load DataTables I get the following error: Uncaught TypeError: Cannot set property '_DT_CellIndex' of undefined
Any idea what can be causing this?
Thanks in advance.
I had this same thing and none of the ideas above fixed it for me. This one was a bit of a head scratcher until I got it figured out.
My table has only two columns and I'm using a <tr><td colspan="2"></td></tr> from time to time. The table is syntactically correct. I'm using the colspans to separate logical sections of data.
The error comes from the fact that I'm using HTML5 data attributes on the <td> tags, specifically the data-order attribute. In this case DataTables does not support colspans on <td> tags.
To get this to work properly for me I replaced the <td colspan="2"> tag with two <td> tags that included the data-order attributes. I used the same attribute value for the row coming next so that the order doesn't get wonky when the order is changed.
Hope that helps someone!
I don't know why yet, but it seems to be related to the "table-striped" class. Solution: right after the JavaScript code that converts your table to a DataTable, add the class "table-striped", and that will work, like this:
$('#something').addClass("table-striped");
I am trying to get the text of child elements using the PHP DOM.
Specifically, I am trying to get only the first <a> tag within every <tr>.
The HTML is like this...
<table>
<tbody>
<tr>
<td>
1st Link
</td>
<td>
2nd Link
</td>
<td>
3rd Link
</td>
</tr>
<tr>
<td>
1st Link
</td>
<td>
2nd Link
</td>
<td>
3rd Link
</td>
</tr>
</tbody>
</table>
My sad attempt at it involved using foreach() loops, but would only return Array() when doing a print_r() on the $aVal.
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML(returnURLData($url));
libxml_use_internal_errors(false);
$tables = $dom->getElementsByTagName('table');
$aVal = array();
foreach ($tables as $table) {
foreach ($table as $tr){
$trVal = $tr->getElementsByTagName('tr');
foreach ($trVal as $td){
$tdVal = $td->getElementsByTagName('td');
foreach($tdVal as $a){
$aVal[] = $a->getElementsByTagName('a')->nodeValue;
}
}
}
}
Am I on the right track or am I completely off?
Put this code in test.php:
require 'simple_html_dom.php';
$html = file_get_html('test1.php');
foreach ($html->find('table tr') as $row)
{
    // find('a', 0) returns the first <a> in the row (or null), not a list
    $link = $row->find('a', 0);
    if ($link)
    {
        echo $link->plaintext;
    }
}
and put your HTML code in test1.php:
<table>
<tbody>
<tr>
<td>
1st Link
</td>
<td>
2nd Link
</td>
<td>
3rd Link
</td>
</tr>
<tr>
<td>
1st Link
</td>
<td>
2nd Link
</td>
<td>
3rd Link
</td>
</tr>
</tbody>
</table>
I am pretty sure I am late, but a better way is to iterate over all "tr" elements with getElementsByTagName and, for each node in the returned node list, call getElementsByTagName("a"). There is no need to iterate over that second node list; just take its first element with item(0). That's it! Another way is to use XPath.
I personally don't like SimpleHtmlDom because it loads a lot of extra features when only a small piece of functionality is required. With heavy scraping, memory management issues can also hold you back; it's better to do the DOM analysis yourself than to depend on a third-party library.
Just my opinion. I used SHD initially too, but later realized this.
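The DOM approach described above can be sketched roughly as follows. The sample markup is an assumption: the question's HTML is shown without its <a> tags, so the href values here are placeholders. The second half shows one possible XPath form of the same idea.

```php
<?php
// Assumed sample markup; the href values are placeholders.
$html = <<<'HTML'
<table>
<tbody>
<tr>
<td><a href="#a1">1st Link</a></td>
<td><a href="#a2">2nd Link</a></td>
<td><a href="#a3">3rd Link</a></td>
</tr>
<tr>
<td><a href="#b1">1st Link</a></td>
<td><a href="#b2">2nd Link</a></td>
<td><a href="#b3">3rd Link</a></td>
</tr>
</tbody>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Way 1: walk every <tr> and take only the first <a> found inside it.
$aVal = [];
foreach ($dom->getElementsByTagName('tr') as $tr) {
    $first = $tr->getElementsByTagName('a')->item(0);
    if ($first !== null) {              // rows without links are skipped
        $aVal[] = trim($first->nodeValue);
    }
}

// Way 2: the same selection as a single XPath query.
// descendant::a[1] means "the first <a> below each <tr>".
$xpath = new DOMXPath($dom);
$viaXPath = [];
foreach ($xpath->query('//tr/descendant::a[1]') as $a) {
    $viaXPath[] = trim($a->nodeValue);
}

print_r($aVal);
print_r($viaXPath);
```

Note that getElementsByTagName('a') returns a node list, so nodeValue has to be read from item(0) rather than from the list itself.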
You're not setting $trVal and $tdVal yet you're looping them ?
The webpage in question is http://assignments.uspto.gov/assignments/q?db=pat&pub=20060030630
Now, let's just say I want to capture the Assignees in the first assignment. The relevant code there looks like
<div class="t3">Assignee:</div>
</td>
</tr>
</table>
</td><td>
<table width="100%" cellpadding="0" cellspacing="0" border="0">
<tbody valign="top">
<tr>
<td>
<table>
<tr>
<td>
<div class="p1">
LEAR CORPORATION
</div>
</td>
</tr>
<tr>
<td><span class="p1">21557 TELEGRAPH ROAD</span></td>
</tr>
<tr>
<td><span class="p1">SOUTHFIELD, MICHIGAN 48034</span></td>
</tr>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
I could I suppose use xpath and grab everything out of spans with class p1, except that thing is used all throughout the page for basically everything, same for the div class that lear corporation is in.
So is there a way for me to just read "Assignees" and then grab just the information relevant to it?
I figure if I can understand how to do that, then I can extrapolate from that and figure out how to grab any specific data on the page that I want, i.e. grabbing the conveyance data on any particular assignment.
But if say, I were just to grab all the data on the page (reel/frame, conveyance, assignors, assignee, correspondent for every assignment, and the header information about the patent itself), might that be easier to do than trying to grab each individual piece of information?
There is no clear way to do it, since nothing in the DOM designates where this information is. It's very arbitrary.
I would recommend using some math to figure out the pattern of where in the DOM the Assignee resides.
For example, we know that for every class of p1, the assignee value is position 16, and a new Assignment occurs every 23rd position. Using a loop you could figure it out.
This should get you started at the very least.
$Site = file_get_contents('http://assignments.uspto.gov/assignments/q?db=pat&pub=20060030630');
$Dom = new DomDocument();
$Dom->loadHTML($Site);
$Finder = new DomXPath($Dom);
$Nodes = $Finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' p1 ')]");
$position = 0;
foreach($Nodes as $node) {
if(($position % 16) == 0 && $position > 0) {
var_dump($node->nodeValue);
break;
}
$position++;
}
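As an alternative to counting positions, it may be possible to anchor on the "Assignee:" label itself and read the cell next to it. This is a sketch, not a tested scraper for the live page: the sample below rebuilds the fragment quoted in the question, and the outer table and row wrapped around it are an assumption about markup the fragment does not show.

```php
<?php
// Sample rebuilt from the fragment in the question; the outermost
// <table><tr> wrapper is assumed, since the fragment starts mid-row.
$html = <<<'HTML'
<table>
<tr>
<td>
<table><tr><td><div class="t3">Assignee:</div></td></tr></table>
</td><td>
<table width="100%" cellpadding="0" cellspacing="0" border="0">
<tbody valign="top">
<tr><td>
<table>
<tr><td><div class="p1">
LEAR CORPORATION
</div></td></tr>
<tr><td><span class="p1">21557 TELEGRAPH ROAD</span></td></tr>
<tr><td><span class="p1">SOUTHFIELD, MICHIGAN 48034</span></td></tr>
</table>
</td></tr>
</tbody>
</table>
</td>
</tr>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$query =
    // 1. find the label
    "//div[@class='t3'][normalize-space(.)='Assignee:']"
    // 2. climb to the nearest enclosing cell that has a cell after it
    . "/ancestor::td[following-sibling::td][1]"
    // 3. step into that next cell
    . "/following-sibling::td[1]"
    // 4. collect every p1 element inside it
    . "//*[contains(concat(' ', normalize-space(@class), ' '), ' p1 ')]";

$assignee = [];
foreach ($xpath->query($query) as $node) {
    $assignee[] = trim($node->nodeValue);
}

print_r($assignee);
```

If the page holds several assignments, the query returns the p1 values for every "Assignee:" label, so the results would need to be grouped per label. Swapping the label text for another one (for example the conveyance label) should work the same way, provided that section uses the same label-cell and value-cell layout.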
I have a table with header on it. I need the header to be fixed when the user scrolls the table data.
my table is as follows
<div style="height: 300px;overflow: auto">
<table>
<thead>
<tr>
<th> Nr. </th>
<th> Name </th>
<th> Status </th>
<th> Date </th>
</tr>
</thead>
<tbody>
<?php while($record = odbc_fetch_array($result)) { ?>
<tr>
<td> <?php echo $record['Nr']; ?></td>
<td> <?php echo $record['Name']; ?></td>
<td> <?php echo $record['Status']; ?></td>
<td> <?php echo $record['Date']; ?></td>
</tr>
<?php } ?>
</tbody>
</table>
</div>
Let me know if you need more information.
Your syntax is wrong; this will not work.
You have to put the table head inside a <thead> section, not <tbody>.
Then you can set overflow: auto and a fixed height on the tbody, and you will be able to scroll inside the table.
<table>
<thead>
... heading
</thead>
<tbody style="height: 100px; display: block; overflow: auto; ">
... bodycols
</tbody>
</table>
Something like that, but please don't do this.
It's very unreliable.
Instead, use two separate tables, wrap them inside divs, and give one div a fixed height and overflow: auto. Two more links:
http://www.cssplay.co.uk/menu/tablescroll.html
http://www.imaputz.com/cssStuff/bigFourVersion.html
To begin, your code is wrong.
The tr containing the th cells must be wrapped in a thead, and the other rows in a tbody.
Then, you should look at the source code of this page
If you set a fixed height on the table, if it doesn't have content or very few rows, they will expand to be very high and fill the space. I don't know if that's what you want.
This solution might work for you depending on the style of your headers.
http://salzerdesign.com/blog/?p=191