extracting a link in a html table php using simple_html_dom

extracting a link in a html table php using simple_html_dom - php

I'm trying to extract a specific link from a table but is not displaying anything. It's the 3rd link in the td. I thought this would work but doesn't.
here the code:
<?php
$site = 'site';
$html = file_get_html($site);
foreach($html->find('td a', 3) as $element)
echo $element->href;
?>
Here is the HTML
<tr class="evenrow team-600-359">
<td>
Aug 17
</td>
<td>
FT
</td>
<td align="right">
Arsenal
</td>
<td align="center">
1-3
</td>
<td>Aston Villa</td>
<td style="text-align:right;">60,003</td>
</td>
<td>
Premier League
</td>
</tr>

You have invalid HTML. It can be the cause.
Check double closing of TD with 60,003 value.

Just use native DomDocument:
$str = <<<STR
<tr class="evenrow team-600-359">
<td>
Aug 17
</td>
<td>
FT
</td>
<td align="right">
Arsenal
</td>
<td align="center">
1-3
</td>
<td>Aston Villa</td>
<td style="text-align:right;">60,003</td>
</td>
<td>
Premier League
</td>
</tr>
STR;
$dom = new DOMDocument();
#$dom->loadHTML($str);
$elements = $dom->getElementsByTagName('td');
echo '<pre>' . print_r($dom->saveXML($elements->item(2)), true) . '</pre>';
OUTPUT
<td align="right">
Arsenal
</td>

Related

PHP with DOMXPath - How to select and count from this html tree

I need to count how many of these items are open, and there are four types of them: Easy, Medium, Difficult and Not-Wanted. All of these types are values inside the div's. I need to exclude the 'Not-Wanted' types from the count. Notice the 'Open' and 'Close' values have different number of spaces around them. This is the html structure:
<table>
<tbody>
<tr>
<td>
<div>Difficult</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td> Closed </td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Medium</div>
</td>
<td>Name</td>
<td>Open </td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Medium</div>
</td>
<td>Name</td>
<td> Closed</td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td>Closed </td>
</tr>
<tr>
<td>
<div>Not-wanted</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Difficult</div>
</td>
<td>Name</td>
<td>Open</td>
</tr>
............
This is one of my attempts to solve the problem. It is obviously wrong, but I don't know how to get it right.
$doc = new DOMDocument();
$doc->loadHtmlFile('http://www.nameofsite.com');
$doc->preserveWhiteSpace = false;
$xpath = new DOMXPath($doc);
$elements = $xpath->query("/html/body/div[1]/div/section/div/section/article/div/div[1]/div/div/div[2]/div[1]/div[2]/div/section/div/div/table/tbody/tr");
$count = 0;
foreach ($elements as $element) {
if ($element->childNodes->nodeValue != 'Not-wanted') {
if ($element->childNodes->nodeValue === 'open') {
$count++;
}
}
}
echo $count;
I have a very rudimental knowledge of DOMXPath, so it is too complex for me, since I'm only able to create simple queries.
Can anybody help?
Thanks in advance.

Based on the data in your example, I think you can adjust the xpath expression to this to get all the <tr>'s that match your conditions:
//table/tbody/tr[normalize-space(td[3]/text()) = 'Open' and
td[1]/div/text() != 'Not-wanted']
$elements is then of type DOMNodeList and you can then get the length property to get the number of nodes in the list.
For example:
$source = <<<SOURCE
<table>
<tbody>
<tr>
<td>
<div>Difficult</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td> Closed </td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Medium</div>
</td>
<td>Name</td>
<td>Open </td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Medium</div>
</td>
<td>Name</td>
<td> Closed</td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td>Closed </td>
</tr>
<tr>
<td>
<div>Not-wanted</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Difficult</div>
</td>
<td>Name</td>
<td>Open</td>
</tr>
</tbody>
</table>
SOURCE;
$doc = new DOMDocument();
$doc->loadHTML($source);
$doc->preserveWhiteSpace = false;
$xpath = new DOMXPath($doc);
$elements = $xpath->query("//table/tbody/tr[normalize-space(td[3]/text()) = 'Open' and td[1]/div/text() != 'Not-wanted']");
echo $elements->length;
Which will result in:
5
Demo

How to automatically generate rowspan using php

In this code everything is perfect except that I could not generate rowspan in the Internal Grade cell. this line <td rowspan="<?php echo $cols;?>"> hides the bottom border while I expect to span the rows every time the code generates rows. Thanks for your input and sorry for using mysql() function. I am newbie.
<?php $cols=mysql_num_rows($grds); ?>
<table>
<?php do{ ?>
<tr>
<td> </td>
//This hide the bottom border
<td rowspan="<?php echo $cols;?>">Internal<br />Grades</td>
<td colspan="2"> <?php echo strtoupper($row_grds['grade_name']); ?></td>
<td><?php echo strtoupper($row_grds['igrade']); ?></td>
<?php } while ($row_grds=mysql_fetch_assoc($grds)); ?>
</tr>
</table>
The output is like below:
output is here
The code generated by php is:
<table>
<tr>
<td> </td>
<td rowspan="1"> </td>// A
<td colspan="2"> RHYMES</td>
<td align="center">B</td>
</tr>
<tr>
<td> </td>
<td rowspan="2"> </td>//B
<td colspan="2"> CONVERSATION</td>
<td align="center">A</td>
</tr>
</table>
I want to merge A and B
Expected output is:
<table>
<tr>
<td> </td>
<td rowspan="2"> Internal<br /> Grade</td>
<td colspan="2"> RHYMES</td>
<td align="center">B</td>
</tr>
<tr>
<td> </td>
<td colspan="2"> CONVERSATION</td>
<td align="center">A</td>
</tr>
</table>

You can do something similar to this:
<table>
<?php $row_grds = mysql_fetch_assoc($grds); ?>
<tr>
<td> </td>
<td rowspan="<?=$cols;?>">Internal<br />Grades</td>
<td colspan="2"> <?=strtoupper($row_grds['grade_name']);?></td>
<td><?=strtoupper($row_grds['igrade']);?></td>
</tr>
<?php while ($row_grds=mysql_fetch_assoc($grds)): ?>
<tr>
<td> </td>
<td colspan="2"> <?=strtoupper($row_grds['grade_name']);?></td>
<td><?=strtoupper($row_grds['igrade']);?></td>
</tr>
<?php endwhile; ?>
Its not perfect but you are using mysql_fetch_assoc, which on its own is not perfect either :D

Warning: DOMXPath::query(): Invalid expression

I'm an Xpath newbie. I want to loop through the result of a cURL query and print each element of the only table on the page.
I've used the Xpath plugin for Firefox to obtain my expression and my table is structured as follows:
<table>
<tr class="listItemOneBg">
<td valign="top">
SMITH
</td>
<td valign="top">
WILLIAM C C
</td>
<td valign="top">
Male
</td>
<td valign="top">
</td>
<td valign="top">
</td>
<td valign="top">
</td>
<td valign="top">
</td>
<td valign="top">
BLACKWOOD
</td>
<td valign="top">
61
</td>
<td valign="top">
1924
</td>
<td valign="top">
<a target="_blank" href='XXX'>
order</a>
</td>
</tr>
<tr class="listItemTwoBg">
<td valign="top">
SMITH
</td>
<td valign="top">
WILLIAM C PAGE-
</td>
<td valign="top">
Male
</td>
<td valign="top">
</td>
<td valign="top">
</td>
<td valign="top">
</td>
<td valign="top">
</td>
<td valign="top">
SWAN
</td>
<td valign="top">
9
</td>
<td valign="top">
1914
</td>
<td valign="top">
<a target="_blank" href='XXY'>
order</a>
</td>
</tr>
Here's the code I've tried so far. I get a message"Warning: Invalid argument supplied for foreach()". What am I doing wrong?
$page = curl_exec($ch);
curl_close($ch);
// Create new PHP DOM document
$dom = new DOMDocument;
// Load html from curl request into document model
#$dom->loadHTML($page);
$xpath = new DOMXPath($dom);
$tableRows = $xpath->query("id('divResults')/table/tbody/tr");
foreach ($tableRows as $row) {
// fetch all 'tds' inside this 'tr'
$td = $xpath->query('td', $row);
echo $td->item(1)->textContent;
}

Assuming the table you're after is actually in a <div id="divResults">...
$tableRows = $xpath->query('//div[#id="divResults"]/table/tbody/tr');
foreach ($tableRows as $row) {
$cells = $row->getElementsByTagName('td');
}

That's a non-standard XPath expression. It cannot work in DOMXPath.(Downvoters, the expression has been edited since the question was posted. Cheers!)
This is where you learn XPath:
Microsoft XPath Syntax
Microsoft XPath by Example
PS: It's where I learnt it.

How To Format This Scraped Content

I'm grabbing the content from all the td's in this table with the class="job" using this.
$table01 = $salary->find('table.table01');
$rows = $table01[0]->find('td.job');
Then I'm using this to output it which works, but obviously only outputs it as plaintext, I need to do some more with it...
foreach($table01[0]->find('td.job') as $element) {
$jobs .= $element->plaintext . '<br />';
}
Ultimately I would like it outputted to this format. Notice the a href is using the job name and replacing spaces and / with a -.
<tr>
<td class="small"> Graphic Artist / Designer
$23,755 – $55,335 </td>
</tr>
<tr>
<td class="small"> Sales Associate<br />
$15,577 – $56,290 </td>
</tr>
<tr>
<td class="small"> Film / Video Editor<br />
$24,184 – $94,493 </td>
</tr>
Heres the table im scraping
<table cellpadding="0" cellspacing="0" border="0" class="table01">
<tr>
<td class="head">Test</td>
<td class="job">
Graphic Artist / Designer<br/>
$23,755 – $55,335
</td>
</tr>
<tr>
<td class="head">Test</td>
<td class="job">
Sales Associate<br/>
$15,577 – $56,290
</td>
</tr>
<tr>
<td class="head">Test</td>
<td class="job">
Film / Video Editor<br/>
$24,184 – $94,493
</td>
</tr>
</table>

may be better to use regexps
<?php
$html=file_get_contents('1.html');
$jobs='';
if(preg_match_all("/<tr>.*?<td.*?>.*?<\/td>.*?<td\sclass=\"job\">.*?<a.+?href=\"(.+?)\".+?>(.*?)<\/a>(.*?)<\/td>.*?<\/tr>/ims", $html, $res))
{
foreach($res[1] as $i=>$uri)
{
$uri=strtolower(urldecode($uri));
$uri=preg_replace("/_\/_/",'-',$uri);
$uri=preg_replace("/_/",'-',$uri);
$jobs.='<tr><td class="small"> '.$res[2][$i].''.$res[3][$i].'</td></tr>'."\n";
}
}
echo $jobs;

Can php code execute while creation of pdf using tcpdf?

I m working on module in which i have to make pdf from php page. I m Using tcpdf for that but m facing one problem that file contain some mysql queries and php coding which is not executed by pdf page.
$prn_no = $_POST['prn_no'];
$current_sem = $_POST['current_sem'];
$qr_fetch_sem_res_id = mysql_query("SELECT * FROM table1 WHERE ((prn='$prn_no') AND (semisterName='$current_sem'))")or die(mysql_error());
$qr_fetch_sem_result_ans = mysql_fetch_array($qr_fetch_sem_res_id);
<tr>
<td colspan="11" align="left" valign="middle">Programme Name: <?php echo $qr_fetch_sem_result_ans['programme_name'];?></td>
</tr>
<tr>
<td colspan="11" align="center" valign="middle"><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="27%">Seat No.: <?php echo $qr_fetch_sem_result_ans['seatNo'];?></td>
<td width="3%"> </td>
<td width="22%">PR No. : <?php echo $qr_fetch_sem_result_ans['prn'];?></td>
<td width="2%"> </td>
<td width="17%">Semester : <?php echo $qr_fetch_sem_result_ans['semisterName'];?></td>
<td width="1%"> </td>
<td width="25%">Month / Year Of Exam : <?php echo $qr_fetch_sem_result_ans['month_year_of_exam'];?> </td>
<td width="3%"> </td>
</tr>
<tr>
<td colspan="3">Name: <?php echo $qr_fetch_sem_result_ans['student_name'];?></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td colspan="7">College / Institute: <?php echo $qr_fetch_sem_result_ans['institute_name'];?></td>
<td> </td>
</tr>
</table></td>
</tr>

I'm going to go out on a limb here and suggest that you run your queries fist and then build your pdf file. If you run the queries after you build the pdf then of course it will not have access to your data. If that doesn't help then I must not understand what you're asking.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

extracting a link in a html table php using simple_html_dom - php

You have invalid HTML. It can be the cause. Check double closing of TD with 60,003 value.

Related

PHP with DOMXPath - How to select and count from this html tree

How to automatically generate rowspan using php

Warning: DOMXPath::query(): Invalid expression

How To Format This Scraped Content

Can php code execute while creation of pdf using tcpdf?

Categories

Resources