How to catch text from html page - php

I'd like to catch the word "Bronze" from this html page portion:
<tr class="">
<td align="left" csk="Nikpai,Rohullah">Rohullah Nikpai</td>
<td align="right" >25</td>
<td align="left" >Men's Featherweight</td>
<td align="right" csk="3">3T </td>
<td align="left" class=" Bronze" csk="1"><strong>Bronze</strong></td>
</tr>
I tried several approaches without success. One of my many attempts is the following:
foreach($html4->find('td align="left" strong') as $tag4) {
echo $prova = $tag4->innertext . "\n";
}
where $html4 is the entire HTML page I have to process.

With the following code you can get the class name "Bronze":
<?php
$html='<tr class="">
<td align="left" csk="Nikpai,Rohullah">Rohullah Nikpai</td>
<td align="right" >25</td>
<td align="left" >Mens Featherweight</td>
<td align="right" csk="3">3T </td>
<td align="left" class=" Bronze" csk="1"><strong>Bronze</strong></td>
</tr>';
$dom = new DOMDocument();
$dom->loadHTML($html); // the document must be loaded before it can be queried
foreach($dom->getElementsByTagName('td') as $link) {
echo trim($link->getAttribute('class'),' ');
}
?>
Or, if you prefer the node value rather than the class name, and the csk attribute is always 1:
foreach($dom->getElementsByTagName('td') as $link) {
if ($link->getAttribute('csk')=="1"){
echo $link->nodeValue;
}
}
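The selector from the question ('td align="left" strong') can also be written as an XPath query. This is only a minimal sketch using PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM. The wrapping <table> and the $medals variable are assumptions added for illustration.

```php
<?php
// Sample row from the question, wrapped in <table> so the parser sees valid nesting.
$html = '<table><tr class="">
<td align="left" csk="Nikpai,Rohullah">Rohullah Nikpai</td>
<td align="right">25</td>
<td align="left">Men\'s Featherweight</td>
<td align="right" csk="3">3T </td>
<td align="left" class=" Bronze" csk="1"><strong>Bronze</strong></td>
</tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$medals = [];
// Every <strong> that is a direct child of a left-aligned <td>.
foreach ($xpath->query('//td[@align="left"]/strong') as $strong) {
    $medals[] = trim($strong->textContent);
}
print_r($medals);
```

On a full page this query would return every <strong> inside a left-aligned cell, so it may need a tighter condition such as the csk attribute.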

Related

Selective extraction of data from external site using DOM PHP web crawler

I have a PHP DOM web crawler that works fine. It extracts the specified tags, along with their links, from an external forum site to my page.
But recently I ran into a problem.
This is the HTML of the forum data:
<tbody>
<tr>
<td width="1%" height="25"> </td>
<td width="64%" height="25" class="FootNotes2">Hispanic Study Partner - dreamer1984</td>
<td width="1%" height="25"> </td>
<td width="14%" height="25" class="FootNotes2" align="center">02/28/17 01:42</td>
<td width="1%" height="25"> </td>
<td width="8%" height="25" align="Center" class="FootNotes2">0</td>
<td width="1%" height="25"> </td>
<td width="9%" height="25" align="Center" class="FootNotes2">200</td>
</tr>
<tr>
<td width="1%" height="25"> </td>
<td width="64%" height="25" class="FootNotes2">nbme - monariyadh</td>
<td width="1%" height="25"> </td>
<td width="14%" height="25" class="FootNotes2" align="center">02/27/17 23:12</td>
<td width="1%" height="25"> </td>
<td width="8%" height="25" align="Center" class="FootNotes2">0</td>
<td width="1%" height="25"> </td>
<td width="9%" height="25" align="Center" class="FootNotes2">108</td>
</tr>
</tbody>
Now assume the table above is the only content on that site, and I try to extract it with a web crawler like this:
<?php
require_once('dom/simple_html_dom.php');
$html = file_get_html('http://www.sitename.com/');
foreach($html->find('td.FootNotes2') as $element) {
echo $element;
}
?>
It extracts all the data inside the cells with the class name "FootNotes2".
Now, what if I want to extract specific data from a tag, for example names like "dreamer1984" and "monariyadh" from the first tag/line?
And what if I want to extract data from only the 3rd tag (skipping the rest), which has the same class name?
Please note that I can use a regex like
preg_match_all('/<td.+?FootNotes2.+?<a.+?<\/a> - (?P<name>.*?)<\/td>.+?<td.+?FootNotes2.+?(?P<date>\d{2}\/\d{2}\/\d{2} \d{2}:\d{2})/siu', $subject, $matchs);
foreach ($matchs['name'] as $k => $v){
var_dump('name: '. $v, 'relative date: '. $matchs['date'][$k]);
}
But I would prefer a solution that uses the DOM parser.
Any help is appreciated.
As I said in my comment, some text processing is unavoidable. However, you can get the text node associated with the td like so:
require_once('dom/simple_html_dom.php');
$html = file_get_html('http://www.sitename.com/');
foreach ($html->find("tr") as $row) {
$element = $row->find('td.FootNotes2',0);
if ($element == null) { continue; }
$textNode = array_filter($element->nodes, function ($n) {
return $n->nodetype == 3; //Text node type, like in jQuery
});
if (!empty($textNode)) {
$text = current($textNode);
echo $text;
}
}
This echoes:
- dreamer1984
- monariyadh
Do with that what you will.
Updated to only find the first td for each tr.
If you want to extract only the text (not the child tags and their contents):
foreach ($html->find("td.FootNotes2") as $element) {
$children = $element->children; // get an array of children
foreach ($children AS $child) {
$child->outertext = ''; // This blanks the child's output, but MAY NOT remove the node from the parsed tree
}
echo $element->innertext."<br>";
}
Output:
- dreamer1984
02/28/17 01:42
0
200
- monariyadh
02/27/17 23:12
0
108
You have to use regex either way so no sense overcomplicating it:
foreach($html->find('tr') as $tr) {
echo preg_replace('/.* - /', '', $tr->find('td',1)->text()) . "\n";
echo $tr->find('td',3)->text() . "\n";
}
I really don't like apokryfos' approach to this, it's a lot of confusion with no benefit.

Duplicated Results on SQL Query

PHP newbie here! I've been struggling with this for a few days now and have decided I can't figure it out on my own.
Basically I have two database tables, "projects_2016" and "attachment".
I want to show the data from "projects_2016" in the top table, then check for a matching ID number in "attachment" and, if it exists, list all the matching results under the "projects_2016" data.
At the moment it works, but it duplicates the "projects_2016" data for every "attachment" entry.
Here is my code; any input is appreciated!
PS: I'm not too concerned about SQL injection yet. Still learning that!
<?php include '../../../connection_config.php';
$sql = "SELECT DISTINCT * FROM attachment JOIN projects_2016 ON attachment.attachment_ABE_project_number = projects_2016.id ORDER BY `attachment_ABE_project_number` DESC";
$result = $conn->query($sql);
if ($result->num_rows > 0) {
while($row = $result->fetch_assoc()) {
?>
<table width="20" border="1" cellspacing="0" cellpadding="2">
<tr>
<th height="0" scope="col"><table width="990" border="0" align= "center" cellpadding="3" cellspacing="0">
<tr class="text_report">
<td width="107" height="30" align="left" valign="middle" nowrap="nowrap" bgcolor="#F5F5F5"><strong>PNo</strong></td>
<td width="871" align="left" valign="middle" nowrap="nowrap" bgcolor="#F5F5F5"><strong>Project Name</strong></td>
</tr>
<tr>
<td height="20" align="left" valign="middle" bgcolor="#FFFFFF" class="text_report"><strong><?php echo "<br>". $row["ID"]. "<br>";?></strong></td>
<td height="20" align="left" valign="middle" bgcolor= "#FFFFFF" class="text_report"><strong><?php echo "<br>". $row["project_name"]. "<br>";?></strong></td>
</tr>
</table>
<?php
$photo_id = $row["ID"];
$contacts = "SELECT DISTINCT * FROM attachment WHERE attachment_ABE_project_number = '$photo_id'" ;
$result_contacts = $conn->query($contacts);
if ($result_contacts->num_rows > 0) {
// output data of each row
while($row_contacts = $result_contacts->fetch_assoc()) {
?>
<table width="990" border="0" align="center" cellpadding= "3" cellspacing="0" class="text_report">
<tr>
<td height="0" colspan="4" align="left" valign="middle" nowrap="nowrap" bgcolor="#FFFFFF"> </td>
</tr>
<tr>
<td width="319" height="30" align="left" valign="middle" nowrap="nowrap" bgcolor="#F5F5F5"><strong>File Name</strong></td>
<td width="279" align="left" valign="middle" nowrap="nowrap" bgcolor="#F5F5F5"><strong>File Type</strong></td>
<td width="315" align="left" valign="middle" nowrap="nowrap" bgcolor="#F5F5F5"><strong>File Size (KB)</strong></td>
<td width="53" align="right" valign="middle" nowrap="nowrap" bgcolor="#F5F5F5"><strong>View File</strong></td>
</tr>
<tr>
<td height="20" align="left" valign="middle" bgcolor="#FFFFFF"><?php echo $row_contacts ['file'] ?></td>
<td height="20" align="left" valign="middle" bgcolor="#FFFFFF"><?php echo $row_contacts ['type'] ?></td>
<td height="20" align="left" valign="middle" bgcolor="#FFFFFF"><?php echo $row_contacts ['size'] ?></td>
<td align="right" valign="middle" bgcolor="#FFFFFF">view file</td>
</tr>
<tr>
<td height="0" colspan="4" align="left" valign="middle" bgcolor="#FFFFFF"> </td>
</tr>
<?php
}
?>
</table>
<?php
}
?></th>
</tr>
</table>
<table width="1000" border="0" cellspacing="0" cellpadding="0">
<tr>
<th height="26"> </th>
</tr>
</table>
<p>
<?php
}
}
?>
</p>
</table>
<?php $conn->close();
?>
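The JOIN in the outer query returns one row per attachment, so each project is repeated. Select from projects_2016 alone and keep only the projects that have at least one attachment:
$sql = "SELECT * FROM projects_2016
WHERE EXISTS (SELECT * FROM attachment WHERE projects_2016.id = attachment_ABE_project_number) ORDER BY id DESC ";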

store part of the page in variable

I have a page that contains a table, and the table contains a lot of code like this:
<table class="table">
<tbody>
<tr >
<td width="20" class="tabletop">م</td>
<td class="tabletop" >name</td>
<td class="tabletop" style="width:120px">date</td>
<td class="tabletop" style="width:120px">note1</td>
<td class="tabletop" style="width:100px">note2</td>
<td class="tabletop" style="width:90px">note3</td>
</tr>
<? $res=mysql_query($sql);
while($resArr=mysql_fetch_array($res)){ ?>
<tr style="width:700px">
<td class="tabletext"><?= ++$serial;?></td>
<td class="tabletext" ><?= $resArr[stName];?></td>
<td class="tabletext"><?= $resArr['date'];?></td>
<td class="tabletext" ><?= $resArr[matName];?></td>
<td class="tabletext" ><? if($resArr[exam]==1) echo "work";else echo "final";?></td>
<td class="tabletext" ><? if($resArr[exam_type]==1) echo "prac";else echo "test";?></td>
</tr>
<? }?>
</tbody>
</table>
As you can see, the table contains PHP code.
Now I want to store the whole table in a variable so I can send it to the PDF printing library TCPDF.
You can use heredoc syntax, but the query, the loop, and the conditionals have to stay outside the heredoc, because a heredoc only interpolates variables and does not execute code:
<?php
// Static part of the table: the header row.
$var = <<<EOT
<table class="table">
<tbody>
<tr >
<td width="20" class="tabletop">م</td>
<td class="tabletop" >name</td>
<td class="tabletop" style="width:120px">date</td>
<td class="tabletop" style="width:120px">note1</td>
<td class="tabletop" style="width:100px">note2</td>
<td class="tabletop" style="width:90px">note3</td>
</tr>
EOT;
$serial = 0;
$res = mysql_query($sql);
while ($resArr = mysql_fetch_array($res)) {
// Evaluate the expressions first; only plain variables go inside the heredoc.
++$serial;
$exam = ($resArr['exam'] == 1) ? "work" : "final";
$examtype = ($resArr['exam_type'] == 1) ? "prac" : "test";
// Append one table row per result row.
$var .= <<<EOT
<tr style="width:700px">
<td class="tabletext">{$serial}</td>
<td class="tabletext" >{$resArr['stName']}</td>
<td class="tabletext">{$resArr['date']}</td>
<td class="tabletext" >{$resArr['matName']}</td>
<td class="tabletext" >{$exam}</td>
<td class="tabletext" >{$examtype}</td>
</tr>
EOT;
}
// Close the table after the loop.
$var .= <<<EOT
</tbody>
</table>
EOT;
echo $var;
Another way, if you already have all of this code and want to save it:
Before the table:
<?php ob_start(); ?>
After the table:
<?php $output = ob_get_contents(); ?>
The table will still be displayed and you can use $output to send to the PDF.
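If the table should not be printed at all and only captured for the PDF, ob_get_clean() returns the buffer and discards it in one call. Below is a minimal sketch of that variant. The $rows array is a hypothetical stand-in for the database result, and the markup is shortened.

```php
<?php
// Hypothetical rows standing in for the database result set.
$rows = [
    ['stName' => 'Ali', 'date' => '2017-01-01'],
    ['stName' => 'Sara', 'date' => '2017-01-02'],
];

ob_start(); // from here on, output is captured instead of sent to the browser

echo "<table class=\"table\">\n";
foreach ($rows as $i => $row) {
    printf(
        "<tr><td>%d</td><td>%s</td><td>%s</td></tr>\n",
        $i + 1,
        htmlspecialchars($row['stName']),
        htmlspecialchars($row['date'])
    );
}
echo "</table>\n";

// Return the captured markup and end buffering without printing it.
$output = ob_get_clean();
```

After this, $output holds the table markup as a string and nothing has been sent to the browser, unlike ob_get_contents(), which leaves the buffer in place to be flushed later.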

PHP DOM get element which contains

I need help parsing HTML with PHP DOM.
This is a small part of a large HTML document:
<table width="100%" border="0" align="center" cellspacing="3" cellpadding="0" bgcolor='#ffffff'>
<tr>
<td align="left" valign="top" width="20%">
<span class="tl">Obchodne meno:</span>
</td>
<td align="left" width="80%">
<table width="100%" border="0">
<tr>
<td width="67%">
<span class='ra'>STORE BUSSINES</span>
</td>
<td width="33%" valign='top'>
<span class='ra'>(od: 02.10.2012)</span>
</td>
</tr>
</table>
</td>
</tr>
</table>
What I need is to get the text "STORE BUSSINES". Unfortunately, the only thing I can match on is "Obchodne meno", the content of the first tag, so starting from that content I need to get its parent->parent->first sibling->child->child->child->child->content. I have limited experience with parsing HTML in PHP, so any help will be valuable. Thanks in advance!
Make use of the DOMDocument class: loop through the <span> tags and put them in an array.
<?php
$html=<<<XCOE
<table width="100%" border="0" align="center" cellspacing="3" cellpadding="0" bgcolor='#ffffff'>
<tr>
<td align="left" valign="top" width="20%">
<span class="tl">Obchodne meno:</span>
</td>
<td align="left" width="80%">
<table width="100%" border="0">
<tr>
<td width="67%">
<span class='ra'>STORE BUSSINES</span>
</td>
<td width="33%" valign='top'>
<span class='ra'>(od: 02.10.2012)</span>
</td>
</tr>
</table>
</td>
</tr>
</table>
XCOE;
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('span') as $tag) {
$spanarr[]=$tag->nodeValue;
}
echo $spanarr[1]; // prints "STORE BUSSINES"
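Indexing the spans by position breaks as soon as the page contains other spans before this one. Since the question asks for the element next to the one that contains "Obchodne meno", an XPath query can express that relationship directly. This is a sketch under the assumption that the label and the value always sit in adjacent cells, as in the sample. The markup is trimmed and the variable names are illustrative.

```php
<?php
$html = <<<HTML
<table>
<tr>
<td><span class="tl">Obchodne meno:</span></td>
<td>
<table>
<tr>
<td><span class='ra'>STORE BUSSINES</span></td>
<td><span class='ra'>(od: 02.10.2012)</span></td>
</tr>
</table>
</td>
</tr>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Find the cell whose span contains the label, step to the next sibling cell,
// then take the first span anywhere below it.
$query = '//td[span[contains(., "Obchodne meno")]]'
       . '/following-sibling::td[1]/descendant::span[1]';
$node = $xpath->query($query)->item(0);
$value = $node ? trim($node->textContent) : null;
echo $value;
```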

Get data inside html tags using Simple HTML DOM Parser:

I want to get all the information inside the html tags and display them in a table. I'm using Simple HTML DOM Parser. I tried the following code but I'm only getting the LAST COLUMN (Column:Total). How do I get the data from the other columns?
foreach($html->find('tr[class="tblRowShade"]') as $div) {
$key = '';
$val = '';
foreach($div->find('*') as $node) {
if ($node->tag=='td'){
$key = $node->plaintext;
}
}
$ret[$key] = $val;
}
Here's the HTML of the table:
<tr class="tblRowShade">
<td width="12%"><strong>Project</strong></td>
<td width="38%"> </td>
<td width="25%"><strong>Recipient</strong></td>
<td width="14%"><strong>Municipality/City</strong></td>
<td width="11%" nowrap="nowrap" class="td_right"><strong>Implementing Unit</strong></td>
<td width="11%" nowrap="nowrap" class="td_right"><strong>Release Date</strong></td>
<td align="right" width="11%" class="td_right"><strong>Total</strong></td>
</tr>
<tr class="tblRowShade">
<td colspan="2" >Livelihood Programs</td>
<td >Basic Espresso and Latte</td>
<td nowrap="nowrap"></td>
<td >DOLE - TESDA Regional Office IV-A</td>
<td nowrap="nowrap">2013-06-11</td>
<td align="right" nowrap="nowrap" class="td_right">1,500,000</td>
</tr>
Why do you have $div->find('*')? Try $div->find('td') instead; this should produce the correct result. Otherwise you can also iterate over the children: foreach($div->children as $node)
Assuming you are trying to use the first row as $key and the rest for the data, you might want to alter your HTML by using th in the first row, which is your header: <tr><th>…</th></tr>. This way you can get the keys with $div->find('th'). I suppose using the first row is okay as well.
As alamin.ahmed said, it would be better to search for td instead...
Here's a working example :
$text = ' <tr class="tblRowShade">
<td width="12%"><strong>Project</strong></td>
<td width="38%"> </td>
<td width="25%"><strong>Recipient</strong></td>
<td width="14%"><strong>Municipality/City</strong></td>
<td width="11%" nowrap="nowrap" class="td_right"><strong>Implementing Unit</strong></td>
<td width="11%" nowrap="nowrap" class="td_right"><strong>Release Date</strong></td>
<td align="right" width="11%" class="td_right"><strong>Total</strong></td>
</tr>
<tr class="tblRowShade">
<td colspan="2" >Livelihood Programs</td>
<td >Basic Espresso and Latte</td>
<td nowrap="nowrap"></td>
<td >DOLE - TESDA Regional Office IV-A</td>
<td nowrap="nowrap">2013-06-11</td>
<td align="right" nowrap="nowrap" class="td_right">1,500,000</td>
</tr>';
echo "<div>Original Text: <xmp>$text</xmp></div>";
//Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($text);
// Find all elements
$rows = $html->find('tr[class="tblRowShade"]');
// Find succeeded
if ($rows) {
echo count($rows) . " \$rows found !<br />";
foreach ($rows as $key => $row) {
echo "<hr />";
$columns = $row->find('td');
// Find succeeded
if ($columns) {
echo count($columns) . " \$columns found in \$rows[$key]!<br />";
foreach ($columns as $col) {
echo $col->plaintext . " | ";
}
}
else
echo " /!\ Find() \$columns failed /!\ ";
}
}
else
echo " /!\ Find() \$rows failed /!\ ";
Be aware that the two rows do not contain the same number of columns (the first cell of the second row has colspan="2"), so you must handle that in your program.
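One way to handle the differing column counts is to expand each colspan into empty placeholder cells, so every row lines up with the header. The sketch below uses PHP's built-in DOMDocument rather than Simple HTML DOM. The markup is a trimmed copy of the sample, and $records is an illustrative name.

```php
<?php
$text = '<table>
<tr class="tblRowShade">
<td><strong>Project</strong></td>
<td> </td>
<td><strong>Recipient</strong></td>
<td><strong>Municipality/City</strong></td>
<td><strong>Implementing Unit</strong></td>
<td><strong>Release Date</strong></td>
<td><strong>Total</strong></td>
</tr>
<tr class="tblRowShade">
<td colspan="2">Livelihood Programs</td>
<td>Basic Espresso and Latte</td>
<td></td>
<td>DOLE - TESDA Regional Office IV-A</td>
<td>2013-06-11</td>
<td>1,500,000</td>
</tr>
</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($text);

$table = [];
foreach ($dom->getElementsByTagName('tr') as $tr) {
    $cells = [];
    foreach ($tr->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent);
        // A cell with colspan="N" occupies N columns: pad with N-1 empty slots.
        $span = max(1, (int) $td->getAttribute('colspan'));
        for ($i = 1; $i < $span; $i++) {
            $cells[] = '';
        }
    }
    $table[] = $cells;
}

// The first row supplies the keys, the remaining rows the values.
$header = array_shift($table);
$records = [];
foreach ($table as $cells) {
    $records[] = array_combine($header, $cells);
}
print_r($records);
```

The unnamed second header cell ends up as an empty-string key, which is acceptable here because the matching data slot is the colspan padding.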
