XPath gets the wrong HTML attribute - PHP

So, I have this PHP scraper code and HTML below that I want to scrape using Xpath.
When I try to scrape every @href, it shows outerHTML 14, when it is supposed to be 14.
The @href value is cut in half where the spaces are. What causes this?
$content = $xpath->query('//a');
foreach ($content as $c) {
    // Print the canonicalized (C14N) markup of each <a> element, escaped for display
    var_dump(htmlspecialchars($c->C14N()));
    echo '<br>';
}
The code above is part of my cURL scraper.
Here is the HTML:
<div class="outercalendar" id="maincalendar821"><table class="calendarHeader">
<tbody><tr>
<td><input type="button" onclick="AjxGetMainCalendarMonth('2', '2015', '821')" value="<"></td>
<td class="calendarHeader" colspan="5">March 2015</td>
<td><input type="button" onclick="AjxGetMainCalendarMonth('4', '2015', '821')" value=">"></td>
</tr>
</tbody></table>
<table class="calendar">
<tbody><tr>
<td class="calendarDay">S</td>
<td class="calendarDay">M</td>
<td class="calendarDay">T</td>
<td class="calendarDay">W</td>
<td class="calendarDay">T</td>
<td class="calendarDay">F</td>
<td class="calendarDay">S</td>
</tr>
<tr>
<td class="calendar">1</td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar">7</td>
</tr>
<tr>
<td class="calendar">8</td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar">14</td>
</tr>
<tr>
<td class="calendar">15</td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar">21</td>
</tr>
<tr>
<td class="calendar">22</td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar">28</td>
</tr>
<tr>
<td class="calendar">29</td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
</tr>
</tbody></table>
</div>

The issue could be in the structure of the information stored in the tag.
I would suggest starting with a more specific XPath expression:
//a/@href
So your initial code would be:
$content = $xpath->query('//a/@href');
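A minimal sketch of the suggested query, assuming the page has already been parsed into a DOMDocument; the HTML string is a hypothetical stand-in for the scraped page. Querying //a/@href returns DOMAttr nodes, and each node's nodeValue holds the complete attribute value:

```php
<?php
// Sketch: select href attribute nodes directly and read their values.
// The HTML below is a hypothetical stand-in for the scraped page.
$html = '<html><body>'
      . '<a href="page one.html">One</a>'
      . '<a href="page two.html">Two</a>'
      . '</body></html>';

libxml_use_internal_errors(true); // silence warnings about imperfect HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$hrefs = [];
foreach ($xpath->query('//a/@href') as $attr) {
    // $attr is a DOMAttr; nodeValue is the whole quoted value, spaces included.
    $hrefs[] = $attr->nodeValue;
}
print_r($hrefs);
```

If a value still comes back cut off at a space, it may be worth checking (an assumption, since the question's HTML sample contains no links) whether the source page quotes its href attributes: in HTML, an unquoted attribute value ends at the first whitespace character, so the parser itself splits it.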

Related

Adding next row fields to previous row in MySQL using PHP

I have a table with a Unix timestamp, userID, Long and Lat. For each row, I would like to copy the Long and Lat values from the following row and add them to that row. Please see below.
<table style="width: 645px;" border="0" cellspacing="0" cellpadding="0">
<colgroup>
<col width="140" />
<col width="57" />
<col span="7" width="64" />
</colgroup>
<tbody>
<tr>
<td class="xl66" width="140" height="21">Unix Time Stamp</td>
<td class="xl66" width="57"> </td>
<td class="xl66" width="64">Long</td>
<td class="xl66" width="64"> </td>
<td class="xl66" width="64">Lat</td>
<td class="xl66" width="64"> </td>
<td class="xl66" width="64"> </td>
<td class="xl66" width="64"> </td>
<td class="xl66" width="64"> </td>
</tr>
<tr>
<td class="xl67" align="right" width="140" height="20">1458119939</td>
<td class="xl77" width="57"> </td>
<td class="xl70" align="right">-26.2004</td>
<td class="xl70"> </td>
<td class="xl70" align="right">28.01277</td>
<td class="xl70"> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td class="xl68" align="right" width="140" height="21">1458119940</td>
<td class="xl77" width="57"> </td>
<td class="xl70" align="right">26.20654</td>
<td class="xl70"> </td>
<td class="xl70" align="right">28.04565</td>
<td class="xl70"> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td class="xl68" align="right" width="140" height="20">1458128756</td>
<td class="xl71" width="57"> </td>
<td class="xl71" align="right" width="64">-29.0065</td>
<td class="xl77" width="64"> </td>
<td class="xl72" align="right" width="64">29.88437</td>
<td class="xl77" width="64"> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td class="xl68" align="right" width="140" height="20">1458128757</td>
<td class="xl71" width="57"> </td>
<td class="xl71" align="right" width="64">-29.0067</td>
<td class="xl77" width="64"> </td>
<td class="xl72" align="right" width="64">29.88465</td>
<td class="xl77" width="64"> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td class="xl68" align="right" width="140" height="20">1442829381</td>
<td class="xl71" width="57"> </td>
<td class="xl71" align="right" width="64">-29.0064</td>
<td class="xl77" width="64"> </td>
<td class="xl72" align="right" width="64">29.88458</td>
<td class="xl77" width="64"> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td class="xl68" align="right" width="140" height="21">1442829397</td>
<td class="xl71" width="57"> </td>
<td class="xl73" align="right" width="64">-29.0062</td>
<td class="xl78" width="64"> </td>
<td class="xl74" align="right" width="64">29.88436</td>
<td class="xl77" width="64"> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td class="xl69" align="right" width="140" height="21">1442830988</td>
<td class="xl73" width="57"> </td>
<td class="xl75" align="right" width="64">-26.2065</td>
<td class="xl79" width="64"> </td>
<td class="xl76" align="right" width="64">28.04565</td>
<td class="xl77" width="64"> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td height="20"> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td height="20"> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td class="xl65" colspan="3" height="20">Query Result needs to look like</td>
<td class="xl65"> </td>
<td class="xl65"> </td>
<td class="xl65"> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td class="xl66" height="21">Unix Time Stamp</td>
<td class="xl66"> </td>
<td class="xl66">Long</td>
<td class="xl66"> </td>
<td class="xl66">Lat</td>
<td class="xl66"> </td>
<td class="xl66">LongB</td>
<td class="xl66"> </td>
<td class="xl66">LatB</td>
</tr>
<tr>
<td class="xl67" align="right" width="140" height="20">1458119939</td>
<td class="xl77" width="57"> </td>
<td class="xl70" align="right">-26.2004</td>
<td class="xl70"> </td>
<td class="xl70" align="right">28.01277</td>
<td class="xl70"> </td>
<td class="xl70" align="right">26.20654</td>
<td class="xl70"> </td>
<td class="xl70" align="right">28.04565</td>
</tr>
<tr>
<td class="xl68" align="right" width="140" height="20">1458119940</td>
<td class="xl77" width="57"> </td>
<td class="xl70" align="right">26.20654</td>
<td class="xl70"> </td>
<td class="xl70" align="right">28.04565</td>
<td class="xl70"> </td>
<td class="xl71" align="right" width="64">-29.0065</td>
<td class="xl77" width="64"> </td>
<td class="xl72" align="right" width="64">29.88437</td>
</tr>
<tr>
<td class="xl68" align="right" width="140" height="20">1458128756</td>
<td class="xl71" width="57"> </td>
<td class="xl71" align="right" width="64">-29.0065</td>
<td class="xl77" width="64"> </td>
<td class="xl72" align="right" width="64">29.88437</td>
<td class="xl77" width="64"> </td>
<td class="xl71" align="right" width="64">-29.0067</td>
<td class="xl77" width="64"> </td>
<td class="xl72" align="right" width="64">29.88465</td>
</tr>
<tr>
<td class="xl68" align="right" width="140" height="20">1458128757</td>
<td class="xl71" width="57"> </td>
<td class="xl71" align="right" width="64">-29.0067</td>
<td class="xl77" width="64"> </td>
<td class="xl72" align="right" width="64">29.88465</td>
<td class="xl77" width="64"> </td>
<td class="xl71" align="right" width="64">-29.0064</td>
<td class="xl77" width="64"> </td>
<td class="xl72" align="right" width="64">29.88458</td>
</tr>
<tr>
<td class="xl68" align="right" width="140" height="21">1442829381</td>
<td class="xl71" width="57"> </td>
<td class="xl71" align="right" width="64">-29.0064</td>
<td class="xl77" width="64"> </td>
<td class="xl72" align="right" width="64">29.88458</td>
<td class="xl77" width="64"> </td>
<td class="xl73" align="right" width="64">-29.0062</td>
<td class="xl78" width="64"> </td>
<td class="xl74" align="right" width="64">29.88436</td>
</tr>
<tr>
<td class="xl68" align="right" width="140" height="21">1442829397</td>
<td class="xl71" width="57"> </td>
<td class="xl73" align="right" width="64">-29.0062</td>
<td class="xl78" width="64"> </td>
<td class="xl74" align="right" width="64">29.88436</td>
<td class="xl78" width="64"> </td>
<td class="xl75" align="right" width="64">-26.2065</td>
<td class="xl79" width="64"> </td>
<td class="xl76" align="right" width="64">28.04565</td>
</tr>
</tbody>
</table>
Your help would be very much appreciated.
SELECT timestamp, userID, `Long`, Lat, `Long` AS LongB, Lat AS LatB FROM your_table
I think you meant column instead of row.
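If the goal really is the following row's values, as in the desired output, one option is to pair the rows in PHP after fetching them in a fixed order. A minimal sketch, assuming the rows are already fetched as associative arrays in the order shown in the question (which is not sorted by timestamp, so the ORDER BY used when fetching is an assumption you would need to settle). The function name addNextRowCoords is hypothetical. On MySQL 8.0 or later, the LEAD() window function can do the same pairing directly in SQL.

```php
<?php
// Sketch: copy the next row's Long/Lat into LongB/LatB for each row.
// The last row has no following row, so it is dropped, matching the
// desired output in the question.
function addNextRowCoords(array $rows): array
{
    $result = [];
    for ($i = 0; $i < count($rows) - 1; $i++) {
        $row = $rows[$i];
        $row['LongB'] = $rows[$i + 1]['Long'];
        $row['LatB']  = $rows[$i + 1]['Lat'];
        $result[] = $row;
    }
    return $result;
}

// Sample rows taken from the question's table (userID omitted).
$rows = [
    ['timestamp' => 1458119939, 'Long' => -26.2004, 'Lat' => 28.01277],
    ['timestamp' => 1458119940, 'Long' => 26.20654, 'Lat' => 28.04565],
    ['timestamp' => 1458128756, 'Long' => -29.0065, 'Lat' => 29.88437],
];
$paired = addNextRowCoords($rows);
print_r($paired);
```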

Getting tags in DOMDocument

I'm trying to get the HTML markup of a table in a page:
$new_dom = new DOMDocument();
$table = '';
$nodesTable = $this->dom->getElementsByTagName("table");
foreach ($nodesTable as $nodeTable) {
    $color = $nodeTable->getAttribute('bordercolordark');
    if ($color == '#73BAFF') {
        $table = $nodeTable;
    }
}
$new_dom->appendChild($table);
echo $new_dom->saveHTML();
Here is somepage.html:
<html>
<table>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
</table>
<table border="1" cellpadding="0" width="500" bordercolorlight="#ACD6FF" bordercolordark="#73BAFF" align="center">
<tr>
<td rowspan="2" colspan="2" bgcolor="#73BAFF"> </td>
<td colspan="3" align="center" bgcolor="#ACD6FF"> Element 1 </td>
<td colspan="3" align="center" bgcolor="#ACD6FF"> Element 2 </td>
</tr>
<tr>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
</tr>
<tr>
<td bgcolor="#ACD6FF" width="155" align="center"> Row 1</td>
<td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
</tr>
<tr>
<td bgcolor="#ACD6FF" width="155" align="center"> Row 2</td>
<td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
</tr>
<tr>
<td bgcolor="#ACD6FF" width="155" align="center"> Row 3</td>
<td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
</tr>
</table>
<table>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
</table>
<table>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
</table>
</html>
$new_dom just outputs \n instead of HTML markup. I tried looking at this answer: PHP DOMDocument stripping HTML tags, but appending the table this way didn't work either.
Fatal error: Uncaught exception 'DOMException' with message 'Wrong Document Error'
So you cannot move nodes from one document to another... If you want to do that, you have to use importNode() with the deep flag.
$dom = new DOMDocument();
$dom->loadHTMLFile('x.html');
$new_dom = new DOMDocument();
$table = null;
$nodesTable = $dom->getElementsByTagName("table");
foreach ($nodesTable as $nodeTable) {
    $color = $nodeTable->getAttribute('bordercolordark');
    if ($color == '#73BAFF') {
        // Deep-copy the table (true = include all descendants) into $new_dom
        $table = $new_dom->importNode($nodeTable, true);
    }
}
// Only append if a matching table was found
if ($table !== null) {
    $new_dom->appendChild($table);
}
echo $new_dom->saveHTML();
Without the deep flag (true as the second argument), importNode() would copy only the table element itself, not its children.
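Note: in your case I'd disable the entity loader with libxml_disable_entity_loader(true); (this function is deprecated as of PHP 8.0, where external entity loading is disabled by default). I am not sure whether XXE (XML external entity) attacks work with loadHTML() too, but it is worth doing for the sake of security.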
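A self-contained sketch of the importNode() approach, loading a trimmed, hypothetical version of somepage.html from a string instead of x.html. Because importNode($node, true) returns a deep copy owned by the new document, appending it no longer raises "Wrong Document Error":

```php
<?php
// Sketch: copy one <table> from a source document into a fresh document.
// The HTML is a trimmed, hypothetical stand-in for somepage.html.
$html = '<html><body>'
      . '<table><tr><td>10</td></tr></table>'
      . '<table bordercolordark="#73BAFF"><tr><td>Element 1</td></tr></table>'
      . '</body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$newDom = new DOMDocument();
foreach ($dom->getElementsByTagName('table') as $node) {
    if ($node->getAttribute('bordercolordark') === '#73BAFF') {
        // Deep import: the element and all of its descendants are copied.
        $newDom->appendChild($newDom->importNode($node, true));
        break; // stop at the first matching table
    }
}
$out = $newDom->saveHTML();
echo $out;
```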

PHP curl. Traversing search results

I am working on a website that lets people search for a product 'x' and displays the results, for example in a table.
I am planning to scrape the search data from another website using PHP cURL (the owner of the website being scraped is aware of this and allows it, so there are no legal issues).
I already have PHP cURL code that logs in to the website and runs a search based on user input. I have no idea how to go through the search results and output them on my website one by one.
PHP curl code:
$username = '********';
$password = '********';
$loginUrl = 'http://www.a-website.com/login.asp';
// Initialise cURL
$ch = curl_init();
// Set the URL to work with
curl_setopt($ch, CURLOPT_URL, $loginUrl);
// Enable HTTP POST
curl_setopt($ch, CURLOPT_POST, 1);
// Set the POST parameters (http_build_query() URL-encodes each value)
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([
    'username' => $username,
    'password' => $password,
    'submit1'  => 'Login',
]));
// Store cookies so the login session is kept for later requests
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie stuff here');
// Setting CURLOPT_RETURNTRANSFER to 1 makes curl_exec() return the
// response body as a string instead of printing it and returning true/false.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Execute the request (the login)
$store = curl_exec($ch);
/* * *****************SEARCH HERE****************** */
curl_setopt($ch, CURLOPT_URL, 'http://www.a-website.com/Index.asp');
// Load the index page with a plain GET (CURLOPT_POST is still set from the login)
curl_setopt($ch, CURLOPT_HTTPGET, true);
$content = curl_exec($ch);
// Switch back to POST and set the search parameters
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([
    'search_txt_vs'           => '',
    'search_txt_UPC'          => '',
    'search_txt_Name'         => $searchString,
    'search_txt_Manufacturer' => '',
    'submit'                  => 'Search',
]));
// Execute the request (the search)
$Search = curl_exec($ch);
print CJSON::encode($Search);
print $Search;
// Output the index page fetched earlier
print $content;
Here is the HTML from the website I'm scraping (which, by the way, uses an old-school table layout):
<td colspan="3" height="100%" valign="top">
<table width="100%" border="0" cellpadding="2" cellspacing="0" bordercolor="#99CCCC" class="text">
<tbody>
<tr bgcolor="#9999CC">
<td align="right" class="calendar">Sort ></td>
<td align="center"> NDC
</td>
<td align="left"> Brand Name
</td>
<td align="center" colspan="2"> Strength
| UD
</td>
<td align="left"> Stock
</td>
<td align="center"> Manufacturer
</td>
<td align="center" bgcolor="cccccc"> AWP
/ Your Price
</td>
</tr>
<tr bgcolor="#9999CC">
<td align="right" class="calendar"> </td>
<td align="center"> UPC
</td>
<td align="left"> Generic Alt/Name
</td>
<td align="center" colspan="2"> Size
| Form
</td>
<td align="left" colspan="3" class="selected">Category</td>
</tr>
<tr bgcolor="eeeeee">
<td align="center" valign="top" rowspan="2">1
<br> <span class="smallNorm_red">[add]</span>
</td>
<td align="center"><span class="smallNorm">00169347718</span>
</td>
<td align="left"><span class="smallNorm_red">NOVOLIN 70/ 30U/ML CRT 5X3 ML</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 70-30 U/ML</span>
</td>
<td align="left"><span class="smallNorm">YES</span>
</td>
<td align="center"><span class="smallNorm">NOVO NORDISK PHARM</span>
</td>
<td align="center"><span class="smallNorm">$
0.01
/ $
0.01
</span>
</td>
</tr>
<tr bgcolor="eeeeee">
<td align="center"><span class="smallNorm">000000000000</span>
</td>
<td align="left"><span class="smallNorm">HUM INSULIN NPH/REG INSULIN HM</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 5X3ML </span>
</td>
<td align="left" colspan="3"><span class="smallNorm">
<a href="#" onclick="return openreturn(19112,0.01021);"><span class="smallNorm_red">[return]</span>
</a>INSULIN</span>
</td>
</tr>
<tr bgcolor="#99CCCC">
<td align="center" valign="top" rowspan="2">2
<br> <span class="smallNorm_red">[add]</span>
</td>
<td align="center"><span class="smallNorm">00169347418</span>
</td>
<td align="left"><span class="smallNorm_red">NOVOLIN N 100 UN/ML CRT 5X3 ML</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 100 U/ML</span>
</td>
<td align="left"><span class="smallNorm">YES</span>
</td>
<td align="center"><span class="smallNorm">NNP</span>
</td>
<td align="center"><span class="smallNorm">$
0.00
/ $
0.01
</span>
</td>
</tr>
<tr bgcolor="#99CCCC">
<td align="center"><span class="smallNorm">000000000000</span>
</td>
<td align="left"><span class="smallNorm">NPH HUMAN INSULIN ISOPHANE</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 5X3ML </span>
</td>
<td align="left" colspan="3"><span class="smallNorm">
<a href="#" onclick="return openreturn(19116,0.012);"><span class="smallNorm_red">[return]</span>
</a>INSULIN</span>
</td>
</tr>
<tr bgcolor="eeeeee">
<td align="center" valign="top" rowspan="2">3
<br> <span class="smallNorm_red">[add]</span>
</td>
<td align="center"><span class="smallNorm">00169231721</span>
</td>
<td align="left"><span class="smallNorm_red">NOVOLIN INNO 70/30 PFS 5X3 ML</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 70-30 U/ML</span>
</td>
<td align="left"><span class="smallNorm">YES</span>
</td>
<td align="center"><span class="smallNorm">NOVO NORDISK PHARM</span>
</td>
<td align="center"><span class="smallNorm">$
0.00
/ $
0.01
</span>
</td>
</tr>
<tr bgcolor="eeeeee">
<td align="center"><span class="smallNorm">000000000000</span>
</td>
<td align="left"><span class="smallNorm">HUM INSULIN NPH/REG INSULIN HM</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 5X3ML </span>
</td>
<td align="left" colspan="3"><span class="smallNorm">
<a href="#" onclick="return openreturn(45211,0.012);"><span class="smallNorm_red">[return]</span>
</a>INSULIN</span>
</td>
</tr>
<tr bgcolor="#99CCCC">
<td align="center" valign="top" rowspan="2">4
<br> <span class="smallNorm_red">[add]</span>
</td>
<td align="center"><span class="smallNorm">00169183311</span>
</td>
<td align="left"><span class="smallNorm_red">NOVOLIN R 100 UN/ML VL 10 ML</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 100 U/ML</span>
</td>
<td align="left"><span class="smallNorm">YES</span>
</td>
<td align="center"><span class="smallNorm">NOVO NORDISK PHARM</span>
</td>
<td align="center"><span class="smallNorm">$
99.00
/ $
82.09
</span>
</td>
</tr>
<tr bgcolor="#99CCCC">
<td align="center"><span class="smallNorm">000169183311</span>
</td>
<td align="left"><span class="smallNorm">INSULIN REGULAR HUMAN</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 10ML </span>
</td>
<td align="left" colspan="3"><span class="smallNorm">
<a href="#" onclick="return openreturn(19117,82.0884);"><span class="smallNorm_red">[return]</span>
</a>INSULIN</span>
</td>
</tr>
<tr bgcolor="eeeeee">
<td align="center" valign="top" rowspan="2">5
<br> <span class="smallNorm_red">[add]</span>
</td>
<td align="center"><span class="smallNorm">00169183711</span>
</td>
<td align="left"><span class="smallNorm_red">NOVOLIN 70/ 30U/ML VL 10 ML</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 70-30 U/ML</span>
</td>
<td align="left"><span class="smallNorm">YES</span>
</td>
<td align="center"><span class="smallNorm">NOVO NORDISK PHARM</span>
</td>
<td align="center"><span class="smallNorm">$
99.00
/ $
82.09
</span>
</td>
</tr>
<tr bgcolor="eeeeee">
<td align="center"><span class="smallNorm">000169183711</span>
</td>
<td align="left"><span class="smallNorm">HUM INSULIN NPH/REG INSULIN HM</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 10ML </span>
</td>
<td align="left" colspan="3"><span class="smallNorm">
<a href="#" onclick="return openreturn(19110,82.0884);"><span class="smallNorm_red">[return]</span>
</a>INSULIN</span>
</td>
</tr>
<tr bgcolor="#99CCCC">
<td align="center" valign="top" rowspan="2">6
<br> <span class="smallNorm_red">[add]</span>
</td>
<td align="center"><span class="smallNorm">00169183411</span>
</td>
<td align="left"><span class="smallNorm_red">NOVOLIN N 100 UN/ML VL 10 ML</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 100 U/ML</span>
</td>
<td align="left"><span class="smallNorm">YES</span>
</td>
<td align="center"><span class="smallNorm">NOVO NORDISK PHARM</span>
</td>
<td align="center"><span class="smallNorm">$
99.00
/ $
82.09
</span>
</td>
</tr>
<tr bgcolor="#99CCCC">
<td align="center"><span class="smallNorm">000000000000</span>
</td>
<td align="left"><span class="smallNorm">NPH HUMAN INSULIN ISOPHANE</span>
</td>
<td align="center" colspan="2"><span class="smallNorm"> 10ML </span>
</td>
<td align="left" colspan="3"><span class="smallNorm">
<a href="#" onclick="return openreturn(19114,82.0884);"><span class="smallNorm_red">[return]</span>
</a>INSULIN</span>
</td>
</tr>
</tbody>
</table>
</td>
You could load the string into a DOMDocument, use getElementsByTagName() to collect the elements you need, and then write them into an array or some other structure you can use. More information here: http://php.net/manual/en/domdocument.getelementsbytagname.php
Also, since you're getting HTML back, a similar question was answered here: PHP parse HTML tags
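A sketch of that idea using DOMXPath, assuming the row layout in the HTML above: each product starts with a <tr> whose first <td> has rowspan="2", followed by the NDC, Brand Name, Strength, Stock, Manufacturer and AWP / Your Price cells. The function name parseResults and the array keys are hypothetical; the sample is trimmed from the first result in the question.

```php
<?php
// Sketch: turn the search-results table into an array of products.
// Only the first <tr> of each product (the one with rowspan="2") is read;
// the second row (UPC, generic name, ...) is skipped.
function parseResults(string $html): array
{
    libxml_use_internal_errors(true); // tolerate the old table markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $products = [];
    foreach ($xpath->query('//tr[td[@rowspan="2"]]') as $tr) {
        $cells = [];
        foreach ($xpath->query('td', $tr) as $td) {
            // Collapse newlines and repeated spaces inside each cell.
            $cells[] = trim(preg_replace('/\s+/', ' ', $td->textContent));
        }
        if (count($cells) < 7) {
            continue; // not a product row
        }
        $products[] = [
            'ndc'          => $cells[1],
            'brand'        => $cells[2],
            'strength'     => $cells[3],
            'stock'        => $cells[4],
            'manufacturer' => $cells[5],
            'price'        => $cells[6],
        ];
    }
    return $products;
}

// Trimmed sample based on the first result in the question.
$sample = '<table>'
    . '<tr><td rowspan="2">1 <br> <span>[add]</span></td>'
    . '<td><span>00169347718</span></td>'
    . '<td><span>NOVOLIN 70/ 30U/ML CRT 5X3 ML</span></td>'
    . '<td colspan="2"><span> 70-30 U/ML</span></td>'
    . '<td><span>YES</span></td>'
    . '<td><span>NOVO NORDISK PHARM</span></td>'
    . "<td><span>\$\n0.01\n/ \$\n0.01\n</span></td></tr>"
    . '<tr><td><span>000000000000</span></td>'
    . '<td><span>HUM INSULIN NPH/REG INSULIN HM</span></td></tr>'
    . '</table>';

$products = parseResults($sample);
print_r($products);
```

With the cURL code in the question, the real input would be the $Search string, and each entry of the returned array can then be written out as one row on your own page.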

Display column data in each table cell

------------------------------------------------------------------------
Quantity   | Unit       | Details of Items | ISSUED DATE | RETURNED   |
------------------------------------------------------------------------
tbl col 1  | tbl col 2  | tbl col 2        | tbl col 3   | tbl col 4  |
------------------------------------------------------------------------
Say I have a table in the database that has 4 columns, and the table above is the one on my website.
How would I display my table's column data in each table cell (<td>)?
Any suggestions, please.
I only have this code so far, as I am new to PHP; I just use Macromedia for the interface.
$myServer = "server";
$myUser = "user";
$myPass = "password";
$myDB = "mssqldb";
//connection to the database
$dbhandle = mssql_connect($myServer, $myUser, $myPass)
or die("Couldn't connect to SQL Server on $myServer");
//select a database to work with
$selected = mssql_select_db($myDB, $dbhandle)
or die("Couldn't open database $myDB");
//declare the SQL statement that will query the database
$query = "SELECT eidnumber ";
$query .= "FROM tablename ";
<table width="935" height="102" border="1">
<tr>
<th width="89" rowspan="2" scope="col">Quantity</th>
<th width="87" rowspan="2" scope="col">Unit</th>
<th width="137" rowspan="2" scope="col">Details of Item(s) Accounted </th>
<th width="221" rowspan="2" scope="col">ISSUED DATE </th>
<th height="51" colspan="3" scope="col">RETURNED</th>
<th width="254" rowspan="2" scope="col">
<p>REMARKS</p> </th>
</tr>
<tr>
<th width="24" height="39" scope="col">C</th>
<th width="18" scope="col">X</th>
<th width="53" scope="col">DATE</th>
</tr>
</table>
<table width="935" border="1">
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
<tr>
<td width="90"> </td>
<td width="86"> </td>
<td width="137"> </td>
<td width="221"> </td>
<td width="24"> </td>
<td width="17"> </td>
<td width="55"> </td>
<td width="253"> </td>
</tr>
</table>
If you're using PDO, which is a better approach:
echo "<table>";
$sql = $this->db->query("SELECT * FROM YOUR_TABLE");
while($row = $sql->fetch(PDO::FETCH_ASSOC))
{
echo "
<tr>
<td>". $row['quantity'] ."</td>
<td>". $row['unit'] ."</td>
<td>". $row['details'] ."</td>
<td>". $row['issued_date'] ."</td>
<td>". $row['returned'] ."</td>
</tr>
";
}
echo "</table>";
Or, if you're using the mysql_ functions, which are deprecated (and removed in PHP 7.0):
echo "<table>";
$sql = mysql_query("SELECT * FROM YOUR_TABLE");
while($row = mysql_fetch_assoc($sql))
{
echo "
<tr>
<td>". $row['quantity'] ."</td>
<td>". $row['unit'] ."</td>
<td>". $row['details'] ."</td>
<td>". $row['issued_date'] ."</td>
<td>". $row['returned'] ."</td>
</tr>
";
}
echo "</table>";
In addition to the above answer, you can also use mysqli. Here's an example:
$result = $mysqli->query($query);
while ($row = $result->fetch_object()) {
    echo "<tr>";
    echo "<td>" . $row->quantity . "</td>";
    // ...etc. for the remaining columns...
    echo "</tr>";
}
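All three answers share the same loop. A minimal, database-independent sketch of it that also escapes each value with htmlspecialchars(), assuming the rows arrive as associative arrays (for example from fetch(PDO::FETCH_ASSOC)); the function name renderRows, the column names and the sample row are hypothetical:

```php
<?php
// Sketch: render associative rows as HTML table rows, escaping every value.
function renderRows(iterable $rows, array $columns): string
{
    $html = '';
    foreach ($rows as $row) {
        $html .= '<tr>';
        foreach ($columns as $col) {
            // Missing or NULL values become an empty cell.
            $value = (string) ($row[$col] ?? '');
            $html .= '<td>' . htmlspecialchars($value, ENT_QUOTES) . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html;
}

$columns = ['quantity', 'unit', 'details', 'issued_date', 'returned'];
$rows = [
    ['quantity' => 2, 'unit' => 'pcs', 'details' => 'Cable <5m>',
     'issued_date' => '2015-03-01', 'returned' => null],
];
echo '<table>' . renderRows($rows, $columns) . '</table>';
```

Passing the PDO statement itself (after setFetchMode(PDO::FETCH_ASSOC)) as $rows should also work, since PDOStatement is iterable.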

How do I extract values from an HTML page stored as a string using cURL?

I am using PHP/cURL to fetch an HTML page into a string, and I then need to extract the following data and plot a graph from it.
The data I want looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Linux (vers 25 March 2009), see www.w3.org" />
<title></title>
</head>
<body>
<table>
<tbody>
<tr>
<td>
<h3>Income</h3>
</td>
</tr>
<tr>
<td>Operating income</td>
<td class="numericalColumn">22,922.00</td>
<td class="numericalColumn">21,507.30</td>
<td class="numericalColumn">17,492.60</td>
<td class="numericalColumn">13,683.90</td>
<td class="numericalColumn">10,227.12</td>
</tr>
<tr>
<td>
<h3>Expenses</h3>
</td>
</tr>
<tr>
<td>Material consumed</td>
<td class="numericalColumn">4,029.40</td>
<td class="numericalColumn">3,442.60</td>
<td class="numericalColumn">2,952.30</td>
<td class="numericalColumn">1,889.00</td>
<td class="numericalColumn">1,367.67</td>
</tr>
<tr>
<td>Manufacturing expenses </td>
<td class="numericalColumn">2,213.20</td>
<td class="numericalColumn">1,841.80</td>
<td class="numericalColumn">299.80</td>
<td class="numericalColumn">120.50</td>
<td class="numericalColumn">1,020.70</td>
</tr>
<tr>
<td>Personnel expenses</td>
<td class="numericalColumn">9,062.80</td>
<td class="numericalColumn">9,249.80</td>
<td class="numericalColumn">7,409.10</td>
<td class="numericalColumn">5,768.20</td>
<td class="numericalColumn">4,279.03</td>
</tr>
<tr>
<td>Selling expenses</td>
<td class="numericalColumn">378.10</td>
<td class="numericalColumn">308.40</td>
<td class="numericalColumn">532.10</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">171.05</td>
</tr>
<tr>
<td>Adminstrative expenses</td>
<td class="numericalColumn">1,737.00</td>
<td class="numericalColumn">1,906.00</td>
<td class="numericalColumn">2,583.70</td>
<td class="numericalColumn">2,651.70</td>
<td class="numericalColumn">904.78</td>
</tr>
<tr>
<td>Expenses capitalised</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
</tr>
<tr>
<td>Cost of sales</td>
<td class="numericalColumn">17,420.50</td>
<td class="numericalColumn">16,748.60</td>
<td class="numericalColumn">13,777.00</td>
<td class="numericalColumn">10,429.40</td>
<td class="numericalColumn">7,743.22</td>
</tr>
<tr>
<td>Operating profit</td>
<td class="numericalColumn">5,501.50</td>
<td class="numericalColumn">4,758.70</td>
<td class="numericalColumn">3,715.60</td>
<td class="numericalColumn">3,254.50</td>
<td class="numericalColumn">2,483.90</td>
</tr>
<tr>
<td>Other recurring income</td>
<td class="numericalColumn">434.20</td>
<td class="numericalColumn">468.20</td>
<td class="numericalColumn">326.90</td>
<td class="numericalColumn">288.70</td>
<td class="numericalColumn">113.59</td>
</tr>
<tr>
<td>Adjusted PBDIT</td>
<td class="numericalColumn">5,935.70</td>
<td class="numericalColumn">5,226.90</td>
<td class="numericalColumn">4,042.50</td>
<td class="numericalColumn">3,543.20</td>
<td class="numericalColumn">2,597.49</td>
</tr>
<tr>
<td>Financial expenses</td>
<td class="numericalColumn">108.40</td>
<td class="numericalColumn">196.80</td>
<td class="numericalColumn">116.80</td>
<td class="numericalColumn">7.20</td>
<td class="numericalColumn">3.13</td>
</tr>
<tr>
<td>Depreciation </td>
<td class="numericalColumn">579.60</td>
<td class="numericalColumn">533.60</td>
<td class="numericalColumn">456.00</td>
<td class="numericalColumn">359.80</td>
<td class="numericalColumn">292.26</td>
</tr>
<tr>
<td>Other write offs</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
</tr>
<tr>
<td>Adjusted PBT</td>
<td class="numericalColumn">5,247.70</td>
<td class="numericalColumn">4,496.50</td>
<td class="numericalColumn">3,469.70</td>
<td class="numericalColumn">3,176.20</td>
<td class="numericalColumn">2,302.10</td>
</tr>
<tr>
<td>Tax charges </td>
<td class="numericalColumn">790.80</td>
<td class="numericalColumn">574.10</td>
<td class="numericalColumn">406.40</td>
<td class="numericalColumn">334.10</td>
<td class="numericalColumn">286.10</td>
</tr>
<tr>
<td>Adjusted PAT</td>
<td class="numericalColumn">4,456.90</td>
<td class="numericalColumn">3,922.40</td>
<td class="numericalColumn">3,063.30</td>
<td class="numericalColumn">2,842.10</td>
<td class="numericalColumn">2,016.00</td>
</tr>
<tr>
<td>Non recurring items</td>
<td class="numericalColumn">441.10</td>
<td class="numericalColumn">-948.60</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">38.33</td>
</tr>
<tr>
<td>Other non cash adjustments</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-33.85</td>
</tr>
<tr>
<td>Reported net profit</td>
<td class="numericalColumn">4,898.00</td>
<td class="numericalColumn">2,973.80</td>
<td class="numericalColumn">3,063.30</td>
<td class="numericalColumn">2,842.10</td>
<td class="numericalColumn">2,020.48</td>
</tr>
<tr>
<td>Earnigs before appropriation</td>
<td class="numericalColumn">4,898.00</td>
<td class="numericalColumn">2,973.80</td>
<td class="numericalColumn">3,063.30</td>
<td class="numericalColumn">2,842.10</td>
<td class="numericalColumn">2,020.48</td>
</tr>
<tr>
<td>Equity dividend</td>
<td class="numericalColumn">880.90</td>
<td class="numericalColumn">586.00</td>
<td class="numericalColumn">876.50</td>
<td class="numericalColumn">873.70</td>
<td class="numericalColumn">712.88</td>
</tr>
<tr>
<td>Preference dividend</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
</tr>
<tr>
<td>Dividend tax</td>
<td class="numericalColumn">128.30</td>
<td class="numericalColumn">99.60</td>
<td class="numericalColumn">148.90</td>
<td class="numericalColumn">126.80</td>
<td class="numericalColumn">99.98</td>
</tr>
<tr>
<td>Retained earnings</td>
<td class="numericalColumn">3,888.80</td>
<td class="numericalColumn">2,288.20</td>
<td class="numericalColumn">2,037.90</td>
<td class="numericalColumn">1,841.60</td>
<td class="numericalColumn">1,207.62</td>
</tr>
</tbody>
</table>
</body>
</html>
I want to extract each row label, such as Manufacturing expenses, together with the values for all the years in that row. How do I go about this?
I found something like preg_match('#<tr><th>(.*)</th> <td><b>price</b></td></tr>#', $content, $match); but that doesn't get the values I want.
If I understood your question correctly, you want something like this. I wrote it myself, so if you need any clarification I'd be happy to help.
Cheers!
You can use libraries like PHP Simple HTML DOM Parser to extract data from HTML/XHTML.
http://simplehtmldom.sourceforge.net/manual.htm
An example:
$pageDom = str_get_html($rawHtmlData);
foreach ($pageDom->find('td') as $tblElem) {
    if (FALSE !== stristr($tblElem->innertext, 'Manufacturing expenses')) {
        // Do stuff
    }
}
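For comparison, a sketch using PHP's built-in DOMDocument and DOMXPath instead of a third-party library, assuming the row layout shown in the question (a label cell followed by cells with class="numericalColumn", and "-" for a missing value). The function name extractRows is hypothetical. It returns an array keyed by row label with one float (or null) per year column, which can then be passed to whatever charting code you use:

```php
<?php
// Sketch: map each row label to its numeric year columns.
function extractRows(string $html): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $data = [];
    // Only rows containing numeric cells (this skips the <h3> section headers).
    foreach ($xpath->query('//tr[td[@class="numericalColumn"]]') as $tr) {
        $label = trim($xpath->evaluate('string(td[1])', $tr));
        $values = [];
        foreach ($xpath->query('td[@class="numericalColumn"]', $tr) as $td) {
            $text = trim($td->textContent);
            // Strip thousands separators; treat "-" as a missing value.
            $values[] = ($text === '-') ? null : (float) str_replace(',', '', $text);
        }
        $data[$label] = $values;
    }
    return $data;
}

// Trimmed sample based on the question's HTML.
$sample = '<table><tbody>'
    . '<tr><td><h3>Expenses</h3></td></tr>'
    . '<tr><td>Manufacturing expenses </td>'
    . '<td class="numericalColumn">2,213.20</td>'
    . '<td class="numericalColumn">1,841.80</td>'
    . '<td class="numericalColumn">-</td></tr>'
    . '</tbody></table>';
$parsed = extractRows($sample);
print_r($parsed);
```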
