Extract table data from HTML - PHP

I need to extract data from an HTML table and save it in a CSV file. Is there an easy way to get the contents of all the table cells in bash or PHP?
This is the HTML:
<html><head> <link rel=STYLESHEET href="/XPIcons/style.css" type="text/css">
<title>Control 20 November 2014</title>
</head>
<body
>
<table width="100%" cellapdding="0" cellspacing="0">
<tr><td WIDTH="100%" class="username">xxxx<br><font color=#A4A6A0>IDIOMABASE</font> </td>
<td> </td><td bgcolor=#FFFFFF rowspan="3" align="right"><img src="/XPIcons/logo.jpg"></td></tr>
<tr><td width="100%" align="top" class="guio"><img src="/XPIcons/guion_verde.jpg"></td></tr>
<tr><td width="100%" align="top" class="title">CONTROL 20 NOVEMBER 2014
<br><font color=#B7D30C size="1px"><SCRIPT LANGUAGE="JavaScript" SRC="/XPIcons /calendar.js"></SCRIPT></font>
</td></tr>
</table>
<P>
<script language="JavaScript">
function doNothing(){
}
function ShowData(){
var obj = "QUALCTRL.ShowDataTD?p_date="+p_date.value;
location.href=obj;
}
</script>
<script language="JavaScript">
function test(formu)
{
error=formu.p_date.value==""?"ErrorDate\n":"";
if (error != "")
alert(error);
else
formu.submit();
}
</script>
<table>
<td>Fecha</td><td>
<input type="text" id="p_date" name="p_date" value="20/11/2014" onblur="Compruebap_fecha(formu.p_date);">
<A HREF="javascript:doNothing()" onClick="var obj=document.getElementById('p_date'); setDateField(obj);top.newWin=window.open('/XPIcons/calendar.html', 'cal', 'dependent=yes, resizable=yes, width=210, height=230, screenX=200, screenY=300, titlebar=no')">
<IMG SRC="/XPIcons/calendar.gif" BORDER=0></A><font size=1>Ver calendario</font>
(dd/mm/YYYY)
</td></table>
<input type="button" class="btn" onClick="javascript:RecarregaPlana();" value="Cambia de Dia >>">
<P>
<table border="1%">
<tr><td class="fila_blanca">Población</td>
<td class="fila_mesgris">MAX</td>
<td class="fila_mesgris">MIN</td>
<td class="fila_menysgris">MASS MAX</td>
<td class="fila_menysgris">MASS MIN</td>
<td class="fila_mesgris">MERGE MAX</td>
<td class="fila_mesgris">MERGE MIN</td>
<td class="fila_menysgris">MOS MAX</td>
<td class="fila_menysgris">MOS MIN</td>
<td class="fila_blanca">DIF MAX</td>
<td class="fila_blanca">DIF MIN</td>
<td class="fila_blanca">DIF MAX MERGE</td>
<td class="fila_blanca">DIF MIN MERGE</td>
<td class="fila_blanca">DIF MAX MOS</td>
<td class="fila_blanca">DIF MIN MOS</td>
</tr>
<tr>
<td class="fila_blanca">Palermo</td>
<td class="fila_mesgris">20</td>
<td class="fila_mesgris">11</td>
<td class="fila_menysgris">21</td>
<td class="fila_menysgris">10</td>
<td class="fila_mesgris">20</td>
<td class="fila_mesgris">17</td>
<td class="fila_menysgris">20</td>
<td class="fila_menysgris">9</td>
<td class="fila_blanca">-1</td>
<td class="fila_blanca">1</td>
<td class="fila_mesgris">0</td>
<td class="fila_mesgris">-6</td>
<td class="fila_menysgris">0</td>
<td class="fila_menysgris">2</td>
</tr>
<tr>
<td class="fila_blanca">Bergamo</td>
<td class="fila_mesgris"></td>
<td class="fila_mesgris"></td>
<td class="fila_menysgris">16</td>
<td class="fila_menysgris">7</td>
<td class="fila_mesgris">17</td>
<td class="fila_mesgris">7</td>
<td class="fila_menysgris">17</td>
<td class="fila_menysgris">7</td>
<td class="fila_blanca"></td>
<td class="fila_blanca"></td>
<td class="fila_mesgris"></td>
<td class="fila_mesgris"></td>
<td class="fila_menysgris"></td>
<td class="fila_menysgris"></td>
</tr>
<tr>
<td class="fila_blanca">Rome</td>
<td class="fila_mesgris"></td>
<td class="fila_mesgris"></td>
<td class="fila_menysgris">19</td>
<td class="fila_menysgris">16</td>
<td class="fila_mesgris">19</td>
<td class="fila_mesgris">14</td>
<td class="fila_menysgris">19</td>
<td class="fila_menysgris">14</td>
<td class="fila_blanca"></td>
<td class="fila_blanca"></td>
<td class="fila_mesgris"></td>
<td class="fila_mesgris"></td>
<td class="fila_menysgris"></td>
<td class="fila_menysgris"></td>
</tr>
</table>
<SCRIPT>
function openSearch() {
window.open('XPSearch.Search', 'XPSearch', 'scrollbars=yes,resizable=yes,toolbar=yes,location=yes,status=yes,width=550,height=500,screenX=550,screenY=500'); }
function doNothing() {
}
</SCRIPT>
<P>
<table width="100%" cellspacing="0">
<tr><td class="pie" width="100%"><b>Menú principal</b> </td><td bgcolor=#FCFCFA><img border="0" src="/XPIcons/search.jpg" ></td> <td><marquee hspace=147></marquee></td></table>
</body></html>
And I would like to get a csv like this:
Población,MAX,MIN,MASS MAX,MASS MIN,MERGE MAX,MERGE MIN,MOS MAX,MOS MIN,DIF MAX,DIF MIN,DIF MAX MERGE,DIF MIN MERGE,DIF MAX MOS,DIF MIN MOS
Palermo,20,11,21,10,20,17,20,9,-1,1,0,-6,0,2
Bergamo,,,16,7,17,7,17,7,,,,,,
Rome,,,19,16,19,14,19,14,,,,,,

Here is one way, using awk:
awk -F'">|<' -v OFS="," '
NF>3 {if (r) {r=r OFS $3} else r=$3}
/tr/ {print r; r=""}' file
For your sample input:
$ awk -F'">|<' -v OFS="," 'NF>3{if (r) {r=r OFS $3} else r=$3} /tr/ {print r; r=""}' a
td class="fila_blanca
MAX,MIN,MASS MAX,MASS MIN,MERGE MAX,MERGE MIN,MOS MAX,MOS MIN,DIF MAX,DIF MIN,DIF MAX MERGE,DIF MIN MERGE,DIF MAX MOS,DIF MIN MOS
Palermo,20,11,21,10,20,17,20,9,-1,1,0,-6,0,2
Bergamo,,,16,7,17,7,17,7,,,,,,
Rome,,,19,16,19,14,19,14,,,,,,
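Explanation
-F'">|<' sets the input field separator to either "> or <. This way the text between the tags ends up in its own field, with no further processing needed.
-v OFS="," sets the output field separator to a comma.
NF>3{if (r) {r=r OFS $3} else r=$3} if the line has more than 3 fields, append the 3rd field to the variable r (or start r with it if r is still empty). This keeps accumulating cell values until a line containing tr is found...
/tr/ {print r; r=""} ...and that is when we print the accumulated row and empty the variable, ready for the next row.
Note that this relies on each <td> being on its own line. The first header cell (Población) shares a line with <tr>, so its 3rd field is the td tag rather than the text. That is why the output above has a stray td class="fila_blanca line and the header row lacks Población.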
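If Node.js is an option, the same idea (collect the cells of a row, then emit one comma-joined line when the row ends) can be sketched without depending on each <td> sitting on its own line. This is only a sketch under assumptions: tableRowsToCsv is a hypothetical helper name, it uses regular expressions rather than a real HTML parser, and it assumes a simple table with no nested tables, so pass it only the markup of the data table.

```javascript
// Convert every <tr> in an HTML fragment into one CSV line.
// Regex-based, so it only suits simple, non-nested tables like the sample.
function tableRowsToCsv(html) {
  // Each match is one complete <tr>...</tr> block, even if it spans lines.
  const rows = html.match(/<tr\b[^>]*>[\s\S]*?<\/tr>/gi) || [];
  return rows
    .map((row) => {
      const cells = [];
      const cellPattern = /<t[dh]\b[^>]*>([\s\S]*?)<\/t[dh]>/gi;
      let match;
      while ((match = cellPattern.exec(row)) !== null) {
        // Drop any inner tags and surrounding whitespace.
        const text = match[1].replace(/<[^>]+>/g, "").trim();
        // Quote the cell if it contains a comma, a quote or a newline.
        const needsQuotes = /[",\n]/.test(text);
        cells.push(needsQuotes ? '"' + text.replace(/"/g, '""') + '"' : text);
      }
      return cells.join(",");
    })
    .join("\n");
}
```

Empty cells are kept as empty fields, so rows such as Bergamo keep the same number of columns as the header.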

Related

Getting value of specific td in selected row

I can get the value of the td at the time the row is selected using
$(this).find('.servid').val()
However, I cannot find a way to get this value later.
<table id="servicetable" class="scroll" style="border: 1px solid #cbcbcb;" align="center">
<tbody>
<tr class="selected">
<td>Service</td>
<td class="servid" value="4004072">72569000</td>
<td class="origin">PAC</td>
<td class="street">60 KENDAL</td>
<td class="city">SANRDINO</td>
<td class="state">CA</td>
<td class="zip">99999</td>
</tr>
<tr>
<td>TelePacific Circuit</td>
<td class="servid" value="5369592">77051900</td>
<td class="origin">TP</td>
<td class="street">819 KAISER</td>
<td class="city">AHEM</td>
<td class="state">CA</td>
<td class="zip">88888</td>
</tr>
</tbody>
</table>
Later, after filling out more of the form, I need to get the val() of the servid td in the selected tr.
I have tried various things, but none of them work:
$('#servicetable .selected > td:nth-child(2)').val();
$('#servicetable').find('.selected > td:nth-child(2)').val();
$('#servicetable').find('tr.selected').find('.servid').val();
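Here is your solution. .val() is meant for form elements such as input, select and textarea, so it does not read the value attribute of a td; use .attr('value') instead: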
alert($('#servicetable tr.selected > td.servid').attr('value'));
One more thing: value is not a valid attribute on a td, so this markup will not validate against the W3C spec. With HTML5 it is recommended to use a data-value attribute instead.
In jQuery you can then read it like this:
alert($('#servicetable tr.selected > td.servid').attr('data-value'));
$('#value1').text($('#servicetable .selected > td:nth-child(2)').attr('value'));
<label id="value1"></label>
IDEAL WAY
Use a data attribute, as shown below:
<td class="servid" data-value="4004072">72569000</td>
$('#value1').text($('#servicetable .selected > td:nth-child(2)').data('value'));
It might be better if you tag the TR with the id you need and then have the "attributes" in the TDs.
You can then parse TDs based on TR
<tr class="selected" id="98989">
<td class="city">Lisbon</td>
.....
var _tr = $("#servicetable .selected");
var _trId = _tr.attr("id");
var _trCity = _tr.find(".city").text();
.....

Splitting HTML without cutting the tags

I've been trying to split a PHP string into chunks of an arbitrary number of characters. However, I'm looking for a way to do so without breaking HTML tags. Here is an example:
$string = 'Section 1:
<table width = "528" border="0" cellpadding="0" cellspacing="0">
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 1 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 2 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 3 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top">• </td> <td valign="top"> Element 4 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 5 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 6 </td></tr>
</table>
Section 2:
<table width = "528" border="0" cellpadding="0" cellspacing="0">
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 7 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 8 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 9 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 10 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 11 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 12 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 13 </td></tr>
</table>';
$charAmount = 450;
$textSplit = array();
while ($string){
array_push($textSplit, substr($string, 0, $charAmount));
$string = substr($string, $charAmount);
}
var_dump($textSplit);
In this case, two tags are broken. I'd like any tag that would be cut at the end of a chunk to move whole into the next chunk, but I have no idea how to do this.
I'm not a PHP person, but I can help with the logic. Just before splitting, search backwards from the split index and check which of these two characters comes first: < or >.
If < is encountered first, you are splitting inside a tag, so skip that position and split before the tag instead.
If > is encountered first, go ahead with the split.
I did this successfully in jQuery some time ago.
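The check described above can be sketched in JavaScript as follows. This is a minimal sketch, not the answerer's code: splitHtmlSafely is a hypothetical name, and it assumes that < and > only appear as tag delimiters (no raw angle brackets in text or attribute values). A tag longer than the limit is still cut, since no valid split point exists for it.

```javascript
// Split `html` into chunks of at most `limit` characters (limit >= 1)
// without cutting through a tag.
// Assumes "<" and ">" only ever appear as tag delimiters.
function splitHtmlSafely(html, limit) {
  const chunks = [];
  let start = 0;
  while (start < html.length) {
    let end = Math.min(start + limit, html.length);
    if (end < html.length) {
      // Look backwards from the split index for the nearest "<" and ">".
      const lastOpen = html.lastIndexOf("<", end - 1);
      const lastClose = html.lastIndexOf(">", end - 1);
      // A "<" nearer than any ">" means the split index is inside a tag,
      // so move the split back to just before that tag. If the tag starts
      // at the beginning of the chunk, it is longer than the limit and
      // the split stays where it is.
      if (lastOpen > lastClose && lastOpen > start) {
        end = lastOpen;
      }
    }
    chunks.push(html.slice(start, end));
    start = end;
  }
  return chunks;
}
```

Joining the chunks gives back the original string, so nothing is lost; chunks are simply shorter than the limit wherever a tag had to be pushed into the next one. The same backward search could be written in PHP with strrpos.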
I have no ideas right now for splitting an HTML string, but for cutting a string to a character limit you could refer to the solution at this link: https://github.com/dhngoc/php-cut-html-string.
This resource may give you more ideas.

jQuery and PHP to grab the contents of a selected row on click of a checkbox

Attached is the PHP and jQuery source code. It first gives undefined in the alert and later displays all the contents of the table. I just need to display the row whose checkbox the user clicks. What is the mistake in my code?
--------------------php---------------------------------------------------------
</script>
<script src="<?=base_url();?>js/calendar.js" type="text/javascript"></script>
<form name="inwardProductList" action="" method="post" >
<table width="100%" border="0" cellpadding="0" cellspacing="0" align="center" class="formtable">
<tr>
<td height="23" colspan="8" align="center" valign="middle" bgcolor="#FFFFFF" class="rows"><b>Cart Display</b></td>
</tr>
<tr>
<td height="66" align="left" valign="top"><table width="99%" id="suppliedtable" border="0" align="center" cellpadding="0" cellspacing="0">
<tr>
<td width="4%" height="43" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Sl.no</strong></td>
<td width="20%" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Product Name</strong></strong></td>
<td width="20%" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Barcode</strong></td>
<td width="8%" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Quantity</strong></strong></td>
<td width="8%" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Select</strong></strong></td>
</tr>
<?
$i=0;
if($productName->num_rows() >0){
foreach($productName->result() as $row ){
$i++;
?>
<tr>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><?=$i;?></td>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><?=$row->product_name?></td>
<input type="hidden" name="product_name<?=$i?>" id="product_name<?=$i?>" class="button" value="<?=$row ->product_name;?>"/>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><?=$row->barcode?></td>
<input type="hidden" name="barcode<?=$i?>" id="barcode<?=$i?>" class="button" value="<?=$row ->barcode;?>"/>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><form><input type="text" name="Quantity<?=$i;?>" id="Quantity<?=$i;?>" /></form></td>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><form>
<input type="checkbox" name="status<?=$i;?>" id="status<?=$i;?>" value="yes" /> <br /></form></td>
</tr>
<? }}else{?>
<tr>
<td height="23" colspan="8" align="center" valign="middle" bgcolor="#FFFFFF" class="rows"><b>Selected product has not been processed yet</b></td>
</tr>
<?}?>
</table></td>
</tr>
<input type="hidden" name="numOflimit" id="numOflimit" class="button" value="<?=$i?>"/>
<tr><td><input type="hidden" name="cart1" id="cart1"></td></tr>
</table>
<form><tr><td align="center" > <button onclick="go()">Submit</button></td> </tr>
<tr> <td id="cart"> </td> </tr>
<div id="test"></div>
</form></form>
</form>
----------------------jquery code----------------------------------------------
for(k=0;k<=9000;k++)
{ //each change
$("#status"+k).change(function () {
var numOflimit = encodeURIComponent($('#numOflimit').val());
//alert(numOflimit);
for(j=0;j<=numOflimit;j++)
{
var product_name = encodeURIComponent($('#product_name'+j).val());
//alert(product_name);
var barcode = encodeURIComponent($('#barcode'+j).val());
var Quantity = encodeURIComponent($('#Quantity'+j).val());
//var unitBag = encodeURIComponent($('#unitBag'+k).val());
//var postData = $("form").serialize();
// alert(postData);
var cart=product_name + barcode + Quantity;
alert(cart);
$('#cart1').val(cart);
}
});
}
Have you got a full working example up somewhere, so we can actually try it? Debugging an undefined error in JavaScript just by looking at a code snippet is not the easiest thing in the world.
You might also want to open the JavaScript error console in Firefox and check the errors tab straight after the alert appears, as that might give you a useful pointer as to where the error is (e.g. you've forgotten to define a variable somewhere). The Firebug plugin may also help: http://getfirebug.com/
Edit: Just noticed that this seems to be a duplicate of another question:
jquery to fetch the contents from a selected row in php

Fetching the particular values from a table onclick of checkbox in php

My table has the following structure.
When I click a particular checkbox, it should fetch the values of all the cells in the selected row. The user can select any number of products. How can I achieve this? Thanks for your time.
Source code:
<?$this->load->view('admin/header');?>
<script type="text/javascript">
function go() {
var frm = document.frm;
var status = inwardProductList['status'];
var statusList = [];
for ( i = 0; i < status.length; i++ ) {
if ( status[i].checked ) {
statusList.push('status=' + status[i].value);
}
}
document.getElementById('test').innerHTML = 'ajaxUrl?' + statusList.join('&');
}
</script>
<script src="<?=base_url();?>js/calendar.js" type="text/javascript"></script>
<form name="inwardProductList" action="" method="post" >
<table width="100%" border="0" cellpadding="0" cellspacing="0" align="center" class="formtable">
<tr>
<td height="23" colspan="8" align="center" valign="middle" bgcolor="#FFFFFF" class="rows"><b>Cart Display</b></td>
</tr>
<tr>
<td height="66" align="left" valign="top"><table width="99%" id="suppliedtable" border="0" align="center" cellpadding="0" cellspacing="0">
<tr>
<td width="4%" height="43" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Sl.no</strong></td>
<td width="20%" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Product Name</strong></strong></td>
<td width="20%" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Barcode</strong></td>
<td width="8%" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Selection</strong></strong></td>
<td width="8%" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Quantity</strong></strong></td>
</tr>
<?
$i=0;
if($productName->num_rows() >0){
foreach($productName->result() as $row ){
$i++;
?>
<tr>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><?=$i;?></td>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><?=$row->product_name?></td>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><?=$row->barcode?></td>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><form>
<input type="checkbox" name="status" value="yes" /> Yes<br /></form></td>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><form>Quantity: <input type="text" name="Qunatity" /></form></td>
</tr>
<? }}else{?>
<tr>
<td height="23" colspan="8" align="center" valign="middle" bgcolor="#FFFFFF" class="rows"><b>Selected product has not been processed yet</b></td>
</tr>
<?}?>
</table></td>
</tr>
</table>
<form><tr><td align="center" > <button onclick="go()">go</button></td> </tr>
<div id="test"></div>
</form></form>
</form>
This could be achieved with jQuery, something like this. It is off the top of my head, so there may be errors.
jQuery:
// Run when any checkbox inside a table row is clicked.
$('tr.table-row input[type=checkbox]').click(function() {
// Find the row that contains the clicked checkbox, then read its cells.
var row = $(this).closest('tr');
var productName = row.find('td.productName').text();
var productBarcode = row.find('td.productBarcode').text();
var productQuantity = row.find('input[name=productQuantity]').val();
});
HTML:
<tr class="table-row">
<td class="productName">My Product</td>
<td class="productBarcode">0389463844</td>
<td class="productSelected"><input type="checkbox" value="yes"></td>
<td class="productQuantity"><input type="text" name="productQuantity"></td>
</tr>

PHP HTML DOM parsing

<table width="100%" cellspacing="0" cellpadding="0" border="0" id="Table4">
<tbody>
<tr>
<td valign="top" class="tx-strong-dgrey">
<a class="anc-noul" href="http://www.example.com/catalog/proddetail.asp?logon=&langid=EN&sku_id=0665000FS10129471&catid=25653">
Apple 8GB 3rd Generation iPod Touch</a></td>
</tr>
<tr>
<td valign="top" class="element-spacer"/>
</tr>
<tr>
<td valign="top" class="tx-normal-grey">
Product detail
<a href="http://www.example.com/catalog/proddetail.asp?logon=&langid=EN&sku_id=0665000FS10129471&catid=25653">
More Info</a></td>
</tr>
<tr>
<td valign="top" class="element-spacer"/>
</tr>
<tr>
<td valign="top" class="tx-normal-red">
<span class="tx-strong-dgrey">Price:</span>
$189.99</td>
</tr>
<tr>
<td valign="top">You save: $9.00 after instant savings</td>
</tr>
<tr>
<td valign="top" class="element-spacer"/>
</tr>
<tr>
<td valign="top" class="tx-normal-grey">
<a href="http://www.example.com/catalog/subclass.asp?catid=25653&logon=&langid=EN">
View similar products</a>
<a href="http://www.example.com/catalog/mfr.asp?man=Apple&catid=19&logon=&langid=EN">
View similar products with same brand</a>
</td></tr>
<tr>
<td valign="top" class="element-spacer"/>
</tr>
</tbody>
</table>
I want to be able to get the $189.99.
echo $ret[0]->find('tr', 4)->plaintext;
This outputs: 'Price: $189.99'
I just need $189.99, not 'Price:'
// plaintext is 'Price: $189.99', so split on the colon and trim the leading space
$exp = explode(":", $ret[0]->find('tr', 4)->plaintext);
$price = trim($exp[1]);
