I'm working on a small webmail client. To embed HTML safely I want to use HTML Purifier (by the way, is that a good idea?).
I tested it with several emails and ran into some problems. One email (from Google) contains something like this:
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="4%">
<td width="92%" style="padding-top:18px; padding-bottom:10px; opacity:0.7">
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tbody>
<td width="30%">
<img style="display:inline-block;" height="26" src="https://www.gstatic.com/local/guides/email/images/photo-impact/googlelogo_light_clr-f040d5d9.png">
<td>
<td width="70%" style="text-align:right">
</td>
</tbody>
</table>
Converts to:
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="4%">
</td><td width="92%" style="padding-top:18px;padding-bottom:10px;opacity:.7;">
</td><td width="30%">
<img style="display:inline-block;" height="26" src="https://www.gstatic.com/local/guides/email/images/photo-impact/googlelogo_light_clr-f040d5d9.png" alt="googlelogo_light_clr-f040d5d9.png">
</td><td>
</td><td width="70%" style="text-align:right;">
</td>
</tr></table>
I don't know why it removes the second <table> tag (it also closes the wrong <td> and removes <tbody>). Is it possible to configure HTML Purifier so that it handles such cases?
I need help getting the data out of a table. It's an internet usage table, and the HTML is below:
<table width="572" border="0" align="center" cellspacing="0">
<tbody><tr valign="top">
<td width="1" class="bgsidelines"></td>
<td width="*" class="bgbottom">
<table summary="" width="100%" border="0" cellpadding="0">
<tbody><tr>
<td width="10" rowspan="2" bgcolor="#CCCCCC"></td>
<td width="443">
<table width="443" height="10" border="0" align="center" cellpadding="8">
<tbody>
<tr>
<td width="100%" class="path"><b>Internet usage</b></td>
</tr>
<tr>
<td class="reg"><!-- Begin yours codes -->
<table width="100%" cellpadding="0" cellspacing="0" border="0">
<tbody><tr>
<table cellpadding="5" cellspacing="1" border="0">
<tbody>
<tr>
<td width="43" bgcolor="#EEEEEE" class="grey"><b><center>MB</center></b>
</td>
<td width="44" bgcolor="#EEEEEE" class="grey"><b><center>GB</center></b>
</td>
<td width="44" bgcolor="#EEEEEE" class="grey"><b><center>MB</center></b>
</td>
<td width="44" bgcolor="#EEEEEE" class="grey"><b><center>GB</center></b>
</td>
<td width="60" bgcolor="#EEEEEE" class="grey"><b><center>MB</center></b>
</td>
<td width="60" bgcolor="#EEEEEE" class="grey"><b><center>GB</center></b>
</td>
</tr>
<tr>
<td bgcolor="#FFFFFF" class="reg" nowrap="nowrap">2017-06-01 to<br>2017-06-18</td>
<td bgcolor="#FFFFFF" align="right" valign="top" class="reg">54815.06</td>
<td bgcolor="#FFFFFF" align="right" valign="top" class="reg">53.53</td>
<td bgcolor="#FFFFFF" align="right" valign="top" class="reg">52114.59</td>
<td bgcolor="#FFFFFF" align="right" valign="top" class="reg">50.89</td>
<td bgcolor="#FFFFFF" align="right" valign="top" class="reg">106929.65</td>
<td bgcolor="#FFFFFF" align="right" valign="top" class="reg">104.42</td>
</tr>
</tbody></table></td></tr>
</tbody></table>
<!-- End yours codes -->
</tr>
</tbody></table></td></tr>
</tbody></table></td></tr>
</tbody></table>
My approach works only some of the time, which I suspect is due to the user agent. It also fetches the entire table, whereas I would like each of the internet usage values separately, the ones in the td class="reg" cells (54815.06, 53.53, ...). It's hard because there is a table inside a table.
My PHP :
require_once 'advanced_html_dom.php';
$numvl = $_POST['numvl'];
$url = 'https://extranet.videotron.com/services/secur/extranet/tpia/Usage.do?compteInternet='.$numvl;
$html = new AdvancedHtmlDom();
$html->load_file($url);
$element = $html->find("tr");
echo $element[1]->innertext;
There's no need for an external library (advanced_html_dom.php? I've never heard of it); just use PHP's DOMDocument and DOMXPath.
example:
<?php
declare(strict_types=1);
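// Parse the HTML; @ suppresses the libxml warnings raised by the invalid markup.
$domd=new DOMDocument();
@$domd->loadHTML(getHTML());
$xpath=new DOMXPath($domd);
// Select only the usage cells: <td valign="top" class="reg">.
foreach($xpath->query("//td[@valign='top' and @class='reg']") as $ele){
var_dump($ele->textContent);
}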
function getHTML():string{
$html=<<<'HTML'
<table width="572" border="0" align="center" cellspacing="0">
<tbody><tr valign="top">
<td width="1" class="bgsidelines"></td>
<td width="*" class="bgbottom">
<table summary="" width="100%" border="0" cellpadding="0">
<tbody><tr>
<td width="10" rowspan="2" bgcolor="#CCCCCC"></td>
<td width="443">
<table width="443" height="10" border="0" align="center" cellpadding="8">
<tbody>
<tr>
<td width="100%" class="path"><b>Internet usage</b></td>
</tr>
<tr>
<td class="reg"><!-- Begin yours codes -->
<table width="100%" cellpadding="0" cellspacing="0" border="0">
<tbody><tr>
<table cellpadding="5" cellspacing="1" border="0">
<tbody>
<tr>
<td width="43" bgcolor="#EEEEEE" class="grey"><b><center>MB</center></b>
</td>
<td width="44" bgcolor="#EEEEEE" class="grey"><b><center>GB</center></b>
</td>
<td width="44" bgcolor="#EEEEEE" class="grey"><b><center>MB</center></b>
</td>
<td width="44" bgcolor="#EEEEEE" class="grey"><b><center>GB</center></b>
</td>
<td width="60" bgcolor="#EEEEEE" class="grey"><b><center>MB</center></b>
</td>
<td width="60" bgcolor="#EEEEEE" class="grey"><b><center>GB</center></b>
</td>
</tr>
<tr>
<td bgcolor="#FFFFFF" class="reg" nowrap="nowrap">2017-06-01 to<br>2017-06-18</td>
<td bgcolor="#FFFFFF" align="right" valign="top" class="reg">54815.06</td>
<td bgcolor="#FFFFFF" align="right" valign="top" class="reg">53.53</td>
<td bgcolor="#FFFFFF" align="right" valign="top" class="reg">52114.59</td>
<td bgcolor="#FFFFFF" align="right" valign="top" class="reg">50.89</td>
<td bgcolor="#FFFFFF" align="right" valign="top" class="reg">106929.65</td>
<td bgcolor="#FFFFFF" align="right" valign="top" class="reg">104.42</td>
</tr>
</tbody></table></td></tr>
</tbody></table>
<!-- End yours codes -->
</tr>
</tbody></table></td></tr>
</tbody></table></td></tr>
</tbody></table>
HTML;
return $html;
}
output:
string(8) "54815.06"
string(5) "53.53"
string(8) "52114.59"
string(5) "50.89"
string(9) "106929.65"
string(6) "104.42"
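If the separate values then need to be grouped, a minimal sketch could look like the following. It assumes the cells alternate MB, GB in the same order as the header row; the array keys `mb` and `gb` are illustrative names, not something the page provides.

```php
<?php
declare(strict_types=1);

// The six strings printed above, in document order.
$cells = ['54815.06', '53.53', '52114.59', '50.89', '106929.65', '104.42'];

// Assumption: the columns alternate MB, GB, matching the header row.
// array_chunk() groups the flat list into pairs; each pair is cast to floats.
$pairs = array_map(
    static fn(array $pair): array => [
        'mb' => (float) $pair[0],
        'gb' => (float) $pair[1],
    ],
    array_chunk($cells, 2)
);

print_r($pairs);
```

What each of the three pairs stands for is not stated in the markup shown, so the sketch leaves them unlabeled.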
I need help parsing HTML with PHP DOM.
This is a small part of a much larger HTML document:
<table width="100%" border="0" align="center" cellspacing="3" cellpadding="0" bgcolor='#ffffff'>
<tr>
<td align="left" valign="top" width="20%">
<span class="tl">Obchodne meno:</span>
</td>
<td align="left" width="80%">
<table width="100%" border="0">
<tr>
<td width="67%">
<span class='ra'>STORE BUSSINES</span>
</td>
<td width="33%" valign='top'>
<span class='ra'>(od: 02.10.2012)</span>
</td>
</tr>
</table>
</td>
</tr>
</table>
What I need is to get the text "STORE BUSSINES". Unfortunately, the only thing I can reliably match is "Obchodne meno:", the content of the first <span>, so starting from that element I need to get its parent->parent->first sibling->child->child->child->child->content. I have limited experience with parsing HTML in PHP, so any help will be valuable. Thanks in advance!
Use the DOMDocument class: loop through the <span> tags and collect their text in an array.
<?php
$html=<<<XCOE
<table width="100%" border="0" align="center" cellspacing="3" cellpadding="0" bgcolor='#ffffff'>
<tr>
<td align="left" valign="top" width="20%">
<span class="tl">Obchodne meno:</span>
</td>
<td align="left" width="80%">
<table width="100%" border="0">
<tr>
<td width="67%">
<span class='ra'>STORE BUSSINES</span>
</td>
<td width="33%" valign='top'>
<span class='ra'>(od: 02.10.2012)</span>
</td>
</tr>
</table>
</td>
</tr>
</table>
XCOE;
$dom = new DOMDocument;
$dom->loadHTML($html);
// Collect the text of every <span>, in document order.
$spanarr = [];
foreach ($dom->getElementsByTagName('span') as $tag) {
$spanarr[]=$tag->nodeValue;
}
echo $spanarr[1]; // prints "STORE BUSSINES"
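Relying on the span's position breaks as soon as the page gains or loses a <span>. Since the question says the label "Obchodne meno:" is the one thing that can be matched, an alternative is to anchor an XPath query on that label. This is only a sketch: `findValueByLabel` is a hypothetical helper name, the markup is a shortened copy of the question's, and the label is assumed not to contain a single quote.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: return the text of the first span.ra in the cell
// that follows the cell holding the given label, or null if none is found.
function findValueByLabel(string $html, string $label): ?string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Start at the label span, climb to its <td>, step to the next <td>,
    // then look for spans with class "ra" anywhere inside that cell.
    // Assumption: $label contains no single quote.
    $query = "//span[normalize-space()='" . $label . "']"
           . "/ancestor::td[1]/following-sibling::td[1]"
           . "//span[@class='ra']";

    $nodes = $xpath->query($query);
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

// Shortened copy of the markup from the question.
$html = <<<'HTML'
<table>
<tr>
<td><span class="tl">Obchodne meno:</span></td>
<td>
<table>
<tr>
<td><span class='ra'>STORE BUSSINES</span></td>
<td><span class='ra'>(od: 02.10.2012)</span></td>
</tr>
</table>
</td>
</tr>
</table>
HTML;

$name = findValueByLabel($html, 'Obchodne meno:');
echo $name, "\n";
```

The query returns both `ra` spans of the neighbouring cell in document order, so `item(1)` would give the "(od: ...)" date if that is needed as well.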
I'm in the process of creating a script for our internal customer support system. I want to collect emails from our IMAP inbox (hosted on Gmail) and parse the emails into the database.
What is the best way to clean frames, badly coded tags, and messy formatting so the result is a clean text with minimal formatting?
I'm aware regular expressions will most likely play a large part, but I want to know whether this functionality already exists in a library that I'm missing.
Edit: More specifically, what needs to be removed:
All inline CSS/styling, and all HTML except simple formatting like bold, underline, and italics.
Here's an email I'm using as a test case. It's a fairly beefy spam email I got from ZoneAlarm; it's got a bit of everything.
<td>
<br>
<br>
<table align="center" bgcolor="#749FD0" border="0" cellpadding="0" cellspacing="0" style="font-family:Arial,Helvetica,sans-serif;font-size:12px;line-height:16px;color:#555555" valign="top" width="700">
<tbody>
<tr>
<td>
<table align="center" border="0" cellpadding="0" cellspacing="0" valign="top" width="680">
<tbody>
<tr>
<td height="10">
<img border="0" height="1" src="http://download.zonealarm.com/bin/images/email/socialguard/spacer.gif" style="display: block; max-width: 2880px;" width="1"></td>
</tr>
</tbody>
</table>
<table align="center" border="0" cellpadding="0" cellspacing="0" valign="top" width="680">
<tbody>
<tr>
<td height="10" width="10">
<img border="0" height="10" src="http://www.zonealarm.com/email/campaigns/2013/2013_06_SummerSale/nw.png" style="display: block; max-width: 2880px;" width="10"></td>
<td bgcolor="#E3ECEC" height="10" width="660">
<img alt="ZoneAlarm by Check Point Software Technologies LTD." border="0" src="http://www.zonealarm.com/email/campaigns/2013/2013_05_MemorialDay/za_transparent.png" width="120" style="display: block; max-width: 2880px;" title="ZoneAlarm by Check Point Software Technologies LTD."></td>
<td align="right" style="font-family:Arial,Helvetica,sans-serif" width="150">
<span style="color:#999999;font-size:12px">Connect with ZoneAlarm</span></td>
<td align="right" valign="middle" width="125">
<img alt="ZoneAlarm Facebook" border="0" src="http://www.zonealarm.com/email/campaigns/2013/2013_05_MemorialDay/facebook.png" width="22" title="ZoneAlarm Facebook" style="max-width: 2880px;"> <img alt="ZoneAlarm Twitter" border="0" width="22" src="http://www.zonealarm.com/email/campaigns/2013/2013_05_MemorialDay/twitter.png" title="ZoneAlarm Twitter" style="max-width: 2880px;"> <img alt="ZoneAlarm YouTube" border="0" src="http://www.zonealarm.com/email/campaigns/2013/2013_05_MemorialDay/youtube.png" title="ZoneAlarm YouTube" height="22" style="max-width: 2880px;"><img border="0" height="15" src="http://download.zonealarm.com/bin/images/email/socialguard/spacer.gif" width="10" style="max-width: 2880px;"></td>
<td bgcolor="#E3ECEC" rowspan="6" align="center" valign="top" width="1">
<img align="right" height="32" src="http://download.zonealarm.com/bin/images/emails/welcome/borderx1.png" width="1" style="max-width: 2880px;">
</td>
</tr>
</tbody>
</table>
<table align="center" border="0" cellpadding="0" cellspacing="0" valign="top" width="680">
<tbody>
<tr>
<td height="10" width="10">
<img border="0" height="10" src="http://www.zonealarm.com/email/campaigns/2013/2013_06_SummerSale/sw.png" style="display: block; max-width: 2880px;" width="10"></td>
<td bgcolor="#E3ECEC" height="10" width="660">
You can use HTML Purifier for this, see: http://htmlpurifier.org/
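For comparison, here is a rough sketch of the stated requirement (keep only bold, underline, and italics, drop all styling) that does not use HTML Purifier at all, only PHP's built-in strip_tags() and preg_replace(). The function name `keepSimpleFormatting` is made up for illustration. It assumes attribute values never contain a ">" character, and it does not remove the text inside <style> or <script> elements, so it is a starting point for tidying text rather than a replacement for a real sanitizer.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of HTML Purifier): keep only <b>, <i>, <u>
// and drop every attribute, including inline style, from the kept tags.
function keepSimpleFormatting(string $html): string
{
    // strip_tags() removes all tags except the ones listed here.
    $text = strip_tags($html, '<b><i><u>');

    // Drop attributes from the kept opening tags: <b style="..."> becomes <b>.
    // Assumption: attribute values never contain a ">" character.
    $text = preg_replace('/<(b|i|u)\b[^>]*>/i', '<$1>', $text) ?? $text;

    // Collapse the whitespace left behind by the removed layout tables.
    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}

$sample = '<td align="right" style="font-family:Arial"><span style="color:#999999">'
        . 'Connect with <b style="font-size:12px">ZoneAlarm</b></span></td>';

echo keepSimpleFormatting($sample), "\n";
```

For mail from untrusted senders that will be shown in a browser, a whitelist-based library such as the one recommended above is the safer choice.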
On one of my old sites, which has pretty messy and outdated code, I am having problems with the navigation menu in Chrome. It aligns perfectly in Firefox and IE, but for some reason in Chrome only the first 3 tabs get properly centered.
http://jsfiddle.net/8b2Cm/1/
<table width="765" border="0" align="center" cellpadding="0" cellspacing="0" bgcolor="#FFFFFF">
<td valign="top">
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr valign="top">
<td width="19%"><img src="http://LINK" alt="" width="331" height="95" border="0"></td>
<td width="81%"><img src="http://LINK/images/logo.jpg" alt="" width="434" height="95"></td>
</tr>
</table>
</td>
<td valign="top" class="back1"><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr valign="top">
<td width="1%"><img src="http://LINK/images/left-top.jpg" width="23" height="30" alt=""></td>
<td width="98%" valign="middle"><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="90%" class="left-text11"><div align="center"> Home</div></td>
<td width="10%"><img src="http://LINK/images/line1.jpg" width="8" height="30" alt=""></td>
</tr>
</table></td>
<td><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="90%" class="left-text11"><div align="center" class="left-text11">
<? if(!$_SESSION['sbprj_userid'])
{
?><strong>Signup</strong>
<?
}else
{
?>My Account
<?
}
?></div></td>
<td width="10%"><img src="http://LINK/images/line1.jpg" width="8" height="30" alt=""></td>
</tr>
</table></td>
<td><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="center" width="90%" class="left-text11"><div align="center">Free Poker Money </div></td>
</tr>
</table></td>
<td><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="10%"><img src="http://LINK/images/line1.jpg" width="8" height="30" alt=""></td>
</tr>
</table></td>
<td><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="center" width="90%" class="left-text11"><div align="center">Poker School</div></td>
</tr>
</table></td>
<td><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="10%"><img src="http://LINK/images/line1.jpg" width="8" height="30" alt=""></td>
</tr>
</table></td>
<td class="left-text11"><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="90%" class="left-text11"><div align="center">News </div></td>
<td width="10%"><img src="http://LINK/images/line1.jpg" width="8" height="30" alt=""></td>
</tr>
</table></td>
<td class="left-text11"><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="90%" class="left-text11"><div align="center">Support </div></td>
<td width="10%"><img src="http://LINK/images/line1.jpg" width="8" height="30" alt=""></td>
</tr>
</table></td>
<td class="left-text11"><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="90%" class="left-text11"><div align="center"><img src="http://LINK/images/facebook.png" border="0" width="28" height="25" /> <img src="http://LINK/images/twitter.png" border="0" width="28" height="25" /> <img src="http://LINK/images/googleplus.png" border="0" width="28" height="25" /></div></td>
</tr>
</table></td>
</tr>
</table></td>
<td width="1%"><img src="http://LINK/images/right-top.jpg" width="24" height="30" alt=""></td>
</tr>
</table></td>
<td valign="top"><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr valign="top">
<td width="1%"><img src="http://LINK/images/top-1.jpg" width="23" height="17" alt=""></td>
<td width="98%" class="back2"><img src="http://LINK/images/back2.jpg" width="9" height="17" alt=""></td>
<td width="1%"><img src="http://LINK/images/right-1.jpg" width="24" height="17" alt=""></td>
</tr>
</table></td>
This is the code. Any suggestions on how to fix this?
The separators are inconsistent.
Some separators sit inside the tab's own table cell, while others are in a separate column of their own and therefore take up as much space as a tab column. Check them and move them into the tab cells, as is done in the first two tabs. (By the way, you should put only the horizontal table with the tabs into the fiddle and the question, not the whole page; the extra markup makes it harder for us to understand.)
Attached is the source code for the PHP and jQuery. It shows undefined in the alert, and later displays all the contents of the table. I need to display only the row whose checkbox the user clicks. What is the mistake in my code?
--------------------php---------------------------------------------------------
</script>
<script src="<?=base_url();?>js/calendar.js" type="text/javascript"></script>
<form name="inwardProductList" action="" method="post" >
<table width="100%" border="0" cellpadding="0" cellspacing="0" align="center" class="formtable">
<tr>
<td height="23" colspan="8" align="center" valign="middle" bgcolor="#FFFFFF" class="rows"><b>Cart Display</b></td>
</tr>
<tr>
<td height="66" align="left" valign="top"><table width="99%" id="suppliedtable" border="0" align="center" cellpadding="0" cellspacing="0">
<tr>
<td width="4%" height="43" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Sl.no</strong></td>
<td width="20%" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Product Name</strong></strong></td>
<td width="20%" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Barcode</strong></td>
<td width="8%" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Quantity</strong></strong></td>
<td width="8%" align="center" valign="middle" bgcolor="#e7e6e6" class="rows"><strong>Select</strong></strong></td>
</tr>
<?
$i=0;
if($productName->num_rows() >0){
foreach($productName->result() as $row ){
$i++;
?>
<tr>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><?=$i;?></td>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><?=$row->product_name?></td>
<input type="hidden" name="product_name<?=$i?>" id="product_name<?=$i?>" class="button" value="<?=$row ->product_name;?>"/>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><?=$row->barcode?></td>
<input type="hidden" name="barcode<?=$i?>" id="barcode<?=$i?>" class="button" value="<?=$row ->barcode;?>"/>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><form><input type="text" name="Quantity<?=$i;?>" id="Quantity<?=$i;?>" /></form></td>
<td align="left" valign="middle" bgcolor="#FFFFFF" class="rows"><form>
<input type="checkbox" name="status<?=$i;?>" id="status<?=$i;?>" value="yes" /> <br /></form></td>
</tr>
<? }}else{?>
<tr>
<td height="23" colspan="8" align="center" valign="middle" bgcolor="#FFFFFF" class="rows"><b>Selected product has not been processed yet</b></td>
</tr>
<?}?>
</table></td>
</tr>
<input type="hidden" name="numOflimit" id="numOflimit" class="button" value="<?=$i?>"/>
<tr><td><input type="hidden" name="cart1" id="cart1"></td></tr>
</table>
<form><tr><td align="center" > <button onclick="go()">Submit</button></td> </tr>
<tr> <td id="cart"> </td> </tr>
<div id="test"></div>
</form></form>
</form>
----------------------jquery code----------------------------------------------
for(k=0;k<=9000;k++)
{ //each change
$("#status"+k).change(function () {
var numOflimit = encodeURIComponent($('#numOflimit').val());
//alert(numOflimit);
for(j=0;j<=numOflimit;j++)
{
var product_name = encodeURIComponent($('#product_name'+j).val());
//alert(product_name);
var barcode = encodeURIComponent($('#barcode'+j).val());
var Quantity = encodeURIComponent($('#Quantity'+j).val());
//var unitBag = encodeURIComponent($('#unitBag'+k).val());
//var postData = $("form").serialize();
// alert(postData);
var cart=product_name + barcode + Quantity;
alert(cart);
$('#cart1').val(cart);
}
});
}
Have you got a full working example up somewhere, so we can actually try it? Debugging an undefined error in JavaScript just by looking at a code snippet is not the easiest thing in the world.
You might also want to open the JavaScript error console in Firefox and check the errors tab straight after the alert appears, as that might give you a useful pointer as to where the error is (e.g. you've forgotten to define a variable somewhere). The Firebug plugin may also help: http://getfirebug.com/
Edit: Just noticed that this seems to be a duplicate of another question:
jquery to fetch the contents from a selected row in php