mPDF: unexpected bold text - PHP

Here is the code snippet I used to generate a PDF with mPDF (v8.0).
Update (30/03/2022): added my mPDF settings and the stylesheet I used. I found that the "overflow:hidden" rule on tables in the CSS triggers the problem.
The catch is that if I don't set "overflow:hidden" on the tables, then whenever the text inside a table is long, the font size becomes smaller so that the table fits on the page.
$config = [
'mode' => 'c',
'format' => "A4",
'default_font' => 'arial',
'orientation' => "P",
];
$htmlContent = '<html>
<head>
<style>
@page {
margin-left: 12.7mm;
margin-right: 12.7mm;
margin-top: 20mm;
margin-bottom: 20mm;
margin-header: 5mm;
margin-footer: 5mm; /* <any of the usual CSS values for margins> */
marks: none;
}
table{
width:100%;
overflow:hidden;
}
</style>
</head>
<body>
<div class="page_holder">
<p>25/03/2022</p>
<p>Dear Dr Jayden,</p>
<p>Thank you for agreeing to undertake respirable crystalline silica health monitoring for the following worker.</p>
<table class="wp-block-advgb-table advgb-table-frontend is-style-padding ">
<tbody>
<tr>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff">Jayden
</td>
</tr>
<tr>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff"><strong>Date of Birth</strong></td>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff">31/12/1989
</td>
</tr>
<tr>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff"><strong>Description of tasks this worker will complete with engineered stone
fabrication</strong></td>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff">Final Task
</td>
</tr>
<tr>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff"><strong>History working with engineered stone</strong></td>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff">History
</td>
</tr>
</tbody>
</table>
<p><strong>If yes, please list previous work history with engineered stone:</strong></p>
<table class="wp-block-advgb-table advgb-table-frontend is-style-padding " style="table-layout: fixed">
<tbody>
<tr>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff">Workers Name</td>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff">Jayden
</td>
</tr>
<tr>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff">Date of Birth</td>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff">31/12/1989
</td>
</tr>
<tr>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff">Description of tasks this worker will complete with engineered stone
fabrication</td>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff">Final Task
</td>
</tr>
<tr>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff"><strong>History working with engineered stone</strong></td>
<td style="background-color:#f4f4f4;border-width:2px;border-top-color:#ffffff;border-right-color:#ffffff;border-bottom-color:#ffffff;border-left-color:#ffffff"
data-border-color="#ffffff">History
</td>
</tr>
</tbody>
</table>
<p>I confirm that the minimum health monitoring required has been identified in the attached document; WHSQ Health
monitoring standard - crystalline silica. Upon completion of the health monitoring could you please provide a report
for this worker that at a minimum contains the information outlined below.<br>Within the assessment, my business
requires a level of health monitoring that includes:</p>
<ul>
<li>Demographic, medical and occupational history</li>
<li>Records of personal exposure</li>
<li>Standardised respiratory questionnaire</li>
<li>Standardised respiratory function test, including FEV1, FVC, FEV1/FVC - it is strongly recommended this testing
be undertaken by an accredited respiratory function laboratory and include testing of diffusing capacity.
</li>
<li>Chest X-ray full-size PA view - it is strongly recommended an ILO X-ray be undertaken to allow for reading by a
B-reader.
</li>
</ul>
<p>Please include a confirmation in your report that all requirements of the standard have been met.</p>
</div>
</body>
</html>';
$mpdf = new \Mpdf\Mpdf($config);
$mpdf->WriteHTML($htmlContent);
$mpdf->Output();
I don't understand why the text after the second table is unexpectedly rendered in bold.
I have made a number of attempts at editing the HTML, and I can reproduce the problem whenever I put a piece of bold text between two tables. How can I keep bold text between two tables while the text after the second table stays normal?

Related

How to resolve the min/max width error in dompdf while generating a PDF?

I'm getting the following error in my code while converting to PDF.
There's no inline-block statement included, and a width is defined for every table header, but the issue persists.
<?php
//print_invoice.php
if(isset($_GET["pdf"]) && isset($_GET["id"]))
{
require_once 'pdf.php';
include('connection2.php');
$output = '';
$statement = $connect->prepare("
SELECT * FROM POrder
WHERE order_id = :order_id
LIMIT 1
");
$statement->execute(
array(
':order_id' => $_GET["id"]
)
);
$result = $statement->fetchAll();
foreach($result as $row)
{
$output .= '
<table width="100%" border="1" cellpadding="5" cellspacing="0">
<tr>
<td colspan="2" align="center" style="font-size:18px"><b>Invoice</b></td>
</tr>
<tr>
<td colspan="2">
<table width="100%" cellpadding="5">
<tr>
<td width="65%">
To,<br />
<b>Vendors Name</b><br />
Name : '.$row["vendorname"].'<br />
Description : '.$row["description"].'<br />
</td>
<td width="35%">
Reverse Charge<br />
Invoice No. : '.$row["order_no"].'<br />
Invoice Date : '.$row["order_date"].'<br />
</td>
</tr>
</table>
<br />
<table width="100%" border="1" cellpadding="5" cellspacing="0">
<tr>
<th>Sr No.</th>
<th>Item Name</th>
<th>Quantity</th>
<th>Price</th>
<th>Actual Amt.</th>
<th colspan="2">GST (%)</th>
<th rowspan="2">Total</th>
</tr>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th>Rate</th>
<th>Amt.</th>
</tr>';
$statement = $connect->prepare(
"SELECT * FROM POrder_item
WHERE order_id = :order_id"
);
$statement->execute(
array(
':order_id' => $_GET["id"]
)
);
$item_result = $statement->fetchAll();
$count = 0;
foreach($item_result as $sub_row)
{
$count++;
$output .= '
<tr>
<td>'.$count.'</td>
<td>'.$sub_row["item_name"].'</td>
<td>'.$sub_row["item_quantity"].'</td>
<td>'.$sub_row["item_price"].'</td>
<td>'.$sub_row["item_price_bt"].'</td>
<td>'.$sub_row["item_gst"].'</td>
<td>'.$sub_row["item_price_at"].'</td>
<td>'.$sub_row["final_amount"].'</td>
</tr>
';
}
$output .= '
<tr>
<td align="right" colspan="11"><b>Total</b></td>
<td align="right"><b>'.$row["total_after_tax"].'</b></td>
</tr>
<tr>
<td colspan="11"><b>Total Amt. Before Tax :</b></td>
<td align="right">'.$row["total_before_tax"].'</td>
</tr>
<tr>
<td colspan="11">Add : GST :</td>
<td align="right">'.$row["gst"].'</td>
</tr>
<td colspan="11"><b>Total Tax Amt. :</b></td>
<td align="right">'.$row["order_total_tax"].'</td>
</tr>
<tr>
<td colspan="11"><b>Total Amt. After Tax :</b></td>
<td align="right">'.$row["total_after_tax"].'</td>
</tr>
';
$output .= '
</table>
</td>
</tr>
</table>
';
}
$pdf = new Pdf();
$file_name = 'Invoice-'.$row["order_no"].'.pdf';
$pdf->loadHtml($output);
$pdf->render();
$pdf->stream($file_name, array("Attachment" => false));
}
?>
// pdf.php
<?php
require_once 'dompdf/autoload.inc.php';
use Dompdf\Dompdf;
class Pdf extends Dompdf{
public function __construct() {
parent::__construct();
}
}
?>
I expect to get a PDF, but instead I get this error:
Fatal error: Uncaught exception 'Dompdf\Exception' with message
'Min/max width is undefined for table rows' in
/Applications/XAMPP/xamppfiles/htdocs/NTPC/dompdf/src/FrameReflower/TableRow.php:72
Stack trace: #0
/Applications/XAMPP/xamppfiles/htdocs/NTPC/dompdf/src/FrameDecorator/AbstractFrameDecorator.php(903):
Dompdf\FrameReflower\TableRow->get_min_max_width() #1
/Applications/XAMPP/xamppfiles/htdocs/NTPC/dompdf/src/FrameReflower/AbstractFrameReflower.php(268):
Dompdf\FrameDecorator\AbstractFrameDecorator->get_min_max_width() #2
/Applications/XAMPP/xamppfiles/htdocs/NTPC/dompdf/src/FrameDecorator/AbstractFrameDecorator.php(903):
Dompdf\FrameReflower\AbstractFrameReflower->get_min_max_width() #3
/Applications/XAMPP/xamppfiles/htdocs/NTPC/dompdf/src/FrameReflower/AbstractFrameReflower.php(268):
Dompdf\FrameDecorator\AbstractFrameDecorator->get_min_max_width() #4
/Applications/XAMPP/xamppfiles/htdocs/NTPC/dompdf/src/FrameDecorator/AbstractFrameDecorator.php(903):
Dompdf\FrameReflower\AbstractFrameReflower->get_min_max_width in
/Applications/XAMPP/xamppfiles/htdocs/NTPC/dompdf/src/FrameReflower/TableRow.php
on line 72
It seems that including dompdf the way you do is no longer supported; see issue 1153, where the person asking gets exactly the same error messages as you do.
I'd recommend following the dompdf installation manual and installing it with Composer (in my opinion the most hassle-free way in the long term). I've also found something on installing Composer on XAMPP, but I can't really help with that since I don't know XAMPP. As a fallback you could download a pre-configured package (described some lines below).
Also check the quick-start tutorial to see whether dompdf works in general before using your own code, because some of your code might be deprecated.
Hope this helps, good luck!
Do not apply a display property to your table, either in inline styles or in external stylesheets.
Found on the web:
In this case, the fix ended up being pretty simple: it didn't like the inline style display:block; that I had added to the table.
Upon a little more testing, I found that it would allow display:inline; or display:inline-block;.
This makes sense, as a table natively has the property display:table; and I think block is probably not really valid (although it works fine in browsers, is a neat trick to apply to td elements to create a responsive table, and didn't generate any warnings during validation).
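If the HTML comes from somewhere you don't fully control (an editor or a template), one way to follow this advice is to strip display declarations from table inline styles before handing the markup to dompdf. Below is a minimal Python sketch of that preprocessing step; the function names are hypothetical, it only handles double-quoted style="..." attributes on <table> tags, it splits declarations naively on ";", and a regex is not a real HTML parser. The same approach ports to PHP with preg_replace_callback.

```python
import re

# A <table ...> start tag carrying a double-quoted style="..." attribute.
TABLE_STYLE_RE = re.compile(r'(<table\b[^>]*?\sstyle=")([^"]*)(")', re.IGNORECASE)


def clean_style(style):
    """Remove every display declaration from one inline style string."""
    kept = [
        decl.strip()
        for decl in style.split(";")
        # Keep non-empty declarations whose property name is not "display".
        if decl.strip() and decl.split(":", 1)[0].strip().lower() != "display"
    ]
    return "; ".join(kept)


def strip_table_display(html):
    """Drop display:... from inline styles on <table> tags only."""
    return TABLE_STYLE_RE.sub(
        lambda m: m.group(1) + clean_style(m.group(2)) + m.group(3), html
    )


if __name__ == "__main__":
    print(strip_table_display('<table style="width: 100%; display: block;">'))
```

Cells such as <td style="display:block"> are left alone, since the regex only matches <table> start tags; external stylesheets still need to be fixed by hand.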
The solution that worked for me was to downgrade dompdf to 1.0.0.
I'm not saying it's the best solution, but for now it works. dompdf doesn't depend on many other packages, but it comes with phenx/php-svg-lib and phenx/php-font-lib, so those were downgraded as well.
The downside is that dompdf was then added to the main packages in composer.json.
Note: this will upgrade, downgrade and remove packages currently locked to specific versions in order to install dompdf/dompdf:1.0.0.
The command I used is: composer require dompdf/dompdf:1.0.0 -w
For me, Ghazni Ali had identified the right cause: adding any type of display to a table made this error occur.
I was trying to get my elements properly spaced and inline.
What I found was that I had to add a width to my table and then add additional widths to the td elements inside it.
Below is an attempt at a 20%/80% split.
<table style="width: 100%;">
<tbody>
<tr>
<td style="width: 20% !important; border: 1px solid black;">
<p >Left Test</p>
</td>
<td style="width: 80% !important; border: 1px solid black;">
<p >Right Test</p>
</td>
</tr>
</tbody>
</table>
I tried similar methods using div tags, but they didn't render correctly.
The first example uses the table; the bottom two use div tags.

Pull mysql data into html email with php

Hello, I have an invoice email that is sent out every night containing information about today's orders. I am trying to automate this email with PHP and the MySQL data, but I am not quite sure what to do. FYI, I am new to HTML; I am used to Bash and Python.
Below is a sample of the email HTML file, showing where I am trying to put the order information:
<!-- INSERT HERE Product quantity for Kit 1 Blue Dot -->
1</td>
<td width="30" class="wz2">
</td>
</tr>
<tr>
<td colspan="3" height="20" style="font-size:0;line-height:1;" class="va2">
</td>
</tr>
</table>
</th>
<th width="139" class="stack3" data-border-left-color="borderColor" data-border-bottom-color="borderColor"
style="border-left:1px solid #dde5f1;border-bottom:1px solid #dde5f1;margin:0; padding:0;">
<table width="139" align="center" cellpadding="0" cellspacing="0" border="0"
class="table60033">
<tr>
<td colspan="3" height="20" style="font-size:0;line-height:1;" class="va2">
</td>
</tr>
<tr>
<td width="30" class="wz2">
</td>
<td class="RegularText5TD" data-link-style="text-decoration:none;
color:#67bffd;" data-link-color="RegularLink" data-color="RegularTXT"
style="color: #425065;font-family: sans-serif;font-size: 14px;font-weight:
lighter;text-align: center;line-height: 23px;">
<a href="#" target="_blank" data-color="RegularLink" style="text-decoration:
none;color: #67bffd;">
</a>
<!--INSERT HERE Total for Kit 1 Blue Dot-->
$22.00</td>
<td width="30" class="wz2">
</td>
</tr>
<tr>
<td colspan="3" height="20" style="font-size:0;line-height:1;" class="va2">
</td>
</tr>
What I would like to do is run the MySQL query via PHP to get the quantity of a product such as 'blue dot' shown here, then multiply that quantity by the price to get the total cost for the product. I have my queries and know how to run them from PHP and grab the data; I just don't know how to get the data into this dynamic email template. I use PHPMailer to send the template. Any help would be great!
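One common pattern is to put placeholder markers in the template where the "INSERT HERE" comments are, run the query, compute the total, and substitute both values into the HTML before handing it to the mailer (in PHP, str_replace does the substitution). A minimal Python sketch of that flow, under loudly stated assumptions: the {{KIT1_BLUE_DOT_QTY}} and {{KIT1_BLUE_DOT_TOTAL}} markers are hypothetical, an in-memory SQLite database stands in for MySQL, and the orders table, its columns, the dates and the 22.00 unit price are all illustrative.

```python
import sqlite3
from decimal import Decimal

# Hypothetical template fragment: the {{...}} markers replace the
# "INSERT HERE" comments from the original email HTML.
TEMPLATE = (
    '<td class="RegularText5TD">{{KIT1_BLUE_DOT_QTY}}</td>\n'
    '<td class="RegularText5TD">${{KIT1_BLUE_DOT_TOTAL}}</td>'
)

UNIT_PRICE = Decimal("22.00")  # assumed price of one Kit 1 Blue Dot


def ordered_quantity(conn, product, day):
    """Total quantity of `product` ordered on `day` (hypothetical schema)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(quantity), 0) FROM orders "
        "WHERE product = ? AND order_date = ?",
        (product, day),
    ).fetchone()
    return int(row[0])


def render(template, values):
    """Replace every {{NAME}} marker with the matching value."""
    for name, value in values.items():
        template = template.replace("{{" + name + "}}", str(value))
    return template


if __name__ == "__main__":
    # In-memory SQLite stands in for the real MySQL database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (product TEXT, quantity INTEGER, order_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("blue dot", 1, "2024-01-15"), ("red dot", 3, "2024-01-15")],
    )
    qty = ordered_quantity(conn, "blue dot", "2024-01-15")
    total = qty * UNIT_PRICE  # Decimal keeps money arithmetic exact
    print(render(TEMPLATE, {
        "KIT1_BLUE_DOT_QTY": qty,
        "KIT1_BLUE_DOT_TOTAL": f"{total:.2f}",
    }))
```

Running it prints the fragment with 1 and $22.00 filled in; the rendered string is what would then be passed to PHPMailer as the message body.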

PHP Web Page Scraping

I am able to get the HTML source of a website with file_get_contents, but I want to extract certain values from the HTML. This piece of code is always the same, but the values between the HTML tags change from time to time. This is the HTML code:
<div class="cheapest-bins">
<h3>Cheapest Live Buy Now</h3>
<table>
<tbody><tr>
<th>Console</th>
<th>Buy Now Price</th>
</tr>
<tr class=" active">
<td class="xb1">XB1</td>
<td>1,480,000</td>
</tr>
<tr class="">
<td class="ps4">PS4</td>
<td>1,590,000</td>
</tr>
<tr class="">
<td class="x360">360</td>
<td>---</td>
</tr>
<tr class="">
<td class="ps3">PS3</td>
<td>2,800,000</td>
</tr>
</tbody></table>
</div>
How would I go about getting the values 1,480,000, 1,590,000, --- and 2,800,000?
Short answer:
Find a CSS selector library such as https://github.com/tj/php-selector; then you could grab the inner HTML of all td:last-child elements.
For your specific example you could just use:
preg_match_all('#<td>(.*?)</td>#', $html, $matches);
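The regex works here only because the console cells carry a class attribute (<td class="xb1">), so just the plain <td> price cells match and $matches[1] holds the four values in page order. If you also want each value paired with its console, an HTML parser is sturdier than a regex. The sketch below swaps in Python's standard html.parser instead of the CSS-selector library mentioned above; the class and function names are hypothetical, and it assumes each price row has exactly two <td> cells and that you feed it just the cheapest-bins fragment (it does not scope itself to that div). PHP's DOMDocument could play the same role.

```python
from html.parser import HTMLParser


class TwoCellRowParser(HTMLParser):
    """Collect the text of table rows that contain exactly two <td> cells."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows as (first_cell, second_cell)
        self._row = None   # cell texts of the <tr> currently open
        self._cell = None  # text pieces of the <td> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # The header row uses <th>, so it has no <td> cells and is skipped.
            if len(self._row) == 2:
                self.rows.append(tuple(self._row))
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def buy_now_prices(html):
    """Map each console name to its price text, e.g. 'XB1' -> '1,480,000'."""
    parser = TwoCellRowParser()
    parser.feed(html)
    parser.close()
    return dict(parser.rows)
```

Turning a price into a number is then int(price.replace(",", "")), after checking for the "---" placeholder.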

Pull data from a webpage to use [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
To give some background, I sell Lego parts online. The order total when you place the order is based on the price of the parts you purchased, and the shipping costs.
Shipping costs vary depending on the weight of the order, and the country of shipment.
I am not very technical, which is why I need some help. I know the basics but not much else, though I'd love to learn, and I spent days experimenting with this before coming here.
In the source code of an order page, the only place where the weight appears is this line:
<FONT CLASS="fv">Estimated Weight of Order:</FONT></TD><TD ALIGN="RIGHT"><FONT CLASS="fv">2.17oz 61.44g</FONT>
It is the same for every single order.
So, I know where the data I want is.
What I need help with is coding something that pulls this data out of the web page (say it's a page called order.com/order.asp, and the document contains a lot of other data apart from the weight) and outputs a shipping price based on that weight. I don't know whether this can be done with PHP, Python, etc.
On my server I would have, say, a table with the shipping costs based on weight. What I need is to take that bit of data from the order.com website onto my own server (and, on my server, process the weight data, match it with the shipping cost, produce invoices, etc.). The weight data is on the order page, always on a line like the one I posted above. I just read about web scraping. Maybe some PHP that looks through the order page until it finds the line with the weight, and pulls out the weight?
Many, many, many thanks for your help, and I apologize in advance if I sound too uninformed, which I am. I really need a detailed explanation.
Gerald
TL;DR: Two web pages. One is on my server and one isn't. The one that isn't on my server (order.asp) has this line:
<FONT CLASS="fv">Estimated Weight of Order:</FONT></TD><TD ALIGN="RIGHT"><FONT CLASS="fv">XX.XXoz XX.XXg</FONT>
I need something I can put on my server that queries the weight from the page that isn't on my server (the order.asp page) and matches the weight with a shipping price that I would keep on my page (as a table, or maybe with ifs).
There will be different order pages (order1.asp, order2.asp, order3.asp) with different weights. The script should do that for each page.
Thanks.
This is the source code of an example page whose weight I would need to extract. I removed some sensitive info.
<SCRIPT TYPE="text/javascript" LANGUAGE="JavaScript">
function killImage(imgName){
if (document.images){
document.images[imgName].src="/images/noImage.gif"
}
}
function killImageM(imgName){
if (document.images){
document.images[imgName].src="/images/noImageM.gif"
}
}
</SCRIPT>
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
<META HTTP-EQUIV="IMAGETOOLBAR" CONTENT="NO">
<LINK REL="STYLESHEET" TYPE="text/css" HREF="/stylesheet.css?13">
<STYLE TYPE="text/css">body { margin: 15 auto; }</STYLE>
<SCRIPT TYPE="text/javascript" LANGUAGE="javascript" SRC="/js/getAjax.js"></SCRIPT>
<SCRIPT TYPE="text/javascript" LANGUAGE="javascript" SRC="/lytebox/lytebox.js?10"></SCRIPT>
<LINK REL="STYLESHEET" HREF="/lytebox/lytebox.css?13" TYPE="text/css" MEDIA="screen" />
</HEAD>
<BODY BGCOLOR="#666666">
<CENTER>
<TABLE WIDTH="680" CELLPADDING="10" CELLSPACING="0"><TR><TD BGCOLOR="#FFFFFF">
<TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0"><TR>
<TD><IMG SRC="/images/logowhite.gif" WIDTH="200" HEIGHT="60" ALIGN="ABSMIDDLE" BORDER="0"> </TD>
<TD> <FONT SIZE="+3">Order #3953198</FONT></TD></TR></TABLE><P><FONT FACE="Tahoma,Arial" SIZE="2">
<HR NOSHADE SIZE="1" COLOR="#000000"><B>Order Summary</B><HR NOSHADE SIZE="1" COLOR="#000000">
<TABLE WIDTH="100%" CELLPADDING="5" CELLSPACING="0" BORDER="0" BGCOLOR="#EEEEEE"><TR><TD WIDTH="60%" VALIGN="TOP">
<TABLE WIDTH="100%" BORDER="0" CELLPADDING="1" CELLSPACING="0" CLASS="ta">
<TR>
<TD WIDTH="125">Order Date:</TD>
<TD>Nov 20, 2013 17:12</TD>
</TR>
<TR>
<TD>Payment By:</TD>
<TD>PayPal.com</TD>
</TR>
<TR>
<TD>Payment In:</TD>
<TD>Euro</TD>
</TR>
<TR VALIGN="TOP">
<TD>Order Status:</TD>
<TD>Shipped</TD>
</TR>
<TR>
<TD>Changed:</TD>
<TD>Nov 22, 2013 14:15</TD>
</TR>
<TR>
<TD NOWRAP>Total Items:</TD>
<TD>24</TD>
</TR>
<TR>
<TD NOWRAP>Unique Items (Lots):</TD>
<TD>2</TD>
</TR>
<TR>
<TD NOWRAP>Invoiced:</TD>
<TD>Nov 21, 2013 08:56</TD>
</TR>
<TR VALIGN="TOP">
<TD NOWRAP>Shipping Method:</TD>
<TD>Registered<BR><FONT CLASS="fv">By default, with tracking number and insured up to 30 euros only.</FONT></TD>
</TR>
</TABLE>
</TD><TD WIDTH="40%" VALIGN="TOP">
<TABLE WIDTH="100%" BORDER="0" CELLPADDING="1" CELLSPACING="0" CLASS="ta">
<TR>
<TD>Order Total:</TD>
<TD ALIGN="RIGHT">EUR 8.92</TD>
</TR>
<TR>
<TD>Shipping:</TD>
<TD ALIGN="RIGHT">EUR 4.85</TD>
</TR>
<TR>
<TD>Insurance:</TD>
<TD ALIGN="RIGHT">EUR 0.00</TD>
</TR>
<TR>
<TD>Additional Charges 1:</TD>
<TD ALIGN="RIGHT">EUR 0.00</TD>
</TR>
<TR>
<TD>Additional Charges 2:</TD>
<TD ALIGN="RIGHT">EUR 0.00</TD>
</TR>
<TR>
<TD>Credit:</TD>
<TD ALIGN="RIGHT">EUR 0.00</TD>
</TR>
<TR>
<TD>Grand Total:</TD>
<TD ALIGN="RIGHT"><B>EUR 13.77</TD>
</TR>
<TR>
<TD>Orders in this Store:</TD>
<TD ALIGN="RIGHT">1</TD>
</TR>
</TABLE>
</TD></TR>
</TABLE><HR NOSHADE SIZE="1" COLOR="#000000"><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" WIDTH="100%" CLASS="ta"><TR><TD><B>Items in Order</B></TD></TR></TABLE><HR NOSHADE SIZE="1" COLOR="#000000"><TABLE WIDTH="100%" BORDER="0" CELLSPACING="1" CELLPADDING="3" CLASS="ta"><TR BGCOLOR="#C0C0C0"><TD><B>Image</B></TD><TD ALIGN="CENTER"><B>Condition</B></TD><TD><B>Item Description</B></TD><TD ALIGN="RIGHT"><B>Lots</B></TD><TD ALIGN="RIGHT"><B>Qty</B></TD><TD ALIGN="RIGHT"><B>Left</B></TD><TD ALIGN="RIGHT"><B>Price</B></TD><TD ALIGN="RIGHT"><B>Total</B></TD><TD ALIGN="RIGHT"><B>Weight</B></TD></TR><TR><TD COLSPAN="2" BGCOLOR="#C0C0C0"><B>Batch #1</B></TD><TD COLSPAN="7" BGCOLOR="#C0C0C0"><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" WIDTH="100%"><TR><TD><FONT CLASS="fv">Submitted on Nov 20, 2013 17:12</TD><TD ALIGN="RIGHT"><IMG SRC="/images/printer16.png" WIDTH="16" HEIGHT="16" BORDER="0" ALT="Print Batch" TITLE="Print Batch"><IMG SRC="/images/dot.gif" WIDTH="5" HEIGHT="1"><IMG SRC="/images/invoice16YC.gif" WIDTH="16" HEIGHT="16" ALT="Batch Invoiced" TITLE="Batch Invoiced"></TD></TR></TABLE></TD></TR><TR BGCOLOR="FFFFFF"><TD HEIGHT="60"><CENTER><A ID='imgLink0' HREF='/catalogItemPic.asp?P=60208' REL='blcatimg'><IMG ALT="Lot ID: 48295541 Part No: 60208 Name: Wheel 31mm D. x 15mm Technic" TITLE="Lot ID: 48295541 Part No: 60208 Name: Wheel 31mm D. x 15mm Technic" BORDER='0' WIDTH='80' HEIGHT='60' SRC='http://img.bricklink.com/P/86/60208.gif' NAME='img0' ID='img0' onError="killImage('img0');"></A><BR><FONT FACE='Tahoma,Arial' SIZE='1'>*</FONT></TD><TD ALIGN="CENTER"><B>New</B></TD><TD><SPAN CLASS="u"><FONT COLOR="#000000">Light Bluish Gray Wheel 31mm D. 
x 15mm Technic </FONT></SPAN><BR><FONT CLASS="fv">AB4</FONT></TD><TD ALIGN="RIGHT"> </TD><TD ALIGN="RIGHT">12</TD><TD ALIGN="RIGHT">X</TD><TD ALIGN="RIGHT">EUR 0.11</TD><TD ALIGN="RIGHT">EUR 1.32</TD><TD ALIGN="RIGHT"><FONT CLASS="fv">38.16g</TD></TR><TR BGCOLOR="EEEEEE"><TD HEIGHT="60"><CENTER><A ID='imgLink1' HREF='/catalogItemPic.asp?P=6179' REL='blcatimg'><IMG ALT="Lot ID: 49014568 Part No: 6179 Name: Tile, Modified 4 x 4 with Studs on Edge" TITLE="Lot ID: 49014568 Part No: 6179 Name: Tile, Modified 4 x 4 with Studs on Edge" BORDER='0' WIDTH='80' HEIGHT='60' SRC='http://img.bricklink.com/P/86/6179.gif' NAME='img1' ID='img1' onError="killImage('img1');"></A><BR><FONT FACE='Tahoma,Arial' SIZE='1'>*</FONT></TD><TD ALIGN="CENTER"><B>New</B></TD><TD><SPAN CLASS="u"><FONT COLOR="#000000">Light Bluish Gray Tile, Modified 4 x 4 with Studs on Edge </FONT></SPAN><BR><FONT CLASS="fv">AJ2</FONT></TD><TD ALIGN="RIGHT"> </TD><TD ALIGN="RIGHT">12</TD><TD ALIGN="RIGHT">X</TD><TD ALIGN="RIGHT">EUR 0.633</TD><TD ALIGN="RIGHT">EUR 7.596</TD><TD ALIGN="RIGHT"><FONT CLASS="fv">23.28g</TD></TR><TR BGCOLOR="#DDDDDD"><TD COLSPAN="3"><B>Batch Total:</B></TD><TD ALIGN="RIGHT">2</TD><TD ALIGN="RIGHT">24</TD><TD></TD><TD> </TD><TD ALIGN="RIGHT">EUR 8.92</TD><TD ALIGN="RIGHT"><FONT CLASS="fv">61.44g</TD></TR><TR BGCOLOR="#C0C0C0"><TD COLSPAN="3"><B>Order Total:</B></TD><TD ALIGN="RIGHT">2</TD><TD ALIGN="RIGHT">24</TD><TD></TD><TD> </TD><TD ALIGN="RIGHT">EUR 8.92</TD><TD ALIGN="RIGHT"></TD></TR><TR><TD COLSPAN="10" ALIGN="RIGHT" BGCOLOR="#EEEEEE"><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" WIDTH="100%"><TR><TD><FONT CLASS="fv">Estimated Weight of Order:</FONT></TD><TD ALIGN="RIGHT"><FONT CLASS="fv">2.17oz 61.44g</FONT></TD></TR></TABLE></TD></TR></TABLE><TABLE WIDTH="100%" BORDER="0" CELLPADDING="1" CELLSPACING="0" CLASS="ta"><TR><TD COLSPAN="2" CLASS="fv" ALIGN="RIGHT">Contact your buyer about this order<BR> </TD></TR></TABLE><HR NOSHADE SIZE="1" COLOR="#000000"><FONT 
CLASS="fv"><CENTER>This order will be purged from the BrickLink website on May 20, 2014.</CENTER></FONT></TABLE><FONT CLASS="fv"><P><CENTER><FONT COLOR="#FFFFFF">Back to Orders</FONT> | <FONT COLOR="#FFFFFF">Show Temporary Checkboxes</FONT> | <FONT COLOR="#FFFFFF">Show Categories</FONT> | <FONT COLOR="#FFFFFF">Consolidate Batches</FONT> | <FONT COLOR="#FFFFFF">My Settings</FONT><P><FONT COLOR="#FFFFFF">Hide Qty Left in My Inventory</FONT> | <FONT COLOR="#FFFFFF">Hide Item Weight</FONT> | <FONT COLOR="#FFFFFF">Show My Cost</FONT> | <FONT COLOR="#FFFFFF">Show Only Items in Order</FONT> | <FONT COLOR="#FFFFFF">Edit Order</FONT>
It's a little tough to write full-blown code without looking at the page you wish to scrape, but you should be able to use the following code to get what you want. The code below reads in a file called "html.txt", finds all orders in that text file, finds the total weight values in ounces and grams, and writes that data to an output file called foundWeights.txt. To run it, save your HTML in a text file called "html.txt", save the code below in a file called "findweights.py", and put both files in the same folder. Then open a shell or terminal window, navigate to that folder, and type "python findweights.py"; a moment later a text file with your data will appear in the same folder.
import re

# The weight cell looks like: <FONT CLASS="fv">2.17oz&nbsp;61.44g</FONT>
# (the separator may be a plain space or &nbsp;, so accept either)
weight_re = re.compile(
    r'Estimated Weight of Order:</FONT></TD><TD ALIGN="RIGHT"><FONT CLASS="fv">'
    r'([\d.]+oz)(?:\s|&nbsp;)+([\d.]+g)'
)

with open("html.txt") as f:
    html = f.read()

with open("foundWeights.txt", "w") as out:
    # Split the HTML on the order number; piece 0 comes before the first order
    for order in html.split("Order #")[1:]:
        order_number = order.split("<")[0]
        match = weight_re.search(order)
        if match is None:
            continue  # this chunk has no weight line
        ozs, grams = match.groups()
        out.write("\t".join([order_number, grams, ozs]) + "\n")
Outfile is tab-separated (Order #, grams, ozs):
3953198 61.44g 2.17oz
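The answer stops at extracting the weight. The other half of the question, matching a weight against a shipping-cost table instead of a chain of ifs, can be sketched with a sorted list of weight brackets and a binary search (Python's bisect module). The brackets and prices below are hypothetical placeholders, not real postal rates, and parse_grams and shipping_price are illustrative names; per-country rates would simply be one such list per country.

```python
from bisect import bisect_left
from decimal import Decimal

# Hypothetical rate table: (upper weight limit in grams, price in EUR),
# sorted by weight. Replace these brackets with your real shipping costs.
RATES = [
    (20, Decimal("1.00")),
    (100, Decimal("3.00")),
    (500, Decimal("6.00")),
    (2000, Decimal("12.00")),
]


def parse_grams(text):
    """Turn a weight string such as '61.44g' into a float number of grams."""
    return float(text.strip().rstrip("g"))


def shipping_price(grams, rates=RATES):
    """Price of the lightest bracket whose upper limit is >= grams."""
    limits = [limit for limit, _ in rates]
    i = bisect_left(limits, grams)  # first limit that is >= grams
    if i == len(rates):
        raise ValueError("%s g is heavier than the largest bracket" % grams)
    return rates[i][1]
```

For each line of foundWeights.txt, shipping_price(parse_grams(grams)) gives the price for that order; with the placeholder table above, the 61.44g example falls in the 100 g bracket.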

Converting HTML emails to "Well Formed XHTML Code"

I'm trying to submit HTML emails to Amazon's Mechanical Turk using the QuestionForm XML data schema, and I'm having trouble converting the HTML emails into well-formed HTML. I wrote a script that grabs each email from my table and prints the data inside the tags of the HTML email, but as you can see below, it's badly formed and won't be accepted by Mechanical Turk. Until now I've had to send the data through htmlentities(), which makes it hard for HIT workers to solve my tasks easily. Here's an example of how poorly formed the data is. Any tips on how to send this data through Mechanical Turk (PHP) or convert it to well-formed HTML would be appreciated.
<body text="#333333" bgcolor="#ffffff" link="#073064" vlink="#073064"
alink="#073064">
<a name="top"></a>
<table width="100%" cellspacing="0" cellpadding="10" bgcolor="#f4f2ee">
<tbody>
<tr>
<td valign="top" align="left">
<table width="600" cellspacing="0" cellpadding="0" bgcolor="#ffffff">
<tbody>
<tr>
<td style="background-color:#e8e6dd;background-image:none;background-repeat:repeat;background-position:top left;background-attachment:scroll;font-size:10px;color:#948765;line-height:200%;font-family:verdana;" >Email not displaying correctly?
<a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/r" style="color:#948765;" >View
it in your browser.</a></td>
</tr>
<tr>
<td height="93" bgcolor="#ff6501"
background="http://i1.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/banner-tt_0.jpg">
<h1 style="font-size:30px;text-transform:lowercase;line-height:16px;color:#ffffff;font-family:Helvetica,Arial,sans-serif;text-indent:63px;margin-top:0;padding-top:29px;" >SitePoint <span
style="font-size: 17px; display: block; text-indent: 164px; color:
rgb(248, 255, 225); margin-top: 5px;">Tech Times</span></h1>
</td>
</tr>
<tr>
<td height="20" bgcolor="#C64F00" style="color:#e7fabd;font-family:arial;font-size:13px;" >
<span
id="Date" style="float:right;padding-left:5px;padding-right:5px;" ><strong>Issue 309:</strong> September 21,
2010 </span> Tips, Tricks, News and Reviews for Web Coders
</td>
</tr>
</tbody
</table>
<table width="600" cellspacing="0" cellpadding="0" bgcolor="#ffffff">
<tbody>
<tr>
<td colspan="5" height="10"></td>
</tr>
<tr>
<td width="10"></td>
<td rowspan="2" width="380" valign="top">
<table width="100%" border="0"
style="font-family:Verdana,Arial, Helvetica, sans-serif; font-size:13px;
color:#000">
<tr>
<td>
<a name='2'></a><h2 style="font-size:20px;font-weight:bold;color:#C64F00;font-family:arial;line-height:110%;" >
Introduction
</h2>
<p>
<img src="http://i2.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/lisa-lang_1.jpg" height="119"
align="left" width="130" border="0" alt="Lisa Lang" /><strong><em>Sal
</em>Tech Timers! Every week we aim to provide you with a feast of tech
geekyness -- but this issue is particularly HUGE, with goodies for
everyone. This week, I'm proud to present our latest SitePoint release <a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/y"><em>Host Your Web Site in
the Cloud</em></a><em> </em>by web evangelist Jeff Barr. Everything you
need to know about cloud computing -- and how to make it work for you --
can be found in this book. </strong>
</p>
<p>
In celebration of this release, we'll be
running a live webinar with Jeff Barr, Kevin Yank, Lucas Chan, and Louis
Simoneau. The webinar will begin at 9:00 a.m. (Australian Eastern Standard
Time) on Wednesday, 22nd September. For those in the US, the meeting starts
at 4:00 p.m. (Pacific Standard Time) on Tuesday, 21st September. Places are
limited, so hurry to register now for free here!
</p>
<p>
In the meantime, to get you in the mood for the wonderful world of cloud
computing, have a read through Toby Tremayne's latest addition to his
series "What Cloud Computing Can Mean for Your Business." Toby
shows you how to get started, and introduces a wide range of handy (and
free) applications.
</p>
<p>
Next, the other big news of the week was the release of IE9. Craig
Buckler takes a look at its interface, including some new features and
development tools of this "Beauty of the Web."
</p>
<p>
And last but not least, James Edwards has some fun with shadows for
complex shapes. He shows you how to create a fancy solution in CSS, even
managing to make it work on all modern browsers.
</p>
<p>
That should keep you busy until next week. As always, feel free to come
over and join the discussions in our forums.
</p>
<p>
Keep rocking!
</p>
<hr color="#c5b172" size="1" />
<h2 style="line-height:1.2em;" ><a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/h"
style="color:red">Over 80% of Small Businesses Use Email Marketing ... But
Only a Handful Use It Effectively</a></h2><a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/k"><img
src="http://i3.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/infusionsoft120x100-em20_2.jpg"
width="120" height="100" align="right" hspace="5" vspace="0" border="0"
/></a><p>Discover the secrets to effective, profitable email marketing
when you download the free report <em>"Email Marketing 2.0: the Three
Techniques That Will Actually Make a Difference In Your Email
Marketing."</em></p><p>Hint: this report does NOT cover subject line
suggestions, SPAM words to avoid, best time of day to send, or how to
address your contacts.</p><a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/u"
style="color:red"><strong>Don't wait -- Download your free copy
now!</strong></a><div
style="margin-bottom:2em;padding-bottom:1em;border-bottom: 1px dotted
#C5B172;"></div>
<p><strong>Summary</strong></p>
<ul style="font-size:110%;line-height:150%;" ><li>Introduction</li><li><a href="#5">What Cloud Computing Can
Mean for Your Business, Part II: Starting Your Cloud
Infrastructure</a></li><li>The IE9 Beta Review</li><li>Creating Shadows Around Polygons in CSS</li><li>New Technical Articles</li><li><a href="#12">Techy Forum
Threads</a></li><li>More Techy Blog Entries</li></ul>
<div style="margin-bottom:2em;padding-bottom:1em;border-bottom-width:1px;border-bottom-style:dotted;border-bottom-color:#C5B172;" ></div>
<a name='5'></a><h2 style="font-size:20px;font-weight:bold;color:#C64F00;font-family:arial;line-height:110%;" >
Starting Your Cloud Infrastructure
</h2>
<h3 style="font-size:16px;font-weight:bold;color:#C64F00;font-family:arial;line-height:110%;" >
What Cloud Computing Can Mean for Your Business, Part II
</h3>
<div>
Starting out, what do we need? We have to be able to communicate
with our customers and suppliers, so we need email, perhaps instant
messaging. If we have overseas or long distance clients, some kind of VOIP
phone would help to keep costs down.
</div>
<div>
We must ensure that anything we're working on is properly backed
up from the business plan to product concepts and beyond. Any loss
of data could be crucial when getting your product or service to market at
the right time. Keeping an eye on our schedule is vital to make sure
important events, tasks, and meetings are managed.<br />
</div>
<div>
There's a lot more to address, but this much is enough to get us
up and running so we can get about the business of doing business.
 But if you're not an IT person and you don't know how to
setup email servers or backup systems, where do you begin?
</div>
<h3>
Email
</h3>
<div>
Google has been a provider of innovative products in the cloud for some
time, but many are unaware just how powerful these applications can be.
Gmail, for example, offers free email accounts with enormous amounts of
storage, and an easy-to-use interface. Your email can be accessed from
anywhere, you never have to delete as everything can be archived, and
backups are taken care of for you. The only issue is that you may want to
avoid sending emails to a big potential client from an anonymous, free
Gmail account; to make a proper impression, you need to be able to have
your own email address under your company name.
</div>
<div>
Google can still help you here, though; you can actually use the Gmail
system with your own domain name. As long as you have a <a
mce_href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/o"
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/o">domain</a>, you can purchase a
Google Apps for Business account for the trifling sum of US$50 a year,
which lets you transfer your email hosting to Google's servers.
It's a very simple process, and once done you have full IMAP
access to your email from anywhere in the world, with a guarantee of
availability and uptime that few can compete with certainly not in a
small business.
</div>
<div>
The cost increases as you add more email accounts, but it's still
less than you'd pay to host a server with your own email software on
it. You can set up a normal email client like Outlook or Apple mail to use
the server, or use the user-friendly Gmail interface on the Web. This
means that no matter what happens, you'll always have access to email and
the ability to send from your own email address, even if your personal
computers fail. There are more great benefits to using Google Apps, but
we'll explore those in the next article. For now, check out <a
mce_href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/b" href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/b">Google Apps
for Business</a> to get up and running with your email.
</div>
<h3>
Messaging
</h3>
Note: I removed a lot of code from the middle here so that the post would fit within the size limit.
<hr color="#c5b172" size="1" /><h2
style="font-size:15px;font-weight:bold;color:#C64F00;font-family:arial;line-height:110%;"><a
name='11'></a>New Technical Articles</h2><h3><a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/tu"
style="color:#7B7B94"><strong>CSS3 Border Images for Beautiful, Flexible
Boxes</strong></a></h3>
<p><img src="http://i8.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/author_louis_lazaris_17.jpg"
hspace="3" alt="Louis Lazaris" align="left" width="67" height="80" />Among
the raft of CSS3 features gaining increasing levels of browser support, the
border-image property is often overlooked. In this article, Louis gives us
the lowdown on what it is and how to use it.</p>
<p align="right"><a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/il"
style="color:#7B7B94">Full Story...</a></p><h3><a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/ir"
style="color:#7B7B94"><strong>HTML5 and Even Fancier
Forms</strong></a></h3>
<p><img src="http://i9.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/author_tim_connell_18.jpg"
hspace="3" alt="Tim Connell" align="left" width="67" height="80" />Tim
Connell, co-author of SitePoint's Fancy Form Design, takes a look at
the new form input types available in HTML5, and gives you the skinny on
which ones you can start using right now.</p>
<p align="right"><a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/iy"
style="color:#7B7B94">Full Story...</a></p>
<h2
style="font-size:15px;font-weight:bold;color:#C64F00;font-family:arial;line-height:110%;margin-top:2em;padding-top:1em;border-top:
1px dotted #C5B172;"><a name='12'></a>Techy Forum Threads</h2><ul
class="forums" style="margin-left:18px;padding-left:0;" ><li><a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/ij"
style="color:#7B7B94">How do you organize your CSS?</a> in CSS</li><li><a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/it"
style="color:#7B7B94">Jack of all Trades...</a> in .NET</li><li><a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/ii"
style="color:#7B7B94">Personification of software: The contest</a> in
General Chat</li></ul>
<hr color="#c5b172" size="1" /><h2
style="font-size:15px;font-weight:bold;color:#C64F00;font-family:arial;line-height:110%;"><a
name='13'></a>More Techy Blog Entries</h2><p style="font-size:80%;color:#aea194;" >Web Tech</p><table cellpadding="0" cellspacing="0"
border="0" width="100%"><tbody><tr>
<td valign="top"><img
src="http://i10.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/nlblog_19.gif" width="16"
height="19" /></td>
<td width="7"></td>
<td style="font-family:verdana;font-size:13px;" >
<a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/id"
style="color:#7B7B94">The Threat to Software Freedom</a>
</td>
</tr><tr>
<td colspan="2"></td>
<td><font size="-2" style="color:#AEA193;" >1 comment</td>
</tr><tr>
<td height="3"></td>
</tr><tr>
<td valign="top"><img
src="http://i10.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/nlblog_19.gif" width="16"
height="19" /></td>
<td width="7"></td>
<td style="font-family:verdana;font-size:13px;" >
<a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/ih"
style="color:#7B7B94">Apple: Stuff Ups, Mistakes, and Finally Moving
Forward?</a>
</td>
</tr><tr>
<td colspan="2"></td>
<td><font size="-2" style="color:#AEA193;" >19 comments</td>
</tr><tr>
<td height="3"></td>
</tr><tr>
<td valign="top"><img
src="http://i10.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/nlblog_19.gif" width="16"
height="19" /></td>
<td width="7"></td>
<td style="font-family:verdana;font-size:13px;" >
<a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/ik"
style="color:#7B7B94">Behind the Geek Out Scenes: Fancy Fonts and Jaunty
Input Fields</a>
</td>
</tr><tr>
<td colspan="2"></td>
<td><font size="-2" style="color:#AEA193;" >12 comments</td>
</tr><tr>
<td height="3"></td>
</tr></tbody></table><p style="font-size:80%;color:#aea194;" >JavaScript, CSS</p><table cellpadding="0" cellspacing="0"
border="0" width="100%"><tbody><tr>
<td valign="top"><img
src="http://i10.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/nlblog_19.gif" width="16"
height="19" /></td>
<td width="7"></td>
<td style="font-family:verdana;font-size:13px;" >
<a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/iu"
style="color:#7B7B94">High-performance String Concatenation in
JavaScript</a>
</td>
</tr><tr>
<td colspan="2"></td>
<td><font size="-2" style="color:#AEA193;" >11 comments</td>
</tr><tr>
<td height="3"></td>
</tr></tbody></table><p style="font-size:80%;color:#aea194;" >Web
Design</p><table cellpadding="0" cellspacing="0" border="0"
width="100%"><tbody><tr>
<td valign="top"><img
src="http://i10.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/nlblog_19.gif" width="16"
height="19" /></td>
<td width="7"></td>
<td style="font-family:verdana;font-size:13px;" >
<a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/dl"
style="color:#7B7B94">WordPress Trademark Transferred To WordPress
Foundation</a>
</td>
</tr><tr>
<td colspan="2"></td>
<td><font size="-2" style="color:#AEA193;" >2 comments</td>
</tr><tr>
<td height="3"></td>
</tr></tbody></table><p style="font-size:80%;color:#aea194;" >Community</p><table cellpadding="0" cellspacing="0" border="0"
width="100%"><tbody><tr>
<td valign="top"><img
src="http://i10.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/nlblog_19.gif" width="16"
height="19" /></td>
<td width="7"></td>
<td style="font-family:verdana;font-size:13px;" >
<a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/dr"
style="color:#7B7B94">Important People With Things to Say</a>
</td>
</tr><tr>
<td height="3"></td>
</tr></tbody></table><hr color="#c5b172" size="1" />
</td>
<td width="10"></td>
</tr>
<tr>
<td width="10"></td>
<td width="20"
background="http://i5.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/vertical-rule-bg_14.gif"></td>
<td width="180" valign="bottom" style="color:#000000;font-family:verdana;font-size:13px;" >
<div id="subscribe">
<h2 style="font-size:15px;font-weight:bold;color:#073064;font-family:Arial, Helvetica, sans-serif;line-height:110%;" >
Follow SitePoint on..
</h2>
<ul style="margin-left:5px;list-style-type:none;list-style-position:outside;list-style-image:none;padding-top:0;padding-bottom:0;padding-right:0;padding-left:0;" >
<li style="font-size:15px;" >
<img src="http://i1.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/subs-nl_20.gif"
border="0" height="27" width="27" align="bottom" alt="Newsletters" /><a
style="margin-left:5px; text-decoration:none; font-weight:400"
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/dy">Newsletters</a>
</li>
<li style="font-size:15px;" >
<img src="http://i2.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/subs-tw_21.gif"
border="0" height="27" width="27" align="bottom" alt="Twitter" /><a
style="margin-left:5px; text-decoration:none; font-weight:400"
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/dj" rel="nofollow">Twitter</a>
</li>
<li style="font-size:15px;" >
<img src="http://i3.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/subs-m_22.gif" border="0"
height="27" width="27" align="bottom" alt="Mobile" /><a
style="margin-left:5px; text-decoration:none; font-weight:400"
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/dt">Mobile</a>
</li>
<li style="font-size:15px;" >
<img src="http://i4.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/subs-pod_23.gif"
border="0" height="27" width="27" align="bottom" alt="Podcast" /><a
style="margin-left:5px; text-decoration:none; font-weight:400"
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/di">Podcast</a>
</li>
<li style="font-size:15px;" >
<img src="http://i5.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/subs-rss_24.gif"
border="0" height="27" width="27" align="bottom" alt="RSS" /><a
style="margin-left:5px; text-decoration:none; font-weight:400"
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/dd">RSS</a>
</li>
<li style="font-size:15px;" >
<img src="http://i6.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/subs-fb_25.gif"
border="0" height="27" width="27" align="bottom" alt="Facebook" /><a
style="margin-left:5px; text-decoration:none; font-weight:400"
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/dh" rel="nofollow">Facebook</a>
</li>
</ul>
</div>
<h2
style="font-size:15px;font-weight:bold;color:#C64F00;font-family:arial;line-height:110%;">Help
Your Friends Out</h2>
<p>People you care about can benefit from the wealth of information on
new
and maturing technologies available on the Internet. Help them learn
how to do it by forwarding them this issue of the Tech Times!</p>
<!--[if gte mso 0]><div style="display:none;" ><![endif]-->
<table cellspacing="0" cellpadding="0" bgcolor="#F9990C">
<tr>
<td colspan="3" height="3" bgcolor="#C2721C"></td>
</tr>
<tr>
<td colspan="3" height="10"></td>
</tr>
<tr>
<td width="10"></td>
<td style="color:#ffffff;font-family:verdana;font-size:13px;font-weight:bold;" >Send this to a friend</td>
<td width="20"></td>
</tr>
<tr>
<td colspan="3" height="5"></td>
</tr>
<tr>
<td width="10"></td>
<td style="font-family:verdana;font-size:13px;" >
<form autocomplete="on"
action="http://www.sitepoint.com/newsletter/forward" method="get"
style="margin: 0pt;">
<input name="newsletterid" value="3" type="hidden">
<input name="fromemail" value="rcavezza#gmail.com" type="hidden">
<input name="issuenum" value="309" type="hidden">
<input autocomplete="on" name="email" value="friend#example.com"
style="width: 120px;" type="text">
<br />
<input autocomplete="on" name="Send" value="Send" type="submit">
</form>
</td>
<td width="20"></td>
</tr>
<tr>
<td colspan="3" height="10"></td>
</tr>
</table>
<!--[if gte mso 0]></div><![endif]-->
</td>
<td width="10"></td>
</tr>
<tr>
<td colspan="5" height="20"></td>
</tr>
<tr>
<td colspan="5" height="10" bgcolor="#9999BC"></td>
</tr>
<tr>
<td bgcolor="#D7D7E5"></td>
<td bgcolor="#D7D7E5" colspan="4" style="font-family:verdana;font-size:10px;color:#5e5e91;" >
<table width="100%" cellspacing="0" cellpadding="0"
style="font-size:12px">
<tbody>
<tr>
<td width="100%" height="10" bgcolor="#D7D7E5"></td>
<td rowspan="3" bgcolor="#032a5c" style="color:#FFF;font-size:12px;text-align:center;" >
We send this newsletter using Campaign Monitor<br/>
<br/>
<a
href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/dk"><img
src="http://i7.cmail2.com/ei/y/4D/1C6/61A/231513/csimport/cm-passive-200x125_26.png" width="200"
height="125" hspace="10" border="0" alt="Campaign Monitor"/></a>
</td>
</tr>
<tr>
<td style="font-size:12x!important;font-family:arial,
verdana;" >
<p style="font-weight:bold;color:#353553;" >You are
subscribed as: <br>
<span style="font-size:13px;color:#CE6E11;font-weight:700;" ><code>rcavezza#gmail.com</code></span></p>
<ul>
<li><a href="http://sitepointcom.cmail2.com/t/y/u/cvkit/ddktkrydd/"
style="color:#7B7B94">Unsubscribe</a> from this list.</li>
<li><a href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/du"
style="color:#7B7B94">Manage your subscriptions</a>.</li>
<li><a href="http://sitepointcom.cmail2.com/t/y/l/cvkit/ddktkrydd/hl"
style="color:#7B7B94">View the newsletter archives</a>.</li>
</ul>
<p>
<span style="font-weight:bold;color:#353553;" >Mailing
Address:</span><br />
<span style="font-size:12px;" >48 Cambridge St, Collingwood, VIC,
3066 Australia</span>
</p>
<p><strong><span style="color:#353553;" >Phone:</span> +61 3
9090 8200</strong></p>
</td>
</tr>
<tr>
<td height="10" bgcolor="#D7D7E5"></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<img src="https://cmail2.com/t/y/o/cvkit/ddktkrydd/o.gif" width="1" height="1" border="0" style="height:1px !important;width:1px !important;border-width:0 !important;margin-top:0 !important;margin-bottom:0 !important;margin-right:0 !important;margin-left:0 !important;padding-top:0 !important;padding-bottom:0 !important;padding-right:0 !important;padding-left:0 !important;" ></body>
EDIT: I just changed it to fix the issues below, and it's still not passing the validation test. Are there any additional steps I should try?
It is not that badly formed. Just call quoted_printable_decode() on it first.
Edit: well, it solves a few problems, but it is still thoroughly malformed. What possessed them not to quote whole lists of style declarations?
Edit 2: Ah, Bob removed the quotes all on his own. I assume that leaving the quotes in place and running quoted_printable_decode() would solve it.
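A minimal sketch of the quoted_printable_decode() step suggested above, assuming the raw newsletter source is quoted-printable encoded, as email bodies often are: "=3D" stands for "=", and a "=" at the end of a line is a soft line break. The sample string is made up for illustration and is not taken from the actual newsletter source.

```php
<?php
// Hypothetical quoted-printable fragment, as it might appear in a raw email body.
// "=3D" encodes "=", and a trailing "=" before CRLF is a soft line break.
$raw = "<td style=3D\"font-family:verdana;font-size:13px;\">The Threat to =\r\nSoftware Freedom</td>";

// quoted_printable_decode() is part of PHP core: it turns "=XX" escapes back
// into bytes and drops soft line breaks, restoring the original markup.
$html = quoted_printable_decode($raw);

// prints: <td style="font-family:verdana;font-size:13px;">The Threat to Software Freedom</td>
echo $html, PHP_EOL;
```

Decoding first matters because attribute values like style=3D"..." only become proper quoted attributes after the escapes are reversed; running a repair tool before that step would be working on the wrong text.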
Yeah, that looks like a mess. Have you looked into or tried something like HTML Purifier?
There are a few others, but I don't know them, since HTML Purifier is the only one I have ever used. You may want to look into it (if that is what you are asking for).
You can use Tidy to repair your HTML. But it looks very bad, so you should start by fixing the script that produces the HTML.
On a Windows machine, you might have to add or uncomment the following line in your php.ini to be able to use it:
extension=php_tidy.dll
A very basic example from the documentation:
$html = '<p>test</I>';
$tidy = tidy_parse_string($html);
$tidy->cleanRepair();
echo $tidy;
This will output the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title></title>
</head>
<body>
<p>test</p>
</body>
</html>
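Building on the documentation example, here is a minimal sketch that also returns Tidy's diagnostics, which helps when the repaired markup still has to pass a validator. It assumes the tidy extension is enabled; repairFragment() is a hypothetical helper, the sample fragment is made up (modelled on the unclosed font tags in the newsletter's blog list), and output-xhtml, show-body-only and wrap are standard HTML Tidy configuration options. The exact repaired markup and messages depend on the installed libtidy version, so no exact output is shown.

```php
<?php
/**
 * Repairs an HTML fragment with Tidy and returns the cleaned markup plus
 * Tidy's diagnostic messages, or null if the tidy extension is unavailable.
 * repairFragment is a hypothetical helper name, not part of the tidy API.
 */
function repairFragment(string $html): ?array
{
    if (!extension_loaded('tidy')) {
        return null;
    }

    // Standard HTML Tidy options: XHTML output, body contents only, no line wrapping.
    $config = [
        'output-xhtml'   => true,
        'show-body-only' => true,
        'wrap'           => 0,
    ];

    $tidy = tidy_parse_string($html, $config, 'utf8');
    $tidy->cleanRepair();

    return [
        'html'   => tidy_get_output($tidy),
        // Warnings and errors collected while parsing; null when there were none.
        'errors' => $tidy->errorBuffer,
    ];
}

// Illustrative input: an unclosed <font> inside a <td>, as in the blog list above.
$result = repairFragment('<table><tr><td><font size="-2" style="color:#AEA193;">1 comment</td></tr></table>');

if ($result === null) {
    echo "The tidy extension is not loaded.", PHP_EOL;
} else {
    echo $result['html'], PHP_EOL;
    echo $result['errors'] ?? '(no diagnostics)', PHP_EOL;
}
```

Keeping the diagnostics separate from the repaired markup lets you see which problems Tidy fixed silently and which ones point back at the script that generates the HTML, which is where the fix ultimately belongs.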
