I am updating an application using an old version of fPDF to the latest mPDF version 8.
My question is how to mitigate the much-discussed slowness of HTML tables in mPDF.
The current fPDF code uses the Cell method and generates the PDF in less than a second.
By comparison, writing some simple HTML to use in mPDF generates a similar PDF in about 12-15 seconds.
My testing is with around 3500 records, but this can easily hit 10k records for some results.
Loading
I am loading the mPDF object like this:
$mpdf = new \Mpdf\Mpdf(array(
"tempDir" => "/tmp/pdf",
"mode" => "utf-8",
"format" => "Letter-P",
"margin_left" => "10",
"margin_top" => "10",
"margin_right" => "10",
"margin_bottom" => "10"
));
$mpdf->WriteHTML(file_get_contents("css/pdf.css"), \Mpdf\HTMLParserMode::HEADER_CSS);
$mpdf->simpleTables = true;
My css/pdf.css file
.cdrTable {
width: 25cm;
}
.cdrTable td {
text-align: center;
}
Writing
I've kept the HTML as simple as possible to prevent any auto-resizing issues.
Store this HTML in a variable and write it once using $mpdf->WriteHTML($html);
<h2>Results for this run</h2>
<p><strong>Sources are:</strong>
[pbx] From PBX
[sdc] From DC
</p>
Store a table of 50 records in a variable and write it once using $mpdf->WriteHTML($html);
<table class='cdrTable'>
<tr>
<th>#</th>
<th>Call Started</th>
<th>Call From</th>
<th>Call To</th>
<th>Duration</th>
<th>Ext</th>
<th>Ext Label</th>
<th>Data Src</th>
</tr>
<tr>
<td>1</td>
<td>02-07-2020 04:52:39 PM</td>
<td>Polycom VVX601 [11]</td>
<td>Conference [86]</td>
<td>0:13</td>
<td>101</td>
<td>My Phone</td>
<td>pbx</td>
</tr>
.. Up to 50 records ..
</table>
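The 50-row batching described above could be sketched roughly as follows. This is only an illustration: `buildCdrTable()` and the layout of `$records` are hypothetical and not part of mPDF, the sample rows are made up, and the `WriteHTML()` call is left as a comment so the snippet runs without the library.

```php
<?php
// Hypothetical helper: renders one batch of records as a small HTML table.
// Each record is assumed to be a list of already-formatted cell values.
function buildCdrTable(array $headers, array $rows): string
{
    $html = "<table class='cdrTable'>\n<tr>";
    foreach ($headers as $header) {
        $html .= '<th>' . htmlspecialchars($header) . '</th>';
    }
    $html .= "</tr>\n";
    foreach ($rows as $row) {
        $html .= '<tr>';
        foreach ($row as $cell) {
            $html .= '<td>' . htmlspecialchars((string) $cell) . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>\n";
}

$headers = ['#', 'Call Started', 'Call From', 'Call To', 'Duration', 'Ext', 'Ext Label', 'Data Src'];

// Made-up sample data standing in for the real query result.
$records = [];
for ($i = 1; $i <= 120; $i++) {
    $records[] = [$i, '02-07-2020 04:52:39 PM', 'Polycom VVX601 [11]', 'Conference [86]', '0:13', '101', 'My Phone', 'pbx'];
}

// array_chunk() splits the result set into batches of at most 50 rows.
$tables = [];
foreach (array_chunk($records, 50) as $batch) {
    $tables[] = buildCdrTable($headers, $batch);
    // With mPDF loaded, each batch would be written here:
    // $mpdf->WriteHTML(end($tables));
}

echo count($tables), " tables built\n";
```

Changing the second argument of `array_chunk()` is all it takes to try other batch sizes.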
Store this HTML in a variable and write it once using $mpdf->WriteHTML($html);
<table>
<tr>
<td>3555 records matched</td>
<td>Total: 1965:11</td>
</tr>
</table>
Finally
$mpdf->Output("reports.pdf", \Mpdf\Output\Destination::DOWNLOAD);
Conclusion
Outputting this to a browser is quick:
Peak Mem: 6.82 M, Time: 0.055243015289307
Generating the PDF is not so quick:
Peak Mem: 14.52 M, Time: 13.560908079147
I have tried writing 100, 200, 1000, and the whole result set at once without much difference in time.
Is there anything I can do to make this quicker?
Should I use something besides HTML tables or skip using HTML at all?
Related
I want to write a table containing the records from my database to a PDF using mPDF. When I retrieve the records in a while loop, the table header is repeated along with each record, because the WriteHTML() call is inside the loop.
I tried to solve this by calling WriteHTML() twice, writing the table header and the content separately, but mPDF then produces a PDF with a blank page.
$html = "<table border='0' width='100%' cellspacing='0'>
<tr>
<th>ID</th>
<th>NAMA</th>
<th>PEKERJAAN</th>
<th>ALAMAT</th>
<th>SUKU</th>
</tr>";
$mpdf->WriteHTML(utf8_decode($html),\Mpdf\HTMLParserMode::DEFAULT_MODE, true, false);
while ($data = mysqli_fetch_array($mysqli_query)) {
$html2 = "<tr>
<td align='center'>".htmlspecialchars($id++)."</td>
<td align='center'>".htmlspecialchars($data['nama'])."</td>
<td align='center'>".htmlspecialchars($data['pekerjaan'])."</td>
<td align='center'>".htmlspecialchars($data['alamat'])."</td>
<td align='center'>".htmlspecialchars($data['suku'])."</td>
</tr>
</table>";
$mpdf->WriteHTML(utf8_decode($html2),\Mpdf\HTMLParserMode::DEFAULT_MODE,false, true);
}
$mpdf->Output();
I expect the PDF output to contain a table like the one in my index.php:
https://photos.smugmug.com/Stackoverflow/i-26sKBgt/0/ed83bc1c/L/expect_output%20-%20Copy-L.png
instead of this:
https://photos.smugmug.com/Stackoverflow/i-cVszxJW/0/af15bde0/L/unexpect_output%20-%20Copy-L.png
Sorry for posting links; I am not able to post images right now.
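One possible way around the repeated header is to build the whole table as a string first and write it in a single call. The sketch below is not a confirmed fix for the blank page: `buildTableHtml()` is a hypothetical helper, the sample rows are made up and stand in for the mysqli result, and the mPDF calls are commented out so the snippet runs on its own.

```php
<?php
// Hypothetical helper: opens the table once, appends one <tr> per record,
// and closes the table once, after the loop has finished.
function buildTableHtml(iterable $records): string
{
    $html = "<table border='0' width='100%' cellspacing='0'>
<tr><th>ID</th><th>NAMA</th><th>PEKERJAAN</th><th>ALAMAT</th><th>SUKU</th></tr>\n";
    $id = 1;
    foreach ($records as $data) {
        $html .= '<tr>';
        $html .= "<td align='center'>" . $id++ . '</td>';
        foreach (['nama', 'pekerjaan', 'alamat', 'suku'] as $field) {
            $html .= "<td align='center'>" . htmlspecialchars($data[$field]) . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html . '</table>';
}

// Made-up rows standing in for the rows fetched from the database.
$rows = [
    ['nama' => 'Budi', 'pekerjaan' => 'Guru', 'alamat' => 'Jl. Merdeka 1', 'suku' => 'Jawa'],
    ['nama' => 'Sari', 'pekerjaan' => 'Dokter', 'alamat' => 'Jl. Sudirman 2', 'suku' => 'Sunda'],
];

$html = buildTableHtml($rows);

// With mPDF loaded, the complete table would be written in one call:
// $mpdf->WriteHTML($html);
// $mpdf->Output();

echo substr_count($html, '<th>'), " header cells, ", substr_count($html, '<tr>'), " rows\n";
```

Because the closing `</table>` is appended only after the loop, the header row and the closing tag each appear exactly once regardless of how many records are fetched.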
I am having trouble summing a specific column with DOMDocument and XPath.
This is what the source file looks like:
(some other tables come first, and then...)
<table><hr><tr><td><table>
<td align="center" colspan="1"><u><b>Contracts</b></u></td>
<tr><th>pos</th><th>player</th><th>age</th><th>year 1</th><th>year 2</th><th>year 3</th><th>year 4</th><th>year 5</th><th>year 6</th></tr>
<tr><td CLASS=tdp>PG</td><td CLASS=tdp>James Harden </td><td>27</td><td>20.00</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>PG</td><td CLASS=tdp>Terry Rozier </td><td>22</td><td>1.10</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>SG</td><td CLASS=tdp>Danny Green </td><td>29</td><td>2.60</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>SG</td><td CLASS=tdp>Marco Belinelli </td><td>30</td><td>1.50</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>SF</td><td CLASS=tdp>Luol Deng </td><td>31</td><td>1.75</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>SF</td><td CLASS=tdp>Jeremy Evans </td><td>28</td><td>7.50</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>PF</td><td CLASS=tdp>Jeff Withey </td><td>26</td><td>6.25</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>PF</td><td CLASS=tdp>Lavoy Allen </td><td>27</td><td>1.50</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp> C</td><td CLASS=tdp>Jonas Valanciunas </td><td>24</td><td>12.75</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp> C</td><td CLASS=tdp>Ryan Hollins </td><td>31</td><td>1.50</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>SF</td><td CLASS=tdp>K.J. McDaniels </td><td>23</td><td>1.50</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>PG</td><td CLASS=tdp>Briante Weber </td><td>24</td><td>4.35</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>SF</td><td CLASS=tdp>Nicolas Brussino </td><td>23</td><td>1.00</td><td></td><td></td><td></td><td></td><td></td></tr>
</table></td><td><table>
...
I used this code, similar to some I found here, but I always get "0" as the result.
$doc = new DOMDocument;
$doc->loadHTML('URL');
$xpath = new DOMXPath($doc);
// sum of cells of the sixth table (contracts), in the fourth column (year1), skipping the first row (ignore Year 1)
print $xpath->evaluate('sum(//table[6]//tr[position() > 1]/td[4])');
Using terms like table[6] in XPath can be fragile, as it is so dependent on the overall document structure. It's better to pick up on something like <b>Contracts</b> that is part of the table you're interested in and search for that table.
So you could try...
print $xpath->evaluate('sum(//table[td/u/b/.="Contracts"]/tr[position() > 1]/td[4])');
Update:
To help work out what it's doing you can break it down to levels and see what it's returning. To check if it's finding the table, use...
$table = $xpath->query('//table[td/u/b="Contracts"]');
echo $doc->saveHTML($table[0]);
Then add onto it to see where it's failing. One of the big difficulties with HTML is that badly formed markup gets repaired when it is converted into a DOM tree, and it can lose some of its structure in the process.
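As a rough, self-contained illustration of the same idea, here is the sum run against a trimmed-down, well-formed copy of the Contracts table. The shortened markup and the `[not(.//table)]` predicate (which keeps only the innermost matching table) are assumptions made for this sketch; the real page may parse differently.

```php
<?php
// Trimmed-down, well-formed copy of the Contracts table from the question.
$html = <<<HTML
<table>
<tr><td colspan="4"><u><b>Contracts</b></u></td></tr>
<tr><th>pos</th><th>player</th><th>age</th><th>year 1</th></tr>
<tr><td>PG</td><td>James Harden</td><td>27</td><td>20.00</td></tr>
<tr><td>SG</td><td>Marco Belinelli</td><td>30</td><td>1.50</td></tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// Locate the innermost table containing the "Contracts" caption, then add up
// the fourth <td> of every row. Rows without a fourth <td> (the caption row
// and the <th> header row) contribute nothing to the sum.
$total = $xpath->evaluate(
    'sum(//table[.//b="Contracts"][not(.//table)]//tr/td[4])'
);

printf("%.2f\n", $total);
```

Note that XPath's sum() yields NaN as soon as one of the selected cells is empty, which matters for the mostly empty "year 2" to "year 6" columns.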
For a PHP assignment I have to use the HTML_Template_Sigma PEAR module, which wraps a website's HTML in templates so the same HTML isn't pasted over and over. All the content is added through variables, and at some point I have to loop through an array inside one of the string variables (which contains a table). I checked the documentation, which is not very extensive; it does describe a loop mechanism, but the examples are geared toward publications and I don't know how to apply it to my problem.
http://www.pixel2life.com/publish/tutorials/13/pear_module_html_template_sigma/
http://pear.php.net/manual/en/package.html.html-template-sigma.intro-syntax.php
Still, what they show is not exactly the same as my case.
foreach ($data as $result) {
$plantilla->setCurrentBlock('table_row');
$plantilla->setVariable(array(
'date' => $result[0],
'epicentre' => $result[1],
'region' => $result[2],
'richter' => $result[3],
'mercalli' => $result[4]
));
$plantilla->parseCurrentBlock('table_row');
}
This is my variable:
$content = '
<table>
<thead>
<tr>
<th>Date</th>
<th>Epicentre</th>
<th>Region</th>
<th>Mw Richter</th>
<th>Mercalli</th>
</tr>
</thead>
<tbody>
<!-- BEGIN table_row -->
<tr>
<td>{date}</td>
<td>{epicentre}</td>
<td>{region}</td>
<td>{richter}</td>
<td>{mercalli}</td>
</tr>
<!-- END table_row -->
</tbody>
</table>';
My array contains 5 columns of data. I've tried but to no avail.
Thanks in advance!
I am trying to get the value of Internet Data Volume Balance; the script should echo 146.30 MB.
I am new to all of this and have been looking through the tutorials.
How can this be done?
<tr >
<td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Account Status</FONT></B></div></td>
<td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text">You exceeded your allowed credit.</FONT></div></td>
</tr>
<tr >
<td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Period Free Time Remaining</FONT></B></div></td>
<td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text">0:00:00 hours</FONT></div></td>
</tr>
<tr >
<td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Internet Data Volume Balance</FONT></B></div></td>
<td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text" style="text-transform:none;">146.30 MB</FONT></div></td>
</tr>
If you are willing to install phpQuery, or already have it installed, you can use that.
phpQuery::newDocumentFileHTML('htmlpage.html');
echo pq('td:eq(5)')->text(); // :eq() is zero-based, so 5 selects the sixth cell
PHP can interact with the DOM just like JavaScript can. This is vastly superior to parsing the markup by hand, which most people will tell you is the wrong approach anyway:
Loading from an HTML File
// Start by creating a new document
$doc = new DOMDocument();
// I've loaded the table into an external file, and am loading it into the $doc
$doc->loadHTMLFile( 'htmlpage.html' );
// Since you have six table cells, I'm calling up all of them
$cells = $doc->getElementsByTagName("td");
// I'm grabbing the sixth cell's textContent property
echo $cells->item(5)->textContent;
This code will output "146.30 MB" to the screen.
Loading from a String
If you have the HTML stored in a string, you can load that into your document as well. We simply swap the method that loads a file for the one that loads a string:
$str = "<table><tr><td>Foo</td></tr>...</table>";
$doc->loadHTML( $str );
We would then proceed with the same code as above to select the cells, and show their textContent in the output.
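Selecting the cell by position breaks as soon as the page gains or loses a row. A variation that looks the value up by its label can be sketched as below; the XPath expression and the cut-down markup (wrapped in `<table>` tags) are assumptions made for this example and are not taken from the original page.

```php
<?php
// Cut-down copy of the markup from the question, wrapped in <table> tags.
$str = '<table>
<tr><td><div><b><font class="tplus_text">Period Free Time Remaining</font></b></div></td>
    <td><div><font class="tplus_text">0:00:00 hours</font></div></td></tr>
<tr><td><div><b><font class="tplus_text">Internet Data Volume Balance</font></b></div></td>
    <td><div><font class="tplus_text">146.30 MB</font></div></td></tr>
</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($str);

$xpath = new DOMXPath($doc);

// Find the cell whose text is the label, then take the cell right after it.
$cells = $xpath->query(
    '//td[normalize-space(.)="Internet Data Volume Balance"]/following-sibling::td[1]'
);

// $balance stays null when the label is not found on the page.
$balance = $cells->length > 0 ? trim($cells->item(0)->textContent) : null;

echo $balance, "\n";
```

The same query works after `loadHTMLFile()`, since only the loading step differs.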
Check out the DOMDocument Class.
I have a table with the following structure. I cannot seem to get the data I want.
<table class="gsborder" cellspacing="0" cellpadding="2" rules="cols" border="1" id="d00">
<tr class="gridItem">
<td>Code</td><td>0adf</td>
</tr><tr class="AltItem">
<td>CompanyName</td><td>Some Company</td>
</tr><tr class="Item">
<td>Owner</td><td>Jim Jim</td>
</tr><tr class="AltItem">
<td>DivisionName</td><td> </td>
</tr><tr class="Item">
<td>AddressLine1</td><td>9314 W. SPRING ST.</td>
</tr>
</table>
This table is of course nested within another table within the page. How can I use DOMDocument, for example, to refer to "Code" and "0adf" as a key/value pair? They don't actually need to be in a key/value pair, but I should be able to access each of them separately.
EDIT:
Using PHP Simple HTML DOM Parser, I was able to extract the data I needed using this:
$foo = $html->getElementById("d00")->childNodes(1)->childNodes(1);
The problem with this though is that I am getting the two <td></td> tags with my data. Is there a way to only grab the raw data without the tags?
Also, is this the right way to get my data out of this table?
If you're not dead set on using DOMDocument, try using the PHP Simple HTML DOM Parser. This has the benefit of allowing you to parse HTML which is not valid XML as well as providing a nicer interface to the parsed document.
You could write something like:
$html = str_get_html(...);
foreach($html->find('tr') as $tr)
{
print 'First td: ' . $tr->find('td', 0)->plaintext;
print 'Second td: ' . $tr->find('td', 1)->plaintext;
}
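If you would rather stay with DOMDocument, as the question originally asked, the rows can be collected into an associative array along these lines. This is a minimal sketch on a shortened copy of the table; `$pairs` is just an illustrative name, and the lookup relies on the table keeping its `id="d00"` attribute.

```php
<?php
// Shortened copy of the table from the question.
$html = '<table class="gsborder" id="d00">
<tr class="gridItem"><td>Code</td><td>0adf</td></tr>
<tr class="AltItem"><td>CompanyName</td><td>Some Company</td></tr>
<tr class="Item"><td>Owner</td><td>Jim Jim</td></tr>
</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Walk every row of the table with id "d00" and use the first cell as the
// key and the second cell as the value. textContent carries no tags.
$pairs = [];
foreach ($xpath->query('//table[@id="d00"]//tr') as $tr) {
    $cells = $xpath->query('td', $tr);
    if ($cells->length >= 2) {
        $pairs[trim($cells->item(0)->textContent)] = trim($cells->item(1)->textContent);
    }
}

echo $pairs['Code'], "\n";
echo $pairs['CompanyName'], "\n";
```

Because the query starts from the `id` attribute, it does not matter how deeply the table is nested inside other tables.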