CakePHP 3 CakePDF Plugin: Generating a large PDF by looping through content? - php

My problem stems from trying to generate large PDF files while being hit by PHP's memory limit and execution timeout. The data is too great in volume to simply extend those limits, so that solution is out of the question.
I have a background shell task running which handles all of this rendering and then alerts the user once the PDF has been completed.
In theory I would have a loop within this shell which takes in a chunk of data and renders it to file, then takes the next chunk and does the same. Once there is no data left to render, the file would be written and completed, ready to be served. This way PHP's memory limit would never be hit, as only a manageable chunk is ever loaded at once.
I am currently using the CakePdf plugin (v3.5) for CakePHP 3 (v3.5.13) but am struggling to find a solution which allows rendering some data and then appending more data to the same PDF.
Has anyone managed this before, or is it out of the scope of the plugin? Would another solution be to create multiple PDF files and then merge them together after all the separate PDFs have been created?
This is more of a theoretical question about whether this would work and whether anyone has managed it before. I don't have much code to show, but if more detail is required give me a shout and I will try to put together some example code for you!
Thanks

I don't have direct experience with that version of CakePdf, but under CakePHP 2.x I use the wkhtmltopdf engine, which takes an HTML file as input to produce the PDF.
If your shell generates that HTML in chunks, it is easy to append to the file.
Of course wkhtmltopdf is likely to put some load on the machine to produce the PDF, but since it's a separate binary, that happens outside of PHP's memory/time constraints.
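As a rough sketch of that idea (untested, and assuming the wkhtmltopdf binary is installed and on the PATH; fetchNextChunk() and renderChunkToHtml() are hypothetical placeholders for your own data and view logic, and TMP is CakePHP's temp directory constant):

$htmlPath = TMP . 'report.html';
$pdfPath  = TMP . 'report.pdf';

// Build the source HTML one chunk at a time so PHP never holds the whole document in memory
$handle = fopen($htmlPath, 'w');
fwrite($handle, '<html><body>');
while (($chunk = fetchNextChunk()) !== null) {
    fwrite($handle, renderChunkToHtml($chunk));
}
fwrite($handle, '</body></html>');
fclose($handle);

// The heavy rendering happens in the external binary, outside PHP's memory/time limits
exec('wkhtmltopdf ' . escapeshellarg($htmlPath) . ' ' . escapeshellarg($pdfPath));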

That's certainly out of the scope of the plugin. It's built around the idea of rendering a single view to a single file, so the interface doesn't support chunked creation of a single file, and if I'm not mistaken, none of the supported engines support that either, at least not in a straightforward and efficient manner when it comes to large documents.
There are certainly lots of ways to do this. Creating multiple PDFs and merging/concatenating them afterwards might be one of them; generating the source content in chunks and passing it to a PDF renderer that can handle lots of content efficiently might be another; and there may well be libraries out there that explicitly support chunked creation of PDFs.

I thought I would post what I ended up doing for anyone in the future.
I used CakePdf to generate smaller PDFs which I stored in a tmp directory. These all stay under PHP's execution time and memory limits, as I don't believe raising those provides a good solution. In this step I also saved the names of all of the PDFs generated, for use in the next step.
The code for this looked something like:
// Assumes: use CakePdf\Pdf\CakePdf; at the top of the Shell class
while (!$is_last_pdf) {
    // Generate a PDF here with a portion of the data
    $CakePdf = new CakePdf();
    $CakePdf->template('page', 'default');
    $CakePdf->viewVars(compact('data', 'other_stuff'));
    // Render this chunk and write it to its own temporary file
    $CakePdf->write($file_name);
    // Save the file name to an array for the merge step
    $tmp_file_list[] = $file_name;
    // Update the $is_last_pdf flag
    $is_last_pdf = check_for_more_data();
}
From this I used GhostScript from within the Shell to merge all of the PDF files, the code for this looked something like this:
$output_path = 'output.pdf';
// Build a space-separated, shell-escaped list of all the files to merge
$file_list = implode(' ', array_map('escapeshellarg', $tmp_file_list));
// Execute GhostScript to merge all the files into the `output.pdf` file
exec('gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=' . escapeshellarg($output_path) . ' ' . $file_list);
All of the code here was in the Shell file responsible for creating the PDF.
Hope this helps someone :)

Related

Php count number of pages on PDF file upon upload prior to saving file

I have a function that uploads a file into web storage. Prior to saving the file on the storage system, if the file is a PDF I would like to determine how many pages it has.
Currently I have the following:
$pdftext = file_get_contents($path);
$num = preg_match_all("/\/Page\W/", $pdftext, $dummy);
return $num;
Where $path is the temporary path that I use with fopen to open the document.
This function works at times but is not reliable. I know there's also this function:
exec('/usr/bin/pdfinfo '.$pdf_file.' | awk \'/Pages/ {print $2}\'', $output);
But this requires the file to be downloaded to the server. Any ideas or suggestions to accomplish this?
PHP is a server-side language, meaning all processing happens on your server. There's no way for PHP to determine details of a file on the client side; it has no knowledge of it, nor the required access to it.
So the answer to your question as it stands is: it's not possible. But you probably have a goal in mind for why you want to check this; sharing that goal might help you get more constructive answers/suggestions.
As Oldskool already explained, this is not possible with PHP on the client side. You would have to upload the PDF file to the server and then determine the number of pages. There are libraries and command line tools that can accomplish this.
In case you don't want to upload the PDF file to the server (which seems to be the case here), you could use the pdf.js library. That way the client is able to determine the number of pages in a PDF document on its own.
PDFJS.getDocument(data).then(function (doc) {
    var numPages = doc.numPages;
});
There are other libraries as well but I'm not certain about their browser support (http://www.electronmedia.in/wp/pdf-page-count-javascript/)
Now you just submit the number of pages from JavaScript to the PHP file that needs this information. To achieve this you simply use Ajax; if you don't know Ajax, just google it, there are enough examples out there.
As a side note: always remember not to trust the client. The client is able to modify the page count and send a completely different one.
For those of you running Linux servers this actually is possible. You need the pdfinfo command-line tool (part of poppler-utils) installed, and using the function
$pages = exec('/usr/bin/pdfinfo '.$pdf_file.' | awk \'/Pages/ {print $2}\'', $output);
outputs the correct page number, where $pdf_file is the temporary path on the server upon upload.
The reason it wasn't working for me was because I didn't have pdfinfo installed.
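A slightly more defensive variant of that call (just a sketch, assuming pdfinfo is installed): escape the path and parse the "Pages:" line in PHP rather than with awk.

$escaped = escapeshellarg($pdf_file); // guard against shell metacharacters in the path
exec('/usr/bin/pdfinfo ' . $escaped, $lines, $status);

$pages = 0;
if ($status === 0) {
    foreach ($lines as $line) {
        // pdfinfo prints a line such as "Pages:          12"
        if (preg_match('/^Pages:\s+(\d+)/', $line, $m)) {
            $pages = (int) $m[1];
            break;
        }
    }
}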

Taking long time to load random images

I have a random image generator for my site. The problem is, it takes a really long time.. I was wondering if anybody could help to speed it up in any ways. The site is http://viralaftermath.com/, and this is the script:
header('Content-Type: image/jpeg');
$images = glob("images/" . '*.{jpg,jpeg,png,gif}', GLOB_BRACE);
echo file_get_contents($images[array_rand($images)]);
This is a pretty resource-intensive way to do this: you are passing the image data through PHP and not specifying any caching headers, so the image has to be reloaded every single time you open the page.
A much better approach would be to have glob() list the files within the HTML page that you're using to embed the image. Then randomize that list, and emit an <img> tag pointing to the actual file name that you determined randomly.
When you are linking to a static image instead of the PHP script, you also likely benefit from the web server's caching defaults for static resources. (You could use PHP to send caching headers as well, but in this scenario it really makes the most sense to randomly point to static images.)
$images = glob("images/" . '*.{jpg,jpeg,png,gif}', GLOB_BRACE);
# Randomize order
shuffle($images);
# Create URL
$url = "images/".basename($images[0]);
echo "<img src='$url'>";
Profile your code and find the bottlenecks. I can only make guesses.
echo file_get_contents($file);
This will first read the complete file into memory and then send it to the output buffer. It would be way nicer if the file went directly to the output buffer; readfile() is your friend. It would be even better to avoid buffering completely; ob_end_flush() will help you there.
The next candidate is the image directory. If searching for one image takes a significant time, you have to optimize that. This can be achieved with an index (e.g. with a database).
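For illustration, a minimal sketch of the readfile() variant described above (same random-pick logic as the original script, just streamed instead of loaded into a PHP string first):

header('Content-Type: image/jpeg');

$images = glob('images/*.{jpg,jpeg,png,gif}', GLOB_BRACE);
$file   = $images[array_rand($images)];

header('Content-Length: ' . filesize($file));
readfile($file); // streams the file straight to the client instead of building a string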

php and dompdf - how to generate a large pdf without having my server come to a screeching halt

I'm attempting to render a largish HTML table to PDF with dompdf. There is minimal CSS styling, but maybe 200 - 300 rows in the table. Each row has four td's with basic text.
I can generate a PDF of a smaller table with no issues, but a larger table will exhaust memory limits and the script will terminate. What is the best way to approach this? I started a discussion on Server Fault and one user suggested spawning a new process so as to not exhaust the memory limits of PHP/Apache. Would this be the best way to do this? That leads me to questions about how that would work, in that dompdf currently streams the download to the browser, but I'm assuming that if I create a new process to generate the report, I can no longer send the output to the browser for the user to download?
Thanks to anyone who might be able to suggest a good way to tackle this!
If you render your HTML using a secondary PHP process (e.g. using exec()) the execution time and memory limits are eased. When rendering in this method you save the rendered PDF to a directory on the web site and redirect the user to the download (or even email them a link if you want to run a rendering queue). Generally I've found that this method does offer modest improvements in speed and memory use.
That doesn't, however, mean the render will perform significantly faster in your situation. Rendering tables with dompdf is resource intensive at present. What might work better, if you can, is to break the document into parts, render each of those parts separately (again using a secondary PHP process), combining the resulting collection of PDFs into a single file (using something like pdftk), then saving the PDF where the user can access it. I've seen a significant performance improvement using this method.
Or go with something like wkhtmltopdf (if you have shell access to your server and are willing to deal with the installation process).
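As a rough sketch of the "render parts, then combine" idea mentioned above (assuming pdftk is installed and that the part files have already been rendered by separate processes; the file names here are purely illustrative):

// Hypothetical part files, each rendered beforehand by its own PHP process
$parts  = ['/tmp/report_part1.pdf', '/tmp/report_part2.pdf', '/tmp/report_part3.pdf'];
$output = '/tmp/report.pdf';

// pdftk <in1> <in2> ... cat output <out>
$cmd = 'pdftk ' . implode(' ', array_map('escapeshellarg', $parts))
     . ' cat output ' . escapeshellarg($output);
exec($cmd, $lines, $status);

if ($status !== 0) {
    error_log('pdftk merge failed'); // a non-zero exit code means the merge did not succeed
}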
I have had this trouble with many different generated file types, not just PDFs. Spawning processes did not help because the problem was in the sizes of the variables, and no matter how fresh the process, the variables were still too big. My solution was to create a file and write to it in manageable chunks, so that my variables never got above a certain size.
A basic, untested example:
$tmp = fopen($tmpfilepath, 'w');
if (is_resource($tmp)) {
    echo 'Generating file ... ';
    $dompdf = new DOMPDF();
    $counter = 0;
    $html = '';
    while ($line = getLineOfYourHtml()) {
        $html .= $line;
        $counter++;
        if ($counter % 200 == 0) { // pick a good chunk number here
            $dompdf->load_html($html);
            $dompdf->render();
            $output = $dompdf->output();
            fwrite($tmp, $output);
            echo round($counter / getTotalLines() * 100) . '%... '; // echo percent complete
            $html = '';
        }
    }
    if ($html != '') { // the last chunk
        $dompdf->load_html($html);
        $dompdf->render();
        $output = $dompdf->output();
        fwrite($tmp, $output);
    }
    fclose($tmp);
    if (file_exists($tmpfilepath)) {
        echo '100%. Generation complete. ';
        echo 'Download'; // in practice, echo an <a> link pointing to the generated file here
    } else {
        echo ' Generation failed.';
    }
} else {
    echo 'Could not generate file.';
}
Because it takes a while to generate the file, the echoes appear one after another, giving the user something to look at so they don't think the screen has frozen. The final echoed link will only appear after the file has been generated, which means the user is automatically waiting until the file is ready before they can download it. You may have to extend the max execution time for this script.

Manipulate an Archive in memory with PHP (without creating a temporary file on disk)

I am trying to generate an archive on-the-fly in PHP and send it to the user immediately (without saving it). I figured that there would be no need to create a file on disk as the data I'm sending isn't persistent anyway, however, upon searching the web, I couldn't find out how. I also don't care about the file format.
So, the question is:
Is it possible to create and manipulate a file archive in memory within a php script without creating a tempfile along the way?
I had the same problem but finally found a somewhat obscure solution and decided to share it here.
I came across the great zip.lib.php/unzip.lib.php scripts which come with phpMyAdmin and are located in the "libraries" directory.
Using zip.lib.php worked like a charm for me:
require_once(LIBS_DIR . 'zip.lib.php');
...
//create the zip
$zip = new zipfile();
//add files to the zip, passing file contents, not actual files
$zip->addFile($file_content, $file_name);
...
//prepare the proper content type
header("Content-type: application/octet-stream");
header("Content-Disposition: attachment; filename=my_archive.zip");
header("Content-Description: Files of an applicant");
//get the zip content and send it back to the browser
echo $zip->file();
This script allows downloading a zip without needing the files to exist as real files on disk or saving the zip itself as a file.
It is a shame that this functionality is not part of a more generic PHP library.
Here is a link to the zip.lib.php file from the phpmyadmin source:
https://github.com/phpmyadmin/phpmyadmin/blob/RELEASE_4_5_5_1/libraries/zip.lib.php
UPDATE:
Make sure you remove the following check from the beginning of zip.lib.php as otherwise the script just terminates:
if (! defined('PHPMYADMIN')) {
exit;
}
UPDATE:
This code is available on the CodeIgniter project as well:
https://github.com/patricksavalle/CodeIgniter/blob/439ac3a87a448ae6c2cbae0890c9f672efcae32d/system/helpers/zip_helper.php
What are you using to generate the archive? You might be able to use the streams php://temp or php://memory to read and write to/from the archive.
See http://php.net/manual/en/wrappers.php.php
Regarding your comment that php://temp works for you except when you close it: try keeping it open, flushing the output, then rewinding it back to 0 and reading it.
Look here for more examples: http://us.php.net/manual/en/function.tmpfile.php
Also research output buffering and capturing: http://us.php.net/manual/en/function.ob-start.php
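For illustration, a minimal sketch of that php://temp write/rewind/read cycle ($archiveData stands in for whatever bytes your archive builder produces):

$fp = fopen('php://temp', 'r+');      // spills to a real temp file only past ~2 MB by default
fwrite($fp, $archiveData);            // write the archive bytes
rewind($fp);                          // back to offset 0 before reading
$contents = stream_get_contents($fp); // read everything back
fclose($fp);

echo $contents;                       // send to the browser (with appropriate headers set beforehand)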
You need to use ZipArchive::addFromString; if you use addFile(), the file is not actually added until you close the archive. (A horrible bug IMHO: what if you are trying to move files into a zip and you delete them before you close the zip...)
The addFromString() method adds it to the archive immediately.
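A minimal sketch of the addFromString() usage described above (note that ZipArchive still writes the archive to a path on disk when it is closed, so this illustrates the addFile()/addFromString() difference rather than a fully in-memory archive):

$zipPath = tempnam(sys_get_temp_dir(), 'zip');

$zip = new ZipArchive();
if ($zip->open($zipPath, ZipArchive::OVERWRITE) === true) {
    // The content is taken from the string right away; no source file on disk is needed
    $zip->addFromString('report.txt', $fileContent); // $fileContent: your data
    $zip->close(); // the archive is written to $zipPath here
}

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename=my_archive.zip');
readfile($zipPath);
unlink($zipPath); // clean up the temporary file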
Is there really a performance issue here, or does it just offend your sense of rightness? A lot of processes write temporary files and delete them, and often they never hit the disk due to caching.
A tempfile is automatically deleted when closed. That's its nature.
There are only two ways I can think of to create a zip file in memory and serve it, and both are probably more trouble than they are worth:
1. Use a RAM disk.
2. Modify the ZipArchive class to add a method that does everything the close() method does, except actually close the file. (Or add a leave-open parameter to close().)
This might not even be possible depending on the underlying C libraries.
