Problem
I'm currently working on a live PDF generation preview in Laravel (PHP) using FPDI + TCPDF, so that a user can import a base PDF and embed text on top of it.
PDF generation works fine for all sizes, but large PDFs (e.g. A1 size) generate a 10+ MB file that is too large to serve back to the front end for preview.
I'm looking for the quickest and best method to either optimise and reduce the PDF file size, or resize the actual PDF dimensions to serve a minified version for preview only.
TLDR
Looking for suggestions (other than those I've tried below), or improvements on what I've tried, to create a PDF preview file either by resizing the original large PDF or by converting it to an image file.
What I've Tried So Far
Imagick PDF to Image Conversion - Good output size (31 MB > 700 KB) but slow (1 sec > 10 secs)
My original idea was to convert the PDF to an image using Imagick and then create a minified PDF from that image. However, I found that Imagick is really slow at reading the PDF blob (it takes about 9 seconds, compared to under 1 second for the PDF generation itself). Code is as follows:
// $output === the PDF generated
$downscaleSizeFactor = $this->jsonFile->canvas['downscale_size_factor'] ?? 1;
$previewWidth = $this->size['width'] / $downscaleSizeFactor;
$previewHeight = $this->size['height'] / $downscaleSizeFactor;
$im = new Imagick;
$im->readImageBlob($output); // SLOW HERE!!!
$numPages = $im->getNumberImages();
$pdfPreview = new TCPDF($this->size["orientation"], 'mm', [$previewWidth, $previewHeight], true, 'UTF-8', false);
$pdfPreview->setPrintHeader(false);
$pdfPreview->setPrintFooter(false);
$pdfPreview->SetAutoPageBreak(false, 0);
for($i=0;$i<$numPages;$i++) {
$im->setIteratorIndex($i);
$selectedIm = $im->getImage();
$selectedIm->resizeImage($previewWidth, $previewHeight, imagick::FILTER_LANCZOS, 1, true);
$selectedIm->setImageBackgroundColor('white');
$selectedIm->setImageAlphaChannel(Imagick::ALPHACHANNEL_REMOVE);
// mergeImageLayers() returns a new Imagick object, so keep the result
$selectedIm = $selectedIm->mergeImageLayers(Imagick::LAYERMETHOD_FLATTEN);
$selectedIm->setImageFormat('png');
$selectedIm->setImageCompressionQuality(100);
$imageString = $selectedIm->getImageBlob();
// add a page
$pdfPreview->AddPage();
// set JPEG quality
$pdfPreview->setJPEGQuality(100);
// TCPDF treats a leading '@' as "image data follows" rather than a file path
$pdfPreview->Image('@'.$imageString, 0, 0, $previewWidth, $previewHeight);
}
$im->clear();
$im->destroy();
return $pdfPreview->output('', 'S');
Rerun FPDI + TCPDF to Generate a Minified Version - Bad output size (31 MB > 31 MB) but extremely fast (1 sec > 1.5 secs)
I saved the generated PDF to a temporary folder, then used it as the source for a minified version for preview only. This worked great in terms of speed, but it didn't change the file size at all. Even reducing the page from [600 mm x 800 mm] to [10 mm x 10 mm] made no difference to the file size, which is weird. Maybe I missed something that someone else can spot. Code as follows:
$reducedPdf = new FpdiTcpdfCustom();
$tempPdfFile = storage_path('app/templates/pdf/temp/'.$name.'');
$pageCount = $reducedPdf->setSourceFile($tempPdfFile);
$pageNo = 1;
for ($pageNo; $pageNo <= $pageCount; $pageNo++) {
// Checks if the page is to be skipped
// Import a page from the blank by setting the
$pageId = $reducedPdf->importPage($pageNo);
// Return the size of the imported page
$size = $reducedPdf->getTemplateSize($pageId);
// Remove default header/footer
$reducedPdf->setPrintHeader(false);
$reducedPdf->setPrintFooter(false);
$reducedPdf->SetAutoPageBreak(false, 0);
// Creates the PDF page
$reducedPdf->AddPage($size['orientation'], [10,10]);
$reducedPdf->useTemplate($pageId, 0, 0, 10, 10);
}
return $reducedPdf->output('', 'S');
Using Spatie\PdfToImage to Generate an Image File - Good output size (31 MB > 218 KB) but rubbish speed (1 sec > 27 secs just to convert and save the image)
Similar to the first attempt: convert the PDF to an image file, then resize it and embed the image into a PDF for preview. But it is extremely slow, so I gave up on this method before even generating the PDF.
$tempPdf = new \Spatie\PdfToImage\Pdf(storage_path('app/templates/pdf/temp/'.$name));
$tempPdf->setCompressionQuality(10);
$tempPdf->saveImage(storage_path('app/templates/pdf/temp/'));
Suggestions?
Does anyone have suggestions for either improving my attempts, or another way of achieving what I need?
Related
I'm working with the PHP ImageMagick Library to convert uploaded pdf files to single png files so I can display the pdf as a single image on my Laravel webpage. So far I'm able to convert an entire pdf to a single image with this code:
<?php
$imagick = new Imagick();
$file = new File;
// other lines of code
// ...
$imgPath = Storage::path($file->file_path);
$imgSavePath = Storage::path('uploads/buffer/'.Str::beforeLast($file->name, '.').'.png');
$imagick->readImage($imgPath);
$imagick->resetIterator();
$imagick = $imagick->appendImages(true);
$imagick->writeImages($imgSavePath, true);
This works by producing a single png image. However, I find this to be resource intensive (storage wise) and time consuming because I'm delivering the functionality through an ajax call.
I want my web application to convert only the first n pages of the pdf (say first 5 pages) into a single image to act as a preview on the site - thereafter the user can download the entire pdf to view on their local system. The function should work regardless of the number of pages in an uploaded pdf document.
So far, I only find in the documentation where I can read a page at a particular index from the Imagick object and convert to an image using:
...
$imgPath = Storage::path($file->file_path);
$index = 5;
$imagick->readImage($imgPath. '[' . $index . ']');
However, I'm finding it difficult to refactor this so that the application can read the first n pages.
Intuitively, the readImage() function seems to accept the same page-range syntax as the command line. Thanks to the hint from @MarkSetchell in the comments:
<?php
$imagick = new Imagick();
$file = new File;
// other lines of code
// ...
$imgPath = Storage::path($file->file_path);
$imgSavePath = Storage::path('uploads/buffer/'.Str::beforeLast($file->name, '.').'.png');
$imagick->readImage($imgPath.'[0-4]'); // read only the first 5 pages
$imagick->resetIterator();
$imagick = $imagick->appendImages(true);
$imagick->writeImages($imgSavePath, true);
I'm using ImageMagick 6.9.10-68 and PHP 8.1.12
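If the number of preview pages needs to be configurable, the selector can be built from a variable. A minimal sketch, assuming a hypothetical helper name; I have not verified how readImage() behaves when the PDF has fewer pages than the range asks for, so that case may need its own check.

```php
<?php
// Hypothetical helper: append a "[first-last]" page selector, as accepted by
// readImage(), that covers the first $pageCount pages of a PDF.
function firstPagesSelector(string $pdfPath, int $pageCount): string
{
    if ($pageCount < 1) {
        throw new InvalidArgumentException('pageCount must be at least 1');
    }

    // Page indexes are zero-based, so the first N pages are 0 .. N-1.
    if ($pageCount === 1) {
        return $pdfPath . '[0]';
    }

    return sprintf('%s[0-%d]', $pdfPath, $pageCount - 1);
}

echo firstPagesSelector('/tmp/report.pdf', 5), PHP_EOL; // /tmp/report.pdf[0-4]
```

The result would then be passed straight to $imagick->readImage() in place of the hard-coded '[0-4]'.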
I upload PDFs to PHP and extract the pages as JPGs in different resolutions, in a kind of batch driven by JS + AJAX calls to work around the PHP timeout.
But the font does not render very nicely... what can I do?
$pdf = new Imagick();
$pdf->setresolution(225, 225);
$pdf->readimage('mypdf[0-5]');
$written = $pdf->writeimages('previewfolder/pages/hq-0.jpg', FALSE);
$pdf->clear();
$pdf->destroy();
I tried increasing the setresolution values to 500 and 500; the font then looks a little better, but the image is also much larger in pixel dimensions. Here is a screenshot: http://imgur.com/5U88bx5
My target: a small image (1000px * 1000px) with the best font quality possible.
Hopefully someone has an idea.
Regards, lippoliv
As so often: mistake 40 (the mistake sits 40 cm in front of the monitor)...
$pdf = new Imagick();
$pdf->setresolution(350, 350);
$pdf->readimage('mypdf[0-5]');
// Because we have multiple pages, we have to process each page.
foreach ($pdf as $page) {
$page->resizeimage(1500, 1500, \Imagick::FILTER_UNDEFINED, 1.1, TRUE);
}
$written = $pdf->writeimages('previewfolder/pages/hq-0.jpg', FALSE);
$pdf->clear();
$pdf->destroy();
Thanks to Mark Setchell, who brought up that idea and made me think about why the resize didn't work. Then, after another hour of Googling, I found an example about resizing images which pointed out that you have to resize each frame.
So I thought I might have to resize each single page of my PDF (in that example there are 6 pages), and now it works: http://imgur.com/UoP3kMK
Now I can upscale or downscale the image size as I like and still get nice fonts :) Even as a JPG.
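To make the sizing in the per-page resize a bit more concrete, here is a minimal sketch of the arithmetic behind fitting a page into a bounding box while keeping its aspect ratio, which is roughly what passing TRUE as the last argument of resizeimage() asks for. The helper name fitWithin() is hypothetical, and Imagick's own rounding may differ by a pixel.

```php
<?php
// Hypothetical helper: scale (width, height) so that it fits inside
// (maxWidth, maxHeight) without changing the aspect ratio.
function fitWithin(int $width, int $height, int $maxWidth, int $maxHeight): array
{
    if ($width < 1 || $height < 1) {
        throw new InvalidArgumentException('width and height must be positive');
    }

    // The smaller of the two ratios is the one that keeps both sides inside the box.
    $scale = min($maxWidth / $width, $maxHeight / $height);

    return [
        max(1, (int) round($width * $scale)),
        max(1, (int) round($height * $scale)),
    ];
}

// A portrait page rendered at 3000x4000 px, fitted into a 1500x1500 box.
[$w, $h] = fitWithin(3000, 4000, 1500, 1500);
echo "{$w}x{$h}", PHP_EOL; // 1125x1500
```

Rendering at a high resolution first and then shrinking each page like this is what keeps the text sharp in a small output image.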
Thank you all.
I'm trying to find the fastest way to convert a multi-page PDF into JPEG images without quality loss. I have the following code:
$this->imagick = new \Imagick();
$this->imagick->setresolution(300,300);
$this->imagick->readimage($this->uploadPath . '/original/test.pdf');
$num_pages = $this->imagick->getNumberImages();
for($i = 0;$i < $num_pages; $i++) {
$this->imagick->setIteratorIndex($i);
$this->imagick->setImageFormat('jpeg');
// (the PDF was already read above, so it is not read again here)
// i need the width, height and filename of each image to add
// to the DB
$origWidth = $this->imagick->getImageWidth();
$origHeight = $this->imagick->getImageHeight();
$filename = $this->imagick->getImageFilename();
$this->imagick->writeImage($this->uploadPath . '/large/preview-' . $this->imagick->getIteratorIndex() . '.jpg');
$this->imagick->scaleImage($origWidth / 2, $origHeight / 2);
$this->imagick->writeImage($this->uploadPath . '/normal/preview-' . $this->imagick->getIteratorIndex() . '.jpg');
//add asset to the DB with the given width, height and filename ...
}
This is very slow though, partly because the resolution is so high, but if I don't set it, the text on the images is of very poor quality. Also, saving the image first and then saving a smaller version of the same file is probably not very efficient.
So does anyone have a better method of doing this? Maybe with Ghostscript only?
The minimum requirements are that I need 2 versions of the converted image: a real-size version and a version at half size. I also need the width, height and filename of each to add to the database.
You can use Ghostscript: if you set "-sDEVICE=jpeg -sOutputFile=out%d.jpg", each page will be written to a separate file.
Note that it's not really possible to render a PDF to JPEG 'without quality loss', since JPEG is a lossy compression method. Have you considered using a lossless format instead, such as PNG or TIFF?
To get the same images at different sizes you will need to render the file twice at different resolutions; you set the resolution in Ghostscript with '-r'. The width and height can be read easily enough from the image file, or you can use pdf_info.ps to get each page size. Note that a PDF file can contain multiple pages of different sizes, so I hope you aren't expecting them all to be the same...
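As a rough sketch of the two-pass rendering described above, the commands could be assembled from PHP like this. The helper name ghostscriptJpegCommand() and the file paths are made up for illustration, and it assumes a Unix-like shell with a `gs` binary on the PATH; only the -sDEVICE, -sOutputFile and -r switches come from the answer itself, the rest are the usual batch-mode switches.

```php
<?php
// Hypothetical helper: build a Ghostscript command that renders every page of
// a PDF to its own JPEG at the given resolution.
function ghostscriptJpegCommand(string $pdfPath, string $outputPattern, int $dpi): string
{
    return implode(' ', [
        'gs',
        '-q',        // suppress startup messages
        '-dNOPAUSE', // do not pause between pages
        '-dBATCH',   // exit once the input has been processed
        '-sDEVICE=jpeg',
        '-r' . $dpi, // rendering resolution in DPI
        '-sOutputFile=' . escapeshellarg($outputPattern), // %d is replaced by the page number
        escapeshellarg($pdfPath),
    ]);
}

// Render twice: once at full resolution and once at half of it.
$large  = ghostscriptJpegCommand('test.pdf', 'large/preview-%d.jpg', 300);
$normal = ghostscriptJpegCommand('test.pdf', 'normal/preview-%d.jpg', 150);

echo $large, PHP_EOL, $normal, PHP_EOL;
```

Each command could then be run with exec(), and getimagesize() on every written file gives the width and height for the database.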
I have a letter designed as a PDF. Using PHP, I would like to fetch an address, put it on the PDF letter, and dynamically generate another PDF file containing that address.
How can I do this?
You can use Imagick / ImagickDraw (ext: php-imagick). It's a pain to set up under Windows, but if you're running Linux it's pretty quick and easy.
$Imagick = new Imagick();
$Imagick->setResolution(300,300);
$Imagick->readImage('my.pdf[' . $page_number . ']');
$width = $Imagick->getImageWidth();
$height = $Imagick->getImageHeight();
$height *= (float) (2550 / $width);
$Imagick->scaleImage(2550, $height);
if (0 != $rotation)
$Imagick->rotateImage(new ImagickPixel(), $rotation);
$scaled_ratio = (float)$Imagick->getImageWidth() / 850;
// put white boxes on image
$ImagickDraw = new ImagickDraw();
$ImagickDraw->setFillColor('#FFFFFF');
$ImagickDraw->rectangle($x1, $y1, $x2, $y2);
$Imagick->drawImage($ImagickDraw);
// put text in white box (really on canvas that has already been modified)
$ImagickDraw = new ImagickDraw();
/* Font properties for text */
$ImagickDraw->setFont('times');
$ImagickDraw->setFontSize(42); // 10 * 300/72 = 42
$ImagickDraw->setFillColor(new ImagickPixel('#000000'));
$ImagickDraw->setStrokeAntialias(true);
$ImagickDraw->setTextAntialias(true);
// add text to canvas (pdf page)
$Imagick->annotateImage(
$ImagickDraw,
$x1 + 4, // 1 * 300/72 = 4
$y1 + 42, // 10 * 300/72 = 42
0,
$the_text // do not use HTML: strip tags and replace <br> with \n if you got the text from an editable div (which is what I'm doing)
);
$Imagick->writeImage($filename);
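The magic numbers in the comments above (42 and 4) come from converting points to pixels at the rendering resolution, since one point is 1/72 of an inch. A small sketch of that conversion, with a hypothetical helper name:

```php
<?php
// Hypothetical helper: convert a length in points (1/72 inch) to whole pixels
// at the given resolution.
function pointsToPixels(float $points, float $dpi): int
{
    return (int) round($points * $dpi / 72);
}

echo pointsToPixels(10, 300), PHP_EOL; // 42 (a 10pt font at 300 DPI)
echo pointsToPixels(1, 300), PHP_EOL;  // 4  (a 1pt offset at 300 DPI)
```

Computing these from the resolution instead of hard-coding them means the text keeps the same physical size if the value passed to setResolution() changes.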
I actually use Ghostscript to merge the PDFs (individual pages written to a temp directory) into a single PDF. The only problem I've seen is that pages seem faded where I've used $Imagick->annotateImage() or $Imagick->drawImage(). I'm figuring that out right now, which is why I found this question.
I guess it's half an answer, but I hope it helps someone.
--- addition via edit 4/6/2012 ---
Found a way around the PDF image fading.
$Imagick->setImageFormat("jpg");
$Imagick->writeImage('whatever.jpg');
$Imagick = new Imagick();
$Imagick->setResolution(300,300);
$Imagick->readImage('whatever.jpg');
--- another addition via edit 5/1/2012 ---
Found a way around greyscale from pdf to tif looking awful.
Just one command.
Ghostscript PDF -> TIFF conversion is awful for me, people rave about it, I alone look sullen
$Imagick->blackThresholdImage('grey');
--- end of edit 5/1/2012 ---
$Imagick->setImageFormat("pdf");
$Imagick->writeImage($filename);
The license is expensive, but PDFlib is designed for such things: opening a template .pdf file and adding new items dynamically to produce an output PDF. There are also free PDF manipulation libraries, such as TCPDF, which can probably do the same thing.
I used FPDF with FPDI and it worked fine for me. I have overlaid thousands of files without any problem.
I have the following code. It's used to combine various image attachments (and pdfs) into one PDF. For some reason, when I take even a single PDF and put it through the code, the end result comes out looking very bad compared to the original PDF. In addition, I can select text in the source PDF, but in the generated one I cannot.
Any help would be greatly appreciated.
// PDF object
$pdf = new Imagick();
$max_resolution = array('x' => 100, 'y' => 100);
foreach($attachment_ids as $attachment_id) {
$attachment = DAO_Attachment::get($attachment_id);
$file = Storage_Attachments::get($attachment);
// Temporarily store our attachment
$im = new Imagick();
$im->readImageBlob($file);
// We need to reset the iterator otherwise only one page will be rotated
$im->resetIterator();
// Get the resolution
$resolution = $im->getImageResolution();
if($resolution['x'] > $max_resolution['x']) {
$max_resolution['x'] = $resolution['x'];
}
if($resolution['y'] > $max_resolution['y']) {
$max_resolution['y'] = $resolution['y'];
}
$num_pages = $im->getNumberImages();
$rotation = array_shift($rotations);
$degrees = $rotation > 0 ? 360 - $rotation : 0;
$pages = array();
if($degrees > 0) {
// Rotate each page
for($i = 1; $i <= $num_pages; $i++) {
$im->nextImage();
$im->rotateImage(new ImagickPixel(), $degrees);
}
}
// We need to reset the iterator again so all of our pages will be added to the pdf
$im->resetIterator();
// If the image format isn't a pdf, convert it to a png
// getImageFormat() is a method and may return the format in upper case
if(strtolower($im->getImageFormat()) !== 'pdf') {
$im->setImageFormat('png');
// Opacity
if(method_exists($im, 'setImageOpacity'))
$im->setImageOpacity(1.0);
}
$im->setImageCompression(imagick::COMPRESSION_LOSSLESSJPEG);
$im->setImageCompressionQuality(100);
$im->stripImage();
// Add the rotated attachment to the PDF
$pdf->addImage($im);
// Free
$im->destroy();
}
// Create a composite
$pdf->setImageFormat('pdf');
// Compress output
$pdf->setImageCompression(imagick::COMPRESSION_LOSSLESSJPEG);
$pdf->setImageCompressionQuality(100);
$pdf->stripImage();
// Set resolution
$pdf->setImageResolution($max_resolution['x'], $max_resolution['y']);
This may be obvious to you already, but a low quality image will not result in a high quality PDF. I don't know how good Imagick's PDF generation capabilities are, but it seems from your code that you are converting images? You could compare by doing the same thing with TCPDF, though if the image is low quality I doubt you will get better results.
Also, if you have access to higher DPI resolution images than the usual web-optimised format, I recommend you use those to build your PDF instead. The quality will be a lot better.
ImageMagick uses Ghostscript to convert PDFs to various raster image formats. Ghostscript is quite good at this, but you're handcuffing it by scaling the page down to a max of 100x100.
An 8.5x11 (inches) page at 72 dpi is 612x792 pixels.
Perhaps you meant to restrict the DPI rather than the pixel dimensions? The output still won't scale all that well (vector formats vs pixel formats), but I suspect it would be a big improvement.
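For reference, the pixel figures above are just the page size in inches multiplied by the DPI. A quick sketch (the helper name is made up):

```php
<?php
// Hypothetical helper: pixel dimensions of a page rendered at a given DPI.
function pagePixelSize(float $widthInches, float $heightInches, float $dpi): array
{
    return [
        (int) round($widthInches * $dpi),
        (int) round($heightInches * $dpi),
    ];
}

[$w, $h] = pagePixelSize(8.5, 11, 72);
echo "{$w}x{$h}", PHP_EOL; // 612x792

[$w, $h] = pagePixelSize(8.5, 11, 300);
echo "{$w}x{$h}", PHP_EOL; // 2550x3300
```

So a letter-size page only reaches print-like detail at a few thousand pixels per side, which is far more than a 100x100 limit allows.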
It turns out the answer to this is to set the DPI using setResolution(). We do this before using readImageBlob() to read the file containing our image, because the image is rasterised at whatever resolution is current when it is read (so setting it afterwards won't work).
You could also use some math and use resampleImage() to do it after the fact, but setResolution() seems to be working perfectly for us.
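The "math" for the resampleImage() route is presumably just scaling the pixel dimensions by the ratio of the target DPI to the current DPI. A minimal sketch of that calculation, with a hypothetical helper name:

```php
<?php
// Hypothetical helper: pixel dimensions an image would have after being
// resampled from $currentDpi to $targetDpi at the same physical size.
function resampledSize(int $widthPx, int $heightPx, float $currentDpi, float $targetDpi): array
{
    if ($currentDpi <= 0 || $targetDpi <= 0) {
        throw new InvalidArgumentException('DPI values must be positive');
    }

    return [
        (int) round($widthPx * $targetDpi / $currentDpi),
        (int) round($heightPx * $targetDpi / $currentDpi),
    ];
}

// A letter page read at 72 DPI, resampled to 300 DPI.
[$w, $h] = resampledSize(612, 792, 72, 300);
echo "{$w}x{$h}", PHP_EOL; // 2550x3300
```

Keep in mind that resampling an already rasterised page upwards only interpolates pixels and cannot bring back detail, which is why calling setResolution() before reading tends to give the better result.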