I have the following code. It's used to combine various image attachments (and pdfs) into one PDF. For some reason, when I take even a single PDF and put it through the code, the end result comes out looking very bad compared to the original PDF. In addition, I can select text in the source PDF, but in the generated one I cannot.
Any help would be greatly appreciated.
// PDF object
$pdf = new Imagick();
$max_resolution = array('x' => 100, 'y' => 100);
foreach($attachment_ids as $attachment_id) {
$attachment = DAO_Attachment::get($attachment_id);
$file = Storage_Attachments::get($attachment);
// Temporarily store our attachment
$im = new Imagick();
$im->readImageBlob($file);
// We need to reset the iterator otherwise only one page will be rotated
$im->resetIterator();
// Get the resolution
$resolution = $im->getImageResolution();
if($resolution['x'] > $max_resolution['x']) {
$max_resolution['x'] = $resolution['x'];
}
if($resolution['y'] > $max_resolution['y']) {
$max_resolution['y'] = $resolution['y'];
}
$num_pages = $im->getNumberImages();
$rotation = array_shift($rotations);
$degrees = $rotation > 0 ? 360 - $rotation : 0;
$pages = array();
if($degrees > 0) {
// Rotate each page
for($i = 1; $i <= $num_pages; $i++) {
$im->nextImage();
$im->rotateImage(new ImagickPixel(), $degrees);
}
}
// We need to reset the iterator again so all of our pages will be added to the pdf
$im->resetIterator();
// If the image format isn't a pdf, convert it to a png
if(strtoupper($im->getImageFormat()) !== 'PDF') {
$im->setImageFormat('png');
// Opacity
if(method_exists($im, 'setImageOpacity'))
$im->setImageOpacity(1.0);
}
$im->setImageCompression(imagick::COMPRESSION_LOSSLESSJPEG);
$im->setImageCompressionQuality(100);
$im->stripImage();
// Add the rotated attachment to the PDF
$pdf->addImage($im);
// Free
$im->destroy();
}
// Create a composite
$pdf->setImageFormat('pdf');
// Compress output
$pdf->setImageCompression(imagick::COMPRESSION_LOSSLESSJPEG);
$pdf->setImageCompressionQuality(100);
$pdf->stripImage();
// Set resolution
$pdf->setImageResolution($max_resolution['x'], $max_resolution['y']);
This may be obvious to you already, but a low-quality image will not result in a high-quality PDF. I don't know how good Imagick's PDF generation capabilities are, but it seems from your code that you are converting images? You could compare by doing the same thing with TCPDF, though if the image is low quality I doubt you will get better results.
Also, if you have access to higher-DPI versions of the images than the usual web-optimised ones, I recommend you use those to build your PDF instead. The quality will be a lot better.
ImageMagick uses GhostScript to convert PDFs to various raster image formats. GhostScript is quite good at this, but you're hand-cuffing it by scaling the page down to a max of 100x100.
An 8.5x11 inch page at 72 dpi is 612x792 pixels.
Perhaps you meant to restrict DPI rather than resolution? The output still won't scale all that well (vector formats vs pixel formats), but I suspect it would be a big improvement.
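The arithmetic behind those numbers is just page size in inches times DPI. A minimal sketch, using a hypothetical helper `page_pixels` (not part of Imagick), showing how far a 100-pixel cap is below even a 72 dpi rendering of a Letter page:

```php
<?php
// Hypothetical helper: pixel dimensions of a page rendered at a given DPI.
// pixels = size in inches * dots per inch, rounded to whole pixels.
function page_pixels(float $width_in, float $height_in, float $dpi): array
{
    return [
        (int) round($width_in * $dpi),
        (int) round($height_in * $dpi),
    ];
}

// US Letter (8.5 x 11 in) at a few common resolutions.
foreach ([72, 150, 300] as $dpi) {
    [$w, $h] = page_pixels(8.5, 11, $dpi);
    echo "{$dpi} dpi: {$w}x{$h}\n";
}
```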
It turns out the answer to this is to set the DPI using setResolution(). We do this before calling readImageBlob() to read the file containing our image, because the resolution set at that point determines the DPI the image is read at (so setting it afterwards won't work).
You could also use some math and use resampleImage() to do it after the fact, but setResolution() seems to be working perfectly for us.
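A minimal sketch of that fix, wrapped in hypothetical helper functions (they are not part of Imagick). It assumes the imagick extension and Ghostscript (ImageMagick's PDF delegate) are installed; the important part is only the order of the two calls.

```php
<?php
// Hypothetical helper: rasterize a PDF blob at a chosen DPI.
function read_pdf_at_dpi(string $pdfBlob, float $dpi): Imagick
{
    $im = new Imagick();
    // Must come BEFORE reading: it sets the density the PDF is rendered at.
    $im->setResolution($dpi, $dpi);
    $im->readImageBlob($pdfBlob);
    return $im;
}

// Hypothetical helper for the "after the fact" route: resample the current
// frame to a new DPI. This only resizes pixels that were already rendered,
// so it cannot bring back detail lost to a low initial density.
function resample_to_dpi(Imagick $im, float $dpi): void
{
    $im->resampleImage($dpi, $dpi, Imagick::FILTER_LANCZOS, 1);
}
```

For example, `$im = read_pdf_at_dpi(file_get_contents('input.pdf'), 300);` would rasterize each page at 300 dpi, so an 8.5x11 inch page comes out at 2550x3300 pixels.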
Related
Problem
Currently working on creating a live PDF generation preview in Laravel PHP using FPDI + TCPDF, so that a user can import a base PDF and embed text on top of it.
PDF generation works fine for all sizes, but large PDFs (e.g. A1 size) generate a 10+ MB file that is too large to serve back to the front end for preview.
Looking for the quickest and best method to either optimise and reduce the PDF file size, or resize the actual PDF dimension to serve a minified version for preview only
TLDR
Looking for suggestions (other than those I've tried below), or improvements on what I've tried to create a PDF preview file either through resizing or converting to image file of original large PDF.
What I've Tried So Far
Imagick PDF to Image Conversion - Good output size (31 MB to 700 KB) but slow (1 s to 10 s)
Converting the PDF to an image with Imagick, then creating a minified PDF from that image, was my original idea. However, I found that Imagick was really slow at reading the PDF blob (about 9 seconds, compared to under 1 second for the PDF generation itself). Code is as follows:
// $output === the PDF generated
$downscaleSizeFactor = $this->jsonFile->canvas['downscale_size_factor'] ?? 1;
$previewWidth = $this->size['width'] / $downscaleSizeFactor;
$previewHeight = $this->size['height'] / $downscaleSizeFactor;
$im = new Imagick;
$im->readImageBlob($output); // SLOW HERE!!!
$numPages = $im->getNumberImages();
$pdfPreview = new TCPDF($this->size["orientation"], 'mm', [$previewWidth, $previewHeight], true, 'UTF-8', false);
$pdfPreview->setPrintHeader(false);
$pdfPreview->setPrintFooter(false);
$pdfPreview->SetAutoPageBreak(false, 0);
for($i=0;$i<$numPages;$i++) {
$im->setIteratorIndex($i);
$selectedIm = $im->getImage();
$selectedIm->resizeImage($previewWidth, $previewHeight, imagick::FILTER_LANCZOS, 1, true);
$selectedIm->setImageBackgroundColor('white');
$selectedIm->setImageAlphaChannel(Imagick::ALPHACHANNEL_REMOVE);
$selectedIm->mergeImageLayers(Imagick::LAYERMETHOD_FLATTEN);
$selectedIm->setImageFormat('png');
$selectedIm->setImageCompressionQuality(100);
$imageString = $selectedIm->getImageBlob();
// add a page
$pdfPreview->AddPage();
// set JPEG quality
$pdfPreview->setJPEGQuality(100);
$pdfPreview->Image('#'.$imageString, 0, 0, $previewWidth, $previewHeight);
}
$im->clear();
$im->destroy();
return $pdfPreview->output('', 'S');
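One thing worth checking in the code above: `$previewWidth` and `$previewHeight` are in millimetres (the unit passed to TCPDF), but `resizeImage()` takes pixels. A minimal sketch of the conversion, with hypothetical helper names. The computed DPI could in principle also be passed to `setResolution()` before `readImageBlob()`, so Ghostscript renders close to preview size instead of at its default density; whether that makes the read noticeably faster for these files is untested.

```php
<?php
// Hypothetical helpers for moving between millimetres (TCPDF's unit here)
// and pixels (what Imagick's resizeImage() expects).
const MM_PER_INCH = 25.4;

// Pixels covered by a length of $mm when rendered at $dpi.
function mm_to_px(float $mm, float $dpi): int
{
    return (int) round($mm / MM_PER_INCH * $dpi);
}

// DPI at which a page $mm wide renders to $targetPx pixels wide.
function dpi_for_width(float $mm, int $targetPx): float
{
    return $targetPx / ($mm / MM_PER_INCH);
}

// Example: an A1 portrait page (594 x 841 mm) previewed 1200 px wide.
$dpi = dpi_for_width(594, 1200);
printf("%.1f dpi -> %d x %d px\n", $dpi, mm_to_px(594, $dpi), mm_to_px(841, $dpi));
```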
Rerun FPDI + TCPDF to Generate a Minified Version - Bad output size (31 MB to 31 MB) but extremely fast (1 s to 1.5 s)
Saving the generated PDF to a temporary folder, then using it to generate a minified version for preview only. This worked great in terms of speed, but it didn't change the file size at all: reducing the page from [600 mm x 800 mm] to [10 mm x 10 mm] did not reduce the file size, which is weird. Maybe I missed something; let me know if anyone can spot it. Code as follows:
$reducedPdf = new FpdiTcpdfCustom();
$tempPdfFile = storage_path('app/templates/pdf/temp/'.$name.'');
$pageCount = $reducedPdf->setSourceFile($tempPdfFile);
$pageNo = 1;
for ($pageNo; $pageNo <= $pageCount; $pageNo++) {
// Checks if the page is to be skipped
// Import a page from the blank by setting the
$pageId = $reducedPdf->importPage($pageNo);
// Return the size of the imported page
$size = $reducedPdf->getTemplateSize($pageId);
// Remove default header/footer
$reducedPdf->setPrintHeader(false);
$reducedPdf->setPrintFooter(false);
$reducedPdf->SetAutoPageBreak(false, 0);
// Creates the PDF page
$reducedPdf->AddPage($size['orientation'], [10,10]);
$reducedPdf->useTemplate($pageId, 0, 0, 10, 10);
}
return $reducedPdf->output('', 'S');
Using Spatie\PdfToImage to Generate an Image File - Good output size (31 MB to 218 KB) but rubbish speed (1 s to 27 s just to convert and save the image)
Similar to previous, looking to convert the PDF to an image file, before resizing and embedding the image into a PDF for preview. But it is extremely slow so I gave up on this method before even generating the PDF.
$tempPdf = new \Spatie\PdfToImage\Pdf(storage_path('app/templates/pdf/temp/'.$name));
$tempPdf->setCompressionQuality(10);
$tempPdf->saveImage(storage_path('app/templates/pdf/temp/'));
Suggestions?
Does anyone have suggestions on either improving my attempts, or another way of achieving what I need?
I'm trying to find the fastest way to convert a multi-page PDF into JPEG images without quality loss. I have the following code:
$this->imagick = new \Imagick();
$this->imagick->setResolution(300, 300);
$this->imagick->readImage($this->uploadPath . '/original/test.pdf');
$num_pages = $this->imagick->getNumberImages();
for($i = 0;$i < $num_pages; $i++) {
$this->imagick->setIteratorIndex($i);
$this->imagick->setImageFormat('jpeg');
// i need the width, height and filename of each image to add
// to the DB
$origWidth = $this->imagick->getImageWidth();
$origHeight = $this->imagick->getImageHeight();
$filename = $this->imagick->getImageFilename();
$this->imagick->writeImage($this->uploadPath . '/large/preview-' . $this->imagick->getIteratorIndex() . '.jpg');
$this->imagick->scaleImage($origWidth / 2, $origHeight / 2);
$this->imagick->writeImage($this->uploadPath . '/normal/preview-' . $this->imagick->getIteratorIndex() . '.jpg');
//add asset to the DB with the given width, height and filename ...
}
This is very slow though, partly because the resolution is so high, but if I don't set it, the text in the images is of very poor quality. Also, saving the image first and then saving a smaller version of the file is probably not very efficient.
So does anyone have a better method of doing this? Maybe with only Ghostscript?
The minimum requirements are that I need two versions of each converted image: a full-size version and a half-size version. I also need the width, height and filename to add to the database.
You can use Ghostscript, if you set "-sDEVICE=jpeg -sOutputFile=out%d.jpg" then each page will be written to a separate file.
Note that it's not really possible to render a PDF to JPEG 'without quality loss', since JPEG is a lossy compression method. Have you considered using a lossless format instead, such as PNG or TIFF?
To get the same images at different sizes you will need to render the file twice at different resolutions; you set the resolution in Ghostscript with '-r'. The width and height can be read easily enough from the image file, or you can use pdf_info.ps to get each page size. Note that a PDF file can contain multiple pages of different sizes; I hope you aren't expecting them all to be the same.
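A sketch of that approach driven from PHP, assuming the `gs` binary is on the PATH and a POSIX shell (on Windows, `escapeshellarg()` replaces `%` with a space, which would break the `%d` pattern). The helper names are hypothetical; `-sDEVICE=jpeg`, `-r`, `-sOutputFile` with `%d`, and `-dJPEGQ` are standard Ghostscript options. Each size is a separate render, as described above, and the width and height are read back from each written file.

```php
<?php
// Hypothetical helper: build the Ghostscript command line for one render.
function gs_jpeg_command(string $pdf, string $outPattern, int $dpi, int $jpegQuality = 90): string
{
    return implode(' ', [
        'gs',
        '-dSAFER', '-dBATCH', '-dNOPAUSE',
        '-sDEVICE=jpeg',
        '-r' . $dpi,                                   // rendering resolution
        '-dJPEGQ=' . $jpegQuality,                     // JPEG quality, 0-100
        escapeshellarg('-sOutputFile=' . $outPattern), // %d becomes the page number
        escapeshellarg($pdf),
    ]);
}

// Hypothetical helper: run Ghostscript, then return [filename, width, height]
// for every page image it wrote (Ghostscript numbers pages from 1).
function render_pages(string $pdf, string $outDir, int $dpi): array
{
    $pattern = $outDir . '/page-%d.jpg';
    exec(gs_jpeg_command($pdf, $pattern, $dpi), $output, $status);
    if ($status !== 0) {
        throw new RuntimeException("Ghostscript failed with status $status");
    }
    $pages = [];
    for ($i = 1; is_file($file = sprintf($pattern, $i)); $i++) {
        [$w, $h] = getimagesize($file) ?: [0, 0]; // dimensions from the written file
        $pages[] = [$file, $w, $h];
    }
    return $pages;
}

// Full size at 300 dpi and half size at 150 dpi, rendered separately:
// $large  = render_pages('test.pdf', 'large', 300);
// $normal = render_pages('test.pdf', 'normal', 150);
```

Rendering at half the DPI gives images at half the pixel dimensions, give or take rounding.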
I have following code to manipulate and compress the TIFF image.
<?php
try{
$imagesrc = "C:\\server\\www\\imagick\\src.tif";
$imagedestination = "C:\\server\\www\\imagick\\converted.tif";
$im=new Imagick();
$im->readImage($imagesrc); //read image for manipulation
$im->setImageColorSpace(Imagick::COLORSPACE_CMYK);
$im->setImageDepth(8); //8 Bit
$im->setImageResolution(300,300); //set output resolution to 300 dpi
$im->setImageUnits(1); //0=undefined, 1=pixelsperInch, 2=PixelsPerCentimeter
$im->setImageCompression(Imagick::COMPRESSION_LZW);
$im->setImageCompressionQuality(80);
$im->writeImage($imagedestination);
$im->clear();
$im->destroy();
$im=NULL;
}catch(ImagickException $e){
echo "Could not convert image - ".$e->getMessage();
}
?>
The source image is 19 MB. With this code, the resulting TIFF is around 25 MB, i.e. the code is not compressing the image at all. Other compression methods have no effect on the resulting TIFF file either; however, if I use Imagick::COMPRESSION_JPEG, the resulting image is 2 MB.
I can't use JPEG compression, as I'm embedding the resulting TIFF in a PDF with iTextSharp, and JPEG compression produces a multi-strip TIFF image, which iTextSharp does not support.
So there are two possible answers to my question, and either would work for me:
How to effectively compress the TIFF?
How to convert a multi-strip TIFF image into a single strip?
Thanks
Fiddling with php-imagick got me nowhere, so I tried Magick.NET.
Only by setting the rows-per-strip define to a number larger than the number of rows in the image (i.e. a single strip) did iTextSharp accept the image with CompressionMethod.JPEG.
But it's still not working. All the image viewers I have on my computer render the image correctly, yet in the PDF documents it's broken.
And I found this forum entry http://itext-general.2136553.n4.nabble.com/TIFF-with-color-pages-COMPRESS-JPEG-problem-td3686051.html :
Jpeg compressed tiff images are not really supported in iText, they may work but most probably not.
No idea how authoritative Paulo Soares-3's post is, but I give up.
Therefore: this is not an answer. But maybe you want to fiddle with the .NET port as well, so here's my test code - good luck :
using iTextSharp.text;
using iTextSharp.text.pdf;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace imagick_itext_test
{
class Program
{
static Image getNormalizedImage(string path)
{
Image rv;
using (MemoryStream mems = new MemoryStream())
{
using (ImageMagick.MagickImage image = new ImageMagick.MagickImage(path))
{
image.Format = ImageMagick.MagickFormat.Tiff;
image.ResolutionUnits = ImageMagick.Resolution.PixelsPerInch;
image.Depth = 300;
image.BitDepth(8); // for printing you said? ;-)
image.Adjoin = false; // is there multi-image in jpeg anyway?
image.Interlace = ImageMagick.Interlace.Jpeg; // try Interlace.Plane and Interlace.No
image.CompressionMethod = ImageMagick.CompressionMethod.JPEG; // everything's fine when using .LZW here
image.Quality = 35; // 85, 80 not even 50 got me significant reduction in file size (src size=18MB)
//image.SetDefine(ImageMagick.MagickFormat.Tiff, "rows-per-strip", image.Height.ToString());
image.SetDefine(ImageMagick.MagickFormat.Tiff, "rows-per-strip", "8192");
image.Strip(); // remove additional data, e.g. comments
image.Write(mems);
}
// store the tiff(jpeg) image for inspection
using (FileStream fos = new FileStream(@"c:\temp\so_conv.tiff", FileMode.Create) )
{
mems.Position = 0;
mems.CopyTo(fos);
}
mems.Position = 0;
rv = Image.GetInstance(mems);
//rv.ScalePercent(24f); // works for me ...
}
return rv;
}
static void Main(string[] args)
{
using (Document doc = new Document())
{
using (PdfWriter w = PdfWriter.GetInstance(doc, new FileStream(@"c:\temp\so_pdf_test.pdf", FileMode.Create)))
{
doc.Open();
doc.SetMargins(50, 50, 50, 50);
doc.Add(new Paragraph("SO Image Test"));
doc.Add(getNormalizedImage(@"c:\temp\src.tif"));
doc.Close();
}
}
}
}
}
VS2012 - .net 4.5 , ImageMagick-6.8.8-10-Q16-x64-static.exe
Both Magick.NET and iTextSharp have been added via NuGet to the project:
iTextSharp 5.50
Magick.NET-Q16-x64 6.8.8.901
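For the PHP side of the original question, the same `rows-per-strip` define used in the Magick.NET code above can be passed through Imagick's `setOption()` as `tiff:rows-per-strip`. A sketch with hypothetical function names, assuming the imagick extension honours the define the same way (not verified here). As reported above, a single strip alone still did not make iTextSharp render the JPEG-in-TIFF correctly, so treat this as something to test, not a fix.

```php
<?php
// Hypothetical helper: write a JPEG-compressed TIFF stored as a single strip.
function write_single_strip_tiff(string $src, string $dst, int $quality = 80): void
{
    $im = new Imagick($src);
    $im->setImageFormat('tiff');
    $im->setImageCompression(Imagick::COMPRESSION_JPEG);
    $im->setImageCompressionQuality($quality);
    // One strip means rows per strip >= image height.
    $im->setOption('tiff:rows-per-strip', (string) $im->getImageHeight());
    $im->stripImage(); // drop comments and profiles
    $im->writeImage($dst);
    $im->clear();
}

// Hypothetical helper: how many strips an image of $height rows is split into.
function tiff_strip_count(int $height, int $rowsPerStrip): int
{
    return intdiv($height + $rowsPerStrip - 1, $rowsPerStrip);
}
```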
From what I have tested, JPEG inside TIFF gives the smallest file, followed by ZIP, then LZW, then RLE.
Input file: JPEG, 500 KB
* JPEG in TIFF: 1.25 MB
* ZIP: 2.0 MB
* LZW: 2.5 MB
* RLE: 3.2 MB
One thing: don't set Quality for lossless TIFF compression; it is ignored (treated as 100). It only has an effect with JPEG compression inside TIFF.
But what you could do is add the line $im->stripImage(); before saving. This strips some metadata from the file, which may make it smaller for you.
Also, please check your Imagick version; mine is ImageMagick 6.7.7-7 2012-06-21 Q16 and it works well.
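To reproduce this comparison on your own images, a sketch along these lines writes the same input with each compression scheme and measures the encoded size in memory. It assumes the imagick extension; the function name is hypothetical, and the ranking you get will depend on the input.

```php
<?php
// Hypothetical helper: encoded TIFF size in bytes per compression scheme.
function tiff_sizes_by_compression(string $src): array
{
    $schemes = [
        'JPEG' => Imagick::COMPRESSION_JPEG,
        'ZIP'  => Imagick::COMPRESSION_ZIP,
        'LZW'  => Imagick::COMPRESSION_LZW,
        'RLE'  => Imagick::COMPRESSION_RLE,
    ];
    $sizes = [];
    foreach ($schemes as $name => $scheme) {
        $im = new Imagick($src);
        $im->stripImage();                  // drop comments and profiles
        $im->setImageFormat('tiff');
        $im->setImageCompression($scheme);
        $im->setImageCompressionQuality(80); // only affects the JPEG scheme
        $sizes[$name] = strlen($im->getImageBlob());
        $im->clear();
    }
    asort($sizes); // smallest first
    return $sizes;
}
```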
I’m creating a web application that uses the SoundCloud API to stream tracks of artists. I know how I can get the waveform PNG image (http://w1.sndcdn.com/fxguEjG4ax6B_m.png for example), but I actually need some sort of wave-data (when in the song is it high and when is it low?).
I don’t have access to an audio library like LAME or something like that, because my web hosting doesn’t allow it. Is it possible to
Get this data directly from the SoundCloud API in some way.
Process the waveform PNG image in PHP or JavaScript to retrieve the needed data? (And is there maybe some sort of library available for this kind of processing?)
SoundCloud has started to provide the floating-point values, but it's not official yet. Just a little trick: when you have your PNG,
https://w1.sndcdn.com/XwA2iPEIVF8z_m.png
Replace "w1" with "wis" and "png" with "json":
https://wis.sndcdn.com/XwA2iPEIVF8z_m.json
And you get it!
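The URL rewrite can be done mechanically. A minimal sketch with hypothetical function names; it forces `https://` and assumes `allow_url_fopen` is enabled for the fetch. Since the `wis` endpoint is unofficial, the sketch makes no assumption about the structure of the returned JSON.

```php
<?php
// Hypothetical helper: turn a waveform PNG URL into the unofficial JSON URL
// by swapping the "w1" host prefix for "wis" and ".png" for ".json".
function waveform_json_url(string $pngUrl): string
{
    $parts = parse_url($pngUrl);
    $host = preg_replace('/^w1\./', 'wis.', $parts['host'] ?? '');
    $path = preg_replace('/\.png$/', '.json', $parts['path'] ?? '');
    return 'https://' . $host . $path;
}

// Hypothetical helper: fetch and decode the JSON, or null on any failure.
function fetch_waveform(string $pngUrl): ?array
{
    $json = @file_get_contents(waveform_json_url($pngUrl));
    if ($json === false) {
        return null;
    }
    $data = json_decode($json, true);
    return is_array($data) ? $data : null;
}
```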
It's possible to parse the waveform PNG image to translate it to an array of points. The images are vertically symmetrical and to find the peaks you only need to inspect the alpha values to count how many opaque pixels it is from the top of the image. This is how the waveforms are rendered for the widget and on the Next SoundCloud.
In PHP, you could use ImageMagick or the GD Graphics Library to read these values, and in Javascript, it's possible by putting the image onto a canvas object and then inspecting the image data from there. I won't go too much into the details of these, but you could certainly ask another question if you get stuck.
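The counting rule itself is simple enough to sketch independently of GD or canvas. In these hypothetical helpers, each column is the list of alpha values from the top row down to the vertical centre, on GD's scale (0 = opaque, 127 = fully transparent); the peak height is the number of rows left after the opaque pixels above the first transparent one.

```php
<?php
// Hypothetical helper: peak height for one column of alpha values (top half only).
function peak_height(array $alphaColumn): int
{
    $opaqueFromTop = 0;
    foreach ($alphaColumn as $alpha) {
        if ($alpha === 127) { // first fully transparent pixel: the waveform starts here
            break;
        }
        $opaqueFromTop++;
    }
    return count($alphaColumn) - $opaqueFromTop;
}

// Hypothetical helper: apply the rule to every column, left to right.
function waveform_from_columns(array $columns): array
{
    return array_map('peak_height', $columns);
}
```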
While there is no official way to get the raw waveform data directly from a SoundCloud API request, there is a way to derive the exact same data SoundCloud reveals in the unofficial endpoint (e.g. https://wis.sndcdn.com/XwA2iPEIVF8z_m.json) in PHP, using code like the following. Simply change the value of $image_file to whichever SoundCloud waveform PNG (1800 pixels wide by 280 high) you have and you are good to go:
$source_width = 1800;
$source_height = 140;
$image_file = 'https://w1.sndcdn.com/XwA2iPEIVF8z_m.png';
$image_processed = imagecreatefrompng($image_file);
imagealphablending($image_processed, true);
imagesavealpha($image_processed, true);
$waveform_data = array();
for ($width = 0; $width < $source_width; $width++) {
for ($height = 0; $height < $source_height; $height++) {
$color_index = @imagecolorat($image_processed, $width, $height);
// Determine the colors (and alpha) of the pixel.
$rgb_array = imagecolorsforindex($image_processed, $color_index);
// Peak detection is based on matching a fully transparent pixel (RGBA 0, 0, 0, 127).
// Compare position by position: array_diff() only checks which values occur, so it would also match e.g. (127, 0, 0, 0).
if (array_values($rgb_array) === array(0, 0, 0, 127)) {
break;
}
} // $height loop.
// Value is based on the delta between the actual height versus detected height.
$waveform_data[] = $source_height - $height;
} // $width loop.
// Dump the waveform data array to check the values.
echo '<pre>';
print_r($waveform_data);
echo '</pre>';
The benefit of this method: while the https://wis.sndcdn.com/ URL is useful, there is no telling if or when SoundCloud will change the structure of the data it returns. Deriving the data from the official waveform PNG offers some long-term stability, since SoundCloud is not likely to change that PNG without fair warning to API users.
Also, note that while $source_width is 1800, $source_height is 140: the SoundCloud PNG is 280 pixels high, but the bottom half is basically just a mirrored copy of the top half. So measuring only rows 0 to 140 gives you the necessary waveform data values.
Sorry to bump an old thread; this is just in case someone looking for something similar stumbles across this post. This is now possible, as per this link: Waveforms, Let's Talk About Them.
It was published shortly after this thread - so again apologies for bumping an old one.
While resizing an image, I have noticed that Imagick and Gmagick produce images with different filesize on HDD with the same options:
$image = new Imagick("c.jpg");
$image->thumbnailImage(260,195);
$image->writeImage("c_imagick.jpg");
outputs an Image with 88kb
$image = new Gmagick("c.jpg");
$image->thumbnailImage(260,195);
$image->writeImage("c_gmagick.jpg");
outputs an Image with 15kb
Does someone have any idea, why the difference is so huge?
Try setting the image compression settings prior to resizing.
$image->setImageCompression(Imagick::COMPRESSION_JPEG);
$image->setImageCompressionQuality(80);
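That suggestion can be wrapped in a hypothetical helper that also returns the written file size, so the Imagick and Gmagick outputs can be compared directly. It assumes the imagick extension; quality 80 is the value from the answer, and whether this closes the 88 KB vs 15 KB gap for your image is untested here.

```php
<?php
// Hypothetical helper: JPEG thumbnail with compression set before resizing.
// Returns the size of the written file in bytes.
function imagick_thumbnail(string $src, string $dst, int $width, int $height, int $quality = 80): int
{
    $image = new Imagick($src);
    $image->setImageCompression(Imagick::COMPRESSION_JPEG);
    $image->setImageCompressionQuality($quality);
    $image->thumbnailImage($width, $height);
    $image->writeImage($dst);
    $image->clear();
    clearstatcache(true, $dst); // make sure filesize() sees the new file
    return (int) filesize($dst);
}
```

For example, `imagick_thumbnail('c.jpg', 'c_imagick.jpg', 260, 195)` mirrors the call in the question.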
Additionally, check the size of the resulting image. Comments in the PHP documentation lead me to believe that the automatic fit portion of thumbnailImage does not work as you would expect in IMagick.
From PHP Docs:
The fit functionality of thumbnailImage doesn't work as one would anticipate. Instead, use this to make a thumbnail that has max of 200x82:
// Create thumbnail max of 200x82
$width=$im->getImageWidth();
if ($width > 200) { $im->thumbnailImage(200,null,0); }
$height=$im->getImageHeight();
if ($height > 82) { $im->thumbnailImage(null,82,0); }
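The workaround above boils down to computing the fitted dimensions yourself. A minimal sketch of that computation, using a hypothetical helper `fit_within` that keeps the aspect ratio and never upscales; the resulting width and height can then be passed to `thumbnailImage()` in a single call.

```php
<?php
// Hypothetical helper: largest size that fits inside $maxW x $maxH while
// keeping the aspect ratio. Never upscales; assumes non-zero input sizes.
function fit_within(int $width, int $height, int $maxW, int $maxH): array
{
    $scale = min(1.0, $maxW / $width, $maxH / $height);
    return [
        max(1, (int) round($width * $scale)),
        max(1, (int) round($height * $scale)),
    ];
}

// Usage with Imagick (assumes the extension):
// [$w, $h] = fit_within($im->getImageWidth(), $im->getImageHeight(), 200, 82);
// $im->thumbnailImage($w, $h);
```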