I have some PHP code that processes a number of .PNG images, combining them pixel-by-pixel (so lots of ìmagecolorat calls). Some of these images can change, but a few are precalculated and rarely change.
The precalculated images are generated by GD and output in PHP using imagepng.
As they are read far more often than they are written, I'd like to optimize them for reading speed.
But what type of quality settings for imagepng is best to optimize reading performance in imagecreatefrompng?
Higher compression and filters create smaller files, but perhaps a bigger file with no compression or filters is faster to read?
Perhaps it's better to skip PNG files altogether and use raw, uncompressed binary files or something that can be read into a PHP array?
If you process almost the same files over and over again, you might want to stop messing with imagepng itself and move the logic one level higher. For example you may cache the finished images or function calls ( see for example Caching function results in PHP ).
Related
I noticed that PHP Imagick changes the IDAT chunks when processing PNGs.
How exactly is this done? Is there a possibility to create IDAT chunks that remain unchanged? Is it possible to predict the outcome of Imagick?
Background information to this questions:
I wondered whether the following code (part of a PHP file upload) can prevent hiding PHP code (e.g. webshells) in PNGs:
$image = new Imagick('uploaded_file.png');
$image->stripImage();
$image->writeImage('secure_file.png');
Comments are stripped out, so the only way to bypass this filter is hiding the PHP payload in the IDAT chunk(s). As described here, it is theoretically possible but Imagick somehow reinterprets this Image data even if I set Compression and CompressionQuality to the values I used to create the PNG. I also managed to create a PNG whose ZLIB header remained unchanged by Imagick, but the raw compressed image data didn't. The only PNGs where I got identical input and output are the ones which went through Imagick before. I also tried to find the reason for this in the source code, but couldn't locate it.
I'm aware of the fact that other checks are necessary to ensure the uploaded file is actually a PNG etc. and PHP code in PNGs is no problem if the server is configured properly, but for now I'm just interested in this issue.
IDAT chunks can vary and still produce an identical image. The PNG spec unfortunately forces the IDAT chunks to form a single continuous data stream. What this means is that the data can be grouped/chunked differently, but when re-assembled into a single stream will be identical. Is the actual data different or is just the "chunking" changed? If the later, why does it matter if the image is identical? PNG is a lossless type of compression, stripping the metadata and even decompressing+recompressing an image shouldn't change any pixel values.
If you're comparing the compressed data and expecting it to be identical, it can be different and still yield an identical image. This is because FLATE compression uses an iterative process to find the best matches in previous data. The higher the "quality" number you give it, the more it will search for matches and shrink the output data size. With zlib, a level 9 deflate request will take a lot longer than the default and result in slightly smaller output data size.
So, please answer the following questions:
1) Are you trying to compare the compressed data before/after your strip operation to see if somehow the image changed? If so, then looking at the compressed data is not the way to do it.
2) If you want to strip metadata without any other aspect of the image file changing then you'll need to write the tool yourself. It's actually trivial to walk through PNG chunks and reassemble a new file while skipping the chunks you want to remove.
Answer my questions and I'll update my answer with more details...
I wondered whether the following code (part of a PHP file upload) can prevent hiding PHP code (e.g. webshells) in PNGs
You should never need to think about this. If you are worried about people hiding webshells in a file that is uploaded to your server, you are doing something wrong.
For example, serving those files through the PHP parser....which is the way a webshell could be invoked to attack a server.
From the Imagick readme file:
5) NEVER directly serve any files that have been uploaded by users directly through PHP, instead either serve them through the webserver, without invoking PHP, or use readfile to serve them within PHP.
readfile doesn't execute the file, it just sends it to the end-user without invoking it, and so completely prevents the type of attack you seem to be concerned about.
I'm currently trying to speed up the websites we develop. The part I'm working on now is to optimise the images so that they are as small (filesize, not dimensions) as possible without losing quality.
Our customers can upload their own images to the website through our custom CMS, but images aren't being compressed or optimised at all. My superior explained this is because the customers can upload their own images, and these images could be optimised beforehand through Photoshop or tools like it. If you optimise already optimised images, the quality would get worse. ...right?
We're trying to find a solution that won't require us to install a module or anything. We already use imagejpeg(), imagepng() and imagegif(), but we don't use the $quality parameter because of reasons previously explained. I've seen some snippets, but they all use imagejpg() and the like.
That being said, is there a sure-fire way of optimising images without the risk of optimising previously optimised images? Or would it be no problem at all to use imagejpeg(), imagepng() and imagegif(), even if it would mean optimising already optimised images?
Thank you!
"If you optimise already optimised images, the quality would get worse. "
No if you use a method without loose.
I don't know for method directly in php but if you are on linux server you can use jpegtran or jpegoptim ( with --strip-all) for jpeg and OptiPNG or PNGOUT for png.
Going from your title, I am going to assume compression
So, lets say a normal jpg of 800x600 is uploaded by your customers.
The customers jpg is 272kb because it has full details and everything.
You need to set tresholds for filesizes at dimensions what is acceptable.
Like:
if $dimensions->equals(800,600) and file_type($image) =='jpg' and file_size($image) > 68kb
then schedule_for_compression($image)
and that way you set up parameters for what is acceptable as an upper limit of file size. If the dimensions match, and the filesize is bigger, then its not optimised.
But without knowing more details what exactly is understood about optimising, this is the only thing I can think of.
If you are using a low number of images to compress, you might find an external service, such as: https://tinypng.com/developers might be of assistance.
I've used their on-line tools for reducing filesize on both JPG and PNG file manually but they do appaear to offer a free API service for the first 500 images per month.
Apologies if this would be better as a comment than an answer, I'm fairly new to stackoverflow and haven't got enough points yet, but felt this may be a handy alternative solution.
Saving a JPEG with the same or higher quality setting will not result in a noticeable loss in quality. Just re-save with your desired quality setting. If the file ends up larger, just discard it and keep the original. Remove metadata using jpegran or jpegoptim before you optimize so it doesn't affect the file size when you compare to the original.
PNG and GIF wont't lose any quality unless you reduce the number of colors. Just use one of the optimizers Gyncoca mentioned.
I've been doing some speed optimization on my site using Page Speed and it gives recommendations like so:
Optimizing the following images could reduce their size by 35.3KiB (21% reduction).
Losslessly compressing http://example.com/.../some_image.jpg could save 25.3KiB (55% reduction).
How are the calculations done to get the file size reduction numbers? How can I determine if an image is optimized in PHP?
As I understand it, they seem to base this on the image quality (so saving at 60% in Photoshop or so is considered optimized?).
What I would like to do is once an image is uploaded, check if the image is fully optimized, if not, then optimize it using a PHP image library such as GD or ImageMagick. If I'm right about the number being based on quality, then I will just reduce the quality as needed.
How can I determine if an image is fully optimized in the standards that Page Speed uses?
Chances are they are simply using a standard compression or working on some very simple rules to calculate image compression/quality. It isn't exactly what you were after but what I often use on uploaded images etc etc dynamic content is a class called SimpleImage:
http://www.white-hat-web-design.co.uk/blog/resizing-images-with-php/
this will give you the options to resize & adjust compression, and I think even change image type (by which I mean .jpg to .png or .gif anything you like). I worked in seo and page optimization was a huge part of my Job I generally tried to make the images the size the needed to be no smaller or bigger. compress JS & CSS and that's really all most people need to worry about.
It looks like you could use the PageSpeed Insights API to get the compressions scores: https://developers.google.com/speed/docs/insights/v1/reference, though I'm guessing you'd want to run a quality check/compression locally, rather than sending everything through this API.
It also looks like they've got a standalone image optimizer available at http://code.google.com/p/page-speed/wiki/DownloadPageSpeed?tm=2, though this appears to be a Windows binary. They've got an SDK there as well, though I haven't looked into what it entails, or how easy it would be to integrate into an existing site.
I want to create multiple thumbnails using GD library in php, and I already have a script to do this, the question is what is better for me .. is it better to create thumbnail on the fly? or create a physical file on my server each time I want a thumb?? and Why?
Please, consider time consuming and storage capacity and other disadvantages for both
When you create the thumbnail depends on a couple of factors (that I'll get into) but you should never discard the output of something like this (unless you'll never use it again) as it's a really expensive operation.
Anyway your two main choices for "when to generate the thumbnail" are:
When it's first requested. This is common and it means that you don't generate thumbnails that are never used but it does mean if you have a page full of first-time-thumbnails that the server might become overwhelmed with PHP processes generating the thumbnails.
I had a similar issue with Sorl+Django where I was generating 100+ thumbnails per request for the first few requests after uploading and it basically made the entire server hang for 20 minutes. Not good.
Generate all required thumbnails when you upload. Because it takes a long time to upload, you break down the processing quite a lot. You can also pull it out-of-process (ie use another script to process uploads - perhaps not even in PHP).
The obvious downside is you're using up disk space that you otherwise might not need to use up... But unless you're talking about hundreds of thousands of thumbnails, a small percentage of unused ones probably won't break the bank.
Of course, if disk space is an issue, there might be an argument for pushing the thumbnail up to a CDN at the same time as you process it.
One note when you save the thumbnails, it's fairly common that you'll want to resize the thumbnails at some point down the line or perhaps want two small variants. I find it really useful to make the filenames very specific so if the original image is image.jpg, the 200x200 version is image-200x200.jpg.
Neither/both - don't generate the thumbnails till you need them - but keep the files you generate.
That way you'll minimise the amount of work needed and have a self-repairing system
C.
GD is really resource heavy, so you should look at if you can use ImageMagick instead (which also has a clearer syntax).
You definitely will be better off caching the created thumbnail after the first run (regardless of if you run GD or ImageMagick) and serve them from the cache. If you are worried about storage, clear out old files from the cache now and then.
Always cache (= write out to disk) the results of GD operations. They are too expensive both regarding processor time and memory to be done on the fly every time. This becomes increasingly true the more visitors/hits you have.
I have a requirement to dynamically generate and compress large batches of PDF files.
I am considering the usual algorithms
Zip
Ace
Rar
Any other suggestion are welcome.
My question is which algorithm is likely to give me the smallest file size. Speed and efficency are also important factors but size is my primary concern.
Also does it make a difference whether I have many small files, or fewer larger files in each archive.
Most of my processing will be done in PHP, but I'm happy to interface with third party executables if needed.
Edit:
The documents are primarily invoices and shouldn't contain any other images except for the company logo
I have not had much success compressing PDFs. As pointed out, they are already compressed when composed (although some PDF composition tools allow you to specify a 'compression level'). If at all possible, the first approach you should take is to reduce the size of the composed PDFs.
If you keep the PDFs in a single file, they can share any common resources (images, fonts) and so can be significantly smaller. Note that this means one large PDF file, not one large ZIP with multiple PDFs inside.
In my experience it is quite difficult to compress the images within PDFs, and that images make by far the biggest impact on file size. Ensure that you have optimised images before you start. It is even worth running a test run without your images simply to see how much size the images are contributing.
The other component is font, and if you are using multiple embedded fonts then you are packing more data into the file. Just use one font to keep size down, or use fonts that are commonly installed so that you don't need to embed them.
I think 7z is the best currently with RAR being the second, but I would recommend you trying both to find out what works best for you.
LZMA is the best if you need smallest file size.
And of course PDF can be compressed itself.
I doubt you'll get much/any reduction in filesize by compressing PDFs. However, if all you're doing is collecting multiple files into one, why not tar it?
We've done this in the past for large (and many) PDFs that store lots of text - Training Packages for Training Organisations in Australia. Its about 96% text (course info etc) and a few small diagrams. Sizes vary from 1-2Mb to 8 or 9Mb and they usually come in volumes of 4 or more.
We've found compressing with Zip OK to get good compression as the PDF format is already heavily compressed, it was more of a ease of use for our users to download it all as a batch instead of worry about the filesizes. To give you an idea, a 2.31Mb file - lots of text, several full page diagrams - compressed to 1.92Mb in ZIP and 1.90Mb in RAR.
I'd recommend using LZMA to get the best - looking at resource usage on compressing and uncompressing too.
How big are these files? Get a copy of WinRAR, WinAce and 7Zip and give it ago.
Combine my nifty tool Precomp with 7-Zip. It decompresses the zLib streams inside the PDF so 7-Zip (or any other compressor) can handle them better. You will get filesizes about 50% of the original size lossless. This tool works especially well for PDF files, but is also nice for other compressed (zLib/LZW) streams as ZIP/GZip/JAR/GIF/PNG...
For result examples have a look here or here. Speed can be slow for the precompression (PDF->PCF) part, but will be very fast for the recompression/reconstruction (PCF->PDF) part.
For even better results than with Precomp + 7-Zip, you can try lprepaq and prepaq variants, but beware, especially prepaq is slooww :) - the bright side is that prepaq offers the best (PDF) compression currently available.