I am currently working on an issue where we are seeing high CPU usage on a particular host when converting images using Imagick. The issue is pretty much perfectly described here:
https://github.com/ResponsiveImagesCG/wp-tevko-responsive-images/issues/150 (I don't use that particular library, but I DO use the same responsive images classes they do, and I am timing out on that particular line, only for some images).
They seem to suggest that removing the call to ->posterizeImage() will fix their issue, and in my tests it does; I can't even tell any difference in the converted images. But this worries me, because I wonder if there is a difference that I am not seeing, or one that only comes up in certain scenarios (I mean, if posterizing an image didn't do anything, there wouldn't be a method for it, right?). I see online that it 'Reduces the image to a limited number of color level' (136 levels in the case causing an issue for me, for what it's worth). I'm having some difficulty parsing that, though, which I think comes down to a poor grasp of the way various image formats store data (really my understanding doesn't go past the idea that an image is broken up into pixels, which are broken up into proportions of red, green, and blue).
What actual visual differences could I expect to see if we stop posterizing images? Is it something that I would only expect in certain types of image (like, would it be more visible in transparent over non-transparent, or warmer coloured images)? Or that would be more evident in certain display styles (like print, or the warmer colour temp in iPhone displays)?
Basically I am looking for the info to make an informed choice on whether it's safe to comment out. I'm not worried if it means some images might be x Kb larger, but if it will make them look poor quality, or distort them in some way (even in corner cases) then I need to consider other options.
From the ImageMagick command line documentation:
-posterize levels
reduce the image to a limited number of color levels per channel.
Very low values of levels, e.g., 2, 3, 4, have the most visible effect.
There is a bit more info in the Color Quantization examples - it also has some example images:
The operator's original purpose (using an argument of '2') is to re-color images using just 8 basic colors, as if the image was generated using a simple and cheap poster printing method using just the basic colors. Thus the operator gets its name.
...
An argument of '3' will map image colors based on a colormap of 27 colors, including mid-tone colors. While an argument of '4' will generate a 64 color colortable, and '5' generates a 125 color colormap.
Essentially it reduces the number of colors used in the image - and by extension the size. Using a level of 136 would not have much visible effect, as this translates to a 2,515,456 color colortable (136^3).
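If you want to see the effect for yourself, here is a minimal sketch (assuming the Imagick extension and a test image path of your own) that counts how many unique colours survive posterization at a given level:

// Hypothetical sketch: compare unique colour counts before and after posterizing.
$levels = 136; // the level used in the code causing the question's issue

$im = new Imagick('/path/to/test.jpg');
echo "Unique colours before: " . $im->getImageColors() . "\n";

$im->posterizeImage($levels, false); // second argument: no dithering
echo "Unique colours after:  " . $im->getImageColors() . "\n";

$im->writeImage('/path/to/test-posterized.jpg');
$im->destroy();

Comparing the two counts (and the two output files) gives you a concrete idea of what, if anything, posterization is doing to your images.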
It is also worth noting, from the commit for the issue you linked, that this isn't even always an effective way of reducing image size:
... it turns out that posterization only improves file sizes
for PNGs and can actually lead to slightly larger file sizes for
JPG images.
Posterisation is a reduction of the amount of colour information stored in an image - as such, it is really a decrease in quality. It's hard to imagine how stopping doing this could be detrimental. And if it turns out later that there is/was a legitimate reason for doing it, you can always reintroduce it, because by stopping now you keep all the original information.
If it was the other way around, and you started to introduce posterisation and later found out it was undesirable for some reason, you would no longer be able to get the original information back.
So, I would see no harm in stopping posterising. And the fact that I have written that kind of challenges anyone who knows better to speak up and tell me I am wrong :-)
Related
Can I convert a PNG or JPEG to 25% of the original image's file size using the PHP GD library?
I can't install extra PHP extensions.
If you're wanting to reduce the file size to 25% of that of the original image, you may need some trial-and-error, which can be costly from a computational and memory standpoint, as in GD the only way to do this is as follows:
// One re-encode at a guessed quality, then measure how much the file shrank.
$old_bytes = filesize($original_file_path);
$image = imagecreatefromjpeg($original_file_path);
imagejpeg($image, $new_file_path, $quality_guess);
$new_bytes = filesize($new_file_path);
$ratio = $old_bytes / $new_bytes; // aim for a ratio of about 4 (i.e. 25% of the original)
The tricky thing is: how accurate do you want to get? You will have to encapsulate the above code, or something like it, in some sort of loop and carry it out multiple times, and then make the subjective judgment call of deciding when to stop. Ideally you want $ratio close to 4.
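As a rough sketch of that loop (the starting quality, step size, and floor are just illustrative guesses), you could step the quality down until the ratio is close enough:

// Sketch: re-encode at decreasing quality until the output is roughly 25% of the original size.
$old_bytes = filesize($original_file_path);
$image = imagecreatefromjpeg($original_file_path);

$quality = 85;       // starting guess
$target_ratio = 4;   // original bytes / new bytes
do {
    imagejpeg($image, $new_file_path, $quality);
    clearstatcache(true, $new_file_path);      // filesize() results are cached per path
    $ratio = $old_bytes / filesize($new_file_path);
    $quality -= 5;
} while ($ratio < $target_ratio && $quality >= 10);

imagedestroy($image);

A binary search over the quality range would converge in fewer iterations, but each iteration is still a full re-encode, which is exactly the computational cost described above.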
If your source images are all roughly similar in terms of quality settings, you might find it much better to manually test things out for a single file, and just decide on a fixed quality setting that achieves the reduction in filesize that you're looking for.
In my experience though, it is generally a bad idea to aim for consistent filesize or consistent reduction in filesize, so you might want to rethink your end goal here. Usually I approach image compression based on quality, i.e. I take a sample of images and then I compress them with different quality settings (75 is often a good starting point) and send them around to a few different users on different devices (after looking at them myself on different devices) and I pick the lowest quality setting at which no one can notice a decrease in quality. Often it is surprisingly low, but it will also be heavily dependent on the specifics of the image. Certain images are just more compressible than others, based on the structure of their content.
So if you're aiming for a fixed size reduction, you might find that much of the time you're compressing a lot less than you could (i.e. you could lower the quality even more without users noticing), but you might also find that some of the time you are reducing quality to terrible levels. Imagine one user who leaves the quality setting at 100, and another user who has mindfully optimized the quality setting based on perception. If your code reduces both files to 25% of their size, you'll probably produce far fewer savings than you could have on the 100-quality image, and you're probably going to end up with a visibly terrible result for the carefully optimized one.
So...yes, you could do what you want, but it will be computationally intensive and I am having trouble imagining a scenario where it would be a good idea.
My users are uploading images to my website, and I would like to first offer them images that have already been uploaded. My idea is to
1. create some kind of image "hash" of every existing image
2. create a hash of newly uploaded image and compare it with the other in the database
I have found some interesting solutions like http://www.pureftpd.org/project/libpuzzle or http://phash.org/ etc., but they have one or more problems:
they need some nonstandard extension to PHP (or are not in PHP at all) - it would be OK for me, but I would like to create this as a plugin for my popular CMS, which is used in many hosting environments outside my control.
they compare two images, but I need to compare one to many (e.g. thousands), and doing it one by one would be very inefficient / slow ...
...
I would be OK with finding only VERY similar images (e.g. different sizes, resaved JPGs, or different JPG compression factors).
The only idea I have is to resize the image to e.g. 5px*5px, 256 colors, create a string representation of it, and then look for the same string. But I guess this may create tiny color differences even between two copies of the same image at different sizes, so finding only 100% identical strings would be useless.
So I would need some good format for that string representation of the image which could then be used with some SQL function to find similar ones, or some other nice way. E.g. pHash creates perceptual hashes, so when two numbers are close, the images should be close as well, and I just need to find the closest distances. But again, it is an external library.
Is there any easy way?
I've had this exact same issue before.
Feel free to copy what I did, and hopefully it will help you / solve your problem.
How I solved it
My first idea, which failed (and is similar to what you may be thinking), was to make a string for every single image at its full size. But I quickly worked out that this fills your database super fast and wasn't effective.
The next option (the one that works) was a smaller image (like your 5px idea), and I did exactly that, but with 10px*10px images. The way I created the 'hash' for each image was with the imagecolorat() function.
See imagecolorat() on php.net.
When receiving the rgb colours for the image, I rounded them to the nearest 50, so that the colours were less specific. That number (50) is what you want to change depending on how specific you want your searches to be.
for example:
// Pixel RGB
rgb(105, 126, 225) // Original
rgb(100, 150, 250) // After rounding numbers to nearest 50
After doing this to every pixel (10px*10px will give you 100 rgb()'s back), I then turned them into an array and stored them in the database using serialize() and base64_encode().
When doing the search for images that are similar, I did the exact same process to the image they wanted to upload, and then extracted image 'hashes' from the database to compare them all, and see what had matching rounded rgb's.
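A minimal sketch of that process (assuming GD, and that the image has already been resized to 10px*10px as described in the tips below) could look like this:

// Sketch: build a fuzzy 'hash' of rounded RGB values from a 10x10 thumbnail.
$thumb = imagecreatefromjpeg('/path/to/thumb-10x10.jpg');
$hash = array();
for ($y = 0; $y < 10; $y++) {
    for ($x = 0; $x < 10; $x++) {
        $rgb = imagecolorat($thumb, $x, $y);
        $r = ($rgb >> 16) & 0xFF;
        $g = ($rgb >> 8) & 0xFF;
        $b = $rgb & 0xFF;
        // Round each channel to the nearest 50 so near-identical colours collapse together.
        $hash[] = round($r / 50) * 50;
        $hash[] = round($g / 50) * 50;
        $hash[] = round($b / 50) * 50;
    }
}
$stored = base64_encode(serialize($hash)); // this is what goes into the database

To search, build the same array for the newly uploaded image and compare it against the decoded values pulled from the database.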
Tips
The bigger that 50 is in the RGB rounding, the less specific your search will be (and vice versa).
If you want your SQL to be more specific, it may be better to store extra/specific info about the image in the database, so that you can limit the searches you get in the database. eg. if the aspect ratio is 4:3, only pull images around 4:3 from the database. (etc)
It can be difficult to get this perfectly 5px*5px, so a suggestion is phpthumb. I used it with the syntax:
phpthumb.php?src=IMAGE_NAME_HERE.png&w=10&h=10&zc=1
// &w= width of your image
// &h= height of your image
// &zc= zoom control. 0:Keep aspect ratio, 1:Change to suit your width+height
Good luck mate, hope I could help.
For an easy PHP implementation, check out: https://github.com/kennethrapp/phasher
However, I wonder if there is a native MySQL function for "compare" (see the PHP class above).
I scale the image down to 8x8, then convert the RGB values to 1-byte HSV, so the resulting hash is a 172-byte string:
HSVHSVHSVHSVHSVHSVHSVHSV... (from 8x8 block, 172 bytes long)
0fff0f3ffff4373f346fff00...
It's not 100% accurate (some duplicates aren't found), but it works nicely, and there appear to be no false positives.
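A sketch of the same idea in PHP with GD follows; the quantisation to one hex digit per channel is my own simplification, so the exact hash length will differ from the answer above:

// Sketch: 8x8 thumbnail, each pixel's HSV quantised to one hex digit per channel.
function rgbToHsvHex($r, $g, $b) {
    $r /= 255; $g /= 255; $b /= 255;
    $max = max($r, $g, $b);
    $min = min($r, $g, $b);
    $d = $max - $min;
    $v = $max;
    $s = ($max == 0) ? 0 : $d / $max;
    if ($d == 0) {
        $h = 0;
    } elseif ($max == $r) {
        $h = fmod(($g - $b) / $d, 6) / 6;
    } elseif ($max == $g) {
        $h = (($b - $r) / $d + 2) / 6;
    } else {
        $h = (($r - $g) / $d + 4) / 6;
    }
    if ($h < 0) {
        $h += 1;
    }
    return dechex((int) round($h * 15)) . dechex((int) round($s * 15)) . dechex((int) round($v * 15));
}

$src = imagecreatefromjpeg('/path/to/image.jpg');
$small = imagecreatetruecolor(8, 8);
imagecopyresampled($small, $src, 0, 0, 0, 0, 8, 8, imagesx($src), imagesy($src));

$hash = '';
for ($y = 0; $y < 8; $y++) {
    for ($x = 0; $x < 8; $x++) {
        $rgb = imagecolorat($small, $x, $y);
        $hash .= rgbToHsvHex(($rgb >> 16) & 0xFF, ($rgb >> 8) & 0xFF, $rgb & 0xFF);
    }
}
// $hash is a hex string with three digits per pixel; compare hashes with a per-character distance.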
Putting it down in an academical way, what you are looking for is a similarity function which takes in two images and returns an indicator how far/similar the two images are. This indicator could easily be a decimal number ranging from -1 to 1 (far apart to very close). Once you have this function you can set an image as a reference and compare all the images against it. Then finding the similar images to one is as simple as finding the closest similarity factor to it which is done with a simple search over a double field within an RDBMS like MySQL.
Now all that remains is how to define the similarity function. To be honest, this is problem specific. It depends on what you call similar, but covariance is usually a good starting point; it just needs your two images to be of the same size, which I think is no big deal. You can also find lots of other ideas by searching for 'similarity measures between two images'.
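A sketch of that idea with GD follows; the 32x32 grey-level downsampling and the normalisation to a -1..1 correlation are my own choices, not prescribed by the answer:

// Sketch: normalised covariance (correlation) of grey levels as a similarity score in [-1, 1].
function greyVector($path, $n = 32) {
    $src = imagecreatefromstring(file_get_contents($path));
    $small = imagecreatetruecolor($n, $n);
    imagecopyresampled($small, $src, 0, 0, 0, 0, $n, $n, imagesx($src), imagesy($src));
    $v = array();
    for ($y = 0; $y < $n; $y++) {
        for ($x = 0; $x < $n; $x++) {
            $rgb = imagecolorat($small, $x, $y);
            // Luma approximation of the pixel's brightness.
            $v[] = 0.299 * (($rgb >> 16) & 0xFF) + 0.587 * (($rgb >> 8) & 0xFF) + 0.114 * ($rgb & 0xFF);
        }
    }
    return $v;
}

function similarity(array $a, array $b) {
    $n = count($a);
    $ma = array_sum($a) / $n;
    $mb = array_sum($b) / $n;
    $cov = $va = $vb = 0;
    for ($i = 0; $i < $n; $i++) {
        $cov += ($a[$i] - $ma) * ($b[$i] - $mb);
        $va += ($a[$i] - $ma) * ($a[$i] - $ma);
        $vb += ($b[$i] - $mb) * ($b[$i] - $mb);
    }
    return ($va && $vb) ? $cov / sqrt($va * $vb) : 0; // 1 = very similar, -1 = very dissimilar
}

$score = similarity(greyVector('reference.jpg'), greyVector('candidate.jpg'));

The resulting score can be stored in a DOUBLE column, so finding the closest images to a reference becomes a simple ORDER BY on that field, as described above.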
I am currently working on a PHP application which is run from the command line to optimize a folder of images.
The PHP application is more of a wrapper for other image optimizers: it simply iterates over the directory, grabs all the images, and then runs each image through the appropriate program to get the best result.
Below are the Programs that I will be using and what each will be used for...
imagemagick to determine file type and convert non-animated GIFs to PNG
gifsicle to optimize Animated Gif images
jpegtran to optimize jpg images
pngcrush to optimize png images
pngquant to optimize png images to png8 format
pngout to optimize png images to png8 format
My problem: With 1-10 images, everything runs smoothly and fairly fast; however, once I run it on a larger folder with 10 or more images, it becomes really slow. I do not really see a good way around this, but one thing that would help is to avoid re-processing images that have already been optimized. So if I have a folder with 100 images, I optimize that folder, then add 5 new images and re-run the optimizer, it has to optimize all 105 images; my goal is to have it only optimize the 5 newer images, since the previous 100 have already been optimized. That alone would greatly improve performance when new images are added to the image folder.
I realize the simple solution would be to copy or move the images to a new folder after processing them. My problem with that simple solution is that these images are used on websites, so they are generally hard-linked into a website's source code, and changing the path to the images would complicate that and possibly break it sometimes.
Some ideas I have had: write some kind of text-file database to the image folders listing all the images that have already been processed, so that when the application is run, it only works on images not already in that file. Another idea was to change the file name to include some kind of identifier showing it has been optimized. A third idea is to move each optimized file to a final destination folder once it is optimized. Ideas 2 and 3 are not good though, because they would break all the image path links in the websites' source code.
So if you can think of a decent solution to this problem, please share it.
Meta data
You could put a flag in the meta info of each image after it is optimized. First check for that flag and only proceed if it's not there. You can use exif_read_data() to read the data; writing it takes a little more work.
The above is for JPGs. Metadata for PNGs is also possible; take a look at this question, and this one.
I'm not sure about GIFs, but you could definitely convert them to PNGs and then add metadata... although I'm pretty sure they have their own meta info, since metadata extraction tools support GIFs.
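As an illustration of the JPG case, here is a hedged sketch that hides a marker in the IPTC (APP13) block using only stock PHP functions; the "optimized" keyword and the choice of the keywords dataset (2#025) are just my own conventions:

// Sketch: read/write an "optimized" marker in a JPG's IPTC block.
function isMarkedOptimized($path) {
    getimagesize($path, $info);           // fills $info with APPn segments
    if (!isset($info['APP13'])) {
        return false;
    }
    $iptc = iptcparse($info['APP13']);
    return isset($iptc['2#025']) && in_array('optimized', $iptc['2#025']);
}

function markOptimized($path) {
    // Build a minimal IPTC record: marker 0x1C, record 2, dataset 25 (keywords), length, value.
    $tag = 'optimized';
    $data = chr(0x1C) . chr(2) . chr(25) . chr(strlen($tag) >> 8) . chr(strlen($tag) & 0xFF) . $tag;
    $jpeg = iptcembed($data, $path);      // returns the JPEG contents with the IPTC data embedded
    if ($jpeg !== false) {
        file_put_contents($path, $jpeg);
    }
}

Note that some optimizers strip metadata, so the marker would need to be written after the optimization step, not before.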
Database Support
Another solution would be to store information about the images in a MySQL database. This way, as you tweak your optimizations you could keep track of when and which optimization was tried on which image. You could pick which images to optimize according to any parameters of your choosing. You could build an admin panel for this. This method would allow easy experimentation.
You could also combine the above two methods.
Maximum File Size
Since this is for saving space, you could have the program only work on images that are larger than a certain file size. Ideally, after running the compressor once, all the images would be below this file size, and after that only newly added images that are too big would be touched. I don't know how practical this is in terms of implementation, since it would require that the compressor gets every image below some arbitrary file size. You could make the maximum file size dependent on the image dimensions.
The easiest way would most likely be to look at the time of the last change for each image. If an image was changed after the last run of your script, you have to run it on this particular image.
The timestamp of when the script was last run could easily be saved in a short text file.
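A sketch of that check (the stamp-file location is arbitrary, and optimize() stands in for whatever your wrapper already does per image):

// Sketch: only optimize images modified since the last recorded run.
$stampFile = __DIR__ . '/.last-optimize-run';
$lastRun = file_exists($stampFile) ? (int) file_get_contents($stampFile) : 0;

foreach (glob('/path/to/images/*.{jpg,png,gif}', GLOB_BRACE) as $file) {
    if (filemtime($file) > $lastRun) {
        optimize($file); // placeholder for your existing per-image optimization routine
    }
}

file_put_contents($stampFile, time());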
A thought that comes to mind is to mix the simple solution with a more complicated one. When you optimize an image, move it to a separate folder. When a request comes in for the original image folder, have your .htaccess file capture those links and route them to a script which checks whether that same image exists in the optimized folder; if not, optimize it, move it, then proceed.
I know I said simple solution - this is a slightly complicated solution - but the nice part is that it provides a scalable approach to your issue.
Edit: One more thing
I like the idea of a MySQL database because you can add a level of security (not all images can be viewed by everyone), if that's a need of course. It also makes your hard-coded links problem less of a problem: all links point to a single file that retrieves the images from the DB, and the only things that change are generated GET variables. This way your project becomes significantly more scalable, and design changes become easier.
Sorry this is late, but since there is a way to address this issue without creating any files, storing any data of any kind, or keeping track of anything, I thought I'd share how I address things like this.
Goal
Set up an idempotent solution that efficiently optimizes images without dependencies that require keeping track of their current status.
Why
This allows for a truly portable solution that can work in a new environment, an environment that somehow lost its tracker, or an environment that is sensitive about what files you can actually save in it.
Diagnose
Although metadata might be the first place you'd think to check for this information, in some cases it will not be available, and the nature of metadata itself is arbitrary - like comments, it can come and go without affecting the image in any way. We want something more concrete, something that is a definite descriptor of the asset at hand. Ideally you want to "identify" whether an image has been optimized or not, and the way to do that is to review the image's own characteristics.
Strategy
When you optimize an image, you are providing various options in order to reach the final state of optimization. These are the very traits you can also check to conclude whether or not the image has in fact been optimized.
Example
Let's say we have a function in our script called optimize($path = ''), and let's assume that part of our optimization does the following:
$ convert /path/to/image.jpg -depth 8 -quality 87 -colors 255 -colorspace sRGB ...
Note that these options are ones you choose to specify; they will be applied to the image and are properties that can be reviewed later...
$ identify -verbose /path/to/image.jpg
Image: /path/to/image.jpg
Format: JPEG (Joint Photographic Experts Group JFIF format)
Mime type: image/jpeg
Geometry: 1250x703+0+0
Colorspace: sRGB <<<<<<
Depth: 8-bit <<<<<<
Channel depth:
Red: 8-bit
Green: 8-bit
Blue: 8-bit
Channel statistics:
Pixels: 878750
Red:
...
Green:
...
Blue:
...
Image statistics:
Overall:
...
Rendering intent: Perceptual
Gamma: 0.454545
Transparent color: none
Interlace: JPEG
Compose: Over
Page geometry: 1250x703+0+0
Dispose: Undefined
Iterations: 0
Compression: JPEG
Quality: 87 <<<<<<
Properties:
...
Artifacts:
...
Number pixels: 878750
As you can see, the output quite literally has everything I would want to know to determine whether I should optimize this image, and it costs very little in terms of performance.
Conclusion
When you are iterating through a list of files in a folder, you can do so as many times as you like without worrying about over-optimizing the images or keeping track of anything. You would simply filter the file list by extension (e.g. .jpg, .png, leaving out anything you don't handle such as .bmp), then check each file's stats to see if it already possesses the attributes your function would apply to the image in the first place. If it has the same values, skip it; if not, optimize it.
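A hedged sketch of that check from PHP, shelling out to ImageMagick's identify with a -format string (the target values are just the ones from the example command above, and optimize() is a placeholder for your own routine):

// Sketch: decide whether a JPG still needs optimizing based on what `identify` reports.
function needsOptimizing($path) {
    // %Q = JPEG quality, %z = depth, %[colorspace] = colorspace
    $out = shell_exec('identify -format "%Q %z %[colorspace]" ' . escapeshellarg($path));
    list($quality, $depth, $colorspace) = explode(' ', trim($out));
    return !($quality == 87 && $depth == 8 && $colorspace === 'sRGB');
}

foreach (glob('/path/to/images/*.jpg') as $file) {
    if (needsOptimizing($file)) {
        optimize($file); // your optimization routine
    }
}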
Advanced
If you want to get extremely efficient, you would check each attribute of the image that you plan on optimizing, and in your optimization command only pass the options that have not yet been applied.
Note
This technique is obviously meant to show an example of how you can accurately determine whether or not an image needs to be optimized. The actual options I have listed above are not the complete set of elements that can be chosen. There are a variety of available options to choose from, and you can apply and check for as many as you want.
I have a large SVG file (approx. 60 MB, 10000x10000 pixels, but with the potential to get much larger), and I want to create many tiled 256x256 PNG images from it (in that example there would be 1600 images; ceil(10000/256)^2).
Does anyone have any idea of how to do this on a web server (running PHP amongst other things)? I thought about rsvg, but it doesn't seem to have any functionality to modify the bounding box (and I'd rather avoid doing it manually for each section). ImageMagick might be able to do it, but I've not been having much luck with getting it to work. Using rsvg to create a large PNG and then using a tool dedicated to tiling very large images might work, but I've not had any luck finding such a thing! Speed isn't really an issue, although it is desirable, so if the worst comes to the worst, I might look into modifying the SVG's bounding box per section. I could see the generation taking forever, though!
Anyone know of any methods to do this?
Edit 2016-03-02:
I recently came back to needing an answer for this question again, and Inkscape appears to be the only tool which can render SVGs for a given area at given sizes (svgexport almost meets these requirements, but it doesn't let you change the aspect ratio).
My aim was to tile an SVG into 256x256 tiles, and I've now successfully made a script which can tile an arbitrarily large SVG by doing repeated renderings in inkscape of about 16,000 x 16,000 and tiling the resulting images. I've successfully rendered SVGs where the dimensions are over 500,000 x 500,000 pixels—no problems with memory usage (it just takes a long time!)
Inkscape has a command-line mode to export PNGs, taking an optional argument to choose which area to export:
inkscape vector.svg --export-png=raster.png --export-area=0:0:100:100
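For example, a loop along these lines (file names, dimensions, and tile size are illustrative; the coordinates follow the --export-area convention shown above) would carve the SVG into 256x256 tiles:

// Sketch: tile a 10000x10000 SVG into 256x256 PNGs by repeatedly exporting areas with Inkscape.
$tileSize = 256;
$width = 10000;
$height = 10000;

for ($ty = 0; $ty * $tileSize < $height; $ty++) {
    for ($tx = 0; $tx * $tileSize < $width; $tx++) {
        $x0 = $tx * $tileSize;
        $y0 = $ty * $tileSize;
        $x1 = min($x0 + $tileSize, $width);
        $y1 = min($y0 + $tileSize, $height);
        $cmd = sprintf(
            'inkscape vector.svg --export-png=tile_%d_%d.png --export-area=%d:%d:%d:%d',
            $tx, $ty, $x0, $y0, $x1, $y1
        );
        exec($cmd);
    }
}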
I'd look at Apache Batik. In particular, their SVG Rasterizer looks like just what you need.
I've never used it for giant SVG files, though, so I don't know if it's optimized for that case or not.
Check out this question I posted earlier and got working.
If the image is only 10000x10000, the script I have in the question works best.
If however you want to use much bigger images, check out the script in my answer:
ImageMagick crop huge image
PanoJS seems to do what you're asking about. You need to convert the SVG to a large PNG first though (e.g. using inkscape on the command line), and then use PanoJS's tilemaker to make the tiles. It is a very memory intensive beast, but if you can get it to run successfully, you can then use the PanoJS Javascript code to point to your webserver. XKCD used it for a large image describing money.
You might want to edit the source properties of a copy of your SVG, to render certain areas only. Set the "width" and "height" attributes to match your desired tile size (256) and the "viewBox" to the desired tile area (for example 'viewBox="512 256 256 256"' for the 3rd tile in the second row, since a viewBox is min-x, min-y, width, height).
You could do something like this in a loop:
$sed = "sed 's/width=\"10000\"/width=\"256\"' ".$sourcefile;
$sed .= " | sed 's/height=\"10000\"/height=\"256\"'";
$sed .= " | sed 's/viewBox=\"0 0 10000 10000\"/viewBox=\"0 0 256 256\"'";
exec($sed." > ".$tmpfile);
exec('rsvg '.$tmpfile.' > '.$tilefile);
I don't know how this behaves on very large files though.
What's the best approach to comparing two images with php and the Graphic Draw (GD) Library?
This is the scenario:
I have an image, and I want to find which image of a given set is the most similar to it.
The most similar image is in fact the same image, not pixel perfect match but the same image.
I've dramatised the difference between the two images with the number one on the example just to ease the understanding of what I meant.
Even though it brought no consistent results, my approach was to reduce the images to 1px using the imagecopyresampled function and see how close the RGB values were between images.
Summing the differences between each red, green, and blue value of the original and the corresponding values of the possible match gave me a dissimilarity index. Even though it didn't work as expected (the most RGB-similar image was not always the target image), I could use it to select an image from the available targets.
Here's a sample of the output when comparing 4 images against a target image, in this case the apple logo, that matches one of them but is not exactly the same:
Original image:
Red:222 Green:226 Blue:232
Compared against:
http://a1.twimg.com/profile_images/571171388/logo-twitter_normal.png
Red:183 Green:212 Blue:212 and an index of dissimilarity of 56
Red:117 Green:028 Blue:028 and an index of dissimilarity 530
Red:218 Green:221 Blue:221 and an index of dissimilarity 13 Matched Correctly.
Red:061 Green:063 Blue:063 and an index of dissimilarity 491
It may not even be possible to do better than what I'm already getting, and I may be wasting my time here, but since there seem to be a lot of experienced PHP programmers around, I hope you can point me in the right direction on how to improve this.
I'm open to other image libraries such as iMagick, Gmagick or Cairo for php but I'd prefer to avoid using other languages than php.
Thanks in advance.
Your approach seems reasonable, but reducing an entire image to 1x1 pixel is probably a step too far.
However, if you converted each image to the same size and then computed the average colour in each 16x16 (or 32x32, 64x64, etc. depending on how much processing time/power you wish to use) cell you should be able to form some kind of sensible(-ish) comparison.
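A sketch of that cell-averaging comparison with GD (the 16x16 grid size and the plain sum of absolute channel differences are my own choices):

// Sketch: shrink both images to a 16x16 grid of averaged cells and sum the channel differences.
function cellGrid($path, $n = 16) {
    $src = imagecreatefromstring(file_get_contents($path));
    $grid = imagecreatetruecolor($n, $n);
    // Resampling down to n x n effectively averages each cell of the original.
    imagecopyresampled($grid, $src, 0, 0, 0, 0, $n, $n, imagesx($src), imagesy($src));
    return $grid;
}

function gridDistance($a, $b, $n = 16) {
    $dist = 0;
    for ($y = 0; $y < $n; $y++) {
        for ($x = 0; $x < $n; $x++) {
            $pa = imagecolorat($a, $x, $y);
            $pb = imagecolorat($b, $x, $y);
            $dist += abs((($pa >> 16) & 0xFF) - (($pb >> 16) & 0xFF))
                   + abs((($pa >> 8) & 0xFF) - (($pb >> 8) & 0xFF))
                   + abs(($pa & 0xFF) - ($pb & 0xFF));
        }
    }
    return $dist; // lower = more similar
}

$distance = gridDistance(cellGrid('target.png'), cellGrid('candidate.png'));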
I would suggest, like middaparka, that you do not downsample to a 1-pixel image, because you lose all the spatial information. Downsampling to 16x16 (or 32x32, etc.) would certainly provide better results.
Then it also depends on whether color information is important to you. From what I understand, you could actually do without it: compute a grey-level image from your color image (e.g. luma) and compute the cross-correlation. If, as you said, there are a couple of images that match exactly (except for color information), this should give you pretty good reliability.
I used the ideas of scaling, downsampling, and grey levels mentioned in the question and answers to apply a mean squared error between the pixel channel values of two images, using the GD library.
The code is in this answer, including a test with those ideas.
Also, I did some benchmarking, and I think the downsampling may not be needed for such small images, because the method is fast (even in PHP), taking just a fraction of a second.
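As an illustration only (not the code from the linked answer), a minimal channel-wise MSE over two equally-sized GD images could look like this:

// Sketch: mean squared error across the RGB channels of two equally-sized images.
function meanSquaredError($imgA, $imgB) {
    $w = imagesx($imgA);
    $h = imagesy($imgA);
    $sum = 0;
    for ($y = 0; $y < $h; $y++) {
        for ($x = 0; $x < $w; $x++) {
            $a = imagecolorat($imgA, $x, $y);
            $b = imagecolorat($imgB, $x, $y);
            foreach (array(16, 8, 0) as $shift) {
                $d = (($a >> $shift) & 0xFF) - (($b >> $shift) & 0xFF);
                $sum += $d * $d;
            }
        }
    }
    return $sum / ($w * $h * 3); // 0 means identical; larger means more different
}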
Using middparka's methods, you can transform each image into a sequence of numeric values and then use the Levenshtein algorithm to find the closest match.