I'm currently dealing with a large number of images (over 3000), and I have two copies of each: one large, one small. The problem is that I don't have any kind of link to say which small image maps to which large image.
I know roughly which large image goes to which small image (I have an array which contains up to 5 possibilities for each small image), and I'd like to loop round the array and compare each large image to the small image, to see if it's the same image but resized.
TLDR/Didn't Understand: Is there any easy way in PHP to compare two images that are the same, one being (say) 200x200 and the other being (say) 500x500, and determine if they're an image of the same thing?
The easiest way would be to resize both to a small size (16x16 or 32x32, for example, depending on how much CPU you want to use), keeping track of the names of each file throughout the process.
Then use imagecolorat() to compare pixel colours along each row/column. You can then define the percentage of pixels that need to match for the two to be considered the same picture.
Edit: here's an example of this implemented http://www.thismayhem.com/php/comparing-images-with-php-gd/
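A minimal sketch of the resize-then-compare idea, written in Python rather than PHP so it is self-contained: images are plain lists of rows of `(r, g, b)` tuples (what you would get by calling imagecolorat() on every pixel), and both are shrunk with a simple box (averaging) filter. The names `downscale` and `match_ratio` and the default `tolerance` of 10 per channel are illustrative assumptions, not taken from the linked article.

```python
def downscale(img, size):
    """Box-filter an RGB image (rows of (r, g, b) tuples) down to size x size.

    Each output pixel is the integer mean of the source pixels in its cell;
    the aspect ratio is ignored, as with a fixed 16x16 or 32x32 target.
    """
    h, w = len(img), len(img[0])
    out = []
    for oy in range(size):
        y0 = oy * h // size
        y1 = max((oy + 1) * h // size, y0 + 1)  # at least one source row
        row = []
        for ox in range(size):
            x0 = ox * w // size
            x1 = max((ox + 1) * w // size, x0 + 1)  # at least one source column
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(tuple(sum(p[c] for p in cell) // len(cell) for c in range(3)))
        out.append(row)
    return out


def match_ratio(a, b, size=16, tolerance=10):
    """Fraction of downscaled pixels whose channels all differ by <= tolerance."""
    sa, sb = downscale(a, size), downscale(b, size)
    matches = sum(
        1
        for ra, rb in zip(sa, sb)
        for pa, pb in zip(ra, rb)
        if all(abs(x - y) <= tolerance for x, y in zip(pa, pb))
    )
    return matches / (size * size)
```

You would then treat, say, `match_ratio(large, small) >= 0.9` as "same picture" and tune that percentage on your own images.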
Have you checked whether resizing one of the big images creates exactly the same image as one of the smaller ones? (Compare MD5 hashes manually to find out.)
If it does, then your solution is to batch-resize the bigger images to temporary files and MD5 each one, building an array of big-image MD5s, and then MD5 the smaller ones and look them up in that array.
This method will only work if the small images were generated with the PHP GD algorithm and you use the same compression level and size that were used originally.
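The lookup step can be sketched in Python with `hashlib`; in PHP the equivalent would be md5_file() on each temporary resized file. The dictionaries of file name to file bytes are assumed inputs (in practice you would read the resized temp files and the small images from disk), and the function names are made up for illustration.

```python
from __future__ import annotations

import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def build_index(resized_large: dict[str, bytes]) -> dict[str, str]:
    """Map the MD5 of each resized large image's bytes to the large file's name."""
    return {md5_hex(data): name for name, data in resized_large.items()}


def match_small(small: dict[str, bytes], index: dict[str, str]) -> dict[str, str | None]:
    """For every small image, the large image whose resized bytes are identical, or None."""
    return {name: index.get(md5_hex(data)) for name, data in small.items()}
```

If two large images happen to resize to byte-identical files, the later one overwrites the earlier one in the index; any small image left mapped to `None` needs one of the fuzzier comparisons.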
One thing you could do is take the big picture, scale it to the size of the small picture and check some random pixel areas. If the pictures are really different, you could also use something like the average colour of both images.
But this approach is really slow...
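The average-colour check mentioned above is cheap and size-independent, so it can work as a quick pre-filter before any pixel-by-pixel comparison. A Python sketch, assuming images as rows of `(r, g, b)` tuples; `average_colour` and `colour_distance` are illustrative names.

```python
import math


def average_colour(img):
    """Mean (r, g, b) over all pixels of an image given as rows of tuples."""
    pixels = [p for row in img for p in row]
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def colour_distance(a, b):
    """Euclidean distance between the two average colours (0 = same average)."""
    return math.dist(average_colour(a), average_colour(b))
```

A large distance rules a pair out; a small one only means "worth comparing more closely", since very different pictures can share an average colour.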
No, there is not a same_image_but_in_different_sizes() function that returns true or false.
If I were you, I'd either sort them manually or choose a color-histogram-based approach.
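A color-histogram comparison could look like this Python sketch. It counts pixels into coarse RGB bins, normalises by pixel count (which makes it independent of image size) and scores two images by histogram intersection. The bin count and the intersection measure are illustrative choices, not something the answer specifies.

```python
def colour_histogram(img, bins=4):
    """Normalised histogram over bins**3 coarse RGB colours (positions ignored)."""
    counts = [0] * bins ** 3
    for row in img:
        for r, g, b in row:
            # Map each 0-255 channel to a bin index in 0..bins-1.
            idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + b * bins // 256
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """1.0 for identical colour distributions, 0.0 for completely disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Because histograms ignore where the colours are, a rearranged or mirrored picture scores as a perfect match, so use this to shortlist candidates rather than as final proof.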
I just saw that all "images are JPGs, and they're saved in two folders; large and small". Why don't you just throw the small ones away and recreate them, but this time remember which belongs to which?
I would use ImageMagick to resize and compare the images, calling it via exec().
First, resize the large image using the convert command http://www.imagemagick.org/script/convert.php
Then, compare the small image with the resized image using compare command http://www.imagemagick.org/script/compare.php
Sorting them manually will take you much, much less time.
Please check this link:
Compare 2 images in php
It seems that using ImageMagick extension is your best bet, although you'd probably have to scale down the larger image to the same size as your smaller one first.
Related
My scenario is as follows:
I have to save 1000 images in a database, and then I have to compare each new image with the database images for matches (a match should be 70% or more) to get the best-matching image from the database in PHP.
Is there any algorithm or method for fast comparison with good results?
Thanks in advance :)
I would suggest you use a Perceptual Hash or similar - mainly for reasons of performance. In essence, you create a single number, or hash, for each image ONCE in your database at the point where you insert it, and retain that hash in the database. Then when you get a new image to insert, you calculate its hash and compare it to the PRE-CALCULATED hash of all the other images so that you don't have to drag all the megabytes of pixels of your existing images from disk to compare them.
The best pHASHes are scale-invariant and image format invariant. Here is an article by Dr Neal Krawetz... Perceptual Hashing.
ImageMagick can also do Perceptual Hashing and is callable from PHP - see here.
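To make the "hash once, compare many" idea concrete, here is a Python sketch of the simplest perceptual hash, the average hash (aHash). Note that this is a swapped-in, simpler technique, not the DCT-based pHash from the Krawetz article. It assumes a grayscale image given as rows of 0-255 ints: shrink to 8x8 by averaging, then set one bit per cell depending on whether it is brighter than the mean, giving a 64-bit integer you can store next to the image.

```python
def average_hash(gray, size=8):
    """64-bit average hash of a grayscale image (rows of ints 0-255)."""
    h, w = len(gray), len(gray[0])
    cells = []
    for oy in range(size):
        y0 = oy * h // size
        y1 = max((oy + 1) * h // size, y0 + 1)
        for ox in range(size):
            x0 = ox * w // size
            x1 = max((ox + 1) * w // size, x0 + 1)
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))  # box-filter downscale
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)  # one bit per cell
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes (0 = same hash)."""
    return bin(a ^ b).count("1")
```

Comparing a new image is then just `hamming(new_hash, stored_hash)` for each stored integer, with no pixels loaded. One possible reading of the 70% requirement is `1 - hamming(...) / 64 >= 0.7`. If the hashes live in a MySQL `BIGINT UNSIGNED` column, `BIT_COUNT(hash ^ ?)` computes the same distance inside the query, although it still checks every row.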
Try this class. It supports generating a hash string from an image, which you can store in the database and compare with a new image later:
https://github.com/nvthaovn/CompareImage
It is very fast and accurate, although the code is not optimal. I have 20000 pictures in my database.
This depends entirely on how smart you want the algorithm to be.
For instance, here are some issues:
cropped images vs. an uncropped image
images with a text added vs. another without
mirrored images
The easiest and simplest algorithm I've seen for this is just to do the following steps to each image:
scale to something small, like 64x64 or 32x32, disregard aspect ratio, use a combining scaling algorithm instead of nearest pixel
scale the color ranges so that the darkest is black and lightest is white
rotate and flip the image so that the lightest color is top left, the top-right is the next lightest, and the bottom-left comes after that (as far as possible, of course)
Edit: A combining scaling algorithm is one that, when scaling 10 pixels down to one, uses a function that takes the colors of all those 10 pixels and combines them into one. This can be done with simple averaging (taking the mean value) or with more complex methods like bicubic splines.
Then calculate the mean distance pixel-by-pixel between the two images.
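A Python sketch of steps 2 and 3 plus the mean distance, assuming both images have already been scaled (step 1) to the same small square grayscale size, given as rows of 0-255 values. The answer doesn't say exactly how to rank orientations, so this is one interpretation: among the 8 rotations/reflections, pick the one whose top-left quadrant is brightest, breaking ties by the top-right and then the bottom-left quadrant. All function names are illustrative.

```python
def stretch(gray):
    """Step 2: map the darkest value to 0 and the lightest to 255."""
    flat = [v for row in gray for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [[0.0] * len(row) for row in gray]
    return [[(v - lo) * 255 / (hi - lo) for v in row] for row in gray]


def orientations(gray):
    """All 8 rotations/reflections of a square image."""
    result = []
    g = gray
    for _ in range(4):
        g = [list(r) for r in zip(*g[::-1])]  # rotate 90 degrees clockwise
        result.append(g)
        result.append([row[::-1] for row in g])  # mirrored left-right
    return result


def quadrant_brightness(gray):
    """Mean brightness of the top-left, top-right and bottom-left quadrants."""
    n = len(gray)
    half = n // 2

    def mean(ys, xs):
        vals = [gray[y][x] for y in ys for x in xs]
        return sum(vals) / len(vals)

    first, second = range(half), range(half, n)
    return (mean(first, first), mean(first, second), mean(second, first))


def canonical(gray):
    """Step 3 (as interpreted here): the orientation with the brightest quadrants."""
    return max(orientations(stretch(gray)), key=quadrant_brightness)


def mean_distance(a, b):
    """Mean absolute per-pixel difference of the canonical forms (0 = identical)."""
    ca, cb = canonical(a), canonical(b)
    total = sum(abs(x - y) for ra, rb in zip(ca, cb) for x, y in zip(ra, rb))
    return total / (len(ca) * len(ca[0]))
```

With this normalisation, a mirrored copy with different brightness and contrast ends up at distance 0, which is the point of steps 2 and 3.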
To look up a possible match in a database, store the pixel colors as individual columns in the database, index a bunch of them (but not all, unless you use a very small image), and run a query that uses a range for each pixel value, i.e. every image where each pixel of the small image is within -5 to +5 of the corresponding pixel of the image you want to look up.
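A runnable sketch of that range query, using Python's built-in SQLite in place of MySQL so it is self-contained. The table name `signatures`, the columns `p0`..`p15` (a 4x4 grayscale signature) and the choice of which columns to index are made up for illustration.

```python
import sqlite3

N = 16  # 4x4 grayscale signature, one column per pixel


def make_db():
    db = sqlite3.connect(":memory:")
    cols = ", ".join(f"p{i} INTEGER" for i in range(N))
    db.execute(f"CREATE TABLE signatures (name TEXT PRIMARY KEY, {cols})")
    # Index only some of the pixel columns, as suggested above.
    for i in (0, 5, 10, 15):
        db.execute(f"CREATE INDEX idx_p{i} ON signatures (p{i})")
    return db


def insert(db, name, pixels):
    placeholders = ", ".join("?" * (N + 1))
    db.execute(f"INSERT INTO signatures VALUES ({placeholders})", [name, *pixels])


def lookup(db, pixels, tol=5):
    """Names of stored images whose every pixel is within +/- tol of `pixels`."""
    where = " AND ".join(f"p{i} BETWEEN ? AND ?" for i in range(N))
    params = [v for p in pixels for v in (p - tol, p + tol)]
    rows = db.execute(f"SELECT name FROM signatures WHERE {where}", params)
    return [r[0] for r in rows]
```

The database can use an index on one of the ranged columns to narrow the candidates, then filter the rest of the `BETWEEN` conditions row by row.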
This is easy to implement and fairly fast to run, but of course it won't handle more complex differences. For that you need much more advanced algorithms.
I'm attempting to use PHP to compare two separate images which may or may not contain the same picture. Essentially, they come from an item database application and would have been saved over a year apart. The images are usually resized from roughly 600x600 to 300x300, and in one case I checked by eye, the image was the same but the file size differed by about 8 bytes.
For example:
Image created 1/25/2012 is 7,102 bytes
Image created 2/1/2014 is 7,094 bytes
Any suggestions would be greatly appreciated.
Here is a quick and easy way of comparing two images in PHP, based on this blog post.
Tweak the size of $new_img and whether to use grayscaling or not, and find the sweet spot that works on the data you have. When implementing this, you should manually check the results on a (preferably large) dataset to find out what works best for you.
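One way to make that manual checking systematic (an assumption, not part of the original answer) is to label a set of pairs by eye, record the similarity score each parameter setting produces, and measure precision and recall at each candidate threshold. A small Python helper for that evaluation:

```python
def evaluate(scored_pairs, threshold):
    """scored_pairs: (similarity_score, really_same) tuples checked by hand.

    Returns (precision, recall) when pairs scoring >= threshold count as matches.
    """
    tp = sum(1 for s, same in scored_pairs if s >= threshold and same)
    fp = sum(1 for s, same in scored_pairs if s >= threshold and not same)
    fn = sum(1 for s, same in scored_pairs if s < threshold and same)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall
```

Running this for each combination of thumbnail size, grayscale on/off and threshold shows which setting misses the fewest true duplicates without producing false matches.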
I am working on a project where users upload images. The uploaded images are displayed using a lightbox.
The problem is that a user may upload an image of, say, 5 MB or more, which takes a long time to load. So I thought of reducing the image quality while keeping the dimensions the same.
I know that we can use the imagejpeg() function and pass a third parameter, the quality (say 90), which also reduces the file size.
I need every image file to be at most 1 MB.
So I am confused about what value to pass as quality to get the optimum quality.
For example, if the uploaded file is 1.2 MB and I pass 90 as the quality, that may bring the size below 1 MB while the quality stays acceptable. But if the uploaded file is, say, 5 MB, passing 90 may not get it below 1 MB. In that case I guess I need to pass a lower quality value.
So is there any method that helps me determine the optimum quality parameter to pass?
Many thanks for your time.
I spent ages messing with this same sort of thing a while back. First of all, I found the PNG and JPG formats quite awkward when it comes to predicting the results. JPG photos generally won't convert well to PNG, but vector-style JPG images will; vector-style PNGs generally don't convert well to JPG, but photos will. You will find it very hard to ensure that the images end up just under 1 MB, especially if you are not resizing them as well. If you do not resize them, you will also run into more trouble: the larger the image, the lower the quality has to become (kind of ironic).
You need a few versions of each image. Let me explain.
You want to keep the original, so that you can regenerate the images from it at any time in the future, say to increase the quality of the images later on or even to create some new dimensions.
Using the original image with the lightbox would generally be a bad idea. You would be better off resizing it to fit inside, say, 1024 x 1024. That would more than satisfy most visitors and would usually give a reasonable file size. You could then offer the original for download if you wish. Serving the largest file possible would usually result in unhappy visitors when the image takes ages to load, and it could even cost them a lot over time through bandwidth usage (and cost you, for the same reason).
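Resizing to "fit inside 1024 x 1024" means scaling both sides by the same factor, the smaller of the two ratios, and never enlarging. A Python sketch of that arithmetic; the result is what you would pass as the destination size to a resampling call such as GD's imagecopyresampled().

```python
def fit_within(width, height, max_w=1024, max_h=1024):
    """Scale (width, height) to fit inside max_w x max_h, keeping the aspect ratio.

    Images that already fit are returned unchanged (never upscaled).
    """
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 4000x3000 photo becomes 1024x768 and a 3000x6000 portrait becomes 512x1024.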
The only way I can think of to do what you are asking is to write some code that performs horribly: it loops, gradually decreasing the quality and checking the resulting image size. Once it finds a quality that gives an image size below 1 MB, it exits the loop and uses that quality setting. I would really advise against this, though.
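For completeness, the loop just described looks roughly like this Python sketch. `encode` is a hypothetical callback that re-encodes the image at a given quality and returns the bytes; in PHP that would be imagejpeg($img, null, $quality) wrapped in ob_start()/ob_get_clean(). The start, step and minimum values are arbitrary.

```python
from typing import Callable, Optional, Tuple


def find_quality(encode: Callable[[int], bytes], max_bytes: int,
                 start: int = 90, step: int = 5,
                 minimum: int = 10) -> Optional[Tuple[int, bytes]]:
    """Lower the quality until the encoded image fits in max_bytes.

    Returns (quality, data) for the first quality that fits, or None if even
    `minimum` is too large. Every iteration is a full re-encode, which is why
    this is slow.
    """
    for quality in range(start, minimum - 1, -step):
        data = encode(quality)
        if len(data) <= max_bytes:
            return quality, data
    return None
```

A binary search over quality would need fewer encodes, but only if file size reliably falls as quality falls, which is usual for JPEG but not guaranteed.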
P.S. Outside the scope of this answer, I have described in the comments the methodology of a solution I used.
My users are uploading images to my website, and I would like to first offer them images that have already been uploaded. My idea is to
1. create some kind of image "hash" of every existing image
2. create a hash of the newly uploaded image and compare it with the others in the database
I have found some interesting solutions like http://www.pureftpd.org/project/libpuzzle or http://phash.org/ etc., but they have one or more problems:
they need some nonstandard extension to PHP (or are not in PHP at all); that would be OK for me, but I would like to create this as a plugin for my popular CMS, which is used on many hosting environments I don't control.
they compare two images, but I need to compare one to many (e.g. thousands), and doing it one by one would be very inefficient / slow ...
...
I would be OK to find only VERY similar images (so e.g. different size, resaved jpg or different jpg compression factor).
The only idea I have is to resize the image to e.g. 5px*5px with 256 colors, create a string representation of it and then look for identical strings. But I suspect that even two copies of the same image at different sizes may end up with tiny differences in color, so finding only 100% identical strings would be useless.
So I would need a good format for that string representation of an image, which could then be used with some SQL function to find similar ones, or some other nice way. E.g. pHash creates perceptual hashes, so when two numbers are close, the images should be close as well, and I would just need to find the closest distances. But that is again an external library.
Is there any easy way?
I've had this exact same issue before.
Feel free to copy what I did, and hopefully it will help you / solve your problem.
How I solved it
My first idea, which failed (and is similar to what you may be thinking), was to make strings for every single image at its full size. But I quickly worked out that this fills your database super fast and wasn't effective.
The next option (which works) was a smaller image (like your 5px idea). I did exactly that, but with 10px*10px images, and I created the 'hash' for each image using the imagecolorat() function.
See php.net here.
When receiving the rgb colours for the image, I rounded them to the nearest 50, so that the colours were less specific. That number (50) is what you want to change depending on how specific you want your searches to be.
for example:
// Pixel RGB
rgb(105, 126, 225) // Original
rgb(100, 150, 250) // After rounding numbers to nearest 50
After doing this to every pixel (10px*10px gives you 100 rgb() values back), I put them into an array and stored it in the database using serialize() and base64_encode().
When searching for similar images, I did exactly the same process to the image being uploaded, then pulled the image 'hashes' from the database, compared them all and checked which had matching rounded RGB values.
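The same pipeline sketched in Python, with JSON standing in for PHP's serialize(). One subtlety: PHP's round() rounds halves away from zero, so 225 becomes 250 as in the example above, while Python's built-in round() would give 200. The integer formula below reproduces the PHP behaviour. The input is assumed to be the 100 `(r, g, b)` tuples of a 10x10 thumbnail, and the function names are illustrative.

```python
import base64
import json


def round_to(value, step=50):
    """Round to the nearest multiple of step, halves going up (225 -> 250)."""
    return (value + step // 2) // step * step


def colour_hash(pixels, step=50):
    """Rounded colours of a 10x10 thumbnail, encoded as a storable string."""
    rounded = [[round_to(c, step) for c in p] for p in pixels]
    return base64.b64encode(json.dumps(rounded).encode()).decode()


def matching_pixels(hash_a, hash_b):
    """How many pixel positions have identical rounded colours."""
    a = json.loads(base64.b64decode(hash_a))
    b = json.loads(base64.b64decode(hash_b))
    return sum(1 for pa, pb in zip(a, b) if pa == pb)
```

Note that values just either side of a rounding boundary (say 124 and 126) land in different buckets, so two near-identical images can still lose a few matching pixels.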
Tips
The bigger that 50 is in the RGB rounding, the less specific your search will be (and vice versa).
If you want your SQL to be more specific, it may be better to store extra/specific info about the image in the database, so that you can narrow down the search, e.g. if the aspect ratio is 4:3, only pull images around 4:3 from the database (etc.).
It can be difficult to resize to exactly 5px*5px, so I suggest phpthumb. I used it with the syntax:
phpthumb.php?src=IMAGE_NAME_HERE.png&w=10&h=10&zc=1
// &w= width of your image
// &h= height of your image
// &zc= zoom control. 0:Keep aspect ratio, 1:Change to suit your width+height
Good luck mate, hope I could help.
For an easy php implementation check out: https://github.com/kennethrapp/phasher
However, I wonder if there is a native MySQL function for "compare" (see the PHP class above).
I scale the image down to 8x8, then convert RGB to 1-byte HSV, so the resulting hash is a 172-byte string.
HSVHSVHSVHSVHSVHSVHSVHSV... (from 8x8 block, 172 bytes long)
0fff0f3ffff4373f346fff00...
It's not 100% accurate (some duplicates aren't found), but it works nicely and there appear to be no false positives.
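The answer doesn't show its code. Assuming each of H, S and V is quantised to one hex digit (which is how the sample string reads), the signature could be built like this in Python with the standard `colorsys` module. The 8x8 thumbnail is assumed to exist already as `(r, g, b)` tuples in row order, and `hsv_signature` is an illustrative name.

```python
import colorsys


def hsv_signature(pixels):
    """One hex digit each for H, S and V of every pixel, concatenated."""
    out = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # Quantise each 0.0-1.0 component to 16 levels (0-f).
        out.append("".join(f"{min(15, int(x * 16)):x}" for x in (h, s, v)))
    return "".join(out)
```

Exact string equality then finds duplicates, and coarse quantisation is what absorbs small resizing differences. It also explains why some duplicates can slip through when a value sits right on a level boundary.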
Put in academic terms, what you are looking for is a similarity function which takes two images and returns an indicator of how far apart/similar they are. This indicator could easily be a decimal number ranging from -1 to 1 (far apart to very close). Once you have this function, you can set an image as a reference and compare all the images against it. Then finding the images similar to a given one is as simple as finding the similarity factors closest to its own, which is done with a simple search over a double field in an RDBMS like MySQL.
Now all that remains is how to define the similarity function. To be honest, this is problem-specific: it depends on what you call similar. But covariance is usually a good starting point; it just needs your two images to be the same size, which I don't think is a big deal. You can find lots of other ideas by searching for 'similarity measures between two images'.
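Raw covariance depends on image brightness and contrast. Dividing it by both standard deviations gives the Pearson correlation coefficient, which is exactly the -1 to 1 indicator described above. A Python sketch for two equally sized grayscale images given as rows of ints:

```python
import math


def correlation(a, b):
    """Pearson correlation of two same-sized grayscale images, in [-1, 1]."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("images must be the same size")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:  # a flat image has no variation to correlate
        return 0.0
    return cov / (sx * sy)
```

A brighter or higher-contrast copy of the same picture still scores 1.0, and a photographic negative scores -1.0.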
I have one basic question. I have a project where I need several sizes of one picture.
Yes... during upload you make thumbnails... and so on... I know this story: performance vs. storage.
So I save the original image and make 2 thumbnail copies, for example with max width 100px and max width 200px, preserving the aspect ratio.
Now I need to show the image at 150px max width, so I take the saved image (200px) and...
I use getimagesize() to calculate the display width and height, respecting the ratio,
or I set max-width and max-height and leave it to the browser (the browser does it for me),
or I set the width and keep height: auto (but I also want to limit the max height)
So actually I use PHP and getimagesize(), but this function reads the file every time and I am a little worried. When you process 1 image it is OK, but what about 20 or 100?
And another idea: while uploading, I also save the size information to the DB, but then I have to save data for 3 images (currently only the original one), and this complicates everything.
So ... any ideas? What is your practice? THX.
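Storing the sizes at upload time need not be complicated: one row per generated version is enough, and the display size can then be computed without touching any file. A Python/SQLite sketch; the `image_versions` table, its columns and the version names are hypothetical. In PHP, getimagesize() would be called once per version at upload to get the width and height to store.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE image_versions ("
        " image_id INTEGER, version TEXT, width INTEGER, height INTEGER,"
        " PRIMARY KEY (image_id, version))"
    )
    return db


def save_version(db, image_id, version, width, height):
    """Called once per generated file at upload time."""
    db.execute("INSERT INTO image_versions VALUES (?, ?, ?, ?)",
               (image_id, version, width, height))


def display_size(db, image_id, max_width):
    """Pick the smallest version at least max_width wide (else the widest)
    and scale its stored size to max_width. No image file is opened."""
    query = "SELECT version, width, height FROM image_versions WHERE image_id = ?"
    row = db.execute(query + " AND width >= ? ORDER BY width LIMIT 1",
                     (image_id, max_width)).fetchone()
    if row is None:
        row = db.execute(query + " ORDER BY width DESC LIMIT 1",
                         (image_id,)).fetchone()
    if row is None:
        return None  # unknown image
    version, width, height = row
    if width <= max_width:
        return version, width, height
    return version, max_width, round(height * max_width / width)
```

For the 150px case in the question, this picks the 200px version and returns the scaled width and height to put in the `<img>` tag.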
Two images, at a maximum: A thumbnail, and the original image are sufficient. Make sure that your upload page is well-secured, because I've seen a website taken down through DoS (abusing an unprotected image-resizing page). Also limit the maximum upload size, to prevent abuse.
You can use the max-width and max-height CSS properties to limit the size of your images.
My approach
I wrote a pretty simple gallery application in php a while ago and this is how it works:
The images are stored in a folder with subfolders representing albums (and subalbums). They are uploaded via FTP and the webserver only has read-permissions on them.
For each image there are three versions:
a full one (the original)
a "mid" one (1024x768px max)
a "thumb" one (250x250px max)
All requests for images by the browser are served by php, and not-yet-existing versions are generated on the fly. The actual data is served through X-Sendfile, but that's an implementation detail.
I store the smaller versions in separate directories. When given a path to an original image, it is trivial to find the corresponding downscaled files (and check for existence and modification times).
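The lookup described above might look like this Python sketch. It assumes each version directory mirrors the originals' relative paths (e.g. `cache/thumb/albums/2014/a.jpg` for `albums/2014/a.jpg`); the exact layout and the function names are assumptions, not the answer's actual code.

```python
import os


def derived_path(cache_root, original_rel, version):
    """Where the scaled copy of an original lives, e.g. for version 'thumb'."""
    return os.path.join(cache_root, version, original_rel)


def needs_regeneration(original_path, derived):
    """True if the scaled file is missing or older than the original."""
    if not os.path.exists(derived):
        return True
    return os.path.getmtime(derived) < os.path.getmtime(original_path)
```

The request handler would call `needs_regeneration` first, rescale only when it returns True, and then serve the derived file.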
Thoughts on your problem
Scaling images using HTML / CSS is considered bad practice for two simple reasons: if you are scaling up, you have a blurred image. If you are scaling down, you waste bandwidth and make your page slower for no good reason. So don't do it.
It should be possible to determine a pretty small set of required versions of each file (for example those used in a layout as in my case). Depending on the size and requirements of your project there are a few possibilities for creating those versions:
on the fly: generate / update them, when they are requested
during upload: have the routine that is called during the upload-process do the work
in the background: have the upload-routine add a job to a queue that is worked on in the background (probably most scalable but also fairly complex to implement and deploy)
Scaling down large images is a pretty slow operation (taking a few seconds usually). You might want to throttle it somehow to prevent abuse / DoS. Also limit dimensions and file size. A 100 MP (or even bigger) plain white (or any color) JPG might be very small when compressed, but will use an awful lot of RAM during scaling. Also big PNGs take really long to decompress (and even more to compress).
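The dimension limit can be enforced before any decoding by reading the size from the file header. In PHP, getimagesize() does this; the Python sketch below parses a PNG header by hand to stay self-contained (the IHDR chunk always comes first, with width and height as big-endian 32-bit integers at bytes 16-24). The 40 MP limit is an arbitrary example, and 4 bytes per pixel is a rough estimate of decoded memory, so a 100 MP image would need about 400 MB.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes):
    """Width and height from a PNG's IHDR chunk, without decoding any pixels."""
    if len(header) < 24 or not header.startswith(PNG_SIGNATURE) or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    return struct.unpack(">II", header[16:24])


def check_upload(header: bytes, max_pixels: int = 40_000_000):
    """Reject oversized images; return the rough decoded size in bytes."""
    width, height = png_dimensions(header)
    if width * height > max_pixels:
        raise ValueError(f"{width}x{height} exceeds {max_pixels} pixels")
    # Roughly 4 bytes per pixel once decoded, however small the file is.
    return width * height * 4
```

Because only the first 24 bytes are needed, this check is cheap enough to run on every upload before the image is handed to the resizer.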
For a small website it doesn't matter which approach you choose. Something that works (even if it doesn't scale) will do. If you plan on getting a good amount of traffic and a steady stream of uploads, then choose wisely and benchmark carefully.