My scenario is as follows:
I have to save 1000 of images in database, and then I have to compare new image with database images for matches (match should be 70% or more) to get the best match image from database in php.
is there any algorithm or method for fast comparison with better result ...
Thanks in advance :)
I would suggest you use a Perceptual Hash or similar - mainly for reasons of performance. In essence, you create a single number, or hash, for each image ONCE in your database at the point where you insert it, and retain that hash in the database. Then when you get a new image to insert, you calculate its hash and compare it to the PRE-CALCULATED hash of all the other images so that you don't have to drag all the megabytes of pixels of your existing images from disk to compare them.
The best pHASHes are scale-invariant and image format invariant. Here is an article by Dr Neal Krawetz... Perceptual Hashing.
ImageMagick can also do Perceptual Hashing and is callable from PHP - see here.
Try this class. It support get hash string from image to store in database and compare with new image later:
https://github.com/nvthaovn/CompareImage
It is very fast and accurate, although not optimal code. I have 20000 pictures in my database.
This depends entirely on how smart you want the algorithm to be.
For instance, here are some issues:
cropped images vs. an uncropped image
images with a text added vs. another without
mirrored images
The easiest and simplest algorithm I've seen for this is just to do the following steps to each image:
scale to something small, like 64x64 or 32x32, disregard aspect ratio, use a combining scaling algorithm instead of nearest pixel
scale the color ranges so that the darkest is black and lightest is white
rotate and flip the image so that the lighest color is top left, and then top-right is next darker, bottom-left is next darker (as far as possible of course)
Edit A combining scaling algorithm is one that when scaling 10 pixels down to one will do it using a function that takes the color of all those 10 pixels and combines them into one. Can be done with algorithms like averaging, mean-value, or more complex ones like bicubic splines.
Then calculate the mean distance pixel-by-pixel between the two images.
To look up a possible match in a database, store the pixel colors as individual columns in the database, index a bunch of them (but not all, unless you use a very small image), and do a query that uses a range for each pixel value, ie. every image where the pixel in the small image is between -5 and +5 of the image you want to look up.
This is easy to implement, and fairly fast to run, but of course won't handle most advanced differences. For that you need much more advanced algorithms.
Related
I am looking for an algorithm to compare two images, one is given static in highest quality and the other is taken individually with maybe not so good quality periodically.
The static one is way smaller and should be IN the second image at different positions.
Is there an algorithm to compare whether an image is part of another image like i described and the result is maybe given as odds to be in there?
If the size of the small image does not change in the bigger images (just the position changes) then all you need is a fuzzy comparison function between two small squares of pixels in two images that compare the positions of colors and give a match score (You could even just use sum-of-squared distance in RGB space). If the small image is higher resolution than the larger images then you'll have to scale the pixel widths of the squares you are comparing accordingly, and possibly use fractional pixels to make sure the actual sizes of the square patches you are comparing are equal.
Anyway, once you get a strong enough match for a small square then you can continue to compare the rest of the squares in the images to verify that you have a match. Just keep going as long as the match score between squares is high enough, and as soon as it isn't, then move on to the next possible position for the small image inside the larger image. For images with a lot of entropy everywhere (not a lot of plain black spots, for example) this should work very fast.
My users are uploading images to my website and i would like first to offer them already uploaded images first. My idea is to
1. create some kind of image "hash" of every existing image
2. create a hash of newly uploaded image and compare it with the other in the database
i have found some interesting solutions like http://www.pureftpd.org/project/libpuzzle or or http://phash.org/ etc. but they got one or more problems
they need some nonstandard extension to PHP (or are not in PHP at all) - it would be OK for me, but I would like to create it as a plugin to my popular CMS, which is used on many hosting environments without my control.
they are comparing two images but i need to compare one to many (e.g. thousands) and doing it one by one would be very uneffective / slow ...
...
I would be OK to find only VERY similar images (so e.g. different size, resaved jpg or different jpg compression factor).
The only idea I got is to resize the image to e.g. 5px*5px* 256 colors, create a string representation of it and then find the same. But I guess that it may have create tiny differences in colors even with just two same images with different size, so finding just the 100 % same would be useless.
So I would need some good format of that string representation of image which than could be used with some SQL function to find similar, or some other nice way. E.g. phash create perceptional hashes, so when two numbers are close, the images should be close as well, so i just need to find closest distances. But it is again external library.
Is there any easy way?
I've had this exact same issue before.
Feel free to copy what I did, and hopefully it will help you / solve your problem.
How I solved it
My first idea that failed, similar to what you may be thinking, is I ended up making strings for every single image (no matter what size). But I quickly worked out this fills your database super fast, and wasn't effective.
Next option (that works) was a smaller image (like your 5px idea), and I did exactly that, but with 10px*10px images. The way I created the 'hash' for each image was the imagecolorat() function.
See php.net here.
When receiving the rgb colours for the image, I rounded them to the nearest 50, so that the colours were less specific. That number (50) is what you want to change depending on how specific you want your searches to be.
for example:
// Pixel RGB
rgb(105, 126, 225) // Original
rgb(100, 150, 250) // After rounding numbers to nearest 50
After doing this to every pixel (10px*10px will give you 100 rgb()'s back), I then turned them into an array, and stored them in the database as base64_encode() and serialize().
When doing the search for images that are similar, I did the exact same process to the image they wanted to upload, and then extracted image 'hashes' from the database to compare them all, and see what had matching rounded rgb's.
Tips
The Bigger that 50 is in the rgb rounding, the less specific your search will be (and vice versa).
If you want your SQL to be more specific, it may be better to store extra/specific info about the image in the database, so that you can limit the searches you get in the database. eg. if the aspect ratio is 4:3, only pull images around 4:3 from the database. (etc)
It can be difficult to get this perfectly 5px*5px, so a suggestion is phpthumb. I used it with the syntax:
phpthumb.php?src=IMAGE_NAME_HERE.png&w=10&h=10&zc=1
// &w= width of your image
// &h= height of your image
// &zc= zoom control. 0:Keep aspect ratio, 1:Change to suit your width+height
Good luck mate, hope I could help.
For an easy php implementation check out: https://github.com/kennethrapp/phasher
However - I wonder if there is a native mySql function for "compare" (see php class above)
I scale down image to 8x8 then I convert RGB to 1-byte HSV so result hash is 172 bytes string.
HSVHSVHSVHSVHSVHSVHSVHSV... (from 8x8 block, 172 bytes long)
0fff0f3ffff4373f346fff00...
It's not 100% accurate (some duplicates aren't found) but it works nice and looks like there is no false positive results.
Putting it down in an academical way, what you are looking for is a similarity function which takes in two images and returns an indicator how far/similar the two images are. This indicator could easily be a decimal number ranging from -1 to 1 (far apart to very close). Once you have this function you can set an image as a reference and compare all the images against it. Then finding the similar images to one is as simple as finding the closest similarity factor to it which is done with a simple search over a double field within an RDBMS like MySQL.
Now all that remains is how to define the similarity function. To be honest this is problem specific. It depends on what you call similar. But covariance is usually a good starting point, it just needs your two images to be of the same size which I think is of no big deal. Yet you can find lots of other ideas searching for 'similarity measures between two images'.
How can I calculate what the majority colors of an image are in PHP? I'd prefer to group different shades of a similar color into a single bucket, so for example all shades of blue are just counted as "blue".
In other words, I'd like a function that takes an image and returns a simple array similar to:
"blue":90%, "white":10%
No need for high accuracy, just enough to categorize the images by dominant and sub-dominant colors. Thanks!
Here's one approach:
1) Define a set of colours which we'll call centroids -- these are the middle of the basic colours you want to break images into. You can do this using a clustering algorithm like k-means, for example. So now you've got, say, 100 centroids (buckets, you can think of them as), each of which is an RGB colour triple with a name you can manually attach to it.
2) To generate the histogram for a new image:
open the image in gd or whatever
convert it to an array of pixel values (e.g. using imagecolorat)
determine the distance (euclidean distance is ok) between the pixel value and all the centroids. Classify each pixel as to which bucket it's closest to.
Your output is a centroid assignment for each pixel. Or, given you just want a histogram, you can just count how many times each centroid occurs.
Bear in mind that this kind of colour assignment is somewhat subjective. I'm not sure there'll be a definitive mapping from colours to names (e.g., it's language dependent). But if you google, there might exist a look-up table that you could use, although I've not come across one.
Hope this helps!
Ben
I'm currently dealing with a large number of images (over 3000), and I have two copies of each; one large, one small. The problem is I don't have any kind of link to say which small image maps to which large image.
I know roughly which large image goes to which small image (I have an array which contains up to 5 possibilities for each small image), and I'd like to loop round the array and compare each large image to the small image, to see if it's the same image but resized.
TLDR/Didn't Understand: Is there any easy way in PHP to compare two image that are the same, one being (say) 200x200 and the other being (say) 500x500, and determine if they're an image of the same thing?
The easiest way would be resize both to a small size (16x16 or 32x32 for example depending on how much CPU you want to use keeping track of the names of each file throughout the process.
Then use imagecolorat() to compare pixel colours through each row/column. You can then define a percentage of which need to match to be considered the same picture.
Edit: here's an example of this implemented http://www.thismayhem.com/php/comparing-images-with-php-gd/
Have you tried if resizing 1 of the big image creates the exact same image as one of the smaller ones(manually use Md5 to find that out).
If it does, then your solution is to batch resize the bigger images to a tmp file and md5 it building an array of bigimagemd5s and then you md5 the smaller ones.
This method will only work if the small images were generated using the php GD algorithm and you have the same compression level and size that was used before.
A thing you could do is, take the big picture, scale it to the size of the small picture and check some random pixel-areas there. If the pictures are really different, you could use something like a avg. colour of both images.
But this way is really performance sucking...
No, there is not a same_image_but_in_different_sizes() function that returns true or false.
If I were you, I'd either sort them manually or choose a color histogram based approach.
I just saw that all "images are JPGs, and they're saved in two folders; large and small". Why don't you just throw small away and recreate it, but this time you remember which belongs to which?
I would use the imagemagick library for resize and compare the images. I would call it using exec()
First, resize the large image using the convert command http://www.imagemagick.org/script/convert.php
Then, compare the small image with the resized image using compare command http://www.imagemagick.org/script/compare.php
to sort them manually will take you much, much less time.
Please check this link:
Compare 2 images in php
It seems that using ImageMagick extension is your best bet, although you'd probably have to scale down the larger image to the same size as your smaller one first.
What's the best approach to comparing two images with php and the Graphic Draw (GD) Library?
This is the scenario:
I have an image, and I want to find which image of a given set is the most similar to it.
The most similar image is in fact the same image, not pixel perfect match but the same image.
I've dramatised the difference between the two images with the number one on the example just to ease the understanding of what I meant.
Even though it brought no consistent results, my approach was to reduce the images to 1px using the imagecopyresampled function and see how close the RGB values where between images.
The sum of the values of deducting each red, green and blue decimal equivalent value from the red, green and blue decimal equivalent value of the possible match gave me a dissimilarity index that, even though it didn't work as expected since not always the most RGB similar image was the target image, I could use to select an image from the available targets.
Here's a sample of the output when comparing 4 images against a target image, in this case the apple logo, that matches one of them but is not exactly the same:
Original image:
Red:222 Green:226 Blue:232
Compared against:
http://a1.twimg.com/profile_images/571171388/logo-twitter_normal.png
Red:183 Green:212 Blue:212 and an index of similarity of 56
Red:117 Green:028 Blue:028 and an index of dissimilarity 530
Red:218 Green:221 Blue:221 and an index of dissimilarity 13 Matched Correctly.
Red:061 Green:063 Blue:063 and an index of dissimilarity 491
May not even be doable better with better results than what I'm already getting and I'm wasting my time here but since there seems to be a lot of experienced php programmers I guess you can point me in the right directions on how to improve this.
I'm open to other image libraries such as iMagick, Gmagick or Cairo for php but I'd prefer to avoid using other languages than php.
Thanks in advance.
I'd have thought your approach seems reasonable, but reducing an entire image to 1x1 pixel in size is probably a step too far.
However, if you converted each image to the same size and then computed the average colour in each 16x16 (or 32x32, 64x64, etc. depending on how much processing time/power you wish to use) cell you should be able to form some kind of sensible(-ish) comparison.
I would suggest, like middaparka, that you do not downsample to a 1 pixel only image, because you loose all the spatial information. Downsampling to 16x16 (or 32x32, etc.) would certainly provide better results.
Then it also depends on whether color information is important or not to you. From what I understand you could actually do without it and compute a gray-level image starting from your color image (e.g. luma) and compute the cross-correlation. If, like you said, there is a couple of images that matches exactly (except for color information) this should give you a pretty good reliability.
I used the ideas of scaling, downsampling and gray-level mentioned in the question and answers, to apply a Mean Squared Error between the pixels channels values for 2 images, using GD Library.
The code is in this answer, including a test with those ideas.
Also I did some benckmarking and I think the downsampling could be not needed in those little images, cause the method is fast (being PHP), just a fraction of a second.
Using middparka's methods, you can transform each image into a sequence of numeric values and then use the Levenshtein algorithm to find the closest match.