How to check whether an image is part of an larger image

How to check whether an image is part of an larger image - php

I am looking for an algorithm to compare two images, one is given static in highest quality and the other is taken individually with maybe not so good quality periodically.
The static one is way smaller and should be IN the second image at different positions.
Is there an algorithm to compare whether an image is part of another image like i described and the result is maybe given as odds to be in there?

If the size of the small image does not change in the bigger images (just the position changes) then all you need is a fuzzy comparison function between two small squares of pixels in two images that compare the positions of colors and give a match score (You could even just use sum-of-squared distance in RGB space). If the small image is higher resolution than the larger images then you'll have to scale the pixel widths of the squares you are comparing accordingly, and possibly use fractional pixels to make sure the actual sizes of the square patches you are comparing are equal.
Anyway, once you get a strong enough match for a small square then you can continue to compare the rest of the squares in the images to verify that you have a match. Just keep going as long as the match score between squares is high enough, and as soon as it isn't, then move on to the next possible position for the small image inside the larger image. For images with a lot of entropy everywhere (not a lot of plain black spots, for example) this should work very fast.

Related

Image comparison in php

My scenario is as follows:
I have to save 1000 of images in database, and then I have to compare new image with database images for matches (match should be 70% or more) to get the best match image from database in php.
is there any algorithm or method for fast comparison with better result ...
Thanks in advance :)

I would suggest you use a Perceptual Hash or similar - mainly for reasons of performance. In essence, you create a single number, or hash, for each image ONCE in your database at the point where you insert it, and retain that hash in the database. Then when you get a new image to insert, you calculate its hash and compare it to the PRE-CALCULATED hash of all the other images so that you don't have to drag all the megabytes of pixels of your existing images from disk to compare them.
The best pHASHes are scale-invariant and image format invariant. Here is an article by Dr Neal Krawetz... Perceptual Hashing.
ImageMagick can also do Perceptual Hashing and is callable from PHP - see here.

Try this class. It support get hash string from image to store in database and compare with new image later:
https://github.com/nvthaovn/CompareImage
It is very fast and accurate, although not optimal code. I have 20000 pictures in my database.

This depends entirely on how smart you want the algorithm to be.
For instance, here are some issues:
cropped images vs. an uncropped image
images with a text added vs. another without
mirrored images
The easiest and simplest algorithm I've seen for this is just to do the following steps to each image:
scale to something small, like 64x64 or 32x32, disregard aspect ratio, use a combining scaling algorithm instead of nearest pixel
scale the color ranges so that the darkest is black and lightest is white
rotate and flip the image so that the lighest color is top left, and then top-right is next darker, bottom-left is next darker (as far as possible of course)
Edit A combining scaling algorithm is one that when scaling 10 pixels down to one will do it using a function that takes the color of all those 10 pixels and combines them into one. Can be done with algorithms like averaging, mean-value, or more complex ones like bicubic splines.
Then calculate the mean distance pixel-by-pixel between the two images.
To look up a possible match in a database, store the pixel colors as individual columns in the database, index a bunch of them (but not all, unless you use a very small image), and do a query that uses a range for each pixel value, ie. every image where the pixel in the small image is between -5 and +5 of the image you want to look up.
This is easy to implement, and fairly fast to run, but of course won't handle most advanced differences. For that you need much more advanced algorithms.

Sort very similar images PHP OCR

thanks for looking at my question.
Basically what I'm trying to do is find all images that look like the first and the third image here: http://imgur.com/a/IhHEC
and remove all the ones that don't look like that (2,4).
I've tried several libraries to no avail.
Another acceptable way to do this is to check if the images contain "Code:", as that string is in each one that I have to sort out.
Thank you,
Steve
EDIT: Although the 1st and 3rd images seem like they are the same size, they are not.

If those are the actual images you're going to use, it looks like histogram similarity will do the job. The first and third are very contrasty, the second and fourth, especially the fourth, have a wide range of different intensities.
You could easily make a histogram of the shades of grey in the image and then apply thresholds to the shape of the histogram to classify them.
EDIT: To actually do this: you can iterate through every pixel and create an array of pixel value => number of times found. As it's greyscale you can take either the R, G or B channel. Then divide each number by the number of pixels in the image to normalise, so it will work for any size. Each entry in the histogram will then be a fraction of the number of pixels used. You can then measure the number of values above a certain threshold. If there are lots of greys, you'll get a large number of small values. If there aren't, you'll get a small number of large values.

Due to my background in working more with text from images than image objects, I would do this in post-OCR process, by searching the text content for 'keywords' or checking for 'regular expression' representing your desired data. This means that the entire job needs to be separated into two stages: image-to-text OCR (free or cheap, software or cloud), and actual separation process (simple programming).

Find similar images in (pure) PHP / MySQL

My users are uploading images to my website and i would like first to offer them already uploaded images first. My idea is to
1. create some kind of image "hash" of every existing image
2. create a hash of newly uploaded image and compare it with the other in the database
i have found some interesting solutions like http://www.pureftpd.org/project/libpuzzle or or http://phash.org/ etc. but they got one or more problems
they need some nonstandard extension to PHP (or are not in PHP at all) - it would be OK for me, but I would like to create it as a plugin to my popular CMS, which is used on many hosting environments without my control.
they are comparing two images but i need to compare one to many (e.g. thousands) and doing it one by one would be very uneffective / slow ...
...
I would be OK to find only VERY similar images (so e.g. different size, resaved jpg or different jpg compression factor).
The only idea I got is to resize the image to e.g. 5px*5px* 256 colors, create a string representation of it and then find the same. But I guess that it may have create tiny differences in colors even with just two same images with different size, so finding just the 100 % same would be useless.
So I would need some good format of that string representation of image which than could be used with some SQL function to find similar, or some other nice way. E.g. phash create perceptional hashes, so when two numbers are close, the images should be close as well, so i just need to find closest distances. But it is again external library.
Is there any easy way?

I've had this exact same issue before.
Feel free to copy what I did, and hopefully it will help you / solve your problem.
How I solved it
My first idea that failed, similar to what you may be thinking, is I ended up making strings for every single image (no matter what size). But I quickly worked out this fills your database super fast, and wasn't effective.
Next option (that works) was a smaller image (like your 5px idea), and I did exactly that, but with 10px*10px images. The way I created the 'hash' for each image was the imagecolorat() function.
See php.net here.
When receiving the rgb colours for the image, I rounded them to the nearest 50, so that the colours were less specific. That number (50) is what you want to change depending on how specific you want your searches to be.
for example:
// Pixel RGB
rgb(105, 126, 225) // Original
rgb(100, 150, 250) // After rounding numbers to nearest 50
After doing this to every pixel (10px*10px will give you 100 rgb()'s back), I then turned them into an array, and stored them in the database as base64_encode() and serialize().
When doing the search for images that are similar, I did the exact same process to the image they wanted to upload, and then extracted image 'hashes' from the database to compare them all, and see what had matching rounded rgb's.
Tips
The Bigger that 50 is in the rgb rounding, the less specific your search will be (and vice versa).
If you want your SQL to be more specific, it may be better to store extra/specific info about the image in the database, so that you can limit the searches you get in the database. eg. if the aspect ratio is 4:3, only pull images around 4:3 from the database. (etc)
It can be difficult to get this perfectly 5px*5px, so a suggestion is phpthumb. I used it with the syntax:
phpthumb.php?src=IMAGE_NAME_HERE.png&w=10&h=10&zc=1
// &w= width of your image
// &h= height of your image
// &zc= zoom control. 0:Keep aspect ratio, 1:Change to suit your width+height
Good luck mate, hope I could help.

For an easy php implementation check out: https://github.com/kennethrapp/phasher
However - I wonder if there is a native mySql function for "compare" (see php class above)

I scale down image to 8x8 then I convert RGB to 1-byte HSV so result hash is 172 bytes string.
HSVHSVHSVHSVHSVHSVHSVHSV... (from 8x8 block, 172 bytes long)
0fff0f3ffff4373f346fff00...
It's not 100% accurate (some duplicates aren't found) but it works nice and looks like there is no false positive results.

Putting it down in an academical way, what you are looking for is a similarity function which takes in two images and returns an indicator how far/similar the two images are. This indicator could easily be a decimal number ranging from -1 to 1 (far apart to very close). Once you have this function you can set an image as a reference and compare all the images against it. Then finding the similar images to one is as simple as finding the closest similarity factor to it which is done with a simple search over a double field within an RDBMS like MySQL.
Now all that remains is how to define the similarity function. To be honest this is problem specific. It depends on what you call similar. But covariance is usually a good starting point, it just needs your two images to be of the same size which I think is of no big deal. Yet you can find lots of other ideas searching for 'similarity measures between two images'.

How can you find the "majority colors" of an image using PHP?

How can I calculate what the majority colors of an image are in PHP? I'd prefer to group different shades of a similar color into a single bucket, so for example all shades of blue are just counted as "blue".
In other words, I'd like a function that takes an image and returns a simple array similar to:
"blue":90%, "white":10%
No need for high accuracy, just enough to categorize the images by dominant and sub-dominant colors. Thanks!

Here's one approach:
1) Define a set of colours which we'll call centroids -- these are the middle of the basic colours you want to break images into. You can do this using a clustering algorithm like k-means, for example. So now you've got, say, 100 centroids (buckets, you can think of them as), each of which is an RGB colour triple with a name you can manually attach to it.
2) To generate the histogram for a new image:
open the image in gd or whatever
convert it to an array of pixel values (e.g. using imagecolorat)
determine the distance (euclidean distance is ok) between the pixel value and all the centroids. Classify each pixel as to which bucket it's closest to.
Your output is a centroid assignment for each pixel. Or, given you just want a histogram, you can just count how many times each centroid occurs.
Bear in mind that this kind of colour assignment is somewhat subjective. I'm not sure there'll be a definitive mapping from colours to names (e.g., it's language dependent). But if you google, there might exist a look-up table that you could use, although I've not come across one.
Hope this helps!
Ben

Using imagecolorat() for motion detection

I've got a security camera set up, and i'm comparing 1 minute batches of images to check if there's been any motion. I have 10 coordinates that I check between each image. If any pixels don't match the previous image, it triggers a warning message.
Problem is, it works too well.
The logic is basically if imagecolorat() is greater or less than a 10% difference from the previous imagecolorat(), it triggers. So, if a cloud comes over the house, it triggers. Basically any change in light triggers it. So, I've moved the threshold from 10% to 30% and it triggers less but I'm worried now that if I move any past that, real motion wont be detected.
Note: I'm using the raw output of imagecolorat(), not the RGB values. I'm not sure if this would have an impact.

You are looking for larger discontinuities - things like noise and slow variation should be discounted (unless this motion detection is for very slow moving small things, like garden gnomes).
Therefore, do a histogram equalize or something similar before image subtraction to account for global shifts in light, and do some filtering / edge enhancement before differencing to enhance the changes. Using more of the image will be better I would think, than just 10 points.
Histogram equalization entails looping through the image, and counting bins for each brightness value - so you have a resulting data set that says how many pixels are in a set of tonal ranges. In other words, say you divvy up into 16 bins - pixels that have a greyscale value (or alternately, the Brightness in an HSB model) in the value of 0..15 (assuming an 8 bit image here in this channel) are all in bin 1. Then you go forth and compute a series of linear stretches to apply to each bin so that it occupies an output range in proportion to its population. So if ALL of your pixels are in the 0.. 15 bin, you would just multiply everything by 16 to stretch them out. The goal is that your histogram of your equalized image is flat - equal amounts of pixel in every bin.
Edge enhancement can be simply done with the application of a Sobel filter.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.