I've used GD before, but only ever for resizing/generating images on the fly - though I'm pretty positive it has the capabilities to do what I'm after.
As simply as possible, I need to check an image to find out whether it has a light background or a dark background. I.e. if the background is predominantly 'light' I'm returned a value of '1', and if it is predominantly 'dark' it returns '0'.
There's only going to be 5 images iterated through in this process at a time, but I'm very conscious here of processing time - the page is going to be called often.
Can anyone point me in the right direction on where to go with this?
First see if there are any patterns you can take advantage of - for instance, is the top-left or top-right corner (for example) always going to be of the background colour? If so, just look at the colour of that pixel.
Maybe you can get a "good enough" idea by looking at some key pixels and averaging them.
Failing something simple like that, the work you need to do starts to rise by orders of magnitude.
One nice idea I had would be to take the strip of pixels going diagonally across from the top-left corner to the bottom-right corner (maybe have a look at Bresenham's line algorithm). Look for runs of dark and light colour, and probably take the longest run; if that doesn't work, maybe you should "score" runs based on how light and dark they are.
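A rough GD sketch of that diagonal idea, simply sampling points along the corner-to-corner line rather than doing a full Bresenham walk; the function name, the 100-sample count and the 128 luma cut-off are all arbitrary choices:

// Sketch: sample pixels along the top-left -> bottom-right diagonal and
// count how many are "dark". Helper name and thresholds are illustrative.
function backgroundIsLight($path)
{
    $img = imagecreatefromjpeg($path);           // assumes a JPEG input
    $w = imagesx($img);
    $h = imagesy($img);

    $dark = 0;
    $samples = 100;                              // points taken along the diagonal
    for ($i = 0; $i < $samples; $i++) {
        $x = (int) (($w - 1) * $i / ($samples - 1));
        $y = (int) (($h - 1) * $i / ($samples - 1));
        $rgb = imagecolorsforindex($img, imagecolorat($img, $x, $y));

        // Luma as a rough lightness measure
        $luma = 0.299 * $rgb['red'] + 0.587 * $rgb['green'] + 0.114 * $rgb['blue'];
        if ($luma < 128) {
            $dark++;
        }
    }
    imagedestroy($img);

    return ($dark < $samples / 2) ? 1 : 0;       // 1 = predominantly light
}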
If your image is unnecessarily large (say 1000x1000 or more) then use imagecopyresized to cheaply scale it down to something reasonable (say 80x80).
Something that will work if MOST of the image is background-colour is to resample the image to 1 pixel and check the colour of that pixel (or maybe something small, 4x4 or so, after which you count up pixels to see if the image is predominantly light or dark).
Note that imagecopyresampled is considerably more expensive than imagecopyresized, since 'resized just takes individual pixels from the original whereas 'resampled actually blends the pixels together.
If you want a measure of "lightness" you could simply add the R, G and B values together. Or you could go for the formula for luma used in YCbCr:
Y' = 0.299 * R + 0.587 * G + 0.114 * B
This gives a more "human-centric" measure of lightness.
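Putting those pieces together, a minimal sketch of the resample-to-one-pixel approach; the 128 cut-off and the function name are assumptions on my part:

// Sketch: shrink to a single pixel, then classify that pixel's luma.
// imagecopyresampled is used for a blended average; swap in
// imagecopyresized if speed matters more than accuracy.
function isLightBackground($path)
{
    $src = imagecreatefromjpeg($path);           // assumes a JPEG input
    $tiny = imagecreatetruecolor(1, 1);
    imagecopyresampled($tiny, $src, 0, 0, 0, 0, 1, 1, imagesx($src), imagesy($src));

    $rgb = imagecolorsforindex($tiny, imagecolorat($tiny, 0, 0));
    $luma = 0.299 * $rgb['red'] + 0.587 * $rgb['green'] + 0.114 * $rgb['blue'];

    imagedestroy($src);
    imagedestroy($tiny);

    return ($luma >= 128) ? 1 : 0;               // 1 = light, 0 = dark
}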
I don't know what the best technology to use here is. I know PHP can do vaguely similar things, but point me in the right direction if I'm wrong.
I'm building an online store and I'd like an easy (automated) way to categorise the colours of each item for sale.
I've seen numerous posts on Stack which are related to this, here are some good discussions for those interested:
Programmatically determine human readable colours
Get Image Colour
Detect overall average colour of a picture
These are all well and good. However, my issue is a little different. The images in question are all on different coloured backgrounds, and these affect the "average colour" of the image. I've tried resizing my images down to 1px to get a colour average, but this doesn't quite work.
As you can see, for image #1 the average colour is going to be a lot whiter than the product colour; for #2 and #3 it's going to be a lot more brown.
Can anyone think of any methods I could use to get the right average colour, in an automated way, with PHP, Ruby, Python, or anything similar? My idea was to take a section from the middle of each photo (which is usually where the product in question is) and take the average of that. For instance, get a 30px x 30px square in the centre of the image and process that.
This won't be absolutely perfect though, and I'm completely new to this sort of programming - is there any better way to determine foreground colour?
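For reference, a minimal GD sketch of the centre-crop idea described above; the 30x30 box size and the function name are just placeholders:

// Sketch: average the colour of a box taken from the centre of the image.
// Assumes a truecolor JPEG larger than the crop box.
function centreAverageColour($path, $box = 30)
{
    $img = imagecreatefromjpeg($path);
    $x0 = (int) ((imagesx($img) - $box) / 2);
    $y0 = (int) ((imagesy($img) - $box) / 2);

    $r = $g = $b = 0;
    for ($x = $x0; $x < $x0 + $box; $x++) {
        for ($y = $y0; $y < $y0 + $box; $y++) {
            $rgb = imagecolorsforindex($img, imagecolorat($img, $x, $y));
            $r += $rgb['red'];
            $g += $rgb['green'];
            $b += $rgb['blue'];
        }
    }
    $n = $box * $box;
    imagedestroy($img);

    return array('red' => (int) ($r / $n), 'green' => (int) ($g / $n), 'blue' => (int) ($b / $n));
}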
I'd suggest you explode the image, giving weight to the center of the image.
convert image_source.jpg -implode -32 image_destination.jpg
Then calculate the average color (by scaling to 1x1) or pick an average from a centered box.
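A sketch of the same idea using the Imagick extension instead of the convert binary; the negative radius mirrors the -32 in the command above, and the output handling is illustrative:

// Sketch with Imagick: "explode" towards the centre, then scale to 1x1 and
// read the remaining pixel as the average colour.
$im = new Imagick('image_source.jpg');
$im->implodeImage(-32);          // negative radius explodes, as in the convert call
$im->scaleImage(1, 1);
$rgb = $im->getImagePixelColor(0, 0)->getColor();   // array('r' => ..., 'g' => ..., 'b' => ...)
print_r($rgb);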
If you need more precision, you'll need a computer vision algorithm to segregate the foreground from the background; you can have a look at OpenCV.
What you're up to is quite a hard task.
My suggestion is to use a little more input:
one picture of only the background (without the object) and one with the object.
Now if you threshold the subtraction you can get the object pixels (I mean, just take those that change between the two pictures).
Using these pixels you could take the histogram and select the most common ones.
(http://php.net/manual/en/imagick.getimagehistogram.php)
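A rough GD sketch of that idea; the per-channel difference threshold of 30 and the file names are arbitrary assumptions, and both images are assumed to be the same size:

// Sketch: compare the "background only" shot against the "with object" shot,
// keep the pixels that changed, and count the most common colours among them.
$bg  = imagecreatefromjpeg('background.jpg');    // hypothetical file names
$obj = imagecreatefromjpeg('with_object.jpg');

$histogram = array();
for ($x = 0; $x < imagesx($obj); $x++) {
    for ($y = 0; $y < imagesy($obj); $y++) {
        $a = imagecolorsforindex($bg,  imagecolorat($bg,  $x, $y));
        $b = imagecolorsforindex($obj, imagecolorat($obj, $x, $y));

        $changed = abs($a['red'] - $b['red']) > 30
                || abs($a['green'] - $b['green']) > 30
                || abs($a['blue'] - $b['blue']) > 30;

        if ($changed) {
            $key = sprintf('%02x%02x%02x', $b['red'], $b['green'], $b['blue']);
            $histogram[$key] = isset($histogram[$key]) ? $histogram[$key] + 1 : 1;
        }
    }
}
arsort($histogram);
print_r(array_slice($histogram, 0, 5, true));    // the five most common object colours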
I've got a security camera set up, and I'm comparing 1 minute batches of images to check if there's been any motion. I have 10 coordinates that I check between each image. If any pixels don't match the previous image, it triggers a warning message.
Problem is, it works too well.
The logic is basically if imagecolorat() is greater or less than a 10% difference from the previous imagecolorat(), it triggers. So, if a cloud comes over the house, it triggers. Basically any change in light triggers it. So, I've moved the threshold from 10% to 30% and it triggers less, but I'm worried now that if I move it any past that, real motion won't be detected.
Note: I'm using the raw output of imagecolorat(), not the RGB values. I'm not sure if this would have an impact.
You are looking for larger discontinuities - things like noise and slow variation should be discounted (unless this motion detection is for very slow moving small things, like garden gnomes).
Therefore, do a histogram equalize or something similar before image subtraction to account for global shifts in light, and do some filtering / edge enhancement before differencing to enhance the changes. Using more of the image would, I think, be better than just 10 points.
Histogram equalization entails looping through the image and counting bins for each brightness value, so you end up with a data set that says how many pixels fall in each tonal range. In other words, say you divide the range into 16 bins - pixels whose greyscale value (or, alternately, the Brightness in an HSB model) is in the range 0..15 (assuming an 8-bit channel) all land in bin 1. Then you compute a series of linear stretches to apply to each bin so that it occupies an output range in proportion to its population. So if ALL of your pixels are in the 0..15 bin, you would just multiply everything by 16 to stretch them out. The goal is that the histogram of your equalized image is flat - equal numbers of pixels in every bin.
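A minimal GD sketch of that equalization step, using 256 bins rather than 16 and working on a greyscale copy; purely illustrative:

// Sketch: histogram-equalize a greyscale copy of a truecolor GD image using
// the cumulative distribution of the 256 grey levels.
function equalize($img)
{
    imagefilter($img, IMG_FILTER_GRAYSCALE);     // low byte of imagecolorat() is now the grey level
    $w = imagesx($img);
    $h = imagesy($img);

    // Count pixels per grey level (the "bins")
    $hist = array_fill(0, 256, 0);
    for ($x = 0; $x < $w; $x++) {
        for ($y = 0; $y < $h; $y++) {
            $hist[imagecolorat($img, $x, $y) & 0xFF]++;
        }
    }

    // Cumulative distribution -> output level for each input level
    $map = array();
    $sum = 0;
    foreach ($hist as $level => $count) {
        $sum += $count;
        $map[$level] = (int) round(255 * $sum / ($w * $h));
    }

    // Remap every pixel into a new image
    $out = imagecreatetruecolor($w, $h);
    for ($x = 0; $x < $w; $x++) {
        for ($y = 0; $y < $h; $y++) {
            $v = $map[imagecolorat($img, $x, $y) & 0xFF];
            imagesetpixel($out, $x, $y, imagecolorallocate($out, $v, $v, $v));
        }
    }
    return $out;
}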
Edge enhancement can be simply done with the application of a Sobel filter.
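GD can apply a single 3x3 kernel with imageconvolution(); here is a minimal sketch of the horizontal Sobel pass (a full Sobel magnitude also needs the vertical kernel and a combine step), assuming $img is an existing truecolor GD image:

imagefilter($img, IMG_FILTER_GRAYSCALE);
imageconvolution($img, array(
    array(-1, 0, 1),
    array(-2, 0, 2),
    array(-1, 0, 1),
), 1, 127);   // divisor 1; offset 127 keeps negative responses visible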
We have this map, and we need to use PHP to take all the shades of blue out, as well as the percentages. The problem is that some of the percentages are in the same color as the borders, and other times the percentages run into the border. We need to use this image.
There are not (AFAIK) really easy ways.
The easiest way doesn't give you good results: separate channels and delete small components.
The result is this:
As you can see there are a few numbers and percent signs remaining because they are connected to the delimiting lines and deleting small components doesn't work for them.
If you need a better job, you should correlate the image with a template of each number, and once identified, delete it.
Here you can see the result of the correlation with the number "2":
One wrong "2" is identified, (see top left), so a more sophisticated approach may be needed for a general procedure.
Anyway, I think these kind of manipulation is well beyond what you can expect from K-12.
HTH!
Edit
As per your request, some detail about the first method.
You first separate the three channels, and get three images:
You keep the third (the blue channel).
Then you need to delete the smaller components. There are a lot of methods to do that; probably the easiest is derived from the connectivity detection used in the flood-fill algorithm, except you just measure the components without filling them.
The basic (not optimized) idea is to visit every pixel in the image and count how many pixels are "connected" to it. If there are fewer than a specified number (threshold), you just delete the whole set. Most image manipulation libraries have all these functions already implemented.
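A rough GD sketch of that small-component removal on a binarized image, using a stack-based flood fill that only measures; the size and luma thresholds are arbitrary assumptions:

// Sketch: remove connected components smaller than $minSize from a
// black-on-white image by painting them white.
function removeSmallComponents($img, $minSize = 50)
{
    $w = imagesx($img);
    $h = imagesy($img);
    $white = imagecolorallocate($img, 255, 255, 255);
    $visited = array();

    for ($sx = 0; $sx < $w; $sx++) {
        for ($sy = 0; $sy < $h; $sy++) {
            if (isset($visited[$sx . ',' . $sy]) || !isDarkPixel($img, $sx, $sy)) {
                continue;
            }
            // Collect one connected component with an explicit stack
            // (flood fill without the fill)
            $stack = array(array($sx, $sy));
            $component = array();
            while (count($stack) > 0) {
                list($x, $y) = array_pop($stack);
                if ($x < 0 || $y < 0 || $x >= $w || $y >= $h) continue;
                if (isset($visited[$x . ',' . $y]) || !isDarkPixel($img, $x, $y)) continue;
                $visited[$x . ',' . $y] = true;
                $component[] = array($x, $y);
                $stack[] = array($x + 1, $y);
                $stack[] = array($x - 1, $y);
                $stack[] = array($x, $y + 1);
                $stack[] = array($x, $y - 1);
            }
            // Small component: paint it out
            if (count($component) < $minSize) {
                foreach ($component as $p) {
                    imagesetpixel($img, $p[0], $p[1], $white);
                }
            }
        }
    }
}

// "Foreground" here means luma below 128 - an arbitrary cut-off
function isDarkPixel($img, $x, $y)
{
    $rgb = imagecolorsforindex($img, imagecolorat($img, $x, $y));
    return (0.299 * $rgb['red'] + 0.587 * $rgb['green'] + 0.114 * $rgb['blue']) < 128;
}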
For this specific image, if you open the image in image editing software, convert the mode from index to true color (RGB), and then color dodge the entire image with yellow (RGB: 255,255,0), you wind up with a black and white image consisting of the outlines and numbers. (this is also what the blue channel looks like BTW)
So either throw away the red and green channels, or implement a color dodge algorithm.
Another alternative is to sample each pixel and then set that pixel's R & G components to the B value.
Edit: actually, I forgot about the white numbers. To get those, flood fill the outer white with rgb(0,0,255), invert the entire image, and color dodge with (255,255,0); the red or green channel now contains the missing numbers. Overlay these on top of the processed image from the previous steps above.
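A minimal GD sketch of the per-pixel version (setting R and G to the B value), which amounts to keeping only the blue-channel information; the file names are hypothetical:

$img = imagecreatefrompng('map.png');            // hypothetical file name
if (!imageistruecolor($img)) {
    imagepalettetotruecolor($img);               // avoid running out of palette slots
}
for ($x = 0; $x < imagesx($img); $x++) {
    for ($y = 0; $y < imagesy($img); $y++) {
        $rgb = imagecolorsforindex($img, imagecolorat($img, $x, $y));
        $b = $rgb['blue'];
        imagesetpixel($img, $x, $y, imagecolorallocate($img, $b, $b, $b));
    }
}
imagepng($img, 'map_blue_channel.png');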
Getting rid of the shaded colors should be easy.
Getting rid of the numbers is more tricky. I would:
Make a lookup table of the pixel data associated with each number and the % sign.
When clearing an area, look for the numbers (black or white) and only clear out exact patterns from the lookup table.
Recreate the border between areas by adding a black color between different shades.
It's impossible to do this with guaranteed accuracy simply because the digits hide original information. However, I think the above steps would give you close to 100% accuracy without a lot of effort.
I am working on a real estate website and I would like to write a program that
can figure out (classify) whether an image is a floor plan or a company logo.
Since I am writing in PHP I would prefer a PHP solution, but any C++ or OpenCV solution would be fine as well.
Floor Plan Sample:
http://www.rentingtime.com/uploads/listing/l0050/0000050930/68614.jpg
http://www.rentingtime.com/uploads/listing/l0031/0000031701/44199.jpg
Logo Sample:
http://www.rentingtime.com/uploads/listing/l0091/0000091285/95205.jpg
As always, there is a built-in PHP function for this. Just joking. =)
All the floor plans I've seen are pretty monochromatic, so I think you can play with the number of colors and the color saturation to make a pretty good guess whether the image is a logo or a floor plan.
E.g.: if the image has fewer than 2 or 3 colors, it's a floor plan.
E.g.: if the sum / average of the saturation is less than X, it's a floor plan.
Black and white (and other similar colors that are used in floor plans) have a saturation that is zero, or very close to zero, while logos tend to be more visually attractive, hence use more saturated colors.
Here is a simple function to compute the saturation of a Hex RGB color:
function Saturation($color)
{
    $color = array_map('hexdec', str_split($color, 2));

    if (max($color) > 0)
    {
        return (max($color) - min($color)) / max($color);
    }

    return 0;
}
var_dump(Saturation('000000')); // black 0.0000000000000000
var_dump(Saturation('FFFFFF')); // white 0.0000000000000000
var_dump(Saturation('818185')); // grey 0.0300751879699249
var_dump(Saturation('5B9058')); // green 0.3888888888888889
var_dump(Saturation('DE1C5F')); // pink 0.8738738738738738
var_dump(Saturation('FE7A15')); // orange 0.9173228346456692
var_dump(Saturation('FF0000')); // red 1.0000000000000000
var_dump(Saturation('80FF80')); // --- 0.4980392156862745
var_dump(Saturation('000080')); // --- 1.0000000000000000
Using imagecolorat() and imagecolorsforindex() you can implement a simple function that loops through all the pixels of the image and sums / computes the average of the saturation. If the image has a saturation level above a custom threshold you define, you can assume that the image is a logo.
One thing you shouldn't forget is that images that have a higher resolution will normally have more saturation (more pixels to sum), so for the sake of this algorithm and also for the sake of your server performance it would be wise to resize all the images to a common resolution (say 100x100 or 50x50) to classify them and once classified you could use the original (non-resized) images.
I made a simple test with the images you provided, here is the code I used:
$images = array('./44199.jpg', './68614.jpg', './95205.jpg', './logo.png', './logo.gif');

foreach ($images as $image)
{
    $sat = 0;
    $image = ImageCreateFromString(file_get_contents($image));

    for ($x = 0; $x < ImageSX($image); $x++)
    {
        for ($y = 0; $y < ImageSY($image); $y++)
        {
            $color = ImageColorsForIndex($image, ImageColorAt($image, $x, $y));

            if (is_array($color) === true)
            {
                // sprintf() zero-pads each channel so single-digit hex values
                // don't break the two-character split inside Saturation()
                $sat += Saturation(sprintf('%02x%02x%02x', $color['red'], $color['green'], $color['blue']));
            }
        }
    }

    echo ($sat / (ImageSX($image) * ImageSY($image)));
    echo '<hr />';
}
And here are the results:
green floor plan: 0.0151028053
black floor plan: 0.0000278867
black and white logo: 0.1245559912
stackoverflow logo: 0.0399864136
google logo: 0.1259357324
Using only these examples, I would say the image is a floor plan if the average saturation is less than 0.03 or 0.035; you can tweak it a little further by adding extra examples.
It may be easiest to outsource this to humans.
If you have a budget, consider Amazon's Mechanical Turk. See Wikipedia for a general description.
Alternatively, you could do the outsourcing yourself. Write a PHP script to display one of your images and prompt the user to sort it as either a "logo" or a "floorplan." Once you have this running on a webserver, email your entire office and ask everyone to sort 20 images as a personal favor.
Better yet, make it a contest - the person who sorts the most images will win an iPod!
Perhaps most simply, invite everyone you know over for pizza and beers, set up a bunch of laptops, and get everyone to spend a few minutes sorting.
There are software ways to accomplish your task, but if it is a one-off event with fewer than a few thousand images and a budget of at least a few hundred dollars, then I think your life may be easier using humans.
One of the first things that comes to mind is the fact that floor plans tend to have considerably more lines oriented at 90 degrees than any normal logo would.
A fast first pass would be to run Canny edge detection on the image and vote on angles using a Hough transform with the (rho, theta) definition of a line. If you see a very strong response at theta = 0, 90, 180 or 270 degrees summed over rho, you can classify the image as a floor plan.
Another option would be to walk the edge image after the Canny step to only count votes from long, continuous line segments, removing noise.
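This is not a full Canny + Hough pipeline, but a simplified proxy in plain GD that estimates how axis-aligned the strong gradients are, which floor plans should score highly on; all thresholds here are guesses:

// Sketch: fraction of strong gradients whose direction is near 0/90/180/270.
// Assumes a truecolor image; central differences stand in for the Sobel kernels.
function axisAlignedScore($img)
{
    imagefilter($img, IMG_FILTER_GRAYSCALE);
    $w = imagesx($img);
    $h = imagesy($img);

    $axis = 0;
    $total = 0;
    for ($x = 1; $x < $w - 1; $x++) {
        for ($y = 1; $y < $h - 1; $y++) {
            $gx = (imagecolorat($img, $x + 1, $y) & 0xFF) - (imagecolorat($img, $x - 1, $y) & 0xFF);
            $gy = (imagecolorat($img, $x, $y + 1) & 0xFF) - (imagecolorat($img, $x, $y - 1) & 0xFF);
            if (abs($gx) + abs($gy) < 60) {
                continue;                        // ignore weak gradients
            }
            $total++;
            $theta = rad2deg(atan2($gy, $gx));   // gradient direction
            $mod = fmod(abs($theta), 90);
            if ($mod < 10 || $mod > 80) {        // close to an axis-aligned direction
                $axis++;
            }
        }
    }
    return $total > 0 ? $axis / $total : 0;
}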
I highly doubt any such tool already exists, and creating anything accurate would be non-trivial. If your need is to sort out a set of existing images ( e.g., you have an unsorted directory ), then you might be able to write a "good enough" tool and manually handle the failures. If you need to do this dynamically with new imagery, it's probably the wrong approach.
Were I to attempt this for the former case, I would probably look for something trivially different I can use as a proxy. Are floor plans typically a lot larger than logos (in either file size or image dimensions)? Do floor plans have fewer colors than a logo? If I can get 75% accuracy using something trivial, it's probably the way to go.
Stuff like this - recognition of patterns in images - tends to be horribly expensive in terms of time, horribly unreliable, and in constant need of updating and patching to match new cases.
May I ask why you need to do this? Is there not a point in your website's workflow where it could be determined manually whether an image is a logo or a floor plan? Wouldn't it be easier to write an application that lets users determine which is which at the time of upload? Why is there a mixed set of data in the first place?
Despite thinking this is something that requires manual intervention, one thing you could do is check the size of the image.
A small (both in terms of MB and dimensions) image is likely to be a logo.
A large (both in terms of MB and dimensions) image is likely to be a floorplan.
However, this would only be a probability measurement and by no means foolproof.
The type of image is also an indicator, but less of one. Logos are more likely to be JPG, PNG or GIF, floorplans are possibly going to be TIFF or other lossless format - but that's no guarantee.
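A trivial sketch of that size heuristic; the 400px and 100KB cut-offs are pure guesses and would need tuning against real data:

// Sketch: guess "floorplan" vs "logo" from dimensions and file size alone.
function guessByDimensions($path)
{
    list($width, $height) = getimagesize($path);
    $bytes = filesize($path);

    if ($width >= 400 && $height >= 400 && $bytes > 100 * 1024) {
        return 'floorplan';
    }
    return 'logo';
}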
A simple no-brainer attempt I would first try would be to use an SVM to learn the SIFT keypoints obtained from the samples. But before you can do that, you need to label a small subset of the images, giving each either -1 (a floor plan) or 1 (a logo). If an image has more keypoints classified as a floor plan then it must be a floor plan; if it has more keypoints classified as a logo then it must be a logo. In Computer Vision this is known as the bag-of-features approach, also one of the simplest methods around. More complicated methods will likely yield better results, but this is a good start.
As others have said, such image recognition is usually horribly complex. Forget PHP.
However, looking over your samples I see a criteria that MIGHT work pretty well and would be pretty easy to implement if it did:
Run the image through good OCR, see what strings pop out. If you find a bunch of words that describe rooms or such features...
I'd rotate the image 90 degrees and try again to catch vertical labels.
Edit:
Since you say you tried it and it doesn't work, maybe you need to clean out the clutter first. Slice the image up based on whitespace, then run the OCR against each sub-image in case it's getting messed up trying to parse the lines. You could test this manually using an image editor to slice it up.
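If you want to try the OCR route from PHP, one common approach is shelling out to the tesseract command-line tool; this sketch assumes the binary is installed and on the PATH, and the word list is just an example:

// Sketch: run Tesseract OCR on an image and look for room-related words.
$text = (string) shell_exec('tesseract ' . escapeshellarg('floorplan.jpg') . ' stdout 2>/dev/null');

$roomWords = array('bedroom', 'kitchen', 'bath', 'living', 'dining', 'closet');
$hits = 0;
foreach ($roomWords as $word) {
    if (stripos($text, $word) !== false) {
        $hits++;
    }
}
echo $hits >= 2 ? 'probably a floor plan' : 'probably not a floor plan';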
Use both color saturation and image size (both suggested separately in previous answers). Use a large sample of human-classified figures and see how they plot in the 2-D space (size x saturation), then decide where to put the boundary. The boundary need not be a straight line, but don't make too many twists trying to make all the dots fit, or you'll be "memorizing" the sample at the expense of new data. Better to find a relatively simple boundary that fits most of the samples, and it should fit most of the data.
You have to tolerate a certain error. A foolproof solution to this is impossible. What if I choose a floorplan as my company's logo? (this is not a joke, it just happens to be funny)
How would you develop something similar to what is described in this DabbleDB blog post?
Just answered a kinda related SO question yesterday. Some of the concepts there, along with the resulting test code (on git hub) could be a good start.
As evolve mentions, scanning every pixel in an image (or even just the border) can be resource intensive. However, in this case (since you want to identify far more than just the average color) it may be the way to go. Resizing the logo to a sane size would help reduce the server load, and shouldn't really affect the output.
Update: For these examples assume an image object has been created and $width and $height have been determined using imagesx(), getimagesize(), etc.
Background Color
The first thing we needed to do was figure out the logo’s background color. And that’s all the first version did, by using the corner pixels.
Here use imagecolorat() to find the corner colors. Alternatively, use the average border color method from the referenced answer at the top.
$color = imagecolorat($image, 0, 0); //do this for each corner
$rgb = imagecolorsforindex($image, $color); //convert each color to a RGB array
//average colors
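A small sketch completing that averaging step over all four corners, assuming the $image, $width and $height variables set up earlier; $background is a name I'm introducing for the result:

// Sketch: average the four corner pixels to estimate the background colour.
$corners = array(
    array(0, 0),
    array($width - 1, 0),
    array(0, $height - 1),
    array($width - 1, $height - 1),
);

$r = $g = $b = 0;
foreach ($corners as $c) {
    $rgb = imagecolorsforindex($image, imagecolorat($image, $c[0], $c[1]));
    $r += $rgb['red'];
    $g += $rgb['green'];
    $b += $rgb['blue'];
}
$background = array(
    'red'   => (int) ($r / 4),
    'green' => (int) ($g / 4),
    'blue'  => (int) ($b / 4),
);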
Saturation
It turns out color theory has a way of measuring interestingness: saturation. So we pulled out an interesting color from the logo. Using the same color for the border and the text made things a bit more monotone than we wanted, so finally we decided to try and grab two interesting colors if they’re present.
You can use the RGB to HSL functions at the imagecolorsforindex() manual page along with the pixel scanning code mentioned at the top to find color values with high saturation.
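As a sketch of that scan, here is a pass that keeps the most saturated pixel, using the same (max - min) / max saturation measure from the floor-plan answer above rather than a full RGB-to-HSL conversion; again it assumes $image, $width and $height from earlier, and a real version would probably also skip near-background pixels:

// Sketch: find the most saturated pixel as a stand-in for "the interesting colour".
$bestSat = 0;
$interesting = array('red' => 0, 'green' => 0, 'blue' => 0);

for ($x = 0; $x < $width; $x++) {
    for ($y = 0; $y < $height; $y++) {
        $rgb = imagecolorsforindex($image, imagecolorat($image, $x, $y));
        $max = max($rgb['red'], $rgb['green'], $rgb['blue']);
        $min = min($rgb['red'], $rgb['green'], $rgb['blue']);
        $sat = $max > 0 ? ($max - $min) / $max : 0;

        if ($sat > $bestSat) {
            $bestSat = $sat;
            $interesting = $rgb;
        }
    }
}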
Luminance
We turned again to color theory and realized we could use the border color’s luminance to decide whether black or white text was more suitable.
This SO thread lists different RGB to luminance calculations. I'm not certain what method is best (or technically correct) to convert 0-255 RGB images. But for what this is accomplishing (should the text be dark or light), I don't think it'll matter that much.
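A tiny sketch using the Y' = 0.299R + 0.587G + 0.114B luma weights quoted earlier in this document; the 128 cut-off is a guess, and $background is assumed to be the border colour array from the corner-averaging step above:

// Sketch: pick black or white text based on the border colour's luma.
$luma = 0.299 * $background['red'] + 0.587 * $background['green'] + 0.114 * $background['blue'];
$textColour = ($luma > 128) ? 'black' : 'white';   // dark text on light borders, light on dark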