Compare Images using a database of images using ImageMagick? - php

I know I can compare two images (to check whether they are visually the same, not to check their file format, EXIF, etc.) using compareImages( Imagick $compare , int $metric ) function of ImageMagick library in PHP (also available in several other programming languages).
Sample code to compare two images in PHP:
<?php
$image1 = new imagick("image1.png");
$image2 = new imagick("image2.png");
// TODO: have to resize 2 images to same dimension first
$result = $image1->compareImages($image2, Imagick::METRIC_MEANSQUAREERROR);
$result[0]->setImageFormat("png");
header("Content-Type: image/png");
echo $result[0]; // display the result
// TODO: Add exception handling
?>
But with thousands of images to compare against, this function seems to be inefficient, as it can only compare one by one. Is there any function that I can use to make a fingerprint (something like that) of an image, so that I can easily search in the database?
Methods I can think of:
Convert the image to Base64 string
Fetch few sample pixels from each image, store the colors in the database (but this method is not accurate)
Use Image recognition library (something like Machine Learning) to add some tags for each image, then search by tag (this method is not accurate as well)
(anything else?)
All suggestions are welcomed.
P.S. The programming language does not necessarily have to be PHP.

You could use the getImageSignature function, which returns a string containing the SHA-256 hash of the image's pixel stream (not of the file bytes). Note that it only matches images whose pixel data is exactly identical.
So either loop over all images and add the image signature to your database of images, or just add the signature on every comparison you perform.
Hope that helps.
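Since getImageSignature only matches pixel-identical images, a perceptual fingerprint is one common alternative for "visually the same" lookups. The following is a minimal sketch of an average hash, a technique that is not part of the answer above. The function names are hypothetical, and it assumes each image has already been reduced to an 8x8 grayscale grid (64 brightness values) with whichever image library is available; that step is not shown.

```php
<?php
// Hypothetical helper: average hash of a small grayscale thumbnail.
// $gray is a flat list of brightness values (0-255), assumed to be 8x8 = 64 entries.
function averageHash(array $gray): string
{
    $mean = array_sum($gray) / count($gray);
    $bits = '';
    foreach ($gray as $value) {
        // One bit per pixel: brighter than the mean or not.
        $bits .= ($value > $mean) ? '1' : '0';
    }
    return $bits; // string of 0s and 1s, one character per pixel
}

// Number of positions at which two equal-length hashes differ.
function hammingDistance(string $a, string $b): int
{
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException('Hashes must have the same length');
    }
    $distance = 0;
    for ($i = 0, $n = strlen($a); $i < $n; $i++) {
        if ($a[$i] !== $b[$i]) {
            $distance++;
        }
    }
    return $distance;
}
```

The hash string can be stored in a database column next to each image. A small Hamming distance between two hashes suggests similar images; the threshold to use is a tuning decision and would need testing on real data.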

Related

Exploding Animated GIF and Manipulating Frames (GD Library)

So since animated GIFs are a series of GIFs concatenated together with "\x00\x21\xF9\x04", I am able to explode the GIF and implode it to take it apart and build it again. However I can't seem to get GD to create an image from the data.
Is there something I need to append in order to have GD recognize the data?
$im = file_get_contents('test.gif'); // get the data for the file
$imgarray = explode("\x00\x21\xF9\x04", $im); // split up the frames
foreach ($imgarray as $frame) {
    $img[] = imagecreatefromstring($frame); // this is the line that fails
}
$new_gif = implode("\x00\x21\xF9\x04", $img); // this should work but imagecreatefromstring fails
$new_gif = implode("\x00\x21\xF9\x04", $imgarray); // does work, as it just puts the image back together
A GIF does not contain just separate images appended after each other. A frame in a GIF may change just a part of the image - it does not have to cover the whole frame. It can also contain a local palette, but otherwise it relies on the global palette of the image - which is stored for the file itself and not just the frame.
I.e., you can't just explode the file, decode each segment separately, and expect to get useful images from GD.
You'll at least have to add the gif header to each set of image data, but I strongly recommend using the PHP ImageMagick interface instead if possible - it has far better support for iterating through frames in an image.
Another option is using a pure PHP implementation that does what you want, such as GifFrameExtractor.
The relevant code is located at line 137 of the source file:
$img = imagecreatefromstring(
    $this->fileHeader["gifheader"] .
    $this->frameSources[$i]["graphicsextension"] .
    $this->frameSources[$i]["imagedata"] .
    chr(0x3b)
);
As you can see, far more data is necessary (the header, the extension (87a vs 89a) and a terminating character) to make it valid GIF data.
In ImageMagick, this is pretty trivial. You can coalesce the image to fill out any frames that have been optimized, then do your processing, then optimize it again.
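To illustrate what the "header" part consists of, here is a rough sketch that cuts the leading block (signature, logical screen descriptor and optional global color table) out of raw GIF data. The function name is hypothetical and this is not GifFrameExtractor's code; it only follows the fixed layout at the start of a GIF file.

```php
<?php
// Hypothetical helper: returns the prefix every standalone frame would need,
// i.e. signature + logical screen descriptor + global color table (if any).
function gifHeaderBlock(string $data): string
{
    if (strlen($data) < 13) {
        throw new InvalidArgumentException('Data too short to be a GIF');
    }
    $signature = substr($data, 0, 6);
    if ($signature !== 'GIF87a' && $signature !== 'GIF89a') {
        throw new InvalidArgumentException('Not a GIF file');
    }
    // Byte 10 is the packed-fields byte of the logical screen descriptor.
    $packed = ord($data[10]);
    $length = 13; // 6-byte signature + 7-byte screen descriptor
    if ($packed & 0x80) {
        // Global color table present: 3 bytes per entry, 2^(N+1) entries.
        $length += 3 * (1 << (($packed & 0x07) + 1));
    }
    return substr($data, 0, $length);
}
```

A frame built from this prefix, the frame's own extension and image data, and the trailer byte 0x3B is what the quoted snippet passes to imagecreatefromstring.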
Input:
convert bunny.gif -coalesce -resize 50% -layers optimize bunny_small.gif

How efficient is a PHP image resource?

I have a very large PNG image, and I am writing a method to get the value for a color at a specific (but changing) pixel of that image. When I create the image using:
$image = imagecreatefrompng('map.png');
Is the whole image loaded into memory (not ideal), or does it just read the meta data and prepare for other calls so that when I call:
int imagecolorat ( resource $image , int $x , int $y )
Will it file seek to the right pixel or pull from memory? If I'm trying to optimize this routine to be called repeatedly, would I be better off converting the image data I need into some raw binary format and using file seek? I'd like to avoid repeatedly loading the whole file into memory if possible.
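GD decodes the whole image into memory when it creates the resource, and imagecolorat then reads from memory, so you need a large PHP memory limit to work with big image resources.
Use GraphicsMagick instead. http://www.graphicsmagick.org/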
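On the file-seek idea from the question, a minimal sketch could look like the following. It assumes the PNG has been converted once, offline, into a headerless file of packed RGB triples in row-major order; that layout and the function name are assumptions, not something the question specifies.

```php
<?php
// Hypothetical helper: reads one pixel from a raw RGB file without loading
// the whole file. Assumes 3 bytes per pixel, rows stored top to bottom.
function rawPixelAt(string $path, int $width, int $x, int $y): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }
    // Byte offset of the pixel: (row * width + column) * bytes per pixel.
    fseek($handle, ($y * $width + $x) * 3);
    $bytes = fread($handle, 3);
    fclose($handle);
    if ($bytes === false || strlen($bytes) !== 3) {
        throw new OutOfRangeException('Pixel lies outside the file');
    }
    // unpack() returns 1-based keys; array_values() gives [red, green, blue].
    return array_values(unpack('C3', $bytes));
}
```

For repeated lookups in one request it would be cheaper to keep the file handle open between calls rather than reopening it each time.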

How can I get the names of the resulting converted images from a PDF using PHP ImageMagick?

I'm converting some PDFs to PNGs with
exec("convert readme.pdf readme.png");
And then I need to store the resulting image filenames for any given PDF in a MySQL database.
This is because when I convert in this manner and the source PDF has more than one page, I get a series of files: readme-0.png readme-1.png readme-2.png
So the question is: how can I determine the resulting image filenames after conversion?
Thanks in advance, I hope I made myself clear.
It'll always be originalname-#.png for multi-page conversions.
$input = "readme.pdf";
$basename = basename($input, '.pdf');
$images = glob("{$basename}-*.png");
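One caveat with the glob above is ordering: glob sorts alphabetically, so readme-10.png would come before readme-2.png. A small sketch that returns the pages in page order follows; the function name and the directory parameter are hypothetical, and the single-file fallback assumes a one-page PDF produces a file without the numeric suffix.

```php
<?php
// Hypothetical helper: lists the PNG pages produced for a PDF, in page order.
function convertedPages(string $pdfPath, string $dir = '.'): array
{
    $basename = basename($pdfPath, '.pdf');
    // Assumption: a single-page PDF yields "name.png" with no suffix.
    $single = "{$dir}/{$basename}.png";
    if (is_file($single)) {
        return [$single];
    }
    $pages = glob("{$dir}/{$basename}-*.png") ?: [];
    natsort($pages);             // natural order: -2 before -10
    return array_values($pages); // reindex after natsort
}
```

The returned list can then be inserted into the database one row per page.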

Need help understanding difference in raw image binary data for PHPUnit test

So I wrote a unit test to compare cropped images (using ImageMagick) in PHP. The test works, but I've been running into problems when it comes to comparing a large number of images at a time. Depending on the time the image is created, each image receives a timestamp that is embedded directly into the raw data. I've been using a regular expression to pull out that timestamp right before comparing the files, but every once in a while one of the image files will have additional raw data in it even though the images are exactly the same.
To give an example, here's the result from one of my tests (note, I'm comparing the binary data of the images as a string):
ImageTest::testAutoCrop
Failed asserting that two strings are equal.
--- Expected
+++ Actual
## ##
?n??m?
-?F sO=f??????????^???????w??>
?(???/o????M)???o%tEXt??%tEXt
+?F sO=f??????????^???????w??>
?(???/o????M)???o%tEXt
As you can see, the only difference between these two files is that the expected image has this additional string in it: "?%tEXt".
Can someone help me understand what this random piece of data represents? That will help me figure out how to modify my unit test so that issues like this won't happen anymore.
Thanks,
Malcolm
PS: Please let me know if I need to provide more information.
So I eventually came up with a solution to this issue. A couple of things to clarify:
The reason why I was making unit tests is because our imageservice web application ( PHP ) uses Imagemagick to handle all image processing, manipulation , conversion of HTML to image, and PDF to image ( jpg,png,gif, all non cmyk, pdf ) conversions that happen on our main website. Needed to make sure that as we added new features to this image service application, there were enough tests in place to ensure that everything still functioned correctly.
The string data that we saw in each image (aka: ?%tEXt) is image metadata. Strictly speaking, tEXt is the PNG chunk type for textual metadata rather than EXIF (http://en.wikipedia.org/wiki/Exchangeable_image_file_format), and it is where properties such as date:create and date:modify end up. In order to compare pictures (suggestion taken from David Andersson's reply, https://stackoverflow.com/users/904933/david-andersson), we needed to completely strip all comment data out of the image along with the creation date time stamp / modified-on info. That way you're dealing with simply an image and no other type of metadata. We did that with the following function:
protected static function _removeTimeStamp( $string, $pdf = false ) {
    /* Note: Assume $string parameter is the image you're planning on cleaning, in string format. */
    /* If you're working with a PDF, you need to remove the CreationDate using regex from the string representation. */
    if ( $pdf )
        return preg_replace( '/(CreationDate[^)]+)/', '', $string );
    /* Create a path for the temporary image that will hold the metadata-free image */
    $strip_tmp = 'test/strip_tmp';
    /* write contents of string to the temp file */
    file_put_contents( $strip_tmp, $string );
    /* this will remove all metadata along with the date:create and date:modify properties from the image */
    exec( 'convert ' . escapeshellarg( $strip_tmp ) . ' -strip +set date:create +set date:modify ' . escapeshellarg( $strip_tmp ) . ' 2> /dev/null' );
    /* get the string representation of the new "cleaned" image */
    $result = file_get_contents( $strip_tmp );
    /* delete the temp file */
    unlink( $strip_tmp );
    /* return the cleaned string */
    return $result;
} // _removeTimeStamp
This was run on each image before comparing them to each other ( in String format ). Hopefully this helps someone in the future who might be doing something similar.
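For PNG files specifically, the same kind of cleanup can be sketched in pure PHP without shelling out to convert. PNG data is a signature followed by chunks (length, type, data, CRC), and tEXt is one of the textual metadata chunk types. The function below is a hypothetical alternative, not the code we used; it simply drops the text and time chunks and leaves everything else untouched.

```php
<?php
// Hypothetical helper: removes textual and timestamp chunks from PNG bytes.
function stripPngTextChunks(string $png): string
{
    $signature = "\x89PNG\r\n\x1a\n";
    if (substr($png, 0, 8) !== $signature) {
        throw new InvalidArgumentException('Not a PNG file');
    }
    $drop = ['tEXt', 'zTXt', 'iTXt', 'tIME']; // metadata chunk types to remove
    $out = $signature;
    $offset = 8;
    $total = strlen($png);
    while ($offset + 12 <= $total) {
        // Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        $length = unpack('N', substr($png, $offset, 4))[1];
        $type = substr($png, $offset + 4, 4);
        if (!in_array($type, $drop, true)) {
            $out .= substr($png, $offset, 12 + $length);
        }
        $offset += 12 + $length;
    }
    return $out;
}
```

Two PNGs cleaned this way can still differ byte for byte if the encoder compressed the pixel data differently, so this only addresses the metadata part of the problem.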
I plan on writing a blog post about this in more detail to show how I took care of a number of other tests. When I do I will update this question with the link in either the comments or this answer. Hope this helps someone.
In unit tests you should only test your units, not third party code's units.
You have not specified any details about your image resizer, but I assume you're making use of third-party functions which count as units of their own (one function is a unit, like one class is a unit).
So the question would be: Is the binary data generated by your code, your units? I guess not, otherwise you would have known why the binary data differs.
As those aren't your units, don't write tests for them. Instead, go to the project the original units come from (upstream) and check for their test-suite instead.
If you're concerned with integration tests (testing that different units work with each other), you should define stable tests that can deal with the (different) data returned by sub-components. E.g. you might need an image comparison (are the pixel dimensions and the pixel values, and maybe the file format, correct?) instead of comparing binary data, which can differ because file formats often allow more than one way to encode the same image data (plus metadata).

Pixel Drawing Algorithm

I need an example algorithm that will draw pixels one at a time on a grid-based (x,y) system, and also color them using an RGB value derived from binary data that is provided in some form. I am looking for anything written in PHP or a PHP-like language such as C, but that does not use any sort of library or graphics card API, as I am coding in PHP.
Here is something that I wrote in PHP. It uses random color values, but it takes 15 seconds to render in an HTML canvas:
<?php
$r_max = 240;
$c_max = 320;
$row = -1; // -1 to offset while
while ($row < $r_max) {
    ++$row;
    for ($column = 0; $column <= $c_max; ++$column) {
        echo 'ctx.fillStyle = "rgb(', rand()%255, ',', rand()%255, ',', rand()%255, ')";';
        echo 'ctx.fillRect(', $column, ',', $row, ',1,1);';
    }
}
?>
Not really sure I quite understand your question, but PHP has GD functions that include color allocation and set-pixel calls (imagecolorallocate, imagesetpixel), line drawing, etc.
Oh, and ImageMagick as well, for more exotic uses.
It seems you are trying to output JavaScript commands for drawing on a <canvas> tag. A faster way to draw the pixels might be to use moveTo and lineTo. By the way, why isn't your outer loop a for loop as well?
Doesn't
for ($row = 0; $row <= $r_max; ++$row) {
    for ($column = 0; $column <= $c_max; ++$column) {
        # draw pixel
    }
}
seem more natural?
The issue is that you're generating code for each pixel. Instead, why not have the code write the pixel info to your favorite image format, then display that in the page? That's the most reasonable (to me) algorithmic solution... I'm not sure if it'll fit into what you're trying to do.
I can't use an image format, because it is not efficient for my usage. I am looking for some example code where an image might be displayed based on data, just so I can get an idea of how to do what I am doing at a rate faster than 15 seconds per render. The nested loops I included above are way too slow.
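One way to avoid emitting a JavaScript statement per pixel, without going through an image file, is to send all pixel values as one array and paint them with a single putImageData call. The sketch below is only an illustration: the function names are hypothetical, and it assumes ctx is the canvas 2D context already defined on the page, as in the code above.

```php
<?php
// Hypothetical helper: flat RGBA list (4 values per pixel) with random colors.
function randomRgba(int $width, int $height): array
{
    $data = [];
    for ($i = 0, $n = $width * $height; $i < $n; $i++) {
        $data[] = mt_rand(0, 255); // red
        $data[] = mt_rand(0, 255); // green
        $data[] = mt_rand(0, 255); // blue
        $data[] = 255;             // alpha: fully opaque
    }
    return $data;
}

// Hypothetical helper: JavaScript that paints the whole grid in one call.
// Assumes a variable named ctx (canvas 2D context) exists on the page.
function canvasScript(array $rgba, int $width, int $height): string
{
    return sprintf(
        'var img = ctx.createImageData(%d, %d);'
        . 'img.data.set(%s);'
        . 'ctx.putImageData(img, 0, 0);',
        $width,
        $height,
        json_encode($rgba)
    );
}
```

Usage would be along the lines of echo canvasScript(randomRgba(320, 240), 320, 240); inside the page's script block. Real data would replace randomRgba with values taken from the binary input.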
