Hello, I'm trying to develop a new website for a special purpose. I have a list of images uploaded to the server. I need to upload an image from a PC, search the images on the server, and return the images most similar to the uploaded one, based on image color rather than faces, all using PHP.
This link describes my problem but has no code. Thanks.
This is a very complex task you are trying to do (and especially hard, because you want to do it in PHP).
What I can think of (in general) to achieve this contains the following sub-tasks:
Recognize colours
Recognize shapes
Recognize the relationships between the two above
In PHP the last two are nearly impossible (and make little sense, as PHP is not an image processing library; it has only basic image functions). But you can do the first one using this library:
https://github.com/thephpleague/color-extractor
You can make the comparison as fine as you want. Get the most used colours (e.g. the top 1000) and compare them as arrays. Obviously you won't get an exact match, but if you compare the first 1000 and 500 of them match, then that picture is somewhat similar to the other. However, you could get completely false results, so this is more of a programmatic workaround than a logically sound solution.
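The comparison of most-used colours can be sketched as follows. This is a minimal illustration in Python rather than PHP, and it assumes the pixels have already been decoded into a list of (r, g, b) tuples. The function names and the quantisation step are assumptions added here (so that near-identical shades count as one colour); they are not part of the color-extractor library.

```python
from collections import Counter

def top_colours(pixels, n=1000, step=32):
    """Return the set of the n most common colours after coarse quantisation.

    `step` is an assumed bucket width: each channel is divided by it so that
    slightly different shades fall into the same bucket.
    """
    quantised = [(r // step, g // step, b // step) for r, g, b in pixels]
    return {colour for colour, _ in Counter(quantised).most_common(n)}

def colour_overlap(pixels_a, pixels_b, n=1000):
    """Fraction (0.0 to 1.0) of the top-n colours shared by both images."""
    a, b = top_colours(pixels_a, n), top_colours(pixels_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))
```

A score near 1.0 only says the two palettes are alike; as noted above, two images with entirely different content can still share a palette.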
Related
I have been working on an e-commerce project and now I am trying to implement search based on an image. I have searched the web for possible solutions. I came to know that Google and Yahoo have stopped supporting their image search APIs. I would like to know what needs to be extracted from an image, and what I need to search on in my DB. Any suggestions will be helpful. Thanks.
If you want a brute force method, you can calculate the hash of each image, store it in a database and calculate the hash of the file to search for, match it with the database and well... now you found the exact match of the image.
That may be useful in some situations, but in most you would want to find "similar" images. You could extract metadata from the image, like date taken, filename, etc. If you want to search your own photo album, it is likely that images taken around the same time were taken near the same location and thus contain the same content.
Google uses an (as far as I know) unknown method to take a part of the image and search using that information. For example: if you split the image into an X by Y grid and calculate the mean colour value of each cell, you can search the database for a match (obviously, you'll have to do this for every image and store the result in the database). If you allow a certain difference in value between the search image and the database values, you are likely to find another image that is similar. Searching the database for only a part of the image allows you to find pictures that look the same but are shifted.
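The grid idea can be sketched like this, in Python for brevity. It assumes the image is already decoded into a list of rows of (r, g, b) tuples and is at least as large as the grid; the function names and the tolerance value are illustrative assumptions.

```python
def grid_signature(image, cols=4, rows=4):
    """Mean colour of each grid cell, in row-major order.

    `image` is a list of rows, each a list of (r, g, b) tuples, and must be
    at least `rows` pixels high and `cols` pixels wide so no cell is empty.
    """
    height, width = len(image), len(image[0])
    signature = []
    for gy in range(rows):
        y0, y1 = gy * height // rows, (gy + 1) * height // rows
        for gx in range(cols):
            x0, x1 = gx * width // cols, (gx + 1) * width // cols
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            signature.append(
                tuple(sum(p[c] for p in cell) / len(cell) for c in range(3))
            )
    return signature

def signatures_match(sig_a, sig_b, tolerance=20):
    """True if every channel of every cell differs by at most `tolerance`."""
    if len(sig_a) != len(sig_b):
        return False
    return all(
        abs(ca - cb) <= tolerance
        for cell_a, cell_b in zip(sig_a, sig_b)
        for ca, cb in zip(cell_a, cell_b)
    )
```

The signature is what would be stored per image in the database; the tolerance is the "certain difference in value" that decides how loose the match is.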
Microsoft has created PhotoDNA, a method that finds the "edges" of the objects in the picture, turning it into a black-and-white image. Then they resize it to a small resolution and calculate a hash. Using this method, you can find photos that are nearly the same but differ slightly. Ideal if you want to find edited and resized images.
Another method is to calculate the colour spectrum of the image, normalise it and search for that (with small variations) in a database. Then you'll get images that have nearly the same colours, yet the content can be entirely different!
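A normalised colour spectrum can be approximated with a coarse colour histogram. The sketch below is in Python and assumes a non-empty list of (r, g, b) tuples; the bin count and the use of histogram intersection as the similarity measure are choices made for this example, not something the answer prescribes.

```python
def colour_histogram(pixels, bins=4):
    """Normalised histogram with bins**3 buckets; the values sum to 1.0.

    Assumes `pixels` is a non-empty list of (r, g, b) tuples, 0-255 each.
    """
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel to a bucket index in range(bins).
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(ri * bins + gi) * bins + bi] += 1
    total = len(pixels)
    return [count / total for count in hist]

def histogram_intersection(hist_a, hist_b):
    """Similarity from 0.0 (no shared colours) to 1.0 (same distribution)."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))
```

Because the histogram is normalised by pixel count, a resized copy of an image gives roughly the same values, which is why this survives scaling but says nothing about where the colours sit in the picture.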
Deep learning could also be an option, if you have a lot of images of the same object. By training the computer (e.g. with NVIDIA CUDA), you can make the model recognise objects. If you then search with a photo with a dog on it, your result might be other images with a dog on them.
In summary: there are a lot of different methods, each with its own strengths and weaknesses, but they have one thing in common: they're not easy to make!
I want to extract table data from images or scanned documents and map the header fields to their values, mostly in insurance documents. I have tried extracting the text line by line and then mapping it using its position on the page. I defined the table boundary with a table start and end pivot, but it doesn't give me proper results, since headers sometimes span multiple lines (I implemented this in PHP). I also want to know whether I can use machine learning to achieve the same.
For PDF documents I have used tabula-java, which worked pretty well for me. Is there a similar implementation for images as well?
Insurance_Image
The documents would be of similar type as in the link above but of different service providers so a generic method of extracting such data would be very useful.
In the image above I want to map values like Make = YAMAHA, Model = FZ-S, CC = 153, etc.
Thanks.
I would definitely give Tesseract a go; it is a very good OCR engine. I have been using it successfully to read all sorts of documents embedded in emails (PDFs, images), and a colleague of mine used it for something very similar to your use case: reading specific fields from invoices.
After you parse the document, simply use regex to pick the fields of interest.
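As a rough illustration of the regex step, here is a Python sketch. The sample text is hypothetical, made up from the Make/Model/CC values mentioned in the question; real OCR output will differ in spacing and noise, and the pattern would need tuning for each layout.

```python
import re

# Hypothetical OCR output, invented for this example.
OCR_TEXT = """
Make : YAMAHA    Model : FZ-S
CC : 153
"""

def extract_fields(text, labels):
    """Return {label: value} for each label followed by ':' or '='.

    A value is one or more tokens separated by single spaces, so a run of
    two or more spaces (a column gap) or a line break ends the value.
    """
    fields = {}
    for label in labels:
        pattern = re.compile(
            rf"\b{re.escape(label)}\s*[:=]\s*(\S+(?: \S+)*)", re.IGNORECASE
        )
        match = pattern.search(text)
        if match:
            fields[label] = match.group(1)
    return fields

print(extract_fields(OCR_TEXT, ["Make", "Model", "CC"]))
```

Labels that never appear in the text are simply left out of the result, which makes missing fields easy to detect.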
I don't think machine learning would be particularly useful for you, unless you plan to build your own OCR engine. I'd start with existing libraries; they offer very good performance.
The easiest and most reliable way to do it without much knowledge in OCR would be this:
- Take an empty template for reference and mark the boxes coordinates that you need to extract the data from. Label them and save them for future use. This will be done only once for each template.
- Now when reading the same template, resize it to match the reference template's dimensions (if it's not already matching).
- You have already every box's coordinates and know what data it should contain (because you labeled them and saved them on the first step).
Which means that now you can just analyze the pixels contained in each box to know what is written there.
This means that given a list of labeled boxes (that you extracted in the first step), you should be able to get the data in each one of these boxes. If this data is typed and not handwritten, it will be easier to analyze with simple OCR libraries.
Or, if the data is always the same size and font, as in your example template above, you could just build your own small database of letters in that font and size, or maybe of full words, depending on each box's possible answers.
Anyway this is not the best approach by far but it would definitely get the work done with minimal effort and knowledge in OCR.
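The labeled-box idea might look like this in Python. The sketch scales the saved box coordinates to the scan instead of resizing the scan to the reference template, which gives the same crop regions; the box format (left, top, right, bottom) and all names are assumptions made for the example, and the image is a plain list of pixel rows.

```python
def scale_boxes(boxes, ref_size, scan_size):
    """Map labeled boxes from reference-template pixels to scan pixels.

    boxes: {label: (left, top, right, bottom)} measured on the template.
    ref_size, scan_size: (width, height) of the template and of the scan.
    """
    sx = scan_size[0] / ref_size[0]
    sy = scan_size[1] / ref_size[1]
    return {
        label: (round(l * sx), round(t * sy), round(r * sx), round(b * sy))
        for label, (l, t, r, b) in boxes.items()
    }

def crop(image, box):
    """Return the pixels inside box from an image stored as a list of rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]
```

Each cropped region would then be passed to whatever OCR step is used, with the box label telling you which field the result belongs to.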
I have over 1.3 million images that I have to compare with each other, and a few hundred more are added per day.
My company takes an image and creates a version that can be used by our vendors.
The files are often very similar to each other. For example, two different companies can send us two different images, a JPG and a GIF, both with the McDonald's logo, months apart.
As a result, we end up creating the same logo twice, when we could simply copy the one already created, or at least suggest it to the artists as a possible starting point.
I have looked around for algorithms to create a fingerprint, or something that will let me run a simple query when a new image is uploaded. Time is not much of an issue: at 1 second per fingerprint it will take about 15 days to fingerprint the existing 1.3 million images, but the savings would be large enough that we might even get 3 or 4 servers to do it.
I am fluent in PHP, but if the algorithm is in pseudocode or even C I can read it and try to translate (unless it uses some C specific libraries)
Currently I am doing an MD5 of all the images to catch the ones that are exactly the same, this question came up when I was thinking to do a resize of the image and run the md5 on the resized image to catch the ones that have been saved in a different format and resized, but then I would still not have a good enough recognition.
If I didn't mention it, I will be happy with something that just suggest possible "similar" images.
EDIT
Keep in mind that the check needs to be done multiple times per minute, so the best solution is one that gives me some values per image that I can store and use in the future to compare with the image that I am looking at without having to re-scan the whole server.
I am reading some pages that mention histograms, or resizing the image to a very small size, stripping any tags, converting it to grayscale, hashing the resulting file and using that hash for comparison. If I am successful I will post the code/answer here.
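One common form of the "shrink, grayscale, hash" idea is the average hash, sketched here in plain Python. Note the swap: instead of running MD5 on the thumbnail file, it builds a 64-bit fingerprint from the thumbnail's brightness pattern, so fingerprints can be compared by how many bits differ. The image is assumed to be a list of rows of (r, g, b) tuples, and the nearest-neighbour downscale is a crude stand-in for a proper resize.

```python
def to_gray_thumbnail(image, size=8):
    """Nearest-neighbour downscale to size x size, converted to grayscale."""
    height, width = len(image), len(image[0])
    thumb = []
    for ty in range(size):
        y = ty * height // size
        for tx in range(size):
            r, g, b = image[y][tx * width // size]
            # Integer approximation of the usual luma weights.
            thumb.append((r * 299 + g * 587 + b * 114) // 1000)
    return thumb

def average_hash(image, size=8):
    """One bit per thumbnail pixel: 1 if brighter than the mean, else 0."""
    gray = to_gray_thumbnail(image, size)
    mean = sum(gray) / len(gray)
    bits = 0
    for value in gray:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming_distance(hash_a, hash_b):
    """Number of differing bits; a small distance suggests similar images."""
    return bin(hash_a ^ hash_b).count("1")
```

The integer from `average_hash` is the per-image value to store; a new upload is hashed once and compared against stored values, with a small Hamming distance (the threshold is something to tune) flagging a possible match.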
Try using file_get_contents and:
http://www.php.net/manual/en/function.hash-file.php
If the hashes match, then you know they are the exact same.
EDIT:
If possible, I would think storing the image hashes and the image paths in a database table might help you limit server load. It is much easier to run the hash algorithm once on your initial images and store the hashes in a table. Then, when new images are submitted, you can hash each image and do a lookup on the database table. If the hash is already there, discard the image. You can use the hash as the table index, so once you find a match you don't need to check the rest.
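A minimal sketch of the hash table lookup, using Python's built-in sqlite3 and hashlib modules. The table layout and function names are assumptions for illustration, and SHA-256 is used here where the question uses MD5; any stable digest works the same way for exact-match detection.

```python
import hashlib
import sqlite3

def open_index(path=":memory:"):
    """Open (or create) the hash index; the hash column is the primary key."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS images "
        "(hash TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )
    return db

def register_image(db, data, path):
    """Store a new image's hash, or return the path of the identical one.

    Returns None if the bytes were not seen before, otherwise the path
    that was stored for the earlier, byte-identical upload.
    """
    digest = hashlib.sha256(data).hexdigest()
    row = db.execute(
        "SELECT path FROM images WHERE hash = ?", (digest,)
    ).fetchone()
    if row:
        return row[0]
    db.execute("INSERT INTO images (hash, path) VALUES (?, ?)", (digest, path))
    db.commit()
    return None
```

Because the hash is the primary key, the lookup uses the index rather than scanning every row, which is the point of storing the hashes up front.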
The other option is to not use a database. But then you would always have to do an O(n) lookup: hash the incoming image, then run an in-memory linear search against all saved images.
EDIT #2:
Please view the solution here: Image comparison - fast algorithm
To speed up the process, sort all the files by size and compare their contents only if two sizes are equal. Comparing hashes is also the fastest way to compare the contents. Hope this helps.
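The size-first shortcut can be sketched as below, in Python. To stay self-contained it works on a dictionary of name to bytes rather than files on disk (on disk you would read the size from the filesystem and only open files whose sizes collide); the function name is an assumption, and SHA-256 stands in for whichever hash you prefer.

```python
import hashlib
from collections import defaultdict

def find_exact_duplicates(files):
    """Group byte-identical files, hashing only files whose sizes collide.

    files: {name: bytes}. Returns a list of sorted name lists, one per
    group of identical files.
    """
    by_size = defaultdict(list)
    for name, data in files.items():
        by_size[len(data)].append(name)

    duplicates = []
    for names in by_size.values():
        if len(names) < 2:
            continue  # a unique size cannot have an exact duplicate
        by_hash = defaultdict(list)
        for name in names:
            by_hash[hashlib.sha256(files[name]).hexdigest()].append(name)
        duplicates.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return duplicates
```

This only finds exact copies; a resized or re-encoded image has a different size and a different hash, so it passes through undetected.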
Suppose there is an image on the web without a watermark, and someone downloads it and edits it, for example by adding a watermark. Is it possible to write a PHP script to compare these two images? That is, when I submit both images to the script, it should be able to tell which is the original and which is the manipulated image.
I read google's webmaster page which says
Google often finds multiple copies of the same image online. We use many different signals to identify the original source of the image
This is the main concern of my question
One more doubt: are there any meta tags inside an image? If so, how can I read them, and is it possible to edit them? Is there any (non-visual) information inside an image that cannot be edited?
Anything within the image can be edited (it is, after all, just a collection of bytes), and it's definitely trivial for someone to add a watermark to an image, or simply change the contrast ever-so-slightly, to make it a very different file from the original. There are several other non-destructive changes that would make image files look completely different to a naive comparison algorithm (e.g., scaling, changing filetypes and compression, changing brightness, rotation, etc.).
Advanced image processing algorithms, however, can still often identify similarities between images that have been manipulated in ways like those above. There are many algorithms to do this, and honestly you could spend thousands of hours trying to roll an algorithm like this yourself. These sorts of algorithms are referred to as "content-based image retrieval."
You might be better off calling into an engine that's already been developed to do exactly this. Here are some possibilities:
TinEye has a RESTful API that you can use, described here.
You could scrape the response from Google's Search by Image results using this technique.
You could use any of the number of suggestions within this slightly older StackOverflow post.
Good luck!
Photos taken by digital cameras usually have exif data embedded.
You can get the data with the exif_read_data function in PHP.
As for identifying similar images, here are some useful resources:
TinEye
SO Q on image similarity
The comments on Resig's article
You could submit both images to ImageEdited and see which one has been edited. Even if the exif data's missing, it tells when an image has been created with a program.
I run a site with lots of small images (www.iconfinder.com) and would like to develop a feature that can compare and recognize images. A user should be able to upload an image (icon) and then the site will respond with information about the image if it's in the database.
What is the approach to finding similar (or identical) images? I know I can compare the MD5 of the two images, but I also want to be able to find matches if they are scaled.
This is a good start if you are interested in looking at doing it in PHP:
http://www.intelliot.com/blog/2008/03/sorted-directory-listing-image-resizing-comparison-and-similarity-in-php/
There probably aren't a lot of languages LESS suited to this task than PHP. You should really look for an image comparison library with a C compatible API and figure out how to glue that into your PHP application.
Identical images can be checked with an md5sum, but detecting if somebody uploads a scaled image, which displays the same thing as the other is very hard. This requires digital image processing.
An approach is to scale down all images to a certain width (say 100px). Then check a few coordinates for the color. If another image matches a big part (say 80%), it might be the same image.
But if the image is lighter... this won't work.
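The coordinate-sampling approach can be sketched as follows, in Python. Rather than actually resizing both images to 100px, this version samples the same relative positions in each image, which serves the same purpose for comparison; the grid size, tolerance and function names are assumptions, and images are lists of rows of (r, g, b) tuples.

```python
def sample_points(image, grid=10):
    """Pick the colour at the centre of each cell of a grid x grid layout."""
    height, width = len(image), len(image[0])
    return [
        image[(2 * gy + 1) * height // (2 * grid)][(2 * gx + 1) * width // (2 * grid)]
        for gy in range(grid)
        for gx in range(grid)
    ]

def match_ratio(image_a, image_b, grid=10, tolerance=16):
    """Fraction of sample points whose colours agree within `tolerance`."""
    points_a = sample_points(image_a, grid)
    points_b = sample_points(image_b, grid)
    hits = sum(
        1
        for pa, pb in zip(points_a, points_b)
        if all(abs(ca - cb) <= tolerance for ca, cb in zip(pa, pb))
    )
    return hits / len(points_a)
```

A ratio of 0.8 or more would correspond to the "big part (say 80%)" above. A uniformly lighter copy shifts every sample past the tolerance and scores near zero, which is exactly the weakness just mentioned.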