Algorithm for sorting images by relevance - php

I'm developing a feature on a forum site that will allow to include a link and other type of content on a post (for clarifying the question or answer).
Related to the link feature implementation, I have several things to work on:
Validate the URI entered (well formed, valid scheme, etc.)
Validate that the remote resource exists
Extract images from within the remote page
Show to the user the set of images and let him choose one
Here comes the challenge. Previous to step 4, it would be great to sort this set of images in order of 'relevance'. I know that it's a goal quite ambiguous :-) but I can explain what I've gone through with the results given in step 4 and you will know why I'm dealing with this solution.
Many times, I get this kind of things into the set of images:
Images used for the layout of the page (tiny and useless)
Banners and ads
Pseudo-duplication of images (original and resized one)
Anarchical order of the set (logo on last position, etc.)
I decide to clean up this mess removing tiny images and sorting them by size, but I know that will be far away from a good solution.
Any ideas on that???
Thank you very much!

You could sort by saturation (which is a good indicator of how interesting an image might be), take a look at the question "Image Classification - Detecting Floor Plans" for a sample implementation.
The hardest thing is separating image ads from regular images (since they are designed to look very interesting), to do this I suggest one or more of the following possible solutions:
ignore images that have standard dimensions of ads
query the page twice and ignore the images that change (ads tend to be dynamic)
ignore images hosted on external sites (watchout for CDNs!) or specific ad-serving URLs
To overcome the problem of duplicated images in resolution you could resize them all to a very low resolution (like 8x8 or 4x4) and if two or more images are alike ignore the small(er) one(s).

You might also want to sort images by where they're hosted - on-site hosted images first, off-site images second. Most ad images these days are served from 3rd-party servers, so oftentimes local images are the more relevant ones.

Related

AVATAR Saving: "Compressing" multiple layered CSS div images into one file for download

I am building an avatar-generator for a PHP/MySQL site I am working on. It uses CSS to layer multiple .png files to create the background, body, facial expressions, etc. for a user's avatar. This I have covered.
I want to add a feature to my site that will allow the user to download their layered avatar "image" as one .jpg file. Is this even possible? I think I have seen this functionality before but can't recall the site where I saw this now.
Of course, I could come up with a series of pre-generated files that would cover all of the computations possible with my images, but with somewhere around 200 objects to choose from and a maximum of 10 layers of choices, the number of permutations possible is somewhere around 8.14702044e+22! Obviously, this is possible for me to do but I would be old and gray before completing the task!
Poking around the Internet has led me to believe there might be some way to "screen cap" - with what software and if it can capture a small section of the screen I don't know. Besides, would this bog down my site (which is currently running at top speed)?
I've searched through Stack Overflow for similar questions but didn't find anything that addresses my problem specifically. That said, I am not certain what to even search for (the precise terminology) as this concept of layering and saving as one image is foreign to me.
I found a solution to my own problem. It's not quite what I had in mind when planning this part of the project, but the end result will be close enough (with some other modifications).
I will implement html2canvas (http://html2canvas.hertzen.com) to take a screenshot of the user's avatar when they have pressed "Save". I will store the resultant image to my server and this will be their avatar. Their selected variable data will be stored in a database so that they can load up their avatar at a later date and make changes to it.

Gallery Images Ideas

Since multiple requests can slow down the speed at which a site loads, I was thinking that in the case of a gallery, would one large image containing all the thumbnails be better than loading individual thumbnails?
The large image would then use PHP to "chop up" the thumbnails and place them in the relevant locations on the page.
My main concern is would this have a negative impact on SEO? Since Google would only see one large image file, instead of many smaller ones. Would a way around this be to set the src of all thumbnails to redirect to the script that handles the thumbnail generation, where the image file name refers to a particular set of coordinates for that image?
As a rule of the thumb; for buttons/icons/stufflikethat use image sprites (one large image combining all images, uses css to only show a part between specific coordinates), for 'real' content images, just use separate images.
This has several reasons; icons, buttons and so on are images that appear on sometimes every page of your site and often multiple times on the same page. So it is really useful to combine them, as ie. it is really inefficient to start a new http connection to download an icon of 1kb (or less), imagine what will happen if you use hundreds. Furthermore this type of images are not important at all for your seo rank, only for the look of your site (but google doesn't care if your site is ugly as hell or beautiful as a princess)
But on the other hand, 'content' images, such as thumbnails, photo's of your holiday or baseball tournament are often big enough to rule out the efficiency part. As you can see in the chrome developer tools or firebug the browser will start the download of all images simultaneously. So downloading one image is pretty much as fast as downloading a hundred. But if you combine a hundred images, the download will be slower, as you have to download a larger bit of data in one piece. In comparison; pushing 2 gallons of water trough one hose will take longer than pushing the same 2 gallons trough 10 hoses. (offcourse this metaphore has it's holes, but it illustrates my point).
But more importantly; google reads out the img tags and uses the filename (src), the title and (less importantly) the alt attributes to determine how your image should relate to your seo rank. Images do have a relevant effect on your seo rank! But google also knows if it is the same image showing, or a different one, so a sprite wouldn't help you here. A script, with parameters saying which part of the image has to be loaded wouldn't help you at all, I believe if you think it over you can figure out why ;)
So don't bother about merging the thumbnails and stuff like that. If you want to improve speed, move your attention to caching and speeding up transmission. Some really simple improvements can be implemented by using for example gzip compression (google .htaccess gzip for that), proper caching headers etc.
You got it right, it is always better to download one large image and get all the image from there, i guess you meant javascript with the chop out thing, because you have to do that on the client side. This in terms of performance is a very good idea and lot of sites do it. Another idea is to use smaller images and resize them in the client side.Just be careful with the resizing not affecting the resolution of the image.
Im not so sure about this being negative to SEO, and as far as i know google doesnt execute any javascript function, so that work around, i dont think it would work. But in all honesty im not so sure about this, in my last job we never considered images as a big impact in SEO.

given two images, determine whether one is edited from the other (and which is the original)

suppose there is an image on web without watermark. And someone downloads it and makes some edits on it like adding watermark etc etc. Is it possible to write a script in php to compare these two images. Like when I submit these two images to the script, it should be able to output the original image and manipulated image.
I read google's webmaster page which says
Google often finds multiple copies of the same image online. We use many different signals to identify the original source of the image
Blockquote
This is the main concern of my question
One more doubt is will there be any meta tags inside an image. if at all how to read them. Is it possible to edit them. Are there any information(not visual) inside an image which cannot be edited.
Anything within the image can be edited (it is, after all, just a collection of bytes), and it's definitely trivial for someone to add a watermark to an image, or simply change the contrast ever-so-slightly, to make it a very different file from the original. There are several other non-destructive changes that would make image files look completely different to a naive comparison algorithm (e.g., scaling, changing filetypes and compression, changing brightness, rotation, etc.).
Advanced image processing algorithms, however, can still often identify similarities between images that have been manipulated in ways like those above. There are many algorithms to do this, and honestly you could spend thousands of hours trying to roll an algorithm like this yourself. These sorts of algorithms are referred to as "content-based image retrieval."
You might be better off calling into engine that's already been developed to do exactly this. Here are some possibilities:
TinEye has a RESTful API that you can use, described here.
You could scrape the response from Google's Search by Image results using this technique.
You could use any of the number of suggestions within this slightly older StackOverflow post.
Good luck!
Photos taken by digital cameras usually have exif data embedded.
You can get the data with the exif_read_data function in PHP.
As for identifying similar images, here's some useful resources:
TinEye
SO Q on image similarity
The comments on Resig's article
You could submit both images to ImageEdited and see which one has been edited. Even if the exif data's missing, it tells when an image has been created with a program.

How to improve this wallpaper gallery?

Gallery - http://schnell.dreamhosters.com/wallpapers.php
The purpose of this gallery is simple - store a lot of wallpapers and sort them by resolution and/or aspect ratio for people to browse and download as they like. There's a few features I've wanted to work in, but I'm not quite sure how best to do them or how to do them at all. The presentation is in HTML 4, CSS, Javascript and jQuery + plugins. The work behind the scenes is done in PHP.
1 - Make the images downloadable without 'Save Image As...'. Right now I'm using a contrivance whereby clicking the Download link in the bottom-right of each image's box opens a new box with instructions telling the user to 'Right Click. Save Image As...'. I'd like to avoid this entirely if possible.
2 - Make the searching and sorting faster and more efficient. Right now all the images are stored in a folder on my webspace and I use a shell command and a lot of fancy filtering in PHP to get the images I want based on the filters (the page number I'm on and the aspect ratio or resolution I chose). I thought of maybe doing something with MySQL, but I haven't quite figured out yet how I'd do that and maintain the structure my page has.
3 - Make the images load faster. There's probably no easy coding solution to this, so this one is more of a 'I wish' than a 'I want to'.
4 - Improve the layout. This one is more subjective and 'artsy' I suppose, but any suggestions would be nice.
5 - An upload system. Give the ability to upload your own wallpapers and maybe include a short description or some tags. I have absolutely no idea how to handle this as I've never worked with uploading of files before. And this also leads to...
6 - A tagging system or some other type of user-made sorting system. Again, no experience here.
Any insight on any of these issues would be great, and feel free to throw in any suggestions of your own.
Send the files with the MIME type "application/octet-stream" to make a browser download rather than display them
It would definitely be better to store information about the images in a database rather than exploring the filesystem
The images really aren't loading slowly for me, so I can't really suggest anything here. If your site gets larger (much larger) you might want to look into CDNs
The layout is OK but it needs some design, it's incredibly plain at the moment. It would also be nice to see more information on the images - what they are of, where they're from, who made them, etc (don't forget: correct copyright attribution)
You probably want to read the PHP handbook section on handling file uploads. To handle description and tags, you'll definitely want a database of some sort.
Also not hard if you have a correctly formed database. If you've never designed a schema before you probably want to learn a little about normalisation and many-to-many relationships to do the tags.
Lastly you didn't ask for it, but it'd be nice if it were possible to have the same image in multiple resolutions (quite common on image sites - think Flickr, Deviantart, etc).

Programmatically combining images in PHP

I'm a big fan of Yahoo's recommendations for speeding up websites. One of the recommendations is to combine images where possible to cut down on size and the number of requests. However, I've noticed that while it can be easy to use CSS sprites for layouts, other image uses aren't as easily combined. The primary example I'm thinking of is a blog or article list, where each blog or article also has an image associated with it. Those images can greatly affect load time and page size, especially if they aren't optimized. What I'm looking for, in concept or in practice, is a way to dynamically combine those images while running them through a loss-less compression using PHP.
A few added thoughts or concerns:
Combining the images and generating
a dynamic CSS stylesheet to position
the backgrounds of the images might
be one way to go about it, but I
also worry about accessibility and
semantics. As far as I understand,
CSS images should be used for layout
elements and the img tag (with the
alt attribute) should be used for
images that are meant to convey
information. I could set the image
as a background to a div element and
substitute a title attribute for the
alt attribute, but I'm unsure about
the accessibility and semantic
implications of doing so.
Might the GD library be a good
candidate for something like this?
Can you recommend other options?
I wouldn't go down this route if I were you. Sure, you may save a few bytes in protocol overhead by reducing the number of requests, but this would more-tha-likely end up being self-defeating.
Imagine this scenario:
A blog site, whose front page has 10 articles at a time. Each article has it's own image associated with it. To save a byte or two of transfer time, you programatically create a composite image of all 10 article images. You now have one of two problems.
You must update the composite image each time a new post is made, as the most recent 10 images will have a modified set of content.
You decide to create a new composite each request, on the fly.
Obviously, #1 is preferable here, and would not be difficult to implement. However, what if a user searches for all posts tagged with the word "SQL"? You are unlikely to have a composite image of the first 10 results already created for this simple query, let alone a more complex one. Also, what happens if you want to update or delete an image? Once again you'd have to trigger the background creation of the composite.
How about an RSS aggregator, like Google Reader? It wouldn't have the required logic to figure out which portion of a composite image it would need to display, and would probably display the full image. (I mention Google Reader because I very rarely visit blog sites directly, tending to trust to an RSS aggregation service like Reader)
If it were me, I'd leave the single images alone. With modern connection speeds, the tradeoff between additional bandwidth overhead and on-server processing time is unlikely to win you and great gains.
Having said that, if you decide to go down this route anyway, I'd say the GD library is an excellent place to start.
You'd almost certainly be better off reducing the filesize of the images in articles, than combine them. I'd agree that there might be accessibility issues with the method you suggest. Also, I suppose it depends on what you mean by "dynamic" - if you're thinking of combining those images and generating CSS for each page load, you might well find that that results in slower page load times for users with average connection speeds.
As to your second point, GD could certainly handle that. A better use of GD for reducing page load times might be reducing the image quality of your article images to reduce filesizes, at article creation time, not at page load.

Categories