Preventing duplicate entries in database - php

I am using a form to upload files (images) to my server. How can I prevent the same image from being uploaded twice? I cannot simply check whether an image with the same title exists, because identical images can have different titles and different images can have the same title.
Any help is appreciated.

Create a hash, as ZombieHunter suggested. Why? Because it is easy and fast to search a big table of hashes to check whether the image already exists. Unfortunately, hash functions like md5 or md5_file work on the file's contents, not on a remote file, so you will have to upload the file anyway. What you can do is then decide whether to keep the file or not. If you are fetching the files from an online resource, there may be ways to get the file size (or a hash) from the response headers without downloading the file, but this is a special case.
Also, if you have other business logic attached to those images, with concepts like userHasImages or companyHasImages, you can organize them in namespaces/folders/tags to narrow the search even further.
To have the database itself strictly prevent duplicate entries, put a unique index on the column that contains the hash.
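A minimal sketch of the hash-plus-unique-index idea, assuming PDO with exceptions enabled (the default since PHP 8) and a hypothetical `images` table; the function name `registerImage` is illustrative only. It hashes the file's bytes with `hash_file` and lets the database reject a second insert of the same hash.

```php
<?php
// Minimal sketch: a UNIQUE index on the hash column rejects duplicates.
// Table and column names (images, hash, filename) are hypothetical. In MySQL:
//   CREATE TABLE images (
//     id INT AUTO_INCREMENT PRIMARY KEY,
//     hash CHAR(64) NOT NULL,
//     filename VARCHAR(255) NOT NULL,
//     UNIQUE KEY uniq_hash (hash)
//   );

/**
 * Hash the uploaded bytes (not the title) and try to record them.
 * Returns true if the image is new, false if an identical file is already stored.
 */
function registerImage(PDO $db, string $tmpPath, string $filename): bool
{
    $hash = hash_file('sha256', $tmpPath); // 64 hex chars; md5_file() would also work
    try {
        $stmt = $db->prepare('INSERT INTO images (hash, filename) VALUES (?, ?)');
        $stmt->execute([$hash, $filename]);
        return true;
    } catch (PDOException $e) {
        // SQLSTATE 23000 = integrity constraint violation, i.e. the hash already exists
        if ($e->getCode() === '23000') {
            return false;
        }
        throw $e;
    }
}
```

In an upload handler you might call `registerImage($db, $_FILES['image']['tmp_name'], $name)` and only call `move_uploaded_file()` when it returns true. Letting the index decide avoids the race between a separate SELECT and INSERT.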

Related

For image upload, should I add a field to the MySQL database to check against, or simply use PHP to check if the image exists?

I have a simple image upload form. When someone uploads an image, it is for a football pool, so there is always a $poolid that goes with the image they upload.
Right now, I am naming the uploaded image using the poolid. So for example, if someone uploads an image, it might get named P0714TYER7EN.png.
All the app will ever do is check, when it outputs a football pool's page, whether an image exists for that pool, and show it if so. It checks like this:
if (file_exists("uploads/" . $poolid . ".png")) {
    // code to show it
}
My first thought when planning this was to add a field called "image" to the table in my MySQL database that holds all the pool information (called pools), and store either the image name (P0714TYER7EN.png) or an empty value if none was uploaded. Then I would check that field in the database to determine whether an image exists.
But I realized I don't really need to store anything in the database because I can simply use the PHP file_exists check above to know if there is an image or not.
In other words, it would seem redundant to have a field in the database.
Everything works doing it this way (i.e. NOT having a field in the database) but I'm wondering if this is bad practice for any reason?
If anyone feels that I should absolutely still have a field in the database, please share your thoughts. I just want to do it the proper way.
Thank you.
The approach could depend a lot on what exactly you're trying to do. It seems the options you have are:
File System Only
The benefits are fast access to the image as a static file and the ability to reference it directly in your HTML, which makes this the simpler solution. Also, if you're comfortable with these functions, it will be faster to finish.
The drawbacks are that you're limited to file_exists and similar functions. Any code that manages files this way has to be very specific and static. You also cannot search or perform operations on the images efficiently. In general, relying on the file system alone is not a best practice in my experience.
Database Only
The benefits are that you can store the image in a BLOB column with metadata such as owner, uploader, timestamp, etc. in the same row. This makes checking for existing files, as well as searching and other operations, fast and efficient.
The drawbacks are that you can't serve the files statically through a CDN, a cookie-less subdomain or other page-performance strategies. You also have to use PHP and MySQL to fetch and then serve every image in code, rather than just referring to the image file directly.
Hybrid
The benefits are basically those of both approaches above. You keep your metadata in MySQL along with an MD5 hash and the location of the file. Your PHP then renders the page with a direct link to the file rather than turning a BLOB into an image. You could combine this with a CDN by prefixing or storing the CDN location as well.
The drawback is that if you manually rename files on the server, the stored locations go stale and you would have to rely on matching hashes to detect this; however, a File System Only setup that needs to detect duplicate files faces the same problem.
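A sketch of the rendering side of the hybrid approach, assuming each row carries a `path` relative to the web root and optionally a `title`; the function `imageTag` and the CDN host in the example are hypothetical. PHP only prints a direct link, so the web server or CDN serves the bytes.

```php
<?php
// Hybrid approach sketch: MySQL holds metadata (path, md5, title, ...);
// PHP only builds a direct link, optionally prefixed with a CDN base URL.
function imageTag(array $row, string $cdnBase = ''): string
{
    // Join base and path with exactly one slash; an empty base gives a root-relative URL.
    $url = rtrim($cdnBase, '/') . '/' . ltrim($row['path'], '/');
    return sprintf(
        '<img src="%s" alt="%s">',
        htmlspecialchars($url, ENT_QUOTES),
        htmlspecialchars($row['title'] ?? '', ENT_QUOTES)
    );
}
```

For example, `imageTag(['path' => 'uploads/a.png'], 'https://cdn.example.com')` would point at the CDN copy, while omitting the second argument serves the file from your own server.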
TL;DR: the Hybrid approach is what most software, such as WordPress, uses, and I believe it would be considered best practice, while File System Only is a bit of a hack.
Note: Database Only could be the best approach in specific situations, such as when you want clustering and replication of images directly in your database rather than on a file system (especially if the file system has restricted access or cannot be modified for some reason; then you have full flexibility in the DB).
You can also use MySQL's BLOB data types and save the image as binary data next to the data about the football pool.
So when you want to load a football pool, you simply run an SQL query and check whether it returns a result; if so, load the image from the database and display the data, otherwise throw an error.
If the images are accessed very frequently, you can put them in a separate table and load each image independently of the football pool data. Additionally, serve the image from a separate script that sets cache headers; this way you only need to store the image's primary key in the football pool table. When you want to display the web page, it requests that separate document with the image's primary key, which loads the image; if the browser already has the image in its cache, it uses the cached copy without querying the database.
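A sketch of that separate image script, assuming a hypothetical `images` table with `id`, `mime` and `data` (BLOB) columns, and assuming a row is never modified once written (a new upload gets a new id), which is what makes a long cache lifetime safe. The function names are illustrative.

```php
<?php
// Hypothetical image.php?id=123 script for images stored in their own table.

/** Load one image row, or null if the id does not exist. */
function fetchImage(PDO $db, int $id): ?array
{
    $stmt = $db->prepare('SELECT mime, data FROM images WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

/** Send the image bytes with headers that let the browser reuse its cached copy. */
function sendImage(PDO $db, int $id): void
{
    $img = fetchImage($db, $id);
    if ($img === null) {
        http_response_code(404);
        return;
    }
    header('Content-Type: ' . $img['mime']);
    header('Content-Length: ' . strlen($img['data']));
    // Cache for one year: until it expires the browser won't request it, so the DB isn't queried.
    header('Cache-Control: public, max-age=31536000');
    echo $img['data'];
}
```

The pool page would then only output something like `<img src="image.php?id=123">`, and the script would call `sendImage($db, (int) $_GET['id'])`.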
This way you also have a better consistency of data and images.
You're uploading the image to a specific folder and naming it with the poolid, which will be unique. It should work just fine.
Problem:
The code you have written works great. The problem is that if the first uploaded image is a .png and a later one is a .jpeg or .jpg, the file_exists check won't find it, so it may fail.
Caution:
If you already check that the uploaded image must be a PNG, then file_exists will work great.
Alternate Solution:
If you're not checking that the image type is .png, then I highly advise adding a boolean column to your table, named is_image_uploaded or something similar, which is set every time a file is uploaded.
That way, the next time you want to upload an image, you can check the database table directly and see whether the is_image_uploaded column is set. If it isn't set, upload; otherwise ignore it or do whatever you want.
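If you follow the caution above and accept only PNGs, one way to check is to look at the file's first bytes rather than at its extension or the browser-supplied MIME type. A minimal sketch: `isPng` is a hypothetical helper, and the 8-byte value is the signature defined by the PNG specification.

```php
<?php
// Accept an upload only if its bytes start with the PNG signature, so that
// file_exists("uploads/$poolid.png") stays a reliable check.
function isPng(string $path): bool
{
    // Read just the first 8 bytes; @ silences the warning for a missing file.
    $header = @file_get_contents($path, false, null, 0, 8);
    return $header === "\x89PNG\r\n\x1a\n";
}
```

In the upload handler this would run on `$_FILES['image']['tmp_name']` before moving the file to `uploads/$poolid.png`.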

Similar images - how to compare them

I have over 1.3 million images that I have to compare with each other, and a few hundred are added per day.
My company takes an image and creates a version that can be used by our vendors.
The files are often very similar to each other; for example, two different companies can send us two different images, a JPG and a GIF, both with the McDonald's logo, months apart.
What happens is that we end up creating the same logo twice, when we could simply copy/paste the already created one or at least suggest it to the artists as a possible starting point.
I have looked around for algorithms that create a fingerprint or something similar, so that I can run a simple query when a new image is uploaded. Time is not really an issue: if it takes 1 second to create a fingerprint, it will take about 15 days (1.3 million seconds) to fingerprint the existing images, but the savings would be so large that we might even dedicate 3 or 4 servers to it.
I am fluent in PHP, but if the algorithm is in pseudocode or even C I can read it and try to translate (unless it uses some C specific libraries)
Currently I am computing an MD5 of all the images to catch the ones that are exactly the same. This question came up when I was thinking of resizing each image and running md5 on the resized version to catch images that were saved in a different format and resized, but even then the recognition would not be good enough.
If I didn't mention it, I will be happy with something that just suggest possible "similar" images.
EDIT
Keep in mind that the check needs to be done multiple times per minute, so the best solution is one that gives me some values per image that I can store and use in the future to compare with the image that I am looking at without having to re-scan the whole server.
I am reading some pages that mention histograms, or resizing the image to a very small size, stripping possible tags, converting it to grayscale, hashing the result and using that hash for comparison. If I am successful I will post the code/answer here.
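One well-known version of the shrink, grayscale and hash idea is the "average hash". Instead of an MD5 of the tiny image (which still matches only exact copies), each of the 64 pixels becomes one bit depending on whether it is brighter than the mean, so similar images produce hashes that differ in only a few bits. This is a swapped-in technique, not the poster's code; the function names are hypothetical and `grayPixels8x8` needs the GD extension.

```php
<?php
/** Shrink to 8x8 with GD and return 64 grayscale (luminance) values. */
function grayPixels8x8(string $path): array
{
    $src = imagecreatefromstring(file_get_contents($path));
    if ($src === false) {
        throw new RuntimeException("Not a readable image: $path");
    }
    $small = imagecreatetruecolor(8, 8);
    imagecopyresampled($small, $src, 0, 0, 0, 0, 8, 8, imagesx($src), imagesy($src));
    $pixels = [];
    for ($y = 0; $y < 8; $y++) {
        for ($x = 0; $x < 8; $x++) {
            $rgb = imagecolorat($small, $x, $y);
            $r = ($rgb >> 16) & 0xFF;
            $g = ($rgb >> 8) & 0xFF;
            $b = $rgb & 0xFF;
            $pixels[] = (int) round(0.299 * $r + 0.587 * $g + 0.114 * $b);
        }
    }
    return $pixels;
}

/** One bit per pixel (1 = brighter than the mean), returned as 16 hex chars. */
function averageHash(array $pixels): string
{
    $mean = array_sum($pixels) / count($pixels);
    $bits = '';
    foreach ($pixels as $p) {
        $bits .= $p > $mean ? '1' : '0';
    }
    // Convert 4 bits at a time so the 64-bit value never overflows a PHP int.
    $hex = '';
    foreach (str_split($bits, 4) as $nibble) {
        $hex .= dechex(bindec($nibble));
    }
    return $hex;
}

/** Number of differing bits between two equal-length hex hashes. */
function hammingDistance(string $a, string $b): int
{
    $d = 0;
    for ($i = 0; $i < strlen($a); $i++) {
        $d += substr_count(decbin(hexdec($a[$i]) ^ hexdec($b[$i])), '1');
    }
    return $d;
}
```

Usage would be `averageHash(grayPixels8x8($path))`, computed once per image and stored, which fits the requirement of not rescanning the server. New uploads are compared by Hamming distance, and the cut-off for "similar" is something to tune on your own images. If you store the hash as an unsigned 64-bit integer, MySQL's BIT_COUNT() on the XOR of two values gives the same distance in SQL.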
Try using file_get_contents with hash(), or hash_file directly:
http://www.php.net/manual/en/function.hash-file.php
If the hashes match, then you know they are the exact same.
EDIT:
If possible, I think storing the image hashes and the image paths in a database table would help limit server load. It is much easier to run the hash algorithm once on your initial images and store the hashes in a table. Then, when new images are submitted, you can hash each one and look it up in the table. If the hash is already there, discard the image. You can use the hash as the table index, so once you find a match you don't need to check the rest.
The other option is not to use a database, but then you would always have to do an O(n) lookup: hash the incoming image and then search in memory through all n saved images.
EDIT #2:
Please view the solution here: Image comparison - fast algorithm
To speed up the process, sort all the files by size and compare their contents only if two sizes are equal. To compare the contents, hash comparison is also the fastest way. Hope this helps.
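A sketch of the size-first check described above, assuming a plain list of file paths; `findExactDuplicates` is a hypothetical helper. `filesize` is cheap, so only files that share a size are read and hashed.

```php
<?php
// Group files by size, then hash only the groups where sizes collide.
// Returns a list of groups; each group holds paths with identical contents.
function findExactDuplicates(array $paths): array
{
    $bySize = [];
    foreach ($paths as $path) {
        $bySize[filesize($path)][] = $path;
    }
    $groups = [];
    foreach ($bySize as $sameSize) {
        if (count($sameSize) < 2) {
            continue; // a file with a unique size cannot have a duplicate, so skip hashing it
        }
        $byHash = [];
        foreach ($sameSize as $path) {
            $byHash[hash_file('sha256', $path)][] = $path;
        }
        foreach ($byHash as $sameHash) {
            if (count($sameHash) > 1) {
                $groups[] = $sameHash;
            }
        }
    }
    return $groups;
}
```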

What is the best way to store files(photo and video) in mysql?

I don't know which way is better for uploading and saving a file on my local server.
For example, I have seen someone INSERT the image's link into a MySQL field, and now I'm confused...
I want to upload some files and show them elsewhere later...
What's the best and most secure way to do that?
Store all the images in a folder called photos for example. Then, save an index of the file in your database assigning it an index number and other information. Save the file in the photos folder, renaming it [index_number].jpg, or whatever extension is needed. For example, if I upload the file coolpic.jpg, it will be assigned an index number of 2845. The file itself is saved in photos/2845.jpg.
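A sketch of the naming step, assuming the row has already been inserted and its auto-increment id is known; `photoPath` and the extension whitelist are assumptions, not part of the answer. Because the stored name comes from the id, a user-supplied name like `coolpic.jpg` never becomes part of the path, apart from its whitelisted extension.

```php
<?php
// Build photos/[index_number].[ext] from the row id and the original name.
function photoPath(int $id, string $originalName): string
{
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    $allowed = ['jpg', 'jpeg', 'png', 'gif']; // assumption: only these are accepted
    if (!in_array($ext, $allowed, true)) {
        throw new InvalidArgumentException("Extension not allowed: $ext");
    }
    return 'photos/' . $id . '.' . $ext;
}
```

After the INSERT you would take `$id = (int) $db->lastInsertId();` and then `move_uploaded_file($_FILES['photo']['tmp_name'], photoPath($id, $_FILES['photo']['name']))`.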
Saving files in the database can cause problems such as decreased DB performance (from reading and writing big files), DB crashes (from deleting or editing row fields), and backup problems (because of a huge dump file, and trouble when a table needs to be repaired).
Also, a file read from MySQL still has to be delivered by Apache afterwards.
I suggest using a normal file path, with mod_rewrite for virtual URLs.
Don't store the full image link; it's not necessary and only makes your DB larger.
You should store just the file name, e.g. "picture.jpg",
and in your pages use <img src="images/'.$row['image'].'">
Even better, you can create a function for it (displaying pictures).
Like
function DImage($image)
{
    // you can do miracles here, like checking image types, checking that it is a file, padding, even adding divs and vspaces
    // htmlspecialchars() keeps a file name containing quotes from breaking out of the src attribute
    $output = '<img src="imagesfolder/' . htmlspecialchars($image, ENT_QUOTES) . '">';
    return $output;
}
so later all you have to do is:
echo DImage($row['image']);
PS: if you're asking about uploading via $_POST and $_FILES, then yes, of course; otherwise I'm sure it would be impossible for you to maintain the images, their names and updates.

What is the best way to upload and store pictures on the site?

I have no idea how the big websites save the pictures on their servers. Could any one tell me how do they save the pictures that are uploaded by the users in their database?
I was thinking maybe they just save the file (the picture) in some path and save that path in the database. Is that right?
But I want to do it this way. Is this right? For example, a website named www.photos.com. When a user uploads a picture I would create a folder of the user name and save those pictures in that folder.
I believe we can create a directory using php file concepts. So when a new user uploads his picture or file, I want to create a directory with his name.
Example: if user name is john, I would create a directory like this on photos.com www.photos.com/john/ and then save all his pictures to this directory when he uploads a picture. Is this the right way to do this?
I have no one here with good knowledge of saving files to servers, so please let me know how to do this. I want to do it the correct and secure way.
Big websites don't save pictures in the database; they store them on disk.
They save a reference to the picture's position in a table. And then link from there.
Why? Performance.
Pulling heavy content from a database is a huge performance bottleneck. And databases don't scale horizontally that well, so it would become an even bigger problem. All big sites use static content farms to deal with static content such as images. These are servers that couldn't care less about your identity.
How do they keep the pictures really private you might ask? They don't.
The picture's link is, in itself, the address and the password. Let's take Facebook, for example. If I store a private picture on my account you should not be able to open it. But, as long as you have the correct address you can.
This picture is private. Notice the filename
10400121_87110566301_7482172_n.jpg
(facebook changes the url from time to time so the link may be broken)
It's non-sequential. The only way to get the picture is to know its address.
Based on a previous user photo you can't guess the next one.
It has huge entropy, so even if you start taking random wild guesses you'll fail an enormous number of times, and if you do hit a picture, you won't be able to work out the owner's identity from it, which in itself is protection through anonymity.
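A minimal sketch of generating such a name, assuming PHP 7+ for `random_bytes`; `randomImageName` is hypothetical, and this is not how Facebook builds its names, just the same idea. Sixteen random bytes from a cryptographically secure generator give 128 bits, so neither sequential guessing nor random guessing is practical.

```php
<?php
// Non-sequential, unguessable file name: 16 CSPRNG bytes, hex-encoded (32 chars).
function randomImageName(string $ext): string
{
    return bin2hex(random_bytes(16)) . '.' . $ext;
}
```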
Edit (why you should not store images in a "username" folder):
After your edit it became clear that you intend to put files on disk and not in the database. This edit covers the new scenario.
Even though your logic (create a folder per user) seems more organized, it creates problems when you start having many users and many pictures. Imagine that your servers have 1T of disk space, and let's also imagine that 1T is roughly the load a server can handle.
Now you have 11 users; assume they start uploading at the same time and each will upload more than 100GB of files. When they reach 91GB each, the server is full and you must start storing images on a different server. If the user/folder structure is followed, you would have to pick one of the users and migrate all of his data to a different server. It also imposes a hard limit: a single user can't upload more than 1T of files.
Should I store all files in the same folder, then?
No. Big sites generally store files in sequential folders (/000001/, /000002/, etc.), each holding a fixed number of files. This is mainly for file-system performance reasons.
More on how many files in a directory is too many?
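A sketch of the sequential-folder layout, assuming files are named by a numeric id; `shardedPath` is hypothetical and 1000 files per folder is an arbitrary default, not a figure from the answer. Folder numbers start at 000001, as in the example above.

```php
<?php
// Map an id to NNNNNN/id.ext, with at most $perFolder ids per folder.
function shardedPath(int $id, string $ext, int $perFolder = 1000): string
{
    // ids 0..999 -> 000001, 1000..1999 -> 000002, and so on (with the default)
    $folder = intdiv($id, $perFolder) + 1;
    return sprintf('%06d/%d.%s', $folder, $id, $ext);
}
```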
It is usually a bad idea to store images in your database (if your site is popular). The database is traditionally one of the main bottlenecks in almost any application, so there's no need to load it more than necessary. If images are in the filesystem, many HTTP servers (nginx, for example) will serve them very efficiently.
The biggest social network in Russia, Vkontakte does exactly this: store images in the filesystem.
Another big social network implemented a sophisticated scalable blob storage. But it's not available to the public, AFAIK.
Summary of this answer: don't store blobs in the database.
is this the right way to do
Yes.
The only thing I'd suggest is to use an id instead of the name.
www.photos.com/albums/1234/ would be okay for a start.
Image management may best be achieved by physically uploading images to the server and then recording file location and image details in a database. Subsequently, a Search Form could be configured to permit the user to do a text search, part number search, or other queries. A PHP script could be written to produce a valid HTML image tag based on data found in the table.
Uploading images into a MySQL™ BLOB field is a bad idea: such image data is generally problematic if the images are much larger than thumbnails. If the images are large, you can end up having to copy/paste one SQL INSERT statement at a time (into phpMyAdmin). If the images are large and your text editor breaks an SQL INSERT statement into two lines, you'll never be able to restore the image.

What is the best way to keep track of images uploaded by an user using PHP

I am planning to build a photo album website, and each user may upload any number of images. What is the best way to keep track of the images of an individual user, and what server configuration is needed to handle this part?
-Lokesh
Depending on the amount of images, you will probably want to store them on a static domain. Then, have a table in whatever database you are using to store the paths to each of the images for each user.
Well like many design topics there are lots of different ways to go about it. Two ways that come to mind right now are as follows.
You could simply create a directory on the server for each user and save the images each user uploads into that directory. Of course, you'd want to make sure they don't overwrite existing images with images of the same name. You could do this by warning them about conflicting names, or by appending some sort of nonce string (like a timestamp) to the end of the file name. This is a pretty straightforward solution, and it means you can log in to your server and see all the images each user has uploaded, right there for you to do anything you like with.
Another idea would be to save the images in a database. This can be done by serializing the images to a string and storing it in the database. This is nice because you don't have to worry about handling directories and duplicate file names. However, you will have to deserialize each image whenever you want to display it, which puts your DB under load, so for a very high-traffic site this might not be the way to go.
There are of course combinations of these ideas and many others. It really comes down to working out which solution best fits your exact needs.
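A sketch of the first idea above (a directory per user without overwriting existing files), assuming the directory already exists; `freeFileName` is hypothetical and appends a counter rather than the timestamp the answer mentions. `basename()` strips any path the user sent along with the name.

```php
<?php
// Return a file name inside $dir that doesn't exist yet: cat.jpg, cat-1.jpg, cat-2.jpg, ...
function freeFileName(string $dir, string $name): string
{
    $name = basename($name); // drop any directory part supplied by the user
    $base = pathinfo($name, PATHINFO_FILENAME);
    $ext = pathinfo($name, PATHINFO_EXTENSION);
    $candidate = $name;
    for ($i = 1; file_exists($dir . '/' . $candidate); $i++) {
        $candidate = $base . '-' . $i . ($ext !== '' ? '.' . $ext : '');
    }
    return $candidate;
}
```

Between the check and the write, another upload could claim the same name; opening the file with `fopen($path, 'x')`, which fails if the file already exists, closes that gap.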
