I'm working on an image hosting website, and to prevent "already exists" errors I MD5 the images that are being uploaded. The problem is that the URL to the website is already quite long, and the whole MD5 hash makes it even longer. Is there any way to make the URL shorter?
It's not necessary to use the md5 string as your image filename. To ensure the uniqueness of the images, you can try the following solution:
md5() every new image a user uploads
Store the md5() value in a database
Next time a user uploads an image, check if the item already exists in your database
If it exists, prevent the user from uploading the image. Otherwise, proceed.
Repeat
You can keep an ID-to-hash mapping on the image hosting server. You can store this mapping in Redis or MySQL, since both can persist data to disk.
You can use the name of the image and the time it was uploaded to make the name unique but shorter. Use it like this:
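A minimal sketch of the ID-to-hash idea, assuming each image row has a numeric ID. The helper name `shortToken` and the sample array are hypothetical; the array only stands in for the MySQL or Redis mapping.

```php
<?php
// Hypothetical helper: turns a numeric database ID into a short URL token.
function shortToken(int $id): string
{
    // Base 36 uses the digits 0-9 and the letters a-z.
    return base_convert((string) $id, 10, 36);
}

// Hypothetical mapping, shown as an array for illustration only.
// In practice this would be a row in MySQL or a key in Redis.
$images = [
    123456 => ['md5' => md5('example image bytes'), 'ext' => 'jpg'],
];

foreach ($images as $id => $row) {
    // The URL carries only the short token; the hash stays in the database.
    echo shortToken($id) . '.' . $row['ext'] . PHP_EOL; // 2n9c.jpg
}
```

The hash is still used for the duplicate check, but it never has to appear in the URL.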
$img_name = $uploaded_name.time().$file_ext;
Hence the name will be shorter but unique.
Just use the Unix timestamp to get a new and unique file name each time and keep its length shorter as well.
Related
I'm trying to allow users to upload files through a PHP website. Since all the files are saved in a single folder on the server, it's conceivable (though admittedly with low probability) that two distinct users could upload two files that, while different, are named exactly the same. Or perhaps they're exactly the same file.
In both cases, I'd like to use exec("openssl md5 " . $file['upload']['tmp_name']) to determine the MD5 hash of the file immediately after it is uploaded. Then I'll check the database for any identical MD5 hash and, if found, I simply won't complete the upload.
However, in the move_uploaded_file documentation, I found this comment:
Warning: If you save a md5_file hash in a database to keep record of uploaded files, which is usefull to prevent users from uploading the same file twice, be aware that after using move_uploaded_file the md5_file hash changes! And you are unable to find the corresponding hash and delete it in the database, when a file is deleted.
Is this really the case? Does the MD5 hash of a file in the tmp directory change after moving it to a permanent location? I don't understand why it would. And regardless, is there another, better way of ensuring the same file is not uploaded to the filesystem multiple times?
If you're convinced by all the reasons given here in the answers and decide not to use md5 at all (I'm still not sure whether you WANT to or MUST use hash), you can just append something unique for each user and the time of uploading to each file name. That way you'll end up with more readable file names. Something like: $filename = "$filename-$user_ip_string-$microtime";. Of course, you must have all three variables ready and formatted before that, it goes without saying.
No chance of the same file name, same IP address and same microtime occurring at the same time, right? You could easily get away with microtime only, but IP will make it even more certain.
Of course, like I said, all this goes if you decide not to use hashing and go for a simpler solution.
Shouldn't you use exec("openssl md5 " . $file['upload']['name']) instead? I'm thinking that the temporary name differs from upload to upload.
It would seem that this is indeed the case; I have briefly looked through the docs as well. But why don't you compute the MD5 checksum before using move_uploaded_file and store that value in your database, linking it directly to the new file? That way you can always check an uploaded file against it and see whether that file already exists in your filesystem.
This does require a database, but most have access to one.
Try renaming the uploaded file to a unique id.
Use this:
$dest_filename = $filename;
if (RENAME_FILE) {
    // Replace the original name with a random 32-character hex string.
    $dest_filename = md5(uniqid(rand(), true)) . '.' . $file_ext;
}
Let me know if it helps :)
No, in general the hash doesn't somehow magically change because of move_uploaded_file.
But if you compute the md5() of a string that includes the file's path, the hash will certainly change when the file is moved to a new path or folder.
In case you md5() the filename, nothing will change.
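This is easy to check for yourself. The sketch below uses rename() as a stand-in for move_uploaded_file() (which only accepts real HTTP uploads) and md5_file() rather than shelling out to openssl; the file contents are made up.

```php
<?php
// Create a throwaway file that stands in for an uploaded file.
$tmpPath = tempnam(sys_get_temp_dir(), 'upl');
file_put_contents($tmpPath, 'example upload contents');

// Hash the file contents before moving it.
$hashBefore = md5_file($tmpPath);

// rename() stands in for move_uploaded_file(), which only accepts
// files that really arrived through an HTTP POST upload.
$finalPath = $tmpPath . '.moved';
rename($tmpPath, $finalPath);

// Hash the contents again at the new location.
$hashAfter = md5_file($finalPath);

var_dump($hashBefore === $hashAfter);        // bool(true): the contents are unchanged
var_dump(md5($tmpPath) === md5($finalPath)); // bool(false): these hash the path strings

unlink($finalPath);
```

The second comparison is probably what the documentation comment ran into: md5() on a path hashes the string, not the file.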
It's a good idea to rename uploaded files with a unique name.
But don't forget that the final storage location should be outside the document root folder of your vHost. Located there, the file can't be downloaded without going through a PHP script.
Final remark: while it's very, very unlikely, the MD5 hashes of two different files may be identical.
I have written a script to upload an image to a particular portion of my site.
What kind of check do I need to do to detect if a duplicate entry is trying to be uploaded through the form?
Example:
One user submits firefoxlogo.jpg.
Another user a few days later tries submitting firefox.jpg which is the same image, just renamed.
...the same image...
The same as "the binary data is identical" or "the image looks similar"? In the first case, you can calculate the hash of a file using sha1_file (for SHA1 hashes). You should never rely on the filename for checking whether files are unique or not. E.g. one user could upload "firefox.png" containing the browser's logo and someone else a screenshot of it. The hash has a fixed length (40 for SHA1) which is another advantage over using filenames.
Each time a user uploads a file you could keep a record of its SHA1 hash (using sha1_file) in your database. When you get a file upload, grab the hash of the new file while it's still in temporary storage, then query your database for an entry with the same hash. If none exists, you can continue to upload the file.
See also http://php.net/exif. Comparing EXIF data is not a fully reliable way to avoid duplicates, but it is faster than sha1_file.
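A rough sketch of that check. The function name `isDuplicateUpload` is hypothetical, and the `$knownHashes` array only stands in for the database query.

```php
<?php
// $knownHashes is a hypothetical stand-in for the hashes stored in the database.
function isDuplicateUpload(string $tmpPath, array $knownHashes): bool
{
    // sha1_file() hashes the file contents, so the file name is irrelevant.
    $hash = sha1_file($tmpPath);
    return in_array($hash, $knownHashes, true);
}

// Two temp files with identical bytes and different names.
$first = tempnam(sys_get_temp_dir(), 'ffl');
$second = tempnam(sys_get_temp_dir(), 'ffx');
file_put_contents($first, 'same image bytes');
file_put_contents($second, 'same image bytes');

$knownHashes = [sha1_file($first)];                  // the first upload was recorded
var_dump(isDuplicateUpload($second, $knownHashes));  // bool(true)

unlink($first);
unlink($second);
```

In a real upload handler the path would come from $_FILES[...]['tmp_name'], and the lookup would be a query on an indexed hash column.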
hope this helps
I would like to create an upload form with PHP. It will be used to upload a fixed-row-length text file containing orders (the full order details are duplicated on each row).
It should then place the file somewhere and call a program that reads the file and places the orders. The problem is that I want to prevent the same order file from being sent to the order program twice.
The file won't have any unique identifier. I am wondering what the best way is to check that the file isn't the same as an earlier one. One solution is to calculate the MD5 of each file and store the hashes, but I am not sure about concurrency, whether this would work, and how many files to compare against.
The only solution I can think of is to store some maximum number of hashes (20, for example) in a file and use flock() on that file to avoid concurrency problems. For example, program A checks via MD5 whether the file exists while program B does the same check; they may both read from a not-yet-updated file, which is why I think I should use an exclusive lock.
Any other solution ?
Store the MD5 hash (or SHA1) and size of the file in the database. Index the hash.
To check for duplicates, just search in the database for a file with the same hash and size.
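Where a database is not available, the same hash-plus-size check can be approximated with the flock() idea from the question. This is only a sketch: the helper name `registerIfNew` and the "hash:size" line format are assumptions, and a database with an index on the hash would normally scale better.

```php
<?php
// Hypothetical helper: records a file in a registry and reports whether it was new.
// The registry is a plain text file with one "hash:size" entry per line.
function registerIfNew(string $filePath, string $registryPath): bool
{
    clearstatcache(true, $filePath);
    $entry = md5_file($filePath) . ':' . filesize($filePath);

    // Mode 'c+' opens for reading and writing and creates the file if missing.
    $handle = fopen($registryPath, 'c+');
    if ($handle === false) {
        throw new RuntimeException('Cannot open registry file');
    }
    // The exclusive lock keeps the check and the append together as one step.
    flock($handle, LOCK_EX);

    $isNew = true;
    while (($line = fgets($handle)) !== false) {
        if (trim($line) === $entry) {
            $isNew = false;
            break;
        }
    }
    if ($isNew) {
        fseek($handle, 0, SEEK_END);
        fwrite($handle, $entry . "\n");
    }

    fflush($handle);
    flock($handle, LOCK_UN);
    fclose($handle);
    return $isNew;
}

// Demo with throwaway files.
$registry = tempnam(sys_get_temp_dir(), 'reg');
$order = tempnam(sys_get_temp_dir(), 'ord');
file_put_contents($order, "ORDER0001 example row\n");

var_dump(registerIfNew($order, $registry)); // bool(true): first time seen
var_dump(registerIfNew($order, $registry)); // bool(false): duplicate

unlink($order);
unlink($registry);
```

Because the lock is held across both the read and the write, two processes cannot both conclude that the same file is new.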
I want to let my users upload an avatar on their profile. My first idea was to name the avatar file like this: [user_id].jpg. So even if a user updates its avatar, it keeps the same name.
The problem with that is that if I use caching on the server (or even if it's used on the client) the new avatar won't show up.
My new solution is to name the file like this:
[user_id]_[random_number].jpg
and store the random number in the Users table. How would you generate this number in the most efficient way? Or maybe there is a better solution?
You should be able to invalidate the cache when the user uploads a new avatar.
If this is not possible you could just store it as [uid]_[YYYYMMDDhhmmss].jpg or something. No need to generate anything random...
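A small sketch of the timestamp variant. The helper name `avatarFileName` is made up, and gmdate() is an assumption so that the result does not depend on the server's time zone.

```php
<?php
// Hypothetical helper: builds "[uid]_[YYYYMMDDhhmmss].jpg" from a user ID and an upload time.
function avatarFileName(int $userId, int $uploadedAt): string
{
    // gmdate() formats in UTC, so the name does not depend on the server time zone.
    return $userId . '_' . gmdate('YmdHis', $uploadedAt) . '.jpg';
}

echo avatarFileName(42, 0) . PHP_EOL;      // 42_19700101000000.jpg
echo avatarFileName(42, time()) . PHP_EOL; // changes on every new upload
```

Since the name changes with each upload, caches treat the new avatar as a new resource.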
I would do something like:
$avatarName = $userId . uniqid();
// add extension if needed, store it
It will be fast and do what you want. See uniqid().
EDIT
As suggested by other users, you should drop the userId from the image name. Having a public userId may lead to problems in the future.
Also, uniqid() alone should work.
$avatarName = uniqid();
// add extension if needed, store it
Have you thought about configuring ETags in your .htaccess?
See:
http://developer.yahoo.com/blogs/ydn/posts/2007/07/high_performanc_11/
Though you can change filenames, you will need to manage the cleanup and pointing operations (remove or rename the old file, tell your app about the new file). If you are happy to do this, you can simply append the Unix timestamp at the point of upload to the user's ID; it's unlikely they will be able to upload twice within the same second. If you want to make it even more unique, append a random number or uniqid().
With a random number there is a minuscule (negligible) chance of a collision, but why not start at 1 and just increase the number each time, since you are storing this number anyway?
I'm generating a unique filename for uploaded files with the following code
$date = date( 'U' );
$user = $_SERVER['REMOTE_ADDR'];
$filename = md5($date.$user);
The problem is that I want to use this filename again later on in the script, but if the script takes a second to run, I'm going to get a different filename the second time I try to use this variable.
For instance, I'm using an upload/resize/save image upload script. The first operation of the script is to copy and save the resized image, which I use a date function to assign a unique name to. Then the script processes the save and saves the whole upload, and assigns it a name. At the end of the script ($thumb and $full are the variables), I need to insert into a MySQL database the filenames I used when I saved the uploads.
The problem is that sometimes, on large images, it takes more than a second (or the second changes during the process), resulting in a different filename being put into the database than the one the file is actually saved under.
Is this just not a good idea to use this method of naming?
AFAIK it's a great way to name the files, although I would check file_exists() and maybe tack on a random number.
You need to store that filename in a variable and reference it again later, instead of relying on the algorithm each time. This could be stored in the user $_SESSION, a cookie, a GET variable, etc between pageloads.
Hope that helps
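For the single-request case, a sketch of computing the name once and reusing it. The `_thumb` and `_full` suffixes are made up for illustration, and the fallback IP is only there so the snippet also runs from the command line.

```php
<?php
// Compute the base name exactly once, at the start of the request.
// The fallback IP is only there so the sketch also runs from the command line.
$user = $_SERVER['REMOTE_ADDR'] ?? '127.0.0.1';
$baseName = md5(date('U') . $user);

// Derive every later name from the stored value instead of calling date() again.
$thumb = $baseName . '_thumb.jpg'; // hypothetical naming scheme
$full  = $baseName . '_full.jpg';

// ... the slow resize and save steps would run here, using $thumb and $full ...

// The names used for saving and the names inserted into the database
// are the same strings, no matter how long the script ran.
echo $thumb . PHP_EOL;
echo $full . PHP_EOL;
```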
Just want to add that PHP has a function to create identifiers: uniqid. You can also prefix the identifier with a string (the date, maybe?).
Always validate your user's input, and the server headers!
I would recommend storing the file name in the session (as per AI). If you store it in one of the other variables, it is more likely for the end user to be able to attack the system through it. MD5 of user concatenated with rand() would be a nice way to get a long list of unique values. Just using rand() would probably have a higher percentage of conflicts.
I am not sure about the process that you are following for uploading files, but another way to handle file uploads is with PHP's built in handlers. You can upload the file and then use the "secure" methods for pulling uploaded files out of the temporary space. (the temporary space in this instance can be safely located outside of the open base dir directive to prevent tampering). is_uploaded_file() and move_uploaded_file() from: http://php.net/manual/en/features.file-upload.post-method.php example 2 might handle the problem you are encountering.
Definitely check for an existing file in that location if you are choosing a filename on the fly. If user input is allowed in any way shape or form, validate and filter the argument to make sure it is safe. Also, if the storage folder is web accessible, make sure you munge the name and probably the extension as well. You do not want someone to be able to upload code and then be able to execute it. That officially leads to BAD activities.
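One possible way to handle the extension part, as a sketch only. The helper name `safeImageExtension` and the list of allowed types are assumptions; the idea is to keep nothing from the client's file name except a whitelisted extension.

```php
<?php
// Hypothetical helper: returns a safe extension for storage, or null to reject the upload.
function safeImageExtension(string $clientName): ?string
{
    // Assumption: only these image types are accepted.
    $allowed = ['jpg', 'jpeg', 'png', 'gif'];

    // pathinfo() returns the part after the last dot, or '' if there is none.
    $ext = strtolower(pathinfo($clientName, PATHINFO_EXTENSION));

    return in_array($ext, $allowed, true) ? $ext : null;
}

var_dump(safeImageExtension('Logo.PNG')); // string(3) "png"
var_dump(safeImageExtension('evil.php')); // NULL
```

This only looks at the name. It does not inspect the file contents, so it complements rather than replaces storing uploads where they cannot be executed.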
I just discovered that PHP has a built-in function for this, called tempnam. It even avoids race conditions. See http://php.net/manual/en/function.tempnam.php.
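A quick sketch of how that looks. The system temp directory is used here only so the snippet runs anywhere; the real upload directory would go in its place.

```php
<?php
// tempnam() creates an empty file with a unique name and returns its path.
// Because the file is created in the same step, two requests cannot get the same name.
$dir = sys_get_temp_dir(); // assumption: the real upload directory would go here

$first = tempnam($dir, 'img');
$second = tempnam($dir, 'img');

var_dump($first !== $second);  // bool(true)
var_dump(file_exists($first)); // bool(true): the file already exists on disk

// Clean up the placeholder files created by this sketch.
unlink($first);
unlink($second);
```

Note that the generated name has no image extension, so the name stored in the database may need one added.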
Why not use
$filename = md5(rand());
This will be pretty much unique in every case. And if you find that $filename already exists you can just call it again.
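The "call it again" part could look roughly like this. The helper name `freeRandomName` is hypothetical, and random_bytes() is swapped in for md5(rand()) because it draws from a larger space of values.

```php
<?php
// Hypothetical helper: picks a random file name that is not yet taken in $dir.
// random_bytes() is swapped in for md5(rand()) here.
function freeRandomName(string $dir, string $ext): string
{
    do {
        // 16 hex characters followed by the extension.
        $name = bin2hex(random_bytes(8)) . '.' . $ext;
    } while (file_exists($dir . DIRECTORY_SEPARATOR . $name));

    return $name;
}

echo freeRandomName(sys_get_temp_dir(), 'jpg') . PHP_EOL;
```

The file_exists() check and the later write are still two separate steps, so this narrows the window for a clash rather than closing it.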
It's not a good idea to use an ID that depends on time: if two images are uploaded at the same time, the later one can overwrite the earlier. You should look at a function such as uniqid(). However, if this upload/resize/save script is meant to be "single-user", then this is not such a big problem.
As for the problem itself: if I were you, I would just save the computed filename to a variable and use that variable from that point on. Recomputing something that has already been computed is a waste of time. And when uploading really big images, or several images at once, the script can take as long as 20 seconds. You cannot depend on everything finishing within one second.