Can someone suggest an algorithm in PHP for naming uploaded files so that a name never repeats? I wonder how YouTube, which has millions of videos, does it.
Right now I take a random number, hash it with SHA-1, and use the hash as the file name, but I'm pretty sure it will eventually repeat and cause an error because the file cannot be saved to the file system.
Something like:
$name = sha1(substr(sha1(md5($randomnumber)),0,10));
Somebody once told me that it's impossible to break the hash generated by this code, or at least that it would take 100 years to break it.
You could do:
$uniq = md5(uniqid(rand(), true));
You could also append the ID of the user uploading the file, like:
$uniq = $user_id_of_uploader."_".md5(uniqid(rand(), true));
Generate a GUID (also called a UUID) using a pre-existing implementation. Time-based GUIDs combine a machine identifier, a timestamp, and a counter for IDs generated within the same timestamp, so in practice they do not repeat.
If generating a GUID isn't an option, the second-best choice is to run sha1 over the entire input and use its entire output.
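As a rough illustration of the GUID idea, here is a minimal sketch that builds a random (version 4) UUID by hand. Note that this is the random variant, not the time-based one described above, and that uuid_v4() is a made-up helper name, not a PHP built-in. It assumes PHP 7 or later for random_bytes().

```php
<?php
// Sketch: build a version-4 (random) UUID from 16 cryptographically secure bytes.
// uuid_v4() is a hypothetical helper name, not part of PHP.
function uuid_v4(): string
{
    $bytes = random_bytes(16);                        // PHP 7+
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);  // set the version nibble to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);  // set the RFC 4122 variant bits
    // 32 hex chars split into 8 groups of 4, joined as 8-4-4-4-12
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

$name = uuid_v4() . '.jpg';  // '.jpg' is only an example extension
```

With 122 random bits per name, an accidental repeat is not a practical concern, but the name carries no information about the upload.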
$name = 'filename' . $user_id . md5(microtime(true)) . '.extension'; // include $user_id only if available
Try to remove special characters and white spaces from the file name.
If you are saving the name in a database, a recursive function can be helpful.
Do the following with proper methods:
First, split the name into its extension and filename.
Trim the filename.
Collapse multiple spaces into a single space.
Replace special characters and whitespace with _.
Prefix it with the current timestamp using strtotime and a salt using md5(uniqid(rand(), true)), separated by _ (thanks to #Sudhir).
Suffix it with a special signature using str_pad, and limit the length of the file name.
Finally, join the formatted filename and the extension again.
Hope it makes sense.
Thanks
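The steps above could be sketched roughly like this. build_upload_name() and its $maxLength parameter are invented for illustration, time() stands in for the timestamp, and the str_pad signature step is left out because the answer does not say what the signature should be.

```php
<?php
// Sketch of the sanitising steps; build_upload_name() is a hypothetical helper.
function build_upload_name(string $original, int $maxLength = 50): string
{
    $info = pathinfo($original);
    $ext  = isset($info['extension']) ? strtolower($info['extension']) : '';
    $base = trim($info['filename']);                       // trim the filename
    $base = preg_replace('/\s+/', ' ', $base);             // collapse whitespace runs
    $base = preg_replace('/[^A-Za-z0-9]+/', '_', $base);   // special chars and spaces -> _
    $base = substr(trim($base, '_'), 0, $maxLength);       // limit the length
    $salt = md5(uniqid((string) mt_rand(), true));         // salt as suggested above
    $name = time() . '_' . $salt . '_' . $base;            // timestamp_salt_name
    return $ext === '' ? $name : $name . '.' . $ext;       // re-attach the extension
}

$stored = build_upload_name(' My  photo (1).JPG');
```

The result has the shape timestamp_salt_My_photo_1.jpg, so the original name stays readable at the end.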
I usually just generate a string for the filename (implementation is not incredibly important), then check if a file already exists with that name. If so, append a counter to it. If you somehow have a lot of files with the same base filename, this could be inefficient, but assuming your string is unique enough, it shouldn't happen very often. There's also the overhead of checking that the file exists.
$base_name = generate_some_random_string(); // use whatever method you like
$extension = '.jpg'; // Change as necessary
$file_name = $base_name . $extension;
$i = 0;
while (file_exists($file_name)) {
$file_name = $base_name . $i++ . $extension;
}
/* insert code to save the file as $file_name */
I have a field in a MySQL table for file names. When new files are uploaded, a function checks whether that filename already exists. How can I change the new filename, if the same one exists, to something like:
filename.jpg, filename_(1).jpg, filename_(2).jpg etc
I get most of how to do it; I'm just not sure how to make the renaming function know what number is in the filename (if any) and what the next number should be.
Cheers
The simplest answer is to generate unique filenames when they're uploaded, using some combination of the time and user. Maybe something like:
$user_id . time() . $user_provided_name . "." . $file_type
If you probe for existing files with file_exists() in a loop, you could use this to increase the numeric suffix:
$filename = preg_replace_callback('# (_\( (\d+) \))? (?=\.\w+$) #x',
function ($m) { return '_(' . ((int) ($m[2] ?? 0) + 1) . ')'; }, $filename, 1);
The same can be accomplished with strpos/substr functions. And it probably would look less like gibberish.
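A version without regular expressions might look like the sketch below. It uses strrpos/substr (searching from the right) rather than strpos, and next_numbered_name() is a hypothetical name chosen for this example.

```php
<?php
// Sketch: bump a trailing "_(n)" counter using plain string functions.
// next_numbered_name() is a hypothetical helper.
function next_numbered_name(string $filename): string
{
    $dot  = strrpos($filename, '.');
    $base = $dot === false ? $filename : substr($filename, 0, $dot);
    $ext  = $dot === false ? '' : substr($filename, $dot);  // includes the dot

    $n    = 0;
    $open = strrpos($base, '_(');
    if ($open !== false && substr($base, -1) === ')') {
        $digits = substr($base, $open + 2, -1);              // text between "_(" and ")"
        if ($digits !== '' && ctype_digit($digits)) {
            $n    = (int) $digits;
            $base = substr($base, 0, $open);                 // drop the old counter
        }
    }
    return $base . '_(' . ($n + 1) . ')' . $ext;
}
```

For example, next_numbered_name('filename.jpg') gives filename_(1).jpg, and calling it again on that result gives filename_(2).jpg.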
You can use a COUNT:
SELECT COUNT(*) AS NumberFiles FROM Files WHERE FileName = '$filename'
Then, on your query result, use the $result[0]['NumberFiles'] to rename your file.
You might also want to LOCK your Files table (or whatever it is).
What's a good way to create a unique name for an image that my user is uploading?
I don't want to have any duplicates so something like MD5($filename) isn't suitable.
Any ideas?
As was mentioned, I think the best way to create a unique file name is simply to prepend time(). That would look like:
$image_name = time()."_".$image_name;
Grab the file extension from uploaded file:
$ext = pathinfo($uploaded_filename, PATHINFO_EXTENSION);
Grab the time to the second: time()
Grab some randomness: md5(microtime())
Convert time to base 36: base_convert (time(), 10, 36) - base 36 compresses a 10 byte string down to about 6 bytes to allow for more of the random string to be used
Send the whole lot out as a 16-character string, followed by the extension:
$unique_id = substr( base_convert( time(), 10, 36 ) . md5( microtime() ), 0, 16 ) . '.' . $ext;
I doubt that will ever collide - you could even not truncate it if you don't mind very long file names.
If you actually need a filename (it's not entirely clear from your question) I would use tempnam(), which:
Creates a file with a unique filename, with access permission set to 0600, in the specified directory.
...and let PHP do the heavy lifting of working out uniqueness. Note that as well as returning the filename, tempnam() actually creates the file; you can just overwrite it when you drop the image file there.
You could take a hash (e.g., md5, sha) of the image data itself. That would help identify duplicate images too (if it was byte-for-byte, the same). But any sufficiently long string of random characters would work.
You can always rig it up in a way that the file name looks like:
/image/0/1/012345678/original-name.jpg
That way the file name looks normal, but it's still unique.
I'd recommend sha1_file() over md5_file(). It's less prone to collisions.
You could also use hash_file('sha256', $filePath) to get even better results.
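Naming a file after a hash of its contents could be sketched as follows. content_name() is a hypothetical helper, and the extension is assumed to have been validated elsewhere.

```php
<?php
// Sketch: name a file after the SHA-256 of its bytes.
// content_name() is a hypothetical helper, not a PHP built-in.
function content_name(string $path, string $extension): string
{
    $hash = hash_file('sha256', $path);   // 64 hex characters, or false on failure
    if ($hash === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $hash . '.' . $extension;
}
```

Two byte-for-byte identical uploads get the same name, which is what makes duplicate detection possible. Whether that is desirable depends on whether different users may share a stored file.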
http://php.net/manual/en/function.uniqid.php maybe?
You can prefix it with the user id to avoid collisions between two users uploading within the same microsecond.
For short names:
$i = 0;
while(file_exists($name . '_' . $i)){
$i++;
}
WARNING: this might fail on a multi-threaded server if two users upload an image with the same name at the same time.
In that case you should include the md5 of the username.
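One way to close that race, shown here only as a sketch, is to let the filesystem do the check: fopen() in 'x' mode creates the file and fails if it already exists, so two requests cannot both claim the same name. reserve_name() and its retry limit of 1000 are assumptions made for this example.

```php
<?php
// Sketch: reserve a file name atomically instead of file_exists() + write.
// reserve_name() is a hypothetical helper.
function reserve_name(string $dir, string $base, string $ext): string
{
    for ($i = 0; $i < 1000; $i++) {
        $name   = $i === 0 ? "$base.$ext" : "{$base}_$i.$ext";
        $handle = @fopen("$dir/$name", 'x');   // succeeds only if the file is new
        if ($handle !== false) {
            fclose($handle);
            return $name;                      // an empty placeholder now exists
        }
    }
    throw new RuntimeException('No free file name found');
}
```

The upload can then be written over the reserved empty file. Note that the loop also retries when fopen() fails for other reasons, such as a missing directory, so real code would want to distinguish those cases.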
lol, there are around 3.4 × 10^38 (that is, 2^128) possible values that md5 can produce
Plus, just to be on the safe side, you could use:
$newfilename = md5(time().'image');
if(file_exists('./images/'.$newfilename)){
$newfilename = md5(time().$newfilename);
}
//uploadimage
How big is the probability of two users uploading an image with the same name in the same microsecond?
Try:
$currTime = microtime(true);
$finalFileName = cleanTheInput($fileName)."_".$currTime;
// you can also append "_" . rand(0, 1000) at the end to make a name collision even less likely
function cleanTheInput($input)
{
// do some formatting here ...
}
This would also help you track the upload time of the file for analysis, or sort and manage the files.
For good performance and uniqueness you can use an approach like this:
Files are stored on the server under names like md5_file($file).jpg.
The directory to store a file in is derived from its md5 file name by taking the first two characters (first level) and the next two (second level), like this:
uploaded_files\30\c5\30c567139b64ee14c80cc5f5006d8081.pdf
Create a record in the database with the file_id, original file name, uploading user's id, and path to the file on the server.
On the server side, create a script that serves downloads: it looks the file up by id in the database and outputs its content with the original filename provided by the user (see the PHP example in CodeIgniter's download_helper). The URL to a file will then look like this:
http://site.com/download.php?file=id
Pros:
minimal collision risk
good performance at file lookup (not many files in one directory, not many directories at the same level)
original file names are saved
you can adjust access to files by server side script (check session or cookies)
Cons:
Only good for small files, because the server has to read the file into memory before the user can download it
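Deriving the two directory levels from the hash could look like the sketch below. sharded_path() is a hypothetical helper, forward slashes are used instead of backslashes, and the hash is the one from the example path above.

```php
<?php
// Sketch: place a file two directory levels deep, keyed by its md5 name.
// sharded_path() is a hypothetical helper.
function sharded_path(string $root, string $hash, string $ext): string
{
    return $root
        . '/' . substr($hash, 0, 2)    // first level: characters 1-2
        . '/' . substr($hash, 2, 2)    // second level: characters 3-4
        . '/' . $hash . '.' . $ext;
}

$path = sharded_path('uploaded_files', '30c567139b64ee14c80cc5f5006d8081', 'pdf');
// $path is 'uploaded_files/30/c5/30c567139b64ee14c80cc5f5006d8081.pdf'
```

In real use the hash would come from md5_file() on the uploaded file, and the directories would need to be created (for example with mkdir() in recursive mode) before the file is moved.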
Try this file format:
$filename = microtime(true) . $username . '.jpg';
I think it would be good for you.
<?php
$name=uniqid(mt_rand()).$image_name;
?>
You should try to meet two goals: Uniqueness, and usefulness.
Using a GUID guarantees uniqueness, but one day the files may become detached from their original source, and then you will be in trouble.
My typical solution is to embed crucial information into the filename, such as the userID (if it belongs to a user) or the date and time uploaded (if this is significant), or the filename used when uploading it.
This may really save your skin one day, when the information embedded in the filename allows you to, for example, recover from a bug, or the accidental deletion of records. If all you have is GUIDs, and you lose the catalogue, you will have a heck of a job cleaning that up.
For example, if a file "My Holiday: Florida 23.jpg" is uploaded, by userID 98765, on 2013/04/04 at 12:51:23 I would name it something like this, adding a random string ad8a7dsf9:
20130404125123-ad8a7dsf9-98765-my-holiday-florida-23.jpg
Uniqueness is ensured by the date and time and the random string (provided it is properly random, from /dev/urandom or CryptGenRandom).
If the file is ever detached, you can identify the user, the date and time, and the title.
Everything is folded to lower case and anything non-alphanumeric is removed and replaced by dashes, which makes the filename easy to handle using simple tools (e.g. no spaces which can confuse badly written scripts, no colons or other characters which are forbidden on some filesystems, and so on).
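A name in that style could be produced with something like the sketch below. descriptive_name() is a hypothetical helper, and random_bytes() (PHP 7+) is used as the source of randomness, which is an assumption on top of the answer.

```php
<?php
// Sketch: timestamp-random-userid-slug.ext, as in the example name above.
// descriptive_name() is a hypothetical helper.
function descriptive_name(string $original, int $userId, int $timestamp): string
{
    $info   = pathinfo($original);
    $slug   = strtolower($info['filename']);                        // fold to lower case
    $slug   = trim(preg_replace('/[^a-z0-9]+/', '-', $slug), '-');  // non-alphanumerics -> dashes
    $random = bin2hex(random_bytes(5));                             // 10 hex chars from the OS CSPRNG
    $name   = date('YmdHis', $timestamp) . "-$random-$userId-$slug";
    return isset($info['extension'])
        ? $name . '.' . strtolower($info['extension'])
        : $name;
}

$stored = descriptive_name('My Holiday: Florida 23.jpg', 98765, time());
```

Passing the timestamp in as a parameter keeps the function easy to test and lets the caller store the same value in the database.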
Something like this could work for you:
while (file_exists('/uploads/' . $filename . '.jpeg')) {
$filename .= rand(10, 99);
}
Ready-to-use code:
$file_ext = substr($file['name'], -4); // e.g.'.jpg', '.gif', '.png', 'jpeg' (note the missing leading point in 'jpeg')
$new_name = sha1($file['name'] . uniqid('',true)); // this will generate a 40-character-long random name
$new_name .= ((substr($file_ext, 0, 1) != '.') ? ".{$file_ext}" : $file_ext); //the original extension is appended (accounting for the point, see comment above)
The issue was saving file uploads locally, and trying to find a nice way to handle duplicate file names.
This algorithm is not scalable. Uploading n files with the same name will cause O(n) behavior in this algorithm, leading to O(n²) total running time, including O(n²) filesystem accesses. That's not pretty for a server app. It also can't be fixed because of how filesystems work.
Better solutions:
Store filenames that have already been used in a DB table, mapping them to their use count.
Put a high-granularity timestamp in the filename.
Use the SHA1 (or MD5) hash of the contents as the filename. This also prevents duplicate files being uploaded, if that's important.
Use a database to map filenames back to human-readable names, if necessary.
A simple solution is to attach a timestamp in the form YYYYMMDDHHMMSS. You won't get conflicts unless two files with the same name arrive within the same second ;)
Its time complexity is also very low.
Another thing you can do: skip the name check entirely and put the timestamp into the file's name instead. For example,
if you are uploading "1.jpg",
just save it as 1(timestamp).jpg, so that you don't even need to iterate through the file system. Hope it helps.
For example, in PHP:
$timestamp = date("YmdHis"); // "H" gives a zero-padded 24-hour value, so the length stays fixed
it will generate something like
20111122193631
;)
I've made my own solution. Here it is:
function recursive_increment_filename ($path, $filename)
{
    $test = "{$path}/{$filename}";
    if (!is_file($test)) return $test;
    $file_info = pathinfo($filename);
    $part_filename = $file_info['filename'];
    if (preg_match ('/(.*)_(\d+)$/', $part_filename, $matches))
    {
        $num = (int)$matches[2] +1;
        $part_filename = $matches[1];
    }
    else
    {
        $num = 1;
    }
    $filename = $part_filename.'_'.$num;
    if (array_key_exists('extension', $file_info))
    {
        $filename .= '.'.$file_info['extension'];
    }
    return recursive_increment_filename($path, $filename);
}
$url = realpath(dirname(__FILE__));
$file = 'test.html';
$fn = recursive_increment_filename($url, $file);
echo $fn;
I am working on a PHP site where users upload images. I have to rename each file to prevent conflicts between image names. I am thinking of using:
uniqid(rand(), true);
and adding a large random number after it.
Will this work reliably? Any suggestions?
It's about generating unique names for the images.
Function tempnam() creates a file with a unique name.
Take an md5 of the file and use that. An md5 hash has 2^128 possible values, so an accidental collision between two different files is extremely unlikely. If that's not enough, prefix it with the timestamp expressed in seconds or milliseconds. That way, even if a duplicate md5 is generated, the files would have to come in during the same second/millisecond for a collision.
You can use Base36 on the AutoIncrement value from a SQL Table (hoping that you do use a SQL table).
$filename = base_convert($last_insert_id, 10, 36);
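To show what that produces, here is a tiny sketch with a made-up id. The value 123456789 and the '.jpg' extension are only examples; in practice the id would come from the database after the insert.

```php
<?php
// Sketch: turn an auto-increment id into a short base-36 file name.
// $lastInsertId is a made-up example value.
$lastInsertId = 123456789;
$filename = base_convert((string) $lastInsertId, 10, 36) . '.jpg';
// $filename is '21i3v9.jpg'
```

Because the database never hands out the same id twice, the names cannot repeat, and they stay short: nine decimal digits fit in six base-36 characters here.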
You have two approaches, depending on how big your image library can get:
1. For a modest number of files I do this:
<?php
$file_extension = '.' . pathinfo($file, PATHINFO_EXTENSION); // take the extension before sanitizing
$file = sanitize_file($file); // helper not shown: removes all characters outside [a-z0-9_] for safe URL linking
$file_md5 = md5($file);
// since I assume the file belongs to someone, you can do this
$file_name = $user_id . $file_md5 . $file_extension;
// then save the file
?>
2. For a huge number of files, another option is CacheMogul. Here you need to use your imagination, but it does nice sharding, so you don't need to worry about a folder's maximum file count or size.