algorithm to name files with no probability of repetition - php

Can someone suggest a complex algorithm in PHP to name uploaded files so that the names never repeat? I wonder how YouTube, which has millions of videos, does it.
Right now I take a random number, compute a 16-character SHA-1 hash of it, and use that as the file name, but I'm pretty sure it will eventually repeat and cause an error, because the file won't be able to be saved to the file system.
something like:
$name = sha1(substr(sha1(md5($randomnumber)),0,10));
Somebody once told me that it's impossible to break the hash generated by this code, or at least that it would take 100 years to break it.

you could do:
$uniq = md5(uniqid(rand(), true));
You could also prepend the user ID of the user uploading the file, like:
$uniq = $user_id_of_uploader."_".md5(uniqid(rand(), true));

Generate a GUID (sometimes called a UUID) using a pre-existing implementation. Depending on the version, GUIDs are built from the machine's identity, a timestamp, a counter for GUIDs generated within the same timestamp, or random bits, so in practice they will not repeat.
If generating a GUID isn't an option, running sha1 over the entire input and keeping its entire output is second best.
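As a rough illustration, a version 4 (random) UUID can be built in a few lines. This is a minimal sketch, assuming PHP 7 or later so that random_bytes() is available; the helper name uuid_v4 is hypothetical. Note that version 4 relies on random bits rather than on machine identity or timestamps, so a repeat is extremely unlikely rather than strictly impossible.

```php
<?php
// Hypothetical helper: builds an RFC 4122 version 4 UUID from 16 random bytes.
function uuid_v4(): string
{
    $bytes = random_bytes(16);                        // OS-level CSPRNG (PHP 7+)
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);  // set the version field to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);  // set the RFC 4122 variant bits
    // 32 hex characters split into 8 groups of 4, laid out as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

// Example: use the UUID as the stored file name (the extension is an assumption).
$name = uuid_v4() . '.jpg';
```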

$name = 'filename' . $user_id . md5((string) microtime(true)) . '.extension'; // include $user_id only if available
Try to remove special characters and white spaces from the file name.
If you are saving the name in a database, a recursive function can be helpful.

Do the following with proper methods:
First split the extension off the filename.
Trim the filename.
Collapse multiple spaces into a single space.
Replace special characters and whitespace with _.
Prefix it with the current timestamp (using strtotime) and a salt from md5(uniqid(rand(), true)), separated by _ (Thanks to #Sudhir )
Suffix it with a special signature using str_pad, and limit the length of the file name.
Finally, join the formatted file name and the extension again.
Hope it makes sense.
Thanks
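The steps above can be sketched roughly as follows. The function name build_upload_name, the 50-character limit and the "x" padding used as the "signature" are all assumptions made for illustration; the answer does not specify them.

```php
<?php
// Hypothetical helper following the listed steps; limits and padding are assumptions.
function build_upload_name(string $original, int $maxLength = 50): string
{
    $info = pathinfo($original);                            // split extension and filename
    $ext  = isset($info['extension']) ? strtolower($info['extension']) : '';
    $base = trim($info['filename']);                        // trim the filename
    $base = preg_replace('/\s+/', ' ', $base);              // collapse repeated whitespace
    $base = preg_replace('/[^A-Za-z0-9]+/', '_', $base);    // special characters and spaces -> _
    $base = trim($base, '_');                               // no leading or trailing underscores
    $base = substr($base, 0, $maxLength);                   // limit the length
    $base = str_pad($base, 8, 'x');                         // pad very short names
    $salt = md5(uniqid((string) rand(), true));             // salt, as suggested in the answer
    $name = strtotime('now') . '_' . $salt . '_' . $base;   // timestamp and salt prefix
    return $ext === '' ? $name : $name . '.' . $ext;        // re-attach the extension
}
```

Calling build_upload_name('My  Photo (1).JPG') yields something of the form <timestamp>_<32 hex characters>_My_Photo_1.jpg.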

I usually just generate a string for the filename (implementation is not incredibly important), then check if a file already exists with that name. If so, append a counter to it. If you somehow have a lot of files with the same base filename, this could be inefficient, but assuming your string is unique enough, it shouldn't happen very often. There's also the overhead of checking that the file exists.
$base_name = generate_some_random_string(); // use whatever method you like
$extension = '.jpg'; // Change as necessary
$file_name = $base_name . $extension;
$i = 0;
while (file_exists($file_name)) {
$file_name = $base_name . $i++ . $extension;
}
/* insert code to save the file as $file_name */


How to make a name of file in such way 31f3207.jpeg [duplicate]

How do I automatically give a file a new name when it is uploaded to the server?
I upload a picture file with <input type="file" />. The file has the name picture_1.jpg, but I'd like to store it in the filesystem with a name like ec0b4c5173809b6aa534631f3207.jpg. How are such new names created? Is some special script/generator used to make such names?
For example picture in FB: http:// a3.sphotos.ak.fbcdn.net/hphotos-ak-snc6/190074_10150167009837952_8062627951_8311631_4439729_n.jpg.
The name of it is 190074_10150167009837952_8062627951_8311631_4439729_n.jpg. But original name was different for sure. So I'd like to change the name of uploaded file the same way. How is it possible?
In PHP I use uniqid() for naming files that I store from uploads
If you want an extra long or seemingly more random id, you can sha1() or md5() the uniqid to create a hash.
You could of course also use those two methods to create a hash of the filename.
For example, the following code can be used to generate a new name for a file
$file = 'somefile.jpg';
$new_name = uniqid();
$new_name .= '.'.pathinfo($file,PATHINFO_EXTENSION);
You didn't mention what programming language you are using.
In PHP, that cryptic name and the original filename can be found in the $_FILES array.
Let's assume your form's element name is userfile: you can get the cryptic name from basename($_FILES['userfile']['tmp_name']) and the original name from basename($_FILES['userfile']['name']).
Visit here for more information :
http://www.php.net/manual/en/features.file-upload.post-method.php
You could MD5 the name to create a string without spaces or dots: md5($filename) (http://php.net/md5) although I'm not sure that was what you were asking?
Edit: you could also use uniqid() (http://php.net/manual/en/function.uniqid.php)
You just rewrite the name of the file upon moving it from the temporary files on the server, to the location you want to move it to.
move_uploaded_file($_FILES['userfile']['tmp_name'], $filePN)
Where $filePN is the path and name of where you want to move the file to.
The special script that you're talking about, however, can be a multitude of things, from an MD5 hash of the input name to an incremented number to prevent overwrites.
Looks like a GUID.
If you are an ASP.NET programmer, use something like this:
string filename = Guid.NewGuid().ToString() + ".jpg"; // new Guid() would return the all-zero GUID

File naming convention and allowed characters between different OS

I'm writing a piece of code in PHP for saving email attachments. Can I assume that this will never fail because of differences in allowed characters between operating systems?
foreach($message->attachments as $a)
{
// Make dir if not exists
$dir = __DIR__ . "/saved/$uid"; // Message id
if (!file_exists($dir)) mkdir($dir) or die("Cannot create: $dir");
// Save the attachment using original!!! filename as found in email
$fp = fopen($dir . '/' . $a->filename, 'w+');
fwrite($fp, $a->data);
fclose($fp);
}
You should never use a name that you have no control over, it can contain all sorts of characters, like ../../...
You can use a function like basename to clean it up and a constant like DIRECTORY_SEPARATOR to separate directories.
Personally I would rename the file but you can also filter the variables before using them.
It is good practice to replace certain characters that may occur in filenames on Windows.
Unix can handle almost any character in a file name (except "/" and 0x00, the null character), but to prevent encoding problems and difficulties when downloading a file I would suggest replacing anything that does not match
/[A-Za-z0-9._-]/, which corresponds to the POSIX portable filename character set.
So preg_replace('/[^A-Za-z0-9._-]/', '_', $filename); will do a good job. (PHP's preg functions have no g flag; preg_replace already replaces every match.)
A more generous approach would be to replace only |\?*<":>+[]\x00/ which leaves special language characters like öäü untouched and is compatible with FAT32, NTFS, any Unix and Mac OS X.
In that case use preg_replace('/[|\\\\?*<":>+\[\]\/\x00]/', '_', $filename);
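Putting the stricter variant together with basename() gives a small sanitizer. This is a sketch under stated assumptions: the function name sanitize_filename and the fallback name "unnamed" are hypothetical, and each byte outside the allowed set becomes one underscore.

```php
<?php
// Hypothetical helper: drop any directory part, then replace everything outside
// the POSIX portable filename character set (A-Z a-z 0-9 . _ -) with "_".
function sanitize_filename(string $filename): string
{
    // Treat Windows-style separators like "/" so basename() strips them too.
    $filename = basename(str_replace('\\', '/', $filename));
    $clean = preg_replace('/[^A-Za-z0-9._-]/', '_', $filename);
    // Reject names that are empty or consist only of dots ("." and "..").
    return trim($clean, '.') === '' ? 'unnamed' : $clean;
}
```

With this, an attachment named ../../etc/passwd is stored as passwd inside the target directory instead of escaping it.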
NO, you should assume that this will have a high probability of failing. for 2 reasons:
what if 2 emails have files named the same (selfie.jpg, for example)?
what if filename contains unacceptable characters?
you should use an internal naming convention (user+datetime+sequential, for example) and save names in a MySQL table with at least 3 fields:
Id - Autonumbered
filename - as saved by your php
original name - as in the original email
optional username - or usercode or email address of whomever sent email
optional datetime stamp
save the original filename as a VARCHAR and you will be able to keep track of original name and even show it, search for it, etc.

Create Unique Image Names

What's a good way to create a unique name for an image that my user is uploading?
I don't want to have any duplicates so something like MD5($filename) isn't suitable.
Any ideas?
as it was mentioned, i think that best way to create unique file name is to simply add time(). that would be like
$image_name = time()."_".$image_name;
Grab the file extension from uploaded file:
$ext = pathinfo($uploaded_filename, PATHINFO_EXTENSION);
Grab the time to the second: time()
Grab some randomness: md5(microtime())
Convert time to base 36: base_convert (time(), 10, 36) - base 36 compresses a 10 byte string down to about 6 bytes to allow for more of the random string to be used
Send the whole lot out as a 16 char string:
$unique_id = substr( base_convert( time(), 10, 36 ) . md5( microtime() ), 0, 16 ) . '.' . $ext; // pathinfo() returns the extension without the dot
I doubt that will ever collide - you could even not truncate it if you don't mind very long file names.
If you actually need a filename (it's not entirely clear from your question) I would use tempnam(), which:
Creates a file with a unique filename, with access permission set to 0600, in the specified directory.
...and let PHP do the heavy lifting of working out uniqueness. Note that as well as returning the filename, tempnam() actually creates the file; you can just overwrite it when you drop the image file there.
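A minimal sketch of the tempnam() approach follows. The target directory and the "img" prefix are assumptions for illustration. Keep in mind that the generated name does not carry the image's own extension.

```php
<?php
// tempnam() both picks a unique name and creates an empty placeholder file.
$dir  = sys_get_temp_dir();        // assumption: replace with your upload directory
$path = tempnam($dir, 'img');      // 'img' is an arbitrary prefix
if ($path === false) {
    throw new RuntimeException("Could not create a unique file in $dir");
}
// The placeholder can now be overwritten with the uploaded image, e.g.
// move_uploaded_file($_FILES['userfile']['tmp_name'], $path);
```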
You could take a hash (e.g., md5, sha) of the image data itself. That would help identify duplicate images too (if it was byte-for-byte, the same). But any sufficiently long string of random characters would work.
You can always rig it up in a way that the file name looks like:
/image/0/1/012345678/original-name.jpg
That way the file name looks normal, but it's still unique.
I'd recommend sha1_file() over md5_file(). It's less prone to collisions.
You could also use hash_file('sha256', $filePath) to get even better results.
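For instance, a content-based name could be derived like this. It is only a sketch: content_hash_name is a hypothetical helper, and the extension is passed in by the caller rather than detected.

```php
<?php
// Hypothetical helper: name a file after the SHA-256 hash of its contents,
// so byte-for-byte identical uploads map to the same name.
function content_hash_name(string $path, string $extension): string
{
    $hash = hash_file('sha256', $path);   // 64 hex characters
    if ($hash === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $hash . '.' . $extension;
}
```

Because identical files produce identical names, an existing file with that name can be treated as a duplicate upload rather than a conflict.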
http://php.net/manual/en/function.uniqid.php maybe?
You can prefix it with the user id to avoid collisions between two users who upload within the same microsecond.
For short names:
$i = 0;
while(file_exists($name . '_' . $i)){
$i++;
}
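WARNING: this can fail on a server handling concurrent requests if two users upload an image with the same name at the same time, because both requests can pass the file_exists() check before either file is written.
In that case you should include the md5 of the username.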
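One way to sidestep the check-then-write race is to let the filesystem do the check: fopen() with mode "x" creates the file only if it does not already exist. The sketch below assumes that approach; create_unique_file and the 1000-attempt cap are hypothetical choices, and any other fopen() failure (for example missing permissions) also counts as a failed attempt.

```php
<?php
// Hypothetical helper: try base.ext, base_1.ext, base_2.ext, ... and create the
// first one that is free. Mode "x" makes "check" and "create" a single step.
function create_unique_file(string $dir, string $base, string $ext): array
{
    for ($i = 0; $i < 1000; $i++) {
        $suffix = $i === 0 ? '' : '_' . $i;
        $path   = $dir . '/' . $base . $suffix . '.' . $ext;
        $handle = @fopen($path, 'x');    // returns false if $path already exists
        if ($handle !== false) {
            return [$path, $handle];     // caller writes the data and closes the handle
        }
    }
    throw new RuntimeException("No free name for $base in $dir");
}
```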
lol, MD5 can produce 2^128 (about 3.4 × 10^38) different values.
Plus, just to be on the safe side, you could use:
$newfilename = md5(time().'image');
if(file_exists('./images/'.$newfilename)){
$newfilename = md5(time().$newfilename);
}
//uploadimage
How big is the probability of two users uploading an image with the same name in the same microsecond?
try
$currTime = microtime(true);
$finalFileName = cleanTheInput($fileName)."_".$currTime;
// you can also append "_" . rand(0, 1000) at the end to make a name collision even less likely
function cleanTheInput($input)
{
// do some formatting here ...
}
This would also help you track the upload time of the file for analysis, or sort and manage the files.
For good performance and uniqueness you can use approach like this:
files will be stored on a server with names like md5_file($file).jpg
the directory to store the file in is derived from the md5 file name: the first two characters give the first level and the next two the second level, like this:
uploaded_files\30\c5\30c567139b64ee14c80cc5f5006d8081.pdf
create record in database with file_id, original file name, uploaded user id, and path to file on server
on the server side, create a script that serves the downloads: it looks the file up in the DB by id and outputs its content with the original filename provided by the user (see the PHP example of the CodeIgniter download_helper). The URL to a file will then look like this:
http://site.com/download.php?file=id
Pros:
minimal risk of collisions
good performance at file lookup (not many files in one directory, not many directories at the same level)
original file names are saved
you can adjust access to files by server side script (check session or cookies)
Cons:
Only suitable for small files, because the server has to read the file into memory before the user can download it
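The directory layout described in this answer can be computed as follows. This is a minimal sketch: sharded_path is a hypothetical helper, and it uses "/" as the separator, which PHP accepts on Windows as well.

```php
<?php
// Hypothetical helper: derive a two-level directory from the first four hex
// characters of the file's MD5 and keep the full hash as the file name.
function sharded_path(string $root, string $md5, string $ext): string
{
    $level1 = substr($md5, 0, 2);   // first two characters: first level
    $level2 = substr($md5, 2, 2);   // next two characters: second level
    return $root . '/' . $level1 . '/' . $level2 . '/' . $md5 . '.' . $ext;
}

$path = sharded_path('uploaded_files', '30c567139b64ee14c80cc5f5006d8081', 'pdf');
// $path is "uploaded_files/30/c5/30c567139b64ee14c80cc5f5006d8081.pdf"
```

In practice the hash would come from md5_file() on the uploaded file, and the directories would be created with mkdir($dir, 0755, true) before the file is moved.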
try this file format:
$filename = microtime(true) . $username . '.jpg';
I think it would be good for you.
<?php
$name=uniqid(mt_rand()).$image_name;
?>
You should try to meet two goals: Uniqueness, and usefulness.
Using a GUID guarantees uniqueness, but one day the files may become detached from their original source, and then you will be in trouble.
My typical solution is to embed crucial information into the filename, such as the userID (if it belongs to a user) or the date and time uploaded (if this is significant), or the filename used when uploading it.
This may really save your skin one day, when the information embedded in the filename allows you to, for example, recover from a bug, or the accidental deletion of records. If all you have is GUIDs, and you lose the catalogue, you will have a heck of a job cleaning that up.
For example, if a file "My Holiday: Florida 23.jpg" is uploaded, by userID 98765, on 2013/04/04 at 12:51:23 I would name it something like this, adding a random string ad8a7dsf9:
20130404125123-ad8a7dsf9-98765-my-holiday-florida-23.jpg
Uniqueness is ensured by the date and time plus the random string (provided it is properly random, from /dev/urandom or CryptGenRandom).
If the file is ever detached, you can identify the user, the date and time, and the title.
Everything is folded to lower case and anything non-alphanumeric is removed and replaced by dashes, which makes the filename easy to handle using simple tools (e.g. no spaces which can confuse badly written scripts, no colons or other characters which are forbidden on some filesystems, and so on).
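A naming scheme like this could be sketched as below. The helper name descriptive_name and the 10-character random part are assumptions; random_bytes() (PHP 7+) draws from the operating system's secure random source.

```php
<?php
// Hypothetical helper producing names of the form
// <YmdHis>-<random>-<userId>-<slug>.<ext>
function descriptive_name(string $original, int $userId, int $uploadedAt): string
{
    $info = pathinfo($original);
    $slug = strtolower($info['filename']);                         // fold to lower case
    $slug = trim(preg_replace('/[^a-z0-9]+/', '-', $slug), '-');   // non-alphanumerics -> dashes
    $random = bin2hex(random_bytes(5));                            // 10 hex characters
    $ext = isset($info['extension']) ? '.' . strtolower($info['extension']) : '';
    return date('YmdHis', $uploadedAt) . "-$random-$userId-$slug$ext";
}
```

For the example above, descriptive_name('My Holiday: Florida 23.jpg', 98765, $uploadTime) gives a name ending in -98765-my-holiday-florida-23.jpg.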
Something like this could work for you:
while (file_exists('/uploads/' . $filename . '.jpeg')) {
$filename .= rand(10, 99);
}
Ready-to-use code:
$file_ext = substr($file['name'], -4); // e.g.'.jpg', '.gif', '.png', 'jpeg' (note the missing leading point in 'jpeg')
$new_name = sha1($file['name'] . uniqid('',true)); // this will generate a 40-character-long random name
$new_name .= ((substr($file_ext, 0, 1) != '.') ? ".{$file_ext}" : $file_ext); //the original extension is appended (accounting for the point, see comment above)

Duplicate filename protection by incrementation

The issue was saving file uploads locally, and trying to find a nice way to handle duplicate file names.
This algorithm is not scalable. Uploading n files with the same name will cause O(n) behavior in this algorithm, leading to O(n²) total running time, including O(n²) filesystem accesses. That's not pretty for a server app. It also can't be fixed because of how filesystems work.
Better solutions:
Store filenames that have already been used in a DB table, mapping them to their use count.
Put a high-granularity timestamp in the filename.
Use the SHA1 (or MD5) hash of the contents as the filename. This also prevents duplicate files being uploaded, if that's important.
Use a database to map filenames back to human-readable names, if necessary.
The best solution is to just attach a timestamp in the form YYYYMMDDHHMMSS. As long as two files with the same name don't arrive within the same second, you won't get conflicts throughout your whole life ;)
Its time complexity is also very low.
Another thing you can do: skip the name check entirely and instead build the name from the file's name, e.g.
"1.jpg" if you are uploading
just attach 1(timestamp).jpg, so that you don't even need to iterate through the file system. Hope it helps.
E.g. in PHP
$timestamp = date("YmdHis"); // "H" gives a zero-padded 24-hour value, so the length stays fixed
it will generate something like
20111122193631
;)
I've made my own solution. Here it is:
function recursive_increment_filename ($path, $filename)
{
$test = "{$path}/{$filename}";
if (!is_file($test)) return $test;
$file_info = pathinfo($filename);
$part_filename = $file_info['filename'];
if (preg_match ('/(.*)_(\d+)$/', $part_filename, $matches))
{
$num = (int)$matches[2] +1;
$part_filename = $matches[1];
}
else
{
$num = 1;
}
$filename = $part_filename.'_'.$num;
if (array_key_exists('extension', $file_info))
{
$filename .= '.'.$file_info['extension'];
}
return recursive_increment_filename($path, $filename);
}
$url = realpath(dirname(__FILE__));
$file = 'test.html';
$fn = recursive_increment_filename($url, $file);
echo $fn;

Generate unique names?

I am working on a PHP site where users upload images. I have to rename each file to prevent conflicts between image names. I'm using
uniqid(rand(), true);
and adding a large random number after it.
Will this work reliably? Any suggestions?
It's about generating unique names for the images.
Function tempnam() creates a file with a unique name.
Take an md5 of the file and use that. MD5 produces a 128-bit value, so the odds of any given pair of different files colliding by accident are about 1 in 2^128. If that's not enough, prefix it with the timestamp expressed in seconds or milliseconds. That way, even if a duplicate md5 is generated, the files would have to come in during the same second/millisecond for a collision.
You can use Base36 on the AutoIncrement value from a SQL Table (hoping that you do use a SQL table).
$filename = base_convert($last_insert_id, 10, 36);
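A quick worked example of the conversion. The ID value is a stand-in; in real code it would come from the database, for example via PDO::lastInsertId(). Names built this way are short but sequential, so they are easy to guess.

```php
<?php
// Stand-in for the auto-increment value returned by the database.
$last_insert_id = 123456;
// 123456 = 2*36^3 + 23*36^2 + 9*36 + 12, i.e. the base-36 digits 2, n, 9, c.
$filename = base_convert((string) $last_insert_id, 10, 36) . '.jpg';
// $filename is "2n9c.jpg"
```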
You have two approaches, depending on how big your image library can get:
1. For a small number of files I do this:
<?php
$file_extension = '.' . pathinfo($file, PATHINFO_EXTENSION); // taken before sanitizing, which would strip the dot
$file = sanitize_file($file); // remove all non-[a-z0-9_] characters for safe URL linking
$file_md5 = md5($file);
// since I assume the file belongs to someone, you can do this
$file_name = $user_id . $file_md5 . $file_extension;
// then save the file
?>
option.... CacheMogul. Here you need to use your imagination. but for huge amount of files this does a nice sharding so you dont need to worry about a folder max quantity or size
