Duplicate filename protection by incrementation

The issue was saving file uploads locally, and trying to find a nice way to handle duplicate file names.

This algorithm is not scalable. Uploading n files with the same name will cause O(n) behavior in this algorithm, leading to O(n²) total running time, including O(n²) filesystem accesses. That's not pretty for a server app. It also can't be fixed because of how filesystems work.
Better solutions:
Store filenames that have already been used in a DB table, mapping them to their use count.
Put a high-granularity timestamp in the filename.
Use the SHA1 (or MD5) hash of the contents as the filename. This also prevents duplicate files being uploaded, if that's important.
Use a database to map filenames back to human-readable names, if necessary.

The simplest solution is to attach a timestamp in the form YYYYMMDDHHMMSS. Conflicts are then very unlikely: they can only happen if two files with the same name are uploaded within the same second ;)
It is also cheap, because no lookup is needed.
Another thing you can do is skip the name check entirely. For example, if you are uploading "1.jpg", just save it as 1(timestamp).jpg, so that you don't even need to iterate through the file system. Hope it helps.
For example, in PHP:
$timestamp = date("YmdHis");
will generate something like
20111122193631
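Here is a minimal sketch of that timestamp idea, assuming a hypothetical helper called timestamped_name() (the name and signature are mine, not from the answer). It uses the H format character, which is always two digits, so the stamp has a fixed width of 14 digits. Two uploads with the same name in the same second would still collide.

```php
<?php
// Hypothetical helper: put a timestamp between the base name and the extension.
function timestamped_name(string $original, ?int $now = null): string
{
    $now = $now ?? time();
    $info = pathinfo($original);
    // 'H' is the zero-padded 24-hour value, so the stamp is always 14 digits.
    $stamp = date('YmdHis', $now);
    $name = $info['filename'] . '_' . $stamp;
    if (isset($info['extension'])) {
        $name .= '.' . $info['extension'];
    }
    return $name;
}
```

Calling timestamped_name('1.jpg') gives a name of the form 1_(timestamp).jpg without touching the file system.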

I've made my own solution. Here it is:
function recursive_increment_filename ($path, $filename)
{
    $test = "{$path}/{$filename}";
    if (!is_file($test)) return $test;
    $file_info = pathinfo($filename);
    $part_filename = $file_info['filename'];
    if (preg_match ('/(.*)_(\d+)$/', $part_filename, $matches))
    {
        $num = (int)$matches[2] +1;
        $part_filename = $matches[1];
    }
    else
    {
        $num = 1;
    }
    $filename = $part_filename.'_'.$num;
    if (array_key_exists('extension', $file_info))
    {
        $filename .= '.'.$file_info['extension'];
    }
    return recursive_increment_filename($path, $filename);
}
$url = realpath(dirname(__FILE__));
$file = 'test.html';
$fn = recursive_increment_filename($url, $file);
echo $fn;

Related

Case Insensitive file_exists for large list of files

I have a server folder with a large number of files, randomly named with a guid value (example file: c3c1a48e-a798-41bd-bd70-66ffdc619963.jpg ).
I need to do a case-insensitive search of that folder, as there might be an uppercase (or mixed case) version of the same filename. (I cannot convert existing files to all lowercase file names.)
The answers to the question "PHP Case Insensitive Version of file_exists()" provide a function (shown below) that globs the entire folder into an array, then loops over every item in the array with foreach.
This would seem to be a bit slow/inefficient, especially when searching a folder with many (thousands) of files.
Is there a more efficient way to do a case-insensitive filename search? Or is the use of the foreach loop - as shown in the below code - 'efficient enough'?
(This is the code recommended by the above question)
function fileExists($fileName, $caseSensitive = true) {
    if(file_exists($fileName)) {
        return $fileName;
    }
    if($caseSensitive) return false;
    // Handle case insensitive requests
    $directoryName = dirname($fileName);
    $fileArray = glob($directoryName . '/*', GLOB_NOSORT);
    $fileNameLowerCase = strtolower($fileName);
    foreach($fileArray as $file) {
        if(strtolower($file) == $fileNameLowerCase) {
            return $file;
        }
    }
    return false;
}

Short answer: no. In your current situation, it looks like you have to use that logic. However, you could do a one-time cleanup: take the files that have capital letters in their names, use copy($filename, strtolower($filename)) in the folder to create lowercase versions, and then remove the old files with capital letters. From then on, apply strtolower($new_file_name) before adding any new file to the system. I agree with you that the glob logic seems slow, especially with thousands of files.
This contradicts your statement that you cannot rename or convert the file names, but once it is done, you would never have to rename them again.
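If many lookups happen in one request, one option is to read the directory once and keep a lowercase index in memory, so each lookup is an array access instead of a loop. This is only a sketch; the function names build_lowercase_index() and find_case_insensitive() are made up for illustration, and if two files differ only by case, the one listed last wins.

```php
<?php
// Build a map of lowercase file name => actual file name for one directory.
function build_lowercase_index(string $dir): array
{
    $index = [];
    $entries = scandir($dir);
    if ($entries === false) {
        return $index; // directory could not be read
    }
    foreach ($entries as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $index[strtolower($entry)] = $entry;
    }
    return $index;
}

// Look up a name regardless of case; returns the real name or false.
function find_case_insensitive(array $index, string $name)
{
    return $index[strtolower($name)] ?? false;
}
```

The directory is still scanned once per index build, so this only pays off when the index is reused for several lookups or cached somewhere.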

Variable + Wildcard image removal

I want to create a piece of code that will remove images based on set parameters and a wildcard.
The number of images and naming will vary, though the first two parameters will remain as a constant.
// Example of images to be deleted / removed.
/images/1-50-variablename-A.jpg
/images/1-50-variablename-B.jpg
/images/1-50-variablename-C.jpg
/images/1-50-variablename-D.jpg
/images/1-50-variablename-E.jpg
Essentially I am after a loop to make this happen though I am not too sure on the best logic to make this happen.
$menuid = "1";
$imageid = "50";
$fileName = "images/".$menuid."-".$imageid."-*.jpg";
if (file_exists ($fileName)) {
unlink ($fileName);
}

You can use the php glob function (http://php.net/manual/fr/function.glob.php). Feed it with your pattern (it supports wildcards) and then iterate over the result and unlink each file.
Hope it helped

Solution presented itself in Glob form.
$menuid ="9999";
$imageid="5";
array_map('unlink', glob("../images/".$menuid."-".$imageid."-*.jpg"));
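A slightly more defensive variant of the same glob approach might look like the sketch below. The function name delete_matching_images() and its parameters are assumptions for illustration. It handles glob() returning false and counts how many files were actually removed.

```php
<?php
// Delete every file matching "<dir>/<menuId>-<imageId>-*.jpg"; returns how many were removed.
function delete_matching_images(string $dir, string $menuId, string $imageId): int
{
    $pattern = rtrim($dir, '/') . '/' . $menuId . '-' . $imageId . '-*.jpg';
    $deleted = 0;
    // glob() returns false on error, so fall back to an empty list.
    foreach (glob($pattern) ?: [] as $path) {
        if (is_file($path) && unlink($path)) {
            $deleted++;
        }
    }
    return $deleted;
}
```

For example, delete_matching_images('../images', '9999', '5') would remove the same files as the one-liner above.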

how to avoid trouble caused by duplicate picture name in php

All images are uploaded into an image folder under my PHP project, while a reference goes into a MySQL table. My concern is: what if two images have the same name? Is there a better way to avoid duplicate names? I know I can't control how users name their image files.

I usually do a combination of timestamp and a big random value (just in case):
So for example:
$filename = time() . rand(1000000,9999999) . strtolower($ext);
Where $ext is the extension, including the leading dot (whether it's .jpg, .png or whatever).
This is also more secure than accepting filenames from the user.
The reason for strtolower is that sometimes someone will upload something like IMAGE.JPG, so rather than counting on your server and all your scripts being case insensitive, you can simply make sure that all extensions are lowercase.

As VMai said before, a primary key is a good solution. But if you just want a solution to the same-name problem:
$filename = 'myfilename'; // without extension!
$extension = '.jpg';
$dir = '/directory/';
$fullPath = $dir.$filename.$extension;
$i = 1;
$newFilename = $filename;
while (file_exists($fullPath))
{
    $newFilename = $filename.'_'.$i;
    $i++;
    $fullPath = $dir.$newFilename.$extension;
}
Not tested, but you get the concept.

For security, and yes, to avoid duplicates, you could change the filename to your own format, for example:
$filename = sha1(time().$original_filename) or $filename = md5(time().$original_filename); it is up to you.
One advantage of changing the filename is that if an attacker uploads something, he will have a hard time finding it under its new name, on top of the security validations you already have in place.

algorithm to name files with no probability of repetition

Can someone suggest an algorithm in PHP to name uploaded files so that the name never repeats? I wonder how YouTube, which has millions of videos, does it.
Right now I take a random number, get its 16-character SHA1 hash, and name the file with that, but I'm pretty sure it will eventually repeat and cause an error because the file cannot be saved to the file system.
Something like:
$name = sha1(substr(sha1(md5($randomnumber)),0,10));
Somebody once told me that it's impossible to break the hash generated by this code, or at least that it would take 100 years to break it.

You could do:
$uniq = md5(uniqid(rand(), true));
You could also prepend the user id of the user uploading the file, like:
$uniq = $user_id_of_uploader."_".md5(uniqid(rand(), true));

Generate a GUID (sometimes called UUID) using a pre-existing implementation. Depending on the version, GUIDs are built from the computer's identity, a timestamp, a counter for GUIDs generated during that timestamp, or random bits, so in practice they will not repeat.
If generating a GUID isn't an option, applying sha1 to the entire input and using its entire output is second best.
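If no UUID library or extension is at hand, a random (version 4) UUID can be sketched with random_bytes(). Note that this is the random variant, not the computer-and-timestamp variant described above, and the function name uuid_v4() is just an illustrative choice.

```php
<?php
// Generate a random (version 4) UUID string from 16 cryptographically secure bytes.
function uuid_v4(): string
{
    $bytes = random_bytes(16);
    // Set the version (4) and the RFC 4122 variant bits.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    // 32 hex characters split into 8 groups of 4, laid out as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}
```

A file name could then be built as uuid_v4() . '.jpg'. With 122 random bits a repeat is not strictly impossible, only extremely improbable.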

$name = $filename . $user_id . md5(microtime(true)) . '.' . $extension; // include $user_id only if it is available
Try to remove special characters and whitespace from the file name.
If you are saving the name in a database, a recursive function can be helpful.

Do the following with the proper methods:
First split the name into extension and filename
Trim the filename
Collapse multiple spaces into a single space
Replace special characters and whitespace with _
Prefix with the current timestamp using strtotime and a salt using md5(uniqid(rand(), true)), separated by _ (thanks to #Sudhir)
Suffix with a special signature using str_pad and limit the text length of the file name
Finally, join the formatted file name and the extension again
Hope it makes sense.
Thanks
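The steps above can be sketched roughly as follows. The helper name build_safe_name() and the 50-character limit are assumptions for illustration; time() stands in for the timestamp, and the str_pad signature step is left out because the answer does not say what the signature should contain.

```php
<?php
// Hypothetical helper following the listed steps; the length limit is illustrative.
function build_safe_name(string $original, int $maxBaseLength = 50): string
{
    $info = pathinfo($original);
    $base = trim($info['filename']);
    // Turn every run of whitespace or special characters into one underscore.
    $base = preg_replace('/[^A-Za-z0-9]+/', '_', $base);
    $base = substr(trim($base, '_'), 0, $maxBaseLength);
    // Timestamp and salt prefix, separated by underscores.
    $salt = md5(uniqid((string) rand(), true));
    $name = time() . '_' . $salt . '_' . $base;
    if (!empty($info['extension'])) {
        $name .= '.' . strtolower($info['extension']);
    }
    return $name;
}
```

For an upload called "  My   Holiday: Photo!.JPG" this yields something of the form (timestamp)_(32 hex characters)_My_Holiday_Photo.jpg.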

I usually just generate a string for the filename (implementation is not incredibly important), then check if a file already exists with that name. If so, append a counter to it. If you somehow have a lot of files with the same base filename, this could be inefficient, but assuming your string is unique enough, it shouldn't happen very often. There's also the overhead of checking that the file exists.
$base_name = generate_some_random_string(); // use whatever method you like
$extension = '.jpg'; // Change as necessary
$file_name = $base_name . $extension;
$i = 0;
while (file_exists($file_name)) {
$file_name = $base_name . $i++ . $extension;
}
/* insert code to save the file as $file_name */

Create Unique Image Names

What's a good way to create a unique name for an image that my user is uploading?
I don't want to have any duplicates so something like MD5($filename) isn't suitable.
Any ideas?

As was mentioned, I think the best way to create a unique file name is to simply add time(). That would look like:
$image_name = time()."_".$image_name;

Grab the file extension from the uploaded file:
$ext = pathinfo($uploaded_filename, PATHINFO_EXTENSION);
Grab the time to the second: time()
Grab some randomness: md5(microtime())
Convert the time to base 36: base_convert(time(), 10, 36). Base 36 compresses the 10-character timestamp down to about 6 characters, which leaves room for more of the random string.
Send the whole lot out as a 16-character string followed by the extension (PATHINFO_EXTENSION returns it without the leading dot, so add one):
$unique_id = substr( base_convert( time(), 10, 36 ) . md5( microtime() ), 0, 16 ) . '.' . $ext;
I doubt that will ever collide. You could even skip the truncation if you don't mind very long file names.

If you actually need a filename (it's not entirely clear from your question) I would use tempnam(), which:
Creates a file with a unique filename, with access permission set to 0600, in the specified directory.
...and let PHP do the heavy lifting of working out uniqueness. Note that as well as returning the filename, tempnam() actually creates the file; you can just overwrite it when you drop the image file there.
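As a rough sketch of that idea, the helper below reserves a name with tempnam() and then moves a file over the placeholder. The function name store_with_tempnam() and the 'img_' prefix are assumptions; for a real HTTP upload you would normally use move_uploaded_file() instead of rename(). Keep in mind that names produced by tempnam() have no file extension.

```php
<?php
// Reserve a unique file in $dir with tempnam(), then move the source file over it.
function store_with_tempnam(string $sourcePath, string $dir, string $prefix = 'img_')
{
    $target = tempnam($dir, $prefix);
    if ($target === false) {
        return false; // no unique file could be created
    }
    // rename() replaces the empty placeholder file that tempnam() created.
    if (!rename($sourcePath, $target)) {
        unlink($target);
        return false;
    }
    return $target;
}
```

The returned path is what you would store in the database alongside the original, human-readable name.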

You could take a hash (e.g., md5, sha) of the image data itself. That would help identify duplicate images too (if it was byte-for-byte, the same). But any sufficiently long string of random characters would work.
You can always rig it up in a way that the file name looks like:
/image/0/1/012345678/original-name.jpg
That way the file name looks normal, but it's still unique.

I'd recommend sha1_file() over md5_file(). It's less prone to collisions.
You could also use hash_file('sha256', $filePath) to get even better results.
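A small sketch of naming a file after its content hash, assuming a hypothetical helper content_hash_name() and that the extension has already been validated elsewhere:

```php
<?php
// Name a file after the SHA-256 hash of its contents, with a lowercase extension.
function content_hash_name(string $path, string $extension): string
{
    $hash = hash_file('sha256', $path);
    if ($hash === false) {
        throw new RuntimeException("Cannot read {$path}");
    }
    // Accept the extension with or without a leading dot.
    return $hash . '.' . strtolower(ltrim($extension, '.'));
}
```

Identical uploads map to the same name, so a byte-for-byte duplicate simply overwrites (or can be skipped in favor of) the existing copy.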

http://php.net/manual/en/function.uniqid.php maybe?
You can prefix it with the user id to avoid collisions between two users uploading within the same microsecond.

For short names:
$i = 0;
while(file_exists($name . '_' . $i)){
$i++;
}
WARNING: this might fail on a multi-threaded server if two users upload an image with the same name at the same time.
In that case you should include the md5 of the username.
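Another way around the race in the warning is to let the file system do the check and the creation in one step. This is a different technique from the loop above: fopen() in 'x' mode creates the file and fails if it already exists, so two simultaneous requests cannot both get the same name. The function name claim_unique_path() and the retry limit are assumptions for the sake of the sketch.

```php
<?php
// Claim the first free name atomically; returns the claimed path or false.
function claim_unique_path(string $dir, string $base, string $ext, int $maxTries = 1000)
{
    for ($i = 0; $i < $maxTries; $i++) {
        $suffix = $i === 0 ? '' : '_' . $i;
        $path = $dir . '/' . $base . $suffix . '.' . $ext;
        // 'x' mode fails if the file exists; '@' hides the resulting warning.
        $handle = @fopen($path, 'x');
        if ($handle !== false) {
            fclose($handle);
            return $path; // an empty file now reserves this name
        }
    }
    return false;
}
```

The caller then writes the upload to the returned path, which already exists as an empty placeholder.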

lol MD5 can produce 2^128 (around 3.4 × 10^38) different values,
plus, just to be on the safe side, you could use:
$newfilename = md5(time().'image');
if(file_exists('./images/'.$newfilename)){
$newfilename = md5(time().$newfilename);
}
//uploadimage

How big is the probability of two users uploading an image with the same name in the same microsecond?
Try:
$currTime = microtime(true);
$finalFileName = cleanTheInput($fileName)."_".$currTime;
// you can also append "_".rand(0,1000) at the end to make name collisions even less likely
function cleanTheInput($input)
{
    // do some formatting here ...
    return $input;
}
This would also help you track the upload time of the file for analysis, or sort and manage the files.

For good performance and uniqueness you can use an approach like this:
Files are stored on the server with names like md5_file($file).jpg.
The directory to store each file in is derived from the MD5 file name: the first two characters give the first level and the next two give the second level, like this:
uploaded_files\30\c5\30c567139b64ee14c80cc5f5006d8081.pdf
Create a record in the database with the file_id, the original file name, the uploading user's id, and the path to the file on the server.
On the server side, create a script that handles downloads: it looks up the file by id in the db and outputs its content with the original filename provided by the user (see the PHP example in the CodeIgniter download_helper). The URL to a file will then look like this:
http://site.com/download.php?file=id
Pros:
minimal collision risk
good performance at file lookup (not many files in one directory, not many directories at the same level)
original file names are preserved
you can control access to files in the server-side script (check session or cookies)
Cons:
Only good for small files, because the server has to read the file into memory before the user can download it
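The two-level directory layout could be computed along these lines. The function name sharded_path() is hypothetical, and the sketch uses forward slashes and leaves creating the directories (for example with mkdir($dir, 0755, true)) to the caller.

```php
<?php
// Build "<root>/<first two hex chars>/<next two>/<md5>.<ext>" from a file's contents.
function sharded_path(string $root, string $file, string $ext): string
{
    $hash = md5_file($file);
    if ($hash === false) {
        throw new RuntimeException("Cannot read {$file}");
    }
    $level1 = substr($hash, 0, 2); // first directory level
    $level2 = substr($hash, 2, 2); // second directory level
    return $root . '/' . $level1 . '/' . $level2 . '/' . $hash . '.' . $ext;
}
```

With two hex characters per level there are at most 256 directories at each level, which keeps every directory listing short.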

try this file format:
$filename = microtime(true) . $username . '.jpg';

I think it would be good for you.
<?php
$name=uniqid(mt_rand()).$image_name;
?>

You should try to meet two goals: Uniqueness, and usefulness.
Using a GUID guarantees uniqueness, but one day the files may become detached from their original source, and then you will be in trouble.
My typical solution is to embed crucial information into the filename, such as the userID (if it belongs to a user) or the date and time uploaded (if this is significant), or the filename used when uploading it.
This may really save your skin one day, when the information embedded in the filename allows you to, for example, recover from a bug, or the accidental deletion of records. If all you have is GUIDs, and you lose the catalogue, you will have a heck of a job cleaning that up.
For example, if a file "My Holiday: Florida 23.jpg" is uploaded, by userID 98765, on 2013/04/04 at 12:51:23 I would name it something like this, adding a random string ad8a7dsf9:
20130404125123-ad8a7dsf9-98765-my-holiday-florida-23.jpg
Uniqueness is ensured by the date and time, and the random string (provided it is properly random, from /dev/urandom or CryptGenRandom).
If the file is ever detached, you can identify the user, the date and time, and the title.
Everything is folded to lower case and anything non-alphanumeric is removed and replaced by dashes, which makes the filename easy to handle using simple tools (e.g. no spaces which can confuse badly written scripts, no colons or other characters which are forbidden on some filesystems, and so on).
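A sketch of how such a name might be assembled in PHP, assuming a hypothetical helper descriptive_name(); random_bytes() draws from the operating system's secure random source, and the length of the random part is an arbitrary choice here.

```php
<?php
// Build "<YmdHis>-<random>-<userId>-<slug>.<ext>" from the upload's metadata.
function descriptive_name(string $original, int $userId, int $uploadedAt): string
{
    $info = pathinfo($original);
    // Fold to lower case and replace each run of non-alphanumerics with one dash.
    $slug = trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($info['filename'])), '-');
    $random = bin2hex(random_bytes(5)); // 10 hex characters
    $name = date('YmdHis', $uploadedAt) . '-' . $random . '-' . $userId . '-' . $slug;
    if (!empty($info['extension'])) {
        $name .= '.' . strtolower($info['extension']);
    }
    return $name;
}
```

For the example above, the result has the form 20130404125123-(random)-98765-my-holiday-florida-23.jpg, with the date part depending on the server's configured timezone.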

Something like this could work for you:
while (file_exists('/uploads/' . $filename . '.jpeg')) {
$filename .= rand(10, 99);
}

Ready-to-use code:
$file_ext = substr($file['name'], -4); // e.g.'.jpg', '.gif', '.png', 'jpeg' (note the missing leading point in 'jpeg')
$new_name = sha1($file['name'] . uniqid('',true)); // this will generate a 40-character-long random name
$new_name .= ((substr($file_ext, 0, 1) != '.') ? ".{$file_ext}" : $file_ext); //the original extension is appended (accounting for the point, see comment above)
