Upload image evenly into a directory structure

I'm sure this question has been asked thousands of times, so here goes my version...
I have a form that uploads images...
Every image gets a unique id. I use the following function to generate it:
function generateUnid($key) {
    $name = $_FILES[$key]['name']; // get image name from global variable $_FILES
    $ext = pathinfo($name, PATHINFO_EXTENSION); // get image extension
    $prefix = 'fc'; // prefix for unid
    do {
        $unid = uniqid($prefix, true); // generate a unid
        $filename = $unid . '.' . $ext; // replace image name with unid
        $path = PATH_UPLOAD_ARTWORK . $filename; // image path
    } while (file_exists($path)); // repeat while a file with that name already exists
    return $filename;
}
A sample of return values is:
fc4e7801523a04e6.06876802.jpg
So far so good. Now I want to create some sort of directory structure for my images, something like:
0
    0
    1
    2
        fc4e7801523a04e6.06876802.jpg
        ...
    3
    ...
1
    0
    1
    2
    3
    ...
2
    0
    1
    ...
I could probably use the last 2 digits of my unique id to file the image in the correct directory. But I'm not too sure that is the correct strategy...
How can I make sure that the images are filed evenly across the folders? I don't want to find myself with one folder that contains 12,000 images and another with 1,500 images...
Am I doing it the correct way by extracting the last 2 digits of my unique id? Are there better ways of filing the images evenly?
Thanks

Assuming the unique id is uniformly (pseudo)random, which I think it is, this strategy should work pretty well. There will inevitably be a few folders with noticeably more or fewer files than the average; with a uniformly random assignment the per-folder counts are spread roughly as a normal distribution around the mean.
A slightly better technique for "binning" the images is to take the modulo (%) of a number built from many digits of the uid, rather than using only the last two digits, in case the digits you have picked follow some kind of pattern.
My advice would be to give it a go and see how it works for you. Ideally, you could create a "test harness" which calls the algorithm hundreds of thousands of times, after which you could assess whether the distribution of files in the directory structure is appropriate for your purposes.
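A minimal sketch of such a test harness, under a few assumptions: the helper names binForId and simulateBins are hypothetical, the ids come from uniqid('fc', true) as in the question, and the bin is picked by hashing the id with md5 and taking a modulo (one way of using "many digits"), which gives a 10 x 10 folder layout.

```php
<?php
// Hypothetical helper: map an id to one of $width * $width folders, e.g. "3/7".
function binForId(string $id, int $width = 10): string
{
    // Hash first so every character of the id influences the bin,
    // then reduce a 24-bit slice of the hash modulo the folder count.
    $slice = hexdec(substr(md5($id), 0, 6));
    $bucket = $slice % ($width * $width);
    return intdiv($bucket, $width) . '/' . ($bucket % $width);
}

// Hypothetical harness: generate $count ids and count how many land in each bin.
function simulateBins(int $count): array
{
    $bins = [];
    for ($i = 0; $i < $count; $i++) {
        $bin = binForId(uniqid('fc', true));
        $bins[$bin] = ($bins[$bin] ?? 0) + 1;
    }
    return $bins;
}

$bins = simulateBins(20000);
printf("folders used: %d, smallest: %d, largest: %d\n", count($bins), min($bins), max($bins));
```

If the smallest and largest counts are close to each other for a realistic number of uploads, the scheme is probably even enough; if not, try a different slice of the id or a different hash.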

Related

Filtering filenames in PHP

I'm trying to group a bunch of files together based on RecipeID and StepID. Instead of storing all of the filenames in a table I've decided to just use glob to get the images for the requested recipe. I feel like this will be more efficient and less data handling. Keeping in mind the directory will eventually contain many thousands of images. If I'm wrong about this then the below question is not necessary lol
So let's say I have RecipeID #5 (nachos, mmmm) and it has 3 preparation steps. The naming convention I've decided on would be as such:
5_1_getchips.jpg
5_2_laycheese.jpg
5_2_laytomatos.jpg
5_2_laysalsa.jpg
5_3_bake.jpg
5_finishednachos.jpg
5_morefinishedproduct.jpg
The files may be generated by a camera, so DSC###.jpg...or the person may have actually named each picture as I have above. Multiple images can exist per step. I'm not sure how I'll handle dupe filenames, but I feel that's out of scope.
I want to get all of the "5_" images...but filter them by all the ones that DON'T have any step # (grouped in one DIV), and then get all the ones that DO have a step (grouped in their respective DIVs).
I'm thinking of something like
foreach ( glob( $IMAGES_RECIPE . $RecipeID . "_*.*") as $image)
and then using a substr to filter out the step# but I'm concerned about getting the logic right because what if the original filename already has _#_ in it for some reason. Maybe I need to have a strict naming convention that always includes _0_ if it doesn't belong to a step.
Thoughts?

Globbing through thousands of files will never be faster than having those files indexed in a database (of whatever type) and executing a database query for them. That's what databases are meant for.
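If you do stay with filenames, the grouping from the question can be sketched with one regex instead of substr. This is only a sketch: groupByStep is a hypothetical name, and it assumes the RecipeID_StepID_name convention from the question, so a camera file whose original name happens to start with digits and an underscore would still be misread as a step.

```php
<?php
// Hypothetical helper: split one recipe's images into "no step" and "per step" groups.
function groupByStep(array $filenames, int $recipeId): array
{
    $groups = ['general' => [], 'steps' => []];
    foreach ($filenames as $name) {
        // recipe id, optional step id, then the rest of the name
        if (!preg_match('/^(\d+)_(?:(\d+)_)?(.+)$/', $name, $m) || (int) $m[1] !== $recipeId) {
            continue;
        }
        if ($m[2] === '') {
            $groups['general'][] = $name;          // no step number in the name
        } else {
            $groups['steps'][(int) $m[2]][] = $name;
        }
    }
    ksort($groups['steps']);
    return $groups;
}

$files = [
    '5_1_getchips.jpg', '5_2_laycheese.jpg', '5_2_laytomatos.jpg', '5_2_laysalsa.jpg',
    '5_3_bake.jpg', '5_finishednachos.jpg', '5_morefinishedproduct.jpg', '6_1_other.jpg',
];
$groups = groupByStep($files, 5);
print_r($groups);
```

In practice the $files array would come from glob() and each name would be passed through basename() first.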

I had a similar issue with 15,000 mp3 songs.
On the Windows command line I ran dir:
dir *.mp3 /b /s > mp3.bat
I then used a regex search and replace in Notepad++ that converted the file names, prefixing and appending text to create a rename statement for each file, and ran mp3.bat.
Something like this might work for you in PHP:
Use a regex with preg_replace to extract the digits
Create logic table(s) that map the digits to the words for the new file names
Create the new filename with rename()
Here is some simplified and UNTESTED Example code to show what I am suggesting.
Example Logic Table:
$translation[x][y][z] = "phrase";
$translation[x][y][z] = "phrase";
$translation[x][y][z] = "phrase";
$translation[x][y][z] = "phrase";
$folder = '/home/user/public_html/recipies/';
$files = [];
$dir = opendir($folder);
while (false !== ($found = readdir($dir))) {
    // PATHINFO_EXTENSION returns the extension without the leading dot
    if (pathinfo($found, PATHINFO_EXTENSION) == 'jpg') {
        $files[] = $found;
    }
}
closedir($dir);
foreach ($files as $filename) {
    // pull each digit out of a camera name such as DSC123.jpg
    $digit1 = preg_replace('/DSC(\d)\d\d\.jpg/', '$1', $filename);
    $digit2 = preg_replace('/DSC\d(\d)\d\.jpg/', '$1', $filename);
    $digit3 = preg_replace('/DSC\d\d(\d)\.jpg/', '$1', $filename);
    $newName = $translation[$digit1][$digit2][$digit3];
    rename($folder . $filename, $folder . $newName);
}

How can I bulk rename files in a photo folder with sequential numbers

This was a poorly worded question
My main goal was to display many images, or a range of them, from a set with arbitrary names. Renaming the files to sequential numbers seemed like it would make displaying or iterating through them easier, since the names would differ by 1 rather than being random strings.
-- Anyway, I'm going to read up on glob. According to the PHP manual: the glob() function searches for all the pathnames matching pattern according to the rules used by the libc glob() function.
The files have random names but they are all .jpg's
As an example, "this name".jpg is replaced with "i+1".jpg
So that I can display the photos lazily using a for loop incrementing the numbers. The primary purpose is to display the photos regardless of their file names.

You can use PHP's glob and rename functions:
$num = 1;
foreach (glob("/path/to/images/*.jpg") as $filename) {
    $fileNoExtension = basename($filename, ".jpg");
    // keep the file in its original directory; basename() drops the path
    rename($filename, dirname($filename) . "/{$fileNoExtension}{$num}.jpg");
    $num++;
}

Basically,
$count = 1;
foreach (glob('*.jpg') as $filename) {
    rename($filename, $count . '.jpg');
    $count++;
}
echo 'Done';
All image files must be in the current folder (along with the script).
You can add a path manually, if you wish:
glob('/path/to/*.jpg')
rename($filename, '/path/to/' . $count . '.jpg')
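Since the stated goal is only to display a range of photos, renaming may not be needed at all: glob() already returns an array that can be sliced by position. A small sketch, assuming a hypothetical helper imageRange and a throwaway demo directory:

```php
<?php
// Hypothetical helper: return $length image paths starting at position $offset.
function imageRange(string $dir, int $offset, int $length): array
{
    // glob() returns the matches sorted alphabetically by default
    $files = glob(rtrim($dir, '/') . '/*.jpg') ?: [];
    return array_slice($files, $offset, $length);
}

// Demo in a temporary directory with arbitrary file names
$dir = sys_get_temp_dir() . '/img_demo_' . uniqid();
mkdir($dir);
foreach (['b.jpg', 'a.jpg', 'c.jpg', 'note.txt'] as $n) {
    touch($dir . '/' . $n);
}

$page = imageRange($dir, 1, 2);
foreach ($page as $path) {
    echo basename($path), "\n";
}
```

The for loop over sequential numbers then becomes a loop over the slice, and the original file names stay untouched.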

PHP generating same random number at same time (in seconds)

I'm using a random number function in a PHP script while uploading files, because I want to avoid overwriting files that have the same name. The following portion of the script is used while uploading the file.
$filename = rand(0,100000).strtolower($_FILES['file']['name']);
$dir="/file/upload/directory/".$filename;
move_uploaded_file($_FILES["user_file"]["tmp_name"], $dir);
This application is expected to have a large number of concurrent users, so QA is testing it with different automated tools that apply a high number of concurrent visits. At that point the random number function seems to generate the same value within the same second.
We then tested the random number function separately, and the same random number at the same time was clearly identified.
While searching the web, some posts suggested mt_rand(), but it still gives the same value at the millisecond level.
Is there any way of generating a random number in a time-independent way in PHP?

Random numbers are generated from the time. For this particular issue, though, we only need to write a few lines of code: if we check for file existence and apply an incremental number to the file name, that fixes the problem. The code can be as follows.
$filename = strtolower($_FILES['file']['name']);
$dir = "/file/upload/directory/";
$i = 1;
while (is_file($dir . $i . $filename)) {
    $i++;
}
move_uploaded_file($_FILES["user_file"]["tmp_name"], $dir . $i . $filename);
Even though the loop is inefficient, this will make sure a file overwrite won't happen.
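One caveat with a check-then-write loop is that two concurrent requests can both see the same name as free. A possible variation, sketched here with a hypothetical helper reserveUniquePath, is to take the random part from random_bytes() (PHP 7+) and to reserve the name with fopen() in 'x' mode, which fails if the file already exists:

```php
<?php
// Hypothetical helper: atomically reserve a new, unused path inside $dir.
function reserveUniquePath(string $dir, string $originalName, int $maxTries = 10): string
{
    $dir = rtrim($dir, '/') . '/';
    $safeName = strtolower(basename($originalName));
    for ($try = 0; $try < $maxTries; $try++) {
        // random_bytes() draws from the OS random source, not from the clock
        $candidate = $dir . bin2hex(random_bytes(8)) . '_' . $safeName;
        // mode 'x' creates the file and fails if it already exists
        $handle = @fopen($candidate, 'x');
        if ($handle !== false) {
            fclose($handle);
            return $candidate;
        }
    }
    throw new RuntimeException('Could not reserve a unique file name');
}

// Demo in a throwaway directory
$dir = sys_get_temp_dir() . '/upload_demo_' . bin2hex(random_bytes(4));
mkdir($dir);
$first = reserveUniquePath($dir, 'Photo.JPG');
$second = reserveUniquePath($dir, 'Photo.JPG');
echo $first, "\n", $second, "\n";
```

In an upload handler the returned path would be used as the destination of move_uploaded_file(), which overwrites the empty placeholder.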

Create Unique Image Names

What's a good way to create a unique name for an image that my user is uploading?
I don't want to have any duplicates so something like MD5($filename) isn't suitable.
Any ideas?

As was mentioned, I think the best way to create a unique file name is simply to add time(). That would look like:
$image_name = time()."_".$image_name;

Grab the file extension from uploaded file:
$ext = pathinfo($uploaded_filename, PATHINFO_EXTENSION);
Grab the time to the second: time()
Grab some randomness: md5(microtime())
Convert time to base 36: base_convert(time(), 10, 36). Base 36 compresses the 10-character timestamp down to about 6 characters, which leaves room for more of the random string
Send the whole lot out as a 16 char string, followed by the extension:
$unique_id = substr( base_convert( time(), 10, 36 ) . md5( microtime() ), 0, 16 ) . '.' . $ext;
I doubt that will ever collide - you could even not truncate it if you don't mind very long file names.

If you actually need a filename (it's not entirely clear from your question) I would use tempnam(), which:
Creates a file with a unique filename, with access permission set to 0600, in the specified directory.
...and let PHP do the heavy lifting of working out uniqueness. Note that as well as returning the filename, tempnam() actually creates the file; you can just overwrite it when you drop the image file there.
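A rough sketch of that idea, assuming the system temp directory as a stand-in for the real upload directory and a hypothetical form field named image:

```php
<?php
$uploadDir = sys_get_temp_dir();        // stand-in for the real upload directory
$first = tempnam($uploadDir, 'img');    // creates an empty, uniquely named file
$second = tempnam($uploadDir, 'img');
echo $first, "\n", $second, "\n";

// In an upload handler the image would then replace the placeholder, e.g.
// move_uploaded_file($_FILES['image']['tmp_name'], $first);   // 'image' is a hypothetical field name
```

Note that the generated names carry no image extension, so you may want to store the type elsewhere or rename afterwards.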

You could take a hash (e.g., md5, sha) of the image data itself. That would help identify duplicate images too (if it was byte-for-byte, the same). But any sufficiently long string of random characters would work.
You can always rig it up in a way that the file name looks like:
/image/0/1/012345678/original-name.jpg
That way the file name looks normal, but it's still unique.

I'd recommend sha1_file() over md5_file(). It's less prone to collisions.
You could also use hash_file('sha256', $filePath) to get even better results.
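As a small illustration of naming by content hash (contentAddressedName is a hypothetical helper, and the temp files only stand in for uploads): identical bytes give the same name, different bytes give a different one.

```php
<?php
// Hypothetical helper: derive a file name from the file's contents.
function contentAddressedName(string $path, string $ext): string
{
    return hash_file('sha256', $path) . '.' . strtolower($ext);
}

// Three stand-in "uploads": two byte-for-byte identical, one different
$a = tempnam(sys_get_temp_dir(), 'img');
$b = tempnam(sys_get_temp_dir(), 'img');
$c = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($a, 'same bytes');
file_put_contents($b, 'same bytes');
file_put_contents($c, 'other bytes');

$nameA = contentAddressedName($a, 'JPG');
$nameB = contentAddressedName($b, 'jpg');
$nameC = contentAddressedName($c, 'jpg');
echo $nameA, "\n", $nameC, "\n";
```

Whether two users uploading the same picture should share one stored file is a design decision; if not, add the user id to the name.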

http://php.net/manual/en/function.uniqid.php maybe?
You can prefix it with the user id to avoid collisions between 2 users (within the same microsecond).

For short names:
$i = 0;
while (file_exists($name . '_' . $i)) {
    $i++;
}
WARNING: this might fail on a multi-threaded server if two users upload an image with the same name at the same time.
In that case you should include the md5 of the username.

lol, there are about 3.4 × 10^38 (2^128) possible values that md5 can produce
plus, just to be on the safe side, you could use
$newfilename = md5(time() . 'image');
if (file_exists('./images/' . $newfilename)) {
    $newfilename = md5(time() . $newfilename);
}
// upload image

How big is the probability of two users uploading an image with the same name in the same microsecond?
Try:
$currTime = microtime(true);
$finalFileName = cleanTheInput($fileName) . "_" . $currTime;
// you can also append "_" . rand(0, 1000) at the end to make name collisions even less likely
function cleanTheInput($input)
{
    // do some formatting here ...
}
This would also help you track the upload time of the file for analysis, or sort and manage the files.

For good performance and uniqueness you can use an approach like this:
files are stored on the server with names like md5_file($file).jpg
the directory to store a file in is derived from its md5 file name, by taking the first two chars (first level) and the next two (second level), like this:
uploaded_files\30\c5\30c567139b64ee14c80cc5f5006d8081.pdf
create a record in the database with file_id, original file name, uploading user id, and path to the file on the server
on the server side, create a script that handles downloads: it gets the file by id from the db and outputs its content with the original filename provided by the user (see the PHP example of the CodeIgniter download_helper). So the URL to a file will look like this:
http://site.com/download.php?file=id
Pros:
minimal threat of collisions
good performance at file lookup (not many files in one directory, not many directories at the same level)
original file names are saved
you can control access to files in the server-side script (check session or cookies)
Cons:
only good for small file sizes, because the server has to read the file into memory before the user can download it
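The storage half of this scheme could be sketched roughly as follows. The helper names shardedPath and storeFile are hypothetical, forward slashes are used instead of backslashes, and the database record and download script are left out.

```php
<?php
// Hypothetical helper: build "root/30/c5/30c5...ext" from an md5 hex digest.
function shardedPath(string $root, string $md5, string $ext): string
{
    return rtrim($root, '/') . '/' . substr($md5, 0, 2) . '/' . substr($md5, 2, 2)
        . '/' . $md5 . '.' . $ext;
}

// Hypothetical helper: copy $source into its sharded location and return the new path.
function storeFile(string $root, string $source, string $ext): string
{
    $target = shardedPath($root, md5_file($source), $ext);
    if (!is_dir(dirname($target))) {
        mkdir(dirname($target), 0775, true);   // create both directory levels at once
    }
    copy($source, $target);
    return $target;
}

// Demo with a stand-in file in a throwaway root directory
$root = sys_get_temp_dir() . '/uploaded_files_' . uniqid();
$source = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($source, 'demo content');
$stored = storeFile($root, $source, 'pdf');
echo $stored, "\n";
```

The returned path is what would go into the database record next to the original file name and the user id.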

try this file format:
$filename = microtime(true) . $username . '.jpg';

I think it would be good for you.
<?php
$name=uniqid(mt_rand()).$image_name;
?>

You should try to meet two goals: Uniqueness, and usefulness.
Using a GUID guarantees uniqueness, but one day the files may become detached from their original source, and then you will be in trouble.
My typical solution is to embed crucial information into the filename, such as the userID (if it belongs to a user) or the date and time uploaded (if this is significant), or the filename used when uploading it.
This may really save your skin one day, when the information embedded in the filename allows you to, for example, recover from a bug, or the accidental deletion of records. If all you have is GUIDs, and you lose the catalogue, you will have a heck of a job cleaning that up.
For example, if a file "My Holiday: Florida 23.jpg" is uploaded, by userID 98765, on 2013/04/04 at 12:51:23 I would name it something like this, adding a random string ad8a7dsf9:
20130404125123-ad8a7dsf9-98765-my-holiday-florida-23.jpg
Uniqueness is ensured by the date and time, and the random string (provided it is properly random, from /dev/urandom or CryptGenRandom).
If the file is ever detached, you can identify the user, the date and time, and the title.
Everything is folded to lower case and anything non-alphanumeric is removed and replaced by dashes, which makes the filename easy to handle using simple tools (e.g. no spaces which can confuse badly written scripts, no colons or other characters which are forbidden on some filesystems, and so on).
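A minimal sketch of this naming scheme, assuming hypothetical helpers slugify and descriptiveName and using random_bytes() (PHP 7+) for the random part:

```php
<?php
// Hypothetical helper: fold to lower case and replace runs of non-alphanumerics with dashes.
function slugify(string $text): string
{
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($text)), '-');
}

// Hypothetical helper: timestamp, random string, user id and cleaned title in one name.
function descriptiveName(string $originalName, int $userId, int $timestamp): string
{
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    $title = slugify(pathinfo($originalName, PATHINFO_FILENAME));
    $random = bin2hex(random_bytes(5));   // 10 hex characters from the OS random source
    return date('YmdHis', $timestamp) . "-{$random}-{$userId}-{$title}.{$ext}";
}

$name = descriptiveName('My Holiday: Florida 23.jpg', 98765, mktime(12, 51, 23, 4, 4, 2013));
echo $name, "\n";
```

The random part differs on every call, so only the shape of the name is fixed: timestamp, random string, user id, then the cleaned title.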

Something like this could work for you:
while (file_exists('/uploads/' . $filename . '.jpeg')) {
    $filename .= rand(10, 99);
}

Ready-to-use code:
$file_ext = substr($file['name'], -4); // e.g.'.jpg', '.gif', '.png', 'jpeg' (note the missing leading point in 'jpeg')
$new_name = sha1($file['name'] . uniqid('',true)); // this will generate a 40-character-long random name
$new_name .= ((substr($file_ext, 0, 1) != '.') ? ".{$file_ext}" : $file_ext); //the original extension is appended (accounting for the point, see comment above)

Generate unique names?

I am working on a PHP site in which users upload images. I have to rename each file to prevent conflicts in the image names. I am using
uniqid(rand(), true);
and adding a large random number after it.
Will this work perfectly? Any suggestions?
It's about generating unique names for the images.

Function tempnam() creates a file with a unique name.

Take an md5 of the file and use that. MD5 is a 128-bit hash, so the odds of an accidental collision are negligible. If that's not enough, prefix it with the timestamp expressed in seconds or milliseconds. That way, even if a duplicate md5 is generated, the files would have to come in during the same second/millisecond for a collision.

You can use Base36 on the AutoIncrement value from a SQL Table (hoping that you do use a SQL table).
$filename = base_convert($last_insert_id, 10, 36);
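For illustration, with a hypothetical helper idToFilename and a made-up id in place of the real last insert id:

```php
<?php
// Hypothetical helper: turn an auto-increment id into a short base-36 file name.
function idToFilename(int $id, string $ext): string
{
    return base_convert((string) $id, 10, 36) . '.' . $ext;
}

echo idToFilename(1000000, 'jpg'), "\n"; // prints lfls.jpg
```

Because the id is unique in the table, the name is unique too, and it stays short even for large ids.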

You have two approaches, depending on how big your image library can get:
1. For a modest number of files I do this:
<?php
// take the extension first, because sanitizing strips the dot
$file_extension = '.' . pathinfo($file, PATHINFO_EXTENSION);
$file = sanitize_file($file); // remove all characters outside [a-z0-9_] for safe URL linking
$file_md5 = md5($file);
// since I assume the file belongs to someone, you can do this
$file_name = $user_id . $file_md5 . $file_extension;
// then save the file
?>
2. For a huge number of files, another option is CacheMogul. Here you need to use your imagination, but it does a nice sharding, so you don't need to worry about a folder's maximum quantity or size.
