Image upload storage strategies - php

When a user uploads an image to my site, the image goes through this process;
user uploads pic
store pic metadata in db, giving the image a unique id
async image processing (thumbnail creation, cropping, etc)
all images are stored in the same uploads folder
So far the site is pretty small, and there are only ~200,000 images in the uploads directory. I realise I'm nowhere near the physical limit of files within a directory, but this approach clearly won't scale, so I was wondering if anyone had any advice on upload / storage strategies for handling large volumes of image uploads.
EDIT:
Creating username (or more specifically, userid) subfolders would seem to be a good solution. With a bit more digging, I've found some great info right here; How to store images in your filesystem
However, would this userid dir approach scale well if a CDN is bought into the equation?

I've answered a similar question before but I can't find it, maybe the OP deleted his question...
Anyway, Adams solution seems to be the best so far, yet it isn't bulletproof since images/c/cf/ (or any other dir/subdir pair) could still contain up to 16^30 unique hashes and at least 3 times more files if we count image extensions, a lot more than any regular file system can handle.
AFAIK, SourceForge.net also uses this system for project repositories, for instance the "fatfree" project would be placed at projects/f/fa/fatfree/, however I believe they limit project names to 8 chars.
I would store the image hash in the database along with a DATE / DATETIME / TIMESTAMP field indicating when the image was uploaded / processed and then place the image in a structure like this:
images/
2010/ - Year
04/ - Month
19/ - Day
231c2ee287d639adda1cdb44c189ae93.png - Image Hash
Or:
images/
2010/ - Year
0419/ - Month & Day (12 * 31 = 372)
231c2ee287d639adda1cdb44c189ae93.png - Image Hash
Besides being more descriptive, this structure is enough to host hundreds of thousands (depending on your file system limits) of images per day for several thousand years, this is the way Wordpress and others do it, and I think they got it right on this one.
Duplicated images could be easily queried on the database and you'd just have to create symlinks.
Of course, if this is not enough for you, you can always add more subdirs (hours, minutes, ...).
Personally I wouldn't use user IDs unless you don't have that info available in your database, because:
Disclosure of usernames in the URL
Usernames are volatile (you may be able to rename folders, but still...)
A user can hypothetically upload a large number of images
Serves no purpose (?)
Regarding the CDN I don't see any reason this scheme (or any other) wouldn't work...

MediaWiki generates the MD5 sum of the name of the uploaded file, and uses the first two letters of the MD5 (say, "c" and "f" of the sum "cf1e66b77918167a6b6b972c12b1c00d") to create this directory structure:
images/c/cf/Whatever_filename.png
You could also use the image ID for a predictable upper limit on the number of files per directory. Maybe take floor(image unique ID / 1000) to determine the parent directory, for 1000 images per directory.

Yes, yes I know this is an ancient topic. But the problem to store large amount of images and how the underlying folder structure should be organized. So I present my way to handle it in the hope this might help some people.
The idea using md5 hash is the best way to handle massive image storage. Keeping in mind that different values might have the same hash I strongly suggest to add also the user id or nicname to the path to make it unique. Yep that's all what's needed. If someone has different users with the same database id - well, there is something wrong ;) So root_path/md5_hash/user_id is everything you need to do it properly.
Using DATE / DATETIME / TIMESTAMP is not the optimal solution by the way IMO. You end up with big clusters of image folders on a buisy day and nearly empty ones on less frequented ones. Not sure this leads to performance problems but there is something like data aesthetics and a consistent data distribution is always superior.
So I clearly go for the hash solution.
I wrote the following function to make it easy to generate such hash based storage paths. Feel free to use it if you like it.
/**
* Generates directory path using $user_id md5 hash for massive image storing
* #author Hexodus
* #param string $user_id numeric user id
* #param string $user_root_raw root directory string
* #return null|string
*/
function getUserImagePath($user_id = null, $user_root_raw = "images/users", $padding_length = 16,
$split_length = 3, $hash_length = 12, $hide_leftover = true)
{
// our db user_id should be nummeric
if (!is_numeric($user_id))
return null;
// clean trailing slashes
$user_root_rtrim = rtrim( $user_root_raw, '/\\' );
$user_root_ltrim = ltrim( $user_root_rtrim, '/\\' );
$user_root = $user_root_ltrim;
$user_id_padded = str_pad($user_id, $padding_length, "0", STR_PAD_LEFT); //pad it with zeros
$user_hash = md5($user_id); // build md5 hash
$user_hash_partial = $hash_length >=1 && $hash_length < 32
? substr($user_hash, 0, $hash_length) : $user_hash;
$user_hash_leftover = $user_hash_partial <= 32 ? substr($user_hash, $hash_length, 32) : null;
$user_hash_splitted = str_split($user_hash_partial, $split_length); //split in chunks
$user_hash_imploded = implode($user_hash_splitted,"/"); //glue aray chunks with slashes
if ($hide_leftover || !$user_hash_leftover)
$user_image_path = "{$user_root}/{$user_hash_imploded}/{$user_id_padded}"; //build final path
else
$user_image_path = "{$user_root}/{$user_hash_imploded}/{$user_hash_leftover}/{$user_id_padded}"; //build final path plus leftover
return $user_image_path;
}
Function test calls:
$user_id = "1394";
$user_root = "images/users";
$user_hash = md5($user_id);
$path_sample_basic = getUserImagePath($user_id);
$path_sample_advanced = getUserImagePath($user_id, "images/users", 8, 4, 12, false);
echo "<pre>hash: {$user_hash}</pre>";
echo "<pre>basic:<br>{$path_sample_basic}</pre>";
echo "<pre>customized:<br>{$path_sample_advanced}</pre>";
echo "<br><br>";
The resulting output - colorized for your convenience ;):

Have you thought about using something like Amazon S3 to store the files? I run a photo hosting company and after quickly reaching limits on our own server, we switched over to AmazonS3. The beauty of S3 is that there are no limits like inodes and what not, you just keep throwing files at it.
Also: If you don't like S3, you can always try and break it down into subfolders as much as you can:
/userid/year/month/day/photoid.jpg

You can convert a username to md5 and set a folder from 2-3 first letters of md5 converted username for the avatars and for images you can convert and playing with time , random strings , ids and names
8648b8f3ce06a7cc57cf6fb931c91c55 - devcline
Also a first letter of the username or id for the next folder or inverse
It will look like
Structure:
stream/img/86/8b8f3ce06a7cc57cf6fb931c91c55.png //simplest
stream/img/d/2/0bbb630d63262dd66d2fdde8661a410075.png //first letter and id folders
stream/img/864/d/8b8f3ce06a7cc57cf6fb931c91c55.png // with first letter of the nick
stream/img/864/2/8b8f3ce06a7cc57cf6fb931c91c55.png //with unique id
stream/img/2864/8b8f3ce06a7cc57cf6fb931c91c55.png //with unique id in 3 letters
stream/img/864/2_8b8f3ce06a7cc57cf6fb931c91c55.png //with unique id in picture name
Code
$username = substr($username_md5, 1); // to cut first letter from the md5 converted nick
$username_first = $username[0]; // the first letter
$username_md5 = md5($username); // md5 for username
$randomname = uniqid($userid).md5(time()); //for generate a random name based on ID
you can try also with base64
$image_encode = strtr(base64_encode($imagename), '+/=', '-_,');
$image_decode = base64_decode(strtr($imagename, '-_,', '+/='));
Steam And dokuwiki use this structure.

You might consider the open source http://danga.com/mogilefs/ as it is perfect for what you're doing. It'll take you from thinking about folders to namespaces (which could be users) and let it store you images for you. The best part is you don't have to care how the data is stored. It makes it completely redundant and you can even set controls around how redundant thumbnails are as well.

I got soultion im using for a long time. It's quite old code, and can be further optimised, but it still serves good as it is.
It's a immutable function creating directory structure based on:
Number that identifies image (FILE ID):
it's recommended that this numer is unique for base directory, like primary key for database table, but it's not required.
The base directory
The maximum desired number of files and first level subdirectories. This promised can be kept only if every FILE ID is unique.
Example of usage:
Using explicitly FILE ID:
$fileName = 'my_image_05464hdfgf.jpg';
$fileId = 65347;
$baseDir = '/home/my_site/www/images/';
$baseURL = 'http://my_site.com/images/';
$clusteredDir = \DirCluster::getClusterDir( $fileId );
$targetDir = $baseDir . $clusteredDir;
$targetPath = $targetDir . $fileName;
$targetURL = $baseURL . $clusteredDir . $fileName;
Using file name, number = crc32( filename )
$fileName = 'my_image_05464hdfgf.jpg';
$baseDir = '/home/my_site/www/images/';
$baseURL = 'http://my_site.com/images/';
$clusteredDir = \DirCluster::getClusterDir( $fileName );
$targetDir = $baseDir . $clusteredDir;
$targetURL = $baseURL . $clusteredDir . $fileName;
Code:
class DirCluster {
/**
* #param mixed $fileId - numeric FILE ID or file name
* #param int $maxFiles - max files in one dir
* #param int $maxDirs - max 1st lvl subdirs in one dir
* #param boolean $createDirs - create dirs?
* #param string $path - base path used when creatign dirs
* #return boolean|string
*/
public static function getClusterDir($fileId, $maxFiles = 100, $maxDirs = 10,
$createDirs = false, $path = "") {
// Value for return
$rt = '';
// If $fileId is not numerci - lets create crc32
if (!is_numeric($fileId)) {
$fileId = crc32($fileId);
}
if ($fileId < 0) {
$fileId = abs($fileId);
}
if ($createDirs) {
if (!file_exists($path))
{
// Check out the rights - 0775 may be not the best for you
if (!mkdir($path, 0775)) {
return false;
}
#chmod($path, 0775);
}
}
if ( $fileId <= 0 || $fileId <= $maxFiles ) {
return $rt;
}
// Rest from dividing
$restId = $fileId%$maxFiles;
$formattedFileId = $fileId - $restId;
// How many directories is needed to place file
$howMuchDirs = $formattedFileId / $maxFiles;
while ($howMuchDirs > $maxDirs)
{
$r = $howMuchDirs%$maxDirs;
$howMuchDirs -= $r;
$howMuchDirs = $howMuchDirs/$maxDirs;
$rt .= $r . '/'; // DIRECTORY_SEPARATOR = /
if ($createDirs)
{
$prt = $path.$rt;
if (!file_exists($prt))
{
mkdir($prt);
#chmod($prt, 0775);
}
}
}
$rt .= $howMuchDirs-1;
if ($createDirs)
{
$prt = $path.$rt;
if (!file_exists($prt))
{
mkdir($prt);
#chmod($prt, 0775);
}
}
$rt .= '/'; // DIRECTORY_SEPARATOR
return $rt;
}
}

Related

Remove Prestashop orphan images not stored in DB

I need to clean a shop running Prestashop, actually 1.7, since many years.
With this script I removed all the images in the DB not connected to any product.
But there are many files not listed in the DB. For example, actually I have 5 image sizes in settings, so new products shows 6 files in the folder (the 5 above and the imageID.jpg file) but some old product had up to 18 files. Many of these old products have been deleted but in the folder I still find all the other formats, like "2026-small-cart.jpg".
So I tried creating a script to loop in folders, check image files in it and verify if that id_image is stored in the DB.
If not, I can delete the file.
It works but obviously the loop is huge and it stops working as long as I change the starting path folder.
I've tried to reduce the DB queries storing some data (to delete all the images with the same id with a single DB query), but it still crashes as I change the starting path.
It only works with two nested loops (really few...).
Here is the code. Any idea for a better way to get the result?
Thanks!
$shop_root = $_SERVER['DOCUMENT_ROOT'].'/';
include('./config/config.inc.php');
include('./init.php');
$image_folder = 'img/p/';
$image_folder = 'img/p/2/0/3/2/'; // TEST, existing product
$image_folder = 'img/p/2/0/2/6/'; // TEST, product deleted from DB but files in folder
//$image_folder = 'img/p/2/0/2/'; // test, not working...
$scan_dir = $shop_root.$image_folder;
// will check only images...
global $imgExt;
$imgExt = array("jpg","png","gif","jpeg");
// to avoid multiple queries for the same image id...
global $lastID;
global $delMode;
echo "<h1>Examined folder: $image_folder</h1>\r\n";
function checkFile($scan_dir,$name) {
global $lastID;
global $delMode;
$path = $scan_dir.$name;
$ext = substr($name,strripos($name,".")+1);
// if is an image and file name starts with a number
if (in_array($ext,$imgExt) && (int)$name>0){
// avoid extra queries...
if ($lastID == (int)$name) {
$inDb = $lastID;
} else {
$inDb = (int)Db::getInstance()->getValue('SELECT id_product FROM '._DB_PREFIX_.'image WHERE id_image ='.((int) $name));
$lastID = (int)$name;
$delMode = $inDb;
}
// if haven't found an id_product in the DB for that id_image
if ($delMode<1){
echo "- $path has no related product in the DB I'll DELETE IT<br>\r\n";
//unlink($path);
}
}
}
function checkDir($scan_dir,$name2) {
echo "<h3>Elements found in the folder <i>$scan_dir$name2</i>:</h3>\r\n";
$files = array_values(array_diff(scandir($scan_dir.$name2.'/'), array('..', '.')));
foreach ($files as $key => $name) {
$path = $scan_dir.$name;
if (is_dir($path)) {
// new loop in the subfolder
checkDir($scan_dir,$name);
} else {
// is a file, I'll check if must be deleted
checkFile($scan_dir,$name);
}
}
}
checkDir($scan_dir,'');
I would create two files with lists of images.
The first file is the result of a query from your database of every image file referenced in your data.
mysql -BN -e "select distinct id_image from ${DB}.${DB_PREFIX}image" > all_image_ids
(set the shell variables for DB and DB_PREFIX first)
The second file is every image file currently in your directories. Include only files that start with a digit and have an image extension.
find img/p -name '[0-9]*.{jpg,png,gif,jpeg}' > all_image_files
For each filename, check if it's in the list of image ids. If not, then output the command to delete the file.
cat all_image_files | while read filename ; do
# strip the directory name and convert filename to an integer value
b=$(basename $filename)
image_id=$((${b/.*/}))
grep -q "^${image_id}$" all_image_ids || echo "rm ${filename}"
done > files_to_delete
Read the file files_to_delete to visually check that the list looks right. Then run that file as a shell script:
sh files_to_delete
Note I have not tested this solution, but it should give you something to experiment with.

Using PHP how can you tell if the image already exists on your server regardless of name?

I have seen several websites where if you upload an image and an identical image already exists on there servers they will reject the submission. Using PNGs is there an easy way to check one image against a massive folder of images?
http://www.imagemagick.org/discourse-server/viewtopic.php?t=12618
I did find this with imagemagick, but I am looking for one vs many and not one to one a million
You can transform the file content into a sha1. That will give you a way to identify two pictures strictly identical.
see http://php.net/manual/fr/function.sha1-file.php
Then after you save it into a NFS, or use some kind of database to test if the hash already exists.
Details of the images are probably maintained in a database; while the images are stored in the filesystem. And that database probably has a hash column which is used to store an md5 hash of the image file itself, calculated when the image is first uploaded. When a new image is uploaded, it calculates the hash for that image, and then checks to see if any other image detail in the database has a matching hash. If not, it stores the newly uploaded image with that hash; otherwise it can respond with details of the previous upload. If the hash column is indexed in the table, then this check is pretty quick.
If I understood your question correctly. You want to find out if a specific image exists in a Directory with so many images, right? If so, take a look at the solution:
<?php
// CREATE A FUNCTION WHICH RETURNS AN ARRAY OF ALL IMAGES IN A SPECIFIC FOLDER
function getAllImagesInFolder($dir_full_path){
$returnable = array();
$files_in_dir = scandir($dir_full_path);
$reg_fx = '#(\.png|\.jpg|\.bmp|\.gif|\.jpeg)#';
foreach($files_in_dir as $key=>$val){
$temp_file_or_dir = $dir_full_path . DIRECTORY_SEPARATOR . $val;
if(is_file($temp_file_or_dir) && preg_match($reg_fx, $val) ){
$regx_dot_wateva = '/\.{2,4}$/';
$regx_dot = '/\./';
$regx_array = array($regx_dot_wateva, $regx_dot);
$replace_array = array("", "_");
$return_val = preg_replace($regx_array, $replace_array, $val);
$returnable[$return_val] = $temp_file_or_dir ;
}else if(is_dir($temp_file_or_dir) && !preg_match('/^\..*/', $val) ){
getFilesInFolder($temp_file_or_dir);
}
}
return $returnable;
}
// CREATE ANOTHER FUNCTION TO CHECK IF THE SPECIFIED IMAGE EXISTS IN THE GIVEN DIRECTORY.
// THE FIRST PARAMETER SHOULD BE THE RESULT OF CALLING THE PREVIOUS FUNCTION: getAllImagesInFolder(...)
// THE SECOND PARAMETER IS THE IMAGE YOU WANT TO SEARCH WHETHER IT EXISTS IN THE SAID FOLDER OR NOT
function imageExistsInFolder($arrImagesInFolder, $searchedImage){
if(!is_array($arrImagesInFolder) && count($arrImagesInFolder) < 1){
return false;
}
foreach($arrImagesInFolder as $strKey=>$imgPath){
if(stristr($imgPath, $searchedImage)){
return true;
}
}
return false;
}
// NOW GET ALL THE IMAGES IN A SPECIFIED FOLDER AND ASSIGN THE RESULTING ARRAY TO A VARIABLE: $imgFiles
$imgFolder = "/path/to/directory/where/there/are/images";
$arrImgFiles = getAllImagesInFolder($imgFolder);
$searchedImage = "sandwich.jpg"; //<== OR EVEN WITHOUT THE EXTENSION, JUST "sandwich"
// ASSUMING THE SPECIFIC IMAGE YOU WANT TO MATCH IS CALLED sandwich.jpg
// YOU CAN USE THE imageExistsInFolder(...) FUNCTION TO RETURN A BOOLEAN FLAG OF true OR false
// DEPENDING ON IF IT DOES OR NOT.
var_dump($arrImgFiles);
var_dump( imageExistsInFolder($arrImgFiles, $searchedImage) );

upload images through php using unique file names

I am currently in the process of writing a mobile app with the help of phonegap. One of the few features that I would like this app to have is the ability to capture an image and upload it to a remote server...
I currently have the image capturing and uploading/emailing portion working fine with a compiled apk... but in my php, I am currently naming the images "image[insert random number from 10 to 20]... The problem here is that the numbers can be repeated and the images can be overwritten... I have read and thought about just using rand() and selecting a random number from 0 to getrandmax(), but i feel that I might have the same chance of a file overwriting... I need the image to be uploaded to the server with a unique name every-time, no matter what... so the php script would check to see what the server already has and write/upload the image with a unique name...
any ideas other than "rand()"?
I was also thinking about maybe naming each image... img + date + time + random 5 characters, which would include letters and numbers... so if an image were taken using the app at 4:37 am on March 20, 2013, the image would be named something like "img_03-20-13_4-37am_e4r29.jpg" when uploaded to the server... I think that might work... (unless theres a better way) but i am fairly new to php and wouldn't understand how to write something like that...
my php is as follows...
print_r($_FILES);
$new_image_name = "image".rand(10, 20).".jpg";
move_uploaded_file($_FILES["file"]["tmp_name"], "/home/virtual/domain.com/public_html/upload/".$new_image_name);
Any help is appreciated...
Thanks in advance!
Also, Please let me know if there is any further info I may be leaving out...
You may want to consider the PHP's uniqid() function.
This way the code you suggested would look like the following:
$new_image_name = 'image_' . date('Y-m-d-H-i-s') . '_' . uniqid() . '.jpg';
// do some checks to make sure the file you have is an image and if you can trust it
move_uploaded_file($_FILES["file"]["tmp_name"], "/home/virtual/domain.com/public_html/upload/".$new_image_name);
Also keep in mind that your server's random functions are not really random. Try random.org if you need something indeed random. Random random random.
UPD: In order to use random.org from within your code, you'll have to do some API requests to their servers. The documentation on that is available here: www.random.org/clients/http/.
The example of the call would be: random.org/integers/?num=1&min=1&max=1000000000&col=1&base=10&format=plain&rnd=new. Note that you can change the min, max and the other parameters, as described in the documentation.
In PHP you can do a GET request to a remote server using the file_get_contents() function, the cURL library, or even sockets. If you're using a shared hosting, the outgoing connections should be available and enabled for your account.
$random_int = file_get_contents('http://www.random.org/integers/?num=1&min=1&max=1000000000&col=1&base=10&format=plain&rnd=new');
var_dump($random_int);
You should use tempnam() to generate a unique file name:
// $baseDirectory Defines where the uploaded file will go to
// $prefix The first part of your file name, e.g. "image"
$destinationFileName = tempnam($baseDirectory, $prefix);
The extension of your new file should be done after moving the uploaded file, i.e.:
// Assuming $_FILES['file']['error'] == 0 (no errors)
if (move_uploaded_file($_FILES['file']['tmp_name'], $destinationFileName)) {
// use extension from uploaded file
$fileExtension = '.' . pathinfo($_FILES['file']['name'], PATHINFO_EXTENSION);
// or fix the extension yourself
// $fileExtension = ".jpg";
rename($destinationFileName, $destinationFileName . $fileExtension);
} else {
// tempnam() created a new file, but moving the uploaded file failed
unlink($destinationFileName); // remove temporary file
}
Have you considered using md5_file ?
That way all of your files will have unique name and you would not have to worry about duplicate names. But please note that this will return same string if the contents are the same.
Also here is another method:
do {
$filename = DIR_UPLOAD_PATH . '/' . make_string(10) . '-' . make_string(10) . '-' . make_string(10) . '-' . make_string(10);
} while(is_file($filename));
return $filename;
/**
* Make random string
*
* #param integer $length
* #param string $allowed_chars
* #return string
*/
function make_string($length = 10, $allowed_chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890') {
$allowed_chars_len = strlen($allowed_chars);
if($allowed_chars_len == 1) {
return str_pad('', $length, $allowed_chars);
} else {
$result = '';
while(strlen($result) < $length) {
$result .= substr($allowed_chars, rand(0, $allowed_chars_len), 1);
} // while
return $result;
} // if
} // make_string
Function will create a unique name before uploading image.
// Upload file with unique name
if ( ! function_exists('getUniqueFilename'))
{
function getUniqueFilename($file)
{
if(is_array($file) and $file['name'] != '')
{
// getting file extension
$fnarr = explode(".", $file['name']);
$file_extension = strtolower($fnarr[count($fnarr)-1]);
// getting unique file name
$file_name = substr(md5($file['name'].time()), 5, 15).".".$file_extension;
return $file_name;
} // ends for is_array check
else
{
return '';
} // else ends
} // ends
}

Php WHILE loops only find one element

I got a problem with the following php code. It is supposed to list the items of a S3 bucket and find&delete files which contain a certain string in their filenames.
Problem is: only one file is deleted the others remain on the bucket after the execution of the script.
I can't find where the issue comes from so I ask you :/
$aS3Files = $s3->getBucket($bucketName); // list all elements in the bucket
$query = mysql_query("SELECT filename FROM prizes_media WHERE prize_id=" . $_POST["prizeId"]); // finds all filenames linked to the prize
while($media = mysql_fetch_array($query)){
// Find relevant files
while ( list($cFilename, $rsFileData) = each($aS3Files) ) { // reformat the bucket list into a table and reads through it
if(strpos($cFilename,$media['filename'])) {
$s3->deleteObject($bucketName, $cFilename); // deletes all files that contain $media['filename'] in their filename
}
}
}
// 2. Delete DB entry
mysql_query("DELETE FROM prizes WHERE id=" . $_POST['prizeId'] ); // deletes the entry correponding to the prize in the DB (deletes media table in cascade)
You may be getting false negatives on your if, you should be using this:
if(strpos($cFilename,$media['filename']) !== FALSE) { ...
Edit
Here is a different way to loop the bucket, based on the structure on your comment:
foreach($aS3Files as $filename => $filedata) {
if(strpos($filename, $media['filename']) !== FALSE) {
$s3->deleteObject($bucketName, $filename); // deletes all files that contain $media['filename'] in their filename
}
}

Create Files Automatically using PHP script

I have a project that needs to create files using the fwrite in php. What I want to do is to make it generic, I want to make each file unique and dont overwrite on the others.
I am creating a project that will record the text from a php form and save it as html, so I want to output to have generated-file1.html and generated-file2.html, etc.. Thank you.
This will give you a count of the number of html files in a given directory
$filecount = count(glob("/Path/to/your/files/*.html"));
and then your new filename will be something like:
$generated_file_name = "generated-file".($filecount+1).".html";
and then fwrite using $generated_file_name
Although I've had to do a similar thing recently and used uniq instead. Like this:
$generated_file_name = md5(uniqid(mt_rand(), true)).".html";
I would suggest using the time as the first part of the filename (as that should then result in files being listed in chronological/alphabetic order, and then borrow from #TomcatExodus to improve the chances of the filename being unique (incase of two submissions being simultaneous).
<?php
$data = $_POST;
$md5 = md5( $data );
$time = time();
$filename_prefix = 'generated_file';
$filename_extn = 'htm';
$filename = $filename_prefix.'-'.$time.'-'.$md5.'.'.$filename_extn;
if( file_exists( $filename ) ){
# EXTREMELY UNLIKELY, unless two forms with the same content and at the same time are submitted
$filename = $filename_prefix.'-'.$time.'-'.$md5.'-'.uniqid().'.'.$filename_extn;
# IMPROBABLE that this will clash now...
}
if( file_exists( $filename ) ){
# Handle the Error Condition
}else{
file_put_contents( $filename , 'Whatever the File Content Should Be...' );
}
This would produce filenames like:
generated_file-1300080525-46ea0d5b246d2841744c26f72a86fc29.htm
generated_file-1300092315-5d350416626ab6bd2868aa84fe10f70c.htm
generated_file-1300109456-77eae508ae79df1ba5e2b2ada645e2ee.htm
If you want to make absolutely sure that you will not overwrite an existing file you could append a uniqid() to the filename. If you want it to be sequential you'll have to read existing files from your filesystem and calculate the next increment which can result in an IO overhead.
I'd go with the uniqid() method :)
If your implementation should result in unique form results every time (therefore unique files) you could hash form data into a filename, giving you unique paths, as well as the opportunity to quickly sort out duplicates;
// capture all posted form data into an array
// validate and sanitize as necessary
$data = $_POST;
// hash data for filename
$fname = md5(serialize($data));
$fpath = 'path/to/dir/' . $fname . '.html';
if(!file_exists($fpath)){
//write data to $fpath
}
Do something like this:
$i = 0;
while (file_exists("file-".$i.".html")) {
$i++;
}
$file = fopen("file-".$i.".html");

Categories