How to create incrementing folder names in PHP

I have an HTML form with three inputs:
name
consultant id (number)
picture upload
After the user submits the form, a PHP script should:
Create a folder with the submitted name
Inside the folder, create a txt file containing: name + consultant id (the given number)
Inside the folder, store the image uploaded by the user
The most important thing I want is that the folders created by the PHP script are numbered incrementally. What I mean: folder1 (txt file + image), folder2 (txt file + image), folder3 (txt file + image) and so on...

There are a few different methods for accomplishing what you describe. One option would be to look at all existing folders (directories) when you attempt to create a new one and determine the next highest number.
You can accomplish this by using scandir on your parent output directory to find existing files.
Example:
$max = 0;
$files = scandir("/path/to/your/output-directory");
$matches = [];
foreach ($files as $file) {
    if (preg_match("/folder(\d+)/", $file, $matches)) {
        $number = intval($matches[1]);
        if ($number > $max) {
            $max = $number;
        }
    }
}
$newNumber = $max + 1;
That is a simple example to get you the next number. There are many other factors to consider. For instance, what happens if two users submit the form concurrently? You would need some synchronization mechanism (such as a semaphore or file lock) to ensure only one insert can occur at a time.
You could use a separate lock file to store the current number and function as a synchronization method.
I would highly encourage finding a different way to store the data. Using a database to store this data may be a better option.
If you need to store the files on disk, locally, you may consider other options for generating the directory name. You could use a timestamp, a hash of the data, or a combination thereof, for instance. You may also be able to get by with something like uniqid. Any filesystem option will require some form of synchronization to address race conditions.
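For illustration, a name built from a timestamp plus uniqid might look like the sketch below. The $outputDirectory variable is a placeholder for your parent directory, and uniqid offers no hard uniqueness guarantee under heavy concurrency, but it is often sufficient for this kind of naming:
// a collision-resistant name without a sequence counter
// ($outputDirectory is an assumed variable holding your parent output directory)
$dirName = date("Ymd-His") . "-" . uniqid();
$dir = $outputDirectory . "/" . $dirName;
if (!is_dir($dir)) {
    mkdir($dir);
}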
Here is a more complete example for sequentially creating directories using a lock file for the sequence and synchronization. This omits some error handling that should be added for production code, but should provide the core functionality.
define("LOCK_FILE", "/some/file/path"); //A file for synchronization and to store the counter
define("OUTPUT_DIRECTORY", "/some/directory"); //The directory where you want to write your folders
//Open the lock file
$file=fopen(LOCK_FILE, "r+");
if(flock($file, LOCK_EX)){
//Read the current value of the file, if empty, default to 0
$last=fgets($file);
if(empty($last))
$last=0;
//Increment to get the current ID
$current=$last+1;
//Write over the existing value(a larger number will always completely overwrite a smaller number written from the same position)
rewind($file);
fwrite($file, (string)$current);
fflush($file);
//Determine the path for the next directory
$dir=OUTPUT_DIRECTORY."/folder$current";
if(file_exists($dir))
die("Directory $dir already exists. Lock may have been reset");
//Create the next directory
mkdir($dir);
//TODO: Write your content to $dir (You'll need to provide this piece)
//Release the lock
flock($file, LOCK_UN);
}
else{
die("Unable to acquire lock");
}
//Always close the file handle
fclose($file);
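To fill in the TODO above, a minimal sketch might look like the following. The form field names (name, consultant_id, picture) and the info.txt filename are assumptions rather than anything given in the question, and real code should validate and sanitize the input:
// write the text file with the submitted name and consultant id
// (field names are assumed; validate and sanitize in production)
$name = $_POST['name'] ?? '';
$consultantId = $_POST['consultant_id'] ?? '';
file_put_contents("$dir/info.txt", $name . " " . $consultantId);

// move the uploaded picture into the new folder
if (isset($_FILES['picture']) && $_FILES['picture']['error'] === UPLOAD_ERR_OK) {
    $target = $dir . "/" . basename($_FILES['picture']['name']);
    move_uploaded_file($_FILES['picture']['tmp_name'], $target);
}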

Related

PHP - can my script for fetching filenames and finding new files be faster?

I have FTP access to 1 directory that holds all images for all products of the vendor.
1 product has multiple images: variations in size and variations in display of the product.
There is no "list" (XML, CSV, database..) by which I am able to know "what's new".
For now the only way I see is to grab all filenames and compare them with the ones in my DB.
The last check counted 998,283 files in that directory.
One product has multiple variations, and there is no documentation of how they are named.
I did an initial grab of the filenames, compared them with my products, and saved them in a database table for "images" with their filenames and modification dates (taken from the files).
The next step is to check for "new ones".
What I am doing now is:
// get the file list
foreach ($this->getFilenamesFromFtp() as $key => $image_data) {
    // extract data from the filename (product code, size, variation number, extension...)
    // so it can be stored in the table and used later as a reference
    // (i.e. I want to use only the large image of a variation, not all sizes)
    $data = self::extractDataFromImage($image_data);
    // check if the filename already exists in the DB images table;
    // if there is a DB entry (TRUE) it will do nothing, otherwise it continues with the insertion
    if ($this->checkForFilenameInDb($data['filename'])) {
    }
    else {
        $export_codes = $this->export->getProductIds();
        // check if the product code is in the export table - that is, do we really need this image
        if ($this->functions->in_array_r($data['product_code'], $export_codes)) {
            self::insertImageDataInDb($data);
        } // end if
    } // end if check if filename is already in DB
} // end foreach
and my method getFilenamesFromFtp() looks like this:
$filenames = array();
$i = 1;
$ftp = $this->getFtpConfiguration();
// set up basic connection
$conn_id = ftp_ssl_connect($ftp['host']);
// login with username and password
$login_result = ftp_login($conn_id, $ftp['username'], $ftp['pass']);
ftp_set_option($conn_id, FTP_USEPASVADDRESS, false);
$mode = ftp_pasv($conn_id, true);
ftp_set_option($conn_id, FTP_TIMEOUT_SEC, 180);
// login OK?
if ((!$conn_id) || (!$login_result) || (!$mode)) {
    die("FTP connection has failed!");
}
else {
    // get all filenames and store them in an array
    $files = ftp_nlist($conn_id, ".");
    // count the number of files in the array = the number of files on the FTP server
    $nofiles = count($files);
    foreach ($files as $filename) {
        // limit used while developing/testing; in production (current mode) it runs without a limit
        if (self::LIMIT > 0 && $i == self::LIMIT) {
            break;
        }
        else {
            // get the modification date of the file
            $date_modified = ftp_mdtm($conn_id, $filename);
            // build an array of filenames and modification dates to return and store in the DB
            $filenames[] = array(
                "filename"      => $filename,
                "date_modified" => $date_modified
            );
        } // end if LIMIT empty
        $i++;
    } // end foreach
    // close the connection
    ftp_close($conn_id);
    return $filenames;
}
The problem is that the script takes a long time.
The slowest part I have found so far is in getFilenamesFromFtp(), where I create the array:
$filenames[]= array(
"filename" => $filename,
"date_modified" => $date_modified
);
That part has been running for 4 hours so far and is still not done.
While writing this I had the idea to drop "date modified" at this stage and fetch it later, only if I am actually going to store that image in the DB.
I will update this question as soon as I am done with this change and have tested it :)
Processing a million filenames will take time. However, I see no reason to store those filenames (and date_modified) in an array first; why not process each filename directly?
Also, instead of completely processing a filename, why not store it in a database table first? Then you can do the real processing later. By splitting the task in two, retrieval and processing, it becomes more flexible. For instance, you don't need to do a new retrieval if you want to change the processing.
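As a rough sketch of that two-phase idea (the PDO connection, the staging table, and the column names here are invented for illustration, not part of the original code):
// phase 1: retrieval - dump the raw FTP listing into a staging table
// (assumes a table ftp_filenames_staging with a unique index on filename)
$pdo = new PDO('mysql:host=localhost;dbname=images', 'user', 'pass');
$stmt = $pdo->prepare('INSERT IGNORE INTO ftp_filenames_staging (filename) VALUES (?)');

$files = ftp_nlist($conn_id, '.');   // $conn_id: an already opened FTP connection
foreach ($files as $filename) {
    $stmt->execute([$filename]);
}

// phase 2 (a separate job, run later): read from ftp_filenames_staging,
// extract product code / size / variation, and insert only what you need.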
If the objective is just to display new files on the webpage:
You can simply store the highest file created/modified time in the DB.
This way, for the next batch, just fetch that last modified time and compare it against the created/modified time of each file. This keeps your app pretty lightweight. You can use filemtime for this (for files on an FTP server, ftp_mdtm plays the same role).
Then take the highest modification time of all files in the current iteration, store that highest value in the DB, and repeat the same steps on the next run.
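A rough sketch of that idea; the two DB helper methods are placeholders, and note that ftp_mdtm still costs one server round-trip per file:
// highest modification time recorded during the previous run (placeholder helper)
$lastRun = $this->getLastModifiedTimeFromDb(); // unix timestamp, 0 on the first run
$newest  = $lastRun;

foreach (ftp_nlist($conn_id, '.') as $filename) {
    $mtime = ftp_mdtm($conn_id, $filename);    // -1 if the server cannot report it
    if ($mtime > $lastRun) {
        // new (or changed) since the last run - process it
        $this->processNewFile($filename, $mtime); // placeholder
    }
    if ($mtime > $newest) {
        $newest = $mtime;
    }
}

// remember the newest timestamp for the next batch (placeholder helper)
$this->storeLastModifiedTimeInDb($newest);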
Suggestions:
foreach ($this->getFilenamesFromFtp() as $key => $image_data) {
If the above snippet loads all filenames into an array, you may want to discard that strategy, as it consumes a lot of memory. Instead, read the files one by one using directory functions, as mentioned in this answer: that approach maintains an internal pointer for the handle and doesn't load all entries at once. Of course, you would need to make the referenced answer iterate recursively as well for nested directories; see the sketch below.
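For local directories, that one-entry-at-a-time pattern looks roughly like this (a sketch only; the question's files live on FTP, where this applies once they are mirrored locally):
// readdir() advances an internal pointer, so entries are never all in memory at once
function process_directory($dir) {
    $handle = opendir($dir);
    if ($handle === false) {
        return;
    }
    while (($entry = readdir($handle)) !== false) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $path = $dir . DIRECTORY_SEPARATOR . $entry;
        if (is_dir($path)) {
            process_directory($path);   // recurse into nested directories
        } else {
            // handle one filename at a time here
        }
    }
    closedir($handle);
}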

Check if image exists - database or file lookup?

I hash a file's name based on its contents, store that name in a database, and store the file itself on a server.
Would it be more efficient (quicker) to check for duplicate files (and therefore not re-upload) by checking the name in the database, or by checking whether the file exists on the server?
There would be thousands of files.
I had the same issue: we have roughly 40k images, and the duplicates were a heavy load on our server, especially with image license management, since the same license had to be added to the same image multiple times.
I recommend a database lookup. It's much faster as your collection of files grows. A 40k table scan takes something like 20 milliseconds, while a 40k file search on disk runs in a few seconds, which gets annoying fast.
To solve this we changed how images were uploaded, so we don't get duplicate files, but multiple database records that reference the same physical file on disk. This gives us speed for looking up the file data, without having the "file" or even knowing where the actual file is.
We also don't store the file under the original filename, but under a hexadecimal hash based on date and time, so we don't get conflicting filenames and have no delivery issues due to special characters, spaces, etc. We store the original file name in a database field for lookup purposes.
Our images have their "metadata" stored in the database, with a hexadecimal file name and an "original" filename. It's really fast to check against this database and then retrieve the file link when there's a match/relation. This also allows checking whether a file has already been uploaded, since you don't need to scan the entire directory structure with all the images, which can take a significant amount of time.
This is the code I use; you can use something similar. Note that this uses Laravel's Eloquent, but it's fairly easy to replicate in plain MySQL.
First you get an instance to query the file model table.
Then you check for a file where the original filename, file size, content type and other metadata that shouldn't change are the same.
If they are the same, make your file entry a duplicate of the original entry (in my case this allows modifying image titles and descriptions for each reference).
$file = new $FILE();
$existingFile = $file->newQuery()
    ->where('file_name', $uploadedFile->getClientOriginalName())
    ->where('file_size', $uploadedFile->getSize())
    ->where('content_type', $uploadedFile->getMimeType())
    ->where('is_public', $fileRelation->isPublic())
    ->limit(1)->get()->first();

if ($existingFile) {
    $file->disk_name    = $existingFile->disk_name;
    $file->file_size    = $existingFile->file_size;
    $file->file_name    = $existingFile->file_name;
    $file->content_type = $existingFile->content_type;
    $file->is_public    = $existingFile->is_public;
}
else {
    $file->data = $uploadedFile;
    $file->is_public = $fileRelation->isPublic();
}
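For reference, a rough plain-PDO equivalent of that lookup might look like this; the files table and its column names are assumptions mirroring the model above:
// $pdo: an existing PDO connection; schema is assumed
$stmt = $pdo->prepare(
    'SELECT * FROM files
      WHERE file_name = ? AND file_size = ? AND content_type = ? AND is_public = ?
      LIMIT 1'
);
$stmt->execute([
    $uploadedFile->getClientOriginalName(),
    $uploadedFile->getSize(),
    $uploadedFile->getMimeType(),
    $fileRelation->isPublic() ? 1 : 0,
]);
$existingFile = $stmt->fetch(PDO::FETCH_ASSOC);

if ($existingFile) {
    // reuse the existing file on disk; only a new database record is created
} else {
    // genuinely new file: store it on disk under a new hashed name
}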
Then, when a file is deleted, you need to check whether it's the "last one":
public function afterDelete()
{
    try {
        $count = $this->newQuery()
            ->where('disk_name', '=', $this->disk_name)
            ->where('file_name', '=', $this->file_name)
            ->where('file_size', '=', $this->file_size)
            ->count();
        if (!$count) {
            $this->deleteThumbs();
            $this->deleteFile();
        }
    }
    catch (Exception $ex) {
        traceLog($ex->getMessage() . '\n' . $ex->getTraceAsString());
    }
}

PHP: Safe way to remove a directory?

Consider this code:
public static function removeDir($src)
{
    if (is_dir($src)) {
        $dir = @opendir($src);
        if ($dir === false)
            return;
        while (($file = readdir($dir)) !== false) {
            if ($file != '.' && $file != '..') {
                $path = $src . DIRECTORY_SEPARATOR . $file;
                if (is_dir($path)) {
                    self::removeDir($path);
                } else {
                    @unlink($path);
                }
            }
        }
        closedir($dir);
        @rmdir($src);
    }
}
This will remove a directory. But if unlink fails or opendir fails on any subdirectory, the directory will be left with some content.
I want either everything deleted, or nothing deleted. I'm thinking of copying the directory before removal and if anything fails, restoring the copy. But maybe there's a better way - like locking the files or something similar?
In general I would agree with the comment:
"Copy it, delete it, copy back if deleted else throw deleting message fail..." – We0
However, let's take some side considerations into account:
Trying to implement a transaction-safe file deletion indicates that you want to allow competing file locks on the same set of files. Transaction handling is usually the most 'expensive' way to ensure consistency. This holds true even if PHP had some kind of test-delete available, because you would need to test-delete everything in a first run and then do a second loop, which costs time (and during which something may change on your file system in the meantime). There are other options:
Try to isolate what really needs to be transaction-safe and handle those data accesses in a database. E.g. MySQL/InnoDB supports all the nitty-gritty details of transaction handling.
Define and implement dedicated 'write/lock ownership'. So you have folders A and B with sub-items; your PHP is allowed to lock files in A and some other process is allowed to lock files in B. Both your PHP and the other process are allowed to read A and B. This gets tricky with files, because a file read causes a lock as well, and that lock lasts longer the bigger the file is. So on a file basis you probably need to enrich this with file size limits, tolerance periods and so on.
Define and implement dedicated access time frames. E.g. all files can be used during the week, but you have a maintenance time frame on Sunday night which can also run deletions and therefore requires a lock-free environment.
Right, let's say my reasoning was not frightening enough :) - and you implement a transaction-safe file deletion anyway - your routine could be implemented this way:
backup all files
if the backup fails, you could try a second, third, fourth time (this is an implementation decision)
if there is no successful backup, full stop
run your deletion process; two implementation options (in either case you need to log the files you deleted successfully):
always run through fully and document all errors (this can be returned to the user later as a homework task list, but potentially runs long)
run through and stop at the first error
if the deletion was successful, all fine/full stop; if not, proceed with rolling back
copy back only the previously successfully deleted files from the backup (ONLY THOSE!)
wipe out your backup
This then is only transaction-safe at the file level. It does NOT handle the case where somebody changes permissions on folders between steps 5 and 6.
Or you could try to just rename/move the directory to something like /tmp/; it either succeeds or it doesn't, but the files are not gone. Even if another process has an open handle, the move should be OK. The files will be gone some time later when the tmp folder is emptied.
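A minimal sketch of that rename/move idea; the /tmp target is just an example, and rename() behaves this way only when source and target are on the same filesystem:
// move the whole directory out of the way in one step; on the same filesystem
// rename() either fully succeeds or leaves everything untouched
$trash = '/tmp/removed_' . uniqid();   // example target path
if (rename($src, $trash)) {
    // the original location is clean now; the actual cleanup of $trash
    // can happen later without risking a half-deleted directory
} else {
    // nothing was touched - report the failure
}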

Directory structure for large number of files

I made a site where I am storing user-uploaded files in separate directories, like:
user_id = 1
so
img/upload_docs/1/1324026061_1.txt
img/upload_docs/1/1324026056_1.txt
Same way if
user_id = 2
so
img/upload_docs/2/1324026061_2.txt
img/upload_docs/2/1324026056_2.txt
...
n
So if in the future I get 100,000 users, my upload_docs folder will contain 100,000 folders.
And there is no restriction on uploads, so it could be 1000 files for one user, or 10, or any number of files...
So is this a proper way to do it?
If not, can anyone suggest how I should store these files, and in what kind of structure?
What I would do is name the images UUIDs and create subfolders based on the names of the files. You can do this pretty easily with chunk_split. For example, if you create a folder every 4 characters you would end up with a structure like this:
img/upload_docs/1/1324/0260/61_1.txt
img/upload_docs/1/1324/0260/56_1.txt
By storing the image name 1324026056_1.txt you could then very easily determine where it belongs or where to fetch it using chunk_split.
This is a similar method to how git stores objects.
As code, it could look something like this.
// pass filename ('123456789.txt' from the db)
function get_path_from_filename($filename) {
    $path = 'img/upload_docs';
    list($name, $ext) = explode('.', $filename);         // remove the extension
    $folders = rtrim(chunk_split($name, 4, '/'), '/');   // '123456789' becomes '1234/5678/9'
    // now we store it here, e.g. img/upload_docs/1234/5678/9.txt
    $newpath = $path . '/' . $folders . '.' . $ext;
    return $newpath;
}
Now when you need to deliver the file to the user, use the same function to recreate where the file lives (based on the filename, which is still stored as '123456789.txt' in the DB).
So to deliver or store the file, call get_path_from_filename, as in the sketch below.
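Putting it together, a small usage sketch; the recursive mkdir() call and the $_FILES['file'] field name are assumptions for illustration:
$filename = '1324026061_1.txt';                 // as stored in the DB
$target   = get_path_from_filename($filename);  // img/upload_docs/1324/0260/61_1.txt
                                                // (prepend a user-id folder if you keep that level)

// create the nested folders on first write, then store the file there
if (!is_dir(dirname($target))) {
    mkdir(dirname($target), 0755, true);        // recursive
}
move_uploaded_file($_FILES['file']['tmp_name'], $target);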
A variant of the same idea splits on the leading digits of the user id instead, giving paths like:
img/upload_docs/1/0/10000/1324026056_2.txt
img/upload_docs/9/7/97555/1324026056_2.txt
img/upload_docs/2/3/23/1324026056_2.txt

What's the best way to read from and then overwrite file contents in php?

What's the cleanest way in PHP to open a file, read the contents, and subsequently overwrite the file's contents with some output based on the original contents? Specifically, I'm trying to open a file populated with a list of items (separated by newlines), process/add items to the list, remove the oldest N entries from the list, and finally write the list back into the file.
fopen(<path>, 'a+')
flock(<handle>, LOCK_EX)
fread(<handle>, filesize(<path>))
// process contents and remove old entries
fwrite(<handle>, <contents>)
flock(<handle>, LOCK_UN)
fclose(<handle>)
Note that I need to lock the file with flock() in order to protect it across multiple page requests. Will the 'w+' flag when fopen()ing do the trick? The php manual states that it will truncate the file to zero length, so it seems that may prevent me from reading the file's current contents.
If the file isn't overly large (that is, you can be confident loading it won't blow PHP's memory limit), then the easiest way to go is to just read the entire file into a string (file_get_contents()), process the string, and write the result back to the file (file_put_contents()). This approach has two problems:
If the file is too large (say, tens or hundreds of megabytes), or the processing is memory-hungry, you're going to run out of memory (even more so when you have multiple instances of the thing running).
The operation is destructive; when the saving fails halfway through, you lose all your original data.
If any of these is a concern, plan B is to process the file and at the same time write to a temporary file; after successful completion, close both files, rename (or delete) the original file and then rename the temporary file to the original filename.
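A minimal sketch of that plan B; the paths and the line-based processing are just examples:
$path = '/path/to/list.txt';          // example path
$tmp  = $path . '.tmp';

$in  = fopen($path, 'r');
$out = fopen($tmp, 'w');

// transform the file line by line, writing the result to the temporary file
while (($line = fgets($in)) !== false) {
    $item = rtrim($line, "\n");
    // ... process/filter $item here ...
    fwrite($out, $item . "\n");
}

fclose($in);
fclose($out);

// only after a fully successful run, swap the temporary file into place
rename($tmp, $path);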
Read
$data = file_get_contents($filename);
Write
file_put_contents($filename, $data);
One solution is to use a separate lock file to control access.
This solution assumes that only your script, or scripts you have access to, will want to write to the file. This is because the scripts will need to know to check a separate file for access.
$file_lock = obtain_file_lock();
if ($file_lock) {
    $old_information = file_get_contents('/path/to/main/file');
    $new_information = update_information_somehow($old_information);
    file_put_contents('/path/to/main/file', $new_information);
    release_file_lock($file_lock);
}

function obtain_file_lock() {
    $attempts = 10;
    // There are probably better ways of dealing with waiting for a file
    // lock, but this shows the principle of dealing with the original
    // question.
    for ($ii = 0; $ii < $attempts; $ii++) {
        $lock_file = fopen('/path/to/lock/file', 'r'); // only need read access
        if (flock($lock_file, LOCK_EX)) {
            return $lock_file;
        } else {
            // give time for the other process to release the lock
            usleep(100000); // 0.1 seconds
        }
    }
    // This is only reached if all attempts fail.
    // Error code here for dealing with that eventuality.
}

function release_file_lock($lock_file) {
    flock($lock_file, LOCK_UN);
    fclose($lock_file);
}
This should prevent a concurrently running script from reading old information, updating it, and overwriting changes that another script made after you read the file. It allows only one instance of the script at a time to read the file and then overwrite it with updated information.
While this hopefully answers the original question, it doesn't give a good solution to making sure all concurrent scripts have the ability to record their information eventually.