Multiple process read same file php - php

I have a little problem inside my application.
I have many process launched automatically, from the server with crontab, written in php to read file inside a folder.
Sometimes different process read the same file and create a problem inside the application.
Is there a way to manage this problem?
Actually I read all files inside a folder, read each of them and delete immediately, but sometimes another process read the same file before I delete It.
This is my script written with cakephp3 (so some classes like File is only for cakephp3 but isn't the point of the question) to read and delete:
$xml_files = glob(TMP . 'xml/*.xml');
foreach($xml_files as $fileXml)
{
//read the file and put into a string or array or object
$explStr = explode('/', $fileXml);
$filename = $explStr[count($explStr) - 1];
$path = TMP . '/xml/' . $filename;
$file = new File($path, false);
if($file->exists()){
$string = $file->read();
$file->close();
$file->delete();
}
}

Use flock() to obtain (or attempt to obtain) a file lock and act accordingly.

It's called a race condition, and when working with files you could lock the file when process A uses it, it locks it, then other processes would check if it's locked and if it is then do nothing. Then unlock the file when process A has finished with it.

Related

About PHP parallel file read/write

Have a file in a website. A PHP script modifies it like this:
$contents = file_get_contents("MyFile");
// ** Modify $contents **
// Now rewrite:
$file = fopen("MyFile","w+");
fwrite($file, $contents);
fclose($file);
The modification is pretty simple. It grabs the file's contents and adds a few lines. Then it overwrites the file.
I am aware that PHP has a function for appending contents to a file rather than overwriting it all over again. However, I want to keep using this method since I'll probably change the modification algorithm in the future (so appending may not be enough).
Anyway, I was testing this out, making like 100 requests. Each time I call the script, I add a new line to the file:
First call:
First!
Second call:
First!
Second!
Third call:
First!
Second!
Third!
Pretty cool. But then:
Fourth call:
Fourth!
Fifth call:
Fourth!
Fifth!
As you can see, the first, second and third lines simply disappeared.
I've determined that the problem isn't the contents string modification algorithm (I've tested it separately). Something is messed up either when reading or writing the file.
I think it is very likely that the issue is when the file's contents are read: if $contents, for some odd reason, is empty, then the behavior shown above makes sense.
I'm no expert with PHP, but perhaps the fact that I performed 100 calls almost simultaneously caused this issue. What if there are two processes, and one is writing the file while the other is reading it?
What is the recommended approach for this issue? How should I manage file modifications when several processes could be writing/reading the same file?
What you need to do is use flock() (file lock)
What I think is happening is your script is grabbing the file while the previous script is still writing to it. Since the file is still being written to, it doesn't exist at the moment when PHP grabs it, so php gets an empty string, and once the later processes is done it overwrites the previous file.
The solution is to have the script usleep() for a few milliseconds when the file is locked and then try again. Just be sure to put a limit on how many times your script can try.
NOTICE:
If another PHP script or application accesses the file, it may not necessarily use/check for file locks. This is because file locks are often seen as an optional extra, since in most cases they aren't needed.
So the issue is parallel accesses to the same file, while one is writing to the file another instance is reading before the file has been updated.
PHP luckily has a mechanisms for locking the file so no one can read from it until the lock is released and the file has been updated.
flock()
can be used and the documentation is here
You need to create a lock, so that any concurrent requests will have to wait their turn. This can be done using the flock() function. You will have to use fopen(), as opposed to file_get_contents(), but it should not be a problem:
$file = 'file.txt';
$fh = fopen($file, 'r+');
if (flock($fh, LOCK_EX)) { // Get an exclusive lock
$data = fread($fh, filesize($file)); // Get the contents of file
// Do something with data here...
ftruncate($fh, 0); // Empty the file
fwrite($fh, $newData); // Write new data to file
fclose($fh); // Close handle and release lock
} else {
die('Unable to get a lock on file: '.$file);
}

PHP Check the file is opend before rename

I have cleanup script, which move the XLS files from one place to another. for this file moving process, I have used the rename function. This script is working fine. but when the XLS file is open, when I try to move that xls, I am getting error which simply say Can not rename sample.xls. But I would like to add the functionality like, Check the XLS is open before initiate rename function.
I believe this is function call flock but this is applicable for TXT file alone.
How to check XLS file is opened before call the rename function.
One simple thing you could try is to use flock to acquire a Exclusive Lock on the file and if it fails you will know the file is being used:
<?php
$fp = fopen('c:/your_file.xlsx', 'r+');
if(!flock($fp, LOCK_EX))
{
echo 'File is being used...';
exit(-1);
}
else
{
fclose($fp);
// rename(...);
}
An alternative would be to check the existence of the locking file excel usually creates when a file is being used:
<?php
$file = 'c:/testfile.xlsx';
$lock = 'c:/~$testfile.xlsx';
if (file_exists($lock))
{
echo "Excel $file is locked.";
}
else
{
echo "Excel $file is free.";
}
The hidden file is usually name with the prefix ~$ as for old excel files I believe 2003 and older the lock files are saved on the temp folder with a random name like ~DF7B32A4D388B5854C.TMP so it would be pretty hard to find out.
You should use flock(). This puts a flag on the file so that other scripts are informed that the file is in use. The flag is turned off either intentionally using fclose or implicitly by the end of the script.
Use file lock like:
flock($file,LOCK_EX);
see this

What's the best way to read from and then overwrite file contents in php?

What's the cleanest way in php to open a file, read the contents, and subsequently overwrite the file's contents with some output based on the original contents? Specifically, I'm trying to open a file populated with a list of items (separated by newlines), process/add items to the list, remove the oldest N entries from the list, and finally write the list back into the file.
fopen(<path>, 'a+')
flock(<handle>, LOCK_EX)
fread(<handle>, filesize(<path>))
// process contents and remove old entries
fwrite(<handle>, <contents>)
flock(<handle>, LOCK_UN)
fclose(<handle>)
Note that I need to lock the file with flock() in order to protect it across multiple page requests. Will the 'w+' flag when fopen()ing do the trick? The php manual states that it will truncate the file to zero length, so it seems that may prevent me from reading the file's current contents.
If the file isn't overly large (that is, you can be confident loading it won't blow PHP's memory limit), then the easiest way to go is to just read the entire file into a string (file_get_contents()), process the string, and write the result back to the file (file_put_contents()). This approach has two problems:
If the file is too large (say, tens or hundreds of megabytes), or the processing is memory-hungry, you're going to run out of memory (even more so when you have multiple instances of the thing running).
The operation is destructive; when the saving fails halfway through, you lose all your original data.
If any of these is a concern, plan B is to process the file and at the same time write to a temporary file; after successful completion, close both files, rename (or delete) the original file and then rename the temporary file to the original filename.
Read
$data = file_get_contents($filename);
Write
file_put_contents($filename, $data);
One solution is to use a separate lock file to control access.
This solution assumes that only your script, or scripts you have access to, will want to write to the file. This is because the scripts will need to know to check a separate file for access.
$file_lock = obtain_file_lock();
if ($file_lock) {
$old_information = file_get_contents('/path/to/main/file');
$new_information = update_information_somehow($old_information);
file_put_contents('/path/to/main/file', $new_information);
release_file_lock($file_lock);
}
function obtain_file_lock() {
$attempts = 10;
// There are probably better ways of dealing with waiting for a file
// lock but this shows the principle of dealing with the original
// question.
for ($ii = 0; $ii < $attempts; $ii++) {
$lock_file = fopen('/path/to/lock/file', 'r'); //only need read access
if (flock($lock_file, LOCK_EX)) {
return $lock_file;
} else {
//give time for other process to release lock
usleep(100000); //0.1 seconds
}
}
//This is only reached if all attempts fail.
//Error code here for dealing with that eventuality.
}
function release_file_lock($lock_file) {
flock($lock_file, LOCK_UN);
fclose($lock_file);
}
This should prevent a concurrently-running script reading old information and updating that, causing you to lose information that another script has updated after you read the file. It will allow only one instance of the script to read the file and then overwrite it with updated information.
While this hopefully answers the original question, it doesn't give a good solution to making sure all concurrent scripts have the ability to record their information eventually.

PHP script that sends an email listing file changes that have happened in a directory/subdirectories

I have a directory with a number of subdirectories that users add files to via FTP. I'm trying to develop a php script (which I will run as a cron job) that will check the directory and its subdirectories for any changes in the files, file sizes or dates modified. I've searched long and hard and have so far only found one script that works, which I've tried to modify - original located here - however it only seems to send the first email notification showing me what is listed in the directories. It also creates a text file of the directory and subdirectory contents, but when the script runs a second time it seems to fall over, and I get an email with no contents.
Anyone out there know a simple way of doing this in php? The script I found is pretty complex and I've tried for hours to debug it with no success.
Thanks in advance!
Here you go:
$log = '/path/to/your/log.js';
$path = '/path/to/your/dir/with/files/';
$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($path), RecursiveIteratorIterator::SELF_FIRST);
$result = array();
foreach ($files as $file)
{
if (is_file($file = strval($file)) === true)
{
$result[$file] = sprintf('%u|%u', filesize($file), filemtime($file));
}
}
if (is_file($log) !== true)
{
file_put_contents($log, json_encode($result), LOCK_EX);
}
// are there any differences?
if (count($diff = array_diff($result, json_decode(file_get_contents($log), true))) > 0)
{
// send email with mail(), SwiftMailer, PHPMailer, ...
$email = 'The following files have changed:' . "\n" . implode("\n", array_keys($diff));
// update the log file with the new file info
file_put_contents($log, json_encode($result), LOCK_EX);
}
I am assuming you know how to send an e-mail. Also, please keep in mind that the $log file should be kept outside the $path you want to monitor, for obvious reasons of course.
After reading your question a second time, I noticed that you mentioned you want to check if the files change, I'm only doing this check with the size and date of modification, if you really want to check if the file contents are different I suggest you use a hash of the file, so this:
$result[$file] = sprintf('%u|%u', filesize($file), filemtime($file));
Becomes this:
$result[$file] = sprintf('%u|%u|%s', filesize($file), filemtime($file), md5_file($file));
// or
$result[$file] = sprintf('%u|%u|%s', filesize($file), filemtime($file), sha1_file($file));
But bare in mind that this will be much more expensive since the hash functions have to open and read all the contents of your 1-5 MB CSV files.
I like sfFinder so much that I wrote my own adaption:
http://www.symfony-project.org/cookbook/1_0/en/finder
https://github.com/homer6/altumo/blob/master/source/php/Utils/Finder.php
Simple to use, works well.
However, for your use, depending on the size of the files, I'd put everything in a git repository. It's easy to track then.
HTH

Unique and temporary file names in PHP?

I need to convert some files to PDF and then attach them to an email. I'm using Pear Mail for the email side of it and that's fine (mostly--still working out some issues) but as part of this I need to create temporary files. Now I could use the tempnam() function but it sounds like it creates a file on the filesystem, which isn't what I want.
I just want a name in the temporary file system (using sys_get_temp_dir()) that won't clash with someone else running the same script of the same user invoking the script more than once.
Suggestions?
I've used uniqid() in the past to generate a unique filename, but not actually create the file.
$filename = uniqid(rand(), true) . '.pdf';
The first parameter can be anything you want, but I used rand() here to make it even a bit more random. Using a set prefix, you could further avoid collisions with other temp files in the system.
$filename = uniqid('MyApp', true) . '.pdf';
From there, you just create the file. If all else fails, put it in a while loop and keep generating it until you get one that works.
while (true) {
$filename = uniqid('MyApp', true) . '.pdf';
if (!file_exists(sys_get_temp_dir() . $filename)) break;
}
Seriously, use tempnam(). Yes, this creates the file, but this is a very intentional security measure designed to prevent another process on your system from "stealing" your filename and causing your process to overwrite files you don't want.
I.e., consider this sequence:
You generate a random name.
You check the file system to make sure it doesn't exist. If it does, repeat the previous step.
Another, evil, process creates a file with the same name as a hard link to a file Mr Evil wants you to accidentally overwrite.
You open the file, thinking you're creating the file rather than opening an existing one in write mode and you start writing to it.
You just overwrote something important.
PHP's tempnam() actually calls the system's mkstemp under the hood (that's for Linux... substitute the "best practice" function for other OSs), which goes through a process like this:
Pick a filename
Create the file with restrictive permissions, inside a directory that prevents others from removing files it doesn't own (that's what the sticky-bit does on /var/tmp and /tmp)
Confirms that the file created still has the restrictive permissions.
If any of the above fails, try again with a different name.
Returns the filename created.
Now, you can do all of those things yourself, but why, when "the proper function" does everything that's required to create secure temporary files, and that almost always involves creating an empty file for you.
Exceptions:
You're creating a temporary file in a directory that only your process can create/delete files in.
Create a randomly generated temporary directory, which only your process can create/delete files in.
Another alternative based on #Lusid answer with a failover of max execution time:
// Max exectution time of 10 seconds.
$maxExecTime = time() + 10;
$isUnique = false;
while (time() !== $maxExecTime) {
// Unique file name
$uniqueFileName = uniqid(mt_rand(), true) . '.pdf';
if (!file_exists(sys_get_temp_dir() . $uniqueFileName)){
$isUnique = true;
break;
}
}
if($isUnique){
// Save your file with your unique name
}else{
// Time limit was exceeded without finding a unique name
}
Note:
I prefer to use mt_rand instead of rand because the first function use Mersenne Twister algorithm and it's faster than the second (LCG).
More info:
http://php.net/manual/en/function.uniqid.php
http://php.net/manual/en/function.mt-rand.php
http://php.net/manual/en/function.time.php
Consider using an uuid for the filename. Consider the uniqid function.
http://php.net/uniqid
You could use part of the date and time in order to create a unique file name, that way it isn't duplicated when invoked more than once.
I recomend you to use the PHP function
http://www.php.net/tempnam
$file=tempnam('tmpdownload', 'Ergebnis_'.date(Y.m.d).'_').'.pdf';
echo $file;
/var/www/html/tmpdownload/Ergebnis_20071004_Xbn6PY.pdf
Or
http://www.php.net/tmpfile
<?php
$temp = tmpfile();
fwrite($temp, "writing to tempfile");
fseek($temp, 0);
echo fread($temp, 1024);
fclose($temp); // this removes the file
?>
Better use Unix timestamp with the user id.
$filename='file_'.time().'_'.$id.'.jepg';
My idea is to use a recursive function to see if the filename exists, and if it does, iterate to the next integer:
function check_revision($filename, $rev){
$new_filename = $filename."-".$rev.".csv";
if(file_exists($new_filename)){
$new_filename = check_revision($filename, $rev++);
}
return $new_filename;
}
$revision = 1;
$filename = "whatever";
$filename = check_revision($filename, $revision);
function gen_filename($dir) {
if (!#is_dir($dir)) {
#mkdir($dir, 0777, true);
}
$filename = uniqid('MyApp.', true).".pdf";
if (#is_file($dir."/".$filename)) {
return $this->gen_filename($dir);
}
return $filename;
}
Update 2020
hash_file('md5', $file_pathname)
This way you will prevent duplications.

Categories