PHP: Safe way to remove a directory?

Consider this code:
public static function removeDir($src)
{
    if (is_dir($src)) {
        $dir = @opendir($src);
        if ($dir === false) {
            return;
        }
        while (($file = readdir($dir)) !== false) {
            if ($file != '.' && $file != '..') {
                $path = $src . DIRECTORY_SEPARATOR . $file;
                if (is_dir($path)) {
                    self::removeDir($path);
                } else {
                    @unlink($path);
                }
            }
        }
        closedir($dir);
        @rmdir($src);
    }
}
This will remove a directory. But if unlink fails or opendir fails on any subdirectory, the directory will be left with some content.
I want either everything deleted, or nothing deleted. I'm thinking of copying the directory before removal and if anything fails, restoring the copy. But maybe there's a better way - like locking the files or something similar?

In general I would agree with the comment:
"Copy it, delete it, copy back if deleted else throw deleting message fail..." – We0
However let's take some side considerations:
Trying to implement transaction-safe file deletion indicates that you want to allow competing file locks on the same set of files. Transaction handling is usually the most 'expensive' way to ensure consistency. This holds true even if PHP had some kind of test-delete available, because you would need to test-delete everything in a first pass and then do a second loop, which costs time (and during which something may change on your file system). There are other options:
Try to isolate what really needs to be transaction-safe and handle those data accesses in a database. E.g. MySQL/InnoDB supports all the nitty-gritty details of transaction handling (see the PDO sketch after this list).
Define and implement dedicated 'write/lock ownership'. So you have folders A and B with sub-items; your PHP is allowed to lock files in A and some other process is allowed to lock files in B. Both your PHP and the other process are allowed to read A and B. This gets tricky with files, because a file read causes a lock as well, which lasts longer the bigger the file is. So on a file basis you probably need to enrich this with file size limits, tolerance periods and so on.
Define and implement dedicated access time frames. E.g. all files can be used during the week, but you have a maintenance time frame on Sunday night which can also run deletions and therefore requires a lock-free environment.
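To illustrate the first option, here is a minimal sketch of transactional handling in MySQL/InnoDB via PDO; the connection parameters and the files table are assumptions made for the example, not part of the question:
// Hedged sketch: the DSN, credentials and "files" table are hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
try {
    $pdo->beginTransaction();
    // All-or-nothing: either every row is deleted, or none are.
    $stmt = $pdo->prepare('DELETE FROM files WHERE directory = ?');
    $stmt->execute(array('/path/to/dir'));
    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack(); // Nothing was deleted.
    throw $e;
}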
Right, let's say my reasoning was not frightening enough :) - and you implement a transaction-safe file deletion anyway - your routine can be implemented this way:
1. backup all files
2. if the backup fails, you could try a second, third, fourth time (this is an implementation decision)
3. if there is no successful backup, full stop
4. run your deletion process; two implementation options (either way you need to log the files you deleted successfully):
   - always run through fully and document all errors (this can be returned to the user later on as a homework task list, however it potentially runs long)
   - run through and stop at the first error
5. if the deletion was successful, all fine/full stop; if not, proceed with rolling back
6. copy back only the previously successfully deleted files from the archive (ONLY THOSE!)
7. wipe out your backup
This then is only transaction-safe at the file level. It does NOT handle the case where somebody changes permissions on folders between steps 5 and 6.
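A minimal sketch of that routine, stopping at the first error; copyDir() is an assumed helper (not shown) that mirrors $src into $backup and returns true on success:
function removeDirTransactional($src, $backup)
{
    if (!copyDir($src, $backup)) {       // steps 1-3: no backup, full stop
        return false;
    }
    $deleted = array();                  // step 4: log successful deletions
    $ok = true;
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($src, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::CHILD_FIRST
    );
    foreach ($it as $item) {
        $path = $item->getPathname();
        if ($item->isDir() ? @rmdir($path) : @unlink($path)) {
            $deleted[] = $path;
        } else {
            $ok = false;                 // option b: stop at the first error
            break;
        }
    }
    if ($ok && @rmdir($src)) {
        // step 7: wipe out the backup (plain recursive delete, not shown)
        return true;                     // step 5: all fine, full stop
    }
    // step 6: copy back only the successfully deleted items (ONLY THOSE!)
    foreach (array_reverse($deleted) as $path) {
        $rel = substr($path, strlen($src));
        is_dir($backup . $rel) ? @mkdir($path) : @copy($backup . $rel, $path);
    }
    return false;
}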

Or you could try to just rename/move the directory to something like /tmp - the move either succeeds or it doesn't, but the files are not gone halfway. Even if another process has an open handle, the move should be OK. The files will be gone some time later when the tmp folder is emptied.
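A minimal sketch of that idea; it assumes $src and /tmp live on the same filesystem (rename() across mount points is not atomic and may fail for directories):
// Move the directory aside in one atomic step, then delete at leisure.
// The uniqid() suffix is just an assumption to avoid collisions in /tmp.
$trash = '/tmp/removed_' . uniqid('', true);
if (@rename($src, $trash)) {
    // $src is gone in one step; clean up $trash now or from a cron job.
} else {
    // Nothing was touched: $src is still fully intact.
}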

Related

flock() between PHP and C edge case

I have a PHP script which receives and saves invoices as files on Linux. Later, a C++ program running in an infinite loop reads each one and does some processing. I want the latter to read each file safely (only after it has been fully written).
PHP side code simplification:
file_put_contents("sampleDir/invoice.xml", "contents", LOCK_EX)
On the C++ side (with the C filesystem API), I must first note that I want to keep the code which deletes empty files in the designated invoices folder, as a means to properly deal with the edge case of an empty file being created by other sources (not the PHP script).
Now, here's a C++ side code simplification, too:
FILE* pInvoiceFile = fopen("sampleDir/invoice.xml", "r");
if (pInvoiceFile != NULL)
{
    if (flock(fileno(pInvoiceFile), LOCK_SH) == 0)
    {
        struct stat fileStat;
        fstat(fileno(pInvoiceFile), &fileStat);
        string invoice;
        invoice.resize(fileStat.st_size);
        if (fread(&invoice[0], 1, fileStat.st_size, pInvoiceFile) < 1)
        {
            remove("sampleDir/invoice.xml"); // Edge case resolution
        }
        flock(fileno(pInvoiceFile), LOCK_UN);
    }
    fclose(pInvoiceFile);
}
As you can see, the key concept is the cooperation of the LOCK_EX and LOCK_SH flags.
My problem is that, while this integration has been working fine, yesterday I noticed the edge-case code execute for an invoice which should not have been empty, and thus it was deleted by the C++ program.
PHP manual on file_put_contents mentions the following for the LOCK_EX flag:
Acquire an exclusive lock on the file while proceeding to the writing. In other words, a flock() call happens between the fopen() call and the fwrite() call. This is not identical to an fopen() call with mode "x".
Could the problem be a race condition caused by LOCK_EX not being established until after file_put_contents calls fopen()? If so, what can be done to solve this while keeping the edge-case removal code?
Otherwise, may I be doing anything wrong overall?
Your code assumes that the file_put_contents() operation is atomic, and that using LOCK_EX and LOCK_SH is enough to ensure no race conditions between the two programs. This is not the case.
As you can see from the PHP docs, LOCK_EX is applied after opening the file. This is important, because it leaves a short window of time in which the C++ program can successfully open the file and lock it with LOCK_SH. At that point the file has already been truncated by PHP's fopen(), and it is empty.
What's most likely happening is:
1. PHP code opens the file for writing, truncating it and effectively wiping out its content.
2. C++ code opens the file for reading.
3. C++ code requests the shared lock on the file: the lock is granted.
4. PHP code requests the exclusive lock on the file: the call blocks, waiting for the lock to become available.
5. C++ code reads the file's contents: nothing, the file is empty.
6. C++ code deletes the file.
7. C++ code releases the shared lock.
8. PHP code acquires the exclusive lock.
9. PHP code writes to the file: the data goes to an inode that no longer has a directory entry, so it is lost for good once the descriptor is closed.
You are effectively left with no file, and the data is lost.
The problem with your code is that the operations you are doing on the file from two different programs are not atomic, and the way you are acquiring the locks does not help in ensuring that those don't overlap.
The only sane way of guaranteeing the atomicity of such an operation on a POSIX-compliant system, without even worrying about file locking, is to take advantage of the atomicity of rename(2):
If newpath already exists, it will be atomically replaced, so that there is no point at which another process attempting to access newpath will find it missing.
If newpath exists but the operation fails for some reason, rename() guarantees to leave an instance of newpath in place.
The equivalent rename() PHP function is what you should use in this case. It's the simplest way to guarantee atomic updates to a file.
What I would suggest is the following:
PHP code:
$tmpfname = tempnam("/tmp", "myprefix"); // Create a temporary file.
file_put_contents($tmpfname, "contents"); // Write to the temporary file.
rename($tmpfname, "sampleDir/invoice.xml"); // Atomically replace the contents of invoice.xml by renaming the file.
// TODO: check for errors in all the above calls, most importantly tempnam().
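// Caveat (an addition, not from the original answer): rename() is only
// atomic when source and target are on the same filesystem; if /tmp is a
// separate mount, create the temporary file inside sampleDir/ instead.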
C++ code:
FILE* pInvoiceFile = fopen("sampleDir/invoice.xml", "r");
if (pInvoiceFile != NULL)
{
    struct stat fileStat;
    fstat(fileno(pInvoiceFile), &fileStat);
    string invoice;
    invoice.resize(fileStat.st_size);
    size_t n = fread(&invoice[0], 1, fileStat.st_size, pInvoiceFile);
    fclose(pInvoiceFile);
    if (n == 0)
        remove("sampleDir/invoice.xml");
}
This way, the C++ program will always either see the old version of the file (if fopen() happens before PHP's rename()) or the new version of the file (if fopen() happens after), but it will never see an inconsistent version of the file.

Using rmdir() and RemoveDirectory() can a deleted file be recovered?

If you use code like that below to delete a file that was created by your code, where does that file get deleted to? Can the file be recovered?
function RemoveDirectory($path)
{
    foreach (glob("{$path}/*") as $file)
    {
        if (is_dir($file)) {
            RemoveDirectory($file);
        } else {
            unlink($file);
        }
    }
    rmdir($path);
}
Let's just say I called this on a directory at the wrong time in the code, and I have regrets.
The file gets deleted from your hard drive. It doesn't get moved to a "Recycle Bin"; it gets completely removed. To recover the file after that, you will need some sort of undelete software, which may or may not work, depending on whether you have overwritten those sectors of the hard drive with other files since the delete took place. If you accidentally delete a file, remove the drive immediately and boot from a different drive to prevent further writes from occurring.

creating only new files in PHP without cpu intensive code

In my cache system, I want it so that when a new page is requested, a check is made to see whether a file exists; if it doesn't, a copy is stored on the server. If it does exist, it must not be overwritten.
The problem I have is that I may be using functions designed to be slow.
This is part of my current implementation to save files:
if (!file_exists($filename)) {
    $h = fopen($filename, "wb");
    if ($h) {
        fwrite($h, $c);
        fclose($h);
    }
}
This is part of my implementation to load files:
if (($m = @filemtime($file)) !== false) {
    if ($m >= filemtime("sitemodification.file")) {
        $outp = file_get_contents($file);
        header("Content-length: " . strlen($outp), true);
        echo $outp;
        flush();
        exit();
    }
}
What I want to do is replace this with a better set of functions meant for performance and yet still achieve the same functionality. All caching files, including sitemodification.file, reside on a ramdisk. I added a flush() before exit() in the hope that content will be output faster.
I can't use direct memory addressing at this time because the file sizes to be stored are all different.
Is there a set of functions I can use that will execute the code I provided faster, by at least a few milliseconds, especially the file-loading code?
I'm trying to keep my time to first byte low.
First, prefer is_file() to file_exists(), and use file_put_contents():
if (!is_file($filename)) {
    file_put_contents($filename, $c);
}
Then, use the proper function for this kind of work, readfile():
if (($m = @filemtime($file)) !== false && $m >= filemtime('sitemodification.file')) {
    header('Content-length: ' . filesize($file));
    readfile($file);
}
You should see a small improvement, but keep in mind that file accesses are slow and this still performs three file accesses (two filemtime() calls plus filesize()) before sending any content.
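If those three accesses still hurt your time to first byte, one possible shortcut (an assumption beyond the answer above: it requires the APCu extension) is to cache the sitemodification.file timestamp in shared memory so that most requests skip one filemtime() call:
// Cache the global modification time in APCu with a 1-second TTL.
// The key name 'site_mtime' is arbitrary.
$siteMtime = apcu_fetch('site_mtime', $hit);
if (!$hit) {
    $siteMtime = filemtime('sitemodification.file');
    apcu_store('site_mtime', $siteMtime, 1);
}
if (($m = @filemtime($file)) !== false && $m >= $siteMtime) {
    header('Content-length: ' . filesize($file));
    readfile($file);
    exit();
}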

Is there a way to detect changes in a folder using php on both windows and linux?

I'm looking for a solution to detect changes in folder(s) using PHP. The application may run on both platforms (Linux and Windows). I may use different methods for each platform as long as the results are the same.
What I desire is :
If a file/folder is added to a directory, I want my app to detect this new file and read its attributes (size, filetime, etc.)
If an existing file/folder is saved/has its contents changed/is deleted, I need to detect that the file has changed
It would be better if I could monitor a base folder outside the Apache webroot (such as c:\tmp or d:\music on Windows, or /home/ertunc on Linux)
I read something on inotify but I'm not sure it meets my needs.
Monitoring the filesystem for changes is a task that should be solved outside PHP; it's not really built for stuff like this.
There are ready-made tools on both platforms that can monitor file changes and call a PHP script to do the further processing.
For Linux:
How to efficiently monitor a directory for changes on linux? (looks best at a quick glance)
Monitor Directory for Changes
For Windows:
How to monitor a folder for changes, and execute a command if it does, on Windows?
The second answer in How can I monitor a Windows directory for changes?
If you are checking against the last time you looked, rather than being notified as soon as something changes, you could do the following.
You could create an MD5 of a directory, store this MD5, then compare the new MD5 with the old one to see if things have changed.
The function below, taken from http://php.net/manual/en/function.md5-file.php, will do this for you.
function MD5_DIR($dir)
{
    if (!is_dir($dir)) {
        return false;
    }
    $filemd5s = array();
    $d = dir($dir);
    while (false !== ($entry = $d->read())) {
        if ($entry != '.' && $entry != '..') {
            if (is_dir($dir . '/' . $entry)) {
                $filemd5s[] = MD5_DIR($dir . '/' . $entry);
            } else {
                $filemd5s[] = md5_file($dir . '/' . $entry);
            }
        }
    }
    $d->close();
    return md5(implode('', $filemd5s));
}
This is rather inefficient though, since, as you probably know, there is no point hashing the entire contents of a directory if the very first file already differs.
I would:
scan all folders/files and create an array of them,
save this somewhere,
run the scan again later to check whether the array still looks the same.
As you have the entire data structure from "time 1" and "now", you can clearly see what has changed (a sketch follows below). To crawl through the directories, check this: http://www.evoluted.net/thinktank/web-development/php-directory-listing-script and this: http://phpmaster.com/list-files-and-directories-with-php/
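A minimal sketch of that snapshot comparison, assuming SPL's recursive iterators are acceptable for the crawl and that size plus mtime is enough to call a file "changed":
// Build a path => array(size, mtime) snapshot of a directory tree.
function snapshot($base)
{
    $snap = array();
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($base, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $item) {
        $snap[$item->getPathname()] = array($item->getSize(), $item->getMTime());
    }
    return $snap;
}

$old = is_file('snapshot.dat') ? unserialize(file_get_contents('snapshot.dat')) : array();
$new = snapshot('/home/ertunc');

$added   = array_diff_key($new, $old);              // new files
$removed = array_diff_key($old, $new);              // deleted files
$changed = array();
foreach (array_intersect_key($new, $old) as $path => $meta) {
    if ($meta !== $old[$path]) {
        $changed[] = $path;                         // size or mtime differs
    }
}

file_put_contents('snapshot.dat', serialize($new)); // save for the next run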

What's the best way to read from and then overwrite file contents in php?

What's the cleanest way in php to open a file, read the contents, and subsequently overwrite the file's contents with some output based on the original contents? Specifically, I'm trying to open a file populated with a list of items (separated by newlines), process/add items to the list, remove the oldest N entries from the list, and finally write the list back into the file.
fopen(<path>, 'a+')
flock(<handle>, LOCK_EX)
fread(<handle>, filesize(<path>))
// process contents and remove old entries
fwrite(<handle>, <contents>)
flock(<handle>, LOCK_UN)
fclose(<handle>)
Note that I need to lock the file with flock() in order to protect it across multiple page requests. Will the 'w+' flag when fopen()ing do the trick? The PHP manual states that it truncates the file to zero length, so it seems that would prevent me from reading the file's current contents.
If the file isn't overly large (that is, you can be confident loading it won't blow PHP's memory limit), then the easiest way to go is to just read the entire file into a string (file_get_contents()), process the string, and write the result back to the file (file_put_contents()). This approach has two problems:
If the file is too large (say, tens or hundreds of megabytes), or the processing is memory-hungry, you're going to run out of memory (even more so when you have multiple instances of the thing running).
The operation is destructive; if the saving fails halfway through, you lose all your original data.
If either of these is a concern, plan B is to process the file and at the same time write to a temporary file; after successful completion, close both files, rename (or delete) the original file, and then rename the temporary file to the original filename.
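A minimal sketch of that plan B, streaming line by line; processLine() is a hypothetical stand-in for whatever per-item processing you need:
// Stream the original file through a temporary file, then swap the files.
$in  = fopen($filename, 'r');
$tmp = $filename . '.tmp';               // same directory, same filesystem
$out = fopen($tmp, 'w');
while (($line = fgets($in)) !== false) {
    fwrite($out, processLine($line));    // processLine() is hypothetical
}
fclose($in);
fclose($out);
rename($tmp, $filename);                 // atomic replace on POSIX systems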
Read
$data = file_get_contents($filename);
Write
file_put_contents($filename, $data);
One solution is to use a separate lock file to control access.
This solution assumes that only your script, or scripts you have access to, will want to write to the file. This is because the scripts will need to know to check a separate file for access.
$file_lock = obtain_file_lock();
if ($file_lock) {
    $old_information = file_get_contents('/path/to/main/file');
    $new_information = update_information_somehow($old_information);
    file_put_contents('/path/to/main/file', $new_information);
    release_file_lock($file_lock);
}

function obtain_file_lock() {
    $attempts = 10;
    // There are probably better ways of dealing with waiting for a file
    // lock but this shows the principle of dealing with the original
    // question.
    for ($ii = 0; $ii < $attempts; $ii++) {
        $lock_file = fopen('/path/to/lock/file', 'r'); // only need read access
        if (flock($lock_file, LOCK_EX)) {
            return $lock_file;
        } else {
            // give time for the other process to release the lock
            usleep(100000); // 0.1 seconds
        }
    }
    // This is only reached if all attempts fail.
    // Error code here for dealing with that eventuality.
    return false;
}

function release_file_lock($lock_file) {
    flock($lock_file, LOCK_UN);
    fclose($lock_file);
}
This should prevent a concurrently running script from reading old information and updating it, which would cause you to lose information that another script wrote after you read the file. It allows only one instance of the script at a time to read the file and then overwrite it with updated information.
While this hopefully answers the original question, it doesn't give a good solution to making sure all concurrent scripts have the ability to record their information eventually.
