I have a daemon that opens a file and writes to it throughout operation (typically for many days at a time). To support log rotation, I want to be able to detect when the file my handle refers to has been moved away from its original location.
Is this possible? fstat() doesn't give me anything useful for this situation.
My current solution is, in the log-writing function, to test whether the log file exists and, if it doesn't, close the old handle and open a new one. This works, but it's a hack and has limitations. In my case, our systems group uses a log-rotation tool that requires them to touch the file after rotating it out, which makes my daemon keep believing that its file handle points to the correct place.
Here's a thought. It's not portable, I'm not totally sure if it works or is reliable, and it makes me cringe a little, but you can probably use readlink on /proc/%d/fd/%d, where the first %d is the result of getpid(), and the second is your file descriptor.
There are some caveats here, though. First, the whole "get path + do something with that path" approach will have a race condition in the face of a rename happening concurrently. Also, your log file could have other links. I'm not sure what the behavior is for the links in /proc in the face of a rename, either.
You can simply re-acquire your file handle periodically (in append mode, "a"), for example every 24 hours. That lets you keep logging despite the presence of the moronic and buggy log rotation utility (there is an inevitable race condition between renaming the file and re-touching it).
fstat gives you the inode number of the file your handle points to; once the log is rotated, that will no longer match the inode at the original path.
See http://php.net/manual/en/function.fstat.php and http://www.php.net/manual/en/function.lstat.php
You can compare the inode number from fstat with the inode number from lstat; if they are different, reopen.
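A minimal sketch of that check in PHP (the path is illustrative; clearstatcache matters in a long-running process because PHP caches stat results):

$logPath = "/var/log/myapp.log";
$fh = fopen($logPath, "a");

// Before each write, compare the inode of the open handle with the inode
// currently at the path; if they differ (or the path is gone), reopen.
clearstatcache(true, $logPath);
$openStat = fstat($fh);
$pathStat = @lstat($logPath);
if ($pathStat === false || $openStat['ino'] !== $pathStat['ino']) {
    fclose($fh);
    $fh = fopen($logPath, "a");
}
fwrite($fh, "log line\n");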
The standard way of handling this for Unix daemons in the past has been to catch SIGHUP and use it as a signal to reopen the log file, and have the log rotation script send SIGHUP.
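If the daemon itself happens to be written in PHP, a minimal sketch of that pattern, assuming the pcntl extension is available (pcntl_async_signals needs PHP 7.1+):

pcntl_async_signals(true);
$reopenRequested = false;
pcntl_signal(SIGHUP, function () use (&$reopenRequested) {
    $reopenRequested = true;   // only set a flag; do the actual reopen in the main loop
});

// inside the logging loop:
if ($reopenRequested) {
    fclose($fh);
    $fh = fopen($logPath, "a");   // $logPath as in the sketch above
    $reopenRequested = false;
}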
Related
I have a cron script that compresses images. It basically iterates over folders and then compresses the files in each folder. My problem is that some images are getting processed halfway. My theory is that users are uploading an image, and before the upload has finished, the compressor tries to compress the file, compressing a half-uploaded image and resulting in only half an image being displayed.
Is there a way in PHP to confirm that a file has finished uploading, so that I only do the compression once I know the file has been fully written?
Or alternatively, is there a way to check if a file is being used by another process?
Or alternatively, would it be reliable enough to look at when the file was "written to disk" and not process it until 10 minutes has gone by?
PHP doesn't trigger your action until the files are fully uploaded, but it is possible for your cron job to start interacting with files before they're fully saved.
When saving something from $_FILES, save it to a version with a . prefix on it to tag it as incomplete. Make sure your cron job skips any such files.
Then, once the save operation is complete, rename the file without the . prefix to make it eligible for processing.
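A minimal sketch of that flow, assuming the upload field is named 'image' and the target directory is /var/www/images (both are examples):

$dir    = "/var/www/images";
$name   = basename($_FILES['image']['name']);
$hidden = $dir . "/." . $name;   // the dot prefix marks the file as incomplete

// Save under the hidden name first so the cron job skips it...
move_uploaded_file($_FILES['image']['tmp_name'], $hidden);

// ...then rename it into place once the save is complete.
rename($hidden, $dir . "/" . $name);

A cron job that scans with glob('*.jpg') should already skip the dot-prefixed names, since glob does not match a leading dot by default.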
There are two ways to handle this scenario.
Flags
Set a flag on a file before you modify or write it.
Our app handles lots of files; we set a flag before taking a file for processing and remove it once processing is done. Since everything runs on cron, flags are the most reliable way to decide which files are safe to process.
Usually you keep an extra column in a table for each file, or an array where you store all the files currently being handled.
filemtime()
As you mentioned, you can check whether a file's mtime is more than 10 minutes older than the current time and only then compress it. But if some other process has the file open and is modifying it at that same moment, the problem comes back.
So it is better to go with flags, unless other processes rarely modify the files.
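If you do go the filemtime() route, a minimal sketch (the directory, pattern, and 10-minute threshold are examples; compressImage is a hypothetical routine):

foreach (glob("/var/www/uploads/*.jpg") as $file) {
    clearstatcache(true, $file);             // PHP caches stat results
    if (time() - filemtime($file) < 600) {
        continue;                            // modified in the last 10 minutes, probably still uploading
    }
    compressImage($file);                    // hypothetical compression routine
}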
You can use flock to ensure a file is not in use; see here for an example. Alternatively, you can check whether an image is broken or corrupted; see here.
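A minimal sketch of the flock() check; note flock is advisory, so it only helps if the process writing the file also takes a lock (compressImage is a hypothetical routine):

$fh = fopen($file, "r");
if ($fh !== false && flock($fh, LOCK_EX | LOCK_NB)) {
    compressImage($file);    // no one else holds a lock, safe to process
    flock($fh, LOCK_UN);
}
if ($fh !== false) {
    fclose($fh);
}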
I want to update a file while other processes may be reading it. PHP's flock() function allows exactly that.
However, as I see it, flock only takes a file handle, which generally comes from fopen. I don't want to keep the file open for the whole write, because the data is coming over the network and the write operation may take a few seconds (say 2-3 seconds).
So I was hoping I could write the data to a temp file and then move it. In that case, readers of the file would only be disturbed while I am renaming it.
Writing the data to the temp file will not require flock. But how can I move the temp file over the actual file correctly, using locking?
I also wonder whether I actually need locking in the first place: wouldn't the move operation be very quick? Would it hurt simultaneous reads? I expect hundreds of reads but just one update, and that update will happen once every hour.
Rename is atomic in POSIX systems, so you don't need flock. Readers that have already opened the file will be undisturbed. (Justification: An open file handle points to the inode, not to the directory entry. Rename changes just the directory entry.)
However, readers must close and reopen the file to get the new content. If readers keep the file open, they will be able to reread the old content.
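A minimal sketch of the write-to-temp-then-rename pattern (paths and the data source are examples); note rename() is only atomic when the source and destination are on the same filesystem, so keep the temp file in the same directory:

$target = "/var/www/data/feed.xml";
$tmp    = $target . ".tmp." . getmypid();   // unique temp name in the same directory

$data = fetchOverNetwork();                 // hypothetical slow network read
file_put_contents($tmp, $data);             // the slow write happens on the temp file
rename($tmp, $target);                      // atomic swap: readers see old or new, never partial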
I am looking for a solution: I need to delete log files, but there might be a possibility that they are being accessed at the moment the delete call is made. By being accessed, I mean a process is either reading from or writing to the file. In such cases, I need to skip the file instead of deleting it. Also, my server is Linux and PHP is running on Apache.
What I am looking for is something similar to (in pseudo-code):
<?php
$path = "path_to_log_file";
$log_file = "app.log";
if(!being_accessed($log_file))
{
unlink($path.$log_file);
}
?>
Now my question is: how can I implement being_accessed? I know there might not be a language function to do this directly in PHP. I am thinking about using a combination of things like last_access_time (maybe?) and flock (but flock is useful only when the accessing application has actually flock-ed the file).
Any suggestions/insights welcome...
In general you will not be able to find that out without administrative rights (i.e. being able to run tools like lsof to see whether your file is listed). But if your scripts are running on a Linux/Unix server (which is the case for most hosts), you do not need to bother, because the filesystem takes care of this for you.

Say you have a 1 GB file and someone is downloading it. It is safe for you to delete the file (with unlink() or any other way) even if that download has only just started, and it will not interfere with the download: the filesystem knows the file is still open (some process holds a handle to it), so it only marks the file as, let's say, invisible to others. Listing the folder will no longer show the file, but if the file is big enough you can check the available disk space (e.g. with df) and see that it is still occupied. Whoever kept a handle can still use it; once all processes close their handles, the file is physically removed from the media and the disk space is freed.

So just unlink when needed. If you are worried about the warning unlink() may raise when it fails (which can be the case on Windows), prepend the call with the @ mark (@unlink()) to suppress any warning it may throw at runtime.
You'd simply change your code this way (if you are doing it repetitively):
<?php
$path = "path_to_log_file";
$log_file = "app.log";
@unlink($path.$log_file);
Notice the @ to avoid getting an error in case the file is not deletable, and the lack of a closing tag (closing tags are a common source of errors and should be avoided).
I have a PHP script that opens a local directory in order to copy and process some files. But these files may be incomplete, because they are being uploaded by a slow FTP process, and I do not want to copy or process any files which have not been completely uploaded yet.
Is it possible in PHP to find out if a file is still being copied (that is, read from) or written to?
I need my script to process only those files that have been completely uploaded.
The FTP process now uploads files in parallel, and it can take more than 1 second for each file's size to change, so this trick is no longer working for me. Can anyone suggest another method?
Do you have script control over the FTP process? If so, have the script that's doing the uploading upload a [FILENAME].complete file (blank text file) after the primary upload completes, so the processing script knows that the file is complete if there's a matching *.complete file there also.
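A minimal sketch of the processing side of that idea (the directory and extension are examples; processFile is a hypothetical routine):

foreach (glob("/var/ftp/incoming/*.jpg") as $file) {
    if (!file_exists($file . ".complete")) {
        continue;                       // the uploader has not finished this file yet
    }
    processFile($file);                 // hypothetical processing routine
    unlink($file . ".complete");        // remove the marker once processed
}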
+1 to @MidnightLightning for his excellent suggestion. If you don't have control over the process you have a couple of options:
If you know what the final size of the file should be then use filesize() to compare the current size to the known size. Keep checking until they match.
If you don't know what the final size should be, it gets a little trickier. You could use filesize() to check the size of the file, wait a second or two, and check it again. If the size hasn't changed, the upload should be complete. The problem with this second method is that if the upload stalls for whatever reason, it could give you a false positive, so the time you wait is key.
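A minimal sketch of that second method (the 2-second wait is an assumption you would tune):

function uploadLooksFinished($file, $wait = 2)
{
    clearstatcache(true, $file);
    $before = filesize($file);
    sleep($wait);                    // give the uploader time to append more data
    clearstatcache(true, $file);
    return filesize($file) === $before;
}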
You don't specify what kind of OS you're on, but if it's a Unix-type box, you should have fuser and/or lsof available. fuser will report on who's using a particular file, and lsof will list all open files (including sockets, fifos, .so's, etc...). Either of those could most likely be used to monitor your directory.
On the Windows end, there are a few free tools from Sysinternals that do the same thing; handle might do the trick.
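On the Unix side you can also drive lsof from PHP; a minimal sketch, assuming lsof is installed and shell commands are allowed (processFile is hypothetical):

// lsof -t prints only the PIDs of processes that have the file open
$pids = trim((string) shell_exec("lsof -t " . escapeshellarg($file) . " 2>/dev/null"));
if ($pids === "") {
    processFile($file);    // nothing has the file open right now
}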
I have to write a script in PHP that will dynamically replace some files on the server from time to time. It's an easy thing, but the problem is that I want to avoid the situation where a user requests a file while it is being replaced; then they could get an incomplete file or even an error.
The best solution I can see is to block access to my site during the replacement, e.g. by setting up an .htaccess file that redirects all requests to a page with information about a short break. But normally an .htaccess file already exists, so there may be a situation where the server reads an incomplete .htaccess file.
Is there any way to solve it?
Edit: Thank you so much for all the answers, guys. You are brilliant.
@ircmaxell Your idea sounds great to me. I read what the folks at PHP.net wrote and I'm not sure I understood it all correctly.
So, tell me: if I do all the steps you wrote and add apc.file_update_protection to my php.ini, there will be no way for a user to get an incomplete file at any time? There will always be exactly one, correct file? Are you 100% sure?
It is very important to me because these replacements will happen very often and there is a big chance a file will be requested during the renaming.
Here's something that's easy and will work on any local filesystem on Linux:
Upload (or write) the file to a temporary filename
Move the file (using the mv (move) command, either in FTP, or command line, etc, or the rename command in PHP) to overwrite the existing one.
When you execute the mv command, it basically deletes the old file pointer and writes the new one. Since it's done at the filesystem level, it's an atomic operation, so a client can never get a partially replaced file: it sees either the complete old file or the complete new one...
APC recommends doing this to prevent these very issues from cropping up...
Also note that you could use rsync to do it as well (since it basically does this behind the scenes)...
Doesn't this work already? I never tested for this specifically but I've done what you're doing and that problem never showed up.
It seems like an easy thing for an operating system to do:
Upload / write to a temporary file
When writing is done, block access to the original file (make the request for the file wait)
Delete the file, rename the temporary one and remove any locks
I'm fairly sure this is what an OS should do for copying. If you're writing the file contents yourself with PHP you'll just have to do this yourself...
Try Rails-less Capistrano, or the method it uses:
in a directory you have two things:
A folder containing folders; each subfolder is a release
A soft link to the current release folder
When you upload the new file, do the upload into a new release folder. Check that no one is currently using the current release (this might be a little tricky; assuming you don't have a crazy number of users, you could probably do it with a db entry), and then rewrite the soft link to point to the newest release.
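A minimal sketch of the release/symlink idea in PHP (paths are examples); the trick is to build the new link under a temporary name and rename() it over the old one, which is atomic, unlike deleting and recreating the link:

$releases = "/var/www/releases";
$current  = "/var/www/current";              // the web server serves from this symlink
$new      = $releases . "/" . date("YmdHis");

mkdir($new, 0755, true);
// ...copy the new release's files into $new here...

symlink($new, $current . ".tmp");            // create the new link beside the old one
rename($current . ".tmp", $current);         // atomically repoint "current" to the new release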
Maybe try it like this:
delete the file and save its path
ln -nfs pathtosorrypage.html movedfilepath
upload file to some temporary folder on the server
remove symlink
mv newfile movedfilepath
Option 1: If you have a lot of users and the replacing is not done very frequently, you can set up a maintenance window for the site (block access), have no one log in after a certain time, and finally cut off everyone who is still logged in when you're about to do the replacement.
Option 2: If the file replacing is done frequently (in which case you shouldn't do maintenance every day), handle it in code. Keep two copies of the same file (in the same folder if you want). Then, when you're about to replace the file, have the code serve the copy while you replace the one you want. You can do it with a simple if.
Roughly, in PHP:
// $replaceTime is the Unix timestamp of the scheduled swap
if (abs(time() - $replaceTime) <= 15) {
    // allows a 30-second window for another script to bring in the new image as 'myImage.jpg'
    echo '<img src="/myFiles/myOldImage.jpg" />';
} else {
    echo '<img src="/myFiles/myImage.jpg" />';
}
No need to update any database or manually move/copy/rename a file.
After replaceTime + 15 has passed:
copy("myImage.jpg", "myOldImage.jpg");
// Now you have the copy ready for the next time to replace