I have a PHP script that opens a local directory in order to copy and process some files. But these files may be incomplete, because they are being uploaded by a slow FTP process, and I do not want to copy or process any files which have not been completely uploaded yet.
Is is possible in PHP to find out if a file is still being copied (that is, read from), or written to?
I need my script to process only those files that have been completely uploaded.
The ftp process now, upload files in parallel, and it take more than 1 second for each filesize to change, so this trick is not working for me now, any other method suggests
Do you have script control over the FTP process? If so, have the script that's doing the uploading upload a [FILENAME].complete file (blank text file) after the primary upload completes, so the processing script knows that the file is complete if there's a matching *.complete file there also.
+1 to #MidnightLightning for his excellent suggestion. If you don't have control over the process you have a couple of options:
If you know what the final size of the file should be then use filesize() to compare the current size to the known size. Keep checking until they match.
If you don't know what the final size should be it gets a little trickier. You could use filesize() to check the size of the file, wait a second or two and check it again. If the size hasn't changed then the upload should be complete. The problem with the second method is if your file upload stalls for whatever reason it could give you a false positive. So the time to wait is key.
You don't specify what kind of OS you're on, but if it's a Unix-type box, you should have fuser and/or lsof available. fuser will report on who's using a particular file, and lsof will list all open files (including sockets, fifos, .so's, etc...). Either of those could most likely be used to monitor your directory.
On the windows end, there's a few free tools from Sysinternals that do the same thing. handle might do the trick
Related
So as described in question itself, I want to replace the file from which a zip archive is opened and then which is overwriting files with new version.
If still my question is not clear then the thing I want to do is I want to get a zip file from a server and then unzip using CLASS "ZipArchive" and then over write everyfile which is in Zip to destination location, the problem will be the php file by which this thing is happening will gonna be overwritten.
So will the php generate the error or the process will go whatever we want?
On Linux files are not usually locked (see https://unix.stackexchange.com/questions/147392/what-is-advisory-locking-on-files-that-unix-systems-typically-employs) so you can do whatever you want with that file. PHP works with that file in memory so you can overwrite it during it's execution.
But if you will run the script multiple times while the first one is in progress it might load incomplete version and then it will throw some error so it might be wise to make sure that won't happen (using locks) or try to do some more atomic approach.
Windows locks files so I assume you won't be able to extract files the same way there.
I have a cron script that compresses images. It basically iterates over folders and then compresses the files in the folder. My problem is that some images are getting processed halfway. My theory is that users are uploading a image, and before the image has finished uploading the file, the compressor tries to compress the file. Thus compressing a half-uploaded image, and resulting in half an image being displayed.
Is there a way in PHP to confirm that a file has finished uploading? So that I can only do the compression once i know the file has been fully written?
Or alternatively, is there a way to check if a file is being used by another process?
Or alternatively, would it be reliable enough to look at when the file was "written to disk" and not process it until 10 minutes has gone by?
PHP doesn't trigger your action until the files are fully uploaded, but it is possible for your cron job to start interacting with files before they're fully saved.
When saving something from $_FILES, save it to a version with a . prefix on it to tag it as incomplete. Make sure your cron job skips any such files.
Then, once the save operation is complete, rename the file without the . prefix to make it eligible for processing.
There are two ways to handle the scenario
Flags
Set flag that files before modify/write it.
Our App handles lots of files, we set flags before taking them to process once it's done we remove the flag, as it runs on cron flag is the best way to process files.
Usually, you can an extra column on the table on each file. or you can have an array where you can store all currently handling files.
filemtime()
As you mentioned you can check like if file mtime is more than 10 min that current time, then you compress them but if some other processes are using the file opened the at the same time. it causes the problem again.
So its better to go with flag. If other processes never modify the files often.
You can use flock to ensure file is not in use, see here for example. Alternatively you can check whether an image is broken or corrupted see here.
My application is keeping watch on a set of folders where users can upload files. When a file upload is finished I have to apply a treatment, but I don't know how to detect that a file has not finish to upload.
Any way to detect if a file is not released yet by the FTP server?
There's no generic solution to this problem.
Some FTP servers lock the file being uploaded, preventing you from accessing it, while the file is still being uploaded. For example IIS FTP server does that. Most other FTP servers do not. See my answer at Prevent file from being accessed as it's being uploaded.
There are some common workarounds to the problem (originally posted in SFTP file lock mechanism, but relevant for the FTP too):
You can have the client upload a "done" file once the upload finishes. Make your automated system wait for the "done" file to appear.
You can have a dedicated "upload" folder and have the client (atomically) move the uploaded file to a "done" folder. Make your automated system look to the "done" folder only.
Have a file naming convention for files being uploaded (".filepart") and have the client (atomically) rename the file after upload to its final name. Make your automated system ignore the ".filepart" files.
See (my) article Locking files while uploading / Upload to temporary file name for an example of implementing this approach.
Also, some FTP servers have this functionality built-in. For example ProFTPD with its HiddenStores directive.
A gross hack is to periodically check for file attributes (size and time) and consider the upload finished, if the attributes have not changed for some time interval.
You can also make use of the fact that some file formats have clear end-of-the-file marker (like XML or ZIP). So you know, that the file is incomplete.
Some FTP servers allow you to configure a hook to be called, when an upload is finished. You can make use of that. For example ProFTPD has a mod_exec module (see the ExecOnCommand directive).
I use ftputil to implement this work-around:
connect to ftp server
list all files of the directory
call stat() on each file
wait N seconds
For each file: call stat() again. If result is different, then skip this file, since it was modified during the last seconds.
If stat() result is not different, then download the file.
This whole ftp-fetching is old and obsolete technology. I hope that the customer will use a modern http API the next time :-)
If you are reading files of particular extensions, then use WINSCP for File Transfer. It will create a temporary file with extension .filepart and it will turn to the actual file extension once it fully transfer the file.
I hope, it will help someone.
This is a classic problem with FTP transfers. The only mostly reliable method I've found is to send a file, then send a second short "marker" file just to tell the recipient the transfer of the first is complete. You can use a file naming convention and just check for existence of the second file.
You might get fancy and make the content of the second file a checksum of the first file. Then you could verify the first file. (You don't have the problem with the second file because you just wait until file size = checksum size).
And of course this only works if you can get the sender to send a second file.
We have a FreeBSD server with samba that employees copy image files on to, which then get uploaded to our web servers (this way they don't have to mess with ftp). Sometimes, if the upload script is running at the same time as files are being copied, it can upload an incomplete file.
We fixed this by getting the list of files along with the file sizes, then waiting 5 seconds and rechecking the file sizes. If the sizes match then its save to upload, if they don't match it checks again in another 5 seconds.
This seems like an odd way to check if the files are being written to. is there a better, more simple way of doing this?
Use a flock function http://php.net/flock - when writing a file obtain an exclusive lock flock($handle, LOCK_EX), after it is written release the lock flock($handle, LOCK_UN).
The upload script could try to obtain the exclusive writing lock too, if it succeeds it is Okay to move the file, otherwise no.
EDIT: Sorry, I forgot about the users copying the files to the server through samba... So there is no space to use flock while copying... But the upload script could still use flock($handle, LOCK_EX) to see, if it is successful or not.
I recommend to shell_exec() smbstatus(1), e.g. smbstatus -LB to check for locked files
Write a script to copy the files to a temp folder on the Samba server and then, when fully copied and flushed, move, (ie, unlink/link, not copy again), them to the upload folder.
I have a directory on a remote machine in which my clients are uploading (via different tools and protocols, from WebDav to FTP) files. I also have a PHP script that returns the directory structure. Now, the problem is, if a client uploads a large file, and I make a request during the uploading time, the PHP script will return the file even if it's not completely uploaded. Is there a way to check whether a file is completely uploaded using PHP?
Setup your remote server to move uploaded files to another directory, and only query the directory files are moved to for files.
AFAIK, there is no way (at least cross-machine) to tell if a file is still being uploaded, without doing something like:
Query the file's length
Wait a few seconds
Query the file's length
If it's the same, its possibly completed
Most UNIX/Linux/BSD-like operating systems has a command called lsof (lsof stands for "list open files") which outputs a list of all currently open files in the system. You can run that command to see if any process is still working with the file. If not, your upload has finished. In this example, awk is used to filter so only files will show that are open with write or read/write file handlers:
if (shell_exec("lsof | awk '\$4 ~ /.*[uw]/' | grep " . $uploaded_file_name) == '') {
/* No file handles open for this file, so upload is finished. */
}
I'm not very familiar with Windows servers, but this thread might help you to do the same on a Windows machine (if that is what you have): How can I determine whether a specific file is open in Windows?
I would think that some operating systems include a ".part" file when downloading a file, so there may be a way to check for the existence of such a file. Otherwise, I agree with Brian's answer. If you were using the script on the same system it is simple enough to tell using move_uploaded_file()'s return if it was being uploaded by a PHP script, but it does become a challenge pulling from a remote directory that can be added to with different protocols.