How to tell if a file has finished uploading in PHP? - php

I have a cron script that compresses images. It basically iterates over folders and then compresses the files in the folder. My problem is that some images are getting processed halfway. My theory is that users are uploading a image, and before the image has finished uploading the file, the compressor tries to compress the file. Thus compressing a half-uploaded image, and resulting in half an image being displayed.
Is there a way in PHP to confirm that a file has finished uploading? So that I can only do the compression once i know the file has been fully written?
Or alternatively, is there a way to check if a file is being used by another process?
Or alternatively, would it be reliable enough to look at when the file was "written to disk" and not process it until 10 minutes has gone by?

PHP doesn't trigger your action until the files are fully uploaded, but it is possible for your cron job to start interacting with files before they're fully saved.
When saving something from $_FILES, save it to a version with a . prefix on it to tag it as incomplete. Make sure your cron job skips any such files.
Then, once the save operation is complete, rename the file without the . prefix to make it eligible for processing.

There are two ways to handle the scenario
Flags
Set flag that files before modify/write it.
Our App handles lots of files, we set flags before taking them to process once it's done we remove the flag, as it runs on cron flag is the best way to process files.
Usually, you can an extra column on the table on each file. or you can have an array where you can store all currently handling files.
filemtime()
As you mentioned you can check like if file mtime is more than 10 min that current time, then you compress them but if some other processes are using the file opened the at the same time. it causes the problem again.
So its better to go with flag. If other processes never modify the files often.

You can use flock to ensure file is not in use, see here for example. Alternatively you can check whether an image is broken or corrupted see here.

Related

Read uploaded file into variable instead saving it

I want to be able to read an uploaded file, but I don't want the file to be saved on the server because of security concerns... Is it possible just to directly read a file into a variable?
If not, how does the Temp file thing work, how secure is it to save a temp file on the server, and when is it deleted?
You're going to have to save the upload to a file, otherwise any large file will cause an error because it'll overload available memory pretty easily.
Temp is VERY insecure, generally anybody on the system can read/write/delete your temp file.
The best way to go about this is just do a normal file upload, and in your submission script either read the file and process it or move it to a more permanent location. You can now issue a delete command to the tmp copy.
Since you may not have delete permission and/or the file could be auto deleted, it's best to issue the command this way (the # symbol suppresses any errors, because you don't really care if the file is already deleted or what not, this is a "just in case" scenario)
#unlink($filename);
So far as I'm aware it has to be stored somewhere in order to interact with it, but it's deleted as soon as your script finishes executing. See PHP: When does the temporary uploaded files get deleted?

efficient file update with flock and move

I want to update a file while other processes may be using reading it. PHP flock() function allows exactly to do that.
However as I see the flock only takes a file handle .. that generally comes from fopen. If I want to do this efficiently, I don't want to keep the file open and write it, because file is coming over the network and the write operation may span to a few seconds (say 2-3 seconds).
So I was hoping if could write the data to temp file and then move it. In that case readers of the file will only be disturbed when I am renaming it.
Now writing data to temp will not require me to use flock. However how can I move tempfile to actual file correctly using locking.
I also wonder if I would actually need locking in the first place .. wouldn't the move operation will be very quick? Would it hurt simultaineous file reads. And I expect there will be 100s of reads but just one update, and that update will happen once every hour
Rename is atomic in POSIX systems, so you don't need flock. Readers that have already opened the file will be undisturbed. (Justification: An open file handle points to the inode, not to the directory entry. Rename changes just the directory entry.)
However, readers must close and reopen the file to get the new content. If readers keep the file open, they will be able to reread the old content.

Is there a better way to see if a file is being written to?

We have a FreeBSD server with samba that employees copy image files on to, which then get uploaded to our web servers (this way they don't have to mess with ftp). Sometimes, if the upload script is running at the same time as files are being copied, it can upload an incomplete file.
We fixed this by getting the list of files along with the file sizes, then waiting 5 seconds and rechecking the file sizes. If the sizes match then its save to upload, if they don't match it checks again in another 5 seconds.
This seems like an odd way to check if the files are being written to. is there a better, more simple way of doing this?
Use a flock function http://php.net/flock - when writing a file obtain an exclusive lock flock($handle, LOCK_EX), after it is written release the lock flock($handle, LOCK_UN).
The upload script could try to obtain the exclusive writing lock too, if it succeeds it is Okay to move the file, otherwise no.
EDIT: Sorry, I forgot about the users copying the files to the server through samba... So there is no space to use flock while copying... But the upload script could still use flock($handle, LOCK_EX) to see, if it is successful or not.
I recommend to shell_exec() smbstatus(1), e.g. smbstatus -LB to check for locked files
Write a script to copy the files to a temp folder on the Samba server and then, when fully copied and flushed, move, (ie, unlink/link, not copy again), them to the upload folder.

Detect with php if files is being uploaded, or is open

I have a PHP script that opens a local directory in order to copy and process some files. But these files may be incomplete, because they are being uploaded by a slow FTP process, and I do not want to copy or process any files which have not been completely uploaded yet.
Is is possible in PHP to find out if a file is still being copied (that is, read from), or written to?
I need my script to process only those files that have been completely uploaded.
The ftp process now, upload files in parallel, and it take more than 1 second for each filesize to change, so this trick is not working for me now, any other method suggests
Do you have script control over the FTP process? If so, have the script that's doing the uploading upload a [FILENAME].complete file (blank text file) after the primary upload completes, so the processing script knows that the file is complete if there's a matching *.complete file there also.
+1 to #MidnightLightning for his excellent suggestion. If you don't have control over the process you have a couple of options:
If you know what the final size of the file should be then use filesize() to compare the current size to the known size. Keep checking until they match.
If you don't know what the final size should be it gets a little trickier. You could use filesize() to check the size of the file, wait a second or two and check it again. If the size hasn't changed then the upload should be complete. The problem with the second method is if your file upload stalls for whatever reason it could give you a false positive. So the time to wait is key.
You don't specify what kind of OS you're on, but if it's a Unix-type box, you should have fuser and/or lsof available. fuser will report on who's using a particular file, and lsof will list all open files (including sockets, fifos, .so's, etc...). Either of those could most likely be used to monitor your directory.
On the windows end, there's a few free tools from Sysinternals that do the same thing. handle might do the trick

Best way to replace file on server

I have to write script in PHP which will be dynamicly replace some files on server from time to time. Easy thing, but the problem is that I want to avoid situation when user request this file during replacing. Then he could get uncompleted file or even error.
Best solution to me is block access to my site during replacing by e.g. setting .htaccess redirecting all requests to page with information about short break. But normally .htaccess file already exist, so there may be situation when server gets uncomplited .htaccess file.
Is there any way to solve it?
Edit: Thank you so much for all answers, guys. You are briliant.
#ircmaxell Your idea sounds great for me. I read what dudes from PHP.net wrote and I don't know if I understand all correctly.
So, tell me: If I do all steps you wrote and add apc.file_update_protection to my php.ini, there will be no way to get uncompleted file by user by any time? There will be always one, correct file? Are you sure at 100% ?
It is very important to me coz these replacements will be very often and there is big chance to request file during renaming.
Here's something that's easy, and will work on any local filesystem on linux:
Upload (or write) the file to a temporary filename
Move the file (using the mv (move) command, either in FTP, or command line, etc, or the rename command in PHP) to overwrite the existing one.
When you execute the mv command, it basically deletes the old file pointer, and writes the new one. Since it's being done at the filesystem level, it's an atomic operation. So the client can't get an old file...
APC recommends doing this to prevent these very issues from cropping up...
Also note that you could use rsync to do it as well (since it basically does this behind the scenes)...
Doesn't this work already? I never tested for this specifically but I've done what you're doing and that problem never showed up.
It seems like an easy thing for an operating system to
Upload / write to a temporary file
When writing is done, block access to the original file (make the request for the file wait)
Delete the file, rename the temporary one and remove any locks
I'm fairly sure this is what an OS should do for copying. If you're writing the file contents yourself with PHP you'll just have to do this yourself...
Try railless Capistrano or a method they use:
in a directory you have two things:
A folder containering folders, each subfolder is a release
A soft link to the current release folder
When you upload the new file, do the upload making a new release folder. Check to see that no one is currently running the current release (this might be a little tricky assuming you dont have a crazy number of users you could probably do it with a db entry) and then rewrite the softlink to point to the newest release.
maybe do try it like this:
delete file and save it's path
ln -nfs movedfilepath pathtosorrypage.html
upload file to some temporary folder on the server
remove symlink
mv newfile movedfilepath
Option 1: If you have a lot of users and this replacing is done not so frequent, you can set up a maintenance on the site (block access) and have no one log in after a certain time, and finally cut off everyone who is logged in when you're about to do the replacement.
Option 2: If the file replacing is done frequently (in which case you shouldn't do the maintenance every day), have it done by code. Have two of the same files (same folder if you want). Then, by code, when you're about to replace the file, have it just give the copy, while you replace the one you want. You can do it with a simple IF.
Pseudo-code:
if (replaceTime - 15 seconds <= currentTime <= replaceTime + 15 seconds){
// allows 30 seconds for another script to bring in the new image into 'myImage.jpg'
<img src="/myFiles/myOldImage.jpg" />
} else {
<img src="/myFiles/myImage.jpg" />
}
No need to update any database or manually move/copy/rename a file.
After replaceTime + 15 has passed:
copyFileTo("myImage.jpg","myOldImage.jpg");
// Now you have the copy ready for the next time to replace

Categories