How to determine whether a file is still being transferred via ftp - php

I have a directory with files that need to be processed in a batch with PHP. The files are copied to the server via FTP. Some of the files are very big and take a long time to copy. How can I determine in PHP whether a file is still being transferred (so I can skip that file and process it in the next run of the batch process)?
One possibility is to get the file size, wait a few moments, and check whether the size has changed. This is not foolproof, because there is a slight chance that the transfer simply stalled for a few moments...

One of the safest ways of doing this is to upload the files with a temporary name, and rename them once the transfer is finished. Your program should skip files with the temporary name (a simple extension works just fine). Obviously this requires the client (uploader) to cooperate, so it's not ideal.
[This also allows you to delete failed (partial) transfers after a given time period if you need that.]
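A minimal sketch of the batch side of this scheme, assuming the uploader uses a ".part" suffix while transferring. The suffix, the function name and the 24-hour cut-off are illustrative assumptions, not something an FTP server enforces:

```php
<?php
// Hypothetical helper: returns the files that are safe to process and
// removes partial uploads that have been abandoned for too long.
function collectFinishedUploads(string $dir, string $tmpSuffix = '.part', int $maxPartialAge = 86400): array
{
    $ready = [];
    foreach (scandir($dir) ?: [] as $name) {
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        if (!is_file($path)) {
            continue; // skips ".", ".." and subdirectories
        }
        if (substr($name, -strlen($tmpSuffix)) === $tmpSuffix) {
            // Still uploading, or a failed transfer: delete it only if it is stale.
            if (time() - filemtime($path) > $maxPartialAge) {
                unlink($path);
            }
            continue;
        }
        $ready[] = $path;
    }
    sort($ready);
    return $ready;
}
```

The batch job would then loop over the returned paths only, so a file never gets processed before the uploader has renamed it.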
Anything based on polling the file size is racy and unsafe.
Another scheme (that also requires cooperation from the uploader) can involve uploading the file's hash and size first, then the actual file. That allows you to know both when the transfer is done, and if it is consistent. (There are lots of variants around this idea.)
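As one possible variant of the hash-and-size idea, here is a sketch that assumes the uploader first sends a small JSON sidecar file named "<file>.meta" containing the expected size and SHA-256 hash. The sidecar name and its format are made up for this example:

```php
<?php
// Hypothetical convention: "report.csv" is preceded by "report.csv.meta",
// a JSON file such as {"size": 1234, "sha256": "ab12..."}.
function isUploadComplete(string $path): bool
{
    $metaPath = $path . '.meta';
    if (!is_file($path) || !is_file($metaPath)) {
        return false;
    }
    $meta = json_decode((string) file_get_contents($metaPath), true);
    if (!is_array($meta) || !isset($meta['size'], $meta['sha256'])) {
        return false;
    }
    clearstatcache(true, $path); // filesize() results are cached per request
    if (filesize($path) !== (int) $meta['size']) {
        return false; // cheap check first: still growing or truncated
    }
    // Only hash the file once the size matches, since hashing is expensive.
    return hash_equals(strtolower((string) $meta['sha256']), hash_file('sha256', $path));
}
```

Checking the size before the hash keeps the cost low for files that are obviously still in transit.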
Something that doesn't require cooperation from the client is checking whether the file is open by another process or not. (How you do that is OS dependent - I don't know of a PHP builtin that does this. lsof and/or fuser can be used on a variety of Unix-type platforms, Windows has APIs for this.) If another process has the file open, chances are it's not complete yet.
Note that this last approach might not be fool-proof if you allow restarting/resuming uploads, or if your FTP server software doesn't keep the file open for the entire duration of the transfer, so YMMV.

Our server admin suggested ftpwho (part of ProFTPD), which shows which files are currently being transferred.
http://www.castaglia.org/proftpd/doc/ftpwho.html
So the solution is to parse the output of ftpwho to see if a file in the directory is being transferred.

Some FTP servers allow running commands when certain events occur. So if your FTP server allows this, you can build a simple signalling scheme to let your application know that the file has been uploaded more or less successfully ("more or less" because you don't know whether the user intended to upload the file completely or in parts). The signalling scheme can be as simple as creating an "uploaded_file_name.ext.complete" file, and you then monitor for the existence of files with the ".complete" extension.
Alternatively, you can check whether you can open the file for writing. Most FTP servers won't let you do this while the file is being uploaded.
One more approach mentioned by Mat is using system-specific techniques to check if the file is opened by other process.
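The ".complete" marker idea could be consumed on the PHP side roughly like this. It is only a sketch: it assumes the server's post-upload hook creates the marker next to the finished file, and the function name is hypothetical:

```php
<?php
// Assumes the FTP server's post-upload hook creates "<name>.complete"
// next to each finished upload (the hook itself is server-specific).
function findCompletedUploads(string $dir, string $markerSuffix = '.complete'): array
{
    $done = [];
    foreach (glob($dir . '/*' . $markerSuffix) ?: [] as $marker) {
        // Strip the marker suffix to get the name of the uploaded file.
        $file = substr($marker, 0, -strlen($markerSuffix));
        if (is_file($file)) {
            $done[] = $file;
        }
    }
    return $done;
}
```

After processing a file you would probably delete both the file and its marker, so the next run does not pick it up again.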

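The best way to check would be to try to get an exclusive lock on the file using flock. Keep in mind that flock locks are advisory on most Unix systems, so this only helps if the FTP/SFTP server process itself locks the file while writing it.
// try to get an exclusive lock on the file without blocking
$fp = fopen($pathname, "r+");
if ($fp === false) {
    error_log("Could not open $pathname.");
} elseif (flock($fp, LOCK_EX | LOCK_NB)) { // acquire an exclusive lock
    flock($fp, LOCK_UN); // release the lock
    fclose($fp);
} else {
    fclose($fp);
    error_log("Failed to get exclusive lock on $pathname. File may still be uploading.");
}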

It's not a really nice trick, but it's simple :-). You can do the same with filemtime.
$result = false;
$tries = 5;
$sizes = [];
if (file_exists($filepath)) {
    for ($i = 0; $i < $tries; $i++) {
        sleep(1);
        clearstatcache(true, $filepath); // filesize() results are cached otherwise
        $sizes[] = filesize($filepath);
    }
    // the size is considered stable if every sample is identical
    $result = count(array_unique($sizes)) === 1;
}
return $result;

Related

Put a file on FTP site with contents from string variable (no local file)

I want to upload a file to an FTP server, but the file content is held in a variable, not in an actual local file. I want to avoid using a file; this is to avoid security risks when dealing with sensitive data on a (possibly) not-so-secure system(*), as well as to minimize the (already low) overhead of file handling.
But PHP's FTP API only offers uploading files from local files via the function ftp_put or (when the file is already opened as a file handle) ftp_fput.
Currently, I use this function with a temporary file in which I write the contents before the upload:
$tmpfile = tmpfile();
fwrite($tmpfile, $content);
fseek($tmpfile, 0);
ftp_fput($ftp, $filename, $tmpfile, FTP_BINARY);
Is there a simpler way without using files on the local (PHP) site at all?
There is ftp_raw, which can be used to send arbitrary commands, so I could issue the upload command (STOR) manually; however, I don't see a way to manually write the data on the data channel...
I don't know if it is important, but the FTP connection is secured with SSL (ftp_ssl_connect).
(*) Consider the scenario where an attacker has read-only control over the entire file system.
This may not be the ultimate solution, but I guess it is still better than the original approach:
You can avoid temporary files on the file system by using a PHP memory stream. It basically is a file handle wrapper which (behind the scenes) uses no actual file but instead some chunk of memory.
So virtually you still use a file handle (so ftp_fput is happy), but no actual file (so no file is written to the disk and the overhead is kept minimal).
$tmpfile = fopen('php://memory', 'r+');
fputs($tmpfile, $content);
rewind($tmpfile); // or fseek
Note that when uploading multiple files, you can further minimize overhead by reusing the same file handle for all files (unless you parallelize the procedure, of course). In this case, also rewind the file after ftp_fput as well as truncate it using ftruncate($tmpfile, 0).
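The handle reuse described above might look like the following sketch. The $upload callable is a stand-in introduced for this example so the loop can be tried without a server; in real use it would wrap ftp_fput:

```php
<?php
// $upload is a stand-in for the real transfer, e.g.
// fn ($name, $stream) => ftp_fput($ftp, $name, $stream, FTP_BINARY).
function uploadStrings(array $files, callable $upload): array
{
    $results = [];
    $stream = fopen('php://memory', 'r+');
    foreach ($files as $remoteName => $content) {
        ftruncate($stream, 0); // drop the previous file's bytes
        rewind($stream);       // ftruncate() does not move the pointer
        fwrite($stream, $content);
        rewind($stream);       // the upload reads from the start
        $results[$remoteName] = $upload($remoteName, $stream);
    }
    fclose($stream);
    return $results;
}
```

Truncating before each write matters when a later file is shorter than an earlier one; otherwise the tail of the previous content would be uploaded too.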
With no local file involved (ftp_fput):
$stream = fopen('php://memory','r+');
fwrite($stream, $newFileContent);
rewind($stream);
$success = ftp_fput($connectionId, "remoteFileName", $stream, FTP_BINARY);
fclose($stream);
fopen should be able to connect via FTP:
http://php.net/manual/en/function.fopen.php
and then you can use fwrite to write the string to the connection:
http://php.net/manual/en/function.fwrite.php
The second parameter of fwrite is the string to be written, so you don't need a file there.
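A rough sketch of the wrapper approach, using file_put_contents instead of fopen/fwrite for brevity. The host, credentials and helper names are placeholders. As far as I know the wrapper needs the "overwrite" context option to replace an existing remote file, and the ftps:// scheme requires the OpenSSL extension:

```php
<?php
// Builds an ftp:// or ftps:// URL, percent-encoding credentials that may
// contain characters such as "@" or ":".
function ftpUrl(string $host, string $user, string $pass, string $remotePath, bool $ssl = false): string
{
    return sprintf(
        '%s://%s:%s@%s/%s',
        $ssl ? 'ftps' : 'ftp',
        rawurlencode($user),
        rawurlencode($pass),
        $host,
        ltrim($remotePath, '/')
    );
}

// Writes a string straight to the remote file; no local file is involved.
function putStringViaFtp(string $url, string $content): bool
{
    // Without "overwrite" the wrapper refuses to replace an existing remote file.
    $context = stream_context_create(['ftp' => ['overwrite' => true]]);
    return file_put_contents($url, $content, 0, $context) !== false;
}
```

Usage would be something like putStringViaFtp(ftpUrl('ftp.example.com', 'user', 'secret', 'in/data.txt', true), $content), with all of those values being examples.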
-----FTP PUT contents (php 7.0) ---
$tmpFile = tmpfile();
fwrite($tmpFile, $contents);
rewind($tmpFile);
$tmpMetaData = stream_get_meta_data($tmpFile);
if (ftp_put($ftpObj, $remoteFile, $tmpMetaData['uri'], FTP_ASCII)) {
echo "success";
} else {
echo "fail";
}
fclose($tmpFile);

Why is unlink successful on an open file?

Why is an open file deleted? On Windows XAMPP I get the message "still working", but on another PHP server the file is deleted even though it is open, and I get the message "file deleted". I can delete the file via FTP too, even while the first script is still running :(
<?php
$handle = fopen("resource.txt", "x");
sleep(10);
?>
<?php
if (file_exists("resource.txt") && @unlink("resource.txt") === false) {
echo "still working";
exit;
}
else
echo "file deleted";
?>
UNIX systems typically let you do this, yes. The underlying C unlink function is documented as such:
The unlink() function removes the link named by path from its directory
and decrements the link count of the file which was referenced by the
link. If that decrement reduces the link count of the file to zero, and
no process has the file open, then all resources associated with the file
are reclaimed. If one or more process have the file open when the last
link is removed, the link is removed, but the removal of the file is
delayed until all references to it have been closed.
In other words, you can basically mark the file for deletion at any time, but the system will actually keep it around as long as applications are still accessing it. Only when all applications have let go of the file will it finally be removed. Windows traditionally does not do it that way. Update: since PHP 7.3, unlinking files with open handles is also possible on Windows.
As a side note, UNIX' behaviour is the only sane behaviour in a multi-process environment. If you have to wait for all processes to close access to a file before the system lets you remove it, it's basically impossible to remove frequently accessed files at all. Yes, that's where those Windows dialog boxes about "Cannot delete file, still in use, retry?" come from which you can never get rid of.
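To see the delayed removal in action, here is a small self-contained experiment (a sketch that uses a temporary file, so nothing else is touched):

```php
<?php
// On Unix-like systems the directory entry disappears immediately,
// but the open handle keeps the data readable until it is closed.
$path = tempnam(sys_get_temp_dir(), 'unl');
$handle = fopen($path, 'w+');
fwrite($handle, 'still here');

$deleted = unlink($path);      // removes the name, not the open file
clearstatcache();
$visible = file_exists($path); // is the name still in the directory?

rewind($handle);
$data = fread($handle, 100);   // read back through the open handle
fclose($handle);               // only now can the system reclaim the data

var_dump($deleted, $visible, $data);
```

On a Unix-like system I would expect the unlink to succeed, the path to be gone, and the original string to still be readable through the handle.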

Apache/PHP5 popen/fread ties up Apache

I'm trying to develop an online management system for a very large FLAC music library for a radio station. It's got a beefy server and not many users, so I want to be able to offer a file download service where PHP transcodes the FLAC files into MP3/WAV depending on what the endpoint wants.
This works fine:
if($filetype == "wav") {
header("Content-Length: ". $bitrate * $audio->get_length());
$command = "flac -c -d ".$audio->get_filename().".flac";
}
ob_end_flush();
$handle = popen($command, "r");
while($read = fread($handle, 8192)) echo $read;
pclose($handle);
and allows the server to start sending the file to the user before the transcoding (well, decoding in this case) completes, for maximum speed.
However, the problem I'm getting is that while this script is executing, I can't get Apache to handle any other requests on the entire domain. It'll still work fine on other VirtualHosts on the same machine, but nobody can load any pages on this website while one person happens to be downloading a file.
I've also tried implementing the same thing using proc_open with no difference, and have played with the Apache settings for number of workers and the like.
Is the only way to stop this behaviour to use something like exec and waiting for the encoding process to finish before I start sending the user the file? Because that seems sub-optimal! :(
UPDATE: it seems that other people can still access the website, but not me - i.e. it's somehow related to sessions. This confuses me even more!
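Use session_write_close() at some point before you start streaming. PHP's default file-based session handler keeps the session file locked until the request ends, which is why only requests from your own session are blocked. You may also want to call stream_set_blocking($handle, false) on the read pipe.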
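Put together, the streaming part could be sketched like this. The function name is made up, and the session check is there so it also runs where no session was started:

```php
<?php
// Streams a command's output to the client in chunks. The session is
// closed first so that other requests from the same user are not blocked
// on the session lock while the download runs.
function streamCommandOutput(string $command, int $chunkSize = 8192): int
{
    if (session_status() === PHP_SESSION_ACTIVE) {
        session_write_close();
    }
    $handle = popen($command, 'r');
    if ($handle === false) {
        return -1;
    }
    $sent = 0;
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break; // read error: stop instead of looping forever
        }
        if ($chunk === '') {
            continue; // nothing read this round; feof() decides whether to stop
        }
        echo $chunk;
        flush(); // push the chunk towards the client right away
        $sent += strlen($chunk);
    }
    pclose($handle);
    return $sent;
}
```

For the FLAC case the call might be streamCommandOutput('flac -c -d ' . escapeshellarg($file)), where escapeshellarg guards against odd characters in file names. Any session data you need has to be read before the call, because the session can no longer be written afterwards.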

File upload with breaks

I'd like to upload large files to my server, but I would like to be able to pause the upload (for example, the user must be able to shut down their computer and continue after a reboot).
I think I can handle the client-side upload, but I don't know how to do the server side. What is the best way to do it on the server side? Is PHP able to do that? Is PHP the most efficient choice?
Thanks a lot
If you manage to get the client side to post the file in chunks, you could do something like this on the server side:
// set the path of the file you upload
$path = $_GET['path'];
// set the `append` parameter to 1 if you want to append to an existing file, if you are uploading a new chunk of data
$append = intval($_GET['append']);
// convert the path you sent via post to a physical filename on the server
$filename = $this->convertToPhysicalPath($path);
// get the temporary file
$tmp_file = $_FILES['file']['tmp_name'];
// if this is not appending
if ($append == 0) {
// just copy the uploaded file
copy($tmp_file, $filename);
} else {
// append file contents
$write_handle = fopen($filename, "ab");
$read_handle = fopen($tmp_file, "rb");
$contents = fread($read_handle, filesize($tmp_file));
fwrite($write_handle, $contents);
fclose($write_handle);
fclose($read_handle);
}
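A variation on the append step, sketched with stream_copy_to_stream so a large chunk is not read into memory in one go. The function name and the return value are my own choices here; returning the new size is just one way to let the client know where to resume:

```php
<?php
// Appends one uploaded chunk to the target file without loading the whole
// chunk into memory. Returns the target's size afterwards so the client
// can tell where to resume.
function appendChunk(string $chunkPath, string $targetPath, bool $append): int
{
    $in = fopen($chunkPath, 'rb');
    $out = fopen($targetPath, $append ? 'ab' : 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open chunk or target file');
    }
    flock($out, LOCK_EX);             // keep concurrent chunks from interleaving
    stream_copy_to_stream($in, $out); // copies in small internal buffers
    fflush($out);
    flock($out, LOCK_UN);
    fclose($in);
    fclose($out);
    clearstatcache(true, $targetPath); // make sure filesize() is fresh
    return filesize($targetPath);
}
```

In an upload handler the chunk path would be $_FILES['file']['tmp_name'], and the client could compare the returned size with the number of bytes it has sent so far.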
If you are trying to design a web interface to allow anyone to upload a large file and resume the upload part way through, I don't know how to help you. But if all you want to do is get files from your computer to a server in a resumable fashion, you may be able to use a tool like rsync. Rsync compares the files on the source and destination, and then only copies the differences between the two. This way, if you have 50 GB of files that you upload to your server and then change one, rsync will very quickly check that all the other files are the same, and then only send your one changed file. This also means that if a transfer is interrupted part way through, rsync will pick up where it left off.
Traditionally rsync is run from the command line (terminal), and it is installed by default on most Linux and Mac OS X systems.
rsync -avz /home/user/data server:src/data
This would transfer all files from /home/user/data to src/data on the server. If you then change any file in /home/user/data you can run the command again to resync it.
If you use Windows, the easiest solution is probably to use DeltaCopy, which is a GUI around rsync.

Read content of 12000 files from another FTP server

What I would like to script: a PHP script to find a certain string in loads of files
Is it possible to read contents of thousands of text files from another ftp server without actually downloading those files (ftp_get) ?
If not, would downloading them ONCE -> if already exists = skip / filesize differs = redownload -> search certain string -> ...
be the easiest option?
If URL fopen wrappers are enabled, then file_get_contents can do the trick and you do not need to save the file on your server.
<?php
$find = 'mytext'; //text to find
$files = array('http://example.com/file1.txt', 'http://example.com/file2.txt'); //source files
foreach($files as $file)
{
$data = file_get_contents($file);
if(strpos($data, $find) !== FALSE)
echo "found in $file".PHP_EOL;
}
?>
[EDIT]: If the files are accessible only by FTP:
In that case, you have to use URLs like this:
$files = array('ftp://user:pass@domain.com/path/to/file', 'ftp://user:pass@domain.com/path/to/file2');
If you are going to store the files after you download them, then you may be better served to just download or update all of the files, then search through them for the string.
The best approach depends on how you will use it.
If you are going to be deleting the files after you have searched them, then you may want to also keep track of which ones you searched, and their file date information, so that later, when you go to search again, you won't waste time searching files that haven't changed since the last time you checked them.
When you are dealing with so many files, try to cache any information that will help your program to be more efficient next time it runs.
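The caching idea could be as small as the check below. It is a sketch that assumes you store the size and modification time seen on the previous run (for example in a JSON file). With the FTP extension those values can come from ftp_size() and ftp_mdtm(), though not every server supports them:

```php
<?php
// $cache maps a remote path to the ['size' => int, 'mtime' => int] seen
// on the previous run, e.g. loaded from a JSON file.
function needsRescan(array $cache, string $remotePath, int $size, int $mtime): bool
{
    if (!isset($cache[$remotePath])) {
        return true; // never searched before
    }
    // Re-download and re-search only if size or timestamp changed.
    return $cache[$remotePath]['size'] !== $size
        || $cache[$remotePath]['mtime'] !== $mtime;
}
```

After searching a file you would update its cache entry, so the next run only touches files that are new or have changed.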
PHP's built-in file reading functions, such as fopen()/fread()/fclose() and file_get_contents() do support FTP URLs, like this:
<?php
$data = file_get_contents('ftp://user:password@ftp.example.com/dir/file');
// The file's contents are stored in the $data variable
If you need to get a list of the files in the directory, you might want to check out opendir(), readdir() and closedir(), which I'm pretty sure support FTP URLs.
An example:
<?php
$dir = opendir('ftp://user:password@ftp.example.com/dir/');
if(!$dir)
die;
while(($file = readdir($dir)) !== false)
echo htmlspecialchars($file).'<br />';
closedir($dir);
If you can connect via SSH to that server, and if you can install new PECL (and PEAR) modules, then you might consider using PHP SSH2. Here's a good tutorial on how to install and use it. This is a better alternative to FTP. But if that is not possible, your only solution is file_get_contents('ftp://domain/path/to/remote/file');.
** UPDATE **
Here is a PHP-only implementation of an SSH client : SSH in PHP.
With FTP you'll always have to download to check.
I do not know what kind of bandwidth you're having and how big the files are, but this might be an interesting use-case to run this from the cloud like Amazon EC2, or google-apps (if you can download the files in the timelimit).
In the EC2 case you then spin up the server for an hour to check for updates in the files and shut it down again afterwards. This will cost a couple of bucks per month and avoid you from potentially upgrading your line or hosting contract.
If this is a regular task, it might be worth using a simple queue system so you can run multiple processes at once (this will hugely increase speed). This would involve the following steps:
Get a list of all files on the remote server
Put the list into a queue (you can use memcached for a basic message queuing system)
Use a separate script to get the next item from the queue.
The processing script would contain simple functionality (in a do-while loop), roughly like this pseudocode:
ftp_connect
do
  item = next item from queue
  contents = file_get_contents(item)
  preg_match(pattern, contents)
while (queue is not empty)
ftp_close
You could then in theory fork off multiple processes through the command line without needing to worry about race conditions.
This method is probably best suited to cron jobs/batch processing; however, it might work in this situation too.
