Is it possible to use PHP readfile function on a remote file whose size is unknown and is increasing in size? Here is the scenario:
I'm developing a script which downloads a video from a third party website and simultaneously trans-codes the video into MP3 format. This MP3 is then transferred to the user via readfile.
The command used for the above process looks like this:
wget -q -O- "VideoURLHere" | ffmpeg -i - "Output.mp3" > /dev/null 2>&1 &
So the file is fetched and encoded at the same time.
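For reference, this is roughly how such a command might be launched from a PHP script; the URL and output name are the placeholders from the command above, and the output redirection plus trailing & keep it running in the background:
$video = escapeshellarg('VideoURLHere');
$out   = escapeshellarg('Output.mp3');
shell_exec("wget -q -O- $video | ffmpeg -i - $out > /dev/null 2>&1 &");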
Now, while the above process is in progress, I begin sending the output MP3 to the user via readfile. The problem is that the encoding takes some time, so depending on the user's download speed, readfile reaches what it assumes is EOF before the whole file has been encoded, and the user ends up with partial content/incomplete files.
My first attempt to fix this was to apply a speed limit to the user's download, but this is not foolproof, as the encoding time and speed vary with load, and it still led to partial downloads.
So is there a way to implement this system in such a way that I can serve the downloads simultaneously along with the encoding and also guarantee sending the complete file to the end user?
Any help is appreciated.
EDIT:
In response to Peter: I'm actually using fread (see readfile_chunked below):
<?php
function readfile_chunked($filename, $retbytes = true) {
    $chunksize = 1 * (1024 * 1024); // how many bytes per chunk
    $totChunk = 0;
    $buffer = '';
    $cnt = 0;
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;
    }
    while (!feof($handle)) {
        //usleep(120000); //Used to impose an artificial speed limit
        $buffer = fread($handle, $chunksize);
        echo $buffer;
        ob_flush();
        flush();
        if ($retbytes) {
            $cnt += strlen($buffer);
        }
    }
    $status = fclose($handle);
    if ($retbytes && $status) {
        return $cnt; // return num. bytes delivered like readfile() does.
    }
    return $status;
}
readfile_chunked($linkToMp3);
?>
This still does not guarantee complete downloads because, depending on the user's download speed and the encoding speed, EOF may be reached prematurely.
Also, in response to theJeztah's comment, I'm trying to achieve this without making the user wait, so that's not an option.
Since you are dealing with streams, you probably should use stream handling functions :). passthru comes to mind, although this will only work if the download | transcode command is started in your script.
If it is started externally, take a look at stream_get_contents.
Libevent, as mentioned by Evert, seems like the general solution when you have to use a file as a buffer. In your case, however, you could do it all inline in your script without using a file as a buffer:
<?php
header("Content-Type: audio/mpeg");
passthru("wget -q -O- http://localhost/test.avi | ffmpeg -i - -f mp3 -");
?>
I don't think there's any way of being notified about new data arriving, short of something like inotify.
I suggest that if you hit EOF, you start polling the modification time of the file (using clearstatcache() between calls) every 200 ms or so. When you find the file size has increased, you can reopen the file, seek to the last position and continue.
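A minimal sketch of that approach, assuming $path is the MP3 that ffmpeg is still writing and that a hypothetical "$path.done" marker file is created once encoding finishes (you need some such signal to know when to stop):
function readfile_growing($path)
{
    $sent = 0;
    while (true) {
        $handle = fopen($path, 'rb');
        if ($handle === false) {
            return false;
        }
        fseek($handle, $sent); // continue where the previous pass stopped
        while (!feof($handle)) {
            $buffer = fread($handle, 8192);
            $sent  += strlen($buffer);
            echo $buffer;
            flush();
        }
        fclose($handle);
        // EOF reached: wait for the file to grow or for the encoder to finish.
        do {
            usleep(200000); // ~200 ms
            clearstatcache(true, $path);
            $grown = filesize($path) > $sent;
            $done  = file_exists("$path.done"); // hypothetical "encoding finished" marker
        } while (!$grown && !$done);
        if (!$grown && $done) {
            return $sent; // everything that will ever exist has been sent
        }
    }
}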
I can highly recommend using libevent for applications like this.
It works perfectly for cases like this.
The PHP documentation is a bit sparse for this, but you should be able to find more solid examples around the web.
Related
Maybe I'm asking the impossible, but I wanted to clone a stream multiple times, a sort of multicast emulation. The idea is to write a 1300-byte buffer every 0.002 seconds into a .sock file (instead of IP:port, to avoid the overhead) and then to read the same .sock file multiple times from other scripts.
Doing it through a regular file doesn't work: it only works within the same script that generates the buffer file and then echoes it. The other scripts will misread it badly.
This works perfectly with the script that generates the chunks:
$handle = @fopen($url, 'rb');
$buffer = 1300;
while (1) {
    $chunck = fread($handle, $buffer);
    $handle2 = fopen('/var/tmp/stream_chunck.tmp', 'w');
    fwrite($handle2, $chunck);
    fclose($handle2);
    readfile('/var/tmp/stream_chunck.tmp');
}
BUT the output of another script that reads the chunks:
while (1) {
    readfile('/var/tmp/stream_chunck.tmp');
}
is messy. I don't know how to synchronize the reading of the chunks, and I thought sockets might work a miracle.
It only works within the same script that generates the buffer file and then echoes it. The other scripts will misread it badly.
Using a single file without any sort of flow control shouldn't be a problem - tail -F does just that. The disadvantage is that the data will just accumulate indefinitely on the filesystem as long as a single client has an open file handle (even if you truncate the file).
But if you're writing chunks, then write each chunk to a different file (using an atomic write mechanism), and everyone can read it by polling for available files (a sketch of the writer side follows the reader loop below):
do {
    while (!file_exists("$dir/$prefix.$current_chunk")) {
        clearstatcache();
        usleep(1000);
    }
    process(file_get_contents("$dir/$prefix.$current_chunk"));
    $current_chunk++;
} while (!$finished);
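For completeness, a minimal sketch of the matching writer side, assuming $src is the open input stream and $dir/$prefix match the reader loop above; writing to a temporary file and then rename()ing it into place is atomic on the same filesystem, so readers never see a half-written chunk:
$chunk_no = 0;
while (($data = fread($src, 1300)) !== false && $data !== '') {
    $tmp   = "$dir/$prefix.tmp";
    $final = "$dir/$prefix.$chunk_no";
    file_put_contents($tmp, $data); // write the chunk somewhere readers don't look
    rename($tmp, $final);           // then publish it atomically
    $chunk_no++;
}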
Equally, you could do this with a database, which should have slightly lower overhead for the polling, and it simplifies garbage collection of old chunks.
But this is all about how to make your solution workable - it doesn't really address the problem you are trying to solve. If we knew what you were trying to achieve then we might be able to advise on a more appropriate solution - e.g. if it's a chat application, video broadcast, something else....
I suspect a more appropriate solution would be to use a multi-processing, single-memory-model server, and when we're talking about PHP (which doesn't really do threading very well) that means an event-based/asynchronous server. There's a bit more involved than simply calling socket_select(), but there are some good scripts available which do most of the complicated stuff for you.
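To give a rough idea of the shape of such a server, here is a bare-bones single-process event loop using stream_select(); the port, buffer size and broadcast policy are arbitrary assumptions, not a drop-in solution:
$server  = stream_socket_server('tcp://0.0.0.0:9000', $errno, $errstr);
$clients = array();
while (true) {
    $read   = array_merge(array($server), $clients);
    $write  = null;
    $except = null;
    if (stream_select($read, $write, $except, null) === false) {
        break;
    }
    foreach ($read as $sock) {
        if ($sock === $server) {
            $clients[] = stream_socket_accept($server); // new subscriber (or the producer)
            continue;
        }
        $chunk = fread($sock, 1300);
        if ($chunk === '' || $chunk === false) {
            fclose($sock); // client went away
            $clients = array_values(array_filter($clients, function ($c) use ($sock) {
                return $c !== $sock;
            }));
            continue;
        }
        foreach ($clients as $other) { // relay the chunk to every other connection
            if ($other !== $sock) {
                fwrite($other, $chunk);
            }
        }
    }
}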
Please look at the PHP code below. It is from a download script:
while (ob_get_level() > 0) {
    ob_end_clean();
}
set_time_limit(0);
ignore_user_abort(true);
$file = fopen(MATIN_FILE_PATH, "rb"); // the main file
$chunksize = 2 * 1024 * 1024;
while (!feof($file)) {
    echo @fread($file, $chunksize);
    flush();
    if (connection_status() == 1) { // if client aborted
        @fclose($file);
        exit;
    }
}
@fclose($file);
exit;
In this code, you can see that I send 2MB per chunk.
Imagine a client with a speed of 100KB/s.
After a lot of debugging, I found out that while the client downloads each 2MB, the write to the output happens and the while loop moves on to the next iteration. So what is PHP doing during this time? Is it waiting for the user to download the 2MB completely before sending another 2MB? If so, wouldn't it be better to send 10MB or 50MB per chunk?
Thanks for any detailed guide.
Imagine you have 10 simultaneous client requests for downloading some file with this script and you have set a 50MB chunk size. For each of the requests a new PHP process will be invoked, each of them demanding 50MB of your server's memory to process fread($file, 50*1024*1024). So you will have 500MB of memory consumed.
If, as you suggested, a client's speed is 100KB/s, then the probability of having 100 simultaneous connections is not that low, and 100 concurrent requests is already 5GB of RAM. Do you have that much, and do you really need to use it all?
You cannot make the user download the file faster than their actual download speed, so the chunk size does not matter much here. Nor will reducing the number of loop iterations speed things up. I have not tested this, but I think the loop executes faster than the I/O operations with the remote client. So the only thing you should really be concerned about is making your server work reliably.
I'm currently looking into a way of showing the file download status on a page.
I know this isn't needed since the user usually has a download status in the browser, but I would like to keep the user on the page they are downloading from for as long as the download lasts. To do that, the download status should match the status the file actually has (not a fake progress bar). Maybe it could also display the speed the user is downloading at and estimate the time remaining, based on the current download rate.
Can this be done using PHP and JavaScript? Or does it really require Flash or Java?
Shouldn't there be information somewhere on the server about who is downloading what, at what speed, and how much?
Thank you for your help in advance.
Not really possible cross-browser, but have a look into http://markmail.org/message/kmrpk7w3h56tidxs#query:jquery%20ajax%20download%20progress+page:1+mid:kmrpk7w3h56tidxs+state:results for a pretty close effort. IE (as usual) is the main culprit for not playing ball.
You can do it with two separate PHP files: the first file handles the download process.
Like this:
$strtTime = time();
$download_rate = 120; // download rate in KB/s
$totalDw = 0;
$fp = fopen($real, "r"); // $real is the path of the file being served
flush(); // Flush headers
while (!feof($fp)) {
    $downloaded = round($download_rate * 1024); // bytes to send this second
    echo fread($fp, $downloaded);
    ob_flush();
    flush();
    if (connection_aborted()) {
        // unlink("yourtempFile.txt");
        exit;
    }
    $totalDw += $downloaded;
    // file_put_contents("yourtempFile.txt", "downloaded: $totalDw ; StartTime:$strtTime");
    sleep(1);
}
fclose($fp);
// unlink("yourtempFile.txt");
The second file would be used to read yourtempFile.txt continuously via Ajax. Sessions and cookies can't be used here because output has already started.
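A rough sketch of what that second file (say, progress.php, polled via Ajax) might look like; it assumes the commented-out file_put_contents() line above is enabled, and the JSON shape is just an example:
header('Content-Type: application/json');
$status = @file_get_contents('yourtempFile.txt');
if ($status === false) {
    echo json_encode(array('state' => 'unknown'));
    exit;
}
// Parse the two values back out of the status line written by the download script.
preg_match('/downloaded:\s*(\d+)\s*;\s*StartTime:(\d+)/', $status, $m);
$downloaded = isset($m[1]) ? (int) $m[1] : 0;
$elapsed    = isset($m[2]) ? max(1, time() - (int) $m[2]) : 1;
echo json_encode(array(
    'downloaded' => $downloaded,                       // bytes sent so far
    'speed'      => round($downloaded / $elapsed, 2),  // average bytes per second
));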
Alright, I know my question is not entirely specific, as the optimum fread chunk size is more of a trial-and-error thing. However, I was hoping some of you guys could shed some light on this.
This also involves server-related stuff, so I'm not sure if Stack Overflow is entirely the right place, but it did seem to be a better choice than ServerFault.
To begin with, I'm going to post two screenshots:
http://screensnapr.com/e/pnF1ik.png
http://screensnapr.com/e/z85FWG.png
Now I've got a script that uses PHP to stream files to the end user. It uses fopen and fread to stream the file. Most of these files are larger than 100MB. My concern is that sometimes, the above is what my server stats turn into. The two screens are from different servers; both servers are dedicated file streaming boxes. Nothing else runs on them except PHP streaming the file to the end user.
I'm confused by the fact that even when my servers are only transmitting an aggregate total of about 4MB/sec of data to the end client(s), the disk reads are going at 100MB/s and over. This insane level of I/O eventually locks up my CPU because it waits on I/O and tasks pile up; eventually my server becomes completely unresponsive, requiring a reboot.
My current fread chunk size is set to 8 * 1024. My question is, will changing the block size and experimenting help at all? The client is only downloading data at an average ~4MB/sec. So why is the disk reading data at 100MB/sec? I've tried every possible solution on the server end; I even swapped the disks with new ones to rule out a potential disk issue. Looks to me like this is a script issue; maybe PHP is reading the entire data from the disk regardless of how much it transfers to the end client?
Any help at all would be appreciated. And if this belongs to ServerFault, then my apologies for posting here. And if you guys need me to post snippets from the actual script, I can do that too.
8 * 1024 bytes? That seems perfectly reasonable, and if so your high disk I/O is probably related to concurrent requests. Have you considered implementing some sort of bandwidth throttling? Here is a PHP-only implementation I did for my framework, phunction:
public static function Download($path, $speed = null, $multipart = false)
{
    if (strncmp('cli', PHP_SAPI, 3) !== 0)
    {
        if (is_file($path) === true)
        {
            while (ob_get_level() > 0)
            {
                ob_end_clean();
            }

            $file = @fopen($path, 'rb');
            $size = sprintf('%u', filesize($path));
            $speed = (empty($speed) === true) ? 1024 : floatval($speed);

            if (is_resource($file) === true)
            {
                set_time_limit(0);
                session_write_close();

                if ($multipart === true)
                {
                    $range = array(0, $size - 1);

                    if (array_key_exists('HTTP_RANGE', $_SERVER) === true)
                    {
                        $range = array_map('intval', explode('-', preg_replace('~.*=([^,]*).*~', '$1', $_SERVER['HTTP_RANGE'])));

                        if (empty($range[1]) === true)
                        {
                            $range[1] = $size - 1;
                        }

                        foreach ($range as $key => $value)
                        {
                            $range[$key] = max(0, min($value, $size - 1));
                        }

                        if (($range[0] > 0) || ($range[1] < ($size - 1)))
                        {
                            ph()->HTTP->Code(206, 'Partial Content');
                        }
                    }

                    header('Accept-Ranges: bytes');
                    header('Content-Range: bytes ' . sprintf('%u-%u/%u', $range[0], $range[1], $size));
                }
                else
                {
                    $range = array(0, $size - 1);
                }

                header('Pragma: public');
                header('Cache-Control: public, no-cache');
                header('Content-Type: application/octet-stream');
                header('Content-Length: ' . sprintf('%u', $range[1] - $range[0] + 1));
                header('Content-Disposition: attachment; filename="' . basename($path) . '"');
                header('Content-Transfer-Encoding: binary');

                if ($range[0] > 0)
                {
                    fseek($file, $range[0]);
                }

                while ((feof($file) !== true) && (connection_status() === CONNECTION_NORMAL))
                {
                    ph()->HTTP->Flush(fread($file, round($speed * 1024)));
                    ph()->HTTP->Sleep(1);
                }

                fclose($file);
            }

            exit();
        }
        else
        {
            ph()->HTTP->Code(404, 'Not Found');
        }
    }

    return false;
}
The above method has some minor dependencies and it adds some functionality you may not need, like multi-part downloads, but you should be able to reuse the throttling logic without problems.
// serve file at 4 MBps (max)
Download('/path/to/file.ext', 4 * 1024);
You can even be more generous by default and decrease the $speed depending on the values you get from the first index of sys_getloadavg() to avoid stressing your CPU.
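For example (the numbers are arbitrary):
// Scale the speed down as the 1-minute load average climbs.
$load  = sys_getloadavg();
$speed = max(512, (4 * 1024) / max(1, $load[0])); // KB/s, never below 512 KB/s
Download('/path/to/file.ext', $speed);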
Generally, actual disk I/O can be faster than userspace I/O because of prefetching and filesystem overhead. However, that should never lock up your server. The chunk size will have little to no impact on this as long as it's between 1 KiB and, say, 16 MiB. However, instead of using PHP to stream files, you should really consider the much more optimized readfile.
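For example (the headers are illustrative and $path stands for your file):
header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($path));
readfile($path); // PHP/libc do the copy loop instead of a userland fread() loop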
That being said, barring a serious programming error, this behavior is probably not directly related to your small read loop. First, use iotop to find out which program is actually causing the I/O. If it's PHP (how many concurrent scripts? Sorry, the screenshots seem to be completely garbled and show next to no useful information), make sure you're not using output buffering, and have a look at memory consumption as well as the various PHP tuning parameters (phpinfo() gives a good overview). By the way, htop is a much nicer alternative to top ;).
Now I've got a script that uses PHP to stream files to the end user.
Just to clarify what's really going on: Apache is responsible for the actual "stream". PHP hands its output directly to Apache, so the end user of your PHP script is effectively Apache. Apache then handles the output to the user, which in your case is apparently around ~4MB/sec. Apache, however, does not have that restriction and can take all of your output at once and then handle a delayed delivery to the client. To prove this, you should be able to see your script exit before the stream is delivered. If your script turns around and tries to deliver another file, then you're queueing up Apache against your server resources.
A better solution may be to let Apache handle the file delivery completely, by letting the user request the download from an accessible URL. Obviously this is limited to static content. To fix your script above, you would need to delay some of the file reads so that Apache can deliver the chunks instead of buffering the whole output.
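One way to do that for files that are not normally web-accessible is sketched below; the directory layout and paths are purely hypothetical, and the short-lived symlink would need to be garbage-collected separately:
$token  = uniqid('', true);
$public = __DIR__ . "/downloads/$token.mp4";    // assumed web-accessible directory
symlink('/storage/private/video.mp4', $public); // assumed real file location
header("Location: /downloads/$token.mp4");      // Apache now serves the bytes itself
exit;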
EDIT: If your memory is fine and we can rule out swap activity, then it may simply be concurrent file read requests. If we request 5 files at 100MB, that's 500MB of read activity. Apache will not throttle your script and will in fact buffer all output, which can be over 100MB at a time. That would account for a lot of disk I/O activity, because each request results in reading the complete file into the buffer. Utilizing a throttle as suggested by Alix would allow for more concurrent requests, but eventually you're going to reach a limit. We can't be sure how fast the user receives the data from Apache, so you might have to find a nice balance for the throttle size that lets Apache and PHP work with chunks of your files instead of the whole file.
Is file_get_contents() enough for downloading remote movie files located on a server?
I just think that perhaps storing a large movie file in a string is harmful, according to the PHP docs.
Or do I need to use cURL? I don't know cURL.
UPDATE: these are big movie files, around 200MB each.
file_get_contents() is a problem because it's going to load the entire file into memory in one go. If you have enough memory to support the operation (taking into account that if this is a web server, you may have multiple hits that generate this behavior simultaneously, and therefore each need that much memory), then file_get_contents() should be fine. However, it's not the right way to do it - you should use a library specifically intended for this sort of operation. As mentioned by others, cURL will do the trick, or wget. You might also have good luck using fopen('http://someurl', 'r') and reading blocks from the file and then dumping them straight to a local file that's been opened for write privileges.
As @mopoke suggested, it could depend on the size of the file. For a small movie it may suffice. In general, though, I think cURL would be a better fit; you have much more flexibility with it than with file_get_contents().
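A minimal cURL sketch (URL and paths are placeholders) that streams the response straight to disk, so the whole 200MB never has to sit in memory:
$src = 'http://example.com/movie.mp4';
$dst = fopen('/tmp/movie.mp4', 'wb');
$ch = curl_init($src);
curl_setopt($ch, CURLOPT_FILE, $dst);           // write the body directly to $dst
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP errors as failures
$ok = curl_exec($ch);
curl_close($ch);
fclose($dst);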
For the best performance you may find it makes sense to just use a standard Unix utility like wget. You should be able to call it with system("wget ...") or exec().
http://www.php.net/manual/en/function.system.php
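For example (URL and output path are placeholders):
system('wget -q "http://example.com/movie.mp4" -O /tmp/movie.mp4', $exit_code);
if ($exit_code !== 0) {
    // the download failed; handle the error here
}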
You can read a few bytes at a time using fread():
$src = "http://somewhere/test.avi";
$dst = "test.avi";
$f = fopen($src, 'rb');
$o = fopen($dst, 'wb');
while (!feof($f)) {
    if (fwrite($o, fread($f, 2048)) === FALSE) {
        return 1;
    }
}
fclose($f);
fclose($o);