I am using two PHP libraries to serve files: dUnzip2 for unzipping, and zip.lib for zipping (http://www.zend.com/codex.php?id=470&single=1).
They work fine on files under 10 MB, but with files over 10 MB I have to raise the memory limit to 256 MB, and with files over 25 MB to 512 MB. That seems rather high... is it?
I'm on a dedicated server (4 CPUs, 16 GB RAM), but we also have a lot of traffic and downloading, so I'm wondering here.
Perhaps you're using PHP to load the whole file into memory before serving it to the user? I've used a function found in the comments section of http://www.php.net/manual/en/function.readfile.php that serves the file in parts, keeping memory usage low. Copying from that post (because my version is modified):
<?php
function readfile_chunked($filename, $type = 'array') {
    $chunksize = 1 * (1024 * 1024); // how many bytes per chunk
    $lines = ($type === 'array') ? array() : '';
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;
    }
    while (!feof($handle)) {
        switch ($type) {
            case 'array':
                // Returns an array of lines, like file()
                $lines[] = fgets($handle, $chunksize);
                break;
            case 'string':
                // Returns one string, like file_get_contents()
                // (note: must append with .=, or only the last chunk survives)
                $lines .= fread($handle, $chunksize);
                break;
        }
    }
    fclose($handle);
    return $lines;
}
?>
What my script does is license files before the user downloads them.
Generally you should avoid any web script that consumes this much memory or CPU time. Ideally you'd convert the task to run in a detached process. Setting a high limit isn't bad per se, but it makes poorly written scripts harder to detect, and it makes it easier for heavy demand on the complex pages to hurt performance across even the simple pages.
For example, with gearman you can easily set up some license worker scripts that run on the CLI and communicate with them via the gearman API.
This way your web server can remain free to perform low CPU and memory tasks, and you can easily guarantee that you'll never run more than X licensing tasks at once (where X is based on how many worker scripts you let run).
Your front end script could just be an AJAX widget that polls the server, checking to see if the task has completed.
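As a rough sketch of the polling side (the status-file path, JSON field names, and function name here are all assumptions for illustration, not part of gearman): the worker writes a small JSON status record per task, and the AJAX endpoint simply reads it back for the widget.

```php
<?php
// Hypothetical AJAX endpoint helper: report the status of a background
// licensing task. The worker is assumed to write a JSON status file per
// task; the "state"/"percent" fields are invented for this sketch.
function task_status($taskId, $statusDir = '/tmp') {
    // Only allow simple IDs so the path cannot be escaped.
    if (!preg_match('/^[a-zA-Z0-9_-]+$/', $taskId)) {
        return array('state' => 'unknown');
    }
    $file = $statusDir . '/task_' . $taskId . '.json';
    if (!is_file($file)) {
        return array('state' => 'pending');
    }
    $data = json_decode(file_get_contents($file), true);
    return is_array($data) ? $data : array('state' => 'unknown');
}

// The AJAX widget would poll this until "state" becomes "done", e.g.:
// echo json_encode(task_status($_GET['id']));
```

The worker script is the only writer and the web endpoint only reads, so no locking is needed for this simple scheme.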
Maybe I'm asking for the impossible, but I want to clone a stream multiple times, a sort of multicast emulation. The idea is to write a 1300-byte buffer every 0.002 seconds into a .sock file (instead of IP:port, to avoid overhead) and then to read the same .sock file multiple times from other scripts.
Doing it through a regular file is not doable: it works only within the same script that generates the buffer file and then echoes it. The other scripts misread it badly.
This works perfectly with the script that generates the chunks:
$handle = @fopen($url, 'rb');
$buffer = 1300;
while (1) {
    $chunk = fread($handle, $buffer);
    $handle2 = fopen('/var/tmp/stream_chunck.tmp', 'w');
    fwrite($handle2, $chunk);
    fclose($handle2);
    readfile('/var/tmp/stream_chunck.tmp');
}
BUT the output of another script that reads the chunks:
while (1) {
readfile('/var/tmp/stream_chunck.tmp');
}
is messy. I don't know how to synchronize the reading of the chunks, and I thought sockets might work a miracle.
It works only within the same script that generates the buffer file and then echoes it. The other scripts will misread it badly
Using a single file without any sort of flow control shouldn't be a problem; tail -F does just that. The disadvantage is that the data will just accumulate indefinitely on the filesystem as long as a single client has an open file handle (even if you truncate the file).
But if you're writing chunks, then write each chunk to a different file (using an atomic write mechanism) and everyone can read it by polling for available files:
$current_chunk = 0;
do {
    while (!file_exists("$dir/$prefix.$current_chunk")) {
        clearstatcache();
        usleep(1000);
    }
    process(file_get_contents("$dir/$prefix.$current_chunk"));
    $current_chunk++;
} while (!$finished);
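The writer side of that scheme might look like the following sketch (the directory, prefix, and helper name are assumptions of this example): write each chunk to a temporary file in the same directory, then rename() it into place. A rename within one filesystem is atomic, so polling readers never see a half-written chunk.

```php
<?php
// Sketch of an atomic chunk writer: readers polling for "$prefix.$n" will
// only ever see complete files, because rename() within one filesystem is
// atomic.
function write_chunk_atomically($dir, $prefix, $n, $data) {
    $tmp = tempnam($dir, 'wip_');  // temp file in the SAME directory/filesystem
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $data) !== strlen($data)) {
        unlink($tmp);
        return false;
    }
    return rename($tmp, "$dir/$prefix.$n"); // atomic publish
}
```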
Equally, you could do this with a database, which should have slightly lower overhead for the polling and simplifies the garbage collection of old chunks.
But this is all about how to make your solution workable; it doesn't really address the problem you are trying to solve. If we knew what you were trying to achieve, we might be able to advise on a more appropriate solution, e.g. if it's a chat application, a video broadcast, or something else.
I suspect a more appropriate solution would be a multi-processing, single-memory-model server, and when we're talking about PHP (which doesn't really do threading very well) that means an event-based/asynchronous server. There's a bit more involved than simply calling socket_select(), but there are some good scripts available which do most of the complicated stuff for you.
Please look at the PHP code below. It is from a download script:
while (ob_get_level() > 0) {
    ob_end_clean();
}
set_time_limit(0);
ignore_user_abort(true);
$file = fopen(MATIN_FILE_PATH, "rb"); // the main file
$chunksize = 2 * 1024 * 1024;
while (!feof($file)) {
    echo @fread($file, $chunksize);
    flush();
    if (connection_status() == 1) { // if client aborted
        @fclose($file);
        exit;
    }
}
@fclose($file);
exit;
In this code, you see that I send 2 MB per chunk.
Imagine a client with a speed of 100 KB/s.
After a lot of debugging, I found that after the client downloads each 2 MB, the write happens and the while loop moves to the next iteration. So what is PHP doing during that time? Is it waiting for the user to finish downloading the 2 MB before sending the next 2 MB? If so, wouldn't it be better to send 10 MB or 50 MB per chunk?
Thanks for any detailed guide.
Imagine you have 10 simultaneous client requests downloading some file with this script, and you have set a 50 MB chunk size. For each request a new PHP process is invoked, each demanding 50 MB of your server's memory to process fread($file, 50*1024*1024). So you will have 500 MB of memory consumed.
If, as you suggested, a client's speed is 100 KB/s, then the probability of having 100 simultaneous connections is not so low, and 100 concurrent requests is already 5 GB of RAM. Do you have that much, and do you need to spend it all here?
You cannot make the user download the file faster than his actual connection allows, so the chunk size does not significantly matter here. Nor will reducing the number of loop iterations speed anything up: I have not tested this, but I suspect the loop executes far faster than the I/O to the remote client. So the only thing you should really be concerned with is making your server work reliably.
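To illustrate the point, here is a small sketch: streaming with a modest fixed-size buffer keeps per-request memory flat at roughly the chunk size, regardless of the file size (the 64 KB figure is just an illustrative choice, not a recommendation from the original answer):

```php
<?php
// Stream $src to $dst in small chunks; memory use stays near $chunksize
// no matter how large the file is, because only one chunk is ever held.
function stream_copy($src, $dst, $chunksize = 64 * 1024) {
    $in  = fopen($src, 'rb');
    $out = fopen($dst, 'wb');
    if ($in === false || $out === false) {
        return false;
    }
    $bytes = 0;
    while (!feof($in)) {
        $chunk = fread($in, $chunksize); // never holds more than one chunk
        $bytes += fwrite($out, $chunk);
    }
    fclose($in);
    fclose($out);
    return $bytes;
}
```

The same loop shape with `echo` and `flush()` instead of `fwrite()` is what the download scripts in this thread do; the destination is simply the client connection.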
Is it possible to use PHP readfile function on a remote file whose size is unknown and is increasing in size? Here is the scenario:
I'm developing a script which downloads a video from a third party website and simultaneously trans-codes the video into MP3 format. This MP3 is then transferred to the user via readfile.
The query used for the above process is like this:
wget -q -O- "VideoURLHere" | ffmpeg -i - "Output.mp3" > /dev/null 2>&1 &
So the file is fetched and encoded at the same time.
Now when the above process is in progress I begin sending the output mp3 to the user via readfile. The problem is that the encoding process takes some time and therefore depending on the users download speed readfile reaches an assumed EoF before the whole file is encoded, resulting in the user receiving partial content/incomplete files.
My first attempt to fix this was to apply a speed limit on the users download, but this is not foolproof as the encoding time and speed vary with load and this still led to partial downloads.
So is there a way to implement this system in such a way that I can serve the downloads simultaneously along with the encoding and also guarantee sending the complete file to the end user?
Any help is appreciated.
EDIT:
In response to Peter, I'm actually using fread (read: readfile_chunked):
<?php
function readfile_chunked($filename, $retbytes = true) {
    $chunksize = 1 * (1024 * 1024); // how many bytes per chunk
    $buffer = '';
    $cnt = 0;
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;
    }
    while (!feof($handle)) {
        //usleep(120000); //Used to impose an artificial speed limit
        $buffer = fread($handle, $chunksize);
        echo $buffer;
        ob_flush();
        flush();
        if ($retbytes) {
            $cnt += strlen($buffer);
        }
    }
    $status = fclose($handle);
    if ($retbytes && $status) {
        return $cnt; // return num. bytes delivered like readfile() does.
    }
    return $status;
}
readfile_chunked($linkToMp3);
?>
This still does not guarantee complete downloads, since depending on the user's download speed and the encoding speed, EOF may be reached prematurely.
Also, in response to theJeztah's comment: I'm trying to achieve this without making the user wait, so that's not an option.
Since you are dealing with streams, you probably should use stream handling functions :). passthru comes to mind, although this will only work if the download | transcode command is started in your script.
If it is started externally, take a look at stream_get_contents.
Libevent, as mentioned by Evert, seems like the general solution where you have to use a file as a buffer. However, in your case you could do it all inline in your script without using a file as a buffer:
<?php
header("Content-Type: audio/mpeg");
passthru("wget -q -O- http://localhost/test.avi | ffmpeg -i - -f mp3 -");
?>
I don't think there's any way of being notified about new data, short of something like inotify.
I suggest that if you hit EOF, you start polling the modification time of the file (using clearstatcache() between calls) every 200 ms or so. When you find the file size has increased, you can reopen the file, seek to the last position and continue.
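That polling loop might be sketched as follows (the function name, parameters, and the "give up after N polls" behaviour are this sketch's additions to the suggestion above):

```php
<?php
// Sketch: after hitting EOF, poll the file's size and, once it has grown,
// reopen the file, seek to where we left off, and return the new bytes.
function read_new_bytes($path, $lastPos, $maxPolls = 50, $intervalUs = 200000) {
    for ($i = 0; $i < $maxPolls; $i++) {
        clearstatcache();            // stat results are cached between calls
        $size = filesize($path);
        if ($size > $lastPos) {
            $h = fopen($path, 'rb');
            fseek($h, $lastPos);     // continue from the last position
            $data = fread($h, $size - $lastPos);
            fclose($h);
            return $data;
        }
        usleep($intervalUs);         // ~200 ms between polls by default
    }
    return null; // no growth observed: the encoder has probably finished
}
```

The serving loop would call this whenever `feof()` fires, echo whatever comes back, and treat `null` as the real end of the file.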
I can highly recommend using libevent for applications like this.
It works perfectly for cases like this.
The PHP documentation is a bit sparse for this, but you should be able to find more solid examples around the web.
All,
I have built a form, and when a user fills it in, my code can determine which (large) files need to be downloaded by that user. I display each file with a download button, and I want to show the status of the download next to the button (download in progress, cancelled/aborted, or completed). Because the files are large (200 MB+), I use a well-known function to read the file to be downloaded in chunks.
My idea was to log the reading of each (100 kbyte) chunk in a database, so I can keep track of the progress of the download.
My code is below:
function readfile_chunked($filename, $retbytes = TRUE)
{
    // Read the PersonID from the cookie
    $PersonID = $_COOKIE['PERSONID'];
    // Set up the database connection for logging download progress per file
    include "config.php";
    $SqlConnectionInfo = array("UID" => $SqlServerUser, "PWD" => $SqlServerPass, "Database" => $SqlServerDatabase);
    $SqlConnection = sqlsrv_connect($SqlServer, $SqlConnectionInfo);
    $ChunkSize = 100 * 1024; // 100 kbyte buffer
    $buffer = '';
    $cnt = 0;
    $handle = fopen($filename, 'rb');
    if ($handle === false)
    {
        return false;
    }
    $BytesSent = 0;
    while (!feof($handle))
    {
        $buffer = fread($handle, $ChunkSize);
        $BytesSent = $BytesSent + $ChunkSize;
        // update database
        $Query = "UPDATE [SoftwareDownload].[dbo].[DownloadProgress] SET LastTimeStamp = '" . time() . "' WHERE PersonID='$PersonID' AND FileName='$filename'";
        $QueryResult = sqlsrv_query($SqlConnection, $Query);
        echo $buffer;
        ob_flush();
        flush();
        if ($retbytes)
        {
            $cnt += strlen($buffer);
        }
    }
    $status = fclose($handle);
    if ($retbytes && $status)
    {
        return $cnt; // return num. bytes delivered like readfile() does.
    }
    return $status;
}
The weird thing is that my download runs smoothly at a constant rate, but my database gets updated in spikes: sometimes I see a couple dozen database updates (so a couple dozen 100-kilobyte blocks) within a second; the next moment I see no database updates (so no 100 k blocks being fed by PHP) for up to a minute (this timeout depends on the bandwidth available for the download). The fact that no progress is written for a minute makes my code believe the download was aborted.
In the meantime I see the memory usage of IIS increasing, so I get the impression that IIS buffers the chunks of data that PHP delivers.
My web server is running Windows 2008 R2 Datacenter (64-bit), IIS 7.5 and PHP 5.3.8 in FastCGI mode. Is there anything I can do (in my code, my PHP config or my IIS config) to prevent IIS from caching the data that PHP delivers, or is there any way to see in PHP whether the data generated was actually delivered to the downloading client?
Thanks in advance!
As no useful answers came up, I've created a rather dodgy workaround by using Apache strictly for the download of the file(s). The whole form, CSS etc. is still delivered through IIS; just the download links are served by Apache (a reverse proxy sits in between, so users don't have to fiddle with port numbers; the proxy takes care of this).
Apache doesn't cache PHP output, so the timestamps I get in my database are reliable.
I've asked a similar question once, and got very good results with the answers there.
Basically, you use an APC cache to cache a variable, then retrieve it on a different page with an Ajax call from the client.
Another possibility is to do the same with a database: write the progress to a database and have the client test for it (say, every second) using an Ajax call.
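As a sketch of the idea (using a JSON file as a stand-in for the APC cache or database table; the key scheme and field names are invented for this example): the download loop updates a progress record on every chunk, and the Ajax endpoint reads it back.

```php
<?php
// Sketch of a shared progress store the Ajax poller can read. A file is
// used here as a stand-in for APC (apc_store/apc_fetch) or a database row.
function progress_set($dir, $downloadId, $bytesSent, $totalBytes) {
    $record = json_encode(array(
        'sent'    => $bytesSent,
        'total'   => $totalBytes,
        'percent' => $totalBytes > 0 ? (int) round(100 * $bytesSent / $totalBytes) : 0,
    ));
    // Write then rename, so the poller never reads a half-written record.
    $tmp = tempnam($dir, 'prog_');
    file_put_contents($tmp, $record);
    rename($tmp, "$dir/progress_$downloadId.json");
}

function progress_get($dir, $downloadId) {
    $file = "$dir/progress_$downloadId.json";
    if (!is_file($file)) {
        return null; // download not started (or record already cleaned up)
    }
    return json_decode(file_get_contents($file), true);
}
```

The download script calls `progress_set()` inside its chunk loop; the Ajax endpoint calls `progress_get()` once per poll and returns the record as JSON.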
Alright, I know my question is not entirely specific, as the optimum fread chunk size is more a matter of trial and error. However, I was hoping some of you could shed some light on this.
This also involves server-related stuff, so I'm not sure if Stack Overflow is entirely the right place, but it did seem a better choice than Server Fault.
To begin with, I'm going to post two screenshots:
http://screensnapr.com/e/pnF1ik.png
http://screensnapr.com/e/z85FWG.png
Now I've got a script that uses PHP to stream files to the end user. It uses fopen and fread to stream the file. Most of these files are larger than 100 MB. My concern is that sometimes the above is what my server stats turn into. The two screens are from different servers; both are dedicated file-streaming boxes. Nothing else runs on them except PHP streaming files to the end user.
I'm confused by the fact that even when my servers are only transmitting an aggregate total of about 4 MB/sec to the end client(s), the disk reads run at 100 MB/s and over. This insane level of I/O eventually locks up my CPU because it waits for I/O and tasks pile up; eventually the server becomes completely unresponsive and requires a reboot.
My current fread chunk size is 8 * 1024. My question is: will experimenting with the block size help at all? The client is only downloading data at an average of ~4 MB/sec. So why is the disk reading data at 100 MB/sec? I've tried every possible solution on the server end; I even swapped the disks with new ones to rule out a potential disk issue. It looks to me like a script issue; maybe PHP is reading the entire file from disk regardless of how much it transfers to the end client?
Any help at all would be appreciated. And if this belongs to ServerFault, then my apologies for posting here. And if you guys need me to post snippets from the actual script, I can do that too.
8 * 1024 bytes? That seems perfectly reasonable, and if so your high disk I/O is probably related to concurrent requests. Have you considered implementing some sort of bandwidth throttling? Here is a PHP-only implementation I did for my framework, phunction:
public static function Download($path, $speed = null, $multipart = false)
{
if (strncmp('cli', PHP_SAPI, 3) !== 0)
{
if (is_file($path) === true)
{
while (ob_get_level() > 0)
{
ob_end_clean();
}
$file = #fopen($path, 'rb');
$size = sprintf('%u', filesize($path));
$speed = (empty($speed) === true) ? 1024 : floatval($speed);
if (is_resource($file) === true)
{
set_time_limit(0);
session_write_close();
if ($multipart === true)
{
$range = array(0, $size - 1);
if (array_key_exists('HTTP_RANGE', $_SERVER) === true)
{
$range = array_map('intval', explode('-', preg_replace('~.*=([^,]*).*~', '$1', $_SERVER['HTTP_RANGE'])));
if (empty($range[1]) === true)
{
$range[1] = $size - 1;
}
foreach ($range as $key => $value)
{
$range[$key] = max(0, min($value, $size - 1));
}
if (($range[0] > 0) || ($range[1] < ($size - 1)))
{
ph()->HTTP->Code(206, 'Partial Content');
}
}
header('Accept-Ranges: bytes');
header('Content-Range: bytes ' . sprintf('%u-%u/%u', $range[0], $range[1], $size));
}
else
{
$range = array(0, $size - 1);
}
header('Pragma: public');
header('Cache-Control: public, no-cache');
header('Content-Type: application/octet-stream');
header('Content-Length: ' . sprintf('%u', $range[1] - $range[0] + 1));
header('Content-Disposition: attachment; filename="' . basename($path) . '"');
header('Content-Transfer-Encoding: binary');
if ($range[0] > 0)
{
fseek($file, $range[0]);
}
while ((feof($file) !== true) && (connection_status() === CONNECTION_NORMAL))
{
ph()->HTTP->Flush(fread($file, round($speed * 1024)));
ph()->HTTP->Sleep(1);
}
fclose($file);
}
exit();
}
else
{
ph()->HTTP->Code(404, 'Not Found');
}
}
return false;
}
The above method has some minor dependencies and it adds some unnecessary functionality, like multi-part downloads, but you should be able to reuse the throttling logic without problems.
// serve file at 4 MBps (max)
Download('/path/to/file.ext', 4 * 1024);
You can even be more generous by default and decrease the $speed depending on the values you get from the first index of sys_getloadavg() to avoid stressing your CPU.
Generally, it can happen that actual I/O is faster than userspace I/O because of prefetching and filesystem overhead. However, that should never lock up your server. The chunk size will have little to no impact on that as long as it's between 1 KiB and, say, 16 MiB. However, instead of using PHP to stream files, you should really consider the much more optimized readfile.
That being said, barring a serious programming error, this behavior is probably not directly related to your small loop. First, you should use iotop to find out which program is actually causing the I/O. If it's PHP (how many concurrent scripts are running? Sorry, the screenshots seem completely garbled and show next to no useful information), rule out output buffering and have a look at memory consumption as well as the various PHP tuning parameters (phpinfo() gives a good overview). By the way, htop is a much nicer alternative to top ;).
Now I've got a script that uses PHP to stream files to the end user.
Just to clarify what's really going on: Apache is responsible for the actual "stream". PHP hands its output directly to Apache, so the end user of your PHP script is effectively Apache. Apache then handles delivery to the user, which in your case is apparently around ~4 MB/sec. Apache, however, has no such restriction: it can take all of your output at once and then handle a delayed delivery to the client. To prove this, you should be able to see your script exit before the stream is delivered. If your script then turns around and tries to deliver another file, you're queuing Apache up against your server's resources.
A better solution may be to let Apache handle the file delivery completely, by letting the user request the download from an accessible URL. Obviously this is limited to static content. To fix your script above, you would need to delay some of the file reads so Apache delivers the chunks instead of buffering the whole output.
EDIT: If your memory is fine and we can rule out swap activity, then it may simply be concurrent file-read requests. If 5 files are requested at 100 MB each, that's 500 MB of read activity. Apache will not throttle your script and will in fact buffer all of its output, which can be over 100 MB at a time. That would account for a lot of disk I/O activity, because each request results in reading the complete file into the buffer. Throttling, as Alix suggested, would allow for more concurrent requests, but eventually you're going to reach a limit. We can't be sure how fast the user receives the data from Apache, so you might have to find a nice balance for the throttle size so that Apache and PHP work with chunks of your files instead of the whole file.