Download a file from a server - php

At the current time I'm using this code to output a file to a user.
header('Content-type: application/octet-stream');
header('Content-Disposition: attachment; filename='.$or.'');
readfile($file);
The code, however, doesn't tell the browser how large the file is, and it can't output large files (e.g. 1 GB). I want the code to tell the browser the actual size of the file and to be able to output large files.

For large files, you need to use chunked transfer. In most cases the underlying web server (Apache/Nginx/whatever) will have facilities to do that, and I recommend you use them; a rough sketch follows below.
That way your code does not tie up a worker process for ages, and the runaway timer will not cut in in the middle of the download (which would upset your users).
By the way, you are talking about a file download, not an upload - an upload would be user to server - so your tag is wrong.
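For example, with Apache's mod_xsendfile or Nginx's X-Accel-Redirect, PHP only emits a header and the web server streams the file itself. A hedged sketch, assuming mod_xsendfile is installed; the path below is a hypothetical placeholder:
$file = '/var/files/big.bin';   // hypothetical local path
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . basename($file) . '"');
// Hand the actual transfer to the web server; the PHP worker is freed immediately.
header('X-Sendfile: ' . $file);
// On Nginx the equivalent is an internal location plus:
// header('X-Accel-Redirect: /protected/big.bin');
exit;
The web server takes care of chunking, ranges and the Content-Length header on its own.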

readfile() will not present any memory issues on its own, even when sending large files. If you encounter an out-of-memory error, ensure that output buffering is off with ob_get_level().

header("Content-Length: $value");


PHP: filesize() stat failed with link

I'm randomly getting download errors from a link on a page. I also simplified the link to a directory for easy usage in emails for users.
On the main page the link looks like this:
<a href="http://myPage.com/Mac" target="_blank" id="macDownloadButton" class="downloadbutton w-button">Download Mac version</a>
On my server, that's a directory with an index.php in it which looks like this:
<?php
// mac version
$file="http://www.myPage.com/downloads/myApp_Mac.zip";
$filename="myApp_Mac.zip";
header('Content-Transfer-Encoding: binary');
header('Accept-Ranges: bytes');
header('Content-Length: ' . filesize($file));
header('Content-Encoding: none');
header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename=' . $filename);
readfile($file);
exit;
?>
Again, the reason I do this is so it's a simple link to send to users in email like, "http://myPage.com/Mac" and "http://myPage.com/Windows".
The weird thing is that it mostly works...but sometimes it doesn't.
What am I doing wrong?
It's hard to know precisely what's wrong unless you check for errors on your readfile() call.
But you're invoking your web server from your web server here when you specify a filename starting with http. You're doing
readfile('http://www.myPage.com/downloads/myApp_Mac.zip');
where you could just as easily do
readfile('../downloads/myApp_Mac.zip');
and read the zip file from the local file system to send to your user.
What's more, filesize('../downloads/myApp_Mac.zip'); will quickly yield a numerical value that you can send in the Content-Length header. Knowing the total size of the file you're sending lets the browser display a meaningful progress bar.
You should remove the Accept-Ranges header; the php program you showed us doesn't honor range requests. If you lie to the browser by telling it you do honor those requests, the browser may get confused and corrupt the downloaded copy of your file. That will baffle your user.
Your Content-Disposition header is perfect. It defines the filename to be used on your user's machine in the downloads folder.
Simple operations are more reliable, and this may help you.
The reason you got stat failed with link as an error message is this: stat(2) is an operating-system call that operates on files in local and mounted file systems, not on http:// URLs, which is why filesize() failed on the remote path.
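Putting that together, a rough corrected version of the index.php above might look like this (the local ../downloads path is assumed; the Accept-Ranges and Content-Encoding headers are dropped as discussed):
<?php
// Read the zip from the local file system instead of going back through the web server.
$file     = '../downloads/myApp_Mac.zip';   // local path, not an http:// URL
$filename = 'myApp_Mac.zip';
header('Content-Transfer-Encoding: binary');
header('Content-Length: ' . filesize($file));   // stat() works on local files
header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename=' . $filename);
readfile($file);
exit;
?>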
As previously mentioned by O. Jones, you should definitely always use your local file path.
Most of my previous issues have been browser-related, where I needed to tweak or add an HTTP header, and in one case I needed to send all the HTTP headers in lowercase, but I haven't had an issue like that in years. My personal recommendation would be to use a solid download library/function - it will make a noticeable difference to your productivity as well as rule out most browser-related issues you may come across.
I have used the CodeIgniter download helper for the last 3 years and recommend it for 99% of use cases. At the very least I would recommend you read through its code - you will probably find a few cases you have never even considered, such as clearing the output buffer, MIME detection, and even a special case for Android 2.1, as well as a few headers you may or may not need.
If all else fails, I have no idea what server you're running this on, but if you continue to have issues I would recommend monitoring which processes your machine is running while paying close attention to RAM and I/O usage. I've encountered bad or misbehaving services that run periodically, using 99% of my I/O or RAM for short intervals at a time, and that caused a few really odd and unexpected errors.

PHP readfile() and large files

When using readfile() -- using PHP on Apache -- is the file immediately read into Apache's output buffer and the PHP script execution completed, or does the PHP script execution wait until the client finishes downloading the file (or the server times out, whichever happens first)?
The longer back-story:
I have a website with lots of large mp3 files (sermons for a local church). Not all files in the audio archive are allowed to be downloaded, so the /sermon/{filename}.mp3 path is rewritten to really execute /sermon.php?filename={filename} and if the file is allowed to be downloaded then the content type is set to "audio/mpeg" and the file streamed out using readfile(). I've been getting complaints (almost exclusively from iPhone users who are streaming the downloads over 3G) that the files don't fully download, or that they cut off after about 10 or 15 minutes. When I switched from streaming out the file with a readfile() to simply redirecting to the file -- header("Location: $file_url"); -- all of the complaints went away (I even checked with a few users who could reliably reproduce the problem on demand previously).
This leads me to suspect that when using readfile() the PHP script engine is in use until the file is fully downloaded but I cannot find any references which confirm or deny this theory. I'll admit I'm more at home in the ASP.NET world and the dotNet equivalent of readfile() pushes the whole file to the IIS output buffer immediately so the ASP.NET execution pipeline can complete independently of the delivery of the file to the end client... is there an equivalent to this behavior with PHP+Apache?
You may still have PHP output buffering active while performing the readfile(). Check that with:
if (ob_get_level()) ob_end_clean();
or
while (ob_get_level()) ob_end_clean();
This way the only remaining output buffer should be Apache's output buffer; see SendBufferSize for Apache tweaks.
EDIT
You can also have a look at mod_xsendfile (an SO post on such usage, PHP + apache + x-sendfile), so that you simply tell the web server you have done the security check and that now he can deliver the file.
A few things you can do (I am not listing all the headers you need to send; they are probably the same ones you currently have in your script):
set_time_limit(0); //as already mentioned
readfile($filename);
exit(0);
or
passthru('/bin/cat '.$filename);
exit(0);
or
//When you enable mod_xsendfile in Apache
header("X-Sendfile: $filename");
or
//mainly useful for remote files
$handle = fopen($filename, "rb");
echo stream_get_contents($handle);
fclose($handle);
or
$handle = fopen($filename, "rb");
while (!feof($handle)) {
    // check whether the user is still downloading
    // or has closed the connection
    if (connection_aborted()) {
        break;
    }
    echo fread($handle, 8192);
    flush();
}
fclose($handle);
The script will be running until the user finishes downloading the file. The simplest, most efficient and surely working solution is to redirect the user:
header("Location: /real/path/to/file");
exit;
But this may reveal the location of the files. It's a good idea to password-protect files that may not be downloaded by everyone with an .htaccess file anyway, but perhaps you use a database to determine access and then this is not an option.
Another possible solution is setting the maximum execution time of PHP to 0, which disables the limit:
set_time_limit(0);
Your host may disallow this, though. Also PHP reads the file into the memory first, then goes through Apache's output buffer, and finally makes it to the network. Making users download the file directly is much more efficient, and does not have PHP's limitations like the maximum execution time.
Edit: The reason you get this complaint a lot from iPhone users is probably that they have a slower connection (e.g. 3G).
Downloading files through PHP isn't very efficient; using a redirect is the way to go. If you don't want to expose the location of the file, or the file isn't in a public location, then look into internal redirects. Here is a post that talks about it a bit: Can I tell Apache to do an internal redirect from PHP?
Try using stream_copy_to_stream() instead. I find it has fewer problems than readfile().
set_time_limit(0);
$stdout = fopen('php://output', 'w');
$bfname = basename($fname);
header("Content-type: application/octet-stream");
header("Content-Disposition: attachment; filename=\"$bfname\"");
$filein = fopen($fname, 'r');
stream_copy_to_stream($filein, $stdout);
fclose($filein);
fclose($stdout);
Under Apache, there is a nice, elegant solution not involving PHP at all:
Just place an .htaccess config file into the folder containing the files to be offered for download with the following contents:
<Files *.*>
ForceType application/octet-stream
</Files>
This tells Apache to offer all files in this folder (and all its subfolders) for download, instead of displaying them directly in the browser.
See the example from the PHP manual:
http://php.net/manual/en/function.readfile.php
<?php
$file = 'monkey.gif';
if (file_exists($file)) {
header('Content-Description: File Transfer');
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename='.basename($file));
header('Content-Transfer-Encoding: binary');
header('Expires: 0');
header('Cache-Control: must-revalidate');
header('Pragma: public');
header('Content-Length: ' . filesize($file));
ob_clean();
flush();
readfile($file);
exit;
}
?>

simultaneously download on server and on user

I am currently developing an application in PHP in which my server (a dedicated server) must download a file, and the user should download the same file at the same time.
Here is an example :
Server start to download a file at a time A.
User wants to download this file at the time A + 3 seconds (for example)
I already solved the problem of the user downloading the file faster than the server. But I don't know how to write a PHP script in which the user downloads the full file (meaning the declared size must be the full size of the file, not the portion that has been downloaded at time A + 3 seconds). I already have this:
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="'.$data['name'].'";');
header('Content-Transfer-Encoding: binary');
header('Content-Length: '.$data['size']);
readfile($remoteFile);
But it doesn't work: the user only downloads the portion that is currently on the server (which corrupts the file), not the full file...
If you have any solution, thank you.
You could probably pipe the file manually, by opening the connection and reading until you're past all headers. Then once you've figured out the Content-Length, send that to the user and just echo all remaining data you get (do use flush() and avoid output buffers).
Pseudocode(-ish):
open the file
# grab headers
while you didn't get all HTTP headers:
    read more
look for the Content-Length header
send the Content-Length header
# grab the file
while the rest of the request isn't done:
    read more
    send it to the user
    flush the buffers
done
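A rough PHP sketch of that idea, assuming the source is reachable over HTTP (so fopen() populates $http_response_header, and allow_url_fopen is enabled); $remoteUrl and $localName are placeholder names:
$remoteUrl = 'http://example.com/files/archive.zip';    // placeholder source URL
$localName = 'archive.zip';                             // placeholder download name

$src = fopen($remoteUrl, 'rb');
if ($src === false) {
    http_response_code(502);
    exit('Unable to open remote file');
}

// PHP has already consumed the HTTP headers; find the Content-Length among them.
$length = null;
foreach ($http_response_header as $h) {
    if (stripos($h, 'Content-Length:') === 0) {
        $length = trim(substr($h, 15));
    }
}

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . $localName . '"');
if ($length !== null) {
    header('Content-Length: ' . $length);                // full size, known up front
}

// Pass the body through to the user as it arrives.
while (!feof($src)) {
    echo fread($src, 8192);
    flush();
}
fclose($src);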
Expanding on @Tom's answer, you can use cURL to greatly simplify the algorithm by using the CURLOPT_HEADERFUNCTION and CURLOPT_WRITEFUNCTION callbacks - see curl_setopt().
Don't send the Content-Length header. It's not required as long as you're using HTTP 1.1 (which your web server almost certainly is). The drawback is that the browser can't show the download size or time remaining.

Sporadic error downloading data from PHP application

A PHP application is offering binary data as a download:
header("Content-Type: application/octet-stream");
header("Pragma: public");
header("Cache-Control: private");
header("Content-Disposition: attachment; filename=\"$filename\"");
header("expires: 0");
set_time_limit(0);
ob_clean();
flush();
@readfile($completefilename); exit;
$completefilename is a stream like "ftp://user:pwd@..."
The size of the data can be several MByte. It works fine, but sporadically I get the following error:
It's most likely that the remote stream is occasionally down, or times out.
Also, as @fab says, it could be that the file you are trying to load is larger than your script's memory limit.
You should start logging the errors readfile() returns, e.g. using the error_log php.ini directive.
If this needs to be completely foolproof, I think you'll have to use something more refined than readfile() that allows you to set a timeout (like cURL, or readfile() with stream context options).
You could then catch any errors that occur while downloading, and serve a locally hosted fallback document instead. That document could e.g. be a text file containing the message "Resource xyz could not be loaded".
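For instance, a hedged sketch using cURL with explicit timeouts and a locally hosted fallback document; $remote and $fallback are placeholders, and the timeout values are only examples:
$remote   = 'ftp://user:password@example.com/data.bin';  // placeholder remote stream
$fallback = __DIR__ . '/fallback.txt';                   // local "could not load" notice

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="data.bin"');

$ch = curl_init($remote);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // give up if no connection within 10 s
curl_setopt($ch, CURLOPT_TIMEOUT, 300);         // abort transfers that take longer than 5 min
// Without CURLOPT_RETURNTRANSFER, curl_exec() writes the body straight to the
// output stream, so the whole file is never held in memory.
$ok = curl_exec($ch);

if ($ok === false) {
    error_log('Download failed: ' . curl_error($ch));
    // Only useful if nothing has been sent yet; otherwise the client
    // has already received a partial (and broken) file.
    readfile($fallback);
}
curl_close($ch);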
Do you have anything in your error logs?
Maybe PHP is running out of memory because readfile() needs to pull the whole file into memory. Make sure memory_limit is larger than the largest file you work on with readfile(). Another option is to output the file in chunks using fread().
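A minimal sketch of the chunked approach, reusing $completefilename from the question:
$handle = fopen($completefilename, 'rb');
if ($handle === false) {
    error_log('Could not open ' . $completefilename);
    exit;
}
while (!feof($handle)) {
    echo fread($handle, 8192);   // send 8 KB at a time instead of the whole file
    flush();
}
fclose($handle);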

LAMP: How to create .Zip of large files for the user on the fly, without disk/CPU thrashing

Often a web service needs to zip up several large files for download by the client. The most obvious way to do this is to create a temporary zip file, then either echo it to the user or save it to disk and redirect (deleting it some time in the future).
However, doing things that way has drawbacks:
an initial phase of intensive CPU and disk thrashing, resulting in...
a considerable initial delay to the user while the archive is prepared
very high memory footprint per request
use of substantial temporary disk space
if the user cancels the download half way through, all resources used in the initial phase (CPU, memory, disk) will have been wasted
Solutions like ZipStream-PHP improve on this by shovelling the data into Apache file by file. However, the result is still high memory usage (files are loaded entirely into memory), and large, thrashy spikes in disk and CPU usage.
In contrast, consider the following bash snippet:
ls -1 | zip -# - | cat > file.zip
# Note -# is not supported on MacOS
Here, zip operates in streaming mode, resulting in a low memory footprint. A pipe has an integral buffer – when the buffer is full, the OS suspends the writing program (the program on the left of the pipe). This ensures that zip works only as fast as its output can be written by cat.
The optimal way, then, would be to do the same: replace cat with a web server process, streaming the zip file to the user with it created on the fly. This would create little overhead compared to just streaming the files, and would have an unproblematic, non-spiky resource profile.
How can you achieve this on a LAMP stack?
You can use popen() (docs) or proc_open() (docs) to execute a unix command (eg. zip or gzip), and get back stdout as a php stream. flush() (docs) will do its very best to push the contents of php's output buffer to the browser.
Combining all of this will give you what you want (provided that nothing else gets in the way -- see esp. the caveats on the docs page for flush()).
(Note: don't use flush(). See the update below for details.)
Something like the following can do the trick:
<?php
// make sure to send all headers first
// Content-Type is the most important one (probably)
//
header('Content-Type: application/x-gzip');
// use popen to execute a unix command pipeline
// and grab the stdout as a php stream
// (you can use proc_open instead if you need to
// control the input of the pipeline too)
//
$fp = popen('tar cf - file1 file2 file3 | gzip -c', 'r');
// pick a bufsize that makes you happy (64k may be a bit too big).
$bufsize = 65535;
$buff = '';
while( !feof($fp) ) {
$buff = fread($fp, $bufsize);
echo $buff;
}
pclose($fp);
You asked about "other technologies": to which I'll say, "anything that supports non-blocking i/o for the entire lifecycle of the request". You could build such a component as a stand-alone server in Java or C/C++ (or any of many other available languages), if you were willing to get into the "down and dirty" of non-blocking file access and whatnot.
If you want a non-blocking implementation, but you would rather avoid the "down and dirty", the easiest path (IMHO) would be to use nodeJS. There is plenty of support for all the features you need in the existing release of nodejs: use the http module (of course) for the http server; and use child_process module to spawn the tar/zip/whatever pipeline.
Finally, if (and only if) you're running a multi-processor (or multi-core) server, and you want the most from nodejs, you can use Spark2 to run multiple instances on the same port. Don't run more than one nodejs instance per-processor-core.
Update (from Benji's excellent feedback in the comments section on this answer)
1. The docs for fread() indicate that the function will read only up to 8192 bytes of data at a time from anything that is not a regular file. Therefore, 8192 may be a good choice of buffer size.
[editorial note] 8192 is almost certainly a platform dependent value -- on most platforms, fread() will read data until the operating system's internal buffer is empty, at which point it will return, allowing the os to fill the buffer again asynchronously. 8192 is the size of the default buffer on many popular operating systems.
There are other circumstances that can cause fread to return even less than 8192 bytes -- for example, the "remote" client (or process) is slow to fill the buffer - in most cases, fread() will return the contents of the input buffer as-is without waiting for it to get full. This could mean anywhere from 0..os_buffer_size bytes get returned.
The moral is: the value you pass to fread() as buffsize should be considered a "maximum" size -- never assume that you've received the number of bytes you asked for (or any other number for that matter).
2. According to comments on fread docs, a few caveats: magic quotes may interfere and must be turned off.
3. Setting mb_http_output('pass') (docs) may be a good idea. Though 'pass' is already the default setting, you may need to specify it explicitly if your code or config has previously changed it to something else.
4. If you're creating a zip (as opposed to gzip), you'd want to use the content type header:
Content-type: application/zip
or... 'application/octet-stream' can be used instead. (it's a generic content type used for binary downloads of all different kinds):
Content-type: application/octet-stream
and if you want the user to be prompted to download and save the file to disk (rather than potentially having the browser try to display the file as text), then you'll need the content-disposition header. (where filename indicates the name that should be suggested in the save dialog):
Content-disposition: attachment; filename="file.zip"
One should also send the Content-length header, but this is hard with this technique as you don’t know the zip’s exact size in advance. Is there a header that can be set to indicate that the content is "streaming" or is of unknown length? Does anybody know?
Finally, here's a revised example that uses all of #Benji's suggestions (and that creates a ZIP file instead of a TAR.GZIP file):
<?php
// make sure to send all headers first
// Content-Type is the most important one (probably)
//
header('Content-Type: application/octet-stream');
header('Content-disposition: attachment; filename="file.zip"');
// use popen to execute a unix command pipeline
// and grab the stdout as a php stream
// (you can use proc_open instead if you need to
// control the input of the pipeline too)
//
$fp = popen('zip -r - file1 file2 file3', 'r');
// pick a bufsize that makes you happy (8192 has been suggested).
$bufsize = 8192;
$buff = '';
while( !feof($fp) ) {
$buff = fread($fp, $bufsize);
echo $buff;
}
pclose($fp);
Update: (2012-11-23) I have discovered that calling flush() within the read/echo loop can cause problems when working with very large files and/or very slow networks. At least, this is true when running PHP as cgi/fastcgi behind Apache, and it seems likely that the same problem would occur when running in other configurations too. The problem appears to result when PHP flushes output to Apache faster than Apache can actually send it over the socket. For very large files (or slow connections), this eventually causes an overrun of Apache's internal output buffer. This causes Apache to kill the PHP process, which of course causes the download to hang, or complete prematurely, with only a partial transfer having taken place.
The solution is not to call flush() at all. I have updated the code examples above to reflect this, and I placed a note in the text at the top of the answer.
Another solution is my mod_zip module for Nginx, written specifically for this purpose:
https://github.com/evanmiller/mod_zip
It is extremely lightweight and does not invoke a separate "zip" process or communicate via pipes. You simply point to a script that lists the locations of files to be included, and mod_zip does the rest.
Trying to implement a dynamically generated download with lots of files of different sizes, I came across this solution but ran into various memory errors like "Allowed memory size of 134217728 bytes exhausted at ...".
After adding ob_flush(); right before flush();, the memory errors disappeared.
Together with sending the headers, my final solution looks like this (Just storing the files inside the zip without directory structure):
<?php
// Sending headers
header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename="download.zip"');
header('Content-Transfer-Encoding: binary');
ob_clean();
flush();
// On the fly zip creation
$fp = popen('zip -0 -j -q -r - file1 file2 file3', 'r');
while (!feof($fp)) {
echo fread($fp, 8192);
ob_flush();
flush();
}
pclose($fp);
I wrote this S3 streaming file zipper microservice last weekend - might be useful: http://engineroom.teamwork.com/how-to-securely-provide-a-zip-download-of-a-s3-file-bundle/
According to the PHP manual, the ZIP extension provides a zip: wrapper.
I have never used it and I don't know its internals, but logically it should be able to do what you're looking for, assuming that ZIP archives can be streamed, which I'm not entirely sure of.
As for your question about the "LAMP stack" it shouldn't be a problem as long as PHP is not configured to buffer output.
Edit: I'm trying to put a proof-of-concept together, but it seems non-trivial. If you're not experienced with PHP's streams, it might prove too complicated, if it's even possible.
Edit(2): rereading your question after taking a look at ZipStream, I found what's going to be your main problem here when you say (emphasis added)
the operative Zipping should operate in streaming mode, ie processing files and providing data at the rate of the download.
That part will be extremely hard to implement because I don't think PHP provides a way to determine how full Apache's buffer is. So, the answer to your question is no, you probably won't be able to do that in PHP.
It seems you can eliminate any output-buffer-related problems by using fpassthru(). I also use -0 to save CPU time since my data is compact already. I use this code to serve a whole folder, zipped on the fly:
chdir($folder);
$fp = popen('zip -0 -r - .', 'r');
header('Content-Type: application/octet-stream');
header('Content-disposition: attachment; filename="'.basename($folder).'.zip"');
fpassthru($fp);
I just released a ZipStreamWriter class written in pure PHP userland here:
https://github.com/cubiclesoft/php-zipstreamwriter
Instead of using external applications (e.g. zip) or extensions like ZipArchive, it supports streaming data into and out of the class by implementing a full-blown ZIP writer.
How the streaming aspect works is by using the ZIP file format's "Data Descriptors" as described by section 4.3.5 of the PKWARE ZIP file specification:
4.3.5 File data MAY be followed by a "data descriptor" for the file.
Data descriptors are used to facilitate ZIP file streaming.
There are some possible limitations to be aware of, though. Not every tool can read streaming ZIP files. Also, Zip64 streaming ZIP files may have even less support, but that's only a concern for files over 2GB with this class. However, both 7-Zip and the Windows 10 built-in ZIP file reader seem to be fine handling all of the crazy files that the ZipStreamWriter class threw at them. The hex editor I use got a good workout too.
When using the ZipStreamWriter class, I recommend allowing a buffer to build up to at least 4KB but no more than 65KB at a time before sending it on to the web server. Otherwise, for lots of really tiny files, you'll be flushing out tiny bits of piecemeal data and waste a bunch of extra CPU cycles on the Apache callback end of things.
When something doesn't exist or I don't like the existing options, I find both official and unofficial specifications, some examples to work with, and then I build it from scratch. It's a fairly solid approach to problem solving, if just a tad overkill.
