I'm using passthru("cat filepath") in my download script. My concern is that it might use a lot of server resources.
What is the difference between directly linking to a file in a public directory and downloading a file using passthru("cat filepath") in PHP?
The difference is that linking directly to a file does not invoke PHP, while running a PHP script which in turn runs cat causes, well, both PHP and cat to be invoked. This will take up a moderate amount of extra memory, but won't cause server load under most circumstances.
I was using readfile(), but this function can't be used for files larger than 2 GB.
You might want to find a better solution than passing all of the file contents through PHP, in that case. Look into X-Sendfile support in your web server software of choice.
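For example, with Apache plus mod_xsendfile (or nginx's X-Accel-Redirect mechanism), the PHP script only does the permission check and sets a header, and the server streams the file. This is a minimal sketch; the header names depend on your server module, and the paths are hypothetical:
// Sketch: offload the actual transfer to the web server (requires mod_xsendfile or similar)
$filepath = '/var/files/protected/big-download.iso'; // hypothetical path
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="big-download.iso"');
// Apache + mod_xsendfile:
header('X-Sendfile: ' . $filepath);
// nginx equivalent (must be an internal location, not a filesystem path):
// header('X-Accel-Redirect: /protected/big-download.iso');
exit;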
Don't use passthru() for that: you're opening yourself up to shell command injection, and the performance is terrible. readfile() exists for exactly this purpose.
readfile($filepath);
There is a small overhead when passing a file through PHP compared to a direct link, but we are usually talking about milliseconds. However, the browser will not be able to request a 206 Partial Content response when using readfile() unless you code support for it yourself or use something like PEAR::HTTP_Download.
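For illustration, a rough sketch of what minimal Range support might look like when serving the file yourself; it handles only a single "bytes=start-end" range, assumes $filepath has already been validated, and is not production code:
// Minimal single-range sketch (illustrative only)
$size = filesize($filepath);
$start = 0;
$end = $size - 1;
if (isset($_SERVER['HTTP_RANGE']) && preg_match('/bytes=(\d+)-(\d*)/', $_SERVER['HTTP_RANGE'], $m)) {
    $start = (int)$m[1];
    if ($m[2] !== '') {
        $end = (int)$m[2];
    }
    header('HTTP/1.1 206 Partial Content');
    header("Content-Range: bytes $start-$end/$size");
}
header('Accept-Ranges: bytes');
header('Content-Length: ' . ($end - $start + 1));
$fp = fopen($filepath, 'rb');
fseek($fp, $start);
$remaining = $end - $start + 1;
while ($remaining > 0 && !feof($fp)) {
    $chunk = fread($fp, min(8192, $remaining));
    echo $chunk;
    $remaining -= strlen($chunk);
}
fclose($fp);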
EDIT: It seems you are using passthru() because readfile() apparently doesn't handle >2GB files properly (I have never had that problem with readfile(); in fact I just tested it with a 7.2 GB file and it worked fine). In that case, at least escape your parameters:
function readfile_ext($filepath) {
    if (!file_exists($filepath)) {
        return false;
    }
    // escapeshellarg() quotes the path so it cannot break out of the shell command
    passthru('cat ' . escapeshellarg($filepath));
    return true;
}
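A short usage sketch (the header values are only an example; adjust them to the file being served):
// Example usage of readfile_ext()
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . basename($filepath) . '"');
if (!readfile_ext($filepath)) {
    http_response_code(404);
}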
Instead of passthru('cat filepath'), use the PHP native readfile('filepath'), which has better performance.
Both methods will be slower than simply directly linking to the file though, since PHP has a certain overhead.
In a PHP script intended to work on generic shared LAMP hosting, I am using require() as a function to read data from a file. The data file looks like this:
<?php
# storage.php
return [
    // Rotating secrets
    'last_rand' => '532e89355b78aafdb85f5f01f0eed20440d6bd9e0a2d6ae1bd17be4e1d7d21c7bb7a822a2077e3f4',
];
And I read it like this:
$data = require($root . '/storage.php');
$lastRand = $data['last_rand'];
In my script, I will read information from a configuration file using this approach. In some cases, the operation will re-write a new config file (using file_put_contents()) and then re-read it, again using require().
This has worked solidly so far, on a variety of LAMP hosts, but today I found a web host where the require() does not seem to notice changes in the file. This host is using PHP 5.6, whereas all the others I have tested are using 7+.
What's extremely odd is that require() gets the old contents of the file even though file_get_contents() can see the new version, as if require() is doing its own internal caching.
Failed fix attempts
I have even tried waiting for require() to catch up, in case some underlying cache needed time to expire:
$ok = true;
$t = microtime(true);
while ($oldRandom != $this->getFileService()->requirePhp($this->getStoragePath())['last_rand'])
{
    $elapsedTime = microtime(true) - $t;
    if ($elapsedTime > 4) // Wait for up to 4 seconds
    {
        $ok = false;
        break;
    }
    usleep(1000);
}
That did not work either (the timeout just expires with no change in the results brought back from require()).
However, upon the next run of the PHP script, require() suddenly sees the new file contents.
I've also tried to delete the storage.php file before re-reading it, and that has no effect either! I was really expecting that to help.
Future actions
So, I don't understand this behaviour at all. To work around it I could:
rather than modifying the config, saving it to a file and then reloading it from that file, I could refactor my code so that I keep the new config in memory, thus avoiding having to reload it
refactor so that I use file_get_contents() rather than require() (for rather dull reasons it's not as convenient, but I'll prefer whatever works reliably); a rough sketch of this follows below.
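For the second option, a minimal sketch of the file_get_contents() route, assuming the data can be stored as JSON rather than a PHP file (the storage.json path and $newRand variable here are just placeholders); since the data never goes through require(), the opcode cache is not involved at all:
// Hypothetical storage.json alongside the existing storage.php
$path = $root . '/storage.json';
// write the new secret
file_put_contents($path, json_encode(['last_rand' => $newRand]));
// read it back (never touches the opcode cache)
$data = json_decode(file_get_contents($path), true);
$lastRand = $data['last_rand'];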
However, these both feel like I am ignoring a bug, and that I should investigate the cause. I happen to know also that this host (a free one) is nearly always under heavy CPU load. It's a 64-bit server running Linux on the rather old 2.6 kernel, and phpinfo() indicates the CGI/FastCGI server API is in use.
Can anything be done to mitigate this behaviour?
More attempts
Some helpful commenters suggest this problem could be related to opcaching, which seems plausible given that the problem is specific to require(). I've added this code after the file_put_contents() that re-writes the config:
$reset = opcache_reset(); // Returns true
$invalid = opcache_invalidate($this->getStoragePath()); // Returns false
However, it makes no difference - require() stubbornly reads the same value. I have confirmed this by doing a require() before and after the file write, and also a file_get_contents() to get the true contents.
When I originally encountered this error, I was hesitant to work around it, since it felt like a critical bug that ought not be ignored. However, now that the culprit is likely to be opcaching (and thus related to require() specifically), I think working around it by keeping values in memory is not a bad solution. I shall have to remember that require() is only good for one call per HTTP request, even if that does not hold true for all PHP installations.
I am conscious also that if I did try to work around this, I would have to contend with a number of opcache invalidation mechanisms, which is more complexity than I am comfortable with.
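For reference, if someone does want to pursue invalidation rather than a workaround, my understanding is that opcache_invalidate() accepts a second $force parameter, and that timestamp-based revalidation is governed by opcache.validate_timestamps and opcache.revalidate_freq. A hedged sketch of what I would try ($newConfigPhp is a placeholder for the rewritten file contents):
// Assumption: force invalidation immediately after rewriting the file
file_put_contents($path, $newConfigPhp);
if (function_exists('opcache_invalidate')) {
    opcache_invalidate($path, true); // second argument forces invalidation
}
clearstatcache(true, $path);
$data = require $path;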
I ran into the same problem with a config file; the only reliable solution I have found so far is to change the name of the config file on each modification:
1 - load config file
// get the current config file name
$config_file = glob('config/config*.php')[0] ?? null;
// load the settings
if ($config_file) $settings = require_once($config_file);
2 - save the config file when needed
...
$data = ['var' => 'value', 'other-var' => 'other value'];
$vars = var_export($data, true);
// new file name based on the current timestamp
$file_name = 'config/config-' . time() . '.php';
file_put_contents($file_name, "<?php\n return $vars;");
// delete the old config file (@ suppresses the error if it is already gone)
if ($config_file) @unlink($config_file);
Often a web service needs to zip up several large files for download by the client. The most obvious way to do this is to create a temporary zip file, then either echo it to the user or save it to disk and redirect (deleting it some time in the future).
However, doing things that way has drawbacks:
an initial phase of intensive CPU and disk thrashing, resulting in...
a considerable initial delay to the user while the archive is prepared
very high memory footprint per request
use of substantial temporary disk space
if the user cancels the download half way through, all resources used in the initial phase (CPU, memory, disk) will have been wasted
Solutions like ZipStream-PHP improve on this by shovelling the data into Apache file by file. However, the result is still high memory usage (files are loaded entirely into memory), and large, thrashy spikes in disk and CPU usage.
In contrast, consider the following bash snippet:
ls -1 | zip -# - | cat > file.zip
# Note -# is not supported on MacOS
Here, zip operates in streaming mode, resulting in a low memory footprint. A pipe has an integral buffer: when the buffer is full, the OS suspends the writing program (the program on the left of the pipe). This ensures that zip works only as fast as its output can be written by cat.
The optimal way, then, would be to do the same: replace cat with a web server process, streaming the zip file to the user with it created on the fly. This would create little overhead compared to just streaming the files, and would have an unproblematic, non-spiky resource profile.
How can you achieve this on a LAMP stack?
You can use popen() (docs) or proc_open() (docs) to execute a unix command (e.g. zip or gzip) and get back stdout as a PHP stream. flush() (docs) will do its very best to push the contents of PHP's output buffer to the browser.
Combining all of this will give you what you want (provided that nothing else gets in the way -- see esp. the caveats on the docs page for flush()).
(Note: don't use flush(). See the update below for details.)
Something like the following can do the trick:
<?php
// make sure to send all headers first
// Content-Type is the most important one (probably)
//
header('Content-Type: application/x-gzip');
// use popen to execute a unix command pipeline
// and grab the stdout as a php stream
// (you can use proc_open instead if you need to
// control the input of the pipeline too)
//
$fp = popen('tar cf - file1 file2 file3 | gzip -c', 'r');
// pick a bufsize that makes you happy (64k may be a bit too big).
$bufsize = 65535;
$buff = '';
while (!feof($fp)) {
    $buff = fread($fp, $bufsize);
    echo $buff;
}
pclose($fp);
You asked about "other technologies": to which I'll say, "anything that supports non-blocking i/o for the entire lifecycle of the request". You could build such a component as a stand-alone server in Java or C/C++ (or any of many other available languages), if you were willing to get into the "down and dirty" of non-blocking file access and whatnot.
If you want a non-blocking implementation, but you would rather avoid the "down and dirty", the easiest path (IMHO) would be to use nodeJS. There is plenty of support for all the features you need in the existing release of nodejs: use the http module (of course) for the http server; and use child_process module to spawn the tar/zip/whatever pipeline.
Finally, if (and only if) you're running a multi-processor (or multi-core) server, and you want the most from nodejs, you can use Spark2 to run multiple instances on the same port. Don't run more than one nodejs instance per-processor-core.
Update (from Benji's excellent feedback in the comments section on this answer)
1. The docs for fread() indicate that the function will read only up to 8192 bytes of data at a time from anything that is not a regular file. Therefore, 8192 may be a good choice of buffer size.
[editorial note] 8192 is almost certainly a platform dependent value -- on most platforms, fread() will read data until the operating system's internal buffer is empty, at which point it will return, allowing the os to fill the buffer again asynchronously. 8192 is the size of the default buffer on many popular operating systems.
There are other circumstances that can cause fread to return even less than 8192 bytes -- for example, the "remote" client (or process) is slow to fill the buffer - in most cases, fread() will return the contents of the input buffer as-is without waiting for it to get full. This could mean anywhere from 0..os_buffer_size bytes get returned.
The moral is: the value you pass to fread() as buffsize should be considered a "maximum" size -- never assume that you've received the number of bytes you asked for (or any other number for that matter).
2. According to comments on fread docs, a few caveats: magic quotes may interfere and must be turned off.
3. Setting mb_http_output('pass') (docs) may be a good idea. Though 'pass' is already the default setting, you may need to specify it explicitly if your code or config has previously changed it to something else.
4. If you're creating a zip (as opposed to gzip), you'd want to use the content type header:
Content-type: application/zip
or... 'application/octet-stream' can be used instead. (it's a generic content type used for binary downloads of all different kinds):
Content-type: application/octet-stream
and if you want the user to be prompted to download and save the file to disk (rather than potentially having the browser try to display the file as text), then you'll need the content-disposition header. (where filename indicates the name that should be suggested in the save dialog):
Content-disposition: attachment; filename="file.zip"
One should also send the Content-length header, but this is hard with this technique as you don’t know the zip’s exact size in advance. Is there a header that can be set to indicate that the content is "streaming" or is of unknown length? Does anybody know?
Finally, here's a revised example that uses all of #Benji's suggestions (and that creates a ZIP file instead of a TAR.GZIP file):
<?php
// make sure to send all headers first
// Content-Type is the most important one (probably)
//
header('Content-Type: application/octet-stream');
header('Content-disposition: attachment; filename="file.zip"');
// use popen to execute a unix command pipeline
// and grab the stdout as a php stream
// (you can use proc_open instead if you need to
// control the input of the pipeline too)
//
$fp = popen('zip -r - file1 file2 file3', 'r');
// pick a bufsize that makes you happy (8192 has been suggested).
$bufsize = 8192;
$buff = '';
while (!feof($fp)) {
    $buff = fread($fp, $bufsize);
    echo $buff;
}
pclose($fp);
Update: (2012-11-23) I have discovered that calling flush() within the read/echo loop can cause problems when working with very large files and/or very slow networks. At least, this is true when running PHP as cgi/fastcgi behind Apache, and it seems likely that the same problem would occur in other configurations too. The problem appears to arise when PHP flushes output to Apache faster than Apache can actually send it over the socket. For very large files (or slow connections), this eventually causes an overrun of Apache's internal output buffer, which makes Apache kill the PHP process; that, of course, causes the download to either hang or complete prematurely, with only a partial transfer having taken place.
The solution is not to call flush() at all. I have updated the code examples above to reflect this, and I placed a note in the text at the top of the answer.
Another solution is my mod_zip module for Nginx, written specifically for this purpose:
https://github.com/evanmiller/mod_zip
It is extremely lightweight and does not invoke a separate "zip" process or communicate via pipes. You simply point to a script that lists the locations of files to be included, and mod_zip does the rest.
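For a rough idea of how that looks from PHP (based on my reading of the mod_zip README; the exact file-list format and the internal locations below are assumptions you should check against the project docs), the script just emits a header and a list of files, and nginx assembles and streams the ZIP:
// Sketch: PHP only lists the files; nginx/mod_zip builds and streams the archive
header('X-Archive-Files: zip');
header('Content-Disposition: attachment; filename="download.zip"');
// one line per file: <crc-32 or -> <size> <internal location> <name inside the zip>
echo "- 428 /protected/file1.txt file1.txt\n";
echo "- 100339 /protected/file2.jpg file2.jpg\n";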
Trying to implement a dynamically generated download with lots of files of different sizes, I came across this solution, but I ran into various memory errors like "Allowed memory size of 134217728 bytes exhausted at ...".
After adding ob_flush(); right before the flush(); the memory errors disappeared.
Together with sending the headers, my final solution looks like this (Just storing the files inside the zip without directory structure):
<?php
// Sending headers
header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename="download.zip"');
header('Content-Transfer-Encoding: binary');
ob_clean();
flush();
// On the fly zip creation
$fp = popen('zip -0 -j -q -r - file1 file2 file3', 'r');
while (!feof($fp)) {
    echo fread($fp, 8192);
    ob_flush();
    flush();
}
pclose($fp);
I wrote this S3 streaming file zipper microservice last weekend; it might be useful: http://engineroom.teamwork.com/how-to-securely-provide-a-zip-download-of-a-s3-file-bundle/
According to the PHP manual, the ZIP extension provides a zip:// wrapper.
I have never used it and I don't know its internals, but logically it should be able to do what you're looking for, assuming that ZIP archives can be streamed, which I'm not entirely sure of.
As for your question about the "LAMP stack" it shouldn't be a problem as long as PHP is not configured to buffer output.
Edit: I'm trying to put a proof-of-concept together, but it seems not-trivial. If you're not experienced with PHP's streams, it might prove too complicated, if it's even possible.
Edit(2): rereading your question after taking a look at ZipStream, I found what's going to be your main problem here when you say (emphasis added)
the operative Zipping should operate in streaming mode, ie processing files and providing data at the rate of the download.
That part will be extremely hard to implement because I don't think PHP provides a way to determine how full Apache's buffer is. So, the answer to your question is no, you probably won't be able to do that in PHP.
It seems you can eliminate any output-buffer-related problems by using fpassthru(). I also use -0 to save CPU time, since my data is already compact. I use this code to serve a whole folder, zipped on the fly:
chdir($folder);
$fp = popen('zip -0 -r - .', 'r');
header('Content-Type: application/octet-stream');
header('Content-disposition: attachment; filename="' . basename($folder) . '.zip"');
fpassthru($fp);
pclose($fp);
I just released a ZipStreamWriter class written in pure PHP userland here:
https://github.com/cubiclesoft/php-zipstreamwriter
Instead of using external applications (e.g. zip) or extensions like ZipArchive, it supports streaming data into and out of the class by implementing a full-blown ZIP writer.
The streaming aspect works by using the ZIP file format's "data descriptors", as described in section 4.3.5 of the PKWARE ZIP file specification:
4.3.5 File data MAY be followed by a "data descriptor" for the file.
Data descriptors are used to facilitate ZIP file streaming.
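For illustration only (this is not the ZipStreamWriter API), a non-Zip64 data descriptor is just three little-endian 32-bit fields, optionally preceded by the 0x08074b50 signature; $crc32, $compressedSize and $uncompressedSize below are placeholders:
// Emit a ZIP data descriptor after the file data (non-Zip64 case).
// Bit 3 of the general-purpose flag in the local file header must be set
// so that readers know the real sizes/CRC follow the data.
$descriptor = pack('VVVV', 0x08074b50, $crc32, $compressedSize, $uncompressedSize);
echo $descriptor;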
There are some possible limitations to be aware of, though. Not every tool can read streaming ZIP files. Support for Zip64 streaming ZIP files is likely even thinner, but that is only a concern for files over 2GB with this class. However, both 7-Zip and the Windows 10 built-in ZIP file reader seem to be fine with all of the crazy files that the ZipStreamWriter class threw at them. The hex editor I use got a good workout too.
When using the ZipStreamWriter class, I recommend allowing a buffer to build up to at least 4KB but no more than 65KB at a time before sending it on to the web server. Otherwise, for lots of really tiny files, you'll be flushing out tiny bits of piecemeal data and waste a bunch of extra CPU cycles on the Apache callback end of things.
When something doesn't exist or I don't like the existing options, I find both official and unofficial specifications, some examples to work with, and then I build it from scratch. It's a fairly solid approach to problem solving, if just a tad overkill.
WARNING: This is a possible exploit. Do not run directly on your server if you're not sure what to do with this.
http://pastehtml.com/view/1b1m2r6.txt
I believe this was uploaded via an insecure upload script. How do I decode and uncompress this code? Running it in the browser might execute it as a shell script, open up a port or something.
I can do a base64 decode online, but I couldn't really decompress it.
So there's a string. It's gzipped and base64 encoded, and the code decodes the base64 and then uncompresses it.
When that's done, I end up with this:
<? eval(base64_decode('...')); ?>
Another layer of base64, which is 720440 bytes long.
Now, base64 decoding that, we have 506961 bytes of exploit code.
I'm still examining the code, and will update this answer when I have more understanding. The code is huge.
Still reading through the code, and the (very well-done) exploit allows these tools to be exposed to the hacker:
TCP backdoor setup
unauthorised shell access
reading of all htpasswd, htaccess, password and configuration files
log wiping
MySQL access (read, write)
append code to all files matching a name pattern (mass exploit)
RFI/LFI scanner
UDP flooding
kernel information
This is probably a professional PHP-based server-wide exploit toolkit, and seeing as it's got a nice HTML interface and the whole lot, it could be easily used by a pro hacker, or even a script kiddie.
This exploit is called c99shell (thanks Yi Jiang) and it turns out to have been quite popular, being talked about and running for a few years already. There are many results on Google for this exploit.
Looking at Delan's decoded source, it appears to be a full-fledged backdoor providing a web interface that can be used to control the server in various ways. Telling fragments from the source:
echo '<center>Are you sure you want to install an IP:Port proxy on this
website/server?<br />
or
<b>Mass Code Injection:</b><br><br>
Use this to add PHP to the end of every .php page in the directory specified.
or
echo "<br><b>UDP Flood</b><br>Completed with $pakits (" .
round(($pakits*65)/1024, 2) . " MB) packets averaging ".
round($pakits/$exec_time, 2) . " packets per second \n";
or
if (!$fp) {echo "Can't get /etc/passwd for password-list.";}
I'd advise you to scrub that server and reinstall everything from scratch.
I know Delan Azabani has done this, but just so you actually know how he got the data out:
Just in case you're wondering how to decompress this: use base64 -d filename > output to decode the base64 strings, and gunzip file.name.gz to decompress the gzipped data.
The trick is in recognising whether what you've got is base64 or gzipped data, and decoding or decompressing the right bits.
This way it goes absolutely nowhere near a JS parser or PHP parser.
First, replace the eval with an echo to see what code it would execute if you'd let it.
Send the output of that script to another file, say, test2.php.
In that file, do the same trick again. Run it, and it will output the complete malicious program (it's quite a beast), ~4k lines of hacker's delight.
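A hedged sketch of the same idea done without ever executing the payload, assuming the outer layer really is base64 plus gzip as described above (the exact decompression function depends on the sample, e.g. gzinflate(), gzuncompress() or gzdecode(); $payload is the string extracted from the suspicious script):
// Decode the payload to a file instead of eval()'ing it
$stage1 = gzuncompress(base64_decode($payload)); // or gzinflate()/gzdecode(), depending on the sample
file_put_contents('stage1.txt', $stage1);
// If stage1 itself contains another eval(base64_decode('...')), extract that
// inner string, base64_decode() it again, and write it to stage2.txt, and so on.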
This is code for a PHP shell.
To decode it, replace eval("?>". with print( and then run:
php5 file.php > file2.php
Then replace eval with print in the resulting file and open it in the browser: http://localhost/file2.php
I have a Windows 2003 dedicated server and I have installed XAMPP on it.
I have tried serving downloads through PHP scripts such as Zina from pancake.org and phpIndexer, using PHP functions such as fread, fgets, file and file_get_contents.
If I download, say, via Apache mod_dirlisting, the speed is 1 Mbps; however, on the same server, when using PHP, the speed drops to 30 kbps.
Any idea what's causing it? Should I tweak anything in php.ini?
You could try the readfile() function (see the PHP docs). fread() or file_get_contents() read the file into memory and then you send it with print or echo; readfile() writes the file directly to the output buffer, which should be faster.
So I have a bunch of files, some of which can be up to 30-40 MB,
and I want to use PHP to handle security for the files, so I can control who has access to them.
That means I have a script sort of like this rough example:
$has_permission = check_database_for_permission($user, $filename);
if ($has_permission) {
    header('Content-Type: image/jpeg');
    readfile($filename);
    exit;
} else {
    // return a 401 error
}
I would hate for every request to load the full file into memory, as it would soon chew up all the memory on my server with just a few simultaneous requests.
So, a couple of questions:
Is readfile() the most memory-efficient way of doing this?
Is there some better method of achieving the same outcome that I am overlooking?
Server: Apache / PHP 5
Thanks
readfile() is the correct way to do this. By all means, don't try to read the file yourself and print it to output; that will consume excessive memory. With readfile(), the contents of the file are buffered directly to output, taking up a trivial amount of transitory memory.
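If you serve very large files this way, one thing worth checking (a general suggestion, not specific to this setup) is that userland output buffering isn't silently accumulating the whole response before it is sent. A hedged sketch:
// Before readfile() on a large file: close userland output buffers
// so the response is streamed rather than accumulated in memory.
while (ob_get_level() > 0) {
    ob_end_flush();
}
// Optionally avoid on-the-fly compression for already-compressed files:
// ini_set('zlib.output_compression', 'Off');
header('Content-Type: image/jpeg');
header('Content-Length: ' . filesize($filename));
readfile($filename);
exit;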
The fastest way is when you can relay this to the webserver. The webserver can use the sendfile() call to ask the operating system kernel to directly copy from a file to the network stream.
For instance, when using lighttpd, there is a way for PHP to signal the server to take over and do the sendfile trick:
http://redmine.lighttpd.net/projects/lighttpd/wiki/X-LIGHTTPD-send-file
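Based on that wiki page, the PHP side is essentially just a header; the exact header name depends on the lighttpd version and on the x-send-file feature being enabled for the backend, and $absolute_path_to_file is a placeholder, so treat this as a sketch:
// Permission check happens in PHP; the actual transfer is done by lighttpd via sendfile()
if ($has_permission) {
    header('Content-Type: image/jpeg');
    // Older lighttpd versions use this header name; newer ones also accept X-Sendfile
    header('X-LIGHTTPD-send-file: ' . $absolute_path_to_file);
    exit;
}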