What are the "serious performance implications" of implicit_flush?


My site's admin section has a bunch of very slow report-generating scripts that echo output line by line as it is generated. To have this output flushed immediately to the browser, instead of the user having to wait for minutes before they see any response, we have output_buffering disabled and we call ob_implicit_flush at the beginning of such scripts.
For convenience, I was considering just turning on the implicit_flush setting in php.ini instead of adding ob_implicit_flush() calls to every script that would benefit from it.
However, the documentation contains the following scary but unexplained remark:
implicit_flush
...
When using PHP within a web environment, turning this option on has serious performance implications and is generally recommended for debugging purposes only.
What are these "serious performance implications", and do they justify the manual's recommendation?

It may or may not be what the manual is hinting at, but one context in which either turning on implicit_flush or calling ob_implicit_flush() has serious performance implications is when using PHP with Apache through mod_php with mod_deflate enabled.
In this context, flush() calls are able to push output all the way through mod_deflate to the browser. If you have any scripts that echo large amounts of data in small chunks, flushing every chunk will cripple mod_deflate's ability to compress your output, quite possibly resulting in a 'compressed' form that is larger than the original content.
As an extreme example, consider this simple script which echoes out a million random numbers:
<?php
header('Content-Type: text/plain');
for ($i = 0; $i < 1000000; $i++) {
    echo rand();
    echo "\n";
}
?>
With output_buffering off and implicit_flush also off (for now), let's hit this in Chrome with the dev tools open:
Note the Size/Content column; the decompressed output is 10.0MB in size, but thanks to mod_deflate's gzip compression, the entire response was compressed down to 4.8MB, roughly halving it in size.
Now hitting exactly the same script with implicit_flush set to On:
Once again, the 'decompressed' output is 10.0MB in size. This time, though, the size of the HTTP response was 28.6MB - mod_deflate's 'compression' has actually trebled the size of the response.
This, for me, is more than enough reason to heed the PHP manual's advice of leaving the implicit_flush config option off, and only using ob_implicit_flush() (or manual flush() calls) in contexts where doing so actually serves a purpose.
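The effect can be approximated outside Apache with PHP's zlib extension. The sketch below assumes that each flush forces the compressor to emit everything it has buffered so far, modelled here with ZLIB_SYNC_FLUSH. Whether mod_deflate does exactly this on every flush is an assumption, and the helper name gzipSize() is made up for illustration.

```php
<?php
// Hypothetical helper: gzip-compress $lines as one stream and return the
// compressed size in bytes. When $flushEachLine is true, the compressor is
// forced to emit its pending output after every line.
function gzipSize(array $lines, bool $flushEachLine): int
{
    $ctx  = deflate_init(ZLIB_ENCODING_GZIP);
    $mode = $flushEachLine ? ZLIB_SYNC_FLUSH : ZLIB_NO_FLUSH;
    $size = 0;
    foreach ($lines as $line) {
        $size += strlen(deflate_add($ctx, $line, $mode));
    }
    // Finish the stream so that the gzip trailer is counted as well.
    $size += strlen(deflate_add($ctx, '', ZLIB_FINISH));
    return $size;
}

// Same kind of data as the script above, but fewer lines to keep it quick.
mt_srand(1);
$lines = [];
for ($i = 0; $i < 10000; $i++) {
    $lines[] = mt_rand() . "\n";
}

$rawSize      = strlen(implode('', $lines));
$bufferedSize = gzipSize($lines, false);
$flushedSize  = gzipSize($lines, true);

printf(
    "raw: %d bytes, gzip: %d bytes, gzip with a flush per line: %d bytes\n",
    $rawSize,
    $bufferedSize,
    $flushedSize
);
```

Every forced flush costs a few extra bytes of framing and cuts the compressor off from the data that follows, so the per-line variant should come out noticeably larger than the single-stream one.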

Related

What is the difference between output_buffering vs ob_start?

Trying to understand the difference between using "output_buffering = on" in .user.ini and the ob_start function directly in a script?
Does one give me more control over the other in terms of limiting the buffer size, or when the buffer is published to the browser?

PHP flush the output to browser

I work on a PHP project and I use flush().
I did a lot of searching and found that PHP sends long script output to the browser in chunks, rather than sending all of the data at once when the script terminates.
I want to know the size of these chunks, that is, how many bytes of output PHP must accumulate before it sends them to the browser.

It's not only PHP that chunks the data; it's actually the job of Apache (or Tomcat etc) to do this. That's why the default is to turn off the "chunking" in PHP and leave it to Apache. Even if you force a flush from PHP, it still can get trapped by Apache. From the manual:
flush() may not be able to override the buffering scheme of your web
server and it has no effect on any client-side buffering in the
browser. It also doesn't affect PHP's userspace output buffering
mechanism. This means you will have to call both ob_flush() and
flush() to flush the ob output buffers if you are using those.
There's a Wikipedia article on transfer encoding / chunking: http://en.wikipedia.org/wiki/Chunked_transfer_encoding
Apache gets more complicated with gzip or deflate encoding; you'll need to consult the Apache server documentation on how to configure it.
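The manual's advice to call both ob_flush() and flush() can be wrapped in a small helper. This is only a sketch: the name pushToClient() is made up, and it does nothing about buffering in the web server or the browser.

```php
<?php
// Hypothetical helper: push whatever has been echoed so far towards the client.
function pushToClient(): void
{
    // ob_flush() only moves the innermost userspace buffer one level out,
    // and raises a notice when no buffer is active, hence the guard.
    if (ob_get_level() > 0) {
        ob_flush();
    }
    // flush() asks PHP to hand its own output on to the web server.
    flush();
}
```

Call pushToClient() after each echo that should reach the browser promptly. With nested ob_start() calls, a single call only moves the data out by one level.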

I think you are wrong. See this code:
echo str_repeat(' ', 1024);
for ($i = 0; $i < 10; $i++) {
    echo $i;
    flush();
    sleep(1);
}
If you run it, you will see that every single byte is sent to the browser and printed as soon as it is echoed.
// The str_repeat() call is only there to fill the browser's buffer so that it starts displaying data; it serves no other purpose.

How do I echo out progress on MAMP using flush()

I'm trying to run a simple PHP script on MAMP. I'm using PHP 5.2.17 and I have compression turned off. I'm trying to execute this simple script:
<?php
ob_flush();
foreach (range(1, 9) as $n) {
    echo $n."\n";
    flush();
    sleep(1);
}
For some reason this is not doing what it is supposed to. Rather than echoing out the numbers one by one, it simply echoes them all out when the loop is done. Am I missing something? Is there another way to do this?

Output buffering is a mechanism for controlling how much output data
(excluding headers and cookies) PHP should keep internally before
pushing that data to the client. If your application's output exceeds
this setting, PHP will send that data in chunks of roughly the size
you specify. Turning on this setting and managing its maximum buffer
size can yield some interesting side-effects depending on your
application and web server. You may be able to send headers and
cookies after you've already sent output through print or echo. You
also may see performance benefits if your server is emitting less
packets due to buffered output versus PHP streaming the output as it
gets it. On production servers, 4096 bytes is a good setting for
performance reasons.
Note: Output buffering can also be controlled via Output Buffering Control
functions.
php.ini Possible Values:
On = Enabled and buffer is unlimited. (Use with caution)
Off = Disabled
Integer = Enables the buffer and sets its maximum size in bytes.
eg: output_buffering = Off
Note: This directive is hardcoded to Off for the CLI SAPI
http://php.net/output-buffering
A working example, if output_buffering is set to 4096:
<?php
ob_start();
// Emit enough output to overflow the output_buffering size set in php.ini.
echo str_repeat(PHP_EOL, 4097);
for ($i = 0; $i < 5; $i++) {
    echo PHP_EOL.$i;
    ob_flush();
    flush();
    sleep(1);
}
ob_end_flush();
?>

Indeed, it is the buffer size
PHP buffer why \r\n
This example worked for me

php flush not working

My flush mechanism stopped working, and I'm not sure why.
I'm trying to run a simple flush example now, with no luck:
echo "before sleep";
flush();
sleep(5);
echo "after sleep";
After doing some reading, and learning that nginx was installed on my server recently, I requested that it be disabled for my domain. (The server admin said he disabled it for this specific domain.)
Also, I tried disabling gzip by adding these lines to .htaccess:
SetOutputFilter DEFLATE
SetEnv no-gzip dont-vary
also, tried adding these to my php file
ini_set('output_buffering','on');
ini_set('zlib.output_compression', 0);
Nothing helps. It sleeps for 5 seconds and then displays all the content together.
I've used flush() successfully before, including through the output buffer (ob_start, ob_flush, etc.); now I'm just trying to make the simplest example work.

"Stopped working" is a pretty high-level description. You should look at what actually works and what does not to find out more.
This can be done by monitoring the network traffic. You will see how much of the response has already arrived and in which encoding it is sent.
If the response is being compressed, note that most compression functions need a certain number of bytes before they can compress them. So even if you call flush() to signal PHP to flush the output buffer, there can still be a place, either within PHP's output filtering or in the server, that waits for more data before compressing. So in addition to the compression done by Apache, check whether your PHP configuration compresses output as well, and disable it.
If you don't want to monitor your network traffic, the curl command-line utility does a pretty good job of displaying what's going on, and it might be easier to use than network monitoring.
curl -Ni --raw URL
Make sure you use the -N switch, which disables buffering in curl so that you see your script's/server's output directly.
Please see the section Inspecting HTTP Compression Problems with Curl in a previous answer of mine, which shows some curl commands for looking into the output of a request while it is being compressed as well.
curl can show compressed data uncompressed, and you can disable compression per request, so regardless of the server or PHP output compression settings, you can test each case separately.
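If curl is not at hand, a similar check can be sketched in PHP itself by timestamping each chunk as it arrives. The helper name recordChunkTimings() is made up for illustration. It works on any readable stream, and pointing it at a URL assumes that allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: read $stream until EOF and note when each chunk arrived.
// Returns a list of ['seconds' => float, 'bytes' => int] entries.
function recordChunkTimings($stream, int $maxChunk = 8192): array
{
    $start  = microtime(true);
    $chunks = [];
    while (!feof($stream)) {
        $chunk = fread($stream, $maxChunk);
        if ($chunk === false) {
            break;          // read error: stop
        }
        if ($chunk === '') {
            continue;       // nothing was read this time; let feof() decide
        }
        $chunks[] = [
            'seconds' => round(microtime(true) - $start, 3),
            'bytes'   => strlen($chunk),
        ];
    }
    return $chunks;
}
```

For example, print_r(recordChunkTimings(fopen('http://localhost/slow.php', 'r'))), where the URL is a placeholder for your own script. If all entries show roughly the same late timestamp, something between PHP and the client is still buffering.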

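<?php
// Turn off every PHP-level compression and output handler that could hold output back.
ini_set('zlib.output_handler', '');
ini_set('zlib.output_compression', 0);
ini_set('output_handler', '');
// Note: output_buffering cannot be changed at runtime, so this call has no effect;
// set it in php.ini or .htaccess instead.
ini_set('output_buffering', false);
ini_set('implicit_flush', true);
// Ask mod_deflate not to gzip this response (apache_setenv() requires PHP to run as an Apache module).
apache_setenv('no-gzip', '1');
for ($i = 0; $i < 5; $i++) {
    // Pad each chunk with null bytes so Apache considers it big enough to send.
    echo str_repeat(chr(0), 4096);
    echo "$i<br/>";
    flush();
    sleep(1);
}
?>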

LAMP: How to create .Zip of large files for the user on the fly, without disk/CPU thrashing

Often a web service needs to zip up several large files for download by the client. The most obvious way to do this is to create a temporary zip file, then either echo it to the user or save it to disk and redirect (deleting it some time in the future).
However, doing things that way has drawbacks:
an initial phase of intensive CPU and disk thrashing, resulting in...
a considerable initial delay to the user while the archive is prepared
very high memory footprint per request
use of substantial temporary disk space
if the user cancels the download half way through, all resources used in the initial phase (CPU, memory, disk) will have been wasted
Solutions like ZipStream-PHP improve on this by shovelling the data into Apache file by file. However, the result is still high memory usage (files are loaded entirely into memory), and large, thrashy spikes in disk and CPU usage.
In contrast, consider the following bash snippet:
ls -1 | zip -@ - | cat > file.zip
# Note -@ is not supported on MacOS
Here, zip operates in streaming mode, resulting in a low memory footprint. A pipe has an integral buffer – when the buffer is full, the OS suspends the writing program (the program on the left of the pipe). This ensures that zip works only as fast as its output can be written by cat.
The optimal way, then, would be to do the same: replace cat with a web server process, streaming the zip file to the user with it created on the fly. This would create little overhead compared to just streaming the files, and would have an unproblematic, non-spiky resource profile.
How can you achieve this on a LAMP stack?

You can use popen() (docs) or proc_open() (docs) to execute a unix command (eg. zip or gzip), and get back stdout as a php stream. flush() (docs) will do its very best to push the contents of php's output buffer to the browser.
Combining all of this will give you what you want (provided that nothing else gets in the way -- see esp. the caveats on the docs page for flush()).
(Note: don't use flush(). See the update below for details.)
Something like the following can do the trick:
<?php
// make sure to send all headers first
// Content-Type is the most important one (probably)
//
header('Content-Type: application/x-gzip');
// use popen to execute a unix command pipeline
// and grab the stdout as a php stream
// (you can use proc_open instead if you need to
// control the input of the pipeline too)
//
$fp = popen('tar cf - file1 file2 file3 | gzip -c', 'r');
// pick a bufsize that makes you happy (64k may be a bit too big).
$bufsize = 65535;
$buff = '';
while (!feof($fp)) {
    $buff = fread($fp, $bufsize);
    echo $buff;
}
pclose($fp);
You asked about "other technologies": to which I'll say, "anything that supports non-blocking i/o for the entire lifecycle of the request". You could build such a component as a stand-alone server in Java or C/C++ (or any of many other available languages), if you were willing to get into the "down and dirty" of non-blocking file access and whatnot.
If you want a non-blocking implementation, but you would rather avoid the "down and dirty", the easiest path (IMHO) would be to use nodeJS. There is plenty of support for all the features you need in the existing release of nodejs: use the http module (of course) for the http server; and use child_process module to spawn the tar/zip/whatever pipeline.
Finally, if (and only if) you're running a multi-processor (or multi-core) server, and you want the most from nodejs, you can use Spark2 to run multiple instances on the same port. Don't run more than one nodejs instance per-processor-core.
Update (from Benji's excellent feedback in the comments section on this answer)
1. The docs for fread() indicate that the function will read only up to 8192 bytes of data at a time from anything that is not a regular file. Therefore, 8192 may be a good choice of buffer size.
[editorial note] 8192 is almost certainly a platform dependent value -- on most platforms, fread() will read data until the operating system's internal buffer is empty, at which point it will return, allowing the os to fill the buffer again asynchronously. 8192 is the size of the default buffer on many popular operating systems.
There are other circumstances that can cause fread to return even less than 8192 bytes -- for example, the "remote" client (or process) is slow to fill the buffer - in most cases, fread() will return the contents of the input buffer as-is without waiting for it to get full. This could mean anywhere from 0..os_buffer_size bytes get returned.
The moral is: the value you pass to fread() as buffsize should be considered a "maximum" size -- never assume that you've received the number of bytes you asked for (or any other number for that matter).
2. According to comments on fread docs, a few caveats: magic quotes may interfere and must be turned off.
3. Setting mb_http_output('pass') (docs) may be a good idea. Though 'pass' is already the default setting, you may need to specify it explicitly if your code or config has previously changed it to something else.
4. If you're creating a zip (as opposed to gzip), you'd want to use the content type header:
Content-type: application/zip
or... 'application/octet-stream' can be used instead. (it's a generic content type used for binary downloads of all different kinds):
Content-type: application/octet-stream
and if you want the user to be prompted to download and save the file to disk (rather than potentially having the browser try to display the file as text), then you'll need the content-disposition header. (where filename indicates the name that should be suggested in the save dialog):
Content-disposition: attachment; filename="file.zip"
One should also send the Content-length header, but this is hard with this technique as you don’t know the zip’s exact size in advance. Is there a header that can be set to indicate that the content is "streaming" or is of unknown length? Does anybody know?
Finally, here's a revised example that uses all of @Benji's suggestions (and that creates a ZIP file instead of a TAR.GZIP file):
<?php
// make sure to send all headers first
// Content-Type is the most important one (probably)
//
header('Content-Type: application/octet-stream');
header('Content-disposition: attachment; filename="file.zip"');
// use popen to execute a unix command pipeline
// and grab the stdout as a php stream
// (you can use proc_open instead if you need to
// control the input of the pipeline too)
//
$fp = popen('zip -r - file1 file2 file3', 'r');
// pick a bufsize that makes you happy (8192 has been suggested).
$bufsize = 8192;
$buff = '';
while (!feof($fp)) {
    $buff = fread($fp, $bufsize);
    echo $buff;
}
pclose($fp);
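The point about fread() returning fewer bytes than requested can be made explicit in the copy loop. The following is a sketch only. relayStream() is a hypothetical helper name, and it treats the chunk size purely as an upper bound, as the moral above suggests.

```php
<?php
// Hypothetical helper: copy everything from $in to $out and return the number
// of bytes written. $maxChunk is only a maximum: each read may return less.
function relayStream($in, $out, int $maxChunk = 8192): int
{
    $total = 0;
    while (!feof($in)) {
        $chunk = fread($in, $maxChunk);
        if ($chunk === false) {
            break;              // read error: stop copying
        }
        if ($chunk === '') {
            continue;           // short or empty read; feof() ends the loop
        }
        $written = fwrite($out, $chunk);
        if ($written === false) {
            break;              // write error, e.g. the client went away
        }
        $total += $written;
    }
    return $total;
}
```

In the example above it could be used as relayStream($fp, fopen('php://output', 'w')) in place of the while loop. PHP's built-in fpassthru() and stream_copy_to_stream() cover the same ground if no per-chunk handling is needed.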
Update: (2012-11-23) I have discovered that calling flush() within the read/echo loop can cause problems when working with very large files and/or very slow networks. At least, this is true when running PHP as cgi/fastcgi behind Apache, and it seems likely that the same problem would occur when running in other configurations too. The problem appears to result when PHP flushes output to Apache faster than Apache can actually send it over the socket. For very large files (or slow connections), this eventually causes an overrun of Apache's internal output buffer. This causes Apache to kill the PHP process, which of course causes the download to hang, or complete prematurely, with only a partial transfer having taken place.
The solution is not to call flush() at all. I have updated the code examples above to reflect this, and I placed a note in the text at the top of the answer.

Another solution is my mod_zip module for Nginx, written specifically for this purpose:
https://github.com/evanmiller/mod_zip
It is extremely lightweight and does not invoke a separate "zip" process or communicate via pipes. You simply point to a script that lists the locations of files to be included, and mod_zip does the rest.

While trying to implement a dynamically generated download with lots of files of different sizes, I came across this solution, but I ran into various memory errors like "Allowed memory size of 134217728 bytes exhausted at ...".
After adding ob_flush(); right before the flush();, the memory errors disappeared.
Together with sending the headers, my final solution looks like this (just storing the files inside the zip without directory structure):
<?php
// Sending headers
header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename="download.zip"');
header('Content-Transfer-Encoding: binary');
ob_clean();
flush();
// On the fly zip creation
$fp = popen('zip -0 -j -q -r - file1 file2 file3', 'r');
while (!feof($fp)) {
    echo fread($fp, 8192);
    ob_flush();
    flush();
}
pclose($fp);
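A plausible reading of those memory errors is that an active output buffer was collecting the whole archive in memory. That is an assumption, not something the answer confirms. Under that assumption, an alternative to flushing inside the loop is to close all userspace buffers before streaming starts. dropOutputBuffers() is a made-up name for this sketch.

```php
<?php
// Hypothetical helper: discard and close userspace output buffers until only
// $downToLevel of them remain. Returns the number of buffers closed.
function dropOutputBuffers(int $downToLevel = 0): int
{
    $closed = 0;
    while (ob_get_level() > $downToLevel) {
        // Some buffers cannot be removed; stop rather than loop forever.
        if (!@ob_end_clean()) {
            break;
        }
        $closed++;
    }
    return $closed;
}
```

Called once before popen(), this lets every echo go straight to the server layer. Note that ob_end_clean() throws away anything already buffered, which is usually what a binary download wants, since stray output would corrupt the archive.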

I wrote this S3 streaming file zipper microservice last weekend - might be useful: http://engineroom.teamwork.com/how-to-securely-provide-a-zip-download-of-a-s3-file-bundle/

According to the PHP manual, the ZIP extension provides a zip:// wrapper.
I have never used it and I don't know its internals, but logically it should be able to do what you're looking for, assuming that ZIP archives can be streamed, which I'm not entirely sure of.
As for your question about the "LAMP stack" it shouldn't be a problem as long as PHP is not configured to buffer output.
Edit: I'm trying to put a proof-of-concept together, but it seems not-trivial. If you're not experienced with PHP's streams, it might prove too complicated, if it's even possible.
Edit(2): rereading your question after taking a look at ZipStream, I found what's going to be your main problem here when you say (emphasis added)
the operative Zipping should operate in streaming mode, ie processing files and providing data at the rate of the download.
That part will be extremely hard to implement because I don't think PHP provides a way to determine how full Apache's buffer is. So, the answer to your question is no, you probably won't be able to do that in PHP.

It seems, you can eliminate any output-buffer related problems by using fpassthru(). I also use -0 to save CPU time since my data is compact already. I use this code to serve a whole folder, zipped on-the-fly:
chdir($folder);
$fp = popen('zip -0 -r - .', 'r');
header('Content-Type: application/octet-stream');
header('Content-disposition: attachment; filename="'.basename($folder).'.zip"');
fpassthru($fp);

I just released a ZipStreamWriter class written in pure PHP userland here:
https://github.com/cubiclesoft/php-zipstreamwriter
Instead of using external applications (e.g. zip) or extensions like ZipArchive, it supports streaming data into and out of the class by implementing a full-blown ZIP writer.
How the streaming aspect works is by using the ZIP file format's "Data Descriptors" as described by section 4.3.5 of the PKWARE ZIP file specification:
4.3.5 File data MAY be followed by a "data descriptor" for the file.
Data descriptors are used to facilitate ZIP file streaming.
There are some possible limitations to be aware of, though. Not every tool can read streaming ZIP files. Zip64 streaming ZIP files may be even less widely supported, but with this class that is only a concern for files over 2GB. However, both 7-Zip and the Windows 10 built-in ZIP file reader seem to be fine with handling all of the crazy files that the ZipStreamWriter class threw at them. The hex editor I use got a good workout too.
When using the ZipStreamWriter class, I recommend allowing a buffer to build up to at least 4KB but no more than 65KB at a time before sending it on to the web server. Otherwise, for lots of really tiny files, you'll be flushing out tiny bits of piecemeal data and waste a bunch of extra CPU cycles on the Apache callback end of things.
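The buffering advice can be sketched with a small wrapper that sits between whatever produces the ZIP bytes and the code that echoes them. ChunkCoalescer is a hypothetical class written for this illustration and is not part of ZipStreamWriter. The 4096 and 65536 defaults simply mirror the range suggested above.

```php
<?php
// Hypothetical helper, not part of ZipStreamWriter: collects small writes and
// passes them to $sink only once at least $minChunk bytes have accumulated,
// in pieces of at most $maxChunk bytes.
final class ChunkCoalescer
{
    /** @var callable */
    private $sink;
    private string $pending = '';
    private int $minChunk;
    private int $maxChunk;

    public function __construct(callable $sink, int $minChunk = 4096, int $maxChunk = 65536)
    {
        $this->sink     = $sink;
        $this->minChunk = $minChunk;
        $this->maxChunk = $maxChunk;
    }

    public function write(string $data): void
    {
        $this->pending .= $data;
        if (strlen($this->pending) >= $this->minChunk) {
            $this->flush();
        }
    }

    // Send everything still pending; call this once after the last write.
    public function flush(): void
    {
        if ($this->pending === '') {
            return;
        }
        foreach (str_split($this->pending, $this->maxChunk) as $piece) {
            ($this->sink)($piece);
        }
        $this->pending = '';
    }
}
```

A sink as simple as function (string $bytes) { echo $bytes; } is enough for a download script. The point is only that the web server sees a few medium-sized writes instead of many tiny ones.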
When something doesn't exist or I don't like the existing options, I find both official and unofficial specifications, some examples to work with, and then I build it from scratch. It's a fairly solid approach to problem solving, if just a tad overkill.
