Compressing content with PHP ob_start() vs Apache Deflate/Gzip?

Most sites want to compress their content to save on bandwidth. However, when it comes to Apache servers running PHP there are two ways to do it - with PHP or with Apache. So which one is faster or easier on your server?
For example, in PHP I run the following function at the start of my pages to enable it:
/**
 * Gzip compress page output
 * Original function came from wordpress.org
 */
function gzip_compression() {
    // If no encoding was given, the client can't accept gzipped pages
    if ( empty($_SERVER['HTTP_ACCEPT_ENCODING']) ) { return false; }
    // If zlib is ALREADY compressing the page, or ob_gzhandler is set, bail out
    if ( ini_get('zlib.output_compression') == 'On'
        OR ini_get('zlib.output_compression_level') > 0
        OR ini_get('output_handler') == 'ob_gzhandler' ) {
        return false;
    }
    // Else, if zlib is loaded and the client accepts gzip, start the compression
    if ( extension_loaded('zlib') AND (strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip') !== FALSE) ) {
        ob_start('ob_gzhandler');
    }
}
The other option is to use Apache's deflate or gzip modules (both of which behave very similarly). To enable them you can add something like this to your .htaccess file:
AddOutputFilterByType DEFLATE text/html text/plain text/xml application/x-httpd-php
Since PHP is a scripting language (so the compression would have to be done by the PHP interpreter itself), I would assume that the Apache method would be 1) more stable and 2) faster. But assumptions don't have much use in the real world.
After all, you would assume that with the huge financial backing Windows has... uh, we won't go there.

We're running... a lot of webservers, handling 60M uniques/day. Normally this isn't worth mentioning, but your question asks for real-world experience.
We do it in Apache. What comes out the other end is the same (or near enough so as not to matter) regardless of the method you choose.
We chose Apache for a few reasons:
Zero maintenance - we just turned it on. No one needs to maintain any case structure in code
Performance - in our tests, servers where Apache did the work fared marginally better
Apache will apply the output filter to everything, as opposed to just PHP. On some occasions other types of content are served from the same server, and we'd like to compress our .css and .js too
One word of warning: some browsers or other applications purposefully mangle the client headers that indicate compression is supported. Some do this to ease their job in terms of client-side security (think applications like Norton Internet Security and such). You can either ignore this, or try to add extra cases to rewrite requests to look normal (the browsers do support it; the application or proxy just futzed it to make its own life easier).
Alternatively, if you're using the flush() command to send output to the browser earlier and you're applying compression, you may need to pad the end of your output with whitespace to convince the server to send data early.
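A rough sketch of that padding trick (the 4KB pad size is an assumption; the buffering threshold varies by server, filter, and browser):
<?php
ob_start('ob_gzhandler');
echo "Loading...";
echo str_repeat(' ', 4096); // pad past the compression/browser buffers (size is a guess)
ob_flush();                 // push PHP's output buffer
flush();                    // push the web server's buffer
// ... slow work continues here ...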

Related

PHP or Apache limiting Content-Length in HTTP Header?

I have to distribute a huge file to some people (pictures of a prom) via my Apache2/PHP server, which is giving me some headaches: Chrome and Firefox both show a filesize of 2GB, but the file is actually >4GB, so I started to track things down.
I am doing the following in my PHP script:
header("Content-Length: ".filesize_large($fn));
header("Actual-File-Size: ".filesize_large($fn)); //Debug
readfile($fn);
filesize_large() is returning the correct filesize for >4gb files as a string (yes, even on 32-bit PHP).
Now the interesting part; the actual HTTP header:
Content-Length: 2147483647
Actual-File-Size: 4236525700
So the filesize_large() method is working totally fine, but PHP or Apache somehow limits the value of Content-Length?! Why is that?
Apache/2.2.22 x86, PHP 5.3.10 x86; I am serving the file over SSL/HTTPS.
Just so you guys believe me when I say filesize_large() is correct:
function filesize_large($filename)
{
    // Ask stat(1) for the size so it isn't truncated by PHP's 32-bit integers
    return trim(shell_exec('stat -c %s ' . escapeshellarg($filename)));
}
Edit:
Seems like PHP casts the content length to an integer when communicating with Apache over the SAPI interface on 32-bit systems. Sadly there is no workaround, except not sending a Content-Length header at all for files >2GB.
Workaround (and actually a far better solution in the first place): Use mod_xsendfile
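With mod_xsendfile installed and XSendFile On enabled, the download script shrinks to something like this (a sketch: X-Sendfile is mod_xsendfile's header, and the attachment filename here is hypothetical):
header('X-Sendfile: ' . $fn);                     // Apache resolves and streams the file itself
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="prom-pictures.zip"'); // hypothetical name
// No readfile() and no manual Content-Length - Apache computes the length
// with its own large-file handling, so the 32-bit PHP cast never happens.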
You have to use a 64-bit operating system (and a 64-bit PHP build) in order to get integers large enough for the Content-Length header.
I would recommend using Vagrant for development.
Headers are string-based, but the content length is handled as an integer. If you take a look here https://books.google.com/books?id=HTo_AmTpQPMC&pg=PA130&lpg=PA130&dq=ap_set_content_length%28r,+r-%3Efinfo.size%29;&source=bl&ots=uNqmcTbKYy&sig=-Wth33sukeEiSnUUwVJPtyHSpXU&hl=en&sa=X&ei=GP0SVdSlFM_jsATWvoGwBQ&ved=0CDEQ6AEwAw#v=onepage&q=ap_set_content_length%28r%2C%20r-%3Efinfo.size%29%3B&f=false
you will see an example of the ap_set_content_length() function, which is used to set the Content-Length response header. It accepts the file length from a system function. Try calling PHP's filesize() and you'll probably see the same result.
If you take a look at the ap_set_content_length() declaration http://ci.apache.org/projects/httpd/trunk/doxygen/group__APACHE__CORE__PROTO.html#ga7ab393c56cf073ce7aadc3b7ca3db7b2
you will see that the length is declared as apr_off_t.
And here http://svn.haxx.se/dev/archive-2004-01/0871.shtml you can read that this type depends on compiler options, which make it 32-bit in your case.
I would recommend reading the source code of the Apache and PHP projects.

How can I disable gzip inside php code on HHVM? (eg setting content-encoding header)

I'm converting PHP code to HHVM. One page in particular sometimes needs to flush() a status message to the browser before sending some emails and doing a few other slow tasks, and then update the status message.
Before HHVM (using php-fpm and nginx) I used:
header('Content-Encoding: none;');
echo "About to send emails...";
if (ob_get_level() > 0) { ob_end_flush(); }
flush();
// Emails sent here
echo "Emails sent.";
So the Content-Encoding header stops gzip from being used, then the flush sends the first message, and the second message is sent when the page ends.
Using HHVM (and nginx), setting the Content-Encoding header works (it shows up in the browser), but either HHVM or nginx is ignoring it and sending the page gzipped anyway, so the browser sees Content-Encoding: none paired with gzipped binary data.
How can I disable gzip inside php code on HHVM?
(I know I could turn it off in the config files, but I want it kept on for nearly every page load except a few that will run slower.)
While my suggestion would normally be to use different nginx location paths with different gzip configuration, here's a better alternative to achieve what you want:
Better Solution:
It is often considered bad practice to keep a connection open (and the browser's loading bar spinning) while you're doing work in the background.
Since PHP 5.3.3 there is a function fastcgi_finish_request() which flushes the data and closes the connection, while the script continues to work in the background.
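Under php-fpm, that looks something like this (a minimal sketch; the mail() call stands in for your slow tasks and its arguments are hypothetical):
<?php
echo "About to send emails...";
fastcgi_finish_request(); // response is flushed and the connection closed here
// The client is no longer waiting; everything below runs in the background
mail('someone@example.com', 'Status', 'Sent in the background'); // hypothetical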
Now, this is unfortunately not supported yet on HHVM. However, there is an alternative way of doing this.
HHVM alternative:
You can use register_postsend_function('function_name'); instead. This closes the connection, and the given function will be executed in the background.
Here is an example:
<?php
echo "and ...";
register_postsend_function(function() {
    echo "... you should not be seeing this";
    sleep(10); // do a lot of work after the response has been sent
});
die();

Caching HTTP responses when they are dynamically created by PHP

I think my question seems pretty casual, but bear with me as it gets interesting (at least for me :)).
Consider a PHP page whose purpose is to read a requested file from the filesystem and echo it as the response. Now the question is how to enable caching for this page. The point is that the files can be pretty huge, and enabling the cache would save clients from downloading the same content again and again.
The ideal strategy would be to use the "If-None-Match" request header and "ETag" response header to implement a reverse proxy cache system. Even though I know this much, I'm not sure if this is possible, or what I should return as the response in order to implement this technique!
Serving huge files, or many auxiliary files, is not exactly what PHP is made for.
Instead, look at X-accel for nginx, X-Sendfile for Lighttpd or mod_xsendfile for Apache.
The initial request gets handled by PHP, but once the download file has been determined, PHP sets a few headers to indicate that the server should handle sending the file, after which the PHP process is freed up to serve something else.
You can then use the web server to configure the caching for you.
Static generated content
If your content is generated from PHP and particularly expensive to create, you could write the output to a local file and apply the above method again.
If you can't write to a local file or don't want to, you can use HTTP response headers to control caching:
Expires: <absolute date in the future>
Cache-Control: public, max-age=<relative time in seconds since request>
This will cause clients to cache the page contents until they expire, or until the user forces a page reload (e.g. presses F5).
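In PHP that could look like this (a sketch; the one-hour lifetime is an arbitrary assumption):
// Cache the generated output for one hour
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 3600) . ' GMT');
header('Cache-Control: public, max-age=3600');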
Dynamic generated content
For dynamic content you want the browser to ping you every time, but only send the page contents if there's something new. You can accomplish this by setting a few other response headers:
ETag: <hash of the contents>
Last-Modified: <absolute date of last contents change>
When the browser pings your script again, it will add the following request headers respectively:
If-None-Match: <hash of the contents that you sent last time>
If-Modified-Since: <absolute date of last contents change>
The ETag mostly only reduces network traffic, since in some cases you still have to calculate the contents hash on the server before you can compare it.
Last-Modified is the easiest to apply if you have local file caches (files have a modification date). A simple condition makes it work:
if (!file_exists('cache.txt') ||
    !isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ||
    filemtime('cache.txt') > strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
    // update cache file and send back contents as usual (+ cache headers)
} else {
    header('HTTP/1.0 304 Not Modified');
}
If you can't do file caches, you can still use ETag to determine whether the contents have changed meanwhile.
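A minimal ETag sketch, assuming $contents already holds the generated output:
$etag = '"' . md5($contents) . '"';
header('ETag: ' . $etag);
// If the browser already has this exact version, skip the body entirely
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
    header('HTTP/1.0 304 Not Modified');
    exit;
}
echo $contents;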

Any way to chunk gzip with Apache and PHP

I have a web application on a site that takes a while (~10 seconds) to complete a portion of the page near the bottom - it has been optimized as much as it can be, and caching is not an option.
We have compression enabled on the server via the .htaccess directive SetOutputFilter DEFLATE. The problem is that this causes the whole page to be held until completion before it starts outputting to the user, which is not optimal, as the user sees nothing until the page completes.
I have also tried it via the PHP ob_start("ob_gzhandler"); method.
Currently I have a <FilesMatch> section in my .htaccess restricting this specific script from being compressed.
Basically my question is this: is there a way to chunk gzip or deflate output so that the user gets it in pieces and can see that the page has begun loading?
I would say: no. I think there is no way provided by HTTP.
If you are using the ob_start("ob_gzhandler") method, you can do this - you need to look at the flush and ob_flush functions.
Some sample code - try loading it with curl, or use Fiddler to inspect the actual HTTP responses:
<?php
ob_start('ob_gzhandler');
print "chunk 1";
ob_flush();
flush();
sleep(2);
print "chunk 2";
ob_end_flush();
Unfortunately, browsers don't seem to display this in chunks - I think this is because the data in each chunk is too small. You can verify this effect by calling wget -O - -q http://chunktest/chunktest.php on your test file.
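One workaround, in line with the padding trick mentioned earlier (the 1KB pad is an assumption; render buffers vary by browser), is to pad each chunk before flushing:
print "chunk 1";
print str_repeat(' ', 1024); // push the chunk past the browser's render buffer
ob_flush();
flush();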
There are some more useful resources here
If the page takes that long to load, the creative way to handle it is to serve a very quick-loading page with an AJAX call that fetches the long-loading content. We do this for the pages that pull detailed member usage statistics... Other sites, like AdSense for example, do this on their reports page.

PHP gzcompress vs gzopen/gzwrite

I'm writing a PHP script that generates gzipped files. The approach I've been using is to build up a string in PHP and gzcompress() the string before writing it out to a file at the end of the script.
Now I'm testing my script with larger files and running into memory allocation errors. It seems that the result string is becoming too large to hold in memory at one time.
To solve this I've tried to use gzopen() and gzwrite() to avoid allocating a large string in PHP. However, the gzipped file generated with gzwrite() is very different from the one I get with gzcompress(). I've experimented with different compression levels but it doesn't help. I've also tried using gzdeflate() and end up with the same results as gzwrite(), but still nothing similar to gzcompress(). It's not just the first two bytes (the zlib header) that are different; it's the entire file.
What does gzcompress() do differently from these other gzip functions in PHP? Is there a way I can emulate the results of gzcompress() while incrementally producing the result?
The primary difference is that the gzwrite function initializes zlib with the SYNC_FLUSH option, which will pad the output to a 4-byte boundary (or is it 2?), and then add a little extra (0x00 0x00 0xff 0xff 0x03).
If you are using these to create Zip files, beware that the default Mac Archive utility does NOT accept this format.
From what I can tell, SYNC_FLUSH is a gzip option and is not allowed in the PKZip/Info-ZIP format that all .zip files and their derivatives come from.
If you deflate a small file/text, resulting in a single deflate block, and compare it to the same text written with gzwrite, you'll see two differences: one of the bytes in the header of the deflate block differs by 1, and the end is padded with the bytes above. If the result is larger than one deflate block, the differences start piling up. It is hard to fix this, as the deflate stream's block headers aren't even byte-aligned. There is a reason everybody uses zlib. Few people are brave enough to even attempt to rewrite that format!
I am not 100% certain, but my guess is that gzcompress uses the ZLIB format, and gzopen/gzwrite use GZIP. Honestly, I can't tell you what the difference between the two is, but I do know that GZIP uses ZLIB for the actual compression.
It is possible that none of that will matter though. Try creating a gzip file with gzopen/gzwrite and then decompress it using the command-line gzip program. If it works, then using gzopen/gzwrite will work for you.
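For example, a minimal sketch of the incremental approach (the output filename and the $chunks data source are hypothetical):
$chunks = array('part one ', 'part two '); // stand-in for your generated data
$gz = gzopen('output.gz', 'wb9');          // gzip format, compression level 9
foreach ($chunks as $chunk) {
    gzwrite($gz, $chunk);                  // compress incrementally - no giant string in memory
}
gzclose($gz);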
I ran into a similar problem once - basically there wasn't enough RAM allocated to PHP to do the business.
I ended up saving the string as a text file, then using exec() to gzip the file via the filesystem. It's not an ideal solution, but it worked for my situation.
Try increasing the memory_limit parameter in your php.ini file.
Both gzcompress() and gzopen() use the DEFLATE method for compressing blocks, but they have different headers and trailers.
