I'm wondering how best to compress my output.
Usually I just add ob_start('ob_gzhandler') at the top of my index.php, to compress the whole output.
I'm using a simple caching class to store the generated HTML in a file (index.cache.htm) instead of rebuilding it on every refresh. The content of index.cache.htm is minified for better performance.
Couldn't I compress the cached content instead of using ob_start('ob_gzhandler')?
Example 1 (caching the buffered output):
ob_start(); // start buffer
$b = ob_get_contents(); // get buffer
ob_end_clean(); // free buffer
$b = CustomHTMLMinifyFunction($b); // minify the HTML
$b = gzcompress($b); // compress the HTML
file_put_contents('index.cache.htm', $b); // save file
Example 2 (caching explicit data):
$d = 'Some data, e.g. JSON'; // some data
$d = gzcompress($d); // compress data
file_put_contents('data.cache.txt', $d); // save file
What's the difference or best practise? Thanks in advance!
Edit: Does it ever make sense to store the compressed data in a file? Or is it only useful while outputting the data?
ob_start:
The [callback] function will be called when the output buffer is flushed (sent) or cleaned (with ob_flush(), ob_clean() or similar function) or when the output buffer is flushed to the browser at the end of the request.
In other words, ob_get_contents() returns the uncompressed contents of the output buffer:
$log = 0;
function callback($input){
global $log;
$log += 1;
return ob_gzhandler($input);
}
ob_start('callback');
$ob = ob_get_contents();
echo $log; // echoes 0, callback function was not called
You must compress the output of ob_get_contents() if you want to cache a compressed version of the output data.
But you must configure your web server so that it is aware the files are pre-compressed (instructions for Apache). You can't just send compressed files to your client without setting proper headers.
To answer your edit, it makes sense to pre-compress your cache, otherwise the content is compressed on the fly. Also keep in mind that some clients do not support compression: you should keep an uncompressed version of your files if you want to be able to serve them.
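As a rough sketch of the points above, the cache can be written once in both forms and the variant chosen per request. The file paths and the helper names writeCache() and chooseVariant() are made up for illustration, and the Accept-Encoding check is deliberately simplistic (it ignores q-values). Note that it uses gzencode() rather than gzcompress(), because "Content-Encoding: gzip" expects the gzip format.

```php
<?php
// Hypothetical cache locations; adjust to your own layout.
$plainPath = sys_get_temp_dir() . '/index.cache.htm';
$gzipPath  = $plainPath . '.gz';

// Write both variants once, when the page is (re)generated.
function writeCache(string $html, string $plainPath, string $gzipPath): void
{
    file_put_contents($plainPath, $html, LOCK_EX);
    // gzencode() produces gzip-format data, unlike gzcompress() (zlib format).
    file_put_contents($gzipPath, gzencode($html, 9), LOCK_EX);
}

// Pick the file and response headers for a given Accept-Encoding value.
function chooseVariant(string $acceptEncoding, string $plainPath, string $gzipPath): array
{
    $headers = ['Vary: Accept-Encoding'];
    if (stripos($acceptEncoding, 'gzip') !== false && is_file($gzipPath)) {
        $headers[] = 'Content-Encoding: gzip';
        return ['path' => $gzipPath, 'headers' => $headers];
    }
    return ['path' => $plainPath, 'headers' => $headers];
}

writeCache('<html><body>Hello</body></html>', $plainPath, $gzipPath);
$variant = chooseVariant($_SERVER['HTTP_ACCEPT_ENCODING'] ?? '', $plainPath, $gzipPath);

// In a web request you would then send the headers and the file:
// foreach ($variant['headers'] as $h) { header($h); }
// readfile($variant['path']);
```

Clients that do not advertise gzip support get the uncompressed file, which is why both variants are kept.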
Is it possible to know how many bytes were sent to the client browser using PHP? My pages are created dynamically, so the size isn't fixed.
Using php's output buffering
// start output buffering
ob_start();
// create your page
// once the page is ready, measure the size of the output buffer
$length = ob_get_length();
// and emit the page, stop buffering and flush the buffer
ob_get_flush();
As usual with php, these functions are pretty well documented in the standard documentation, don't forget to read the user contributed notes.
You can see this in your webserver's access log file.
But you can also code some php to get an answer like this:
ob_start();
echo "your content";
$data = ob_get_contents();
$size = strlen($data);
see also: Measure string size in Bytes in php
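Both approaches above count bytes, not characters. A small sketch to illustrate, using a UTF-8 string with one two-byte character (the variable names are only for this example):

```php
<?php
ob_start();
echo "h\xC3\xA9llo"; // "héllo" encoded as UTF-8: 5 characters, 6 bytes
$bufferBytes = ob_get_length();            // bytes currently in the buffer
$stringBytes = strlen(ob_get_contents());  // strlen() also counts bytes
ob_end_clean(); // discard here; use ob_end_flush() to send the page instead
```

Keep in mind this is the size before any compression (ob_gzhandler or the web server) is applied, so the number of bytes on the wire may be smaller.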
I have the code below to output a big file, but it's falling over because PHP's memory use seems to grow and grow as the file is read:
<?php
// various header() calls etc.
$stream = fopen($tarfile,'r');
ob_end_flush();
while (!feof($stream)) {
$buf = fread($stream, 4096);
print $buf;
flush();
unset($buf);
$aa_usage = memory_get_usage(TRUE); // ← this keeps going up!
}
fclose($stream);
I had thought that by the combination of flush and unset the additional memory use would be limited to the 4k buffer, but I'm clearly wrong.
If all you need is to output the content of a file then the right tool to do it is the PHP function readfile(). Replace all the code you posted with:
readfile($tarfile);
As the documentation says:
Note:
readfile() will not present any memory issues, even when sending large files, on its own. If you encounter an out of memory error ensure that output buffering is off with ob_get_level().
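A minimal sketch of that advice, assuming leftover nested output buffers are what is holding the data in memory. The helper name streamFile() and the temporary demo file are hypothetical; in the question you would pass $tarfile.

```php
<?php
// Hypothetical helper: close every active output buffer, then stream the file.
function streamFile(string $path): int
{
    // ob_end_flush() closes only one level, so loop until none remain
    // (or until a buffer refuses to be removed).
    while (ob_get_level() > 0 && @ob_end_flush()) {
        // nothing to do, the condition does the work
    }
    $bytes = readfile($path); // writes the file straight to the output
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $bytes;
}

// Demo with a temporary file standing in for $tarfile.
$demoFile = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($demoFile, str_repeat('x', 8192));
ob_start();
ob_start(); // two nested buffers, as a framework might leave behind
$sent = streamFile($demoFile);
unlink($demoFile);
```

The single ob_end_flush() in the question closes only the innermost buffer, so any outer buffer would still accumulate everything that is printed.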
You can also try loading only as much data as you need at first and, if you need more, use fseek() to move to the next position in the file.
Is it possible, in PHP, to get all the generated HTML code, at the end of request processing?
What I want to achieve is to be able to retrieve (and, possibly, save/cache) the actual HTML that is about to be sent to users. I can do something similar in ASP.NET with a Global.asax filter, which can access the low-level generated HTML code and modify it.
If needed, I can modify the web server settings and/or php interpreter settings (currently the web application runs on Apache+mod_php).
Use output buffering:
<?php
// Start buffering (no output delivered to the browser from now on)
ob_start();
// Generate the HTML
// ...
// Grab the buffer as a variable
$html_output = ob_get_contents();
// If you want to stop buffering and send the buffer to the browser
ob_end_flush();
// OR if you want to stop buffering and throw away the buffer
ob_end_clean();
Potential issues
There is a potential user impact: depending on your web server, page output is normally streamed to the user's browser as it is generated (which is why you can start seeing really large pages before they have finished loading). With an output buffer, the user only sees the result after you have stopped buffering and sent it.
Also, because you are buffering rather than streaming, your server has to hold the buffered output in memory (not a problem unless you generate pages large enough to exceed PHP's memory limit).
To avoid running out of memory you can chunk your buffering and write it to disc (or flush it to the user) at specific chunk sizes using a callback like this:
<?php
// The callback function each time we want to deal with a chunk of the buffer
$callback = function ($buffer, $flag) {
// Cache the next part of the buffer to file?
file_put_contents('page.cache', $buffer, FILE_APPEND | LOCK_EX); // flags are combined with |, not &
// $flag contains which action is performing the callback.
// We could be ending due to the final flush and not because
// the buffer size limit was reached. PHP_OUTPUT_HANDLER_END
// means an ob_end_*() function has been called.
if ($flag & PHP_OUTPUT_HANDLER_END) { // $flag is a bitmask, so test the bit
// Do something different
}
// To pass this chunk on to the browser, return $buffer here instead;
// output echoed inside an output callback is not delivered.
// Whatever we return from this function is the new buffer
return '';
};
// Pass the buffer to $callback each time it reaches 1024 bytes
ob_start($callback, 1024);
// Generate the HTML
// ...
ob_end_clean();
I think what you would want to use is output buffering! At the start of your page use: ob_start();
At the end of the page you send to the client / browser using something like : ob_end_flush();
Before it is sent you can record that buffer to the db or text file
I'm having a hard time figuring out the problem in the following code, I really need a solution to this.
Consider the following code :
<?php
//starting a new output buffer, with a GZIP compression
ob_start("ob_gzhandler");
//this goes into the buffer
echo "Hi";
//grabbing the buffer's content
$content = ob_get_contents();
//cleaning the buffer
ob_clean();
//we're still inside the buffer, show the content again
echo $content;
This code fails to output "Hi"; instead I see "‹óÈM". What have I done that broke the buffering? Once I remove "ob_gzhandler", the buffering is correct and everything is fine. I don't want to create another buffer and destroy the current one. I just want to clean the current one using ob_clean.
Any ideas? Thanks in advance.
Thank you for your answer. GZIP is installed on my machine, by the way. I figured out why: with ob_gzhandler set, the buffer is compressed chunk by chunk, so when using ob_get_contents(), parts of the last chunk are missing, and I end up getting weird output.
To correct that behaviour (or at least to bypass it), open a second output buffer, and leave the one with ob_gzhandler alone.
Like this
ob_start("ob_gzhandler");
ob_start();
Now the second one isn't compressed, I can do whatever I want with it (hence get its content, clean it etc). The content will be compressed anyway given that a higher level output buffer with gzhandler is opened.
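Here is a small sketch of the same two-buffer pattern. To make the effect visible without a browser, it swaps ob_gzhandler for a stand-in handler that upper-cases the output and records it in $sent; that handler and the variable names are assumptions for illustration only.

```php
<?php
$sent = ''; // collects what the outer handler receives

// Outer buffer: strtoupper() is a visible stand-in for ob_gzhandler.
ob_start(function (string $buffer) use (&$sent): string {
    $sent .= strtoupper($buffer);
    return ''; // emit nothing in this demo; ob_gzhandler would return compressed bytes
});

// Inner buffer: no handler, so its contents stay raw.
ob_start();
echo "Hi";
$content = ob_get_contents(); // raw "Hi", untouched by the outer handler
ob_clean();                   // empties the inner buffer only
echo $content;                // write it back

ob_end_flush(); // inner buffer flushes into the outer buffer
ob_end_flush(); // outer buffer runs its handler over everything
```

The inner buffer can be read and cleaned freely, while the outer handler still processes the final output exactly once.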
Maybe you don't have gzip compression enabled/installed on your machine.
Tried your code and got something like that. I don't have gzip installed on my machine, try this:
It's your code with one added condition: if the gzip buffer fails to start, a plain buffer is started instead.
//starting a new output buffer, with a GZIP compression
if (!ob_start("ob_gzhandler")) ob_start();
//this goes into the buffer
echo "Hi";
//grabbing the buffer's content
$content = ob_get_contents();
//cleaning the buffer
ob_clean();
//we're still inside the buffer, show the content again
echo "<pre>"; echo $content; echo "</pre>";
ob_end_flush();
If you get "Hi", maybe gzip is not installed.
Is ob_start() used for output buffering so that the headers are buffered and not sent to the browser? Am I making sense here? If not then why should we use ob_start()?
Think of ob_start() as saying "Start remembering everything that would normally be outputted, but don't quite do anything with it yet."
For example:
ob_start();
echo("Hello there!"); //would normally get printed to the screen/output to browser
$output = ob_get_contents();
ob_end_clean();
There are two other functions you typically pair it with: ob_get_contents(), which basically gives you whatever has been "saved" to the buffer since it was turned on with ob_start(), and then ob_end_clean() or ob_end_flush(), which either stops saving things and discards whatever was saved, or stops saving and outputs it all at once, respectively. (ob_flush() also outputs the buffer, but keeps buffering active.)
I use this so I can break out of PHP with a lot of HTML but not render it. It saves me from storing it as a string which disables IDE color-coding.
<?php
ob_start();
?>
<div>
<span>text</span>
link
</div>
<?php
$content = ob_get_clean();
?>
Instead of:
<?php
$content = '<div>
<span>text</span>
link
</div>';
?>
The accepted answer here describes what ob_start() does - not why it is used (which was the question asked).
As stated elsewhere ob_start() creates a buffer which output is written to.
But nobody has mentioned that it is possible to stack multiple buffers within PHP. See ob_get_level().
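A short sketch of stacking, for illustration. Levels are measured relative to $base, since the starting level may be non-zero if output_buffering is enabled in php.ini; the variable names are made up for this example.

```php
<?php
$base = ob_get_level(); // starting level, possibly non-zero

ob_start();
echo "outer ";
$levelOuter = ob_get_level() - $base; // one buffer deep

ob_start();
echo "inner";
$levelInner = ob_get_level() - $base; // two buffers deep
$inner = ob_get_clean(); // returns "inner" and closes the inner buffer

echo strtoupper($inner); // lands in the outer buffer
$outer = ob_get_clean(); // "outer INNER"; back at the starting level
```

Each buffer only sees what was written while it was the innermost one, which is what makes nested capture safe.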
As to the why....
Sending HTML to the browser in larger chunks gives a performance benefit from reduced network overhead.
Passing the data out of PHP in larger chunks gives a performance and capacity benefit by reducing the number of context switches required.
Passing larger chunks of data to mod_gzip/mod_deflate gives a performance benefit in that the compression can be more efficient.
Buffering the output means that you can still manipulate the HTTP headers later in the code.
Explicitly flushing the buffer after outputting the [head]....[/head] can allow the browser to begin fetching other resources for the page before the HTML stream completes.
Capturing the output in a buffer means that it can be redirected to other functions such as email, or copied to a file as a cached representation of the content.
You have it backwards. ob_start does not buffer the headers, it buffers the content. Using ob_start allows you to keep the content in a server-side buffer until you are ready to display it.
This is commonly used so that pages can send headers 'after' they've 'sent' some content already (i.e., deciding to redirect halfway through rendering a page).
I prefer:
ob_start();
echo("Hello there!");
$output = ob_get_clean(); //Get current buffer contents and delete current output buffer
This is to further clarify JD Isaaks' answer ...
The problem you run into often is that you are using php to output html from many different php sources, and those sources are often, for whatever reason, outputting via different ways.
Sometimes you have literal html content that you want to directly output to the browser; other times the output is being dynamically created (server-side).
The dynamic content is always(?) going to be a string. Now you have to combine this stringified dynamic html with any literal, direct-to-display html ... into one meaningful html node structure.
This usually forces the developer to wrap all that direct-to-display content into a string (as JD Isaak was discussing) so that it can be properly delivered/inserted in conjunction with the dynamic html ... even though you don't really want it wrapped.
But by using ob_## methods you can avoid that string-wrapping mess. The literal content is, instead, output to the buffer. Then in one easy step the entire contents of the buffer (all your literal html), is concatenated into your dynamic-html string.
(My example shows literal html being output to the buffer, which is then added to a html-string ... look also at JD Isaaks example to see string-wrapping-of-html).
<?php // parent.php
//---------------------------------
$lvs_html = "" ;
$lvs_html .= "<div>html</div>" ;
$lvs_html .= gf_component_assembler__without_ob( ) ;
$lvs_html .= "<div>more html</div>" ;
$lvs_html .= "----<br/>" ;
$lvs_html .= "<div>html</div>" ;
$lvs_html .= gf_component_assembler__with_ob( ) ;
$lvs_html .= "<div>more html</div>" ;
echo $lvs_html ;
// 02 - component contents
// html
// 01 - component header
// 03 - component footer
// more html
// ----
// html
// 01 - component header
// 02 - component contents
// 03 - component footer
// more html
//---------------------------------
function gf_component_assembler__without_ob( )
{
$lvs_html = "<div>01 - component header</div>" ; // <table ><tr>" ;
include( "component_contents.php" ) ;
$lvs_html .= "<div>03 - component footer</div>" ; // </tr></table>" ;
return $lvs_html ;
} ;
//---------------------------------
function gf_component_assembler__with_ob( )
{
$lvs_html = "<div>01 - component header</div>" ; // <table ><tr>" ;
ob_start();
include( "component_contents.php" ) ;
$lvs_html .= ob_get_clean();
$lvs_html .= "<div>03 - component footer</div>" ; // </tr></table>" ;
return $lvs_html ;
} ;
//---------------------------------
?>
<!-- component_contents.php -->
<div>
02 - component contents
</div>
The following things are not mentioned in the existing answers: buffer size configuration, HTTP headers, and nesting.
Buffer size configuration for ob_start :
ob_start(null, 4096); // Once the buffer size exceeds 4096 bytes, PHP automatically executes flush, ie. the buffer is emptied and sent out.
The above code improves server performance, as PHP will send bigger chunks of data, for example 4 KB (without the ob_start call, PHP will send each echo to the browser).
If you start buffering without the chunk size (ie. a simple ob_start()), then the page will be sent once at the end of the script.
Output buffering does not affect the HTTP headers; they are processed in a different way. However, thanks to buffering you can still send headers after output has been produced, because that output is still in the buffer.
ob_start(); // turns on output buffering
$foo->bar(); // all output goes only to buffer
ob_clean(); // delete the contents of the buffer, but buffering remains active
$foo->render(); // output goes to buffer
ob_flush(); // send buffer output
$none = ob_get_contents(); // buffer content is now an empty string
ob_end_clean(); // turn off output buffering
Nicely explained here : https://phpfashion.com/everything-about-output-buffering-in-php
This function isn't just for headers. You can do a lot of interesting stuff with this. Example: You could split your page into sections and use it like this:
$someTemplate->selectSection('header');
echo 'This is the header.';
$someTemplate->selectSection('content');
echo 'This is some content.';
You can capture the output that is generated here and add it at two totally different places in your layout.
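The $someTemplate object is not shown in the answer, so the following is only a guess at how such a class might work: a hypothetical SectionTemplate that captures each section with its own output buffer and substitutes the results into {{name}} placeholders in a layout string.

```php
<?php
// Hypothetical minimal template class; the original $someTemplate is not shown.
final class SectionTemplate
{
    private array $sections = [];
    private ?string $current = null;

    // Close the section being captured (if any) and start capturing a new one.
    public function selectSection(string $name): void
    {
        $this->closeCurrent();
        $this->current = $name;
        ob_start();
    }

    // Substitute each captured section into its {{name}} placeholder.
    public function render(string $layout): string
    {
        $this->closeCurrent();
        $replacements = [];
        foreach ($this->sections as $name => $html) {
            $replacements['{{' . $name . '}}'] = $html;
        }
        return strtr($layout, $replacements);
    }

    // Store the buffered output under the current section name.
    private function closeCurrent(): void
    {
        if ($this->current !== null) {
            $previous = $this->sections[$this->current] ?? '';
            $this->sections[$this->current] = $previous . ob_get_clean();
            $this->current = null;
        }
    }
}

$someTemplate = new SectionTemplate();
$someTemplate->selectSection('header');
echo 'This is the header.';
$someTemplate->selectSection('content');
echo 'This is some content.';
$page = $someTemplate->render('<main>{{content}}</main><footer>{{header}}</footer>');
```

Here the sections are emitted in the layout in a different order from the one in which they were generated, which is the point of capturing them first.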
No, you are wrong, but the direction fits ;)
Output buffering buffers the output of a script. That's (in short) everything after echo or print. The thing with headers is that they can only be sent if they have not already been sent. But HTTP says that headers come first in the transmission. So when you output something for the first time (in a request), the headers are sent and you cannot set any other headers afterwards.