PHP performance: file_get_contents() vs readfile() vs cat

I am doing some benchmarking with PHP file reading functions just for my overall knowledge.
So I tested three different ways to read the whole content of a file that I thought would be very fast.
file_get_contents(), well-known for its very high performance
readfile(), known to be a very good alternative to file_get_contents() when it comes to outputting the data directly to stdout
exec('cat filename'), a very handy and fast UNIX command
So here is my benchmarking code. Note that I wrapped readfile() in output buffering to capture its direct output, which would otherwise totally falsify the results.
<?php
/* Using a big PNG file to benchmark with */
/* file_get_contents() benchmark */
$start = microtime(true);
$foo = file_get_contents("bla.png");
$end = microtime(true) - $start;
echo "file_get_contents() time: " . $end . "s\n";
/* readfile() benchmark */
ob_start();
$start = microtime(true);
readfile('bla.png');
$end = microtime(true) - $start;
ob_end_clean();
echo "readfile() time: " . $end . "s\n";
/* exec('cat') benchmark */
$start = microtime(true);
$bar = exec('cat bla.png');
$end = microtime(true) - $start;
echo "exec('cat filename') time: " . $end . "s\n";
?>
I have run this code several times to confirm the results, and every time I got the same ordering. Here is an example of one run:
$ php test.php
file_get_contents() time: 0.0006861686706543s
readfile() time: 0.00085091590881348s
exec('cat filename') time: 0.0048539638519287s
As you can see, file_get_contents() comes first, then readfile(), and finally cat.
As for cat, even though it is a UNIX command (so fast and everything :)), I understand that spawning a separate binary explains its relatively high time.
But the thing I have some difficulty understanding is why file_get_contents() is faster than readfile(); the latter is about 1.3 times slower after all.
Both functions are built-in and therefore pretty well optimized, and since I enabled output buffering, readfile() is not writing the data to stdout; just like file_get_contents(), it ends up putting the data in RAM.
I am looking for a technical, low-level explanation of the pros and cons of file_get_contents() and readfile(), beyond the fact that one is designed to write directly to stdout whereas the other allocates memory and returns the data as a string.
Thanks in advance.

file_get_contents() only loads the data from the file into memory, while both readfile() and cat also output the data, so they simply perform more operations.
If you want to compare file_get_contents() to the others, add echo before it.
Also, you are not freeing the memory allocated for $foo. There is a chance that if you move the file_get_contents() test last, you will get a different result.
Additionally, you are using output buffering for only one of the three tests, which also causes a difference; just wrap the other functions in output buffering as well (as in the sketch below) to remove that difference.
When comparing different functions, the rest of the code should be identical, otherwise you are open to all kinds of influences.
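For illustration, here is a rough sketch (my own, not from the question) of how all three approaches could be timed under identical output-buffering conditions; the bench() helper is hypothetical, and passthru() replaces exec() so that the full cat output is actually produced:
<?php
/* Hypothetical helper: run each candidate inside the same output-buffering
   wrapper and discard the captured output, so all three pay the same cost. */
function bench($label, callable $fn) {
    ob_start();
    $start = microtime(true);
    $fn();
    $elapsed = microtime(true) - $start;
    ob_end_clean();
    echo $label . " time: " . $elapsed . "s\n";
}

$file = 'bla.png'; // assumed test file

bench('file_get_contents()', function () use ($file) { echo file_get_contents($file); });
bench('readfile()',          function () use ($file) { readfile($file); });
// passthru() streams the whole cat output, unlike exec() which returns only the last line
bench("exec('cat filename')", function () use ($file) { passthru('cat ' . escapeshellarg($file)); });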

file_get_contents() is generally the better choice when you want to cache or manipulate the data, because it returns the file contents as a string you can keep in memory, whereas readfile() reads the file one chunk at a time and writes those chunks directly to the output buffer, so you never get a string to work with. (Opcache is not a factor here: it caches compiled PHP scripts, not file data.)
If in some case you cannot use file_get_contents(), you can still use PHP's output buffering mechanism to capture what readfile() emits before it is sent to the client: start an output buffer with ob_start() before calling readfile(), then retrieve and discard the buffer with ob_get_clean(). This way the contents of the file end up in a string that you can cache or reuse.
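A minimal sketch of that capture pattern, assuming the same bla.png test file:
<?php
ob_start();                 // start capturing output
readfile('bla.png');        // readfile() now writes into the buffer, not to the client
$data = ob_get_clean();     // grab the buffered bytes and drop the buffer

// $data is now an ordinary string you can cache, transform or send later
echo strlen($data) . " bytes captured\n";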

Related

Is stream_get_contents lower level and faster than file_get_contents?

From a comment to this answer I read that stream_get_contents() is "low-level" compared to file_get_contents(). However, according to the manual, stream_get_contents() is
Identical to file_get_contents(), except that stream_get_contents() operates on an already open stream resource and returns the remaining contents in a string, up to maxlength bytes and starting at the specified offset.
Which statement is correct?
Is stream_get_contents really lower level and faster?
Specifically I am interested in reading local files from HD.
I'm late here, but it might help others.
file_get_contents() loads the file contents into memory. The data sits there until the program calls echo, at which point it is delivered to the output buffer.
A good usage example is:
echo file_get_contents('file.txt');
stream_get_contents() delivers the content on an already open stream. An example is this:
$handle = fopen('file.txt', 'r'); // open for reading; 'w+' would truncate the file first
echo stream_get_contents($handle);
You can see that stream_get_contents() uses an existing stream created by fopen() to get the contents as a string.
file_get_contents() is generally the preferred way, as it doesn't depend on an open stream and is efficient with memory (it uses memory-mapping techniques where the OS supports them). When reading from external sites, you can also set HTTP headers for the request (see https://www.php.net/manual/en/function.file-get-contents.php for more info).
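As an illustration of the HTTP-header point (the URL and header values are placeholders, not from the original answer):
<?php
$context = stream_context_create(array(
    'http' => array(
        'method'  => 'GET',
        'header'  => "User-Agent: my-script/1.0\r\nAccept: text/html\r\n",
        'timeout' => 5,
    ),
));

// the third parameter passes the context with our custom headers
$html = file_get_contents('https://example.com/', false, $context);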
For larger files/resources, working through a stream may be preferable, because you can consume the content in chunks (with fread(), or by passing a maxlength to stream_get_contents()) instead of having file_get_contents() dump the entire data into memory at once.
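For example, a sketch of chunked reading for a large local file; the file name and chunk size are arbitrary:
<?php
$handle = fopen('large-file.log', 'rb');
$bytes  = 0;

while (!feof($handle)) {
    $chunk  = fread($handle, 8192);   // read 8 KiB at a time
    $bytes += strlen($chunk);         // ...or hash/parse/forward the chunk here
}

fclose($handle);
echo "processed $bytes bytes\n";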

PHP: How to solve ob_start() in combination with imagepng() issue?

I use the following code to create an image and encode it to base64. There is no direct output of the image.
ob_start(); // catching the output buffer
imagepng($imgSignature);
$base64Signature=base64_encode(ob_get_contents());
ob_end_clean();
ob_start() recently started throwing error 500 and I have trouble figuring out the issue. The server uses PHP 5.4.11. I really don't know if it was running the same version when I installed the script, or if the memory runs full. I know that ob_start() has changed throughout the PHP versions. I really have a hard time wrapping my head around this. Is the script correct for PHP 5.4.11?
I really appreciate any help.
I'm not sure how to solve your issue with ob_start(), but I have an alternative for what you are doing that doesn't involve output buffers.
$tmpFile = tempnam(sys_get_temp_dir(), 'sig'); // write the PNG to a temp file instead of the output buffer
imagepng($imgSignature, $tmpFile);
$base64Signature = base64_encode(file_get_contents($tmpFile));
unlink($tmpFile);
This is basically saving the PNG image to a temporary file, reading it back and deleting it, which gives the same result without touching the output buffer. (A plain php://memory path won't work for this, because every time that wrapper is opened you get a new, empty stream.)
My theory about your error:
At some point in your code, you will have this image stored multiple times in memory: in $imgSignature, in the internal buffer you created with ob_start(), in the string you read with ob_get_contents(), and in the resulting value of base64_encode(), pretty much all in one line. God only knows how much memory it's using, not to mention you probably allocated more resources earlier while building this image.
It is important not to have too much stuff allocated at the same time, especially when dealing with memory-consuming resources like images. If you unset() or overwrite variables you no longer need, you allow the garbage collector to dispose of those unreferenced resources and free the memory.
For instance, you can change the way this piece of code was written to this:
ob_start();
imagepng($imgSignature);
imagedestroy($imgSignature);
$data = ob_get_contents();
ob_end_clean();
$data = base64_encode($data);
I dropped $imgSignature as soon as I didn't need it anymore, ended and cleaned my buffer as soon as I was done getting what I wanted from it, and then disposed of the raw $data by overwriting it with the base64-encoded version that was really what I wanted.
Now this will use significantly less memory. If you extend this to the rest of your code, or do it at least to the parts that use a lot of memory like the images you loaded or created with the GD2 lib, it should optimize the memory usage of your script giving you that extra space you need.

Output Buffering? Why not?

I found a few articles on the Internet that suggest using the ob_* functions; all of them emphasize the benefits, and none mention any downsides of the functions.
My question is what are the downsides of using ob_ functions, or setting ini_set('output_buffering', '1'); ?
The cons of using output buffering entirely depend on the context of your usage.
One of the biggest cons of output buffering is that your runtime error messages or warnings may get suppressed, so you may sometimes end up with erroneous data.
Consider this example:
<?php
function render_template() {
    ob_start();

    // Do some processing
    fetch_template_and_render();
    do_render();

    // end capture
    $output = ob_get_clean();
    return $output;
}

memcache::set( $some_key, render_template() );
?>
If either fetch_template_and_render() or do_render() throws a runtime error, it will get dumped into your output and, in this example, will eventually end up in the database or cache.
Here are two snippets that demonstrate what I mean, which you can try for yourself:
#1
<?php
echo 1/0;
?>
outputs
Warning: Division by zero on line 1
#2
<?php
ob_start();
echo 1/0;
$var = ob_get_clean();
?>
outputs nothing.
To avoid such cases, you will need to be diligent about error checking and take precautions.
When used diligently, ob_* functions are very powerful and super useful.
There are no major drawbacks, with proper implementation, to the use of output buffering.
Output buffering can allow errors/warnings/notices (except stop errors) to appear in the output without being readily apparent. This is typically resolved with proper error checking, better configuration of the PHP environment, and the implementation of a good error handler, such as one that converts errors to ErrorExceptions which can be caught with try/catch (see Whoops! as an example of an error handler using ErrorExceptions, or the sketch below).
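A minimal sketch of that idea, assuming you convert errors to ErrorException yourself instead of pulling in a library:
<?php
set_error_handler(function ($severity, $message, $file, $line) {
    // turn every warning/notice into an exception so it cannot hide inside a buffer
    throw new ErrorException($message, 0, $severity, $file, $line);
});

ob_start();
try {
    trigger_error('template rendering failed', E_USER_WARNING); // now throws
    $output = ob_get_clean();
} catch (ErrorException $e) {
    ob_end_clean();              // discard the partial, broken output
    error_log($e->getMessage());
    $output = '';                // fall back instead of caching garbage
}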
Memory can possibly be a drawback, but the output size is typically insignificant for most scripts. An exception may be when sending large amounts of data, such as using fpassthru() to deliver file content. This can be resolved by turning off output buffering (ob_end_clean() or ob_end_flush()) before writing that content to the output.
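A rough sketch of that fpassthru() case, with buffering switched off first (the path and headers are placeholders):
<?php
$path = 'large-download.zip';

while (ob_get_level() > 0) {
    ob_end_clean();                     // drop every active output buffer
}

header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($path));

$fh = fopen($path, 'rb');
fpassthru($fh);                         // stream straight to the client, without buffering it all
fclose($fh);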
Memory consumption is the most important drawback. I've recently built a PHP script that outputs a large chunk of XML data of several megabytes. The framework this 'page' was part of used output buffering. With output buffering, you need a memory buffer large enough to contain all the data. In my case it wasn't and the script failed.
If you output the data directly to the client, you don't have this problem. This is especially important in cases like this and when passing files through. When generating a 'normal' HTML page you will likely not fill the whole buffer, although you still need a lot of memory if you have many simultaneous requests.
Without buffering, the data is gone and doesn't trouble your server anymore. As long as the data is buffered, it can be changed or flushed, but actually puts a load on your server.
Here's a pretty good use for ob_ functions:
ob_start("ob_gzhandler");
Provided that the zlib extension is enabled in PHP, that should ensure your output is gzip-compressed. It noticeably speeds up page transfers for large pages.

Include -- using a string rather than a filename

I'd like to run include on a string rather than a file, but I am unaware of how to achieve this.
//This is the desired functionality
include($filename);
//But I want to do something like this instead.
$file_contents = getFileFromCacheOrSomewhereElse($filename);
include($file_contents); // Doesn't work...
eval($file_contents); // Also incorrect.
Please note: "eval" is not the same as include -- "include" echos out the contents of the file (and executes any PHP tags) while "eval" executes the string as PHP code.
An example use case is loading a template file from Memcache (as a string), then running include on that string, rather than running include and relying on PHP filecache.
If you can turn on the allow_url_fopen and allow_url_include php.ini settings, then an alternative is the data stream wrapper (manual).
include 'data:text/plain,' . urlencode($file_contents);
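A slightly fuller sketch of the same idea; the base64 variant avoids having to URL-encode the payload, and the example string is obviously a placeholder:
<?php
// requires allow_url_fopen and allow_url_include to be enabled in php.ini,
// which has real security implications, so only do this with trusted strings
$file_contents = '<?php echo "hello from a string\n";';

include 'data://text/plain;base64,' . base64_encode($file_contents);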
eval("?>" . $file_contents . "<?php ");
does it.
Storing PHP code in memcache is not the best idea.
And eval'ing it afterwards is even worse.
Any opcode cache, such as APC or eAccelerator, will cache your PHP files on the fly with no strange efforts like this, and will even parse them for faster execution.
EDIT. Given the voting results after all these years, I assume that this question is attracting only noobs, who have the same strange whim. So I have to repeat: although it defeats your brilliant idea,
just leave your includes as is
They will be cached much better and executed much faster by the internal PHP's opcode cache.

How to write to a file in a large PHP application (multiple questions)

What is the best way to write to files in a large PHP application? Let's say there are lots of writes needed per second. What is the best way to go about this?
Could I just open the file and append the data, or should I open, lock, write and unlock?
What will happen if the file is being worked on and other data needs to be written? Will that activity be lost, or will it be saved? And if it is saved, will it halt the application?
If you have been, thank you for reading!
Here's a simple example that highlights the danger of simultaneous writes:
<?php
for($i = 0; $i < 100; $i++) {
    $pid = pcntl_fork();
    //only spawn more children if we're not a child ourselves
    if(!$pid)
        break;
}

$fh = fopen('test.txt', 'a');

//The following is a simple attempt to get multiple threads to start at the same time.
$until = round(ceil(time() / 10.0) * 10);
echo "Sleeping until $until\n";
time_sleep_until($until);

$myPid = posix_getpid();
//create a line starting with pid, followed by 10,000 copies of
//a "random" char based on pid.
$line = $myPid . str_repeat(chr(ord('A')+$myPid%25), 10000) . "\n";

for($i = 0; $i < 1; $i++) {
    fwrite($fh, $line);
}

fclose($fh);
echo "done\n";
If appends were safe, you should get a file with 100 lines, all of which are roughly 10,000 chars long and begin with an integer. And sometimes, when you run this script, that's exactly what you'll get. Sometimes, however, a few appends will conflict and the lines will get mangled.
You can find corrupted lines with grep '^[^0-9]' test.txt
This is because file append is only atomic if:
You make a single fwrite() call
and that fwrite() is smaller than PIPE_BUF (somewhere around 1-4k)
and you write to a fully POSIX-compliant filesystem
If you make more than a single call to fwrite during your log append, or you write more than about 4k, all bets are off.
Now, as to whether or not this matters: are you okay with having a few corrupt lines in your log under heavy load? Honestly, most of the time this is perfectly acceptable, and you can avoid the overhead of file locking.
I do have a high-performance, multi-threaded application where all threads write (append) to a single log file. So far I have not noticed any problems with that: each thread writes multiple times per second and nothing gets lost. I think just appending to a huge file should be no issue. But if you want to modify already existing content, especially with concurrency, I would go with locking, otherwise a big mess can happen...
If concurrency is an issue, you should really be using databases.
If you're just writing logs, maybe you should take a look at the syslog() function, since syslog provides an API for exactly this.
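A minimal syslog() sketch (identifier and facility are arbitrary):
<?php
openlog('myapp', LOG_PID, LOG_LOCAL0);        // let the system logger handle concurrent writers
syslog(LOG_INFO, 'order 1234 processed');
closelog();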
You could also delegate writes to a dedicated backend and do the job in an asynchronous manner.
These are my 2p.
Unless a single file is needed for a specific reason, I would avoid appending everything to one huge file. Instead, I would rotate the file by time and size; a couple of configuration parameters (wrap_time and wrap_size) could be defined for this.
Also, I would probably introduce some buffering to avoid waiting for the write operation to complete.
PHP is probably not the most suitable language for this kind of operation, but it is still possible.
Use flock()
See this question
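A minimal locked-append sketch, assuming a plain log file:
<?php
$fh = fopen('app.log', 'ab');
if ($fh !== false) {
    if (flock($fh, LOCK_EX)) {                // blocks until we hold the exclusive lock
        fwrite($fh, date('c') . " something happened\n");
        fflush($fh);                          // push the data out before releasing the lock
        flock($fh, LOCK_UN);
    }
    fclose($fh);
}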
If you just need to append data, PHP should be fine with that, as the filesystem should take care of simultaneous appends.
