Bug in my caching code - PHP

Here's my code:
$cachefile = "cache/ttcache.php";
if (file_exists($cachefile) && ((time() - filemtime($cachefile)) < 900))
{
    include($cachefile);
}
else
{
    ob_start();
    /* resource-intensive loop that outputs
       a listing of the top tags used on the website */
    $fp = fopen($cachefile, 'w');
    fwrite($fp, ob_get_contents());
    fflush($fp);
    fclose($fp);
    ob_end_flush();
}
This code seemed like it worked fine at first sight, but I found a bug, and I can't figure out how to solve it. Basically, it seems that after I leave the page alone for a period of time, the cache file empties (either that, or when I refresh the page, it clears the cache file, rendering it blank). Then the conditional sees the now-blank cache file, sees its age as less than 900 seconds, and pulls the blank cache file's contents in place of re-running the loop and refilling the cache.
I catted the cache file on the command line and saw that it was indeed blank when the problem occurred.
I tried lowering the timeout to 60 seconds to reproduce the problem more often and hopefully get to the bottom of it, but it doesn't seem to happen when I am looking for it, only when I leave the page and come back after a while.
Any help?

In the caching routines that I write, I almost always check the filesize, as I want to make sure I'm not spewing blank data, because I rely on a bash script to clear out the cache.
if(file_exists($cachefile) && (filesize($cachefile) > 1024) && ((time() - filemtime($cachefile)) < 900))
This assumes that your cache file will be larger than 1024 bytes, which it usually will be if it contains anything substantial. Adding a lock file would be useful as well, as noted in the comments above, to avoid multiple processes trying to write to the same cache file.
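A related hardening step (my suggestion, not part of the original answers): write the cache atomically by building the file under a temporary name and rename()-ing it into place, so a reader never sees a half-written or empty file. A minimal sketch, assuming the cache/ directory is writable:
<?php
// Sketch: atomic cache write. rename() is atomic on the same filesystem,
// so readers see either the old cache file or the new one, never a
// truncated one.
$cachefile = "cache/ttcache.php";
ob_start();
/* ...resource-intensive loop that outputs the top tags... */
$content = ob_get_contents();
$tmp = tempnam(dirname($cachefile), 'ttcache_');
if ($tmp !== false && file_put_contents($tmp, $content) !== false) {
    rename($tmp, $cachefile); // atomically replace the cache file
} elseif ($tmp !== false) {
    unlink($tmp); // clean up the temp file on failure
}
ob_end_flush();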

You can double-check the file size with the filesize() function; if it's too small, act as if the cache were stale.
If there's no PHP in the file, you may want to use readfile() for performance reasons, to just spit the file back out to the end user.

PHP - Is fstat() cached?

I'd like to know if fstat() is cached.
The documentation says nothing about it.
(PHP 8.1.12)
EDIT:
I ran a LOT of tests, and I now have an accurate answer.
Introduction:
I created two scripts and ran them both simultaneously (locking the file pointer with an exclusive lock, flock($fp, LOCK_EX),* to prevent them from overwriting each other) and on their own; I printed print_r(fstat($fp)) at various stages of a loop that ended up writing 5 MB of data. I also used var_dump(realpath_cache_get()), which per the documentation "returns an array of realpath cache entries; the keys are original path entries, and the values are arrays of data items, containing the resolved path, expiration date, and other options kept in the cache", to inspect the realpath cache, and I ran tests both with and without clearing the cache via clearstatcache().
What happened:
I found that the only item that gets cached is 'atime' (the time of last access, as a Unix timestamp). But most upsetting of all: it is not put in the realpath cache, so nothing changes if you clear it via clearstatcache(); you can only restart the machine or wait for the cache timeout to clear the data (how long depends on the machine).
Fortunately this doesn't affect me, since I only need the 'size' element of fstat(). Sadly I cannot give you the source code, because I changed it many times over to try all the possibilities.
Thank you to everyone who spent time to help me.
*NOTE: Use flock() in both scripts. If you use it in only one of them, the other script will still be able to write to the file, because it never tries to acquire the lock.
CONCLUSION
The only cached element of fstat() is 'atime', but it is not kept in the realpath cache, so clearstatcache() has no effect on it.
I made a quick demo that appears to show that it is not cached. However, see the edit below, too.
$filename = __DIR__.'/test.txt';
@unlink($filename);
touch($filename);

echo 'STAT - Size before write: '.stat($filename)['size'], PHP_EOL;
file_put_contents($filename, 'test');
echo 'STAT - Size after write: '.stat($filename)['size'], PHP_EOL;
clearstatcache();
echo 'STAT - Size after cache clear: '.stat($filename)['size'], PHP_EOL;

@unlink($filename);
touch($filename);

$fp = fopen($filename, 'wb');
echo 'FSTAT - Size before write: '.fstat($fp)['size'], PHP_EOL;
fwrite($fp, 'test');
echo 'FSTAT - Size after write: '.fstat($fp)['size'], PHP_EOL;
clearstatcache();
echo 'FSTAT - Size after cache clear: '.fstat($fp)['size'], PHP_EOL;
Output:
STAT - Size before write: 0
STAT - Size after write: 0
STAT - Size after cache clear: 4
FSTAT - Size before write: 0
FSTAT - Size after write: 4
FSTAT - Size after cache clear: 4
Edit
Per @Barmar, I ran the test again, this time with just an fstat() call followed by a sleep(10); then I quickly updated the file with vim manually, and then made one final fstat() call (all in the same request), and that one came back as cached.
I then ran that again, this time with clearstatcache() before the final fstat(), and the result did not change. I also tried the tests with both the w and r modes for fopen(), with the same results.
So there does appear to be a cache of some sort, but I don't think it is the stat cache.
As a supplement:
I also ran exactly the test that Chris Haas describes in the comment.
/*
* Open test.txt with a text editor
* Run this script
* Add some characters in the editor and save while the script is running
*/
$file = __DIR__.'/../test/test.txt';
$fp = fopen($file,'r');
$size[] = fstat($fp)['size'];
sleep(10);
$size[] = fstat($fp)['size'];
var_dump($size);
//array(2) { [0]=> int(3) [1]=> int(13) }
Same results for PHP on Win10 and on a small Linux box.
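As a practical aside (my addition, not from the original answers): if you ever need an up-to-date size for an open handle, one common workaround is to seek to the end of the stream and read the position back with ftell(), which resolves the end-of-file through the OS rather than through any cached stat data. A minimal sketch:
<?php
// Sketch: get a fresh size from an open handle by seeking to the end.
function fresh_size($fp)
{
    $pos = ftell($fp);          // remember the current position
    fseek($fp, 0, SEEK_END);    // the OS resolves the real end-of-file
    $size = ftell($fp);
    fseek($fp, $pos, SEEK_SET); // restore the original position
    return $size;
}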

Simple caching code for PHP pages fails - how to invalidate the cache when a form is submitted

<?php
// cache - will work online - not locally
// location and prefix for cache files
define('CACHE_PATH', "siteCache/");
// how long to keep the cache files (hours)
define('CACHE_TIME', 12);

// return location and name for cache file
function cache_file()
{
    return CACHE_PATH . md5($_SERVER['REQUEST_URI']);
}

// display cached file if present and not expired
function cache_display()
{
    $file = cache_file();
    // check that cache file exists and is not too old
    if (!file_exists($file)) return;
    if (filemtime($file) < time() - CACHE_TIME * 3600) return;
    // if so, display cache file and stop processing
    readfile($file);
    exit;
}

// write to cache file
function cache_page($content)
{
    if (false !== ($f = @fopen(cache_file(), 'w'))) {
        fwrite($f, $content);
        fclose($f);
    }
    return $content;
}

// execution stops here if valid cache file found
cache_display();
// enable output buffering and create cache file
ob_start('cache_page');
?>
This is the cache code that I am using on a dynamic website; it lives in the db file, and every page includes it at the top:
<?php session_start();
include("db.php"); ?>
Pages are being cached and it's working, but on form submission, on user login, and on variables passed between pages, nothing happens: old pages keep being displayed. How do I use this caching code so that it works but the site remains functional as well?
I wonder how WordPress plugins do it. WP Super Cache and W3 Total Cache cache everything, yet the blog remains functional. Should I use it selectively on parts of the website?
Like this:
<?php
// TOP of your script
$cachefile = 'cache/'.basename($_SERVER['SCRIPT_URI']);
$cachetime = 120 * 60; // 2 hours

// Serve from the cache if it is younger than $cachetime
if (file_exists($cachefile) && (time() - $cachetime < filemtime($cachefile))) {
    include($cachefile);
    echo "<!-- Cached ".date('jS F Y H:i', filemtime($cachefile))." -->";
    exit;
}
ob_start(); // start the output buffer

// Your normal PHP script and HTML content here

// BOTTOM of your script
$fp = fopen($cachefile, 'w');   // open the cache file for writing
fwrite($fp, ob_get_contents()); // save the contents of output buffer to the file
fclose($fp);                    // close the file
ob_end_flush();                 // send the output to the browser
?>
But that will not work either, because it caches by page URL (whole-page caching), not the selective content from the page.
Please advise. Is there any easy script to do this? Pear::Cache_Lite seems good, but it looks difficult to implement.
Update: I have used Cache_Lite. It's the same: it caches everything, or the included PHP file. There are a few configuration options to play with, but if it is used on a whole page it will also ignore GET, POST, and session data updates, and will show previously cached pages unless they are deleted.
I think you could separate display from logic.
I mean, change the action attribute of the form and point it to a PHP script that does not have the cache logic (and you must check the referer or another parameter, using tokens or sessions or something, to avoid security issues like CSRF).
Another thing I want to point out: you should look to cache only the most visited pages (e.g. the homepage). Generally you don't have a one-size-fits-all solution with caching, and it is better not to worry about pages that don't have speed/load issues. It may also be better to cache the data itself if your speed issues come from a database query (you should profile your application before implementing caching).
Another approach that might work is checking the request method and disabling the cache if it is POST (given that all your forms use the POST method), using $_SERVER['REQUEST_METHOD'] == 'POST'; see the sketch below.
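A minimal sketch of those two ideas together (my own illustration; invalidate_cache() is a hypothetical helper, not from the original code):
<?php
// Sketch: bypass the cache for POST requests and delete the cached copy
// of a page after handling a form submission, so the next GET regenerates it.
define('CACHE_PATH', "siteCache/");

// Hypothetical helper: remove the cached copy of a given URI.
function invalidate_cache($uri)
{
    $file = CACHE_PATH . md5($uri);
    if (file_exists($file)) {
        unlink($file);
    }
}

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // Handle the form submission here (login, save, etc.)...
    // ...then invalidate the pages whose content just changed:
    invalidate_cache($_SERVER['REQUEST_URI']);
} else {
    // GET requests can be served from the cache as before,
    // e.g. via the cache_display()/ob_start('cache_page') pair above.
}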

How to avoid a possible missing cache file in PHP?

I have a simple caching system as
if (file_exists($cache)) {
echo file_get_contents($cache);
// if coming here when $cache is deleting, then nothing to display
}
else {
// PHP process
}
We regularly delete outdated cache files, e.g. deleting all caches after 1 hour. Although this process is very fast, I am thinking that a cache file could be deleted right between the if statement and the file_get_contents() call.
I mean: when the if statement checks the existence of the cache file, it exists; but when file_get_contents() tries to read it, it is no longer there (deleted by the simultaneous cache-deleting process).
file_get_contents() locks the file, protecting it from the delete process while the read is under way. But the file can still be deleted after the if statement has sent execution into the first branch (before file_get_contents() starts).
Is there any approach to avoid this? Should the cache-deleting system work differently?
NOTE: I have not faced any practical problem, as it is not very probable to hit this event; but logically it is possible, and it could happen under heavy load.
Luckily file_get_contents() returns FALSE on error, so you could quick-bake it like:
if (FALSE !== ($buffer = file_get_contents($cache))) {
    echo $buffer;
    return;
}
// PHP process
or similar. It's a bit the quick-and-dirty way, considering you will want to add the @ operator to hide any warnings about non-existent files:
if (FALSE !== ($buffer = @file_get_contents($cache))) {
The other alternative would be to lock, however that might prevent your cache-deletion process from deleting the file while you hold the lock.
What's left, then, is to handle staleness yourself. That means reading the file's creation time in PHP and checking it against the file-deletion schedule (for example, whether it is less than 5 minutes old; 5 minutes is exemplary); then you know whether the file is already stale and due to be replaced with fresh content, in which case you re-create it. Otherwise read the file in, which is probably better done with readfile() instead of file_get_contents() and echo. A sketch of that check follows.
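A minimal sketch of that staleness check (the 5-minute lifetime is exemplary, as in the answer; the constant name is my own):
<?php
// Sketch: treat the cache as stale based on its own mtime instead of
// racing the external deletion process.
define('CACHE_LIFETIME', 300); // 5 minutes, exemplary

$content = @file_get_contents($cache);
if ($content === false || time() - @filemtime($cache) > CACHE_LIFETIME) {
    // Cache missing or stale: regenerate it.
    $content = 'Generated content'; // the expensive PHP process
    file_put_contents($cache, $content);
}
echo $content;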
On failure, file_get_contents returns false, so what about this:
if (($output = file_get_contents($filename)) === false) {
    // Do the processing.
    $output = 'Generated content';
    // Save cache file
    file_put_contents($filename, $output);
}
echo $output;
By the way, you may want to consider using fpassthru, which is more memory-efficient, especially for larger files. Using file_get_contents on large files (> 100 MB), will probably cause problems (depending on your configuration).
<?php
$fp = @fopen($filename, 'rb');
if ($fp === false) {
    // Generate output
} else {
    fpassthru($fp);
}

PHP - Chunked file copy (via FTP) has missing bytes?

So, I'm writing a chunked file transfer script that is intended to copy files--small and large--to a remote server. It almost works fantastically (and did with a 26 byte file I tested, haha) but when I start to do larger files, I notice it isn't quite working. For example, I uploaded a 96,489,231 byte file, but the final file was 95,504,152 bytes. I tested it with a 928,670,754 byte file, and the copied file only had 927,902,792 bytes.
Has anyone else ever experienced this? I'm guessing feof() may be doing something wonky, but I have no idea how to replace it, or test that. I commented the code, for your convenience. :)
<?php
// FTP credentials
$server = CENSORED;
$username = CENSORED;
$password = CENSORED;

// Destination file (where the copied file should go)
$destination = "ftp://$username:$password@$server/ftp/final.mp4";

// The file on my server that we're copying (in chunks) to $destination.
$read = 'grr.mp4';

// If the file we're trying to copy exists...
if (file_exists($read))
{
    // Set a chunk size
    $chunk_size = 4194304;

    // For reading through the file we want to copy to the FTP server.
    $read_handle = fopen($read, 'rb');
    // For appending to the destination file.
    $destination_handle = fopen($destination, 'ab');

    echo '<span style="font-size:20px;">';
    echo 'Uploading.....';

    // Loop through $read until we reach the end of the file.
    while (!feof($read_handle))
    {
        // So Rackspace doesn't think nothing's happening.
        echo PHP_EOL;
        flush();

        // Read a chunk of the file we're copying.
        $chunk = fread($read_handle, $chunk_size);

        // Write the chunk to the destination file.
        fwrite($destination_handle, $chunk);
        sleep(1);
    }

    echo 'Done!';
    echo '</span>';
}
fclose($read_handle);
fclose($destination_handle);
?>
EDIT
I (may have) confirmed that the script is dying at the end somehow, and not corrupting the files. I created a simple file with each line corresponding to the line number, up to 10000, then ran my script. It stopped at line 6253. However, the script is still returning "Done!" at the end, so I can't imagine it's a timeout issue. Strange!
EDIT 2
I have confirmed that the problem exists somewhere in fwrite(). By echoing $chunk inside the loop, the complete file is returned without fail. However, the written file still does not match.
EDIT 3
It appears to work if I add sleep(1) immediately after the fwrite(). However, that makes the script take a million years to run. Is it possible that PHP's append has some inherent flaw?
EDIT 4
Alright, I've further isolated the problem to being an FTP problem, somehow. When I run this file copy locally, it works fine. However, when I write to the ftp:// destination, bytes go missing. This occurs despite the binary flags on both fopen() calls. What could possibly be causing this?
EDIT 5
I found a fix. The modified code is above--I'll post an answer on my own as soon as I'm able.
I found a fix, though I'm not sure exactly why it works. Simply sleeping after writing each chunk fixes the problem. I upped the chunk size quite a bit to speed things up. Though this is an arguably bad solution, it should work for my uses. Thanks anyway, guys!
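As an alternative (my suggestion, not from the thread): PHP's FTP extension uploads the whole stream in one call and reports failure explicitly, which sidesteps whatever buffering behavior the ftp:// stream wrapper shows here. A minimal sketch, assuming the same credentials:
<?php
// Sketch: upload via the FTP extension instead of the ftp:// wrapper.
$conn = ftp_connect($server);
if ($conn === false || !ftp_login($conn, $username, $password)) {
    die('FTP connection failed');
}
ftp_pasv($conn, true); // passive mode is friendlier to firewalls

$read_handle = fopen('grr.mp4', 'rb');
// FTP_BINARY avoids any newline translation during the transfer.
if (ftp_fput($conn, '/ftp/final.mp4', $read_handle, FTP_BINARY)) {
    echo 'Done!';
} else {
    echo 'Upload failed';
}
fclose($read_handle);
ftp_close($conn);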

How can I optimize this simple PHP script?

This first script gets called several times for each user via an AJAX request. It calls another script on a different server to get the last line of a text file. It works fine, but I think there is a lot of room for improvement. I am not a very good PHP coder, so I am hoping that with the help of the community I can optimize this for speed and efficiency:
AJAX POST Request made to this script
<?php session_start();
$fileName = $_POST['textFile'];
$result = file_get_contents($_SESSION['serverURL']."fileReader.php?textFile=$fileName");
echo $result;
?>
It makes a GET request to this external script which reads a text file
<?php
$fileName = $_GET['textFile'];
if (file_exists('text/'.$fileName.'.txt')) {
    $lines = file('text/'.$fileName.'.txt');
    echo $lines[sizeof($lines)-1];
} else {
    echo 0;
}
?>
I would appreciate any help. I think there is more improvement that can be made in the first script. It makes an expensive function call (file_get_contents); well, at least I think it's expensive!
This script should limit the locations and file types that it's going to return.
Think of somebody trying this:
http://www.yoursite.com/yourscript.php?textFile=../../../etc/passwd (or something similar)
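A minimal sketch of that kind of restriction (my own illustration, not from the answer):
<?php
// Sketch: restrict the request to a known directory and extension.
$fileName = basename($_GET['textFile']); // strips any ../ components
$path = 'text/'.$fileName.'.txt';

// realpath() resolves the final location; refuse anything that
// escapes the text/ directory.
$base = realpath('text');
$real = realpath($path);
if ($real === false || strpos($real, $base) !== 0) {
    http_response_code(400);
    exit;
}
echo file_get_contents($real);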
Try to find out where the delays occur: does the HTTP request take long, or is the file so large that reading it takes long?
If the request is slow, try caching results locally.
If the file is huge, then you could set up a cron job that extracts the last line of the file at regular intervals (or on every change) and saves it to a file that your other script can access directly; see the sketch below.
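A minimal sketch of such a cron job (the file names are illustrative, not from the question):
<?php
// Sketch: run from cron, e.g. every minute:
//   * * * * * php /path/to/extract_last_line.php
// Writes the last line of the big file to a tiny sidecar file that
// the web-facing script can read cheaply.
$source  = 'text/biglog.txt';  // illustrative source file
$sidecar = 'text/biglog.last'; // tiny file holding just the last line

$lines = file($source, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($lines !== false && count($lines) > 0) {
    file_put_contents($sidecar, end($lines), LOCK_EX);
}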
readfile() is your friend here: it reads a file on disk and streams it straight to the client.
script 1:
<?php
session_start();
// added basic argument filtering
$fileName = preg_replace('/[^A-Za-z0-9_]/', '', $_POST['textFile']);
$fileName = $_SESSION['serverURL'].'text/'.$fileName.'.txt';
if (file_exists($fileName)) {
    // script 2 could be pasted here

    // for the entire file:
    //readfile($fileName);

    // for just the last line:
    $lines = file($fileName);
    echo $lines[count($lines)-1];
    exit(0);
}
echo 0;
?>
This script could be improved further by adding caching to it, but that is more complicated. Very basic caching could look like this:
script 2:
<?php
$lastModifiedTimeStamp = filemtime($fileName);

if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
    $browserCachedCopyTimestamp = strtotime(preg_replace('/;.*$/', '', $_SERVER['HTTP_IF_MODIFIED_SINCE']));
    if ($browserCachedCopyTimestamp >= $lastModifiedTimeStamp) {
        header("HTTP/1.0 304 Not Modified");
        exit(0);
    }
}

header('Content-Length: '.filesize($fileName));
header('Expires: '.gmdate('D, d M Y H:i:s \G\M\T', time() + 604800)); // (3600 * 24 * 7)
header('Last-Modified: '.date('D, d M Y H:i:s \G\M\T', $lastModifiedTimeStamp));
?>
First things first: do you really need to optimize this? Is it the slowest part of your use case? Have you used Xdebug to verify that? If you've done that, read on.
You cannot really optimize the first script usefully: if you need an HTTP request, you need an HTTP request. Skipping the HTTP request could be a performance gain, though, if it is possible (i.e. if the first script can access the same files the second script operates on).
As for the second script: reading the whole file into memory does look like some overhead, but that is negligible if the files are small. The code looks very readable; I would leave it as-is in that case.
If your files are big, however, you might want to use fopen() and its friends fseek() and fread():
# Do not forget to sanitize the file name here!
# An attacker could demand the last line of your password
# file or similar! ($fileName = '../../passwords.txt')
$filePointer = fopen($fileName, 'r');
$i = 1;
$chunkSize = 200;
# Read 200 byte chunks from the end of the file and check if the
# chunk contains a newline
do {
    fseek($filePointer, -($i * $chunkSize), SEEK_END);
    $line = fread($filePointer, $i++ * $chunkSize);
} while (($pos = strrpos($line, "\n")) === false);
return substr($line, $pos + 1);
If the files are unchanging, you should cache the last line.
If the files are changing and you control the way they are produced, it might or might not be an improvement to reverse the order lines are written, depending on how often a line is read over its lifetime.
Edit:
Your server could figure out what it wants to write to its log, put it in memcache, and then write it to the log. The request for the last line could then be fulfilled from memcache instead of a file read; a sketch of that idea follows.
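A minimal sketch using the Memcached extension (the host, port, and key scheme are my assumptions):
<?php
// Sketch: the writer caches the last line as it appends to the log;
// the AJAX endpoint reads from memcache and falls back to the file.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

function log_line(Memcached $mc, $logFile, $line)
{
    file_put_contents($logFile, $line . PHP_EOL, FILE_APPEND | LOCK_EX);
    $mc->set('lastline:' . $logFile, $line); // cache for readers
}

function last_line(Memcached $mc, $logFile)
{
    $line = $mc->get('lastline:' . $logFile);
    if ($line === false) {          // cache miss: fall back to the file
        $lines = file($logFile);
        $line = $lines ? end($lines) : '';
    }
    return $line;
}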
The most probable source of delay is that cross-server HTTP request. If the files are small, the cost of fopen/fread/fclose is nothing compared to the whole HTTP request.
(Not long ago I used HTTP to retrieve images to dynamically generate image-based menus. Replacing the HTTP request with a local file read reduced the delay from seconds to tenths of a second.)
I assume that the obvious solution of accessing the file server filesystem directly is out of the question. If not, then it's the best and simplest option.
If not, you could use caching. Instead of getting the whole file, you just issue a HEAD request and compare the timestamp to a local copy.
Also, if you are ajax-updating a lot of clients based on the same files, you might consider looking at using comet (meteor, for example). It's used for things like chats, where a single change has to be broadcasted to several clients.
