I have a very simple question: what is the best way to download a file in PHP, but only if a local version has been downloaded more than 5 minutes ago?
In my actual case I would like to get data from a remotely hosted csv file, for which I currently use
$file = file_get_contents($url);
without any local copy or caching. What is the simplest way to convert this into a cached version, where the end result doesn't change ($file stays the same), but a local copy is used if it was fetched recently (say, within the last 5 minutes)?
Use a local cache file, and just check the existence and modification time on the file before you use it. For example, if $cache_file is a local cache filename:
if (file_exists($cache_file) && (filemtime($cache_file) > (time() - 60 * 5 ))) {
// Cache file is less than five minutes old.
// Don't bother refreshing, just use the file as-is.
$file = file_get_contents($cache_file);
} else {
// Our cache is out-of-date, so load the data from our remote server,
// and also save it over our cache for next time.
$file = file_get_contents($url);
file_put_contents($cache_file, $file, LOCK_EX);
}
(Untested, but based on code I use at the moment.)
Either way through this code, $file ends up as the data you need, and it'll either use the cache if it's fresh, or grab the data from the remote server and refresh the cache if not.
EDIT: I understand a bit more about file locking since I wrote the above. It might be worth having a read of this answer if you're concerned about the file locking here.
If you're concerned about locking and concurrent access, I'd say the cleanest solution would be to file_put_contents to a temporary file and then rename() it over $cache_file, which should be an atomic operation, i.e. $cache_file will either hold the old contents or the full new contents, never a half-written file.
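For completeness, a minimal sketch of that temp-file-then-rename refresh (untested; it assumes $url and $cache_file from the snippet above and a writable cache directory):

// Fetch fresh data and swap it into place atomically.
$file = file_get_contents($url);
$tmp  = tempnam(dirname($cache_file), 'cache_');   // temp file on the same filesystem
file_put_contents($tmp, $file, LOCK_EX);
rename($tmp, $cache_file);   // atomic on POSIX filesystems: readers see old or new, never partial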
Try phpFastCache; it supports file caching, and you don't need to write your own cache class. It's easy to use on shared hosting and on a VPS.
Here is example:
<?php
// change files to memcached, wincache, xcache, apc, files, sqlite
$cache = phpFastCache("files");
$content = $cache->get($url);
if($content == null) {
$content = file_get_contents($url);
// 300 = 5 minutes
$cache->set($url, $content, 300);
}
// use your $content here
echo $content;
Here is a simple version which also passes a Windows User-Agent string to the remote host, so you don't look like a trouble-maker sending requests without proper headers.
<?php
function getCacheContent($cachefile, $remotepath, $cachetime = 120){
// Generate the cache version if it doesn't exist or it's too old!
if( ! file_exists($cachefile) OR (filemtime($cachefile) < (time() - $cachetime))) {
$options = array(
'method' => "GET",
'header' => "Accept-language: en\r\n" .
"User-Agent: Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)\r\n"
);
$context = stream_context_create(array('http' => $options));
$contents = file_get_contents($remotepath, false, $context);
file_put_contents($cachefile, $contents, LOCK_EX);
return $contents;
}
return file_get_contents($cachefile);
}
If you are using a database system of any type, you could cache this file there. Create a table for cached information, and give it at minimum the following fields:
An identifier; something you can use to retrieve the file the next time you need it. Probably something like a file name.
A timestamp from the last time you downloaded the file from the URL.
Either a path to the file, where it's stored in your local file system, or use a BLOB type field to just store the contents of the file itself in the database. I would recommend just storing the path, personally. If the file was very large, you definitely wouldn't want to put it in the database.
Now, when you run the script above the next time, first check the database for the identifier and pull the timestamp. If the difference between the current time and the stored timestamp is greater than 5 minutes, pull from the URL and update the database. Otherwise, load the file from the database.
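A rough sketch of that approach using SQLite via PDO (untested; the table and column names and the 5-minute window are made up for illustration, and $url stands for the remote file from the question):

$db = new PDO('sqlite:' . __DIR__ . '/cache.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS file_cache (
    identifier TEXT PRIMARY KEY,  -- e.g. the URL or a file name
    fetched_at INTEGER NOT NULL,  -- unix timestamp of the last download
    local_path TEXT NOT NULL      -- where the downloaded copy lives
)');

$id  = $url;  // use the URL itself as the identifier
$row = $db->query('SELECT fetched_at, local_path FROM file_cache WHERE identifier = ' . $db->quote($id))
          ->fetch(PDO::FETCH_ASSOC);

if ($row && (time() - $row['fetched_at']) < 300) {
    $file = file_get_contents($row['local_path']);   // cached copy is fresh enough
} else {
    $file = file_get_contents($url);                 // refresh from the remote server
    $path = __DIR__ . '/cache_' . md5($id) . '.dat';
    file_put_contents($path, $file, LOCK_EX);
    $db->prepare('REPLACE INTO file_cache (identifier, fetched_at, local_path) VALUES (?, ?, ?)')
       ->execute([$id, time(), $path]);
}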
If you don't have a database setup, you could do the same thing just using files, wherein one file, or field in a file, would contain the timestamp from when you last downloaded the file.
First, you might want to check the design pattern: Lazy loading.
The implementation should change so that it always loads the file from the local cache.
If the local cache does not exist, or its modification time is more than 5 minutes old, fetch the file from the server first.
The pseudo code looks like this:
$time = filemtime($local_cache);
if ($time === false || (time() - $time) > 300) {  // 300 seconds = 5 minutes
    fetch_localcache($url);  # You have to implement this yourself
}
$file = file_get_contents($local_cache);
As a best practice, derive the cache key from the file itself:
$cacheKey = md5_file('file.php');
You can save a copy of the file on the first hit, then check the local file's last-modification timestamp with filemtime() on subsequent hits.
You would wrap it in a cache-like method:
function getFile($cacheFile, $url) {
    // logic borrowed from Peter M's pseudo code below
    if (file_exists($cacheFile) && filemtime($cacheFile) > time() - 5 * 60) {
        // cache is fresh: use the local copy
        $file = file_get_contents($cacheFile);
    } else {
        // cache is missing or stale: refresh it from the remote server
        $file = file_get_contents($url);
        file_put_contents($cacheFile, $file, LOCK_EX);
    }
    return $file;
}
I think you want some (pseudo code) logic like:
if ($local_file exists) {
    if ($local_file time stamp older than 5 minutes) {
        $file = file_get_contents($url)
        save $file to $local_file
    } else {
        $file = file_get_contents($local_file)
    }
} else {
    $file = file_get_contents($url)
    save $file to $local_file
}
use $file
Related
I make use of an external XML file that I can get from a certain URL. Now there is a problem with getting the XML file: if you request it too many times, you don't get anything at all, presumably because the number of requests is limited.
Is it possible to download the XML file via PHP once a day, to limit the requests to the external server?
I have checked my options, and cron is the most commonly suggested solution to this problem. But I want to do this via PHP if possible, because I don't have the server access needed to set up cron.
Does anyone have experience with downloading an XML file to your own server, using that copy, and refreshing it daily to limit the requests?
I have this code to get the actual XML file:
$xml = file_get_contents("my-xml-file-url-external");
file_put_contents("my-path-to-save-xml-file", $xml);
But how can I make sure this gets called every day?
You can check the last modified time (see the filemtime() documentation) of the file you write to, and if it's more than a day old (or non-existent) overwrite it:
$cacheFile = "file.xml";
if (!file_exists($cacheFile) || filemtime($cacheFile) < time() - 86400)
{
$xml = file_get_contents("my-xml-file-url-external");
file_put_contents($cacheFile, $xml);
} else {
$xml = file_get_contents($cacheFile);
}
I have a script that re-writes a file every few hours. This file is inserted into end users' HTML via a PHP include.
How can I check whether my script is, at this exact moment, working on (i.e. rewriting) the file while it is being served to a user for display? Is it even an issue: what will happen if they access the file at the same time, what are the odds, and will the user just have to wait until the script has finished its work?
Thanks in advance!
More on the subject...
Is this a way forward using file_put_contents and LOCK_EX?
when the script saves its data every now and then:
file_put_contents("text", $content, LOCK_EX);
and when the user opens the page:
if (file_exists("text")) {
function include_file() {
$file = fopen("text", "r");
if (flock($file, LOCK_EX)) {
include_file();
}
else {
echo file_get_contents("text");
}
}
} else {
echo 'no such file';
}
Could anyone advise me on the syntax: is this a proper way to call include_file() after the condition, and how can I limit the number of such calls?
I guess this solution is also an option, apart from the same recursive call to include_file(); would it even work?
function include_file() {
$time = time();
$file = filectime("text");
if ($file + 1 < $time) {
echo "good to read";
} else {
echo "have to wait";
include_file();
}
}
To check whether the file is currently being written, you can use the filectime() function to get the time the file was last changed.
You can store the current timestamp in a variable at the top of your script, and whenever you need to access the file, compare that timestamp with the filectime() of the file. If the file's change time is newer, you are in the scenario where you have to wait for the file to be written, and you can log that in a database or another file.
To prevent this scenario from happening, you can change the script that writes the file so that it first creates a temporary file and, once it is done, replaces (moves or renames) the original file with the temporary one. This operation takes much less time than writing the file, which makes the scenario a very rare possibility.
Even if a read and a replace do happen simultaneously, the time the reading script has to wait will be very short.
Depending on the size of the file, this might be an issue of concurrency. But you can solve that quite easily: before starting to write the file, create a kind of "lock file", i.e. if your file is named "incfile.php", create an "incfile.php.lock". Once you're done writing, remove this file.
On the include side, you can check for the existence of "incfile.php.lock" and wait until it has disappeared; this needs some looping and sleeping in the unlikely case of concurrent access.
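A minimal sketch of that lock-file dance (untested; the file names, $newContent, and the 100 ms/50-try limits are arbitrary assumptions):

// Writer side: signal "write in progress" while rewriting the include file.
touch('incfile.php.lock');
file_put_contents('incfile.php', $newContent, LOCK_EX);
unlink('incfile.php.lock');

// Include side: wait briefly for any writer to finish, then read.
$tries = 0;
while (file_exists('incfile.php.lock') && $tries++ < 50) {
    usleep(100000);   // 100 ms per loop, so at most ~5 seconds
}
echo file_get_contents('incfile.php');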
Basically, you should consider another solution: write the data that is rendered into that file to a database (locks etc. are available there) and render it in a module which then gets included in your page. Solutions like yours are hard to maintain in the long run...
This question is old, but I add this answer because the other answers have no code.
function write_to_file(string $filepath, string $string): bool {
    $timestamp_before_fwrite = time();
    $stream = fopen($filepath, "w");
    fwrite($stream, $string);
    fclose($stream);
    // Clear the stat cache so filemtime() reports the fresh modification time.
    clearstatcache(true, $filepath);
    $file_last_changed = filemtime($filepath);
    if ($file_last_changed < $timestamp_before_fwrite) {
        // File was not changed by this write.
        return false;
    }
    return true;
}
This is the function I use to write to a file: it first records the current timestamp before making changes to the file, and then compares that timestamp to the last time the file was changed.
I have a simple caching system like this:
if (file_exists($cache)) {
echo file_get_contents($cache);
// if we get here just as $cache is being deleted, there is nothing to display
}
else {
// PHP process
}
We regularly delete outdated cache files, e.g. deleting all caches after 1 hour. Although this process is very fast, I am thinking that a cache file could be deleted right between the if statement and the file_get_contents call.
I mean: when the if statement checks the existence of the cache file, it exists; but when file_get_contents tries to read it, it is no longer there (deleted by the simultaneous cache-deleting process).
file_get_contents locks the file, so the delete cannot happen during the read itself. But the file can be deleted after the if statement has sent the PHP process into the first branch (before file_get_contents starts).
Is there any approach to avoid this? Is the cache deleting system different?
NOTE: I have not faced any practical problem, as it is not very likely to hit this event, but logically it is possible, and it should happen under heavy load.
Luckily file_get_contents returns FALSE on error, so you could quickly handle it like:
if (FALSE !== ($buffer = file_get_contents($cache))) {
echo $buffer;
return;
}
// PHP process
or similar. It's a bit quick and dirty, considering you will want to add the @ operator to hide any warnings about non-existent files:
if (FALSE !== ($buffer = @file_get_contents($cache))) {
The other alternative would be to lock; however, that might prevent your cache-deletion process from deleting the file while you hold the lock.
What's left is to handle cache staleness yourself. That means reading the file's creation time in PHP and checking whether it is older than the file-deletion threshold (5 minutes is just an example); then you know the file is already stale and due to be replaced with fresh content, so re-create it. Otherwise read the file in, which is probably better done with readfile instead of file_get_contents and echo.
On failure, file_get_contents returns false, so what about this:
if (($output = file_get_contents($filename)) === false){
// Do the processing.
$output = 'Generated content';
// Save cache file
file_put_contents($filename, $output);
}
echo $output;
By the way, you may want to consider using fpassthru, which is more memory-efficient, especially for larger files. Using file_get_contents on large files (> 100 MB), will probably cause problems (depending on your configuration).
<?php
$fp = #fopen($filename, 'rb');
if ($fp === false){
// Generate output
} else {
fpassthru($fp);
}
I am using the PHP HTML DOM Parser to pull data from an external website. To reduce load and speed up page rendering time I want to cache data I pull for a certain period. How can I do this?
I wrote this file cache function which basically just replaces file_get_contents. You can specify the amount of time the cache should last for in $offset or completely override the cache with $override. If you don't want to use /tmp/, just change that directory to something you can read/write to.
function cache_get_contents($url, $offset = 600, $override = false) {
$file = '/tmp/file_cache_' . md5($url);
if (!$override && file_exists($file) && filemtime($file) > time() - $offset)
return file_get_contents($file);
$contents = file_get_contents($url);
if ($contents === false)
return false;
file_put_contents($file, $contents);
return $contents;
}
You could create local files with the HTML and then keep track of the file paths in $_SESSION. If you have the disk space and can run a database, you could use a database to do the same thing. A database connection and a query on the URL you're looking for won't add much overhead at all.
One way would be to save the data to a database or local file. You could then use a timestamp column or file modification time to determine whether to continue using the cache or pull and save a fresh copy.
If you have access to some kind of memory caching (e.g. memcached) that would be ideal.
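If memcached is available, the ext/memcached API keeps it very short (a sketch; the server address, key prefix, and 10-minute TTL are assumptions):

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$key  = 'scrape_' . md5($url);
$html = $mc->get($key);
if ($html === false) {              // cache miss: fetch and store
    $html = file_get_contents($url);
    $mc->set($key, $html, 600);     // keep for 10 minutes
}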
This first script gets called several times for each user via an AJAX request. It calls another script on a different server to get the last line of a text file. It works fine, but I think there is a lot of room for improvement. I am not a very good PHP coder, so I am hoping that with the help of the community I can optimize this for speed and efficiency:
AJAX POST Request made to this script
<?php session_start();
$fileName = $_POST['textFile'];
$result = file_get_contents($_SESSION['serverURL']."fileReader.php?textFile=$fileName");
echo $result;
?>
It makes a GET request to this external script which reads a text file
<?php
$fileName = $_GET['textFile'];
if (file_exists('text/'.$fileName.'.txt')) {
$lines = file('text/'.$fileName.'.txt');
echo $lines[sizeof($lines)-1];
}
else{
echo 0;
}
?>
I would appreciate any help. I think there is more improvement to be made in the first script. It makes an expensive function call (file_get_contents); well, at least I think it's expensive!
This script should limit the locations and file types that it's going to return.
Think of somebody trying this:
http://www.yoursite.com/yourscript.php?textFile=../../../etc/passwd (or something similar)
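For example, the reading script could restrict the parameter to a single directory and extension before touching the filesystem (a sketch, not a full security review):

// Strip any directory components and force the expected location/extension.
$fileName = basename($_GET['textFile']);
$path     = 'text/' . $fileName . '.txt';

// Belt and braces: make sure the resolved path really lives under text/.
$real = realpath($path);
if ($real === false || strpos($real, realpath('text') . DIRECTORY_SEPARATOR) !== 0) {
    echo 0;
    exit;
}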
Try to find out where the delays occur: does the HTTP request take long, or is the file so large that reading it takes long?
If the request is slow, try caching results locally.
If the file is huge, then you could set up a cron job that extracts the last line of the file at regular intervals (or at every change), and save that to a file that your other script can access directly.
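For instance, a small command-line script run from cron could keep a one-line file up to date (hypothetical paths; the 4 KB tail size is an arbitrary assumption):

<?php
// Run via crontab, e.g.:  * * * * * php /path/to/extract_last_line.php
$source = '/path/to/text/huge.txt';
$target = '/path/to/text/huge.lastline.txt';

$fp = fopen($source, 'rb');
if (filesize($source) > 4096) {
    fseek($fp, -4096, SEEK_END);   // only look at the last 4 KB of the file
}
$tail = stream_get_contents($fp);
fclose($fp);

$lines = array_filter(explode("\n", $tail), 'strlen');
file_put_contents($target, $lines ? end($lines) : '', LOCK_EX);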
readfile is your friend here: it reads a file on disk and streams it straight to the client.
script 1:
<?php
session_start();
// added basic argument filtering
$fileName = preg_replace('/[^A-Za-z0-9_]/', '', $_POST['textFile']);
$fileName = $_SESSION['serverURL'].'text/'.$fileName.'.txt';
if (file_exists($fileName)) {
// script 2 could be pasted here
//for the entire file
//readfile($fileName);
//for just the last line
$lines = file($fileName);
echo $lines[count($lines)-1];
exit(0);
}
echo 0;
?>
This script could further be improved by adding caching to it. But that is more complicated.
A very basic version could be:
script 2:
<?php
$lastModifiedTimeStamp = filemtime($fileName);
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
$browserCachedCopyTimestamp = strtotime(preg_replace('/;.*$/', '', $_SERVER['HTTP_IF_MODIFIED_SINCE']));
if ($browserCachedCopyTimestamp >= $lastModifiedTimeStamp) {
header("HTTP/1.0 304 Not Modified");
exit(0);
}
}
header('Content-Length: '.filesize($fileName));
header('Expires: '.gmdate('D, d M Y H:i:s \G\M\T', time() + 604800)); // (3600 * 24 * 7)
header('Last-Modified: '.gmdate('D, d M Y H:i:s \G\M\T', $lastModifiedTimeStamp));
?>
First things first: Do you really need to optimize that? Is that the slowest part in your use case? Have you used Xdebug to verify that? If you have done that, read on:
You cannot really optimize the first script usefully: if you need an HTTP request, you need an HTTP request. Skipping the HTTP request could be a performance gain, though, if it is possible (i.e. if the first script can access the same files the second script would operate on).
As for the second script: reading the whole file into memory does look like some overhead, but that is negligible if the files are small. The code looks very readable; I would leave it as is in that case.
If your files are big, however, you might want to use fopen() and its friends fseek() and fread()
# Do not forget to sanitize the file name here!
# An attacker could demand the last line of your password
# file or similar! ($fileName = '../../passwords.txt')
$filePointer = fopen($fileName, 'r');
$i = 1;
$chunkSize = 200;
# Read 200 byte chunks from the file and check if the chunk
# contains a newline
do {
fseek($filePointer, -($i * $chunkSize), SEEK_END);
$line = fread($filePointer, $i++ * $chunkSize);
} while (($pos = strrpos($line, "\n")) === false);
return substr($line, $pos + 1);
If the files are unchanging, you should cache the last line.
If the files are changing and you control the way they are produced, it might or might not be an improvement to reverse the order lines are written, depending on how often a line is read over its lifetime.
Edit:
Your server could figure out what it wants to write to its log, put it in memcache, and then write it to the log. The request for the last line could then be fulfilled from memcache instead of a file read.
The most probable source of delay is that cross-server HTTP request. If the files are small, the cost of fopen/fread/fclose is nothing compared to the whole HTTP request.
(Not long ago I used HTTP to retrieve images to dynamically generate image-based menus. Replacing the HTTP request with a local file read reduced the delay from seconds to tenths of a second.)
I assume that the obvious solution of accessing the file server's filesystem directly is out of the question; if it isn't, that's the best and simplest option.
If it is, you could use caching. Instead of getting the whole file every time, just issue a HEAD request and compare the timestamp to a local copy.
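A sketch of what that HEAD-based check could look like with get_headers() (untested; it assumes the remote server sends a Last-Modified header, and $fileUrl/$localCopy are placeholder names):

$localCopy = '/tmp/remote_copy.txt';

// Ask only for the headers, not the body.
$context = stream_context_create(['http' => ['method' => 'HEAD']]);
$headers = get_headers($fileUrl, true, $context);
$remote  = isset($headers['Last-Modified']) ? strtotime($headers['Last-Modified']) : time();

if (!file_exists($localCopy) || filemtime($localCopy) < $remote) {
    // Remote file is newer (or we have no copy yet): download it again.
    file_put_contents($localCopy, file_get_contents($fileUrl), LOCK_EX);
}
$contents = file_get_contents($localCopy);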
Also, if you are AJAX-updating a lot of clients based on the same files, you might consider looking into comet (meteor, for example). It's used for things like chat, where a single change has to be broadcast to several clients.