Caching PHP Simple HTML DOM Parser - php

I am using the PHP HTML DOM Parser to pull data from an external website. To reduce load and speed up page rendering time I want to cache data I pull for a certain period. How can I do this?

I wrote this file cache function which basically just replaces file_get_contents. You can specify the amount of time the cache should last for in $offset or completely override the cache with $override. If you don't want to use /tmp/, just change that directory to something you can read/write to.
function cache_get_contents($url, $offset = 600, $override = false) {
$file = '/tmp/file_cache_' . md5($url);
if (!$override && file_exists($file) && filemtime($file) > time() - $offset)
return file_get_contents($file);
$contents = file_get_contents($url);
if ($contents === false)
return false;
file_put_contents($file, $contents);
return $contents;
}

You could create local files with the HTML and then keep track of the file paths in $_SESSION. If you have the disk space and can run a database, you could use a database to do the same thing. A database connection and a query on the URL you're looking for won't add much overhead at all.

One way would be to save the data to a database or local file. You could then use a timestamp column or file modification time to determine whether to continue using the cache or pull and save a fresh copy.
If you have access to some kind of memory caching (e.g. memcached) that would be ideal.

Related

creating only new files in PHP without cpu intensive code

In my cache system, when a new page is requested, I want a check to be made to see if a file exists. If it doesn't, a copy is stored on the server. If it does exist, it must not be overwritten.
The problem I have is that I may be using functions designed to be slow.
This is part of my current implementation to save files:
if (!file_exists($filename)) {
    $h = fopen($filename, "wb");
    if ($h) {
        fwrite($h, $c);
        fclose($h);
    }
}
This is part of my implementation to load files:
if (($m=@filemtime($file)) !== false){
if ($m >= filemtime("sitemodification.file")){
$outp=file_get_contents($file);
header("Content-length:".strlen($outp),true);echo $outp;flush();exit();
}
}
What I want to do is replace this with a better set of functions meant for performance and yet still achieve the same functionality. All caching files including sitemodification.file reside on a ramdisk. I added a flush before exit in hopes that content will be outputted faster.
I can't use direct memory addressing at this time because the file sizes to be stored are all different.
Is there a set of functions I can use that can execute the code I provided faster by at least a few milliseconds, especially the loading files code?
I'm trying to keep my time to first byte low.
First, prefer is_file to file_exists and use file_put_contents:
if ( !is_file($filename) ) {
file_put_contents($filename,$c);
}
Then, use the proper function for this kind of work, readfile:
if ( ($m = @filemtime($file)) !== false && $m >= filemtime('sitemodification.file')) {
    header('Content-length:'.filesize($file));
    readfile($file);
}
You should see a small improvement, but keep in mind that file accesses are slow and that this code still touches the filesystem three times (two filemtime calls and one filesize call) before sending any content.
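For the save side, one more option is to let fopen() do the existence check itself. This is a minimal sketch, assuming the goal from the question (never overwrite an existing cache file); write_if_new is a hypothetical name and not part of the answer above:

```php
<?php
// Hypothetical helper: writes $contents to $filename only if the file
// does not exist yet, and reports whether it wrote anything.
function write_if_new(string $filename, string $contents): bool
{
    // Mode 'x' makes fopen() fail when the file already exists, so the
    // existence check and the creation happen in a single call.
    $handle = @fopen($filename, 'xb');
    if ($handle === false) {
        return false; // already cached, or the directory is not writable
    }
    $ok = fwrite($handle, $contents) !== false;
    fclose($handle);
    return $ok;
}
```

Calling write_if_new($filename, $c) would replace the file_exists/fopen pair in the question. Whether it saves measurable time on a ramdisk is something you would have to benchmark.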

PHP, check if the file is being written to/updated by PHP script?

I have a script that re-writes a file every few hours. This file is inserted into end users html, via php include.
How can I check whether my script is, at this exact moment, working on (i.e. re-writing) the file when it is being requested for display to a user? Is it even an issue? What will happen if they access the file at the same time, what are the odds, and will the user just have to wait until the script has finished its work?
Thanks in advance!
More on the subject...
Is this a way forward using file_put_contents and LOCK_EX?
when script saves its data every now and then
file_put_contents($content,"text", LOCK_EX);
and when user opens the page
if (file_exists("text")) {
function include_file() {
$file = fopen("text", "r");
if (flock($file, LOCK_EX)) {
include_file();
}
else {
echo file_get_contents("text");
}
}
} else {
echo 'no such file';
}
Could anyone advise me on the syntax? Is this a proper way to call include_file() after the condition, and how can I limit the number of such calls?
I guess this solution is also good, except same call to include_file(), would it even work?
function include_file() {
$time = time();
$file = filectime("text");
if ($file + 1 < $time) {
echo "good to read";
} else {
echo "have to wait";
include_file();
}
}
To check whether the file has just been written, you can use the filectime() function, which returns the time the file's inode was last changed (writing to the file updates it).
Store the current timestamp in a variable at the top of your script. Whenever you need to access the file, compare that timestamp with the file's filectime(): if the change time is within the last second or so, the file may still be being written, so you have to wait, and you can log the event in a database or another file.
To prevent this scenario, change the script that writes the file so that it first creates a temporary file and, once it is done, replaces the original by moving or renaming the temporary file over it. This takes far less time than writing the file and makes the scenario very unlikely.
Even if a read and the replace happen simultaneously, the reading script will have to wait only very briefly.
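The temporary-file approach described above can be sketched as follows. This is only an illustration under stated assumptions: atomic_write is a hypothetical name, and the replace is only expected to be atomic when the temporary file and the target are on the same filesystem.

```php
<?php
// Hypothetical helper: writes $contents to a temporary file first and
// then renames it over $target, so readers never see a half-written file.
function atomic_write(string $target, string $contents): bool
{
    // Create the temporary file next to the target so that rename()
    // does not have to cross filesystems.
    $tmp = tempnam(dirname($target), 'tmp_');
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $contents) === false || !rename($tmp, $target)) {
        @unlink($tmp); // clean up the leftover temporary file
        return false;
    }
    return true;
}
```

The script that regenerates the file every few hours would call atomic_write("text", $content) instead of writing to "text" directly. Note that tempnam() creates the file with restrictive permissions, so a chmod() may be needed if another user has to read it.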
Depending on the size of the file, this might be a concurrency issue. But you can solve that quite easily: before starting to write the file, create a kind of "lock file", i.e. if your file is named "incfile.php", create an "incfile.php.lock". Once you're done writing, remove this file.
On the include side, check for the existence of "incfile.php.lock" and wait until it has disappeared; this needs some looping and sleeping in the unlikely case of concurrent access.
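The looping and sleeping on the include side could look roughly like this. It is a sketch, not a tested production routine: wait_for_unlock, the retry count, and the sleep interval are all assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: waits until "$path.lock" has disappeared.
// Returns true if the file is free to read, false if we gave up.
function wait_for_unlock(string $path, int $maxTries = 50, int $sleepMicros = 100000): bool
{
    $lock = $path . '.lock';
    for ($i = 0; $i < $maxTries; $i++) {
        // file_exists() results are cached per request, so clear the
        // cache to see a lock file that was removed in the meantime.
        clearstatcache(true, $lock);
        if (!file_exists($lock)) {
            return true;
        }
        usleep($sleepMicros);
    }
    return false; // still locked after $maxTries attempts
}
```

A page would call wait_for_unlock('incfile.php') before the include and fall back to an error message or older content when it returns false. Bounding the number of tries also answers the question of how to limit the repeated calls.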
More fundamentally, you should consider another solution: write the data that is rendered into that file to a database instead (locks etc. are available there) and render it in a module that then gets included in your page. Solutions like yours are hard to maintain in the long run ...
This question is old, but I add this answer because the other answers have no code.
function write_to_file(string $fp, string $string) : bool {
    $timestamp_before_fwrite = time();
    $stream = fopen($fp, "w");
    if ($stream === false) {
        return false; // could not open the file for writing
    }
    fwrite($stream, $string);
    fclose($stream);
    // Drop any cached stat data so filemtime() reports the fresh value.
    clearstatcache(true, $fp);
    $file_last_changed = filemtime($fp);
    if ($file_last_changed < $timestamp_before_fwrite) {
        // File not changed code
        return false;
    }
    return true;
}
This is the function I use to write to a file. It first gets the current timestamp before making changes to the file, and then compares that timestamp to the last time the file was changed.

How to avoid a possible missing cache file in PHP?

I have a simple caching system as
if (file_exists($cache)) {
echo file_get_contents($cache);
// if coming here when $cache is deleting, then nothing to display
}
else {
// PHP process
}
We regularly delete outdated cache files, e.g. deleting all caches after 1 hour. Although this process is very fast, I am thinking that a cache file can be deleted right between the if statement and the file_get_contents call.
I mean when if statement checks the existence of cache file, it exists; but when file_get_contents tries to catch it, it is no longer there (deleted by simultaneous cache deleting process).
file_get_contents locks the file to avoid the undergoing delete process during the read process. But the file can be deleted when the if statement sends the PHP process to the first condition (before start of the file_get_contents).
Is there any approach to avoid this? Is the cache deleting system different?
NOTE: I did not face any practical problem, as it is not very probable to catch this event, but logically it is possible, and should happen on heavy loads.
Luckily file_get_contents returns FALSE on error, so you could quick-bake it like:
if (FALSE !== ($buffer = file_get_contents($cache))) {
    echo $buffer;
    return;
}
// PHP process
or similar. It's a bit quick and dirty, considering you will want to add the @ operator to hide any warnings about non-existent files:
if (FALSE !== ($buffer = @file_get_contents($cache))) {
The other alternative would be to lock the file; however, that might prevent your cache-deletion process from deleting the file while you hold the lock.
What remains is to expire the cache yourself. That means reading the file's timestamp in PHP and checking whether the file is within, say, 5 minutes of the age at which the deletion process would remove it (5 minutes is just an example). If so, treat the file as already stale and re-create it with fresh content. Otherwise read the file in, which is probably better done with readfile than with file_get_contents and echo.
On failure, file_get_contents returns false, so what about this:
if (($output = file_get_contents($filename)) === false){
// Do the processing.
$output = 'Generated content';
// Save cache file
file_put_contents($filename, $output);
}
echo $output;
By the way, you may want to consider using fpassthru, which is more memory-efficient, especially for larger files. Using file_get_contents on large files (> 100 MB), will probably cause problems (depending on your configuration).
<?php
$fp = @fopen($filename, 'rb');
if ($fp === false){
// Generate output
} else {
fpassthru($fp);
}

fetch templates from database/string

I store my templates as files, and would like to have the opportunity to store them also in a MySql db.
My template System
//function of Template class, where $file is a path to a file
function fetch() {
ob_start();
if (is_array($this->vars)) extract($this->vars);
include($file);
$contents = ob_get_contents();
ob_end_clean();
return $contents;
}
function set($name, $value) {
$this->vars[$name] = is_object($value) ? $value->fetch() : $value;
}
usage:
$tpl = & new Template('path/to/template');
$tpl->set('titel', $titel);
Template example:
<h1><?=$titel?></h1>
<p>Lorem ipsum...</p>
My approach
Selecting the template from the database as a string,
what I get is like $tpl = "<h1><?=$titel?>...";
Now I would like to pass it to the template system, so I extended my constructor and the fetch function:
function fetch() {
if (is_array($this->vars)) extract($this->vars);
ob_start();
if(is_file($file)){
include($file);
}else{
//first idea: eval ($file);
//second idea: print $file;
}
$contents = ob_get_contents();
ob_end_clean();
return $contents;
}
'eval' gives me a parse error, because it interprets the whole string as PHP, not just the PHP part.
'print' is really strange: it doesn't display the stuff between the PHP tags, but I can see it in the source code of the page. PHP functions are being ignored.
So what should I try instead?
Maybe not the best solution, but it's simple and it should work:
fetch your template from the db
write a file with the template
include this file
(optional: delete the file)
If you add a timestamp column to your template table, you can use the filesystem as a cache. Just compare the timestamps of the file and the database row to decide whether it's sufficient to reuse the file.
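The steps above, including the timestamp comparison, can be sketched as one function. This is an assumption-laden illustration: render_db_template is a hypothetical name, and $source and $dbTimestamp stand in for the template text and the timestamp column you would load from your own table.

```php
<?php
// Hypothetical helper: $source and $dbTimestamp represent the template
// row loaded from the database; $cacheFile is where the copy is kept.
function render_db_template(string $cacheFile, string $source, int $dbTimestamp, array $vars = []): string
{
    clearstatcache(true, $cacheFile);
    // Rewrite the cached file only if it is missing or older than the row.
    if (!is_file($cacheFile) || filemtime($cacheFile) < $dbTimestamp) {
        file_put_contents($cacheFile, $source, LOCK_EX);
    }
    // EXTR_SKIP keeps template variables from overwriting $cacheFile etc.
    extract($vars, EXTR_SKIP);
    ob_start();
    include $cacheFile;
    return ob_get_clean();
}
```

Because the template ends up in a regular file that is pulled in with include, PHP parses only the parts inside the PHP tags, which avoids both the eval problem and the print problem from the question.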
If you prepend '?>' to your eval, it should work.
<?php
$string = 'hello <?php echo $variable; ?>';
$variable = "world";
eval('?>' . $string);
But you should know that eval() is a rather slow thing. Its resulting op-code cannot be cached in APC (or similar). You should find a way to cache your templates on disk. For one you wouldn't have to pull them from the database every time they're needed. And you could make use of regular op-code caching (done transparently by APC).
Every time I see some half-baked home-grown "template engine", I ask myself why the author did not rely on one of the many existing template engines out there. Most of them have already solved most of the problems you could possibly have. Smarty (and Twig, phpTAL, …) make it a real charm to pull template sources from wherever you like (while trying to maintain optimal performance). Do you have any special reasons for not using one of these?
I would do pretty much the same thing as tweber, except that I would rely on the local file timestamps rather than the DB.
Something like this: each file has a TTL (expiration time) of, let's say, 60 seconds. The real reason is to avoid hitting the DB too hard or too often. You'll quickly realize how much faster filesystem access is than network and MySQL access, especially if the MySQL instance is running on a remote server.
# Implement a function that gets the contents of the file (the key here is
# the filename) from the DB and saves them to disk.
function fetchFreshCopy( $filename ) {
    # mysql_connect(); ...
}

# method of the Template class, as in the question
function fetch() {
    if (is_array($this->vars)) extract($this->vars);
    ob_start();
    # first check if the file exists already
    if ( file_exists($file) ) {
        # now check the file's modification time to know if it has expired:
        $mod_timestamp = filemtime( $file );
        if ( ( time() - $mod_timestamp ) >= 60 ) {
            # the file has expired, let's fetch a fresh copy from the DB
            # and save it to disk..
            fetchFreshCopy( $file );
        }
    } else {
        # the file doesn't exist at all, fetch and save it!
        fetchFreshCopy( $file );
    }
    include( $file );
    $contents = ob_get_contents();
    ob_end_clean();
    return $contents;
}
Cheers, hope thats useful

5-minute file cache in PHP

I have a very simple question: what is the best way to download a file in PHP, but only if the local version was downloaded more than 5 minutes ago?
In my actual case I would like to get data from a remotely hosted csv file, for which I currently use
$file = file_get_contents($url);
without any local copy or caching. What is the simplest way to convert this into a cached version, where the end result doesn't change ($file stays the same), but it uses a local copy if it's been fetched not so long ago (say 5 minutes)?
Use a local cache file, and just check the existence and modification time on the file before you use it. For example, if $cache_file is a local cache filename:
if (file_exists($cache_file) && (filemtime($cache_file) > (time() - 60 * 5 ))) {
// Cache file is less than five minutes old.
// Don't bother refreshing, just use the file as-is.
$file = file_get_contents($cache_file);
} else {
// Our cache is out-of-date, so load the data from our remote server,
// and also save it over our cache for next time.
$file = file_get_contents($url);
file_put_contents($cache_file, $file, LOCK_EX);
}
(Untested, but based on code I use at the moment.)
Either way through this code, $file ends up as the data you need, and it'll either use the cache if it's fresh, or grab the data from the remote server and refresh the cache if not.
EDIT: I understand a bit more about file locking since I wrote the above. It might be worth having a read of this answer if you're concerned about the file locking here.
If you're concerned about locking and concurrent access, I'd say the cleanest solution would be to file_put_contents to a temporary file, then rename() it over $cache_file, which should be an atomic operation, i.e. the $cache_file will either be the old contents or the full new contents, never halfway written.
Try phpFastCache. It supports file caching, so you don't need to code your own cache class, and it is easy to use on shared hosting and VPS.
Here is an example:
<?php
// change files to memcached, wincache, xcache, apc, files, sqlite
$cache = phpFastCache("files");
$content = $cache->get($url);
if($content == null) {
$content = file_get_contents($url);
// 300 = 5 minutes
$cache->set($url, $content, 300);
}
// use ur $content here
echo $content;
Here is a simple version which also passes a windows User-Agent string to the remote host so you don't look like a trouble-maker without proper headers.
<?php
function getCacheContent($cachefile, $remotepath, $cachetime = 120){
// Generate the cache version if it doesn't exist or it's too old!
if( ! file_exists($cachefile) OR (filemtime($cachefile) < (time() - $cachetime))) {
$options = array(
'method' => "GET",
'header' => "Accept-language: en\r\n" .
"User-Agent: Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)\r\n"
);
$context = stream_context_create(array('http' => $options));
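$contents = file_get_contents($remotepath, false, $context);
if ($contents === false) {
// Download failed: fall back to the stale copy if there is one.
return file_exists($cachefile) ? file_get_contents($cachefile) : false;
}
file_put_contents($cachefile, $contents, LOCK_EX);
return $contents;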
}
return file_get_contents($cachefile);
}
If you are using a database system of any type, you could cache this file there. Create a table for cached information, and give it at minimum the following fields:
An identifier; something you can use to retrieve the file the next time you need it. Probably something like a file name.
A timestamp from the last time you downloaded the file from the URL.
Either a path to the file, where it's stored in your local file system, or use a BLOB type field to just store the contents of the file itself in the database. I would recommend just storing the path, personally. If the file was very large, you definitely wouldn't want to put it in the database.
Now, when you run the script above next time, first check in the database for the identifier, and pull the time stamp. If the difference between the current time and the stored timestamp is greater than 5 minutes pull from the URL and update the database. Otherwise, load the file from the database.
If you don't have a database setup, you could do the same thing just using files, wherein one file, or field in a file, would contain the timestamp from when you last downloaded the file.
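The database variant described above might look like the following sketch. Everything named here is an assumption made for illustration: the function db_cached_fetch, the table file_cache, and its columns are hypothetical, and the contents are stored directly in the table for brevity even though the answer recommends storing a path for large files.

```php
<?php
// Assumed table (hypothetical schema):
//   CREATE TABLE file_cache (id TEXT PRIMARY KEY, fetched_at INTEGER, body TEXT)
// $fetch is any callable that downloads and returns the fresh contents.
function db_cached_fetch(PDO $pdo, string $id, int $ttl, callable $fetch): string
{
    $stmt = $pdo->prepare('SELECT fetched_at, body FROM file_cache WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    // Use the stored copy while it is no older than $ttl seconds.
    if ($row !== false && time() - (int) $row['fetched_at'] <= $ttl) {
        return $row['body'];
    }

    // Otherwise download again and store the new copy with its timestamp.
    $body = $fetch();
    $stmt = $pdo->prepare('REPLACE INTO file_cache (id, fetched_at, body) VALUES (?, ?, ?)');
    $stmt->execute([$id, time(), $body]);
    return $body;
}
```

For the CSV case in the question, the call could be db_cached_fetch($pdo, $url, 300, function () use ($url) { return file_get_contents($url); }). REPLACE INTO is accepted by both MySQL and SQLite; other databases would need their own upsert syntax.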
First, you might want to check the design pattern: Lazy loading.
The implementation should change to always load the file from the local cache.
If the local cache does not exist or its modification time is more than 5 minutes in the past, you fetch the file from the server first.
A sketch looks like this (fetch_localcache is left for you to write):
$time = @filemtime($local_cache);
if ($time === false || (time() - $time) > 300) {
    fetch_localcache($url); // You have to write this yourself: download $url into $local_cache
}
$file = file_get_contents($local_cache);
Best practice for it
$cacheKey=md5_file('file.php');
You can save a copy of your file on first hit, then check with filemtime the timestamp of the last modification of the local file on following hits.
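You would wrap it into a cache-like method:
// $cacheFile and $maxAge are illustrative parameters for the local copy and its lifetime
function getFile($url, $cacheFile, $maxAge = 300) {
    // code stolen from @Peter M
    if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) <= $maxAge) {
        // the local copy is less than 5 minutes old, use it
        return file_get_contents($cacheFile);
    }
    // missing or older than 5 minutes: download and save a copy
    $file = file_get_contents($url);
    file_put_contents($cacheFile, $file);
    return $file;
}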
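I think you want some logic like this ($cache_file is the path of the local copy):
if (file_exists($cache_file)) {
    if (filemtime($cache_file) < time() - 300) {
        // time stamp older than 5 minutes
        $file = file_get_contents($url);
        file_put_contents($cache_file, $file);
    } else {
        $file = file_get_contents($cache_file);
    }
} else {
    $file = file_get_contents($url);
    file_put_contents($cache_file, $file);
}
// use $file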
