I have a website where the client searches for a term and the results are retrieved through an AJAX request. On the PHP side, the called script checks the age of the cache (caches are files) and, if it's older than an established time, refreshes the results; otherwise it returns the cache file's content: die(file_get_contents($cache_path));
The cache time is a few hours, and refreshing takes just a few seconds, so the vast majority of requests will end up as cache responses.
So I thought that using header("Location: $cache_path"); would be less stressful for the server, because it simply tells the browser to get the contents from the cache file directly, without passing them through the script.
The downside is that the cache file path becomes public (which is not the biggest problem ever, because the content is the same), but, you know, it's never good to give away resource locations...
So, performance-wise, is there a big difference between file_get_contents and redirecting? The average cache file size is 120 KB... Any other ideas or suggestions?
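For reference, the check-then-serve flow I described looks roughly like this sketch (the cache directory, TTL, and the query step are simplified placeholders, not the real code):

```php
<?php
// Rough sketch of the current flow; $cache_dir, the TTL, and the DB query
// are placeholders.
function cache_is_fresh(string $cache_path, int $ttl): bool
{
    // Fresh when the file exists and was written less than $ttl seconds ago
    return is_file($cache_path) && (time() - filemtime($cache_path)) < $ttl;
}

function serve_search(string $term, string $cache_dir, int $ttl = 3 * 3600): string
{
    $cache_path = $cache_dir . '/' . md5($term) . '.json';

    if (!cache_is_fresh($cache_path, $ttl)) {
        $results = json_encode(['term' => $term]); // stands in for the real DB query
        file_put_contents($cache_path, $results, LOCK_EX);
    }

    return file_get_contents($cache_path); // what die(file_get_contents(...)) serves
}
```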
You can use an "internal redirect": the X-Accel-Redirect header for nginx or X-Sendfile for Apache. That way you don't expose any additional URLs to the client and don't deal with cache files in your script.
For configuration details you can read the official documentation or, of course, other SO questions (like this one).
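A minimal sketch of emitting those headers from PHP (the /protected-cache/ prefix and file paths are assumptions; nginx needs a matching location block marked internal, and Apache needs mod_xsendfile enabled):

```php
<?php
// Build the internal-redirect header for the given server; the cache file
// name is hypothetical, and the prefixes must match your server config.
function internal_redirect_header(string $cache_name, string $server): string
{
    if ($server === 'nginx') {
        // nginx resolves this against an "internal" location block
        return 'X-Accel-Redirect: /protected-cache/' . $cache_name;
    }
    // Apache with mod_xsendfile takes a filesystem path
    return 'X-Sendfile: /var/cache/app/' . $cache_name;
}
```

In the script you would send it with header(internal_redirect_header('results.json', 'nginx')); and exit, letting the web server stream the file itself.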
This is for a file read via a filesystem path (such as /home/someuser/public_html/somedir/somefile.php) – I am not talking about a file read via a URL.
I have found that (in PHP 7.1.32 at least) file_get_contents may return cached content rather than the content now on disk, if the file has been altered by an external process (such as an FTP upload).
In the evidence trail I have, the file was updated 9 minutes previously, but file_get_contents nonetheless returned cached content. The file in question was a PHP file that would have been previously read by PHP itself through require_once (or suchlike), though (obviously due to age) for a previous HTTP request (though would also have been ‘read’ for the current request).
EDIT/CLARIFICATION: The server process will update a file on the server, based on content of another (PHP) file, if it finds the other (PHP) file has been updated (i.e. has a more recent filemtime). What was observed is that the file updated by the server process has a timestamp 9 minutes later, but has clearly been updated based on older content of the PHP file (than it currently has).
The server environment is Linux/cPanel/Apache. OPcache is enabled for PHP.
I’ve done some research but all I could find was a comment on the PHP documentation for clearstatcache that provides a clue but no answer:
Note that this function affects only file metadata. However, all the PHP file system functions do their own caching of actual file contents as well. You can use the realpath_cache_size = 0 directive in PHP.ini to disable the content caching if you like. The default content caching timeout is 120 seconds.
Content caching is not a good idea during development work and for certain kinds of applications, since your code may read in old data from a file whose contents you have just changed.
If the default content caching timeout is 120 seconds, why have I seen it cached for 9 minutes, and what is the actual setting for this in php.ini? (realpath_cache_size seems only to be relevant for mapping ‘relative’ paths to ‘absolute’ ones.)
How can I disable content caching for a specific instance of a call to file_get_contents (or other functions that read from files)?
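For the stat-cache part at least, PHP exposes a per-file reset, and OPcache (when loaded) can be invalidated per file too. This sketch shows what I could force; whether it covers the content caching the comment describes is exactly what I'm unsure about:

```php
<?php
// Force as fresh a read as PHP's documented per-file switches allow.
function read_fresh(string $path): string
{
    clearstatcache(true, $path);          // drop cached stat() metadata for this file
    if (function_exists('opcache_invalidate')) {
        @opcache_invalidate($path, true); // drop any compiled copy of a PHP file
    }
    return file_get_contents($path);
}
```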
It occurs to me that PHP might internally use its stat cache to determine the file hasn’t apparently been modified (even though it has) before returning the cached content for it, though that seems to me like too much caching for reliable operation. However, the scenario is not straightforward to reproduce, and it could be wasted effort testing this hypothesis if someone already knows it would or wouldn’t be true.
Do some of you have some more in-depth knowledge, and may be able to shed some light?
TIA 😕
It appears the problem may be caused by preserving timestamps during FTP upload. The file modification time on the server ends up at a (possibly significantly) older timestamp than when the file was actually changed on the server (i.e. when it was uploaded).
If a server process is using the timestamp to see whether it needs to update some other file, it may determine that the file hasn't been updated when in fact it has, and thus not update the other file. This can result in the apparent behaviour witnessed.
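One way around unreliable timestamps (my suggestion, not something from the question) is to compare file contents instead of filemtime, for example by storing a hash of the source file in a sidecar file:

```php
<?php
// Detect changes by content hash rather than mtime, so a preserved-timestamp
// FTP upload is still noticed. $hash_file is a hypothetical sidecar file.
function source_changed(string $source, string $hash_file): bool
{
    $current  = md5_file($source);
    $previous = is_file($hash_file) ? file_get_contents($hash_file) : null;
    if ($current !== $previous) {
        file_put_contents($hash_file, $current, LOCK_EX);
        return true;
    }
    return false;
}
```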
I have a site where each time you upload an image it gets rendered in various frame sizes. A cron job runs every 10 minutes which looks to see if any new images have been uploaded during that time and if so it generates all the needed frames.
Since this cron runs every 10 minutes, there is a gap between when the content (such as an article) goes live and when the images become available. During that time a generic placeholder image with the site's logo is shown.
Since Akamai caches images, when a user loads a page with an image that the cron hasn't rendered yet, the static placeholder is served for that image path and Akamai caches it. Even when the image is later rendered and available, users will still get the cached version from Akamai.
One solution is to bust the "ages" of these images when the cron has rendered them. But it takes Akamai about 8 min to come back for the new ones.
Is there any other solution where I can tell Akamai perhaps through cache expiration headers to come back every 10 seconds until a new image is received and once that's done don't come back again and keep showing the cached version?
Yes, in a way, if you combine a few steps on the server side with the Akamai settings.
Here's the concept: the Edge Server delivers whatever content it has. If you set cache-control headers, from PHP for example, the TTL settings in the Akamai configuration of the respective digital property override them, and Akamai uses its own settings instead. That means you tell it how often to come back to your origin server, by path, file type, extension, or whatever. On the client side, the Edge Server delivers whatever files it has to the end user, and it doesn't really matter how often the Edge Servers get requested for the content, unless you don't cache at that level, which rolls the requests back up to you.
Using those configuration settings you can specify that a specific file has an exact expiration - or to not cache it at all.
So if your page references placeholder.jpg and you tell Akamai not to cache that image at all, the Edge Server will come back to origin each time it gets a request for it. Once you have the real image in place, placeholder.jpg no longer appears on your page; instead there is sizeA.jpg, which obeys the regular image caching times.
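The server-side half could be as simple as choosing a header based on whether the rendered frame exists yet. A sketch, where the path and max-age are assumptions (and remember that Akamai's own TTL configuration can override these headers):

```php
<?php
// Pick a Cache-Control header depending on whether the cron has rendered
// the frame yet; the path and max-age value are hypothetical.
function image_cache_header(string $frame_path): string
{
    if (!is_file($frame_path)) {
        // Placeholder is being served: tell caches not to keep it
        return 'Cache-Control: no-store, no-cache, must-revalidate';
    }
    // Rendered frame exists: allow normal long-lived caching
    return 'Cache-Control: public, max-age=86400';
}
```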
This may not be exactly ideal, but it's about the best you can do other than manually clearing the page, and as far as I know they don't have an API call you can fire off to clear a page (plus it takes 7-10 minutes for a cache clear to propagate through their network anyway).
I discovered https://developers.google.com/speed/pagespeed/ the other day and have improved my website's page speed from ~75 to ~95 now.
One of the last few things it recommends is that I:
Leverage browser caching: Setting an expiry date or a maximum age in the HTTP headers
for static resources instructs the browser to load previously downloaded resources
from local disk rather than over the network.
The cache time for my main javascript and css files is set to 2 days, Google suggests I set it to at least 1 week. They also suggest that I do the same for html and php files.
What would happen to my users if I decided to make a large website change and they had just cached my website yesterday (for 1 week)? Would they not see the changes on my website until 1 week later?
Also, since my website contains a control panel and has some dynamically generated PHP pages, is there any reason for caching any of it? Wouldn't my server still be churning through php script and generating new content every time they logged into their account?
You probably don't want to cache your HTML and PHP in visitors' browsers. However, you might want to cache them in a layer you have more control over, like PHP opcode caching with APC and a reverse proxy like Varnish.
For the static assets, like your JavaScript and CSS files, it should be safe to cache them for a year or more. If you make a change to one, you can just update its URL, say to mystyles.css?v=123, and browsers will treat it as a completely different file from mystyles.css?v=122 or plain mystyles.css.
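One way to generate that version number automatically is to use the file's modification time, so the URL changes whenever the file does (a sketch; the paths are assumptions):

```php
<?php
// Append a cache-busting version based on filemtime(); $docroot and the
// asset path are hypothetical.
function asset_url(string $path, string $docroot): string
{
    $version = filemtime($docroot . '/' . $path);
    return '/' . $path . '?v=' . $version;
}
```

In a template you would echo asset_url('css/mystyles.css', $_SERVER['DOCUMENT_ROOT']) into the link tag.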
I've looked at similar questions about caching in PHP and I'm still stumped as to how to check whether the database has changed without making a new call to the database, which would defeat the point of caching.
I understand technically how to implement caching in PHP -- using ETag and Last Modified headers, output buffering, storing static files, etc. What is tripping me up is how to determine when to serve up a new version of a page instead of a cached version. If the database content has changed, I want to show the new version and not the cached version.
For example, let's say I have a page that displays details about a product. Generally, once the product info is stored in the database, it won't change much. But occasionally there might be an edit to the product description or a price change. If the product has a new price, I don't want to show the user the old price by serving a cached version of the page. For that reason, updating the cached content every hour doesn't seem sufficient. Not to mention that that's too often for the content that doesn't change; the real problem is that it won't update the content fast enough when there is a change.
So should I store something (e.g., an ETag value or a static html file) every time the product database is updated through a form in the Admin area of the application? What am I missing here?
[Note: Not interested in using a caching library here. I'd like to learn how to do it in straight PHP for now.]
Caching is a pretty complex topic, because you can cache all sorts of data in various places. Usually you implement caching to relieve bottlenecks in your server structure.
In your setup you can cache data at three different locations:
1) Clientside, between client and server
You would use this method to save bandwidth and shorten loading times for the user. You can achieve this by setting cache-related fields in the HTTP headers (Cache-Control, Expires, ETag, and so on).
If you use Cache-Control or Expires, the decision whether to load an updated version from the server purely depends on the client. So even if a new version is available, the user won't see it. On the plus side, you save lots of CPU cycles on the server, because your PHP script won't be executed.
If you use ETag, you can inform the client on each request, if the version of the requested content has changed. But your php script will be executed on each request, even if the ETag is unchanged.
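The ETag handshake might look like this sketch (how you derive the tag, e.g. from a product row's updated_at column, is an assumption):

```php
<?php
// Decide between a 304 and a full render from the client's If-None-Match.
function etag_response(string $etag, ?string $if_none_match): array
{
    $tag = trim($etag, '"');
    if ($if_none_match !== null && trim($if_none_match, '"') === $tag) {
        return ['status' => 304, 'body' => ''];   // client copy is current
    }
    // Full render; send the tag so the client can revalidate next time
    return ['status' => 200, 'body' => '...rendered page...', 'etag' => '"' . $tag . '"'];
}
```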
2) Serverside, between client and server
This kind of caching primarily reduces high cpu load on your server. It won't affect the amount of traffic generated between client and server.
You can use a reverse proxy like Varnish to store rendered responses on the server side. The good thing is that you have full control over the cache. If an updated version of some requested content is available, you can simply purge the old version from the cache, so that a new version is generated by your PHP script and stored in the cache.
Every response that is cacheable will only be generated one time and then be served from cache to the clients.
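Purging is typically a non-standard PURGE request to the proxy. A sketch of composing one; this only works if your Varnish VCL is set up to accept PURGE from trusted hosts, and the host and path here are hypothetical:

```php
<?php
// Raw HTTP request for purging one URL from Varnish; send it with fsockopen
// or curl (CURLOPT_CUSTOMREQUEST => 'PURGE').
function purge_request(string $host, string $path): string
{
    return "PURGE {$path} HTTP/1.1\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n\r\n";
}
```

You would fire this off from the admin code right after the content is updated, so the next client request regenerates the page.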
3) In your application
If you are heavily using your database, you should consider using a fast key value store like memcached to cache query results. Of course you have to adjust your database classes for this (first ask memcached, if memcached doesn't have the result ask the database and store the result into memcached), but the performance gain will be quite impressive, because memcached is really fast.
Sometimes it even makes sense to store data solely in memcached, if the data doesn't have to be persisted permanently (PHP sessions, for example).
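The ask-cache-first pattern from point 3 can be sketched like this; $cache stands for anything with get()/set(), such as the Memcached class, and $loader stands in for the actual database query:

```php
<?php
// Read-through cache: try the cache, fall back to the loader, store the result.
function cached_query($cache, string $key, callable $loader, int $ttl = 300)
{
    $hit = $cache->get($key);
    if ($hit !== false) {
        return $hit;            // served from cache, no DB round trip
    }
    $value = $loader();         // cache miss: ask the database
    $cache->set($key, $value, $ttl);
    return $value;
}
```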
I also faced the same problem a while back (I don't know if you'll find my approach correct).
Why I needed the cache:
My site updated the database by running a script in a cron.php file, and index.php showed the listing from the database (this used to take ages to load).
My solution:
Every time a listing was created or updated, I unlinked the cache file. Then, on the index.php page, I checked whether the cache file exists: if it does, load the cache; otherwise, load the content from the database and at the same time write that data to the cache file, so it's there the next time a user requests index.php.
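That flow can be sketched like this (paths and the render step are placeholders for what my site actually did):

```php
<?php
// Serve index.php from a cache file when it exists; otherwise rebuild it.
function render_index(string $cache_file, callable $render_from_db): string
{
    if (is_file($cache_file)) {
        return file_get_contents($cache_file);  // cache hit
    }
    ob_start();
    $render_from_db();                          // slow path: build from the DB
    $html = ob_get_clean();
    file_put_contents($cache_file, $html, LOCK_EX);
    return $html;
}

// When a listing is created or updated, invalidate with: unlink($cache_file);
```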
I've heard of two caching techniques for the PHP code:
When a PHP script generates output, it stores it in local files. When the script is called again, it checks whether the file with the previous output exists and, if so, returns the content of that file. It's mostly done by playing with the output buffer. Something like this is described in this article.
Using a kind of opcode caching plugin, where the compiled PHP code is stored in memory. The most popular ones are APC and eAccelerator.
Now the question is whether it makes any sense to use both techniques or just one of them. I think the first method is a bit complicated and time-consuming to implement, while the second one seems simple: you just need to install the module.
I use PHP 5.3 (PHP-FPM) on Ubuntu/Debian.
BTW, are there any other methods to cache PHP code or output, which I didn't mention here? Are they worth considering?
You should always have an opcode cache like APC. Its purpose is to speed up the parsing of your code, and it will be bundled into PHP in a future version. For now, it's a simple install on any server and doesn't require you to write or change any code.
However, caching opcodes doesn't do anything to speed up the actual execution of your code. Your bottlenecks are usually time spent talking to databases or reading from and writing to disk. Caching the output of your program avoids unnecessary resource usage and can speed up responses by orders of magnitude.
You can do output caching many different ways at many different places along your stack. The first place you can do it is in your own code, as you suggested, by buffering output, writing it to a file, and reading from that file on subsequent requests.
That still requires executing your PHP code on each request, though. You can cache output at the web server level to skip that as well. Crafting a set of mod_rewrite rules will allow Apache to serve the static files instead of the PHP code when they exist, but you'll have to regenerate the cached versions manually or with a scheduled task, since your PHP code won't be running on each request to do so.
You can also stick a proxy in front of your web server and use that to cache output. Varnish is a popular choice these days and can serve hundreds of times more requests per second with caching than Apache running your PHP script on the same server. The cache is created and configured at the proxy level, so when an entry expires, the request passes through to your script, which runs as it normally would to generate the new version of the page.
You know, for me, opcode caches, file caches, etc. are only used to reduce database calls.
They can't speed up your code itself. However, they improve page load times by serving your visitors from cache.
For me, APC is good enough on a VPS or dedicated server when I need to cache widgets or objects to spare my MySQL server.
If I have more than two servers, I like to use Memcache; it's good at using memory for caching. However, it's up to you: not everyone likes memcached, and not everyone likes APC.
For caching whole web pages: I run a lot of WordPress sites, and I've used APC, Memcache, and file caching through cache plugins like W3 Total Cache. In my own experience, file caching is good for caching the whole website, and memory caching is good for caching objects.
File caching will increase your CPU load if your hard drive is slow, and memory caching is terrible if you don't have enough memory on your VPS.
An SSD gives very good read/write speed, but memory is always faster. However, a human can't perceive the difference between those speeds. Pick a method based on your project and your server (RAM, HDD), or whether you're on shared web hosting.
If I'm on shared hosting, without root permission and without access to php.ini, I like to use phpFastCache; it's a simple file cache with just set, get, stats, and delete.
In addition, I like to use .htaccess to cache static files like images, JS, and CSS, or to do it via HTML headers. This speeds up the page for visitors and saves your server bandwidth.
And if you cache a whole page, being able to use .htaccess to redirect to the static .html cache is a great thing.
In the future, APC or some other opcode cache will be bundled into PHP, but I'm sure caches can't speed up your code; they are used to:
Reduce database/query calls.
Improve page load speed by serving from cache.
Save your API transactions (like Bing) or cURL requests...
etc...
A lot of times, when it comes to PHP web applications, the database is the bottleneck. As such, one of the best things you can do is to use memcached to cache results in memory. You can also use something like xhprof to profile your code, and really dial in on what's taking the most time.
Yes, those are two different caching techniques, and you've understood them correctly. Two notes, though:
1.) Caching script-generated output to files or proxies can cause problems if the content changes rapidly.
2.) XCache exists too and is easy to install on Ubuntu.
I don't know if this really works, but I came across a performance problem with a PHP script that I had. I have a plain text file that stores data as a title and a URL, tab-separated, with each record separated by a newline. My script grabs the file at each URL and saves it to its own folder.
Then I have another page that actually displays the local files (in this case, pictures), and I use preg_replace() to change the output of each line from the remote URL to a relative one so that it can be served by my server. My tab-separated file is now over 1 MB and it takes a few SECONDS to do the preg_replace(), so I decided to look into output caching. I couldn't find anything definitive, so I figured I'd try my own hand at it, and here's what I came up with:
When I request the page to view the stuff locally, I try to read it from a variable in the global scope. If this is empty, it might be that the application hasn't run yet and the global needs to be populated. If it was empty, read from an output file (a plain HTML file that contains exactly the output), save the contents to the global variable, and then display the output from the global.
Now, when the script runs to update the tab separated file, it updates the output file and the global variable. This way, the portion of the script that actually does the stuff that runs slowly only runs when the data is being updated.
Now, I haven't tried this yet, but theoretically it should improve my performance a lot. It does still run the script, but the data would never be out of date and I should get a much better load time.
Hope this helps.