Page cache in PHP that handles concurrency? - php

I've read previous answers here about caching in PHP, and the articles they link to. I've checked out the oft-recommended PEAR Cache_Lite, QuickCache, and WordPress Super Cache. (Sorry - apparently I'm allowed to hyperlink only once.)
Either none deal with concurrency issues, or none explicitly call out that they do in their documentation.
Can anyone point me in the direction of a PHP page cache that handles concurrency?
This is on a shared host, so memcache and opcode caches are unfortunately not an option. I don't use a templating engine and would like to avoid taking a dependency on one. WP Super Cache's approach is preferable - i.e. storing static files under wwwroot to let Apache serve them - but not a requirement.
Thanks!
P.S. Examples of things that should be handled automatically:
Apache / the PHP cache is in the middle of reading a cached file. The cached file becomes obsolete and deletion is attempted.
A cached file was deleted because it was obsolete. A request for that file comes in, and the file is in the process of being recreated. Another request for the file comes in during this.

It seems PEAR::Cache_Lite has some safeguards for dealing with concurrency issues.
If you take a look at the manual for the constructor Cache_Lite::Cache_Lite, you have these options:
fileLocking
enable / disable fileLocking. Can avoid cache corruption under bad circumstances.
writeControl
enable / disable write control. Enable write control will lightly slow the cache writing but not the cache reading. Write control can detect some corrupt cache files but maybe it's not a perfect control.
readControl
enable / disable read control. If enabled, a control key is embedded in cache file and this key is compared with the one calculated after the reading
readControlType
Type of read control (only if read control is enabled). Must be 'md5' (for a md5 hash control (best but slowest)), 'crc32' (for a crc32 hash control (lightly less safe but faster)) or 'strlen' (for a length only test (fastest))
Which one to use is still up to you, and will depend on how much performance you are ready to sacrifice -- and on the risk of concurrent access that probably exists in your application.
You might also want to take a look at Zend_Cache_Frontend_Output, to cache a page, using something like Zend_Cache_Backend_File as backend.
That one seems to support some safeguards as well -- the same kind of options that Cache_Lite already gave you (so I won't copy-paste a second time).
As a sidenote, if your website runs on a shared host, I suppose it doesn't have that many users? So the risks of concurrent access are probably not that high, are they?
Anyway, I probably would not search any further than what those two frameworks propose: it is probably already more than enough for the needs of your application :-)
(I've never seen any caching mechanism "more secure" than what those allow you to do... And I've never run into a catastrophic concurrency problem of that sort yet... in 3 years of PHP development.)
Anyway: have fun!

I would be tempted to modify one of the existing caches. Zend Framework's cache should be able to do the trick. If not, I would change it.
You could create a really primitive locking strategy. The database could be used to track all of the cached items, allow locking for update, allow people to wait for someone else's update to complete, ...
That would handle your ACID issues. You could set the lock for someone else's update to a very short period, or possibly have it just skip the cache altogether for that round trip depending on your server load/capacity and the cost of producing the cached content.
Jacob

Concurrent resource creation, aka cache slamming / thread race, can be a serious issue on busy websites. That's why I've created a cache library that synchronizes read/write processes/threads.
It has an elegant and clear structure: interfaces -> adaptors -> classes, for easy extension. On the GitHub page I explain in detail what the problem with slamming is and how the library resolves it.
Check it here:
https://github.com/tztztztz/php-no-slam-cache

Under Linux, generally, the file will remain "open" for read, even if it's "deleted", until the process closes the file. This is something built into the system, and can sometimes cause huge discrepancies in disk usage sizes (deleting a 3G file while it's still "open" would mean that it is still allocated on the disk as in use until the process closes it) - I'm unsure as to whether the same is true under Windows.
Assuming a journalling filesystem (most Linux filesystems, and NTFS) - then the file should not be seen as "created" until the process closes the file. This should show up as a nonexistent file!

Assuming a journalling filesystem (most Linux filesystems, and NTFS) - then the file should not be seen as "created" until the process closes the file. This should show up as a nonexistent file!
Nope, it is visible as soon as it is created; you have to lock it.
Rename is atomic though. So you could open(), write(), close(), rename(), but this will not prevent the same cache item being re-created twice at the same time.
A cached file was deleted because it was obsolete.
A request for that file comes in, and the file is in the process of being recreated. Another request for the file comes in during this.
If it is not locked, a half-complete file will be served, or two processes will try to regenerate the same file at the same time, giving "interesting" results.
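The two points above (rename() is atomic, but without a lock the same item can be regenerated twice) can be combined in a minimal sketch. The function name cache_get_or_build, the ".lock" suffix and the TTL parameter are hypothetical, chosen only for illustration; the sketch assumes a POSIX-like filesystem where flock() works and where the temp file sits on the same filesystem as the cache file.

```php
<?php
// Hypothetical helper (not from any library): return the cached content of
// $cacheFile, rebuilding it with $generate() when it is missing or older
// than $ttl seconds.
function cache_get_or_build(string $cacheFile, int $ttl, callable $generate): string
{
    $readIfFresh = function () use ($cacheFile, $ttl) {
        clearstatcache(true, $cacheFile);
        if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
            return @file_get_contents($cacheFile); // string, or false on failure
        }
        return false;
    };

    $data = $readIfFresh();
    if ($data !== false) {
        return $data;
    }

    // Serialize regeneration through a separate lock file: one process
    // rebuilds the entry, the others block here until it has finished.
    $lock = fopen($cacheFile . '.lock', 'c');
    if ($lock === false) {
        return $generate(); // cannot lock: skip the cache for this request
    }
    if (!flock($lock, LOCK_EX)) {
        fclose($lock);
        return $generate();
    }

    try {
        // Another process may have rebuilt the file while we were waiting.
        $data = $readIfFresh();
        if ($data !== false) {
            return $data;
        }

        $data = $generate();

        // Write to a temp file in the same directory, then rename() it over
        // the target, so readers see either the old or the new complete file.
        $tmp = tempnam(dirname($cacheFile), 'tmp');
        if ($tmp !== false && file_put_contents($tmp, $data) !== false) {
            chmod($tmp, 0644); // tempnam() creates the file owner-only
            rename($tmp, $cacheFile);
        }
        return $data;
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}
```

Called as cache_get_or_build('/path/to/cache/home.html', 300, $renderPage), this runs $renderPage at most once per expiry on a single machine. Whether blocking waiters is preferable to serving the stale copy depends on how expensive the page is to build.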

You could cache pages in the database: just create a simple "name, value" table and store the cached pages in it.

Related

php Apc caching or File caching for semi-static website?

I'm new to PHP and want to try caching (for the first time), so I made a website that has:
dynamic home page
dynamic portfolio page
dynamic contact page
static about page
static admin page
So I read a tutorial about caching and tried to make my own caching system:
It uses a file cache keyed on the requested page. When a page is requested, the cache system checks whether a cache file exists in the cache directory. If there is no cache file yet, it writes all the output (HTML) from the PHP script (in this case, the contents of the output buffer) to one; if there is a cache file that corresponds to the specific ID (based on the URI), it just include_once()s the HTML file.
Then I read that CodeIgniter (I built this website with CI) supports APC for caching, so I read up on APC. What I read is that it caches DB results, and now I'm confused about which one I should use.
What I've got so far:
File caching would probably be slower if there are a lot of requests (I don't know whether this is true, but I read it somewhere via a search engine)
APC is fast
But I'm still confused about which I should use. I'm on shared hosting.
The levels of caching most relevant in a PHP application:
File / Script caching - The operating system will actually do this to a large extent. When a file is opened it's added to an OS-level cache. It stays there until the file is touched or the OS needs to free memory for other processes. A homegrown PHP solution isn't a good replacement for this.
Opcode caching - In order to function, PHP needs to parse and compile a script into opcodes. A mechanism like APC will cache the opcodes of every PHP script executed by Apache, provided that the cache doesn't overflow. A homegrown PHP solution built on top of APC can partially do this, but APC already does it ... so don't bother.
Query caching - If your script accesses a lot of data that doesn't change very frequently, or wherein some latency between updates and the visibility of those updates is acceptable, caching the results from complex queries is beneficial. A homegrown PHP solution built on APC is acceptable and beneficial at this level. But a database level solution is also appropriate here, and often more appropriate.
Output caching - If your page is largely deterministic and/or the same sort of latency applicable to query caching is acceptable, you can cache the entire output of the script using output buffering and APC. A homegrown PHP solution built on APC is acceptable here, but generally not necessary. If the page is static, you're probably not saving yourself any re-computation. And if it's dynamic, it's usually preferable to just re-render the page anyway.
In a dedicated or virtual-dedicated environment you'd need to install APC (or something similar) yourself. But, in a shared hosting environment, it's very likely that APC is installed. And if it weren't, you couldn't install it yourself anyway.
And, due to my own uncertainty, I'd recommend not performing any query or output caching with APC in a shared environment -- I'm not sure whether APC segregates caches by virtual host. Even if it does, I wouldn't assume that my site is truly a separate virtual host.

Ignore caching of a specific file with APC

Is there a way to prevent a specific file from being opcode cached with APC? The use case is as follows:
An application that sits on the cloud, which dynamically resizes itself (spinning up and down servers as required). The config.php script must know of the new IPs as they become available or unavailable.
Since these changes happen frequently enough, and the config.php file is fairly basic, it would be ideal to not have to worry about clearing APC just for the one file.
Clearing the one file out of APC is definitely a possibility, but since you can't access APC via the command line, the solution ends up being rather inelegant.
I have a similar use case. I've asked myself the same question many times, and I have not been able to find a solution. However, my workaround has been to create a quick script that takes care of clearing the APC cache for each server. Every time I rebuild the app, I need to hit the file on each server to clear the opcode cache using apc_clear_cache. If you only have to clear one file, you may be better off with apc_compile_file.
Hope this helps.
Yes, you should check out the apc.filter configuration directive. Another Question | PHP Docs
I don't know of a way to do what you're suggesting, but you should be able to engineer your way around it.
The obvious solution is to not store the data in a PHP file. Since you've already got APC, why not just keep the configuration data in APC (as cached data, not opcodes)?
So whatever modifies config.php, would now do something like this:
Modify some non-php file (something.ini, or something like that)
Invalidate the APC cache entry.
When config.php needed the data, it would typically read from the cache. If the cache has been invalidated, it reads/parses the data from the ini file, updates the cache, and proceeds as usual.
At the end of the day, you're using an opcode cache to cache data. You should use a data cache instead. Luckily, APC provides both.
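A minimal sketch of that read-through pattern follows. It assumes the APCu user cache (apcu_fetch, apcu_store, apcu_delete), the data-cache successor to the original apc_* functions; the helper names load_config and invalidate_config, the cache key and the INI layout are hypothetical. When APCu is not loaded, the sketch simply parses the file on every call.

```php
<?php
// Hypothetical helper: return the parsed INI config, using the APCu user
// cache (a data cache, not the opcode cache) when it is available.
function load_config(string $iniFile, string $cacheKey = 'app_config'): array
{
    $haveApcu = function_exists('apcu_fetch');

    if ($haveApcu) {
        $hit = false;
        $cached = apcu_fetch($cacheKey, $hit);
        if ($hit && is_array($cached)) {
            return $cached; // normal case: served from the data cache
        }
    }

    // Cache miss or invalidated entry: parse the file and repopulate.
    $config = parse_ini_file($iniFile, true);
    if ($config === false) {
        throw new RuntimeException("Cannot parse $iniFile");
    }
    if ($haveApcu) {
        apcu_store($cacheKey, $config);
    }
    return $config;
}

// Hypothetical helper: called by whatever rewrites the INI file.
function invalidate_config(string $cacheKey = 'app_config'): void
{
    if (function_exists('apcu_delete')) {
        apcu_delete($cacheKey);
    }
}
```

Note that the invalidation has to run inside the web server's PHP processes (for example via a small internal endpoint), since a command-line PHP process does not share the web server's cache.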

MySQL query cache vs caching result-sets in the application layer

I'm running a php/mysql-driven website with a lot of visits and I'm considering the possibility of caching result-sets in shared memory in order to reduce database load.
However, right now MySQL's query cache is enabled and it seems to be doing a pretty good job since if I disable query caching, the use of CPU jumps to 100% immediately.
Given that situation, I don't know if caching result-sets (or even the generated HTML code) locally in shared memory with PHP will result in any noticeable performance improvement.
Does anyone out there have any experience on this matter?
PS: Please avoid suggesting heavy-artillery solutions like memcached. Right now I'm looking for simple solutions that don't require too much time to implement, deploy and maintain.
Edit:
I see my comment about memcached deviated answers from the actual point, which is whether caching DB queries in the application layer would result in a noticeable performance impact considering that the results of those queries are already being cached at the DB level.
I know you didn't want to hear about memcached, but it is one of the best solutions for what you're trying to do. Depending on your site usage, there can be massive improvements in performance. By simply using memcached's session handler over my database session handler, I was able to cut the load in half and cut back on request serving times by over 30%.
Realistically, memcached is a simple solution. It's already integrated with PHP (if you have the extension loaded), and it requires virtually no configuration (I simply had to add memcached as a service on my Linux box, which is done in one or two shell commands).
I would suggest storing session data (and anything that lends itself to caching) in memcache. For dynamic pages (such as the Stack Overflow homepage), I would recommend caching output for a couple of seconds to prevent flooding.
A decent single-box solution is file-based caching, but you have to sweep the files out manually. Other than that, you could use APC, which is very fast and in-memory (you still have to expire entries yourself, though).
As soon as you scale past one web server, though, you're going to need a shared cache, which is memcached. Why are you so adamant about not deploying this? It's not hard, and it's just going to save you time down the road. You can either start using memcache now and be done with it, or you could use one of the above methods for now and then end up switching to memcache later anyway, resulting in even more work. Plus, you don't have to deal with running a cronjob or some other ugly hack to get cache expiration features: it does that for you.
The mysql query cache is nice, but it's not without issues. One of the big ones is it expires automatically every time the source data is changed, which you probably don't want.

Rolling and packing PHP scripts

I was just reading over this thread where the pros and cons of using include_once and require_once were being debated. From that discussion (particularly Ambush Commander's answer), I've taken away the fact(?) that any sort of include in PHP is inherently expensive, since it requires the processor to parse a new file into OP codes and so on.
This got me to thinking.
I have written a small script which will "roll" a number of JavaScript files into one (appending all the contents into another file), such that it can be packed to reduce HTTP requests and overall bandwidth usage.
Typically for my PHP applications, I have one "includes.php" file which is included on each page, and that then includes all the classes and other libraries which I need. (I know this isn't probably the best practise, but it works - the __autoload feature of PHP5 is making this better in any case).
Should I apply the same "rolling" technique on my PHP files?
I know of that saying about premature optimisation being evil, but let's take this question as theoretical, ok?
There is a problem with Apache/PHP on Windows which causes the application to be extremely slow when loading or even touching too many files (a page that loads approx. 50-100 files may spend a few seconds on file handling alone). This problem appears both when including/requiring files and when working with them (fopen, file_get_contents, etc.).
So if you (or more likely anybody else, due to the age of this post) will ever run your app on apache/windows, reducing the number of loaded files is absolutely necessary for you. Combine more PHP classes into one file (an automated script for it would be useful, I haven't found one yet) or be careful to not touch any unneeded file in your app.
That would depend somewhat on whether it was more work to parse several small files or to parse one big one. If you require files on an as-needed basis (not saying you necessarily should do things that way) then presumably for some execution paths there would be considerably less compilation required than if all your code was rolled into one big PHP file that the parser had to process in its entirety, whether it was needed or not.
In keeping with the question, this is thinking aloud more than expertise on the internals of the PHP runtime, - it doesn't sound as though there is any real world benefit to getting too involved with this at all. If you run into a serious slowdown in your PHP I would be very surprised if the use of require_once turned out to be the bottleneck.
As you've said: "premature optimisation ...". Then again, if you're worried about performance, use an opcode cache like APC, which makes this problem almost disappear.
This isn't an answer to your direct question, just about your "js packing".
If you leave your JavaScript files alone and allow them to be included individually in the HTML source, the browser will cache those files. Then on subsequent requests, when the browser requests the same JavaScript file, your server will return a 304 Not Modified header and the browser will use the cached version. However, if you're "packing" the JavaScript files together on every request, the browser will re-download the file on every page load.

How do I implement a HTML cache for a PHP site?

What is the best way of implementing a cache for a PHP site? Obviously, there are some things that shouldn't be cached (for example search queries), but I want to find a good solution that will make sure that I avoid the 'digg effect'.
I know there is WP-Cache for WordPress, but I'm writing a custom solution that isn't built on WP. I'm interested in either writing my own cache (if it's simple enough), or you could point me to a nice, light framework. I don't know much Apache though, so if it was a PHP framework then it would be a better fit.
Thanks.
You can use output buffering to selectively save parts of your output (those you want to cache) and display them to the next user if it hasn't been long enough. This way you're still rendering other parts of the page on-the-fly (e.g., customizable boxes, personal information).
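As a rough illustration of that idea, the sketch below caches one fragment in a file and re-renders it only once the file is older than a given number of seconds. The helper name cached_fragment, the file path and the TTL are hypothetical; the sketch deliberately ignores concurrent writers beyond the advisory lock taken by file_put_contents().

```php
<?php
// Hypothetical helper: echo a cached fragment if it is younger than $ttl
// seconds; otherwise capture the output of $render with output buffering,
// store it, and echo it.
function cached_fragment(string $file, int $ttl, callable $render): void
{
    clearstatcache(true, $file);
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        readfile($file); // still recent enough: reuse the stored HTML
        return;
    }

    ob_start();             // start capturing everything $render echoes
    $render();
    $html = ob_get_clean(); // fetch the captured output and stop buffering

    file_put_contents($file, $html, LOCK_EX);
    echo $html;
}
```

The rest of the page (personal boxes, login state) is rendered as usual around the call, e.g. cached_fragment('/path/to/cache/sidebar.html', 60, 'render_sidebar'), where render_sidebar is whatever function echoes the expensive part.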
If a proxy cache is out of the question, and you're serving complete HTML files, you'll get the best performance by bypassing PHP altogether. Study how WP Super Cache works.
Uncached pages are copied to a cache folder with a URL structure similar to your site's. On later requests, mod_rewrite notes the existence of the cached file and serves it instead. Other RewriteCond directives are used to make sure commenters/logged-in users see live PHP requests, but the majority of visitors will be served by Apache directly.
The best way to go is to use a proxy cache (Squid, Varnish) and serve appropriate Cache-Control/Expires headers, along with ETags; see Mark Nottingham's Caching Tutorial for a full description of how caches work and how you can get the most performance out of a caching proxy.
Also check out memcached, and try to cache your database queries (or better yet, pre-rendered page fragments) in there.
I would recommend Memcached or APC. Both are in-memory caching solutions with dead-simple APIs and lots of libraries.
The trouble with those 2 is you need to install them on your web server or another server if it's Memcached.
APC
Pros:
Simple
Fast
Speeds up PHP execution also
Cons:
Doesn't work for distributed systems, each machine stores its cache locally
Memcached
Pros:
Fast(ish)
Can be installed on a separate server for all web servers to use
Highly tested, developed at LiveJournal
Used by all the big guys (Facebook, Yahoo, Mozilla)
Cons:
Slower than APC
Possible network latency
Slightly more configuration
I wouldn't recommend writing your own, there are plenty out there. You could go with a disk-based cache if you can't install software on your webserver, but there are possible race issues to deal with. One request could be writing to the file while another is reading.
You actually could cache search queries, even for a few seconds to a minute. Unless your db is being updated more than a few times a second, some delay would be ok.
The PHP Smarty template engine (http://www.smarty.net) includes a fairly advanced caching system.
You can find details in the caching section of the Smarty manual: http://www.smarty.net/manual/en/caching.php
You seem to be looking for a PHP cache framework.
I recommend the template system TinyButStrong, which comes with a very good CacheSystem plugin.
It's simple, light, customizable (you can cache whatever part of the HTML file you want), and very powerful ^^
For simple caching of pages, or parts of pages, there is the PEAR::Cache_Lite class. I also use APC and memcache for different things, but the other answers I've seen so far are aimed at more complete and complex systems. If you just need to save some effort rebuilding a part of a page, Cache_Lite with a file-backed store is entirely sufficient, and very simple to implement.
Project Gazelle (an open source torrent site) provides a step by step guide on setting up Memcached on the site which you can easily use on any other website you might want to set up which will handle a lot of traffic.
Grab the source and read the documentation.
