Choosing a PHP caching technique: output caching into files vs. opcode caching - php

I've heard of two caching techniques for the PHP code:
When a PHP script generates output, it stores it in local files. When the script is called again, it checks whether a file with the previous output exists and, if so, returns the contents of that file. It's mostly done by playing around with the output buffer. Something like this is described in this article.
Using an opcode caching extension, where the compiled PHP code is stored in memory. The most popular one is APC; eAccelerator is another option.
Now the question is whether it makes any sense to use both of these techniques or just one of them. I think the first method is a bit complicated and time-consuming to implement, whereas the second one seems simple: you just need to install the module.
I use PHP 5.3 (PHP-FPM) on Ubuntu/Debian.
BTW, are there any other methods to cache PHP code or output that I didn't mention here? Are they worth considering?

You should always have an opcode cache like APC. Its purpose is to skip re-parsing and re-compiling your code on every request, and it will be bundled into PHP in a future version. For now, it's a simple install on any server and doesn't require you to write or change any code.
However, caching opcodes doesn't do anything to speed up the actual execution of your code. Your bottlenecks are usually time spent talking to databases or reading from and writing to disk. Caching the output of your program avoids that unnecessary resource usage and can speed up responses by orders of magnitude.
You can do output caching many different ways at many different places along your stack. The first place you can do it is in your own code, as you suggested, by buffering output, writing it to a file, and reading from that file on subsequent requests.
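For example, a minimal file-based output cache in your own code could look roughly like this sketch (the cache path and lifetime are arbitrary placeholders, not part of any particular library):

<?php
// Minimal output-caching sketch: serve a saved copy of the page if it is
// still fresh, otherwise buffer the generated output and save it for next time.
$cacheFile = '/tmp/cache_' . md5($_SERVER['REQUEST_URI']) . '.html'; // placeholder location
$maxAge    = 300;                                                    // cache lifetime in seconds

if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
    readfile($cacheFile); // cache hit: send the stored output and stop
    exit;
}

ob_start();               // cache miss: capture everything the script prints

// ... normal page generation happens here ...

file_put_contents($cacheFile, ob_get_contents()); // store for the next request
ob_end_flush();           // send the freshly generated page to the client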
That still requires executing your PHP code on each request, though. You can cache output at the web server level to skip that as well. Crafting a set of mod_rewrite rules will allow Apache to serve the static files instead of the PHP code when they exist, but you'll have to regenerate the cached versions manually or with a scheduled task, since your PHP code won't be running on each request to do so.
You can also stick a proxy in front of your web server and use that to cache output. Varnish is a popular choice these days and can serve hundreds of times more requests per second with caching than Apache running your PHP script on the same server. The cache is created and configured at the proxy level, so when an entry expires, the request passes through to your script, which runs as it normally would to generate the new version of the page.

In my experience, opcode caches, file caches and so on are mainly used to reduce database calls.
They can't speed up your code itself, but they improve page load times by serving your visitors from a cache.
For me, APC is good enough on a VPS or dedicated server when I need to cache widgets or objects to take load off my MySQL server.
If I have more than two servers, I prefer Memcache; it is good at using memory for caching across machines. It's up to you, though; not everyone likes memcached, and not everyone likes APC.
For caching whole web pages: I run a lot of WordPress sites and have used APC, Memcache and file caching through cache plugins like W3 Total Cache. From my own experience, a file cache is good for caching a whole site, while a memory cache is better for caching objects.
A file cache will drive up CPU usage if your hard drive is slow, and a memory cache is terrible if you don't have enough memory on your VPS.
An SSD gives very good read/write speed, but memory is always faster; in practice a visitor won't notice the difference. Pick a method based on your project and your server (RAM, disk), or on whether you're on shared web hosting.
If I'm on shared hosting, without root access and without control over php.ini, I like to use phpFastCache: a simple file cache with just set, get, stats and delete.
In addition, I like to use .htaccess to cache static files like images, JS and CSS via HTTP headers. That speeds up the page for your visitors and saves server bandwidth.
And if you cache whole pages, being able to use .htaccess to redirect to the static .html cache is a great thing.
In the future, APC or some other opcode cache will be bundled into PHP, but I'm sure caches by themselves can't speed up your code; they are used to:
Reduce database and query calls.
Improve page load times by serving from a cache.
Save API transactions (like Bing) or cURL requests...
etc...

A lot of times, when it comes to PHP web applications, the database is the bottleneck. As such, one of the best things you can do is to use memcached to cache results in memory. You can also use something like xhprof to profile your code, and really dial in on what's taking the most time.

Yes, those are two different caching techniques, and you've understood them correctly.
But beware with 1):
1.) Caching script-generated output to files or via proxies can cause problems if the content changes rapidly.
2.) XCache exists too and is easy to install on Ubuntu.
regards,
/t

I don't know if this really would work, but I came across a performance problem with a PHP script that I had. I have a plain text file that stores data as a tab-separated title and URL, with each record separated by a newline. My script grabs the file at each URL and saves it to its own folder.
Then I have another page that actually displays the local files (in this case, pictures), and I use a preg_replace() to change the output of each line from the remote URL to a relative one so that it can be displayed by the server. My tab-separated file is now over 1 MB and it takes a few SECONDS to do the preg_replace(), so I decided to look into output caching. I couldn't find anything definitive, so I figured I would try my own hand at it, and here's what I came up with:
When I request the page to view stuff locally, I try to read it from a variable in a global scope. If this is empty, it might be that this application hasn't run yet and the global needs to be populated. If it was empty, read from an output file (a plain HTML file that literally contains everything to output), save the contents to the global variable, and then display the output from the global.
Now, when the script runs to update the tab-separated file, it updates the output file and the global variable. This way, the portion of the script that does the slow work only runs when the data is being updated.
I haven't tried this yet, but theoretically it should improve my performance a lot. It does still run the script, but the data would never be out of date and I should get a much better load time.
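In code, the idea would be something like this rough sketch (the output file name, the global and the regex are made up for illustration):

<?php
// Rough sketch: keep the slow-to-build HTML in an output file and read it
// into a global once, instead of redoing the slow preg_replace() on every view.
function showPage()
{
    global $pageOutput;                                   // hypothetical global holding the HTML

    if (empty($pageOutput)) {                             // not loaded yet
        $pageOutput = file_get_contents('output.html');   // hypothetical cache file
    }
    echo $pageOutput;
}

function rebuildPage($tabSeparatedData)
{
    global $pageOutput;

    // ... the slow preg_replace() work happens here, e.g.: ...
    $html = preg_replace('#https?://example\.com/#', '/', $tabSeparatedData);

    file_put_contents('output.html', $html);              // update the cache file
    $pageOutput = $html;                                   // and the in-memory copy
}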
Hope this helps.

Related

Server file read/write concurrency issue

I have a web service serving data from a MySQL database. I would like to create a cache file to improve performance. The idea is that once in a while we read data from the DB and generate a text file. My question is:
What if a client-side user is accessing the file while we are generating it?
We are using LAMP. In PHP there is flock() to handle concurrency problems, but my understanding is that it only applies when two PHP processes access the file simultaneously. Our case is different.
I don't know whether this will cause issues at all. If so, how can I prevent it?
Thanks,
Don't use locking.
If your cache file is /tmp/cache.txt, then you should always regenerate the cache into /tmp/cache2.txt and then do a
mv /tmp/cache2.txt /tmp/cache.txt
or
rename('/tmp/cache2.txt','/tmp/cache.txt')
The mv/rename operation is atomic as long as it happens within the same filesystem; no locking is needed.
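In PHP, that pattern is just a couple of lines; a minimal sketch (the file names and content are examples):

<?php
// Regenerate the cache into a temporary file, then atomically swap it in.
// Readers always see either the complete old file or the complete new file.
$data = '... freshly generated content from the DB ...';

$tmp   = '/tmp/cache.txt.tmp'; // temp path on the same filesystem as the target
$final = '/tmp/cache.txt';     // the file clients actually read

file_put_contents($tmp, $data); // write the new version somewhere else first
rename($tmp, $final);           // atomic replace; no reader ever sees a half-written file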
All sorts of optimisation options here:
1) Are you using the MySQL query cache? That can take a huge load off the database to start with.
2) You could pull the file through a web proxy like Squid (or Apache configured as a reverse caching proxy). I do this all the time and it's a really handy technique - generate the file by fetching it from a URL using wget, for example (that way you can have it in a cron job). The web proxy takes care of either delivering the same file that was there before, or regenerating it if needs be.
3) You don't want to be rolling your own file locking solution in this scenario.
Depending on your scenario, you could also consider caching pages in something like memcache, which is fantastic for high-traffic scenarios, but possibly beyond the scope of this question.
You can use A -> B switching to avoid this issue.
E.g.: keep two copies of the cache file, A and B, and have the program read them via a symlink, C.
When the program is rebuilding the cache, it modifies the file that is not "current", i.e. if C links to A, update B. Once the update is complete, switch the symlink to B.
Next time, update A and switch the symlink back to A once the update is complete.
This way clients never read a file while it is being updated.
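A rough PHP sketch of that A/B scheme (the paths are just examples; rename() is used because symlink() won't overwrite an existing link):

<?php
// A/B switching sketch: clients always read through the symlink cache.txt,
// while the updater rewrites whichever copy is not currently linked.
$link = '/tmp/cache.txt';   // example symlink that clients read
$a    = '/tmp/cache_a.txt';
$b    = '/tmp/cache_b.txt';

$current = is_link($link) ? readlink($link) : $b; // which copy is live right now
$standby = ($current === $a) ? $b : $a;           // the one we are free to rewrite

file_put_contents($standby, '... new cache content ...');

// Create a temporary link and rename() it into place; rename() is atomic on
// the same filesystem, so readers never see a missing or half-switched link.
$tmpLink = $link . '.tmp';
@unlink($tmpLink);
symlink($standby, $tmpLink);
rename($tmpLink, $link);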
When a client-side user accesses the file, it is read as it is at that moment.
flock() is for when two PHP processes access the file simultaneously.
I would solve it like this:
While generating the new text file, save it to a temporary file (cache.tmp); that way the old file (cache.txt) keeps being served as before.
When generation is done, delete the old file and rename the new one.
To avoid problems during that short window, your code should check whether cache.txt exists and retry for a short period of time if it doesn't.
Trivial, but that should do the trick.

Custom PHP cache

I have an application that needs to cache large amounts of data (sometimes even several MB) across multiple page requests (for the same user/session). After doing some Googling etc. I've concluded that it is likely best to implement the caching mechanism by writing cache files to disk (please correct me if you think there are better alternatives).
Now, my idea was to have a root cache folder, within which I create a folder per session ID so as not to overwrite cached data used in separate sessions. Then for each block of data I will create a unique identifier which can be linked to the data whenever I want to retrieve it again. The data will then be serialized to a string (using PHP's default 'serialize' function), after which it is written to the appropriate file.
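Roughly, what I have in mind would look something like this sketch (the cache root is a placeholder, and it assumes a session has already been started):

<?php
// Sketch of the idea: one folder per session, one file per cache key,
// values serialized with PHP's serialize()/unserialize().
function cacheWrite($key, $value)
{
    $dir = '/path/to/cache/' . session_id(); // placeholder root folder
    if (!is_dir($dir)) {
        mkdir($dir, 0700, true);
    }
    file_put_contents($dir . '/' . md5($key), serialize($value));
}

function cacheRead($key)
{
    $file = '/path/to/cache/' . session_id() . '/' . md5($key);
    if (!is_file($file)) {
        return null;                          // not cached (yet)
    }
    return unserialize(file_get_contents($file));
}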
The thing I'm not so sure about is how to implement the cleanup of the cached files. At some point the data is no longer needed, for example when the session has expired, or for a number of other reasons. Since it would likely be too much overhead to check for this during each page request, I expect to have to do this externally using some kind of scheduler. However, I cannot guarantee that my application will run in a UNIX environment, so I'd have to consider other platforms as well (Windows, Mac). Is there a general solution anyone can think of that would be cross-platform without too much hassle?
I'm also thinking there may be a way to intelligently check or mark certain files for cleanup without having to check all the existing files separately. I was considering storing their last-accessed timestamp or something, but there may be other criteria besides time that could make the cached data obsolete, such as an exception being triggered in the application (though I could say that whenever that happens, the entire cache for that session is emptied, or something like that).
Any suggestions on these issues would be very much appreciated!
If you have memcache installed, you can use that for caching. It is faster than a file cache, and you can give each entry an expiration time, so it will automatically be removed from the cache after a given period.
Both Windows and Unix have scheduled job support - cron for Unix/Linux, and 'at' for Windows. It would be a simple matter to whip up a PHP script to scan your cache directory and apply your deletion criteria to what it finds. Last access timestamp is trivial, basing it on cached file contents or other triggers slightly less so.
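A cleanup script along those lines could be as simple as this sketch (the cache root and age limit are assumptions; it uses the modification time, though getATime() would track last access instead):

<?php
// Scheduled cleanup sketch (cron on Unix/Linux, 'at' on Windows):
// delete cache files that haven't been touched recently.
$cacheRoot = '/path/to/cache'; // assumed cache root from the scheme above
$maxAge    = 3600;             // assumed: expire entries older than one hour

$it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($cacheRoot, FilesystemIterator::SKIP_DOTS)
);
foreach ($it as $file) {
    if ($file->isFile() && (time() - $file->getMTime()) > $maxAge) {
        unlink($file->getPathname()); // stale entry, drop it
    }
}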

Using too many PHP includes a bad idea?

My question is whether or not using multiple PHP include() calls is a bad idea. The only reason I'm asking is because I always hear having too many stylesheets or scripts on a site creates more HTTP requests and slows page loading. I was wondering the same about PHP.
The detailed answer:
Every CSS or JS file referenced in a web page is actually fetched over the network by the browser, which often involves hundreds of milliseconds or more of network latency. Requests to the same server are (by convention, though not mandated) serialized one or two at a time, so these delays stack up.
PHP include files, on the other hand, are all processed on the server itself. Instead of hundreds of milliseconds, local disk access takes tens of milliseconds or less, and if the file is cached, it becomes a direct memory access, which is even faster.
If you use something like http://eaccelerator.net/ or http://php.net/manual/en/book.apc.php then your PHP code will all be precompiled on the server once, and it doesn't even matter if you're including files or dumping them all in one place.
The short answer:
Don't worry about it. 99 times out of 100 with this issue, the benefits of better code organization outweigh the performance increases.
The use of includes helps with code organization, and is no hindrance in itself. If you're loading up a bunch of things you don't need, that will slow things down -- but that's another problem. Clarification: as you include pages, be aware of what you're adding to the load; don't carelessly include unneeded resources.
As already said, the use of multiple PHP includes helps to keep your code organized, so it is not a bad idea. It only becomes a problem when too many includes are used, because the web server has to perform an I/O operation for each include you have.
If you have a large web application, you can boost it with a PHP accelerator, which caches data and the compiled code from the PHP bytecode compiler in shared memory. If you have lots of PHP includes in a specific file, they will be compiled just once; further requests hit the cache, so the included files don't have to be parsed and compiled again.
It really depends on what you want to do. If you have a piece of code that is used all the time, it is really convenient to include it instead of copying and pasting it everywhere, and that will make your code clearer and not slower. But if you include all the functions or classes you have written without actually using them, of course that's not good practice... I would suggest using a framework (like CodeIgniter or something else you find convenient) because it really helps sort these things out... good luck!
The only reason I'm asking is because I always hear having too many stylesheets or scripts on a site creates more HTTP requests and slows page loading. I was wondering the same about PHP.
Do notice that an HTTP request is several orders of magnitude slower than a PHP include.
Several HTTP requests -> Client has to request and accept over the wire several files
Several PHP includes -> Client has to request and accept only one file
The includes, obviously, will have a server penalty. But by your question... don't sweat it. Only on really large-scale PHP projects will you face such problems.

Is a PHP-only "cache engine" ever worth it?

I wrote a rather small skeleton for my web apps and thought that I would also add a small cache for it.
It is rather simple:
If the current page exists as a file in the cache and the file isn't too old, read it out and exit instead of rebuilding the page
If the current page isn't cached or is outdated, regenerate the page and save it
However, the bad thing about it is:
My performance tests, on a page that fetches 40 relatively long posts via a MySQL query, showed that with the cache it took even longer to handle a single request (1,000 tests each).
How can that happen?
How can doing a MySQL query, looping through the results the first time, passing the results to the template and then looping through the results for the second time be faster than a filemtime() check and a readout?
Should I just remove the raw-PHP cache completely and rely on the availability of some PHP cache like memcached or the like?
Premature optimization is the root of all evil. If you don't need a cache, don't use a cache.
That being said, if you are content to not serve up dynamic content per request, you might want to look into using a caching proxy such as varnish and cutting out PHP and the webserver entirely. There's quite a bit of overhead to get to even your first line of PHP, and serving static files through PHP is a little dirty.
If you just want to cache elements, something like memcached or APC's cache is the way to go. APC has the advantage of being more readily available (you should have APC installed on your servers for the opcode cache if you care at all about performance) and memcached has the option of letting you have a cache that's accessible by multiple webservers (and/or multiple caches)
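For APC's user cache specifically, it is just a couple of function calls; a minimal sketch (the key and TTL are arbitrary):

<?php
// Minimal APC user-cache sketch: try the cache first, fall back to the
// expensive work (e.g. a database query) and store the result for next time.
$key  = 'expensive_data';            // arbitrary cache key
$data = apc_fetch($key, $hit);

if (!$hit) {
    // ... the expensive work (DB query, API call, ...) would go here ...
    $data = array('generated_at' => time());
    apc_store($key, $data, 300);     // keep it for five minutes
}

var_dump($data);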
I don't think so; there seems to be some other problem in your implementation. Here are a couple of great resources about it:
http://www.mnot.net/cache_docs/
http://blog.digitalstruct.com/2008/02/27/php-performance-series-caching-techniques/
Possible -
If you are using APC and the MySQL query cache (on by default), then your PHP code is already compiled and stored as opcodes in APC, and if you hit the same query repeatedly, the MySQL query cache will cache the database results too. In this case most of the data comes from memory, so your file read could well be slower. The real benefit of your approach is saving the number of MySQL connections, not necessarily performance.
Using a caching proxy like Squid would be the ideal solution in your case, so that the page is served directly from the cache until it expires. Another optimization for the above situation would be to just cache the MySQL output in memcache, which would be better than hitting MySQL, especially since the query cache is small. Lastly, if you do want to cache the generated markup, you could use output buffering (ob_start) and store the output directly in memcache.
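A rough sketch of that last idea using the Memcached extension (the server address, key and lifetime are assumptions):

<?php
// Cache the rendered page markup in memcache instead of regenerating it.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);             // assumed memcached server

$key  = 'page_' . md5($_SERVER['REQUEST_URI']); // one cache entry per URL
$html = $mc->get($key);

if ($html === false) {                          // miss: render and store
    ob_start();

    // ... normal page generation (MySQL queries, templating) goes here ...

    $html = ob_get_clean();
    $mc->set($key, $html, 300);                 // keep it for five minutes
}

echo $html;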
I think not.
The slowest part of the application is probably the database traffic.
If you include a cached HTML/PHP file, it should be faster.

Speed up a PHP web site

What's the best way (or ways) to speed up a PHP web site, and how much faster can it get using this or that approach?
PHP isn't really the kind of language where you can do micro-optimizations, or just work on the code alone. There's really no point. Although PHP isn't particularly fast, PHP itself is rarely the bottleneck in a given web site.
You need to work out where that bottleneck is before you can fix it. There are a lot of common bottlenecks, with common solutions. It's difficult to generalize, given so few details, but there are a lot of performance hints that apply to most web sites.
The first good place to look is actually on the client side, rather than the server side. How large are your pages (including images, CSS, JavaScript and the like)? How many HTTP requests does a single page view require? Use something like Firebug (and the YSlow add-on for Firebug) to see how long your page actually takes to load, and which bits of your page cause the problem. Some general hints:
Work out ways to shrink the CSS and JavaScript - remove anything you don't need, and run the rest through a tool like YUI Compressor.
If you have multiple CSS and JavaScript files, try to combine them into a single file.
Optimize all of your images as much as possible, and see if you can combine any of those into a single file using CSS sprites or similar. PunyPNG is good for lossless images. A decent JPEG encoder (NOT Photoshop) is good for photos.
Move the CSS to the top of the page, and the JavaScript to the bottom, so the browser can render the page before the JavaScript has finished downloading.
Make sure that all of your CSS, JavaScript and HTML are being served compressed.
Make sure that you're using appropriate caching - if a file hasn't changed, there's no point in re-downloading it.
Once you've got the client side out of the way, you might have to turn your attention to the server side.
Install an opcode cache, like APC, XCache, or Zend Optimizer. It's very easy to do, and will always provide some improvement. Once you've done that, profile your pages, to find out where the time is actually being spent.
More likely than not, you'll be spending most of your time waiting for the database to return results. So, at a bare minimum:
Work out which queries are taking the longest, and work on them first. Use your head though - a query that takes five seconds on an admin page that nobody looks at is not as important as a query that takes one second on the front page.
Make sure that your query uses appropriate indexes. No common query should ever need to do a full table scan. Certain kinds of sorting or grouping may be unable to use indexes - try to avoid them, or modify the query so that it can use indexes.
Make sure that your queries aren't using temporary tables.
Use the EXPLAIN keyword - it's very useful.
Tune the database server itself. MySQL's default configuration is generally not tuned for performance.
Once you've done that, it's usually best to start working out how to use caching. The best way to speed PHP code up is to reduce the amount of work it has to do.
Make sure your database's query cache is working properly.
Use something like Memcached to store frequently used results, instead of getting them from the database.
If you have enough memory, try to keep everything in Memcached, resorting to the database only when something isn't present in the cache.
If you have chunks of pages that are dynamic, but the same for all users, try caching those chunks. For example, if two users are looking at an article, the article itself is going to be exactly the same for each user, even if the rest of the page isn't. Generate the HTML for the article, and chuck it in the cache.
If you have lots of non-authenticated users, it's entirely possible that they'll all be seeing the exact same page. Two non-authenticated users looking at the above article won't just see an identical article - they'll see an identical page, right down to the login links. Set your PHP scripts up so you can use HTTP caching headers (check the last modified date, and return a 304 Not Modified if it hasn't changed; a minimal sketch follows). Once you've done that, stick a Squid reverse-proxy in front of the webserver, and let Squid serve pages out of its cache.
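A minimal sketch of that Last-Modified / 304 handling (where the page's timestamp comes from is up to you; a file mtime is assumed here):

<?php
// Answer conditional requests: if the client's copy is still current,
// reply 304 Not Modified and skip regenerating the page.
$lastModified = filemtime('article.html'); // assumed source of the page's timestamp

header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');

if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
    strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $lastModified) {
    header('HTTP/1.1 304 Not Modified');
    exit;                                  // nothing else to send
}

// ... otherwise generate and output the full page as usual ...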
After that point, the general approach is to start using more servers, and the problem becomes one of scaling, rather than raw speed. The general plan is to make sure that your website has a shared-nothing architecture - all persistent data is stored in the database. Then, you install multiple webservers, move the database server to a separate machine, and run the entire thing behind a caching reverse proxy. To add more capacity, you add more machines.
One way: php accelerators, e.g. APC.
Another: read blog articles, e.g. a performance tuning overview.
A general question, I would say. Try looking for optimization tips online...
Several parameters are involved:
I/O access (heavy use of calls like file_exists and is_file adds overhead)
Database access (optimize queries, use stored procedures, check your db cache)
Using an opcode cache (like APC)
Compressing output (see the gzip sketch after this list)
Serving js/css minified and compressed (and using subdomains to deliver them to the browser)
Using memcache to cache data into memory for faster access
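For the output compression point above, the simplest PHP-side option is gzip via the output buffer (assuming the zlib extension is available and the web server isn't already compressing responses):

<?php
// Compress the page output with gzip when the client supports it.
// Must be called before any output is sent.
ob_start('ob_gzhandler');

echo '<html><body>';
// ... page content ...
echo '</body></html>';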
You can use benchmarking tools to test your environment before and after the optimizations.
Try apache bench for example.
Filesize.
A file of 500 KB takes longer to download than a file of 300 KB. So optimize and crop as much as you can.
Accelerators
Self-explanatory: List of PHP accelerators
Server upgrade
Though this costs money, when dealing with a lot of traffic it will have an impact on how fast the .php files get processed and how fast data is sent to the user.
I don't recommend this though, since there are other (free) ways to improve speed.
Don't use external resources
If you are linking images from other sites, the download speed will not be in your control. Instead, if you plan on using images from others, download them to your own server first (or upload them to your own provider) and load them from there.
Review and improve your code
Find shortcuts, remove unnecessary code, delete unused variables, reuse others, etc.
There are other ways, but I believe the above points have the most impact on your speed.
You should probably do some search for existing answers to this question, however...
APC for opcode caching
Memcached for object storing (to reduce the number of database queries)
Check for / optimize slow SQL queries
Measure and find bottlenecks
Don't rely on (slow) web services on each page load, etc.
Yahoo has some good basic advice on speeding up web pages, much of it very easy to implement. You may also want to download YSlow + Firebug for Firefox; they will help indicate possible basic bottlenecks from a client-request perspective.
The rest of the advice here is good, so I won't add much else other than this: don't bother optimising any code until you're 100% sure that you've found a bottleneck. I can't stress that enough. Don't waste time tweaking code or implementing new things (i.e. caching) because you "feel" they will make things quicker; act only on real evidence (i.e. performance profiling).
