I wrote a rather small skeleton for my web apps and thought that I would also add a small cache for it.
It is rather simple:
If the current page exists as a file in the cache and the file isn't too old, read it out and exit instead of rebuilding the page
If the current page isn't cached/outdated recalc the page and save it
However, the bad thing about it is:
My performance tests with a page that receives 40 relatively long posts via a MySQL query said that with using the cache, it took even longer to handle a single request (1000 tests each)
How can that happen?
How can doing a MySQL query, looping through the results the first time, passing the results to the template and then looping through the results for the second time be faster than a filemtime() check and a readout?
Should I just remove the complete raw-PHP cache and relieve on the availability of some PHP cache like memcached or so?
Premature optimization is the root of all evil. If you don't need a cache, don't use a cache.
That being said, if you are content to not serve up dynamic content per request, you might want to look into using a caching proxy such as varnish and cutting out PHP and the webserver entirely. There's quite a bit of overhead to get to even your first line of PHP, and serving static files through PHP is a little dirty.
If you just want to cache elements, something like memcached or APC's cache is the way to go. APC has the advantage of being more readily available (you should have APC installed on your servers for the opcode cache if you care at all about performance) and memcached has the option of letting you have a cache that's accessible by multiple webservers (and/or multiple caches)
I don't think so, there seems to be some other problem depending on your implementation. Here are couple of great resources more about it:
http://www.mnot.net/cache_docs/
http://blog.digitalstruct.com/2008/02/27/php-performance-series-caching-techniques/
Possible -
If you are using apc and mysql query cache (default on) , then your php code is already executed and stored as opcode in apc and if you hit the same query repeatedly then the mysql query cache will cache the database results too. In this case most of the data comes from memory so your file read could be slower. The real benefit of this approach is that you will be saving the number of mysql connections but may not be performance.
Using a caching proxy like squid should be the ideal solution in your case so that the page is served directly from cache till it expires. Another optimization for the above situation would be to just cache the mysql output in memcache which would be better than hitting mysql and query cache is small in size. Lastly if you indeed want to cache the mark up generated, you could use output buffering (ob_start) and store the output directly in memcache.
I think not.
The slowest part of the application is probably the database traffic.
If you include an HTML / PHP cached file it should be faster.
Related
I have installed redis on my server and implemented object caching for data returned within a PHP based web application. The php model essentially executes a reasonably complex query and returns a detailed array of data. I have tested the caching and found everything to be working as expected. I first check to see if the key exists in redis. If it does, redis returns the data, the model unserializes and returns the previously cached data. If the cache has expired, the model executes the sql query, returns the data and sets the key and serialized value in redis.
So here are my questions.
I'm not sure how to really benchmark this as it is all browser based. What tools are there out there that would allow me to get a reasonable benchmark to compare caching and not caching. I'm thinking of perhaps a php script that calls the api 1000 times via curl.
I implemented this in redis because I once read that caching with redis will work across multiple sessions or ip addresses accessing the site. For example, if the api is accessed 1000 time an hour by multiple ip addresses/users, I am assuming this approach will reduce the load on the mysql server and let redis do the work of returning the cached data instead. Can anyone shed some light on this? Are my assumptions valid?
All comments are welcome!
Thanks!
Dave
To benchmark the web site, I would use something like Siege rather than writing a specific PHP script.
Regarding Redis usage, caching things in in-memory stores like memcached or Redis is now very common. Both memcached and Redis are suitable for this purpose, although for pure caching, memcached is arguably easier to setup. 1000 times an hour represents only 3.6 TPS - any data store (including MySQL) will support such traffic without any issue. Now, multiply this traffic by 100 or 1000, and the caching layer (memcached or Redis) will become mandatory to protect your database.
To use Redis for caching, you may want to check the EXPIRE command and have a look at the maxmemory-policy parameter in the configuration file.
I have done extensive testing of cache backends for the Zend_Cache library. The tests were done using multiple php-cli processes and randomized data and considered read performance, write performance and cache tag cleaning performance. If testing just the cache backend the web server performance is not relevant so I recommend testing via CLI to simplify the testing. Also, testing with only one process will not give you an accurate picture of a backend's characteristics under heavy load.
MySQL is very fast itself and if you are doing single-record indexed queries then MySQL's own query cache will be very fast. I'd only recommend adding an additional caching layer for things that are slow (aggregated results of multiple queries or generating chunks of HTML). You can use Zend_Cache without including the entire Zend framework so I highly recommend you check out both Cm_Cache_Backend_Redis and Cm_Cache_Backend_File.
I've heard of two caching techniques for the PHP code:
When a PHP script generates output it stores it into local files. When the script is called again it check whether the file with previous output exists and if true returns the content of this file. It's mostly done with playing around the "output buffer". Somthing like this is described in this article.
Using a kind of opcode caching plugin, where the compiled PHP code is stored in memory. The most popular of this one is APC, also eAccelerator.
Now the question is whether it make any sense to use both of the techniques or just use one of them. I think that the first method is a bit complicated and time consuming in the implementation, when the second one seem to be a simple one where you just need to install the module.
I use PHP 5.3 (PHP-FPM) on Ubuntu/Debian.
BTW, are there any other methods to cache PHP code or output, which I didn't mention here? Are they worth considering?
You should always have an opcode cache like APC. Its purpose is to speed up the parsing of your code, and will be bundled into PHP in a future version. For now, it's a simple install on any server and doesn't require you write or change any code.
However, caching opcodes doesn't do anything to speed up the actual execution of your code. Your bottlenecks are usually time spent talking to databases or reading to/from disk. Caching the output of your program avoids unnecessary resource usage and can speed up responses by orders of magnitude.
You can do output caching many different ways at many different places along your stack. The first place you can do it is in your own code, as you suggested, by buffering output, writing it to a file, and reading from that file on subsequent requests.
That still requires executing your PHP code on each request, though. You can cache output at the web server level to skip that as well. Crafting a set of mod_rewrite rules will allow Apache to serve the static files instead of the PHP code when they exist, but you'll have to regenerate the cached versions manually or with a scheduled task, since your PHP code won't be running on each request to do so.
You can also stick a proxy in front of your web server and use that to cache output. Varnish is a popular choice these days and can serve hundreds of times more request per second with caching than Apache running your PHP script on the same server. The cache is created and configured at the proxy level, so when it expires, the request passes through to your script which runs as it normally would to generate the new version of the page.
You know, for me, optcache , filecache .. etc only use for reduce database calls.
They can't speed up your code. However, they improve the page load by using cache to serve your visitors.
With me, APC is good enough for VPS or Dedicated Server when I need to cache widgets, $object to save my mySQL Server.
If I have more than 2 Servers, I like to used Memcache , they are good on using memory to cache. However it is up to you, not everyone like memcached, and not everyone like APC.
For caching whole web page, I ran a lot of wordpress, and I used APC, Memcache, Filecache on some Cache Plugins like W3Total Cache. And I see ( my own exp ): Filecache is good for caching whole website, memory cache is good for caching $object
Filecache will increase your CPU if your hard drive is slow, and Memory cache is terrible if you don't have enough memory on your VPS.
An SSD HDD will be super good speed to read / write file, but Memory is always faster. However, Human can't see what is difference between these speed. You only pick one method base on your project and your server ( RAM, HDD ) or are you on a shared web hosting?
If I am on a shared hosting, without root permission, without php.ini, I like to use phpFastCache, it a simple file cache method with set, get, stats, delete only.
In Addition, I like to use .htaccess to cache static files like images, js, css or by html headers. They will help visitors speed up your page, and save your server bandwidth.
And If you can use .htaccess to redirect to static .html cache if you cache whole page is a great thing.
In future, APC or some Optcache will be bundle into PHP version, but I am sure all the cache can't speed up your code, they use to:
Reduce Database / Query calls.
Improve the speed of page load by use cache to serve.
Save your API Transactions ( like Bing ) or cURL request...
etc...
A lot of times, when it comes to PHP web applications, the database is the bottleneck. As such, one of the best things you can do is to use memcached to cache results in memory. You can also use something like xhprof to profile your code, and really dial in on what's taking the most time.
Yes, those are two different cache-techniques, and you've understood them correctly.
but beware on 1):
1.) Caching script generated output to files or proxies may render problems
if content change rapidly.
2.) x-cache exists too and is easy to install on ubuntu.
regards,
/t
I don't know if this really would work, but I came across a performance problem with a PHP script that I had. I have a plain text file that stores data as a title and a URL tab separated with each record separated by a new line. My script grabs the file at each URL and saves it to its own folder.
Then I have another page that actually displays the local files (in this case, pictures) and I use a preg_replace() to change the output of each line from the remote url to a relative one so that it can be displayed by the server. My tab separated file is now over 1 MB and it takes a few SECONDS to do the preg_replace(), so I decided to look into output caching. I couldn't find anything definitive, so I figured I would try my own hand at it and here's what I came up with:
When I request the page to view stuff locally, I try to read it from a variable in a global scope. If this is empty, it might be that this application hasn't run yet and this global needs populated. If it was empty, read from an output file (plain html file that literally shows everything to output) and save the contents to the global variable and then display the output from the global.
Now, when the script runs to update the tab separated file, it updates the output file and the global variable. This way, the portion of the script that actually does the stuff that runs slowly only runs when the data is being updated.
Now I haven't tried this yet, but theoretically, this should improve my performance a lot, although it does actually still run the script, but the data would never be out of date and I should get a much better load time.
Hope this helps.
What is the advantage of PHP data cache ?
Where i can use that , is it good only for browser search or
it is good for data export to csv or text also .
How can i achieve data cache using PHP?
Theres a Bunch of Caches for PHP, for starters i would recommend APC http://php.net/manual/de/book.apc.php since scv and text are just strings every Cache Backend including APC is perfectly for caching them.
If you're generating data and you want to prevent computation of the same data over and over again, a good cache to look into is Memcached. Memcached is a piece of software that you run on your server. It stores anything you give it in key/value format in memory, and returns those values when you ask for them on subsequent requests. It isn't persistent (if your server goes down, everything is wiped out), though this can be useful for debugging or management purposes.
Companies like Digg and Facebook, which both rely heavily on PHP, use Memcached extensively to make sure their respective sites are fast.
Personally, I use Memcached to store things like URL routing information (40ms/request speed increase), feed caching (1-3sec/request speed increase), and social graph caching (300-400ms/request speed increase). Depending on what kind of computation you're performing, you can see various types of increases. Generally, for reasonably sized data sets (i.e.: a 1000+ line CSV file), you'll see pretty substantial increases in speed. Keep in mind, though, that Memcached uses RAM and not disk space for storage, so you can easily run out of memory if it is not properly configured. Placing Memcached on a separate server can help to alleviate this, especially on servers with PHP scripts that use a lot of memory.
Hope this helps!
My setup:
4 webservers
Static content server (NFS mount)
2 db servers
2 "do magic" servers
An additional 8 machines designated multi-purpose.
I'm writing a wrapper for three caching mechanisms so that they can be used in a somewhat normalized manner: Filesystem, Memcached and APC. I'm trying to come up with examples for use (and what to actually put in each cache).
File System
Handles content that we generate and then statically serve. RSS feeds, old report data, user specific pages, etc... This is all cached to the static server.
Memcached
PHP session data, MySQL query results, generally things that need to be available across our systems. We have 8 machines that can be included in the server pool.
APC
I have no idea. The two "do magic" servers are not part of any distributed system, so it seems likely that they could cache query results in APC and work from there. Past that, I can't think of anything.
Query Caching
Given the nature of our SQL use, query caching reduces performance. I've disabled this.
In general, what types of data should be stored where? Does this setup even make sense?
Is there any use for an APC data cache in a distributed system (I can't think of one)?
Is there something that I'm missing that would make things easier or more efficient?
Edit: I've figured out what Pascal was saying, finally. I had it stuck in my head that I would only be moving a portion of my config / whatever to APC, and still loading the rest of the file from disk. Any other suggestions?
I'm using the same kind of caching mecanism for some projects ; and we are using APC + memcached as caching systems.
There are two/three main differences between APC and memcached, when it comes to caching data :
APC access is a bit faster (something like 5 times faster than memcached, if I remember correctly), as it's only local -- i.e. no network involved.
Using APC, your cache is duplicated on each server ; using memcached, there is no duplication accross servers
whic means that memcached ensures that all servers have the same version of the data ; while data stored in APC can be different on each server
We generally use :
APC for data that has to be accessed very often, is quick to generate, and :
either is not modified often
or it doesn't matter if it's not identical on all servers
memcached for data that takes more time to generate, and/or is less used.
Or for data for which modifications must be visible immediatly (i.e. when there is a write to the DB, the cached entry is regenerated too)
For instance, we could :
Use APC to store configuration variables :
Don't change often
Are accessed very often
Are small
Use memcached for content of articles (for a CMS application, for example) :
Not that small, and there are a lot of them, which means it might need more memory than we have on one server alone
Pretty hard/heavy to generate
A couple of sidenotes :
If several servers try to write to the same file that's shared via NFS, there can be problems, as there is no locking mecanism on NFS (as far as I remember)
APC can be used to cache data, yes -- but the most important reason to use it is it's opcode caching feature (can save a large amount of CPU on the PHP servers)
Entries in memcached are limited in size : you cannot store an entry that's bigger than 1M (I've sometimes run into that problem -- rarely, but it's not good when it happens ^^ )
What's the best way (ways?) to speed up a php web site and how much faster it can using this or that way?
PHP isn't really the kind of language where you can do micro-optimizations, or just work on the code alone. There's really no point. Although PHP isn't particularly fast, PHP itself is rarely the bottleneck in a given web site.
You need to work out where that bottleneck is before you can fix it. There are a lot of common bottlenecks, with common solutions. It's difficult to generalize, given so few details, but there are a lot of performance hints that apply to most web sites.
The first good place to look is actually on the client side, rather than the server side. How large are your pages (including images, CSS, JavaScript and the like)? How many HTTP requests does a single page view require? Use something like Firebug (and the YSlow add-on for Firebug) to see how long your page actually takes to load, and which bits of your page cause the problem. Some general hints:
Work out ways to shrink the CSS and JavaScript - remove anything you don't need, and run the rest through a tool like YUI Compressor.
If you have multiple CSS and JavaScript files, try to combine them into a single file.
Optimize all of your images as much as possible, and see if you can combine any of those into a single file using CSS sprites or similar. PunyPNG is good for lossless images. A decent JPEG encoder (NOT Photoshop) is good for photos.
Move the CSS to the top of the page, and the JavaScript to the bottom, so the browser can render the page before the JavaScript has finished downloading.
Make sure that all of your CSS, JavaScript and HTML are being served compressed.
Make sure that you're using appropriate caching - if a file hasn't changed, there's no point in re-downloading it.
Once you've got the client side out of the way, you might have to turn your attention to the server side.
Install an opcode cache, like APC, XCache, or Zend Optimizer. It's very easy to do, and will always provide some improvement. Once you've done that, profile your pages, to find out where the time is actually being spent.
More likely than not, you'll be spending most of your time waiting for the database to return results. So, at a bare minimum:
Work out which queries are taking the longest, and work on them first. Use your head though - a query that takes five seconds on an admin page that nobody looks at is not as important as a query that takes one second on the front page.
Make sure that your query uses appropriate indexes. No common query should ever need to do a full table scan. Certain kinds of sorting or grouping may be unable to use indexes - try to avoid them, or modify the query so that it can use indexes.
Make sure that your queries aren't using temporary tables.
Use the EXPLAIN keyword - it's very useful.
Tune the database server itself. MySQL is generally not optimized for performance.
Once you've done that, it's usually best to start working out how to use caching. The best way to speed PHP code up is to reduce the amount of work it has to do.
Make sure your database's query cache is working properly.
Use something like Memcached to store frequently used results, instead of getting them from the database.
If you have enough memory, try to keep everything in Memcached, resorting to the database only when something isn't present in the cache.
If you have chunks of pages that are dynamic, but the same for all users, try caching those chunks. For example, if two users are looking at an article, the article itself is going to be exactly the same for each user, even if the rest of the page isn't. Generate the HTML for the article, and chuck it in the cache.
If you have lots of non-authenticated users, it's entirely possible that they'll all be seeing the exact same page. Two non-authenticated users looking at the above article won't just see an identical article - they'll see an identical page, right down to the login links. Set your PHP scripts up so you can use HTTP caching headers (check the last modified date, and return a 304 Not Modified if it's not been changed). Once you've done that, stick a Squid reverse-proxy in front of the webserver, and let Squid serve pages out of it's cache.
After that point, the general approach is to start using more servers, and the problem becomes one of scaling, rather than raw speed. The general plan is to make sure that your website has a shared-nothing architecture - all persistent data is stored in the database. Then, you install multiple webservers, move the database server to a separate machine, and run the entire thing behind a caching reverse proxy. To add more capacity, you add more machines.
One way: php accelerators, e.g. APC.
Another; read blog articles, e.g. performance tuning overview.
A general question i would say. Try looking for optimazation tips online...
Several parameters are involved:
I/O access (using it a lot - file_exists, is_file overheads)
Database access (optimize queries, use stored procedures, check your db cache)
Using an opcode cache (like APC)
Compressing output
Serving js/css minified and compressed (and using subdomains to deliver them to the browser)
Using memcache to cache data into memory for faster access
You can use benchmarking tools to test your environment before and after the optimizations.
Try apache bench for example.
Filesize.
A file of 500 KB takes longer to download then a file of 300 KB. So optimize and crop as much as you can.
Accelators
Self explainable: List of PHP accelerators
Server upgrade
Though this costs money, when dealing with a lot of traffic, it will have impact on how fast the .php files gets processes and how fast data will be send to the user.
I don't recommend this though since there are other (free) ways to improve speed.
Don't user external resources
If you are linking some images trough other sites, the speed of the downloading will not be in your control. Instead, if you plan on using images from others download them to your own server first (or upload them to your own provider) and load them that way.
Review and improve your code
Find short cuts, remove unnecessary code, delete unused variables, reuse others etc.
There are other ways but I believe the above information has the most impact on your speed
You should probably do some search for existing answers to this question, however...
APC for opcode caching
Memcached for object storing (to reduce the number of database queries)
Check for / optimize slow SQL queries
Measure and find bottlenecks
Don't rely on (slow) web services on each page load, etc.
Yahoo has got some good basic advice on speeding up web pages, much of it very easy to implement. You may also want to download yslow + firebug for firefox; they will help indicate possible basic bottlenecks from a client request perspective.
The rest of the advice here is good, so I wont add much else other than; don't bother optimising any code until you're 100% sure that you've found a bottleneck. I can't stress that enough. Don't waste time tweaking code or implementing new things (ie caching) because you "feel" will make things quicker, act only on real evidence (ie performance profiling).