Whenever I needed to cache some information I relied on timestamps and MySQL, storing the data in a database and fetching it from there. I just read about APC.
APC is so much easier, but is it worth converting my previous cache methods to APC, beyond just fewer SQL queries going through and cleaner code?
If you already have a database running and doing most of your work, the first step to improve your performance is to properly tune the database. MySQL, properly configured, is very fast.
Obviously, at some point it isn't fast enough anymore and you need further caches. One thing to consider when caching is that your data might not be consistent anymore, meaning that you might update data in your primary store (the database) while others still read an outdated cache entry.
Now, you've mentioned APC as a possible solution. APC is two related but different things:
An opcode cache for PHP scripts
A shared memory cache for PHP user data
An opcode cache works by storing the compiled PHP script in memory. So when requesting a site the PHP interpreter doesn't have to read the file from disk and analyze the code but can directly execute it. This gives a major boost and is always a good thing.
A shared memory cache takes any PHP variable (well, there are a few exceptions ...) and stores it in shared memory on the system, so all PHP processes on the same machine can read it. If you store the result of a database query in APC you save time, as access to shared memory is very fast compared to querying a database (sending the query to a different machine, parsing it, executing it, sending the result back ...). But, as said in the beginning, you have to keep in mind that the data might be outdated.
Also mind that all data is stored in memory, so depending on the amount of available RAM there are limitations on what can be stored. Another big downside is that the data lives in memory only: whenever the system goes down the cache will be empty and everything in it will be lost.
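For illustration, here is a minimal sketch of using the APC user cache to hold a query result with a short TTL; the key name, the query and the $pdo connection are made-up examples, not part of the original question:

    <?php
    // Minimal sketch: cache a query result in APC's user cache for 60 seconds.
    // The key, the query and the $pdo connection are illustrative only.
    $rows = apc_fetch('top_articles', $hit);
    if (!$hit) {
        $rows = $pdo->query('SELECT id, title FROM articles ORDER BY views DESC LIMIT 10')
                    ->fetchAll(PDO::FETCH_ASSOC);
        apc_store('top_articles', $rows, 60); // the 60-second TTL bounds how stale the data can get
    }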
To answer the question literally: yes. MySQL is not a cache, APC is, and thus it is better for this.
MySQL is a storage option you can implement a cache on top of, but then you are implementing the cache yourself with those timestamps you mention and whatever logic you apply to them. APC is a complete cache implementation, both for data and for code.
Performance-wise, accessing the local APC cache will always be vastly faster than accessing a MySQL database. The keyword there is local: APC is not distributed (as far as I know), so if you want to share your cache, you'll need an external cache system such as memcached.
Generally, APC will be much, much faster than MySQL, so it's well worth the time to look into it and consider switching from one system to the other. And, as you mention, you will be firing fewer SQL queries at the database.
More information can be found via Google; I came across the following:
http://www.mysqlperformanceblog.com/2006/08/09/cache-performance-comparison/
Related
Fictitious Background
I get 100 hits a minute for "the hottest car"
The hottest car always changes by the minute, and it's currently: "A Pinto"
Every time I receive what the current hottest car is, I save it to a MySQL database.
Situation
Every time I get a hit for "What's the hottest car", I need to return the answer. I feel confident that retrieving the answer from a file vs. a DB will be faster and less work for the processor, due to PHP keeping the file in memory. My concern is that if I get a new file, how do I make sure I'm returning the information in the new file and not the old information stored in memory?
P.S. If my assumptions are wrong and there is a faster way, please let me know.
Thanks
Be careful with your assumptions. It can be tempting to assume that file access is faster, since you just have to read a file instead of connect to, query, and retrieve from a database. But bear in mind that databases are designed from the ground up for this kind of fast access to rapidly-changing information, and they have a lot of optimizations built in.
So do look into caching; it is often a win, but I would not assume that file access is always faster. You can of course profile the different approaches to see if you have a bottleneck.
Your assumption is most probably going to be invalid. MySQL has a query cache which keeps your query results in memory. Even if it didn't, I don't think you should be using the filesystem, unless you are using /dev/shm, because that's mapped to memory. I would use a library like Cache_Lite to ease the pain of caching.
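For example, a minimal Cache_Lite sketch could look like this; the cache directory, lifetime and the fetch_hottest_car_from_db() helper are just placeholders:

    <?php
    // Minimal Cache_Lite (PEAR) sketch; cacheDir, lifeTime and the helper are placeholders.
    require_once 'Cache/Lite.php';

    $cache = new Cache_Lite(array('cacheDir' => '/tmp/', 'lifeTime' => 60));

    if (($car = $cache->get('hottest_car')) === false) {
        $car = fetch_hottest_car_from_db();   // hypothetical helper doing the real query
        $cache->save($car, 'hottest_car');
    }
    echo $car;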
But if you want to make your site really fast you should install APC; caching the compiled bytecode of your PHP scripts is always worth it. Or use an in-memory store such as redis or memcached, because these are even better suited as in-memory databases. redis is the easiest to install, needing only make, and you don't need root permission either.
P.S.: You should check out this redis tutorial, because it is a really powerful in-memory database.
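A rough sketch with the phpredis extension, assuming a local redis server on the default port; the key and helper function are made up:

    <?php
    // Sketch using the phpredis extension; server address, key and helper are examples.
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    $car = $redis->get('hottest_car');
    if ($car === false) {
        $car = fetch_hottest_car_from_db();      // hypothetical helper
        $redis->setex('hottest_car', 60, $car);  // expire after 60 seconds
    }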
I'm pretty sure it would be a bad idea to store that information in a file. The biggest problem is file read locks. If one person tries to get the file while another person is getting it, there's a conflict and a fatal error.
You really should go the database route, especially if you're planning on persisting the older "hottest cars". And if performance is a concern, you should look into PHP caching (see @Andrew's comment).
Instead of using a DB or a file, the fastest way would be to access shared memory directly: http://www.php.net/manual/en/book.shmop.php
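A very rough shmop sketch, if only to show the flavor of the API; the key, segment size and value are arbitrary examples, and real code would also need locking and a serialization scheme:

    <?php
    // shmop sketch: key, size and value are arbitrary examples.
    $shmId = shmop_open(0xff3, 'c', 0644, 1024);       // create/attach a 1 KB segment
    shmop_write($shmId, str_pad('A Pinto', 1024), 0);  // fixed-size write at offset 0
    $value = rtrim(shmop_read($shmId, 0, 1024));       // read the whole segment back
    shmop_close($shmId);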
Memcache, memcache, memcache!
http://us2.php.net/memcached
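If you go that route, basic usage is roughly this (a sketch with the memcached extension; the server address, key and helper function are made up):

    <?php
    // Sketch with the memcached extension; server, key and helper are examples.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    $car = $mc->get('hottest_car');
    if ($car === false) {
        $car = fetch_hottest_car_from_db();   // hypothetical helper
        $mc->set('hottest_car', $car, 60);    // expire after 60 seconds
    }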
I'm working on some old(ish) software in PHP that maintains a $cache array to reduce the number of SQL queries. I was thinking of just putting memcached in its place, and I'm wondering whether or not to get rid of the internal caching. Would there still be a worthwhile performance increase if I keep the internal caching, or would memcached suffice?
According to this blog post, the PHP internal array is way faster than any other method:
Cache Type Cache Gets/sec
Array Cache 365000
APC Cache 98000
File Cache 27000
Memcached Cache (TCP/IP) 12200
MySQL Query Cache (TCP/IP) 9900
MySQL Query Cache (Unix Socket) 13500
Selecting from table (TCP/IP) 5100
Selecting from table (Unix Socket) 7400
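Given those numbers, one option (only a sketch, not a drop-in replacement for your existing code) is to keep a small per-request array in front of memcached, so repeated reads within one request stay at array speed; the function name is made up:

    <?php
    // Sketch: per-request array in front of memcached (server setup omitted).
    function cached_get(Memcached $mc, $key)
    {
        static $local = array();              // in-process array cache
        if (array_key_exists($key, $local)) {
            return $local[$key];              // no network round trip
        }
        $local[$key] = $mc->get($key);        // fall back to memcached
        return $local[$key];
    }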
It seems likely that memcache (which runs close to the metal as a compiled daemon) would be faster than a caching scheme interpreted in PHP.
However: if it's not broken, don't fix it.
If you remove the custom caching code, you might have to deal with other code that depends on the cache. I can't speak for the quality of the code you have to maintain but it seems like one of those "probably not worth it" things.
Let me put it this way: Do you trust the original developer(s) to have written code that will still work if you rip out the caching? (I probably wouldn't)
So unless the existing caching is giving you problems I would recommend against taking it out.
There's an advantage in using memcache vs local caching if:
1) you have multiple webservers running off the same database, and have memcache set up to run across multiple nodes
2) the database does not implement query result caching or is very slow to access
Otherwise, unless the caching code is very poor, you shouldn't expect to see much performance benefit.
HTH
C.
I'm running a php/mysql-driven website with a lot of visits and I'm considering the possibility of caching result-sets in shared memory in order to reduce database load.
However, right now MySQL's query cache is enabled, and it seems to be doing a pretty good job, since if I disable query caching, CPU usage immediately jumps to 100%.
Given that situation, I don't know if caching result sets (or even the generated HTML code) locally in shared memory with PHP will result in any noticeable performance improvement.
Does anyone out there have any experience on this matter?
PS: Please avoid suggesting heavy-artillery solutions like memcached. Right now I'm looking for simple solutions that don't require too much time to implement, deploy and maintain.
Edit:
I see my comment about memcached steered the answers away from the actual point, which is whether caching DB queries in the application layer would have a noticeable performance impact, considering that the results of those queries are already being cached at the DB level.
I know you didn't want to hear about memcached, but it is one of the best solutions for what you're trying to do. Depending on your site usage, there can be massive improvements in performance. By simply using memcached's session handler over my database session handler, I was able to cut the load in half and cut back on request serving times by over 30%.
Realistically, memcached is a simple solution. It's already integrated with PHP (if you have the extension loaded), and it requires virtually no configuration (I simply had to add memcached as a service on my linux box, which is done in one or two shell commands).
I would suggest storing session data (and anything that lends itself to caching) in memcache. For dynamic pages (such as stack overflow homepage), I would recommend caching output for a couple of seconds to prevent flooding.
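For reference, switching the session handler is roughly a two-line php.ini change, assuming the memcached extension is loaded; note that the older memcache extension uses slightly different values, and the address below is just an example:

    ; Example php.ini settings for memcached-backed sessions
    ; (the older "memcache" extension uses save_handler = memcache
    ;  and save_path = "tcp://127.0.0.1:11211" instead)
    session.save_handler = memcached
    session.save_path    = "127.0.0.1:11211"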
A decent single-box solution is file-based caching, but you have to sweep the cache files out manually. Other than that, you could use APC, which is very fast and in-memory (though you still have to expire entries yourself).
As soon as you scale past one web server, though, you're going to need a shared cache, which is memcached. Why are you so adamant about not deploying it? It's not hard, and it will save you time down the road. You can either start using memcached now and be done with it, or use one of the above methods for now and end up switching to memcached later anyway, resulting in even more work. Plus, you don't have to deal with running a cron job or some other ugly hack to get cache expiration: it does that for you.
The MySQL query cache is nice, but it's not without issues. One of the big ones is that it is invalidated automatically every time the source data changes, which you probably don't want.
I'm working on a PHP content management system and, in testing, have noticed that quite a few of the system's MySQL tables are queried on almost every page but are very rarely written to. What I'm wondering is will this start to weigh heavily on the database as site traffic increases, and how can I solve/prevent this?
My initial thoughts were to start storing some of the more static data in files (using PHP serialization) but does this actually reduce server load? What I'm worried about is that I'd be simply transferring the high load from the database to the file system!
If somebody could clue me in on the better approach, that would be great. In case the volume of data itself has a large effect, I've detailed some of the data I'll be storing below:
Full list of Countries (including ISO country codes)
Site options (skin, admin email, support URLs etc.)
Usergroups (including permissions)
You have to remember that reading a table from a database on a powerful server and on a fast connection is likely to be faster than reading it from disk on your local machine. The database will cache the entirety of these small, regularly accessed tables in memory.
By implementing the same functionality yourself in the file system, there is only a small possible speed up, but a huge chance to mess it up and make it slower.
It's probably best to stick with using the database.
Optimize your queries (use the MySQL slow query log and the EXPLAIN statement).
If tables are really rarely written to, you can use MySQL's native query cache. You have nothing to change in your code; just enable query caching in my.cnf.
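For example, something along these lines in my.cnf; the sizes are only illustrative and should be tuned for your workload:

    [mysqld]
    query_cache_type  = 1
    query_cache_size  = 64M
    query_cache_limit = 1M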
Try a template engine like Smarty (smarty.net). It has its own caching system that works pretty well and will REALLY reduce server load.
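A minimal Smarty caching sketch (Smarty 2 style; the template name, lifetime and the load_countries_from_db() helper are placeholders):

    <?php
    // Smarty 2-style output caching sketch; template, lifetime and helper are placeholders.
    require_once 'Smarty.class.php';

    $smarty = new Smarty();
    $smarty->caching        = true;   // enable output caching
    $smarty->cache_lifetime = 300;    // keep cached pages for 5 minutes

    if (!$smarty->is_cached('index.tpl')) {
        // only hit the database when no valid cached copy exists
        $smarty->assign('countries', load_countries_from_db()); // hypothetical helper
    }
    $smarty->display('index.tpl');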
You can also use Memcache, but it is really only worth using with very high-load websites. (I think Smarty will be enough.)
Databases are much better at handling large data volumes than the native file system.
Don't worry about optimizing your site to reduce server load, until you actually have a server load problem. :-)
The tables you mentioned (countries and users) will normally be cached in memory by MySQL directly, unless you are expecting several million records in these tables.
In case where these tables will not fit in memory, you may want to consider a general-purpose distributed memory caching system, such as memcached.
If your database is properly indexed, it will be much faster to query data from the database. If you want to speed that up, look into memcached or similar.
Databases are exactly for this purpose: to store and provide data. The filesystem is for scripts and programming.
If you encounter load problems, consider using Memcached or another caching utility in front of the database.
You may also consider caching different parts of your page as whole sections directly in the database (e.g. a sidebar that doesn't change much, a generated header section, ...).
You could cache the output (using output buffering: ob_start(), ob_get_contents(), etc.) to a file and include that instead of doing multiple MySQL reads. Caching is definitely faster than accessing MySQL multiple times.
Reading a static file is much faster than adding overhead via PHP and MySQL processing.
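A bare-bones sketch of that idea, using output buffering and a timestamp check; the file path and lifetime are just examples:

    <?php
    // Output-caching sketch; cache path and lifetime are examples.
    $cacheFile = '/tmp/homepage.cache.html';
    $lifetime  = 60; // seconds

    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $lifetime) {
        readfile($cacheFile);   // serve the cached copy, no MySQL involved
        exit;
    }

    ob_start();
    // ... build the page here, including the MySQL queries ...
    $html = ob_get_contents();
    ob_end_flush();                        // send the page to the client
    file_put_contents($cacheFile, $html);  // store it for the next request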
You need to evaluate the performance via load testing to avoid prematurely optimising.
It would be foolish, and could quite possibly increase overall load, to store data in files with serialization; databases are really good at retrieving data.
If after analysis there is a true performance hit (which I doubt unless you are talking about massive load), then caching is a better solution.
It's more important to have a well designed system that facilitates changes as needs arise.
Here's a link to a couple of scripts that will essentially do what dusoft is talking about and cache the output buffer to a file:
http://www.addedbytes.com/articles/caching-output-in-php/
Used this way, it's more of a bolt-on-after-the-fact type of solution, but this same behavior can certainly be implemented in a more integrated fashion if considered earlier in the process. Many frameworks also have this kind of thing built in.
I am using memcache for caching objects, but would like to also add an opcode accelerator like APC. Since they both involve caching, I am not sure if they will be "stepping on each other's toes", i.e. I am not sure if memcache already acts as an opcode accelerator.
Can someone clarify? I would like to use them both, but for different things: memcache for caching my objects and APC for code acceleration.
Memcache is more along the lines of a distributed object cache vs something like APC or XCache, which stores PHP bytecode in memory so you avoid having to parse it each time. Their main purposes are different.
For example, if you had a very CPU intensive database query that people often requested, you could cache the resulting object in memcache and then refer to it instead of re-running that query all the time.
APC & XCache do have similar object caching features, but you are limited to the host machine. What if you wanted 10 different servers to all have access to that one object without having to re-do the query for each server? You'd just direct them to your memcache server and away you go. You still get a benefit if you only have a single server because using memcache will help you scale in the future if you need to branch out to more boxes.
The main thing to consider is if you think your app is going to need to scale. Memcache has more overhead since you have to use a TCP connection to access it, versus just a function call for APC/Xcache shared objects.
However, Memcache has the following benefits:
Faster than the disk or re-running query.
Scales to multiple servers.
Works with many different languages, your objects are not locked into PHP + APC/Xcache only.
All processes/languages have access to the same objects, so you don't have to worry if your PHP child processes have an empty object cache or not. This may not be as big a deal if you're running PHP-FPM though.
In most cases, I would recommend caching your objects in memcache as it's not much harder & is more flexible for the future.
Keep in mind that this only concerns caching objects. Memcache does NOT have any bytecode or PHP acceleration features, which is why I would run it side by side with APC or XCache.
Yes, you can use them both at the same time.