I'm using Zend_Cache to cache the results of some complicated DB queries, services, etc.
My site is social, which means there is a lot of user interaction.
I'm able to cache user data here and there as well. But that means I will have nearly tens of thousands of cache files (with 10,000 users). Is this approach of caching almost everything coming from the DB still good for performance? Or are there limits to the filesystem?
I was looking around for an article on this, but didn't find one.
Thanks for any advice!
Jaroušek
The question you should be asking is whether the overhead of creating/populating/maintaining that cache exceeds the cost of generating the cacheable data in the first place.
If it costs you $1 to generate some data, $10 to cache it, and $0.8 to retrieve from cache, then you'd have to be able to retrieve that data from cache 50 times to break even.
If you only access the cached data 10 times before it expires/invalidates, then you're losing $8.
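That break-even point follows directly from the (hypothetical) dollar figures above; a quick sketch:

```php
<?php
// Break-even: generate * n = cacheWrite + cacheRead * n
// => n = cacheWrite / (generate - cacheRead), assuming generate > cacheRead.
function cacheBreakEven(float $generate, float $cacheWrite, float $cacheRead): float
{
    return $cacheWrite / ($generate - $cacheRead);
}

echo cacheBreakEven(1.0, 10.0, 0.8); // ≈ 50 reads to break even
```

Plug in your own measured costs; wall-clock time per operation works just as well as dollars.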
I'm very close to finishing my application. I'm currently using the file cache in Laravel to cache basically all of the data saved in my tables, every time a record is saved, updated or deleted. Over a period of time, perhaps three years or less, these tables will have over 2 million records. I was wondering if there are any pitfalls I need to be aware of in file-caching all of my records. Does anyone foresee a problem with file-caching hundreds of thousands of records over a period of time?
Currently, the way my cache system works is that when I save a new record, delete a record or update one, it resets the cache for the table/record in question. Each of my tables (and most records) has its own cache. Does anyone foresee any problems with this design with a very large database?
Regards,
Darren
You have millions of records, which is a bit concerning if the application has a high number of hits per second. In that case, more hits mean more disk I/O, and any bottleneck in disk I/O will impact the overall performance of the application.
From my personal experience, the best approach is to decide which data should be cached and which should not, what layers and architecture of caching should be used, and so on. For example, you may not want to cache data that is very dynamic: user carts, balances, view counts, etc. As this data is always changing, caching it usually comes close to doubling the resource usage rather than increasing actual throughput in practical scenarios. And if a set of data is rarely fetched, it may not need caching at all.
On the other hand, if some part of your data has a very high hit ratio, for instance, home page elements, top posts, best-selling products, etc. then those should be cached using a fast and high availability caching mechanism such as object stores. In this way, high demand can be met in case of thousands of concurrent hits for those data without impacting the disk IO or database performance.
Simply put, different segments of the application data may or may not need different approaches for caching.
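The decision above boils down to: wrap only hot, slow-changing reads in a cache-aside helper, and leave dynamic data alone. A minimal sketch in plain PHP (the array stands in for APC/memcached/Redis; key names are made up):

```php
<?php
// Cache-aside: compute on a miss, reuse on every later hit.
function remember(array &$store, string $key, callable $compute)
{
    if (!array_key_exists($key, $store)) {
        $store[$key] = $compute(); // miss: run the expensive query once
    }
    return $store[$key];           // hit: no query at all
}

$cache = [];

// High hit ratio, rarely changes (home page, top posts): cache it.
$top = remember($cache, 'top_posts', fn () => ['post #1', 'post #2']);

// Very dynamic, per-user data (cart, balance): read it fresh instead.
```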
I'm programming a reddit-like website.
The user can display items from categories of their choice.
For this I'm querying a JOIN of the categories they subscribed to and the items.
Hardcore query
First solution: store the data on disk in a file like "categories_1-2-4-7-10.json" and serve it to the users browsing the same categories.
Cons: takes space on disk, heavy load.
I'm thinking about a new solution: views. But I don't really know how they work. Do they regenerate often enough to be a heavy load on the server?
A view would let me query data that has already been JOINed.
Further: I'm only making a view for the front-page items. I don't need to optimize later pages, as they're not as frequently accessed.
It's a bad idea to store things to disk and then load them for a site. Disk operations are insanely slow compared to in memory operations.
You can still store JSON documents, but consider storing them in a caching layer.
Something like Redis, which is the new hotness these days (http://redis.io/) or Couchbase (http://www.couchbase.com/)
Store everything in memory and the site will be much faster.
As far as how often to regenerate your views ... a good idea is to give them an expiration time. Read about how that might work with caching in general. You would set a category view to exist in the cache for maybe 1 minute. After a minute the item leaves memory and you make a database query to put a newer version back in. Rinse and repeat.
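That expire-and-regenerate cycle is easy to sketch. Plain PHP with timestamps below; with Redis you would get the same effect from `SETEX` (set with a TTL):

```php
<?php
// Cache-aside with a TTL: serve from cache while fresh, regenerate on expiry.
function cachedFetch(array &$cache, string $key, int $ttl, callable $query)
{
    $now = time();
    if (isset($cache[$key]) && $cache[$key]['expires'] > $now) {
        return $cache[$key]['value'];   // still fresh: no database hit
    }
    $value = $query();                  // expired or missing: hit the database
    $cache[$key] = ['value' => $value, 'expires' => $now + $ttl];
    return $value;
}

$cache = [];
// Front-page items for one category combination, regenerated at most
// once a minute (the key and query are made-up stand-ins).
$items = cachedFetch($cache, 'categories:1-2-4', 60, function () {
    return ['item A', 'item B'];        // stand-in for the JOIN query
});
```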
I'm hoping to develop a LAMP application that will centre around a small table, probably less than 100 rows, maybe 5 fields per row. This table will need to have the data stored within accessed rapidly, maybe up to once a second per user (though this is the 'ideal', in practice, this could probably drop slightly). There will be a number of updates made to this table, but SELECTs will far outstrip UPDATES.
Available hardware isn't massively powerful (it'll be launched on a VPS with perhaps 512 MB RAM) and it needs to be scalable. There may only be 10 concurrent users at launch, but this could rise to the thousands (and, as we all hope with these things, maybe 10,000s, but at that level there will be more powerful hardware available).
As such, I was wondering if anyone could point me in the right direction for a starting point. All the data retrieved will be the same for all users, so I'm trying to investigate if there is any way of sharing this data across all users, rather than performing 10,000 identical SELECTs a second. Soooo:
1) Would the mysql_query_cache cache these results and allow access to the data, WITHOUT requiring a re-select for each user?
2) (Apologies for how broad this question is; I'd appreciate even the briefest of responses greatly!) I've been looking into the APC cache, as we already use it for an opcode cache. Is there a method of caching the data in the APC cache and just doing one MySQL SELECT per second to update that cache, then just accessing APC for each user? Or perhaps an alternative cache?
Failing all of this, I may look into having a separate script which handles the queries and outputs the data, and somehow just piping this one script's data to all users. This isn't a fully formed thought and I'm not sure of the implementation, but perhaps a combo of AJAX to pull the outputted data from... "Somewhere"... :)
Once again, apologies for the breadth of these question - a couple of brief pointers from anyone would be very, very greatly appreciated.
Thanks again in advance
If you're doing something like an AJAX chat which polls the server constantly, you may want to look at node.js instead, which keeps an open connection between server and browser. This way, you can have changes pushed to the user when they happen and you won't need to do all that redundant checking once per second. This can scale very well to thousands of users and is written in javascript on the server-side, so not too difficult.
The problem with using the MySQL cache is that the entire table cache gets invalidated on any write to that table. You're better off using a caching solution like memcached or APC if you're trying to control that behavior more precisely. And yes, APC would be able to cache that information.
One other thing to keep in mind is that you need to know when to invalidate the cache as well, so you don't have stale data.
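Put together, the read path tries the cache and falls back to the database, and the write path invalidates the cached copy. A sketch using the APC user-cache functions (the shims at the top are only there so the example runs where the extension isn't loaded; the table and key names are made up):

```php
<?php
// Shims so this sketch runs even where the APC extension isn't loaded.
if (!function_exists('apc_fetch')) {
    $GLOBALS['apc'] = [];
    function apc_fetch(string $key, ?bool &$hit = null)
    {
        $hit = array_key_exists($key, $GLOBALS['apc']);
        return $hit ? $GLOBALS['apc'][$key] : false;
    }
    function apc_store(string $key, $value, int $ttl = 0): bool
    {
        $GLOBALS['apc'][$key] = $value;
        return true;
    }
    function apc_delete(string $key): bool
    {
        unset($GLOBALS['apc'][$key]);
        return true;
    }
}

// Read path: try the cache first, fall back to the (hypothetical) query.
function getSettings(callable $query)
{
    $rows = apc_fetch('settings:all', $hit);
    if (!$hit) {
        $rows = $query();                    // e.g. SELECT * FROM settings
        apc_store('settings:all', $rows, 1); // 1 s TTL, per the question
    }
    return $rows;
}

// Write path: persist the change, then drop the cached copy so the next
// read repopulates it and readers never see stale data.
function updateSetting(callable $write): void
{
    $write();                                // e.g. UPDATE settings SET ...
    apc_delete('settings:all');
}
```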
You can use APC, XCache or memcache for database query caching, or you can use Varnish or Squid for gateway caching...
Imagine you are the proud owner of Facebook. Then:
Which data would you want to store in the app layer [memcached/APC], and which data in the MySQL cache?
Please also explain why you think so.
[I want to get an idea of which data to cache where]
For memcache, store session data. You typically have to query a large table or hit the filesystem to get it, depending on how it's stored. Putting that in memory avoids hitting the disk for a relatively small amount of data (that is typically critical to one's web application).
For your database cache, put stuff in there that is not changing so often. We're talking about wall posts, comments, etc. They are queried a lot and rarely change, all things considered. You may also want to consider doing a flat file cache, so you can purge individual files with greater ease, and divide it up as you see fit.
I generally don't directly cache any arbitrary data with APC; usually I just let it cache code automatically (as an opcode cache) and benefit from the reduced load.
This is only one way to do it, but as far as the industry goes, this is a somewhat well-used model.
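Pointing PHP's sessions at memcached, as suggested above, is just configuration (the host/port and the PECL memcached extension are assumptions; the older memcache extension wants a `tcp://` prefix instead):

```ini
; php.ini — keep session data in memcached instead of on disk
session.save_handler = memcached
session.save_path = "localhost:11211"
```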
I don't really have any experience with caching at all, so this may seem like a stupid question, but how do you know when to cache your data? I wasn't even able to find one site that talked about this, but it may just be my searching skills or maybe too many variables to consider?
I will most likely be using APC. Does anyone have any examples of what would be the least amount of data you would need in order to cache it? For example, let's say you have an array with 100 items and you use a foreach loop on it and perform some simple array manipulation, should you cache the result? How about if it had a 1000 items, 10000 items, etc.?
Should you be caching the results of your database query? What kind of queries should you be caching? I assume a simple select and maybe a couple joins statement to a mysql db doesn't need caching, or does it? Assuming the mysql query cache is turned on, does that mean you don't need to cache in the application layer, or should you still do it?
If you instantiate an object, should you cache it? How to determine whether it should be cached or not? So a general guide on what to cache would be nice, examples would also be really helpful, thanks.
When you're looking at caching data that has been read from the database in APC/memcache/WinCache/Redis/etc., you should be aware that it will not be updated when the database is updated unless you explicitly write code to keep the database and cache in sync. Caching is therefore most effective when the data doesn't change often but requires a complex and/or expensive query to retrieve (otherwise, you may as well read it from the database when you need it). Expensive JOIN queries that return the same records whenever they're run are prime candidates.
And always test to see if queries are faster read from the database than from cache. Correct database indexing can vastly improve database access times, especially as most databases maintain their own internal cache as well, so don't use APC or equivalent to cache data unless the database overheads justify it.
You also need to be aware of space usage in the cache. Most caches are a fixed size and you don't want to overfill them... so don't use them to store large volumes of data. Use the apc.php script available with APC to monitor cache usage (though make sure that it's not publicly accessible to anybody and everybody that accesses your site.... bad security).
When holding objects in cache, the object will be serialized() when it's stored, and unserialized() when it's retrieved, so there is an overhead. Objects with resource attributes will lose that resource; so don't store your database access objects.
It's sensible only to use cache to store information that is accessed by many/all users, rather than user-specific data. For user session information, stick with normal PHP sessions.
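The serialize round-trip, and the loss of resource attributes, is easy to demonstrate (the class here is a made-up example):

```php
<?php
class Report
{
    public array $rows = ['a', 'b'];
    public $handle;                    // a resource: will not survive the cache
}

$r = new Report();
$r->handle = fopen('php://memory', 'r+');

// This is effectively what the cache does on store and retrieve:
$copy = unserialize(serialize($r));

var_dump(is_resource($r->handle));     // true
var_dump(is_resource($copy->handle));  // false: the resource is gone
var_dump($copy->rows);                 // plain data survives fine
```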
The simple answer is that you cache data when things get slow. Obviously for any medium to large sized application, you need to do much more planning than just a wait and see approach. But for the vast majority of websites out there, the question to ask yourself is "Are you happy with the load time". Of course if you are obsessive about load time, like myself, you are going to want to try to make it even faster regardless.
Next, you have to identify what specifically is the cause of the slowness. You assumed that your application code was the source, but it's worth examining whether there are other external factors, such as large page size, excessive requests, no gzip, etc. Use a site like http://tools.pingdom.com/ or an extension like YSlow as a start for that. (Quick tip: make sure keep-alives and gzip are working.)
Assuming the problem is the duration of execution of your application code, you are going to want to profile your code with something like xdebug (http://www.xdebug.org/) and view the output with kcachegrind or wincachegrind. That will let you know what parts of your code are taking long to run. From there you will make decisions on what to cache and how to cache it (or make improvements in the logic of your code).
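Turning the Xdebug profiler on is a php.ini change (Xdebug 2 directive names shown; Xdebug 3 replaced them with `xdebug.mode = profile` and `xdebug.output_dir`):

```ini
; php.ini — write cachegrind files to /tmp for kcachegrind/wincachegrind
xdebug.profiler_enable = 1
xdebug.profiler_output_dir = /tmp
```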
There are so many possibilities for what the problem could be and the associated solutions that it is not worth me guessing. So, once you identify the problem, you may want to post a new question related to solving that specific issue. I will say that if not used properly, the MySQL query cache can be counterproductive. Also, I generally avoid the APC user cache in favor of memcached.