Think you are the proud owner of Facebook, then
which data you want to store in app layer [memcached/ APC] and which data in MySQL cache ?
Please explain also why you think so.
[I want to have an idea on which data to cache where]
For memcache, store session data. You have to typically query from a large table or from the filesystem to get it, depending on how it's stored. Putting that on memory removes hitting the disk for a relatively small amount data (that is typically critical to one's web application).
For your database cache, put stuff in there that is not changing so often. We're talking about wall posts, comments, etc. They are queried a lot and rarely change, all things considered. You may also want to consider doing a flat file cache, so you can purge individual files with greater ease, and divide it up as you see fit.
I generally don't directly cache any arbitrary data with APC, usually I will just let it cache stuff automatically and get lessened memory loads.
This is only one way to do it, but as far as the industry goes, this is a somewhat well-used model.
Related
I am currently programming a php site, which atm needs to query a large amount of data (about 4 - 5MB) everytime. I already have a session going and wanted to ask, if its good practice to store that data in the session variable?
The current plan is to also maintain a table in the Database containing when a table has changed last. If that timestamp would be newer, then the data would be queried again, if not, use the data of the session variable as its still consistent...
Is this a good way to avoid querying too much data? And what speed impacts would the site have when a session is about 5MB in size?
Thanks in advance!
It's not really good practice (it will make PHP chew far more memory than it really should), but I'm not sure how it will affect performance.
I suppose the real question is this: Why do you need to store so much in the session? If it's information that is meant to be accessible between sessions, then you should be storing it in a database and loading it 'at need'.
If it's binary data (images, files, etc.) that are only relevant while the session is valid, then store it in a temporary file for the user (look at tempnam() and sys_get_temp_dir()), then store the temporary filename on the session.
No, it's not good practice to do this.
Points to consider:
By defailt, the session data is stored on disk in a temp folder. Every time you call session_start() (ie every page load), it will have to load the whole of that data into memory and populate it into the session array. If you're loading large amounts of data, this could have performance implications.
Also, since you're loading this large chunk of data every time, it means that each page load will take more memory. This reduces the number of concurrent users that your server can support.
If you're doing this for caching purposes to reduce hits to your DB, there are much better solutions available. APCu, Memcache, Redis and others can all do a much better job of caching your data than your proposed custom-written solution. There are also wrapper libraries available that make it even easier and allow you to mix and match between caching solutions. If you're using a framework like Laravel or Symphony, there may be caching classes built into your framework. Alternatively, you could try a stand-alone library like phpFastCache. But also, don't forget that modern DB engines have their own caching mechanisms built in, so repeated calls to the same or similar queries should be reasonbly fast anyway.
Users in my webgame are having certain player information cached in the $_SESSION of PHP.
Each time they load the game it checks if the session exists, if not they get the player information from a MySQL database and then it gets stored in the $_SESSION.
Now my problem is, what if the player information gets updated by another process or player? They can't update the $_SESSION cache of the other player.
I know memcached is most probably the solution for this, but I'm not sure if I should take the time for something like this. $_SESSION cache is doing well for me, except for this.
I was thinking about creating a MySQL table for it which get read at every request and if there's a record for the player that it recreates the cache.
One other solution would be to create a file in a directory with the id of the player in the name of the file. Every request PHP will check with file_exist if it should clear the cache or not.
What would you guys do? It gets executed every request so it's pretty important to get this optimized.
From a design standpoint alone I'd avoid the file_exists and directory approach. Sure 'file_exists' is fast, but it won't scale well... What happens if a use changes their name?
If you're using APC (and you should) you could APC's user memory cache. As long as you're on a single server it should give you similar performance benifits as memcached without the need for a separate memory caching server process. If a user entry changes frequently, you could run into fragmemntation issues with APC though. In that case, time to bite the bullet and go with memcached--you can even store your session data in memcached for a performance boost.
Also, neither APC or your file_exists solution will scale to multiple load balanced servers--you'd need a DB solution or memcached for that.
The way you exposed it, is not about how fast is one vs the other, the SESSION approach is just not valid because of your concurrency issue.
If your data can change concurrently, then your data storage needs to be able to handle that concurrency and whatever caching layer you want to use needs to behave accordingly to the nature of your problem.
If it is only about cache, and you dont want to install memcache(d), you can go with a mysql table in memory. It is not as fast as memcached, but still a fine solution. And make sure to create proper indexes on all your tables (maybe that is the better solution, no cache, just select it from your table).
CREATE TABLE t (i INT) ENGINE = MEMORY;
I am considering enabling Memcache support for my large-scale REST service. However I have some questions regarding best approaches for these key-value stores.
The setup:
A database wrapper which has functions for select, update and etc.
A REST framework which contains all the API functions (getUser, createUser and etc.)
In my head, the ideal approach would be to integrate the Memcache in the database wrapper so, for example, every SQL query would get md5-hashed and saved in the cache (this is btw what most online resources suggests). However, there is obviously a problem with this approach: if a search query has been cached, and one of the users from the search result has been updated after the cached result, this wont reflect in the next request (because it is now in the cache).
As I see it I have several ways of handeling this:
Implement the Memcache in the REST framework for each function (getUser, createUser etc) and thereby explicit handle the updating of the cache etc. if users gets updated. This could end up in redundant code.
Let the cached values expire very quickly and live with the fact that some requests shows old cached values.
Do a more advanced implementation of the Memcache in the database wrapper so that I can identify which parts(e.g. users) to update in e.g. a search request.
Could you guide me to which of the following, or a complete another approach, to take?
Thanks in advance.
Enabling cache for a web application is not something to take lightly.
Maybe you have done that already bit... I recommend you first come up with a goal based on business needs or forcast (ex: must accept 1000 requests per seconds) then properly stress-test your system to have numbers before you start changing anything and then identify your bottleneck.
http://en.wikipedia.org/wiki/Performance_tuning
I usually use profiling tools such as HXProf (by facebook).
https://github.com/facebook/xhprof
Caching all your data to mirror your database might not be the best approach.
Find out how big you can allocate for your cache. If your architecture only allow you to allocate 100MB for your memcache, then it will affect your decision about what you cache and how long you cache it.
The best cache is to cache forever. But we all know that data changes. You can start by caching data that is requested often and requires the most resources to fetch.
Always try to make sure you are not working on improving something that will get you low improvement.
Without understanding your architecture in depth, it would be hazardous for anyone to recommend a caching strategy that best fit your needs.
Maybe you should cache the resutling output of your web services instead? Using a reverse proxy for example (What #Darrel is talking about) or using output buffering...
http://en.wikipedia.org/wiki/Reverse_proxy
http://php.net/manual/en/book.outcontrol.php
Optimize your database queries before you think about caching. Make sure your use a PHP Op cache (like APC) and all those things that are standard practice.
http://phplens.com/lens/php-book/optimizing-debugging-php.php
http://blog.digitalstruct.com/2008/01/31/performance-tuning-overview/
If you want to cache data and prevent stale/old data from being served, the trick is to identify your data (primary key maybe?) and when the data is updated or deleted, you delete or update the cache for that identifyer.
<?php
// After inserting into DB, you can also put it in the cache
$memcache->set($userId, $userData);
// After updating or deleting the user, you update or delete the data
$memcache->delete($userId);
A lot of site will show stale data. When I am on stackoverflow and my reputation is increased and then I got in the stackoverflow chat, the reputation shown is my old reputation. When I got a reputation of 20 (reputation required to chat) I still could not chat for another 5 minutes because the chat system had my old reputation data and did not yet know my reputation had increased enough to allow me to chat. Some data can be stale while other type of data should never be stale. Consider that when caching data.
Conclusion
Your approaches can all be valid depending on the factors that I talk about above. In fact, you can use a combination of those for all the different type of data you want to cache and how long it is acceptable to show old data for them. Maybe the categories or list of countries (since they do not change often) can be cached for a long time while the reputation (or whatever data changes all the time for all users) should be cached for a short period only.
I don't really have any experience with caching at all, so this may seem like a stupid question, but how do you know when to cache your data? I wasn't even able to find one site that talked about this, but it may just be my searching skills or maybe too many variables to consider?
I will most likely be using APC. Does anyone have any examples of what would be the least amount of data you would need in order to cache it? For example, let's say you have an array with 100 items and you use a foreach loop on it and perform some simple array manipulation, should you cache the result? How about if it had a 1000 items, 10000 items, etc.?
Should you be caching the results of your database query? What kind of queries should you be caching? I assume a simple select and maybe a couple joins statement to a mysql db doesn't need caching, or does it? Assuming the mysql query cache is turned on, does that mean you don't need to cache in the application layer, or should you still do it?
If you instantiate an object, should you cache it? How to determine whether it should be cached or not? So a general guide on what to cache would be nice, examples would also be really helpful, thanks.
When you're looking at caching data that has been read from the database in APC/memcache/WinCache/redis/etc, you should be aware that it will not be updated when the database is updated unless you explicitly code to keep the database and cache in synch. Therefore, caching is most effective when the data from the database doesn't change often, but also requires a more complex and/or expensive query to retrieve that data from the database (otherwise, you may as well read it from the database when you need it)... so expensive join queries that return the same data records whenever they're run are prime candidates.
And always test to see if queries are faster read from the database than from cache. Correct database indexing can vastly improve database access times, especially as most databases maintain their own internal cache as well, so don't use APC or equivalent to cache data unless the database overheads justify it.
You also need to be aware of space usage in the cache. Most caches are a fixed size and you don't want to overfill them... so don't use them to store large volumes of data. Use the apc.php script available with APC to monitor cache usage (though make sure that it's not publicly accessible to anybody and everybody that accesses your site.... bad security).
When holding objects in cache, the object will be serialized() when it's stored, and unserialized() when it's retrieved, so there is an overhead. Objects with resource attributes will lose that resource; so don't store your database access objects.
It's sensible only to use cache to store information that is accessed by many/all users, rather than user-specific data. For user session information, stick with normal PHP sessions.
The simple answer is that you cache data when things get slow. Obviously for any medium to large sized application, you need to do much more planning than just a wait and see approach. But for the vast majority of websites out there, the question to ask yourself is "Are you happy with the load time". Of course if you are obsessive about load time, like myself, you are going to want to try to make it even faster regardless.
Next, you have to identify what specifically is the cause of the slowness. You assumed that your application code was the source but its worth examining if there are other external factors such as large page file size, excessive requests, no gzip, etc. Use a site like http://tools.pingdom.com/ or an extension like yslow as a start for that. (quick tip make sure keepalives and gzip are working).
Assuming the problem is the duration of execution of your application code, you are going to want to profile your code with something like xdebug (http://www.xdebug.org/) and view the output with kcachegrind or wincachegrind. That will let you know what parts of your code are taking long to run. From there you will make decisions on what to cache and how to cache it (or make improvements in the logic of your code).
There are so many possibilities for what the problem could be and the associated solutions, that it is not worth me guessing. So, once you identify the problem you may want to post a new question related to solving that specific problem. I will say that if not used properly, the mysql query cache can be counter productive. Also, I generally avoid the APC user cache in favor of memcached.
I am wondering if it is viable to store cached items in Session variables, rather than creating a file-based caching solution? Because it is once per user, it could reduce some extra calls to the database if a user visits more than one page. But is it worth the effort?
If the data you are caching (willing to cache) does not depend on the user, why would you store in the session... which is attached to a user ?
Considering sessions are generally stored in files, it will not optimise anything in comparaison of using files yourself.
And if you have 10 users on the site, you will have 10 times the same data in cache ? I do not think this is the best way to cache things ;-)
For data that is the same fo all users, I would really go with another solution, be it file-based or not (even for data specific to one user, or a group of users, I would probably not store it in session -- except if very small, maybe)
Some things you can look about :
Almost every framework provides some kind of caching mecanism. For instance :
PEAR::Cache_Lite
Zend_Cache
You can store cached data using lots of backend ; for instance :
files
shared memory (using something like APC, for example)
If you have several servers and loads of data, memcached
(some frameworks provide classes to work with those ; switching from one to the other can even be as simple as changing a couple of lines in a config file ^^ )
Next question is : what do you need to cache ? For how long ? but that's another problem, and only you can answer that ;-)
It can be, but it depends largely on what you're trying to cache, as well as some other circumstances.
Is the information likely to change?
Is it a problem if slightly outdated information is shown?
How heavy is the load the query imposes on the database?
What is the latency to the database server? (shouldn't be an issue on local network)
Should the information be cached on a per user basis, or globally for the entire application?
Amount of data involved
etc.
Performance gain can be significant in some cases. On a particular ASP.NET / SQL Server site I've worked on, adding a simple caching mechanism (at application level) reduced the CPU load on the web server by a factor 3 (!) and at the same time prevented a whole bunch of database timeout issues when accessing a certain table.
It's been a while since I've done anything serious in PHP, but I think your only option there is to do this at the session level. Most of my considerations above are still valid however. As for effort; it should take very little effort to implement, assuming your code is sufficiently structured.
Session should only really be used strictly for user specific data. If you're using it to cache things that should be common across multiple sessions, you're duplicating a lot of data needlessly. Why not just use the Cache that comes with ASP.NET (you can use inProcess, rather than SQL if your concern is DB roundtrips, since you'll be storing Cached data in memory)