How does memcached behave when data has changed? - php

Reading this brief example of using memcached with PHP, I was wondering how memcached knows when a request for data needs to actually come from the database instead of coming from the cache.

It doesn't. It comes down to your caching strategy. This is true of all forms of caching: it is a tradeoff between getting the latest data and getting some data quickly. If you need the data to be up to date, invalidate (delete) the cached entry when updating the original. If performance is more important, let the cache expire by itself, at which point it will be renewed. Or something in between. It depends on your constraints and goals.

It doesn't; your code does this. In most cases you will do something like this:
key = /* build cache key somehow */
data = memcache.get(key)
if data is null:
    data = /* read data from database */
    memcache.set(key, data)
// now you can use the data
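The pseudocode above translates to PHP roughly as follows. This is a sketch under stated assumptions: a plain array stands in for the Memcached connection (a real `Memcached::get()` likewise returns `false` on a miss), and the hypothetical `$loadFromDatabase` closure stands in for the real query. The counter shows that only the first call reaches the database.

```php
<?php
// Stand-in for a Memcached connection (assumption for illustration): an in-memory array.
$cache = [];

// Hypothetical database loader; it counts calls so we can see when the "database" is hit.
$dbCalls = 0;
$loadFromDatabase = function (int $userId) use (&$dbCalls): array {
    $dbCalls++;
    return ['id' => $userId, 'name' => "user$userId"];
};

function getUser(array &$cache, callable $load, int $userId): array
{
    $key = "user:$userId";             // build the cache key
    $data = $cache[$key] ?? false;      // memcache.get(key): false means "not cached"
    if ($data === false) {              // cache miss
        $data = $load($userId);         // read data from the database
        $cache[$key] = $data;           // memcache.set(key, data)
    }
    return $data;                       // now you can use the data
}

$first = getUser($cache, $loadFromDatabase, 42);  // miss: goes to the "database"
$second = getUser($cache, $loadFromDatabase, 42); // hit: served from the cache
echo $dbCalls, PHP_EOL; // prints 1
```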

I think you need to program that logic yourself.
For example, when you update the database, also update the memcached value associated with that key, or make that key expire.
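A minimal sketch of the invalidate-on-update approach, assuming plain arrays as stand-ins for both the database table and Memcached (with the real extension, the `unset()` would be `Memcached::delete()`). The order matters: write to the database first, then drop the cached copy so the next read reloads it.

```php
<?php
// Stand-ins (assumptions for illustration): one array plays the posts table,
// another plays Memcached.
$db = [7 => ['id' => 7, 'title' => 'Old title']];
$cache = [];

function readPost(array $db, array &$cache, int $id): array
{
    $key = "post:$id";
    if (!array_key_exists($key, $cache)) { // miss: load from the "database" and cache it
        $cache[$key] = $db[$id];
    }
    return $cache[$key];
}

function updatePost(array &$db, array &$cache, int $id, string $title): void
{
    $db[$id]['title'] = $title; // 1. write to the database first
    unset($cache["post:$id"]);  // 2. invalidate; with Memcached: $m->delete("post:$id")
}

readPost($db, $cache, 7);                // warms the cache with 'Old title'
updatePost($db, $cache, 7, 'New title'); // database changes, cached entry is dropped
echo readPost($db, $cache, 7)['title'], PHP_EOL; // prints New title
```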

Related

Handling a huge number of MySQL connections

I have a JS script that does one simple thing: an AJAX request to my server. On this server I establish a PDO connection and execute one prepared statement:
SELECT * FROM table WHERE param1 = :param1 AND param2 = :param2;
where table has 5-50 rows and 5-15 columns, and its data changes about once a day on average.
Then I echo the JSON result back to the script and do something with it; let's say I console.log it.
The problem is that the script runs ~10,000 times a second, which gives me that many connections to the database, and I constantly get "can't connect to the database" errors in the server logs. So sometimes it works, when DB processes are free, and sometimes it doesn't.
How can I handle this?
Probable solutions:
Memcached: it would also be slow; it wasn't created for this. The performance would be similar to, or worse than, the database.
A file on the server instead of the database: a great solution, but the structure would be a problem.
Anything better?
For such a tiny amount of data that is changed so rarely, I'd make it just a regular PHP file.
Once you have your data in the form of an array, dump it into the PHP file using var_export(). Then just include this file and use a simple loop to search the data.
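A sketch of the `var_export()` approach. The row data, the column names `param1`/`param2`/`value`, and the file path are hypothetical. Writing to a temporary file and then calling `rename()` replaces the cache file in one step (on the same filesystem), so a concurrent request never includes a half-written file.

```php
<?php
// Hypothetical rows, as they might come from the once-a-day MySQL query.
$rows = [
    ['param1' => 'a', 'param2' => 'x', 'value' => 10],
    ['param1' => 'b', 'param2' => 'y', 'value' => 20],
];

// Run whenever the data changes: write a PHP file that simply returns the array.
$file = sys_get_temp_dir() . '/table_cache.php'; // path is an assumption
$tmp = $file . '.tmp';
file_put_contents($tmp, '<?php return ' . var_export($rows, true) . ';' . PHP_EOL);
rename($tmp, $file); // swap in the new file in one step

// On every request: include the file and search it with a simple loop.
function findRows(string $file, string $param1, string $param2): array
{
    $data = include $file; // with OPcache enabled, the compiled file can be served from memory
    $result = [];
    foreach ($data as $row) {
        if ($row['param1'] === $param1 && $row['param2'] === $param2) {
            $result[] = $row;
        }
    }
    return $result;
}

echo json_encode(findRows($file, 'b', 'y')), PHP_EOL; // prints [{"param1":"b","param2":"y","value":20}]
```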
Another option is to use Memcached, which was created for exactly this sort of job: on a fast machine with high-speed networking, memcached can easily handle 200,000+ requests per second, which is well above your modest 10k rps.
You can even eliminate PHP from the path by making Nginx ask Memcached directly for the stored values, using ngx_http_memcached_module.
If you want to stick with the current MySQL-based solution, you can increase the max_connections setting in the MySQL configuration; however, raising it above 200 may require some OS tweaking as well. What you should not do is use persistent connections; that will make things far worse.
You need to leverage a cache. There is no reason at all to go fetch the data from the database every time the AJAX request is made for data that is this slow-changing.
There are a couple of approaches you could take (possibly even in combination with each other):
Cache between application and DB. This might be memcache or similar and would allow you to perform hash-based lookups (likely based on some hash of the parameters passed) to data stored in memory (perhaps a JSON representation or whatever data format you ultimately return to the client).
Cache between client and application. This might take the form of web-server-provided cache, a CDN-based cache, or similar that would prevent the request from ever even reaching your application given an appropriately stored, non-expired item in the cache.
Anything better? No.
Since you output the same results many times, the sensible solution is to cache the results.
My educated guess is that your mistaken assumption (that memcached is not built for this) is based on your planning to store each record separately.
I implemented a simple caching mechanism for you to use:
<?php
$memcached_port = 11211; // memcached's default port; change it if yours differs
$m = new Memcached();
$m->addServer('localhost', $memcached_port);
$key1 = $_GET['key1'] ?? '';
$key2 = $_GET['key2'] ?? '';
// generate the key from the unique values; the separator keeps ('ab','c') and ('a','bc') apart,
// and md5() keeps long or space-containing input within memcached's key rules
$m_key = md5($key1 . '|' . $key2);
$data = $m->get($m_key);
if ($data === false) {
    // fetch $data from your database
    $expire = 3600; // 1 hour; you may use a Unix timestamp instead if you want it to expire at a specific time of day
    $m->set($m_key, $data, $expire); // push to memcached
}
echo json_encode($data);
What you do is:
Decide on what signifies a result (what set of input parameters).
Use that for the memcache key (for example, if it's a country and language, the key would be $country.$language).
Check if the result exists:
if it does, pull the data you stored as an array and output it.
if it doesn't exist or is outdated:
a. pull the data needed
b. put the data in an array
c. push the data to memcached
d. output the data
There are more efficient ways to cache data, but this is the simplest one, and it sounds like your kind of code.
10,000 requests/second still doesn't justify the effort needed to build server-level caching (nginx/whatever).
In an ideally tuned world a chicken with a calculator would be able to run Facebook... but who cares? (:

What is faster? file_exists or a MySQL query?

Users in my webgame are having certain player information cached in the $_SESSION of PHP.
Each time they load the game it checks if the session exists, if not they get the player information from a MySQL database and then it gets stored in the $_SESSION.
Now my problem is, what if the player information gets updated by another process or player? They can't update the $_SESSION cache of the other player.
I know memcached is most probably the solution for this, but I'm not sure if I should take the time for something like this. $_SESSION cache is doing well for me, except for this.
I was thinking about creating a MySQL table that gets read on every request; if there's a record for the player in it, the cache is recreated.
Another solution would be to create a file in a directory with the player's id in the file name. On every request, PHP would check with file_exists whether it should clear the cache.
What would you guys do? It gets executed every request so it's pretty important to get this optimized.
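The MySQL-table idea from the question can be sketched as a per-player invalidation flag. In this sketch, arrays stand in for `$_SESSION`, the players table, and the flag table (which could be a MEMORY table or a memcached key); all names are hypothetical. It still costs one lookup per request, which is the trade-off the answers discuss.

```php
<?php
// Stand-ins (assumptions for illustration):
// $players plays the players table, $invalidated plays a small shared flag table,
// and $session plays $_SESSION for the player making the request.
$players = [5 => ['id' => 5, 'gold' => 100]];
$invalidated = [];
$session = [];

// Another process or player changes the data: update it, then flag the player.
function changePlayer(array &$players, array &$invalidated, int $id, int $gold): void
{
    $players[$id]['gold'] = $gold;
    $invalidated[$id] = true;
}

// Every request: one cheap lookup decides whether the session copy is still valid.
function loadPlayer(array &$session, array &$invalidated, array $players, int $id): array
{
    if (!isset($session['player']) || isset($invalidated[$id])) {
        $session['player'] = $players[$id]; // rebuild the cached copy from the "database"
        unset($invalidated[$id]);           // clear the flag
    }
    return $session['player'];
}

loadPlayer($session, $invalidated, $players, 5);      // first load caches gold = 100
changePlayer($players, $invalidated, 5, 250);         // changed by someone else
echo loadPlayer($session, $invalidated, $players, 5)['gold'], PHP_EOL; // prints 250
```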
From a design standpoint alone I'd avoid the file_exists and directory approach. Sure, file_exists is fast, but it won't scale well... What happens if a user changes their name?
If you're using APC (and you should), you could use APC's user cache. As long as you're on a single server, it should give you performance benefits similar to memcached without the need for a separate memory caching server process. If a user entry changes frequently, you could run into fragmentation issues with APC, though. In that case, it's time to bite the bullet and go with memcached; you can even store your session data in memcached for a performance boost.
Also, neither APC nor your file_exists solution will scale to multiple load-balanced servers; you'd need a DB solution or memcached for that.
As you've described it, this is not about how fast one is compared to the other: the SESSION approach is simply not valid because of your concurrency issue.
If your data can change concurrently, then your data storage needs to be able to handle that concurrency, and whatever caching layer you use needs to behave according to the nature of your problem.
If it is only about caching, and you don't want to install memcache(d), you can use a MySQL table held in memory (the MEMORY engine). It is not as fast as memcached, but still a fine solution. And make sure to create proper indexes on all your tables (maybe that is the better solution: no cache, just select it from your table).
CREATE TABLE t (i INT) ENGINE = MEMORY;

How does memcache with MySQL work?

I am trying to understand (and probably deploy) memcached in our env.
We have 4 web servers on loadbalancer running a big web app developed in PHP. We are already using APC.
I want to see how memcached works. Or maybe I don't understand how caching works at all.
We have some complex dynamic queries that combine several tables to pull data. Each time, the data comes from different client databases, and it keeps changing. From my understanding, if some data is stored in the cache and the same request comes in next time, the same data is returned. (Or I may be completely wrong here.)
How does this whole memcache thing (or, for that matter, any caching) work?
Cache, in general, is a very fast key/value storage engine where you can store values (usually serialized) by a predetermined key, so you can retrieve the stored values by the same key.
In relation to MySQL, you would write your application code in such a way, that you would check for the presence of data in cache, before issuing a request to the database. If a match was found (matching key exists), you would then have access to the data associated to the key. The goal is to not issue a request to the more costly database if it can be avoided.
An example (demonstrative only):
$cache = new Memcached();
$cache->addServer('servername', 11211);
$myCacheKey = 'table_row_' . $someId; // the key must identify the query, so include the id
$row = $cache->get($myCacheKey);
if ($row === false) {
    // Issue painful query to mysql ($dbo is your existing PDO connection)
    $sql = "SELECT * FROM table WHERE id = :id";
    $stmt = $dbo->prepare($sql);
    $stmt->bindValue(':id', $someId, PDO::PARAM_INT);
    $stmt->execute();
    $row = $stmt->fetch(PDO::FETCH_OBJ);
    // Memcached serializes PHP values itself, so no serialize()/unserialize() is needed
    $cache->set($myCacheKey, $row);
}
// Now I have access to $row, where I can do what I need to
// And for subsequent calls, the data will be pulled from cache and skip
// the query altogether
var_dump($row);
Check out PHP docs on memcached for more info, there are some good examples and comments.
There are several examples of how memcache works. Here is one of the links.
Secondly, Memcache can work with or without MySQL.
It caches your PHP objects: whether the data comes from MySQL or anywhere else, if it's a PHP object, it can be stored in memcache.
APC gives you some more functionality than memcache. Besides storing/caching PHP objects, it also caches the compiled PHP opcodes, so your PHP files don't have to be loaded and compiled on every request; instead, the already-compiled opcodes run directly from memory.
If your data keeps changing (between requests), then caching is futile, because the cached data is going to be stale. But most of the time (I bet even in your case), multiple requests to the database return the same data set, in which case an in-memory cache is very useful.
P.S.: I did a quick Google search and found this video about memcached, which is of rather good quality => http://www.bestechvideos.com/2009/03/21/railslab-scaling-rails-episode-8-memcached. The only problem could be that it talks about Ruby on Rails (which I also don't use much, but it is very easy to understand). Hopefully it will help you grasp the concept a little better.

Is memcached all about timing?

I believe the standard way to cache something with memcached is to insert the object into the cache for a set period of time, e.g.
memcli:set(key, rows_array, 5 * 60)
Is there not a better way to cache things where the cache will check inside the database to see if the data has changed rather than relying on a timer which could cause sync issues?
I'm going to use PHP.
The cache will not check the database, because that is contrary to the idea of caching.
Instead you could either update or remove objects from the cache when you change their value in the database.
If the data can be modified in the database by something other than your application (so you cannot implement your cache in write-through fashion), then that data is probably not a good candidate for caching, unless you can live with stale data until it is evicted.
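For data that only your application writes, write-through can be sketched like this, again with arrays standing in for the table and for Memcached (the cache assignment would be `Memcached::set()`), and with a hypothetical `price:` key prefix. Any write that bypasses `savePrice()`, such as a manual `UPDATE`, leaves the cached value stale until it is evicted, which is the case the answer warns about.

```php
<?php
// Stand-ins (assumptions for illustration): arrays for the table and for Memcached.
$db = [];
$cache = [];

// Write-through: every write updates the database and the cache in the same code path,
// so readers never see a stale entry as long as all writes go through this function.
function savePrice(array &$db, array &$cache, string $sku, float $price): void
{
    $db[$sku] = $price;            // 1. persist
    $cache["price:$sku"] = $price; // 2. refresh the cached copy; with Memcached: $m->set("price:$sku", $price)
}

function getPrice(array $db, array &$cache, string $sku): ?float
{
    $key = "price:$sku";
    if (!array_key_exists($key, $cache)) { // miss: fall back to the database
        if (!array_key_exists($sku, $db)) {
            return null;                   // unknown item
        }
        $cache[$key] = $db[$sku];
    }
    return $cache[$key];
}

savePrice($db, $cache, 'A1', 9.5);
savePrice($db, $cache, 'A1', 11.0);
echo getPrice($db, $cache, 'A1'), PHP_EOL; // prints 11
```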
The cache certainly does not check the database. You have to do that yourself. You know how the tables in the database are related to each other and which tables you need to pull information from to render a page. Do that (or don't, if you are satisfied with what you told memcached).
Is there not a better way to cache things where the cache will check inside the database to see if the data has changed rather than relying on a timer which could cause sync issues?
That timer is not there for checking the database; it is there to free up memory (evicting data from the cache).
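To make the role of the expiry time concrete, here is a toy cache with a per-entry TTL and an injected clock (an illustrative stand-in requiring PHP 8, not how memcached is implemented). A fresh entry is returned without consulting anything else; once the TTL has passed, the entry is dropped and reported as a miss, so the caller reloads from the database. Real memcached may also evict entries earlier when it runs out of memory.

```php
<?php
// Toy cache with per-entry expiry (illustrative stand-in, not Memcached's implementation).
// The current time is passed in explicitly so the example is deterministic.
final class TtlCache
{
    /** @var array<string, array{0: mixed, 1: int}> value and expiry time per key */
    private array $items = [];

    public function set(string $key, mixed $value, int $ttl, int $now): void
    {
        $this->items[$key] = [$value, $now + $ttl];
    }

    public function get(string $key, int $now): mixed
    {
        if (!isset($this->items[$key])) {
            return false;               // never stored, or already dropped
        }
        [$value, $expiresAt] = $this->items[$key];
        if ($now >= $expiresAt) {
            unset($this->items[$key]);  // expired: free the memory and report a miss
            return false;
        }
        return $value;                  // fresh: nothing else (e.g. the database) is consulted
    }
}

$cache = new TtlCache();
$cache->set('rows', ['r1', 'r2'], 5 * 60, 1000); // cached at t = 1000 s for 5 minutes
var_dump($cache->get('rows', 1200) !== false);   // bool(true): still within the TTL
var_dump($cache->get('rows', 1300));             // bool(false): expired, caller must reload
```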
From the Google App Engine (Python) documentation:
Memcache is typically used with the following pattern: The application receives a query from the user or the application. The application checks whether the data needed to satisfy that query is in memcache. If the data is in memcache, the application uses that data. If the data is not in memcache, the application queries the datastore and stores the results in memcache for future requests. The pseudocode below represents a typical memcache request:
def get_data():
    data = memcache.get("key")
    if data is not None:
        return data
    else:
        data = query_for_data()
        memcache.add("key", data, 60)
        return data
When you update a record in the database, you will also have to update the corresponding memcache key.
You may want to consider using Redis. While it is only semi-persistent, it offers some data storage mechanisms and fast performance.
http://www.redis.io

retrieving data from memcache

I am starting to learn the benefits of memcache, and would like to implement it on my project. I have understood most of it, such as how data can be retrieved by a key and so on.
Now I get it that I can put a post with all of its details in memcache and call the key POST:123, that is OK, and I can do it for each post.
But how do I deal with the case where I query the posts table to get the list of all posts with their titles? Can this be done with memcache, or should this always be queried from the table?
Memcache is a key-value cache, so as you've described, it is typically used when you know exactly what data you want to retrieve (i.e., it is not used for querying and returning an unknown list of results based on some filters).
Generally, the goal is not to replace all database queries with memcache calls, especially if the optimization isn't necessary.
Assuming the data won't fit in a single value, and you did have a need for quick-access to all the data in your table, you could consider dumping it into keys based on some chunk-value, like a timestamp range.
However, it's probably best to either keep the database query or load it into memory directly (if we're talking about a single server that writes all the updates).
You could have a key called "posts" with a value of "1,2,3,4,10,12", retrieve it when you need the list, and update it every time a post is created, updated, or deleted.
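A sketch of the "posts" id-list idea, using a PHP array as a stand-in for Memcached and storing the ids as an array rather than the comma-separated string from the answer (Memcached serializes arrays itself). With the real extension, loading all the posts listed under `posts` could be done in one round trip with `Memcached::getMulti()`.

```php
<?php
// Stand-in for Memcached (assumption for illustration): a plain array.
$cache = [];

function savePost(array &$cache, int $id, string $title): void
{
    $cache["post:$id"] = ['id' => $id, 'title' => $title]; // one key per post
    $ids = $cache['posts'] ?? [];                          // the "posts" key holds the id list
    if (!in_array($id, $ids, true)) {
        $ids[] = $id;
        $cache['posts'] = $ids;                            // update the list on create
    }
}

function deletePost(array &$cache, int $id): void
{
    unset($cache["post:$id"]);
    $cache['posts'] = array_values(array_diff($cache['posts'] ?? [], [$id])); // and on delete
}

function listTitles(array $cache): array
{
    $titles = [];
    foreach ($cache['posts'] ?? [] as $id) { // read the id list, then each post by its key
        $titles[$id] = $cache["post:$id"]['title'];
    }
    return $titles;
}

savePost($cache, 1, 'Hello');
savePost($cache, 2, 'Second post');
savePost($cache, 3, 'Third post');
deletePost($cache, 2);
print_r(listTitles($cache));
```

With several concurrent writers, this read-modify-write of the list can lose updates; the extension's `Memcached::cas()` is the usual tool for guarding it.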
In any case, memcached is not a database replacement; it's a key/value store (fast and scalable at that). So you have to decide what data to store in the DB and what to offload to memcached.
Alternatively, you could use the "cachemoney" plugin, which does read-/write-through caching of your ActiveRecord models in memory (including in memcached).
Look at the cache_money gem from Twitter. This gem adds caching at the ActiveRecord level.
The find calls by id will go through the cache. You can add support for indexing by other fields in your table (i.e. title in your case)
class Post < ActiveRecord::Base
  index :title
end
Now the find calls that filter by title will go through cache.
Post.all(:conditions => {:title => "xxxx"})
Basically, what you should do is check the cache first to see if it has the information you need. If there is nothing in the cache (or if the cached data is stale), then you should query the table and place the returned data in the cache, as well as return it to the front end.
Does that help?
You cannot enumerate memcached contents; it's not designed to be used the way you are trying to. You need to store the complete result sets of SQL queries instead of single records.
So, yes: do a SELECT title FROM posts and store the result in memcached (e.g. as POSTS:ALL_UNORDERED). This is how it is designed. And yes, if you query single records like SELECT * FROM posts WHERE id = XX, store those too. Just keep in mind to create a unique memcached key for each query you perform.
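One way to get a unique key per query is to hash the SQL text together with its bound parameters. A sketch, assuming an array in place of Memcached and a hypothetical `$runQuery` closure in place of PDO; the `sql:` prefix is arbitrary. Hashing also keeps keys within memcached's limits (at most 250 bytes, no whitespace or control characters). Cached result sets like these still need to be deleted or allowed to expire when the posts change.

```php
<?php
// One cache key per distinct query: SQL text plus bound parameters, hashed with md5().
function queryCacheKey(string $sql, array $params = []): string
{
    return 'sql:' . md5($sql . '|' . serialize($params));
}

// Stand-ins (assumptions): an array for Memcached and a closure for the real query.
$cache = [];
$queries = 0;
$runQuery = function (string $sql, array $params) use (&$queries): array {
    $queries++;
    return [['title' => 'First'], ['title' => 'Second']]; // pretend result set
};

function cachedQuery(array &$cache, callable $run, string $sql, array $params = []): array
{
    $key = queryCacheKey($sql, $params);
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $run($sql, $params); // store the whole result set under one key
    }
    return $cache[$key];
}

$all = cachedQuery($cache, $runQuery, 'SELECT title FROM posts');
$again = cachedQuery($cache, $runQuery, 'SELECT title FROM posts');          // served from cache
$one = cachedQuery($cache, $runQuery, 'SELECT * FROM posts WHERE id = ?', [12]); // different key
echo $queries, PHP_EOL; // prints 2
```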
As far as I know, you can't query memcached for keys and values the way you query a table. But if you need to access the same list of posts often, why not store it under a key like "posts" in your memcache?
