I am trying to understand (and probably deploy) memcached in our environment.
We have four web servers behind a load balancer, running a big web app developed in PHP. We are already using APC.
I want to see how memcached works. Or maybe I don't understand how caching works in general.
We have some complex dynamic queries that combine several tables to pull data. Each time, the data comes from different client databases, and the data keeps changing. From my understanding, if some data is stored in the cache and the next request is the same, the same cached data is returned. (Or I may be completely wrong here.)
How does this whole memcache thing (or, for that matter, any caching) work?
Cache, in general, is a very fast key/value storage engine where you can store values (usually serialized) by a predetermined key, so you can retrieve the stored values by the same key.
In relation to MySQL, you would write your application code so that you check for the presence of data in the cache before issuing a request to the database. If a match is found (a matching key exists), you then have access to the data associated with that key. The goal is to avoid issuing a request to the more costly database whenever possible.
An example (demonstrative only):
$cache = new Memcached();
$cache->addServer('servername', 11211);

$myCacheKey = 'my_cache_key';
$row = $cache->get($myCacheKey);
if ($row === false) {
    // Cache miss: issue the painful query to MySQL
    $sql = "SELECT * FROM table WHERE id = :id";
    $stmt = $dbo->prepare($sql);
    $stmt->bindValue(':id', $someId, PDO::PARAM_INT);
    $stmt->execute();
    $row = $stmt->fetch(PDO::FETCH_OBJ);
    // The Memcached extension serializes values automatically
    $cache->set($myCacheKey, $row);
}
// Now I have access to $row, where I can do what I need to.
// For subsequent calls, the data will be pulled from the cache,
// skipping the query altogether.
var_dump($row);
Check out PHP docs on memcached for more info, there are some good examples and comments.
There are several examples online of how memcache works.
Secondly, Memcache can work with or without MySQL.
It caches your PHP objects; whether they come from MySQL or anywhere else, if it's a PHP object, it can be stored in Memcache.
APC gives you some more functionality than Memcache. Besides storing/caching PHP objects, it also caches the compiled PHP opcodes, so your PHP files don't have to go through the process of being loaded into memory and compiled on every request; instead, the already-compiled opcode is run directly from memory.
If your data keeps changing (between requests), then caching is futile, because that data is going to be stale. But most of the time (I bet even in your case), multiple requests to the database return the same data set, in which case an in-memory cache is very useful.
P.S.: I did a quick Google search and found this video about memcached, which is of rather good quality: http://www.bestechvideos.com/2009/03/21/railslab-scaling-rails-episode-8-memcached. The only problem could be that it talks about Ruby on Rails (which I also don't use much, but it is very easy to understand). Hopefully it will help you grasp the concept a little better.
Related
I have a JS script that does one simple thing - an ajax request to my server. On this server I establish a PDO connection, execute one prepared statement:
SELECT * FROM table WHERE param1 = :param1 AND param2 = :param2;
where the table has 5-50 rows and 5-15 columns, with data changing once a day on average.
Then I echo the json result back to the script and do something with it, let's say I console log it.
The problem is that the script is run ~10,000 times a second, which means that many connections to the database, and I'm getting "can't connect to the database" errors all the time in the server logs. So it sometimes works, when DB processes are free, and sometimes not.
How can I handle this?
Probable solutions:
Memcached - it would also be slow; it's not created to do that. The performance would be similar to, or worse than, the database.
File on server instead of the database - great solution, but the structure would be the problem.
Anything better?
For such a tiny amount of data that is changed so rarely, I'd make it just a regular PHP file.
Once you have your data in the form of an array, dump it into a PHP file using var_export(). Then just include this file and use a simple loop to search the data.
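A minimal sketch of that approach (the filename, data shape, and lookup values are invented for illustration): a dump step regenerates the PHP file whenever the data changes, and the request handler just includes it and searches in memory.

```php
<?php
// Hypothetical data set standing in for the 5-50 row query result.
$data = [
    ['param1' => 'a', 'param2' => 'x', 'value' => 1],
    ['param1' => 'b', 'param2' => 'y', 'value' => 2],
];

// Dump step: run this whenever the data changes (about once a day here).
$cacheFile = sys_get_temp_dir() . '/data_cache.php';
file_put_contents($cacheFile, '<?php return ' . var_export($data, true) . ';');

// Request step: include the dump (OPcache keeps it in memory) and search it.
$rows = require $cacheFile;
foreach ($rows as $row) {
    if ($row['param1'] === 'b' && $row['param2'] === 'y') {
        echo json_encode($row), "\n";
        break;
    }
}
```

Because the file is a plain PHP return statement, OPcache caches its compiled form, so repeated includes never touch the disk or the database.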
Another option is to use Memcached, which was created for exactly this sort of job. On a fast machine with high-speed networking, memcached can easily handle 200,000+ requests per second, well above your modest 10k rps.
You can even eliminate PHP from the stack by making Nginx ask Memcached directly for the stored values, using ngx_http_memcached_module.
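A sketch of that Nginx setup (the key scheme and the backend upstream name are invented for illustration): the key must match whatever the PHP side uses when it calls set(). On a cache miss the memcached module returns 404, and error_page falls through to PHP, which repopulates the cache.

```nginx
location /data {
    # Key must match what the PHP application uses in $m->set(...)
    set $memcached_key "$arg_param1:$arg_param2";
    memcached_pass 127.0.0.1:11211;
    default_type application/json;
    # Cache miss or memcached down: let the PHP backend answer
    error_page 404 502 504 = @php_fallback;
}

location @php_fallback {
    proxy_pass http://backend;
}
```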
If you want to stick with the current MySQL-based solution, you can increase the max_connections number in the MySQL configuration; however, raising it above 200 may require some OS tweaking as well. But what you should not do is use persistent connections: that will make things far worse.
You need to leverage a cache. There is no reason at all to go fetch the data from the database every time the AJAX request is made for data that is this slow-changing.
A couple of approaches that you could take (possibly even in combination with each other).
Cache between application and DB. This might be memcache or similar and would allow you to perform hash-based lookups (likely based on some hash of parameters passed) to data stored in memory (perhaps JSON representation or whatever data format you ultimately return to the client).
Cache between client and application. This might take the form of web-server-provided cache, a CDN-based cache, or similar that would prevent the request from ever even reaching your application given an appropriately stored, non-expired item in the cache.
Anything better? No
Since you output the same results many times, the sensible solution is to cache results.
My educated guess is that your wrong assumption - that memcached is not built for this - is based on your plan to store each record separately.
I implemented a simple caching mechanism for you to use:
<?php
$memcached_port = YOUR_MEMCACHED_PORT;
$m = new Memcached();
$m->addServer('localhost', $memcached_port);

$key1 = $_GET['key1'];
$key2 = $_GET['key2'];
$m_key = $key1 . $key2; // generate the key from unique values; for long values, hash them to keep the key short

if (($data = $m->get($m_key)) === false) {
    // fetch $data from your database
    $expire = 3600; // 1 hour; use a unix timestamp instead if you wish to expire at a specific time of day
    $m->set($m_key, $data, $expire); // push to memcache
}
echo json_encode($data);
What you do is:
Decide what signifies a result (what set of input parameters).
Use that for the memcache key (for example, if it's a country and a language, the key would be $country.$language).
Check if the result exists:
If it does, pull the data you stored (as an array) and output it.
If it doesn't exist or is outdated:
a. Pull the data needed
b. Put the data in an array
c. Push the data to memcached
d. Output the data
There are more efficient ways to cache data, but this is the simplest one, and sounds like your kind of code.
10,000 requests/second still doesn't justify the effort needed to create server-level caching (nginx/whatever).
In an ideally tuned world a chicken with a calculator would be able to run facebook .. but who cares ? (:
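One caveat on the key generation above: plain concatenation like $key1.$key2 can collide ('ab'.'c' and 'a'.'bc' produce the same key), and memcached keys are limited to 250 bytes. A separator plus an md5() hash avoids both problems (the parameter values and the "result:" prefix here are invented for illustration):

```php
<?php
// Illustrative parameter values
$key1 = 'France';
$key2 = 'fr_FR';

// 'ab'.'c' and 'a'.'bc' both give 'abc'; a separator prevents that
$raw = $key1 . '|' . $key2;

// md5() keeps the key short and within memcached's 250-byte key limit
$m_key = 'result:' . md5($raw);

echo $m_key, "\n";
```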
I have many users polling my php script on an apache server and the mysql query they run is a
SELECT * FROM `table` WHERE `id` > 'example number'
where the example number can vary from user to user, but has a known lower bound which is updated every 10 minutes.
The server is getting polled twice a second by each user.
Can memcache be used? It's not crucial that the user is shown the most up-to-date information; if it's a second or so behind, that is fine.
The site has 200 concurrent users at peak times. It's hugely inefficient and costing a lot of resources.
To give an accurate answer, I will need more information:
Whether the query is pulling personalised information.
Whether the 'example number' comes along with the polling request.
Looking at the way you have structured your question, it doesn't seem like the user is polling for any personalised information, so I assume the 'example number' also comes as part of the polling request.
I agree with #roberttstephens and #Filippos Karapetis that you can use the ideal solutions:
Redis
NoSQL
Tune the MySQL
Memcache
But as you guys already have the application out there in the wild, implementing the above solutions will have a cost, so these are the practical solutions I would recommend:
Add indexes to your table on the relevant columns. [First thing to check/do.]
Enable MySQL query caching.
Use a reverse proxy, e.g. Varnish. [Assumption: the 'example number' comes as part of the request.]
It intercepts requests before they even hit your application server, so the MySQL query and the Memcache/Redis lookup don't happen at all.
Make sure you set specific cache headers on the response so that Varnish caches it.
So, of the 200 concurrent requests, if 100 of them query for the same number, Varnish takes the hit. [It is the same advantage that Memcache can also offer.]
Implementation-wise it doesn't cost much in terms of development/testing effort.
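For the cache-header point: a sketch of what the poll endpoint would send so that Varnish is allowed to reuse the response (the data shape and the 60-second lifetime are assumptions):

```php
<?php
// Hypothetical result set for one poll
$results = ['poll_id' => 12345, 'votes' => [10, 42, 7]];

// Tell Varnish (and any shared cache) it may reuse this response for 60s,
// so repeated requests for the same number never reach PHP or MySQL
header('Cache-Control: public, max-age=60');
header('Content-Type: application/json');

echo json_encode($results);
```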
I understand this is not the answer to the exact question, but I am sure it could solve your problem.
If the 'example number' doesn't come as part of the request and you have to fetch it from the DB [by looking at the user table, maybe], then #roberttstephens' approach is the way to go. Just to give you the exact picture, I have refactored the code a little:
$m = new Memcached();
$m->addServer('localhost', 11211);

$inputNumber = 12345;
$cacheKey = "poll:" . $inputNumber;

$result = $m->get($cacheKey);
if ($result !== false) {
    return unserialize($result);
}

$sth = $dbh->prepare("SELECT column1, column2 FROM poll WHERE id = :id");
$sth->bindValue(':id', $inputNumber, PDO::PARAM_INT);
$sth->execute();
$poll_results = $sth->fetch(PDO::FETCH_OBJ);
$m->set($cacheKey, serialize($poll_results));
In my opinion, you're trying to use the wrong tool for the job here.
memcached is a key/value storage, so you can make it store and retrieve several values with a given set of keys very quickly. However, you don't seem to know the keys you want in advance, since you're looking for all records where the id is GREATER THAN a number, rather than a collection of IDs. So, in my opinion, memcached won't be appropriate to use in your scenario.
Here are your options:
Option 1: keep using MySQL and tune it properly
MySQL is quite fast if you tune it properly. You can:
add the appropriate indexes to each table
use prepared statements, which can help performance-wise in your case, as users are doing the same query over and over with different parameters
use query caching
Here's a guide with some hints on MySQL tuning, and mysqltuner, a Perl script that can guide you through the options needed to optimize your MySQL database.
Option 2: Use a more advanced key-value storage
There are alternatives to memcached, the best known being redis. redis allows more flexibility, but it's more complex than memcached. For your scenario, you could use the redis zrange command (in its by-score form) to retrieve the results you want; have a look at the available redis commands for more information.
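For illustration (the key name and member values are invented): if each row is stored in a sorted set with its id as the score, ZRANGEBYSCORE with an exclusive "(" bound answers the "id greater than N" query directly.

```
ZADD table_rows 101 "row-101"
ZADD table_rows 102 "row-102"
ZADD table_rows 205 "row-205"
ZRANGEBYSCORE table_rows (100 +inf
```

The last command returns every member whose score (id) is strictly greater than 100, which is exactly the shape of the query in the question.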
Option 3: Use a document storage NoSQL database
You can use a document storage NoSQL database, with the most known example being MongoDB.
You can use more complex queries in MongoDB (e.g. use operators, like "greater than", which you require) than you can do in memcached. Here's an example of how to search through results in a mongo collection using operators (check example 2).
Have a look at the PHP MongoDB manual for more information.
Also, this SO question is an interesting read regarding document storage NoSQL databases.
You can absolutely use memcached to cache the results. You could instead create a cache table in mysql with less effort.
In either case, you would need to create an id for the cache, and retrieve the results based on that id. You could use something like entity_name:entity_id, or namespace:entity_name:entity_id, or whatever works for you.
Keep in mind, memcached is another service running on the server. You have to install it, set it up to start on reboot (or you should at least), allocate memory, etc. You'll also need php-memcached.
With that said, please view the php documentation on memcached. http://php.net/manual/en/memcached.set.php . Assuming your poll id is 12345, you could use memcached like so.
<?php
// Get your results however you normally would.
$sth = $dbh->prepare("SELECT column1, column2 FROM poll WHERE id = 12345");
$sth->execute();
$poll_results = $sth->fetch(PDO::FETCH_OBJ);
// Set up memcached. It should obviously be installed, configured, and running by now.
$m = new Memcached();
$m->addServer('localhost', 11211);
$m->set('poll:12345', serialize($poll_results));
This example doesn't have any error checking or anything, but this should explain how to do it. I also don't have a php, mysql, or memcached instance running right now, so the above hasn't been tested.
I am new to memcached and just started using that. I have few questions:
I have implemented Memcached in my PHP database class, where I am storing result sets (arrays) in memcache. My question is: as this is for a website, say 4 users access the same page and trigger the same query execution, what would memcache do? As per my understanding, for the first user it will fetch from the DB, and for the other 3 the system will use Memcache. Is that right?
Do 4 users mean 4 memcache objects will be generated? But will they all use the same memory? Does the same apply to 2 different pages on the website, as both pages will be using
$obj = memcached->connect(parameter);
I have run a small test, but the results are strange: when I execute the query with normal MySQL statements, the execution time is lower than when my code uses memcached. Why is that? If that's the case, why is it written everywhere that memcache is fast?
Please give some example of how to effectively test memcached execution time as compared to normal mysql_fetch_object.
Memcache does not work "automatically". It is only a key => value map. You need to determine how it is used and implement it.
The preferred method is:
A. Attempt to get from memcache
B. If A. failed, get from db, add to memcache
C. Return result
D. If you ever update that data, expire all associated keys
This will not prevent the same query executing on the db multiple times. If 2 users both get the same data at the same time, and everything is executed nearly at the same time as well, both attempts to fetch from memcache will fail and add to memcache. And that is usually ok.
In code, it will create as many connections as there are current users, since it is run from PHP, which gets executed for each user. You might also connect multiple times in one request (if you're not careful with your code), so it could be many more.
Many times, the biggest lag for both memcache AND sql is actually network latency. If sql is on the same machine and memcache on a different machine, you will likely see slower times for memcache.
Also, many frameworks/people do not correctly implement multi-get. So, if you have 100 ids and you get each one by id from memcache, it will do 100 single gets rather than 1 multi-get. That is a huge slowdown.
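To illustrate the multi-get point with a toy stand-in (the FakeCache class is invented purely for the demonstration; with a real Memcached server each get() is a network round-trip, and getMulti() batches the whole key list into one):

```php
<?php
// In-memory stand-in for a cache, counting "round-trips" instead of
// actually crossing the network.
class FakeCache {
    private array $store = [];
    public int $roundTrips = 0;

    public function set(string $k, $v): void { $this->store[$k] = $v; }

    public function get(string $k) {
        $this->roundTrips++;               // one round-trip per key
        return $this->store[$k] ?? false;
    }

    public function getMulti(array $keys): array {
        $this->roundTrips++;               // one round-trip for the whole batch
        $out = [];
        foreach ($keys as $k) {
            if (isset($this->store[$k])) $out[$k] = $this->store[$k];
        }
        return $out;
    }
}

$cache = new FakeCache();
for ($id = 1; $id <= 100; $id++) $cache->set("user:$id", "row $id");

// 100 single gets -> 100 round-trips
foreach (range(1, 100) as $id) $cache->get("user:$id");
echo $cache->roundTrips, "\n";

// one multi-get -> 1 round-trip for the same data
$cache->roundTrips = 0;
$keys = array_map(fn($id) => "user:$id", range(1, 100));
$rows = $cache->getMulti($keys);
echo $cache->roundTrips, " ", count($rows), "\n";
```

With real network latency of even half a millisecond per round-trip, the difference between 100 gets and 1 getMulti dominates everything else in the request.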
Memcache is fast. SQL with query caching for simple selects is also fast. Typically, you use memcache when:
the queries you are running are complicated/slow
OR
it is more cost-effective to use memcache than to have everyone hit the SQL server
OR
you have so many users that the database is not sufficient to keep up with the load
OR
you want to try out a technology because you think it's cool or nice to have on your resume.
You can use any variety of profiling software, such as xdebug or xhprof.
Alternatively, you can do this although less reliable due to other things happening on your server:
$start = microtime(true);
// do foo
echo microtime(true) - $start;
$start = microtime(true);
// do bar
echo microtime(true) - $start;
You have two reasons to use memcache:
1. Offload your database server
That is, if you have a high load on your database server because you keep querying the same thing over and over again and the internal MySQL cache is not working as fast as expected. Or you might have write-performance issues clogging your server; then memcache will help you offload MySQL in a consistent and better way.
In the event that your MySQL server is NOT stressed, there may be no advantage to using memcached if it is mostly for performance gain. Memcached is still a server: you still have to connect to it and talk to it, so the network overhead remains.
2. Share data between users without relying on the database
In another scenario, you might want to share some data or state between users of your web application without relying on files or on a SQL server. Using memcached, you can set a value from one user's perspective and load it from another user.
A good example of that would be chat logs between users: you don't want to store everything in a database, because that means a lot of writes and reads, and you want to share the data without caring about losing everything in case an error comes around and the server restarts...
I hope my answer is satisfactory.
Good luck
Yes, that is right. Basically this is called caching and is unrelated to Memcached itself.
I do not understand fully. If all 4 users connect to the same memchache daemon, they will use shared memory, yes.
You have not given any code, so it is hard to tell. There can be many reasons, so I would not jump to conclusions with so little information given.
You would need to measure your network traffic with deep packet inspection to effectively test and compare both execution times; I cannot give an example for that in this answer. You might be okay with just using microtime and logging whether the cache was hit (result was already in cache) or missed (not yet in cache, need to fetch from the database).
I would like to know if it's possible to store a "resource" within memcache. I'm currently trying the following code, but apparently it's not correct:
$result = mysql_query($sSQL);
$memcache->set($key, $result, 0, $ttl);
return $result;
I have to disagree with zerkms. Just because MySQL has a caching system (actually, it has several), doesn't mean that there's no benefit to optimizing your database access. MySQL's Query Cache is great, but it still has limitations:
it's not suitable for large data sets
queries have to be identical (character for character)
it does not support prepared statements or queries using user-defined functions, temporary tables, or tables with column-level privileges
cache results are cleared every time the table is modified, regardless of whether the result set is affected
unless it resides on the same machine as the web server it still incurs unnecessary network overhead
Even with a remote server, Memcached is roughly 23% faster than MQC. And using APC's object cache, you can get up to a 990% improvement over using MQC alone.
So there are plenty of reasons to cache database result sets outside of MySQL's Query Cache. After all, you cache result data locally in a PHP variable when you need to access it multiple times in the same script. So why wouldn't you extend this across multiple requests if the result set doesn't change?
And just because the server is fast enough doesn't mean you shouldn't strive to write efficient code. It's not like it takes that much effort to cache database results—especially when accelerators like APC and Memcached were designed for this exact purpose. (And I wouldn't dismiss this question as such a "strange idea" when some of the largest sites on the internet use Memcached in conjunction with MySQL.)
That said, zerkms is correct in that you have to fetch the results first; then you can cache the data using APC or Memcached. There is, however, an alternative to caching query results manually: the Mysqlnd query result cache plugin, a client-side cache of MySQL query results.
The Mysqlnd query result cache plugin lets you transparently cache your queries using APC, Memcached, sqlite, or a user-specified data source. However, this plugin currently shares the same limitation as MQC in that prepared statements can't be cached.
Why do you need that? MySQL has its own performant query cache.
But if you still want to follow your strange idea: you need to fetch all the data into an array (with mysql_fetch_assoc or whatever) and after that store that array in memcached.
I believe the standard way to cache something with memcached is to insert the object into the cache for a set period of time, e.g.:
memcli:set(key, rows_array, 5 * 60)
Is there not a better way to cache things, where the cache will check inside the database to see if the data has changed, rather than relying on a timer, which could cause sync issues?
I'm going to use PHP.
The cache will not check the database, because that is contrary to the idea of caching.
Instead, you should update or remove objects from the cache when you change their value in the database.
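A toy sketch of that update-on-write idea, using plain arrays as stand-ins for the database and for memcached (with the real thing, the unset() would be a $memcached->delete() call after the SQL UPDATE):

```php
<?php
$db    = ['poll:1' => 10];   // pretend database
$cache = [];                 // pretend memcached

// Cache-aside read: on a miss, load from the "database" and remember it
function read(string $key, array &$cache, array $db) {
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $db[$key];
    }
    return $cache[$key];
}

// Write-through invalidation: every write also evicts the cached copy,
// so the next read repopulates it with fresh data instead of waiting for a TTL
function write(string $key, $value, array &$cache, array &$db): void {
    $db[$key] = $value;
    unset($cache[$key]);
}

echo read('poll:1', $cache, $db), "\n";  // miss, loads 10 and caches it
write('poll:1', 11, $cache, $db);        // update evicts the stale entry
echo read('poll:1', $cache, $db), "\n";  // fresh value 11, never stale
```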
If data is subject to being modified in the database outside the control of your application (where you could otherwise implement your cache in a write-through fashion), then that data is probably not a good candidate for caching (unless you can live with stale data until it is evicted).
The cache certainly does not check the database. You have to do that yourself. You know how the tables in the database are related to each other and what tables you need to pull information from to render a page. Do that (or don't, if you are satisfied with what you told memcached).
Is there not a better way to cache things where the cache will check inside the database to see if the data has changed rather than relying on a timer which could cause sync issues?
That timer is not for checking the database, but for freeing up memory (evicting data from the cache).
From Google App Engine (Python):
Memcache is typically used with the following pattern: The application receives a query from the user or the application. The application checks whether the data needed to satisfy that query is in memcache. If the data is in memcache, the application uses that data. If the data is not in memcache, the application queries the datastore and stores the results in memcache for future requests. The pseudocode below represents a typical memcache request:
def get_data():
    data = memcache.get("key")
    if data is not None:
        return data
    else:
        data = self.query_for_data()
        memcache.add("key", data, 60)
        return data
On an update of the key in the database, you will also have to update the memcache key.
You may want to consider using redis. While only semi-persistent, it offers a data storage mechanism and fast performance.
http://www.redis.io