I have many users polling my PHP script on an Apache server, and the MySQL query they run is
SELECT * FROM `table` WHERE `id` > 'example number'
where the example number can vary from user to user, but has a known lower bound which is updated every 10 minutes.
Each user polls the server twice a second.
Can memcache be used? It's not crucial that the user sees the most up-to-date information; if it's a second or so behind, that is fine.
The site has 200 concurrent users at peak times. The current setup is hugely inefficient and costs a lot of resources.
To give an accurate answer, I would need more information:
Whether the query is pulling personalised information.
Whether the 'example number' comes along with the polling request.
Looking at the way you have structured your question, it doesn't seem like the user is polling for any personalised information, so I assume the 'example number' also comes as part of the polling request.
I agree with #roberttstephens and #Filippos Karapetis that the ideal solutions are:
Redis
NoSQL
Tune the MySQL
Memcache
But as your application is already out in the wild, implementing the solutions above will have a cost, so these are the practical steps I would recommend:
Add indexes to your table on the relevant columns. [first thing to check / do]
Enable MySQL query caching.
Use a reverse proxy, e.g. Varnish. [assumes the 'example number' comes as part of the request]
It intercepts requests before they hit your application server, so that neither the MySQL query nor the Memcache/Redis lookup happens.
Make sure that you set specific cache headers on the response so that Varnish caches it.
So, of the 200 concurrent requests, if 100 of them are querying for the same number, Varnish takes the hit. [Memcache can offer the same advantage.]
Implementation-wise, it doesn't cost much in terms of development and testing effort.
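As a rough illustration of the cache-header point, the polling script could send something like the following. This is only a sketch: pollCacheHeaders() is a made-up helper, the one-second lifetime is an assumption based on the question's tolerance for slightly stale data, and it only helps if the 'example number' is part of the URL so that each number gets its own cache entry.

```php
<?php
// Sketch only: pollCacheHeaders() is a hypothetical helper, and the default
// one-second TTL is an assumption taken from "a second behind or so is fine".
function pollCacheHeaders(int $ttlSeconds = 1): array
{
    return [
        // s-maxage addresses shared caches such as Varnish; max-age covers browsers.
        "Cache-Control: public, max-age=$ttlSeconds, s-maxage=$ttlSeconds",
    ];
}

// In the polling script, before any output is sent:
if (!headers_sent()) {
    foreach (pollCacheHeaders() as $line) {
        header($line);
    }
}
```

With its default configuration Varnish derives the lifetime of a cached object from s-maxage/max-age, so within that second repeated requests for the same number should be answered without reaching PHP or MySQL. Anything more specific would need VCL, which is not shown here.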
I understand this is not an answer to the exact question, but I am sure it could solve your problem.
If the 'example number' doesn't come as part of the request and you have to fetch it from the DB [by looking at the user table, maybe], then #roberttstephens' approach is the way to go. Just to give you the exact picture, I have refactored the code a little.
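$m = new Memcached();
$m->addServer('localhost', 11211);
$inputNumber = 12345;
$cacheKey = "poll:" . $inputNumber;
// Try the cache first; Memcached::get() returns false on a miss.
$result = $m->get($cacheKey);
if ($result !== false) {
    return unserialize($result);
}
// Cache miss: run the query with a bound parameter instead of interpolating it.
$sth = $dbh->prepare("SELECT column1, column2 FROM poll WHERE id = :id");
$sth->execute([':id' => $inputNumber]);
$poll_results = $sth->fetch(PDO::FETCH_OBJ);
// Store the row so the next request for this number skips the database.
$m->set($cacheKey, serialize($poll_results));
return $poll_results;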
In my opinion, you're trying to use the wrong tool for the job here.
memcached is a key/value storage, so you can make it store and retrieve several values with a given set of keys very quickly. However, you don't seem to know the keys you want in advance, since you're looking for all records where the id is GREATER THAN a number, rather than a collection of IDs. So, in my opinion, memcached won't be appropriate to use in your scenario.
Here are your options:
Option 1: keep using MySQL and tune it properly
MySQL is quite fast if you tune it properly. You can:
add the appropriate indexes to each table
use prepared statements, which can help performance-wise in your case, as users are doing the same query over and over with different parameters
use query caching
Here's a guide with some hints on MySQL tuning, and mysqltuner, a Perl script that can guide you through the options needed to optimize your MySQL database.
Option 2: Use a more advanced key-value storage
There are alternatives to memcached, the best known being Redis. Redis allows more flexibility, but it's more complex than memcached. For your scenario, you could keep the rows in a sorted set and use a Redis range command such as ZRANGE (or ZRANGEBYSCORE, which selects members by score range) to retrieve the results you want - have a look at the available Redis commands for more information.
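To make the sorted-set idea a bit more concrete, here is a minimal sketch. It assumes a connected phpredis client is passed in as $redis; the key name rows_by_id, the JSON encoding of rows and both helper functions are illustrative choices, not something from the question. The rows would also have to be written to Redis whenever they are written to MySQL.

```php
<?php
// Sketch only: $redis is assumed to be a connected phpredis client (class Redis).
// The key name "rows_by_id" and the JSON row encoding are illustrative choices.

// Index a row in a sorted set, using its numeric id as the score.
// The id is embedded in the member so two rows with equal content stay distinct.
function indexRow($redis, int $id, array $row): void
{
    $redis->zAdd('rows_by_id', $id, json_encode(['id' => $id] + $row));
}

// Return every row whose id is strictly greater than $lastSeenId.
// The "(" prefix makes the lower bound exclusive; "+inf" leaves the top open.
function rowsAfter($redis, int $lastSeenId): array
{
    $members = $redis->zRangeByScore('rows_by_id', '(' . $lastSeenId, '+inf');
    return array_map(fn(string $m) => json_decode($m, true), $members);
}
```

A polling request would then call rowsAfter($redis, $exampleNumber) instead of running the SELECT.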
Option 3: Use a document storage NoSQL database
You can use a document storage NoSQL database, the best-known example being MongoDB.
You can use more complex queries in MongoDB (e.g. use operators, like "greater than", which you require) than you can do in memcached. Here's an example of how to search through results in a mongo collection using operators (check example 2).
Have a look at the PHP MongoDB manual for more information.
Also, this SO question is an interesting read regarding document storage NoSQL databases.
You can absolutely use memcached to cache the results. You could instead create a cache table in mysql with less effort.
In either case, you would need to create an id for the cache, and retrieve the results based on that id. You could use something like entity_name:entity_id, or namespace:entity_name:entity_id, or whatever works for you.
Keep in mind, memcached is another service running on the server. You have to install it, set it up to start on reboot (or you should at least), allocate memory, etc. You'll also need php-memcached.
With that said, please view the php documentation on memcached. http://php.net/manual/en/memcached.set.php . Assuming your poll id is 12345, you could use memcached like so.
<?php
// Get your results however you normally would.
$sth = $dbh->prepare("SELECT column1, column2 FROM poll WHERE id = 12345");
$sth->execute();
$poll_results = $sth->fetch(PDO::FETCH_OBJ);
// Set up memcached. It should obviously be installed, configured, and running by now.
$m = new Memcached();
$m->addServer('localhost', 11211);
$m->set('poll:12345', serialize($poll_results));
This example doesn't have any error checking or anything, but this should explain how to do it. I also don't have a php, mysql, or memcached instance running right now, so the above hasn't been tested.
I have a JS script that does one simple thing - an ajax request to my server. On this server I establish a PDO connection, execute one prepared statement:
SELECT * FROM table WHERE param1 = :param1 AND param2 = :param2;
Where table is the table with 5-50 rows, 5-15 columns, with data changing once each day on average.
Then I echo the json result back to the script and do something with it, let's say I console log it.
The problem is that the script is run ~10,000 times a second. That gives me that many connections to the database, and I'm getting "can't connect to the database" errors all the time in the server logs. So sometimes it works, when DB processes are free, and sometimes not.
How can I handle this?
Probable solutions:
Memcached - it would also be slow, it's not created to do that. The performance would be similar, or worse, to the database.
File on server instead of the database - great solution, but the structure would be the problem.
Anything better?
For such a tiny amount of data that is changed so rarely, I'd make it just a regular PHP file.
Once you have your data in the form of array, dump it in the php file using var_export(). Then just include this file and use a simple loop to search data.
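A minimal sketch of that approach, under a few assumptions: the file path, the sample rows and the helper names writeDataFile() and findRows() are made up for illustration, and the column names param1/param2 are borrowed from the query in the question.

```php
<?php
// Sketch only: the file path, the sample rows and both helper names are
// illustrative; param1/param2 mirror the query in the question.

// Rebuild the data file whenever the table changes (about once a day).
function writeDataFile(string $path, array $rows): void
{
    $code = '<?php return ' . var_export($rows, true) . ';';
    // Write to a temp file and rename, so readers never see a half-written file.
    $tmp = $path . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $code);
    rename($tmp, $path);
}

// Per request: include the file and scan it with a simple loop.
function findRows(string $path, string $param1, string $param2): array
{
    $rows = include $path;
    $matches = [];
    foreach ($rows as $row) {
        if ($row['param1'] === $param1 && $row['param2'] === $param2) {
            $matches[] = $row;
        }
    }
    return $matches;
}

$path = sys_get_temp_dir() . '/table_data.php';
writeDataFile($path, [
    ['param1' => 'a', 'param2' => 'x', 'value' => 1],
    ['param1' => 'b', 'param2' => 'y', 'value' => 2],
]);
echo json_encode(findRows($path, 'b', 'y')); // prints [{"param1":"b","param2":"y","value":2}]
```

With only 5-50 rows, the linear scan costs next to nothing, and no database connection is opened for the request at all.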
Another option is to use Memcached, which was created for exactly this sort of job. On a fast machine with high-speed networking, memcached can easily handle 200,000+ requests per second, which is far above your modest 10k rps.
You can even eliminate PHP from the path, making Nginx ask Memcached directly for the stored values, using ngx_http_memcached_module.
If you want to stick with the current MySQL-based solution, you can increase the max_connections number in the MySQL configuration; however, raising it above 200 may require some OS tweaking as well. What you should not do is use persistent connections; that will make things far worse.
You need to leverage a cache. There is no reason at all to go fetch the data from the database every time the AJAX request is made for data that is this slow-changing.
A couple of approaches that you could take (possibly even in combination with each other).
Cache between application and DB. This might be memcache or similar and would allow you to perform hash-based lookups (likely based on some hash of parameters passed) to data stored in memory (perhaps JSON representation or whatever data format you ultimately return to the client).
Cache between client and application. This might take the form of web-server-provided cache, a CDN-based cache, or similar that would prevent the request from ever even reaching your application given an appropriately stored, non-expired item in the cache.
Anything better? No
Since you output the same results many times, the sensible solution is to cache results.
My educated guess is your wrong assumption -- that memcached is not built for this -- is based off you planning on storing each record separately
I implemented a simple caching mechanism for you to use :
<?php
$memcached_port = YOUR_MEMCACHED_PORT;
$m = new Memcached();
$m->addServer('localhost', $memcached_port);
$key1 = $_GET['key1'];
$key2 = $_GET['key2'];
$m_key = $key1.$key2; // generate key from unique values , for large keys use MD5 to hash the unique value
$data = false;
if(!($data = $m->get($m_key))) {
// fetch $data from your database
$expire = 3600; // 1 hour, you may use a unix timestamp value if you wish to expire in a specified time of day
$m->set($m_key,$data,$expire); // push to memcache
}
echo json_encode($data);
What you do is :
Decide on what signifies a result ( what set of input parameters )
Use that for the memcache key ( for example if it's a country and language the key would be $country.$language )
check if the result exists:
if it does pull the data you stored as an array and output it.
if it doesn't exist or is outdated :
a. pull the data needed
b. put the data in an array
c. push the data to memcached
d. output the data
There are more efficient ways to cache data, but this is the simplest one, and sounds like your kind of code.
10,000 requests/second still don't justify the effort needed to create server-level caching ( nginx/whatever )
In an ideally tuned world a chicken with a calculator would be able to run facebook .. but who cares ? (:
I have 1 mysql table which is controlled strictly by admin. Data entry is very low but query is high in that table. Since the table will not change content much I was thinking to use mysql query cache with PHP but got confused (when i googled about it) with memcached.
What is the basic difference between memcached and mysqlnd_qc ?
Which is most suitable for me as per below condition ?
I also intend to extend the same approach to an autocomplete box; which will be suitable in that case?
My queries will mostly return fewer than 30 rows of very few bytes each and will be the same SELECT queries. I am on a single server and no load sharing will be done. Thank you in advance.
If your query is always the same, i.e. you do SELECT title, stock FROM books WHERE stock > 5 and your condition never changes to stock > 6 etc., I would suggest using MySQL Query Cache.
Memcached is a key-value store. Basically it can cache anything if you can turn it into key => value. There are a lot of ways you can implement caching with it. You could query your 30 rows from database, then cache it row by row but I don't see a reason to do that here if you're returning the same set of rows over and over. The most basic example I can think of for memcached is:
// Run the query
$result = mysqli_query($con, "SELECT title, stock FROM books WHERE stock > 5");
// Fetch result into one array
$rows = mysqli_fetch_all($result);
// Put the result into memcache.
$memcache_obj->add('my_books', serialize($rows), false, 30);
Then do a $memcache_obj->get('my_books'); and unserialize it to get the same results.
But since you're using the same query over and over, why add the complication when you can let MySQL handle all the caching for you? Remember that if you go with the memcached option, you need to set up a memcached server as well as implement the logic to check whether the result is already in the cache and whether the records have changed in the database.
I would recommend using MySQL query cache over memcached in this case.
One thing you need to be careful about with the MySQL query cache, though, is that your query must be exactly the same each time, with no extra blank spaces or comments whatsoever. This is because MySQL does no parsing before comparing the query string against the cache. Any extra character anywhere in the query means a different query.
Peter Zaitsev explained very well about MySQL Query Cache at http://www.mysqlperformanceblog.com/2006/07/27/mysql-query-cache/, worth taking a look at it. Make sure you don't need anything that MySQL Query Cache does not support as Peter Zaitsev mentioned.
If the queries run fast enough and do not really slow down your application, do not cache them. With a table this small, MySQL will keep it in its own cache. If your application and database are on the same server, the benefit will be very small, maybe not even measurable.
So, for your 3rd question, it also depends on how you query the underlying tables. Most of the time, it is sufficient to let MySQL cache it internally. Another approach is to generate all the possible combinations and store these, so MySQL does not need to compute the matching rows and can return the right one straight away.
As a general rule: build your application without caching and only add caches for things that do not change often if a) the computation of the result set is complex and time-consuming or b) you have multiple application instances calling the database over a network. In those cases caching results in better performance.
Also, if you run PHP in a web server like Apache, caching inside your program does not add much benefit, as the cache only lives for the current page request. An external cache (like memcache) is then needed to cache across multiple requests.
What is the basic difference between memcached and mysqlnd_qc ?
They have almost nothing in common.
Which is most suitable for me as per below condition ?
mysql query cache
I also intend to extend the same for autcomplete box, which will be suitable in such case ?
Sphinx Search
I have a web app which is pretty CPU intensive ( it's basically a collection of dictionaries, but they are not just simple dictionaries, they do a lot of stuff, anyway this is not important ). So in a CPU intensive web app you have the scaling problem, too many simultaneous users and you get pretty slow responses.
The flow of my app is this:
js -> ajax call -> php -> vb6 dll -> vb6 code queries the dictionaries and does CPU intensive stuff -> reply to php -> reply to js -> html div gets updated with the new content. Obviously in a windows env with IIS 7.5. PHP acts just as a way of accessing the .dll and does nothing else.
The content replied/displayed is html formatted text.
The app has many php files which call different functions in the .dll.
So in order to avoid the calling of the vb6 dll for each request, which is the CPU intensive part, I'm thinking of doing this:
example ajax request:
php file: displayconjugationofword.php
parameter: word=lol&tense=2&voice=active
So when a user makes the above request to displayconjugationofword.php, i call the vb6 dll, then just before giving back the reply to the client, I can add in a MYSQL table the request data like this:
filename, request, content
displayconjugationofword.php, word=blahblah&tense=2&voice=active, blahblahblah
so next time that a user makes the EXACT same ajax request, the displayconjugationofword.php code, instead of calling the vb6 dll, checks first the mysql table to see if the request exists there and if it does, it fetches it from there.
So this mysql table will gradually grow in size, reaching up to 3-4 million rows and as it grows the chance of something requested being in the db, grows up too, which theoretically should be faster than doing the cpu intensive calls ( each anywhere from 50 to 750ms long ).
Do you think this is a good method of achieving what I want? or when the mysql table reaches 3-4 million entries, it will be slow too ?
thank you in advance for your input.
edit
i know about iis output caching but i think it's not useful in my case because:
1) AFAIK it only caches the .php file when it becomes "hot" ( many queries ).
2) i do have some .php files which call the vb6 but the reply is random each time.
I love these situations/puzzles! Here are the questions that I'd ask first, to determine what options are viable:
Do you have any idea/sense of how many of these queries are going to be repeated in a given hour, day, week?
Because... the 'more common caching technique' (i.e the technique I've seen and/or read about the most) is to use something like APC or, for scalability, something like Memcache. What I've seen, though, is that these are usually used for < 12 hour-long caches. That's just what I've seen. Benefit: auto-cleanup of unused items.
Can you give an estimate of how long a single 'task' might take?
Because... this will let you know if/when the cache becomes unproductive - that is, when the caching mechanism is slower than the task.
Here's what I'd propose as a solution - do it all from PHP (no surprise). In your work-flow, this would be both PHP points: js -> ajax call -> php -> vb6 dll -> vb6 code queries the dictionaries and does CPU intensive stuff -> reply to php -> reply to js -> html div...
Something like this:
Create a table with columns: __id, key, output, count, modified
1.1. The column '__id' is the auto-increment column (ex. INT(11) AUTO_INCREMENT) and thus is also the PRIMARY INDEX
1.2 The column 'modified' is created like this in MySQL: modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
1.3 'key' = CHAR(32) which is the string length of MD5 hashes. 'key' also has a UNIQUE INDEX (very important!! for 3.3 below)
1.4 'output' = TEXT since the VB6 code will be more than a little
1.5 'count' = INT(8) or so
Hash the query-string ("word=blahblah&tense=2&voice=active"). I'm thinking something like:$key = md5(var_export($_GET, TRUE)); Basically, hash whatever will give a unique output. Deducing from the example given, perhaps it might be best to lowercase the 'word' if case doesn't matter.
Run a conditional on the results of a SELECT for the key. In pseudo-code:
3.1. $result = SELECT output, count FROM my_cache_table_name WHERE `key` = "$key"
3.2. if (empty($result)) {
$output = result of running VB6 task
$count = 1
} else {
$output = $result['output']
$count = $result['count'] + 1
}
3.3. run query 'INSERT INTO my_cache_table_name (`key`, output, count) VALUES ($key, $output, $count) ON DUPLICATE KEY UPDATE count = $count' (KEY is a reserved word in MySQL, hence the backticks)
3.4. return $output as "reply to js"
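The hashing step above can be sketched like this. requestCacheKey() is a hypothetical helper, and lowercasing 'word' assumes that case really doesn't matter for the lookup. Sorting the parameters and including the script name are extra normalisations I'm assuming are wanted, so that the same logical request always maps to the same 32-character key.

```php
<?php
// Sketch only: requestCacheKey() is a hypothetical helper, and lowercasing
// "word" assumes case does not matter for the lookup.
function requestCacheKey(string $script, array $params): string
{
    if (isset($params['word'])) {
        // strtolower() only folds ASCII; mb_strtolower() would be the choice
        // for non-ASCII words if the mbstring extension is available.
        $params['word'] = strtolower($params['word']);
    }
    // Sort by parameter name so ?a=1&b=2 and ?b=2&a=1 share one cache row.
    ksort($params);
    // Include the script name, since different files call different DLL functions.
    return md5($script . '?' . http_build_query($params));
}

$key = requestCacheKey('displayconjugationofword.php', $_GET);
```

The result fits the CHAR(32) 'key' column described above.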
Long-term, you will not only have a cache but you will also know what queries are being run the least and can prune them if needed. Personally, I don't think such a query will ever be all that time-consuming. And there are certainly things you might do to optimize the cache/querying (that's beyond me).
So what I'm not stating directly is this: the above will work (and is pretty much what you suggested). By adding a 'count' column, you will be able to see what queries are done a lot and/or a little and can come back and prune if/as needed.
If you want to see how long queries are taking, you might create another table that holds 'key', 'duration', and 'modified' (like above). Before 3.1 and 3.3, get the microtime(). If this is a cache-hit, subtract the microtime()s and store in this new table where 'key' = $key and 'duration' = 2nd microtime() - 1st microtime(). Then you can come back later, sort by 'modified DESC', and see how long queries are taking. If you have a TON of data and still the latest 'duration' is not bad, you can pull this whole duration-recording mechanism. Or, if bored, only store the duration when $key ends in a letter (just to cut down on its load on the server).
I'm not an expert but this is an interesting logic problem. Hopefully what I've set out below will help or at least stimulate comments that may or may not make it useful.
To an extent, the answer is going to depend on how many queries you are likely to have, how many at once, and whether the MySQL indexing will be faster than your definitive solution.
A few thoughts then:
It would be possible to pass caching requests on to another server easily which would allow essentially infinite scaling.
Humans being as they are, most word requests are likely to involve only a few thousand words so you will, probably, find that most of the work being done is repeat work fairly soon. It makes sense then to create an indexable database.
Hashing has in the past been suggested as a good way to speed indexing of data. Whether this is useful or not will to an extent depend on the length of your answer.
If you are very clever, you could determine the top 10000 or so likely questions and responses and store them in a separate table for even faster responses. (Guru to comment?)
Does your dll already do caching of requests? If so then any further work will probably slow your service down.
This solution is amenable to simple testing using JS or php to generate multiple requests to test response speeds using or not using caching. Whichever you decide, I reckon you should test it with a large amount of sample data.
In order to get maximum performance for your example you need to follow basic cache optimization principle.
I'm not sure if your application logic allows it, but if it does, it will give you a huge benefit: you need to distinguish requests which can be cached (static) from those which return dynamic (random) responses. Use some file naming rule or provide some custom HTTP header or request parameter - i.e. any part of the request which can be used to judge whether to cache it or not.
Speeding up static requests. The idea is to process incoming requests and send back a reply as early as possible (ideally even before a web server comes into play). I suggest you use output caching, since it will do internally what you intend to do in PHP and MySQL, in a much more performant way. Some options are:
Use IIS output caching feature (quick search shows it can cache queries basing on requested file name and query string).
Place a caching layer in front of a web server. Varnish (https://www.varnish-cache.org/) is a flexible and powerful opensource tool, and you can configure caching strategy optimally depending on the size of your data (use memory vs. disk, how much mem can be used etc).
Speeding up dynamic requests. If they are completely random internally (no dll calls which can be cached) then there's not much to do. If there are some dll calls which can be cached, do it like you described: fetch data from cache, if it's there, you're good, if not, fetch it from dll and save to cache.
But use something more suitable for the task of caching - key/value storage like Redis or memcached are good. They are blazingly fast. Redis may be a better option since the data can be persisted to disk (while memcached drops entire cache on restart so it needs to be refilled).
I am new to memcached and just started using that. I have few questions:
I have implemented Memcached in my PHP database class, where I am storing result sets (arrays) in memcache. Since this is for a website, say 4 users access the same page and trigger the same query: what would memcache do? As I understand it, for the first user it will fetch from the DB, and for the other 3 the system will use memcache. Is that right?
4 users means 4 memcache objects will be created, but will they all use the same memory? Does the same apply to 2 different pages on the website, as both pages will be using
$obj = memcached->connect(parameter);
I have run a small test, but the results are strange: when I execute the query with normal MySQL statements, the execution time is lower than when my code uses memcached. Why is that? If that's the case, why is it written everywhere that memcache is fast?
Please give an example of how to effectively test memcached execution time compared to a normal mysql_fetch_object.
Memcache does not work "automatically". It is only a key => value map. You need to determine how it is used and implement it.
The preferred method is:
A. Attempt to get from memcache
B. If A. failed, get from db, add to memcache
C. Return result
D. If you ever update that data, expire all associated keys
This will not prevent the same query executing on the db multiple times. If 2 users both get the same data at the same time, and everything is executed nearly at the same time as well, both attempts to fetch from memcache will fail and add to memcache. And that is usually ok.
In code, it will create as many connections as current users since it is run from php which gets executed for each user. You might also connect multiple times (if you're not careful with your code) so it could be way more times.
Many times, the biggest lag for both memcache AND sql is actually network latency. If sql is on the same machine and memcache on a different machine, you will likely see slower times for memcache.
Also, many frameworks/people do not correctly implement multi-get. So, if you have 100 ids and you get by id from memcache, it will do 100 single gets rather than 1 multi-get. That is a huge slow down.
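To illustrate the multi-get point, here is a sketch of fetching many ids in one round trip. It assumes $cache behaves like a Memcached instance (getMulti/setMulti); fetchRowsByIds(), the "row:" key prefix and the $loadFromDb callback (which takes a list of ids and returns id => row) are hypothetical names.

```php
<?php
// Sketch only: $cache is expected to behave like a Memcached instance
// (getMulti/setMulti). $loadFromDb is a hypothetical callback that takes a
// list of ids and returns an array of id => row for them.
function fetchRowsByIds($cache, array $ids, callable $loadFromDb, int $ttl = 60): array
{
    if ($ids === []) {
        return [];
    }
    // Build all cache keys up front so one round trip can fetch them.
    $keys = [];
    foreach ($ids as $id) {
        $keys["row:$id"] = $id;
    }
    // getMulti() returns only the keys it found (or false on failure).
    $found = $cache->getMulti(array_keys($keys)) ?: [];

    $rows = [];
    $missingIds = [];
    foreach ($keys as $key => $id) {
        if (array_key_exists($key, $found)) {
            $rows[$id] = $found[$key];
        } else {
            $missingIds[] = $id;
        }
    }

    // Load all misses with a single database call, then cache them together.
    if ($missingIds !== []) {
        $toStore = [];
        foreach ($loadFromDb($missingIds) as $id => $row) {
            $rows[$id] = $row;
            $toStore["row:$id"] = $row;
        }
        if ($toStore !== []) {
            $cache->setMulti($toStore, $ttl);
        }
    }
    return $rows;
}
```

For 100 ids this makes one cache request and at most one database query, instead of 100 single gets.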
Memcache is fast. SQL with query caching for simple selects is also fast. Typically, you use memcache when:
the queries you are running are complicated/slow
OR
it is more cost effective to use memcache than to have everyone hit the SQL server
OR
you have so many users that the database is not sufficient to keep up with the load
OR
you want to try out a technology because you think it's cool or nice to have on your resume.
You can use any variety of profiling software such as xdebug or phprof.
Alternatively, you can do this although less reliable due to other things happening on your server:
$start = microtime(true);
// do foo
echo microtime(true) - $start;
$start = microtime(true);
// do bar
echo microtime(true) - $start;
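A single measurement like the one above is noisy, so one option is to repeat each variant many times and compare the averages. The following is only a sketch: averageSeconds() is a made-up helper, and the two empty closures are placeholders for your real "fetch from MySQL" and "fetch from memcached" code.

```php
<?php
// Sketch only: averageSeconds() is a hypothetical helper; the two closures
// stand in for the real "fetch from MySQL" and "fetch from memcached" code.
function averageSeconds(callable $task, int $iterations = 1000): float
{
    $start = hrtime(true);            // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $task();
    }
    return (hrtime(true) - $start) / 1e9 / $iterations;
}

$viaDatabase = function () { /* run the query and fetch the rows here */ };
$viaCache    = function () { /* call $memcached->get($key) here */ };

printf("db:    %.6f s per call\n", averageSeconds($viaDatabase));
printf("cache: %.6f s per call\n", averageSeconds($viaCache));
```

Run both against the same data and from the same machine as the web server, otherwise you are mostly comparing network latency.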
You have two reasons to use memcache:
1 . Offload your database server
That is, if you have a high load on your database server because you keep querying the same thing over and over again and the internal MySQL cache is not working as fast as expected. Or you might have write-performance issues that are clogging your server; in that case memcache will help you offload MySQL in a consistent and better way.
In the event that your MySQL server is NOT stressed, there may be no advantage to using memcached if it is mostly for performance gain. Memcached is still a server; you still have to connect to it and talk to it, so the network cost remains.
2 . Share data between users without relying on the database
In another scenario, you might want to share some data or state between users of your web application without relying on files or on a SQL server. Using memcached, you can set a value from one user's session and load it from another's.
A good example of that would be chat logs between users: you don't want to store everything in a database because it causes a lot of writes and reads, you want to share the data, and you don't mind losing everything if an error occurs and the server restarts.
I hope my answer is satisfactory.
Good luck
Yes, that is right. Basically this is called caching and is not specific to Memcached itself.
I do not fully understand the question. If all 4 users connect to the same memcached daemon, they will use the same shared memory, yes.
You have not given any code, so it is hard to tell. There can be many reasons, so I would not jump to conclusions with so little information given.
You would need to measure your network traffic with deep packet inspection to test and compare both execution times rigorously. I cannot give an example of that in this answer. You might be okay with just using microtime and logging whether the cache was hit (result was already in cache) or missed (not yet in cache, so it has to be fetched from the database).
I am trying to understand (and probably deploy) memcached in our env.
We have 4 web servers on loadbalancer running a big web app developed in PHP. We are already using APC.
I want to see how memcached works. Or maybe I don't understand how caching works in the first place.
We have some complex dynamic queries that combine several tables to pull data. Each time, the data is going to be from different client databases and data keeps changing. From my understanding, if some data is stored in cache, and if the request is same next time, the same data is returned. (Or I may be completely wrong here).
How does this whole memcache thing (or, for that matter, any caching) work?
Cache, in general, is a very fast key/value storage engine where you can store values (usually serialized) by a predetermined key, so you can retrieve the stored values by the same key.
In relation to MySQL, you would write your application code in such a way, that you would check for the presence of data in cache, before issuing a request to the database. If a match was found (matching key exists), you would then have access to the data associated to the key. The goal is to not issue a request to the more costly database if it can be avoided.
An example (demonstrative only):
$cache = new Memcached();
$cache->addServer('servername', 11211);
$myCacheKey = 'my_cache_key';
$row = $cache->get($myCacheKey);
if ($row === false) {
// Issue painful query to mysql
$sql = "SELECT * FROM table WHERE id = :id";
$stmt = $dbo->prepare($sql);
$stmt->bindValue(':id', $someId, PDO::PARAM_INT);
$stmt->execute();
$row = $stmt->fetch(PDO::FETCH_OBJ);
// Memcached serializes PHP values itself, so the row can be stored as-is
$cache->set($myCacheKey, $row);
}
// Now I have access to $row, where I can do what I need to
// And for subsequent calls, the data will be pulled from cache and skip
// the query altogether
var_dump($row);
Check out PHP docs on memcached for more info, there are some good examples and comments.
There are several examples on how memcache works. Here is one of the links.
Secondly, Memcache can work with or without MySQL.
It caches your PHP objects. Whether they come from MySQL or anywhere else, if it's a PHP object, it can be stored in Memcache.
APC gives you some more functionality than Memcache. Besides storing/caching PHP objects, it also caches compiled PHP opcodes, so that your PHP files don't go through the process of being loaded into memory and compiled on each request; instead, the already compiled opcode is run directly from memory.
If your data keeps changing (between requests) then caching is futile, because the cached data is going to be stale. But most of the time (I bet even in your case) multiple requests to the database result in the same data set, in which case an in-memory cache is very useful.
P.S: I did a quick google search and found this video about memcached which has rather good quality => http://www.bestechvideos.com/2009/03/21/railslab-scaling-rails-episode-8-memcached. The only problem could be that it talks about Ruby On Rails(which I also don't use that much, but is very easy to understand). Hopefully it is going to help you grasp the concept a little better.