I have a JS script that does one simple thing - an ajax request to my server. On this server I establish a PDO connection, execute one prepared statement:
SELECT * FROM table WHERE param1 = :param1 AND param2 = :param2;
Where table is the table with 5-50 rows, 5-15 columns, with data changing once each day on average.
Then I echo the json result back to the script and do something with it, let's say I console log it.
The problem is that the script runs ~10,000 times a second, which gives me that many connections to the database, and I keep getting "can't connect to the database" errors in the server logs. So sometimes it works, when DB processes are free, and sometimes it doesn't.
How can I handle this?
Probable solutions:
Memcached - it would also be slow, it's not created to do that. The performance would be similar, or worse, to the database.
File on server instead of the database - great solution, but the structure would be the problem.
Anything better?
For such a tiny amount of data that is changed so rarely, I'd make it just a regular PHP file.
Once you have your data in the form of array, dump it in the php file using var_export(). Then just include this file and use a simple loop to search data.
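A minimal sketch of the var_export() approach, assuming the rows have already been fetched into an array. The column names, the sample rows, the cache file path and the helper names writeCacheFile() and findRows() are all illustrative, not part of the original setup.

```php
<?php
// Hypothetical rows standing in for the result of the SELECT; column names are assumptions.
$rows = array(
    array('param1' => 'a', 'param2' => 'x', 'value' => 1),
    array('param1' => 'b', 'param2' => 'y', 'value' => 2),
);

// Regenerate the cache file whenever the table changes (about once a day).
function writeCacheFile(string $path, array $rows): void
{
    $code = '<?php return ' . var_export($rows, true) . ';';
    // Write to a temporary file and rename it so readers never see a half-written file.
    $tmp = $path . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $code);
    rename($tmp, $path);
}

// Per request: include the file and scan the few dozen rows with a simple loop.
function findRows(string $path, string $param1, string $param2): array
{
    $rows = include $path;
    $matches = array();
    foreach ($rows as $row) {
        if ($row['param1'] === $param1 && $row['param2'] === $param2) {
            $matches[] = $row;
        }
    }
    return $matches;
}

$cacheFile = sys_get_temp_dir() . '/table_cache.php';
writeCacheFile($cacheFile, $rows);
echo json_encode(findRows($cacheFile, 'b', 'y'));
```

With an opcode cache enabled, the included file is served from memory, so the request never touches the database at all.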
Another option is to use Memcached, which was created for exactly this sort of job. On a fast machine with high-speed networking, Memcached can easily handle 200,000+ requests per second, far above your modest 10k rps.
You can even eliminate PHP from the path by making Nginx ask Memcached directly for the stored values, using ngx_http_memcached_module.
If you want to stick with the current MySQL-based solution, you can increase max_connections in the MySQL configuration; however, raising it above 200 may require some OS tweaking as well. What you should not do is use persistent connections; that will make things far worse.
You need to leverage a cache. There is no reason at all to go fetch the data from the database every time the AJAX request is made for data that is this slow-changing.
A couple of approaches that you could take (possibly even in combination with each other).
Cache between application and DB. This might be memcache or similar and would allow you to perform hash-based lookups (likely based on some hash of parameters passed) to data stored in memory (perhaps JSON representation or whatever data format you ultimately return to the client).
Cache between client and application. This might take the form of web-server-provided cache, a CDN-based cache, or similar that would prevent the request from ever even reaching your application given an appropriately stored, non-expired item in the cache.
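For the second approach, the simplest starting point is to mark the JSON response as cacheable so that browsers, proxies and CDNs can reuse it. The sketch below is one possible way to do that; the helper name cacheHeaders() and the 300-second lifetime are assumptions to be tuned to how stale the data may get.

```php
<?php
// Build response headers that allow shared caches to reuse the response for $maxAge seconds.
function cacheHeaders(int $maxAge, int $now): array
{
    return array(
        'Cache-Control: public, max-age=' . $maxAge,
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $maxAge) . ' GMT',
    );
}

foreach (cacheHeaders(300, time()) as $line) {
    // header() only works before any output has been sent.
    if (!headers_sent()) {
        header($line);
    }
}
```

Since the data changes about once a day, even a lifetime of a few minutes removes most of the requests from the application and the database.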
Anything better? No
Since you output the same results many times, the sensible solution is to cache results.
My educated guess is your wrong assumption -- that memcached is not built for this -- is based off you planning on storing each record separately
I implemented a simple caching mechanism for you to use :
<?php
$memcached_port = YOUR_MEMCACHED_PORT; // placeholder: replace with your port (11211 by default)
$m = new Memcached();
$m->addServer('localhost', $memcached_port);
$key1 = $_GET['key1'];
$key2 = $_GET['key2'];
// Generate the key from the unique values; the separator keeps "a"+"bc" distinct from "ab"+"c".
// For large keys, use md5() to hash the unique value.
$m_key = $key1 . '|' . $key2;
$data = $m->get($m_key);
if ($data === false) { // cache miss (an empty array is a valid cached result)
    // fetch $data from your database
    $expire = 3600; // 1 hour; you may use a Unix timestamp instead to expire at a specific time of day
    $m->set($m_key, $data, $expire); // push to memcached
}
echo json_encode($data);
What you do is :
Decide on what signifies a result ( what set of input parameters )
Use that for the memcache key ( for example if it's a country and language the key would be $country.$language )
check if the result exists:
if it does pull the data you stored as an array and output it.
if it doesn't exist or is outdated :
a. pull the data needed
b. put the data in an array
c. push the data to memcached
d. output the data
There are more efficient ways to cache data, but this is the simplest one, and sounds like your kind of code.
10,000 requests/second still don't justify the effort needed to create server-level caching ( nginx/whatever )
In an ideally tuned world a chicken with a calculator would be able to run facebook .. but who cares ? (:
I have the following web site:
The user inputs some data and based on it the server generates a lot of results, that need to be displayed back to the user. I am calculating the data with php, storing it in a MySQL DB and display it in Datatables with server side processing. The data needs to be saved for a limited time - on every whole hour the whole table with it is DROPPED and re-created.
The maximum observed load is 7000 sessions/users per day, with a maximum of 400 users at a single time. Every hour we have over 50 million records inserted into the main table. We are using a dedicated server with an Intel i7, 24GB RAM and an HDD.
The problem is that when more people (>100 at a time) use the site, the MySQL cannot handle the load and MySQL + hard disk become the bottleneck. The user has to wait minutes even for a few thousand results. The disk is HDD and for now there is not an option to put SSD.
The QUESTION(S):
Can replacing MySQL with Redis improve the performance and how much?
How do I store the produced data in Redis so that I can retrieve it for one user, sort it by any of the values, and filter it?
I have the following data in php
$user_data = array(
    array("id"=>1, "session"=>"3124", "set"=>"set1", "int1"=>1, "int2"=>11, "int3"=>111, "int4"=>1111),
    array("id"=>2, "session"=>"1287", "set"=>"set2", "int1"=>2, "int2"=>22, "int3"=>222, "int4"=>2222), // ...
);
$user_data can be an array with a length from 1 to 1-2 million (I am calculating it and inserting it into the DB in chunks of 10000).
I need to store data in Redis for at least 400 such users and be able to retrieve the data for a particular user in chunks of 10/20 for pagination. I also need to be able to sort by any of the fields set (string), int1, int2... (I have around 22 int fields) and also filter by any of the integer fields (similar to the SQL WHERE clause 9000 < int4 < 100000).
Also, can Redis do something similar to SQL's WHERE set LIKE '%value%'?
Redis is probably a good fit for your problem, if you can hold all your data in memory. But you must rethink your data structure. Redis is very different from a relational database, and there is no direct migration.
As for your questions:
It can probably help with performance. How much will depend on your use case and data structure. Your constraint will no longer be the hard disk, but maybe something else.
Redis has no concept similar to SQL's ORDER BY or WHERE. You will be responsible for maintaining your own indexes and filters.
I would create a hash (HSET) for every "record" and then use several sorted sets (ZSET) to index those records. (If you really need to order by any field, you'll need one ZSET per field.)
As for filters, the ZSETs used as indexes will probably also be useful for filtering ranges of int values.
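To make the hash-plus-sorted-set layout concrete, here is a sketch that only builds the Redis commands as arrays, so it runs without a Redis client. The key names ("rec:..." and "idx:...") and the helper names indexCommands() and rangeQuery() are assumptions for illustration. It covers the integer fields only, since sorted-set scores must be numeric, and HSET with several field/value pairs needs Redis 4.0 or later (HMSET on older servers).

```php
<?php
// Build the commands that store one record as a hash and index it in sorted sets.
function indexCommands(array $record, array $indexedFields): array
{
    $session = $record['session'];
    $id = $record['id'];

    $hset = array('HSET', "rec:$session:$id");
    foreach ($record as $field => $value) {
        $hset[] = $field;
        $hset[] = (string) $value;
    }
    $commands = array($hset);

    // One sorted set per sortable field: score = field value, member = record id.
    foreach ($indexedFields as $field) {
        $commands[] = array('ZADD', "idx:$session:$field", (string) $record[$field], (string) $id);
    }
    return $commands;
}

// Range filter plus pagination, roughly WHERE $min < field < $max LIMIT $offset, $count.
// The "(" prefix makes a bound exclusive.
function rangeQuery(string $session, string $field, int $min, int $max, int $offset, int $count): array
{
    return array('ZRANGEBYSCORE', "idx:$session:$field", "($min", "($max", 'LIMIT', (string) $offset, (string) $count);
}

$record = array('id' => 1, 'session' => '3124', 'set' => 'set1', 'int1' => 1, 'int4' => 1111);
foreach (indexCommands($record, array('int1', 'int4')) as $command) {
    echo implode(' ', $command), "\n";
}
echo implode(' ', rangeQuery('3124', 'int4', 9000, 100000, 0, 10)), "\n";
```

The range query returns record ids in ascending order of the field; the matching hashes are then fetched one page at a time.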
Unfortunately, I really don't have an answer for the LIKE query. When I need advanced search capabilities, I usually use ElasticSearch (in combination with Redis and/or MySQL).
1. Can replacing MySQL with Redis improve the performance and how much?
Yes, Redis can improve your basic read/write performance due to the fact that it stores the information directly in memory. This post describes a performance increase by a factor of 3, but the post is dated in 2009 so the numbers may have changed since.
However, this performance gain is only relevant as long as you have enough memory. Once you exceed the allotted amount of memory, your server will start swapping to disk, drastically reducing Redis performance.
Another thing to keep in mind is that information stored in Redis is not guaranteed to be persistent by default: with the stock configuration the data set is only snapshotted to disk periodically (for example, after 60 seconds if at least 10,000 keys have changed). Changes made since the last snapshot are lost on a server restart or power loss.
2. How to store the produced data in redis, so i can retrieve it for 1 user and sort it by any of the values and filter it?
Redis is a data store with a different approach from traditional relational databases. It does not offer complex sorting, but basic sorting can be done through sorted sets and the SORT command. Anything beyond that will have to be done by the PHP server.
Redis does not have any searching support; it will have to be implemented by your PHP server.
3. Conclusion
In my opinion, the best way to handle what you are asking is to use a Redis server for caching and the MySQL server for storing information that you need to be persistent (if you don't have any information that has to be persistent, you can just have the Redis server).
You said that
The data needs to be saved for a limited time - on every whole hour
the whole table with it is DROPPED and re-created.
which is perfect for Redis. Redis supports a TTL through the EXPIRE command on keys, which automatically deletes a key after a set amount of time. This way you don't need to drop and re-create any tables--Redis does it for you.
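As a small sketch of the "expire on every whole hour" requirement: the TTL passed to EXPIRE can be computed from the current Unix timestamp. The helper name secondsUntilNextHour() is hypothetical, and the calculation assumes the server's timezone offset is a whole number of hours.

```php
<?php
// Seconds remaining until the next whole hour, usable as the TTL argument of EXPIRE.
// Exactly on the hour it returns 3600, i.e. the key lives until the following hour.
function secondsUntilNextHour(int $now): int
{
    return 3600 - ($now % 3600);
}

// The command a Redis client would send for a given key (built as an array here).
function expireCommand(string $key, int $now): array
{
    return array('EXPIRE', $key, (string) secondsUntilNextHour($now));
}

echo implode(' ', expireCommand('rec:3124:1', time())), "\n";
```

Setting the TTL when each key is written means the keys disappear on their own at the top of the hour, with no table to drop.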
I have a web app which is pretty CPU intensive ( it's basically a collection of dictionaries, but they are not just simple dictionaries, they do a lot of stuff, anyway this is not important ). So in a CPU intensive web app you have the scaling problem, too many simultaneous users and you get pretty slow responses.
The flow of my app is this:
js -> ajax call -> php -> vb6 dll -> vb6 code queries the dictionaries and does CPU intensive stuff -> reply to php -> reply to js -> html div gets updated with the new content. Obviously in a windows env with IIS 7.5. PHP acts just as a way of accessing the .dll and does nothing else.
The content replied/displayed is html formatted text.
The app has many php files which call different functions in the .dll.
So in order to avoid the calling of the vb6 dll for each request, which is the CPU intensive part, I'm thinking of doing this:
example ajax request:
php file: displayconjugationofword.php
parameter: word=lol&tense=2&voice=active
So when a user makes the above request to displayconjugationofword.php, i call the vb6 dll, then just before giving back the reply to the client, I can add in a MYSQL table the request data like this:
filename, request, content
displayconjugationofword.php, word=blahblah&tense=2&voice=active, blahblahblah
so next time that a user makes the EXACT same ajax request, the displayconjugationofword.php code, instead of calling the vb6 dll, checks first the mysql table to see if the request exists there and if it does, it fetches it from there.
So this mysql table will gradually grow in size, reaching up to 3-4 million rows and as it grows the chance of something requested being in the db, grows up too, which theoretically should be faster than doing the cpu intensive calls ( each anywhere from 50 to 750ms long ).
Do you think this is a good method of achieving what I want? or when the mysql table reaches 3-4 million entries, it will be slow too ?
thank you in advance for your input.
edit
i know about iis output caching but i think it's not useful in my case because:
1) AFAIK it only caches the .php file when it becomes "hot" ( many queries ).
2) i do have some .php files which call the vb6 but the reply is random each time.
I love these situations/puzzles! Here are the questions that I'd ask first, to determine what options are viable:
Do you have any idea/sense of how many of these queries are going to be repeated in a given hour, day, week?
Because... the 'more common caching technique' (i.e the technique I've seen and/or read about the most) is to use something like APC or, for scalability, something like Memcache. What I've seen, though, is that these are usually used for < 12 hour-long caches. That's just what I've seen. Benefit: auto-cleanup of unused items.
Can you give an estimate of how long a single 'task' might take?
Because... this will let you know if/when the cache becomes unproductive - that is, when the caching mechanism is slower than the task.
Here's what I'd propose as a solution - do it all from PHP (no surprise). In your work-flow, this would be both PHP points: js -> ajax call -> php -> vb6 dll -> vb6 code queries the dictionaries and does CPU intensive stuff -> reply to php -> reply to js -> html div...
Something like this:
Create a table with columns: __id, key, output, count, modified
1.1. The column '__id' is the auto-increment column (ex. INT(11) AUTO_INCREMENT) and thus is also the PRIMARY INDEX
1.2 The column 'modified' is created like this in MySQL: modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
1.3 'key' = CHAR(32) which is the string length of MD5 hashes. 'key' also has a UNIQUE INDEX (very important!! for 3.3 below)
1.4 'output' = TEXT since the VB6 code will be more than a little
1.5 'count' = INT(8) or so
Hash the query-string ("word=blahblah&tense=2&voice=active"). I'm thinking something like: $key = md5(var_export($_GET, TRUE)); Basically, hash whatever will give a unique output. Deducing from the example given, perhaps it might be best to lowercase the 'word' if case doesn't matter.
Run a conditional on the results of a SELECT for the key. In pseudo-code:
3.1. $result = SELECT output, count FROM my_cache_table_name WHERE `key` = "$key"
3.2. if (empty($result)) {
$output = result of running VB6 task
$count = 1
} else {
$output = $result['output']
$count = $result['count'] + 1
}
3.3. run query 'INSERT INTO my_cache_table_name (`key`, output, count) VALUES ($key, $output, $count) ON DUPLICATE KEY UPDATE count = $count' (use a prepared statement so that $key and $output are escaped; `key` needs backticks because it is a reserved word in MySQL)
3.4. return $output as "reply to js"
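The flow in steps 2 and 3 can be sketched in PHP as follows. To keep the example self-contained, an array stands in for the MySQL cache table; the class name RequestCache and its methods are hypothetical. It also sorts the parameters and uses http_build_query() before hashing, which is a variation on the var_export() suggestion so that parameter order does not change the key.

```php
<?php
// Array-backed stand-in for the cache table (columns: key, output, count).
// In a real deployment each method would be a prepared statement against MySQL.
class RequestCache
{
    private $rows = array();

    // Hash the script name plus the sorted parameters so equivalent requests share a key.
    public static function keyFor(string $script, array $params): string
    {
        ksort($params);
        return md5($script . '?' . http_build_query($params));
    }

    // Return the cached output, or compute it once via $compute (the expensive DLL call).
    public function fetch(string $key, callable $compute): string
    {
        if (isset($this->rows[$key])) {
            $this->rows[$key]['count']++;
            return $this->rows[$key]['output'];
        }
        $output = $compute();
        $this->rows[$key] = array('output' => $output, 'count' => 1);
        return $output;
    }

    // How many times this key has been requested so far.
    public function hits(string $key): int
    {
        return isset($this->rows[$key]) ? $this->rows[$key]['count'] : 0;
    }
}

$cache = new RequestCache();
$key = RequestCache::keyFor('displayconjugationofword.php', array('word' => 'lol', 'tense' => '2', 'voice' => 'active'));
$html = $cache->fetch($key, function () {
    return '<b>placeholder for the DLL output</b>';
});
echo $html, "\n";
```

The count is what later makes it possible to see which entries are rarely requested and prune them.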
Long-term, you will not only have a cache but you will also know what queries are being run the least and can prune them if needed. Personally, I don't think such a query will ever be all that time-consuming. And there are certainly things you might do to optimize the cache/querying (that's beyond me).
So what I'm not stating directly is this: the above will work (and is pretty much what you suggested). By adding a 'count' column, you will be able to see what queries are done a lot and/or a little and can come back and prune if/as needed.
If you want to see how long queries are taking, you might create another table that holds 'key', 'duration', and 'modified' (like above). Before 3.1 and after 3.3, call microtime(true). If this is a cache hit, store the difference in this new table, where 'key' = $key and 'duration' = 2nd microtime() - 1st microtime(). Then you can come back later, sort by 'modified DESC', and see how long queries are taking. If you have a TON of data and the latest 'duration' is still not bad, you can pull this whole duration-recording mechanism. Or, if bored, only store the duration when $key ends in a letter (just to cut down on its load on the server).
I'm not an expert but this is an interesting logic problem. Hopefully what I've set out below will help or at least stimulate comments that may or may not make it useful.
To an extent, the answer is going to depend on how many queries you are likely to have, how many at once, and whether the MySQL indexing will be faster than your definitive solution.
A few thoughts then:
It would be possible to pass caching requests on to another server easily which would allow essentially infinite scaling.
Humans being as they are, most word requests are likely to involve only a few thousand words so you will, probably, find that most of the work being done is repeat work fairly soon. It makes sense then to create an indexable database.
Hashing has in the past been suggested as a good way to speed indexing of data. Whether this is useful or not will to an extent depend on the length of your answer.
If you are very clever, you could determine the top 10000 or so likely questions and responses and store them in a separate table for even faster responses. (Guru to comment?)
Does your dll already do caching of requests? If so then any further work will probably slow your service down.
This solution is amenable to simple testing using JS or php to generate multiple requests to test response speeds using or not using caching. Whichever you decide, I reckon you should test it with a large amount of sample data.
In order to get maximum performance for your example, you need to follow basic cache optimization principles.
I'm not sure if your application logic allows it, but if it does, it will give you a huge benefit: you need to distinguish requests which can be cached (static) from those which return dynamic (random) responses. Use some file naming rule, a custom HTTP header, or a request parameter, i.e. any part of the request which can be used to judge whether to cache it or not.
Speeding up static requests. The idea is to process incoming requests and send back a reply as early as possible (ideally even before a web server comes into play). I suggest you use output caching, since it does internally what you intend to do in PHP and MySQL, in a much more performant way. Some options are:
Use IIS output caching feature (quick search shows it can cache queries basing on requested file name and query string).
Place a caching layer in front of a web server. Varnish (https://www.varnish-cache.org/) is a flexible and powerful opensource tool, and you can configure caching strategy optimally depending on the size of your data (use memory vs. disk, how much mem can be used etc).
Speeding up dynamic requests. If they are completely random internally (no dll calls which can be cached) then there's not much to do. If there are some dll calls which can be cached, do it like you described: fetch data from cache, if it's there, you're good, if not, fetch it from dll and save to cache.
But use something more suitable for the task of caching - key/value storage like Redis or memcached are good. They are blazingly fast. Redis may be a better option since the data can be persisted to disk (while memcached drops entire cache on restart so it needs to be refilled).
I am new to memcached and just started using that. I have few questions:
I have implemented Memcached in my PHP database class, where I am storing result sets (arrays) in memcache. My question is: as this is for a website, if 4 users access the same page and trigger the same query, what would memcache do? As I understand it, for the first user it will fetch from the DB, and for the other 3 the system will use memcache. Is that right?
Do 4 users mean 4 memcache objects will be created, all using the same memory? Does the same apply to 2 different pages on the website, as both pages will be using
$obj = memcached->connect(parameter);
I have run a small test, but the results are strange: when I execute the query with normal MySQL statements, the execution time is lower than when my code uses memcached. Why is that? If that's the case, why is it written everywhere that memcache is fast?
Please give some example of how to effectively test memcached execution time compared to a normal mysql_fetch_object.
Memcache does not work "automatically". It is only a key => value map. You need to determine how it is used and implement it.
The preferred method is:
A. Attempt to get from memcache
B. If A. failed, get from db, add to memcache
C. Return result
D. If you ever update that data, expire all associated keys
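Steps A to D can be sketched as below. To stay runnable without the extension, ArrayCache is a hypothetical in-process stand-in that models only get/set/delete the way a Memcached client exposes them; cachedFetch() and invalidate() are likewise illustrative names.

```php
<?php
// In-process stand-in for a Memcached client; only get/set/delete are modelled.
class ArrayCache
{
    private $items = array();

    public function get(string $key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }

    public function delete(string $key): void
    {
        unset($this->items[$key]);
    }
}

// Steps A to C: try the cache, fall back to the loader, store the result.
function cachedFetch(ArrayCache $cache, string $key, callable $loadFromDb)
{
    $value = $cache->get($key);      // A. attempt to get from the cache
    if ($value === false) {
        $value = $loadFromDb();      // B. on a miss, get from the db and add to the cache
        $cache->set($key, $value);
    }
    return $value;                   // C. return the result
}

// Step D: after updating the data, drop every key derived from it.
function invalidate(ArrayCache $cache, array $keys): void
{
    foreach ($keys as $key) {
        $cache->delete($key);
    }
}
```

With the real client, the ArrayCache type hint would become Memcached and set() would also take an expiry time.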
This will not prevent the same query executing on the db multiple times. If 2 users both get the same data at the same time, and everything is executed nearly at the same time as well, both attempts to fetch from memcache will fail and add to memcache. And that is usually ok.
In code, it will create as many connections as current users since it is run from php which gets executed for each user. You might also connect multiple times (if you're not careful with your code) so it could be way more times.
Many times, the biggest lag for both memcache AND sql is actually network latency. If sql is on the same machine and memcache on a different machine, you will likely see slower times for memcache.
Also, many frameworks/people do not correctly implement multi-get. So, if you have 100 ids and you get by id from memcache, it will do 100 single gets rather than 1 multi-get. That is a huge slow down.
Memcache is fast. SQL with query caching for simple selects is also fast. Typically, you use memcache when:
the queries you are running are complicated/slow
OR
it is more cost effective to use memcache than have everyone hit the SQL server
OR
you have so many users that the database is not sufficient to keep up with the load
OR
you want to try out a technology because you think it's cool or nice to have on your resume.
You can use any variety of profiling software such as xdebug or phprof.
Alternatively, you can do this although less reliable due to other things happening on your server:
$start = microtime(true);
// do foo
echo microtime(true) - $start;
$start = microtime(true);
// do bar
echo microtime(true) - $start;
You have two reasons to use memcache:
1 . Offload your database server
That is, if you have a high load on your database server because you keep querying the same thing over and over again and the internal MySQL cache is not working as fast as expected. Or you might have write-performance issues that are clogging your server; then memcache will help you offload MySQL in a consistent and better way.
In the event that your MySQL server is NOT stressed, there may be no advantage to using memcached if it is mostly for performance gain. Memcached is still a server; you still have to connect to it and talk to it, so the network cost is still there.
2 . Share data between users without relying on a database
In another scenario, you might want to share some data or state between users of your web application without relying on files or on a SQL server. Using memcached, you can set a value from one user's request and load it from another user's.
A good example of that would be chat logs between users: you don't want to store everything in a database because it causes a lot of writes and reads, you want to share the data, and you don't care about losing everything if an error occurs and the server restarts...
I hope my answer is satisfactory.
Good luck
Yes, that is right. Basically, this is called caching and is unrelated to Memcached itself.
I do not fully understand the question. If all 4 users connect to the same memcached daemon, they will use shared memory, yes.
You have not given any code, so it is hard to tell. There can be many reasons, so I would not jump to conclusions with so little information given.
You need to measure your network traffic with deep packet inspection to effectively test and compare both execution times. I cannot give an example of that in this answer. You might be okay with just using microtime and logging whether the cache was hit (result was already in cache) or missed (not yet in cache, need to take it from the database).
Basically, one part of some metrics that I would like to track is the amount of impressions that certain objects receive on our marketing platform.
If you imagine that we display lots of objects, we would like to track each time an object is served up.
Every object is returned to the client through a single gateway/interface. So if you imagine that a request comes in for a page with some search criteria, and then the search request is proxied to our Solr index.
We then get 10 results back.
Each of these 10 results should be regarded as an impression.
I'm struggling to find an incredibly fast and accurate implementation.
Any suggestions on how you might do this? You can throw in any number of technologies. We currently use, Gearman, PHP, Ruby, Solr, Redis, Mysql, APC and Memcache.
Ultimately all impressions should eventually be persisted to MySQL, which I could do every hour. But I'm not sure how to store the impressions in memory quickly without affecting the load time of the actual search request.
Ideas (I just added options 4 and 5)
Once the results are returned to the client, the client then requests a base64-encoded URI on our platform which contains the IDs of all of the objects that they have been served. This object is then passed to Gearman, which then saves the count to Redis. Once an hour, Redis is flushed and the count is incremented for each object in MySQL.
After the results have been returned from Solr, loop over, and save directly to Redis. (Haven't benchmarked this for speed). Repeat the flushing to mysql every hour.
Once the items are returned from Solr, send all the ID's in a single job to gearman, which will then submit to Redis..
new idea Since the maximum number of items returned will be around 20, I could set an X-Application-Objects header with a base64-encoded list of the IDs returned. These IDs (in the header) could then be stripped out by nginx, and using a custom Lua nginx module, I could write the IDs directly to Redis from nginx. This might be overkill though. The benefit is that I can tell nginx to return the response object immediately while it's writing to Redis.
new idea Use fastcgi_finish_request() in order to flush the request back to nginx, but then insert the results into Redis.
Any other suggestions?
Edit to Answer question:
The reliability of this data is not essential, so long as it is a best guess. I wouldn't want to see a swing of, say, 30% dropped impressions, but I would allow a tolerance of +/- 10% accuracy.
I see your two best options as:
Use the increment command in Redis to increment counters as you pull the IDs. Use the ID as a key and increment it in Redis. Redis can easily handle hundreds of thousands of increments per second, so that should be fast enough to do without any noticeable client impact. You could even pipeline each request if the PHP language binding supports it. I think it does.
Use Redis as a plain cache. In this option you would simply use a Redis list and do an RPUSH of a string containing the IDs separated by e.g. a comma. You might use the hour of the day as the key. Then you can have a separate process pull it out by grabbing the previous hour and massaging it however you want into MySQL. If you put an expiry on the keys you can have them cleaned out after a period of time, or just delete the keys with the post-processing process.
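For the second option, the post-processing step can be sketched without a Redis client: given the strings read back from the list, count the impressions per object before writing them to MySQL. The helper names and the "impressions:" key prefix are assumptions, and the key here includes the date as well as the hour so that lists from different days do not collide.

```php
<?php
// Turn the strings pushed with RPUSH (one comma-separated batch of IDs per search)
// into per-object impression counts.
function countImpressions(array $batches): array
{
    $counts = array();
    foreach ($batches as $batch) {
        foreach (explode(',', $batch) as $id) {
            $id = trim($id);
            if ($id === '') {
                continue; // skip empty entries such as a trailing comma
            }
            $counts[$id] = isset($counts[$id]) ? $counts[$id] + 1 : 1;
        }
    }
    return $counts;
}

// Name of the list that collects one hour of impressions (UTC date and hour).
function hourKey(int $timestamp): string
{
    return 'impressions:' . gmdate('YmdH', $timestamp);
}

// Example: three searches served objects 1-3, 2-3 and 3.
print_r(countImpressions(array('1,2,3', '2,3', '3')));
```

The hourly job would read the previous hour's list, pass it through countImpressions(), and issue one UPDATE per object.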
You can also use a read slave to do the exporting to MySQL if you have very high Redis traffic or just want to offload it, and as a bonus you get a backup. If you do that you can set the master Redis instance to not flush to disk, increasing write performance.
For some additional options regarding a more extended use of Redis' features for this sort of tracking, see this answer. You could also avoid the MySQL portion and pull the data from Redis, keeping the overall system simpler.
I would do something like #2, and hand the data off to the fastest queue you can to update Redis counters. I'm not that familiar with Gearman, but I bet it's slow for this. If your Redis client supports asynchronous writes, I'd use that, or put this in a queue on a separate thread. You don't want to slow down your response waiting to update the counters.
I have been thinking about this for a long time.
I don't think I can use any API or PHP extension (e.g. memcache, APC, XCache) that requires installing something on my remote Linux server, because my web host is a shared server; all I can do is place files/scripts in the httpdocs folder.
Is there any way for me to use caching and access memory programmatically?
What I am actually after is a "place" to save some data that can be accessed faster than fetching it from the DB, and that also reduces the load on the DB.
That means it does not have to be memory, if someone can suggest another effective approach. E.g. would using a text file be a good choice? (I am just guessing here.)
The PHP version of mine is 5.2.17. And I am using MySQL DB.
Hope someone can give me suggestions
Flat files will always be the EASIEST way to cache, but they will be slower than accessing data directly from memory. You can instead use MySQL tables that are stored in memory; to do so, change the engine used by the tables to MEMORY. NOTE that this only pays off if your DB is on the same server as the web server.
Set up an in-memory table with two columns, key and value. The variable name is the key and its contents are the value. If you need to cache arrays or objects, serialize the data before storing it.
If you need to limit the size of the in-memory table, add one more column, hitCount. For each read, increase the count by one. When inserting a new row, check the number of rows, and if it has reached the limit, delete the row with the lowest hitCount.
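The hitCount eviction rule can be sketched as follows. An array models the in-memory table so the example runs anywhere; the class name HitCountCache is hypothetical, and with the real MEMORY table each method would become an SELECT/UPDATE/DELETE/INSERT. The sketch is written for PHP 7; on the PHP 5.2 mentioned in the question the type declarations would have to be dropped. It assumes a limit of at least one row.

```php
<?php
// Array-backed model of the in-memory table: key => array(value, hitCount).
class HitCountCache
{
    private $rows = array();
    private $maxRows;

    public function __construct(int $maxRows)
    {
        $this->maxRows = $maxRows;
    }

    // Read a value and bump its hitCount; returns null on a miss.
    public function get(string $key)
    {
        if (!isset($this->rows[$key])) {
            return null;
        }
        $this->rows[$key]['hitCount']++;
        return unserialize($this->rows[$key]['value']);
    }

    // Insert or overwrite a value; a new key evicts the least-read row when the table is full.
    public function set(string $key, $value): void
    {
        if (!isset($this->rows[$key]) && count($this->rows) >= $this->maxRows) {
            $victim = null;
            foreach ($this->rows as $k => $row) {
                if ($victim === null || $row['hitCount'] < $this->rows[$victim]['hitCount']) {
                    $victim = $k;
                }
            }
            unset($this->rows[$victim]);
        }
        // Arrays and objects are serialized before storing, as described above.
        $this->rows[$key] = array('value' => serialize($value), 'hitCount' => 0);
    }
}
```

Note that overwriting an existing key resets its hitCount to zero in this sketch; keeping the old count is an equally reasonable choice.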
To check which one is faster (file caching or in memory cache) use following code
<?php
// Returns the current time in seconds as a float.
function getTime()
{
    return microtime(true);
}
?>
<?php
$start = getTime();
// Data fetching tasks go here
$end = getTime();
// Variable names are case-sensitive in PHP, so $start/$end must match exactly.
echo "time taken = " . number_format($end - $start, 4) . " seconds";
?>
If possible let us know how efficient it is... Thanks
You could very easily just use flat text files as a cache if your DB queries are expensive. Just like you would use memcache with a key/value system, you can use filenames as keys and the contents of the files as values.
Here's an example that caches the output of a single page in a file; you could adapt it to suit your needs: http://www.snipe.net/2009/03/quick-and-dirty-php-caching/
Flat file is the easiest way to cache business logic, queries etc on a shared server.
To cache any DB requests your best bet is to fetch the results, serialize them and store them in a file with a possible expiry date (if required). When you need to fetch those results again just pull in the file and unserialize the previously serialized data.
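A minimal sketch of that serialize-to-file approach, with an expiry timestamp stored next to the value. The function names fileCacheSet() and fileCacheGet(), the ".cache" suffix and the md5-based file names are assumptions. The code uses PHP 7 type declarations; on PHP 5.2 they would need to be removed.

```php
<?php
// Store a value together with its expiry time; the file name is derived from the cache key.
function fileCacheSet(string $dir, string $key, $value, int $ttl): void
{
    $payload = serialize(array('expires' => time() + $ttl, 'value' => $value));
    file_put_contents($dir . '/' . md5($key) . '.cache', $payload, LOCK_EX);
}

// Return the cached value, or null when the entry is missing, unreadable or expired.
function fileCacheGet(string $dir, string $key)
{
    $path = $dir . '/' . md5($key) . '.cache';
    if (!is_file($path)) {
        return null;
    }
    $entry = unserialize(file_get_contents($path));
    if (!is_array($entry) || $entry['expires'] < time()) {
        @unlink($path); // clean up stale or corrupt entries
        return null;
    }
    return $entry['value'];
}

$dir = sys_get_temp_dir();
fileCacheSet($dir, 'SELECT * FROM products', array(array('id' => 1, 'name' => 'example')), 600);
var_dump(fileCacheGet($dir, 'SELECT * FROM products'));
```

On a shared host the cache directory should sit outside the web root, or be protected from direct access, so the serialized data cannot be downloaded.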
Also, if the data is user-based, cookies and sessions will work too, at least for as long as the user stays on the application. If you're pulling a lot of data it would still be better to go with the first option and just save the files based on a user/session id.
It depends on the size of the data to cache.
Based on the restriction of your server environment:
Use a flat file (or maybe an SQLite DB) to cache larger data sets (e.g., user preferences, user activity logs).
Use shared memory to cache smaller data sets (e.g., system counters, system status).
Hope this helps.