I have one MySQL table which is controlled strictly by an admin. Data entry is very low, but the query volume on that table is high. Since the table content will not change much, I was thinking of using the MySQL query cache with PHP, but when I googled about it I got confused between it and memcached.
What is the basic difference between memcached and mysqlnd_qc?
Which is most suitable for me given the conditions below?
I also intend to extend the same for an autocomplete box; which will be suitable in such a case?
My queries will mostly return fewer than 30 rows of only a few bytes each, and will use the same SELECT queries. I am on a single server and no load sharing will be done. Thank you in advance.
If your query is always the same, i.e. you do SELECT title, stock FROM books WHERE stock > 5 and your condition never changes to stock > 6 etc., I would suggest using MySQL Query Cache.
Memcached is a key-value store. Basically, it can cache anything you can turn into key => value. There are a lot of ways you can implement caching with it. You could query your 30 rows from the database, then cache them row by row, but I don't see a reason to do that here if you're returning the same set of rows over and over. The most basic example I can think of for memcached is:
// Run the query
$result = mysqli_query($con, "SELECT title, stock FROM books WHERE stock > 5");
// Fetch the whole result into one array
$rows = mysqli_fetch_all($result, MYSQLI_ASSOC);
// Put the result into memcache for 30 seconds.
$memcache_obj->add('my_books', serialize($rows), false, 30);
Then do a $memcache_obj->get('my_books'); and unserialize it to get the same results.
But since you're using the same query over and over, why add the complication when you can let MySQL handle all the caching for you? Remember that if you go with the memcached option, you need to set up a memcached server as well as implement logic to check whether the result is already in the cache, and whether the records have changed in the database.
I would recommend using MySQL query cache over memcached in this case.
One thing you need to be careful about with the MySQL query cache, though, is that your query must be exactly the same: no extra blank spaces or comments anywhere. This is because MySQL does no parsing when comparing against cached queries; it compares the query strings byte for byte. Any extra character anywhere in the query makes it a different query.
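Because the comparison is byte for byte, it can help to make sure all call sites build the query string identically. A hypothetical helper (not from the answer above) that collapses whitespace might look like this; note it deliberately leaves letter case alone:

```php
<?php
// Hypothetical helper: collapse runs of whitespace so every call site
// sends a byte-identical query string and therefore hits the same
// query-cache entry. It does not change letter case.
function normalize_query(string $sql): string
{
    return trim(preg_replace('/\s+/', ' ', $sql));
}

// As-is, these two strings would be two separate query-cache entries:
$a = "SELECT title, stock FROM books  WHERE stock > 5";
$b = "SELECT title, stock
      FROM books WHERE stock > 5";

var_dump(normalize_query($a) === normalize_query($b)); // bool(true)
```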
Peter Zaitsev explained very well about MySQL Query Cache at http://www.mysqlperformanceblog.com/2006/07/27/mysql-query-cache/, worth taking a look at it. Make sure you don't need anything that MySQL Query Cache does not support as Peter Zaitsev mentioned.
If the queries run fast enough and do not really slow your application down, do not cache them. With a table this small, MySQL will keep it in its own cache. If your application and database are on the same server, the benefit will be very small, maybe not even measurable at all.
So, for your third question, it also depends on how you query the underlying tables. Most of the time, it is sufficient to let MySQL cache it internally. Another approach is to generate all the possible combinations and store those, so MySQL does not need to compute the matching rows and can return the right one straight away.
As a general rule: build your application without caching, and only add caches for things that do not change often if a) computing the result set is complex and time-consuming, or b) you have multiple application instances calling the database over a network. In those cases, caching results in better performance.
Also, if you run PHP in a web server like Apache, caching inside your program does not add much benefit, as the cache only lives for the current page. An external cache (like memcache) is then needed to share cached results across requests.
What is the basic difference between memcached and mysqlnd_qc?
There is essentially nothing in common between them at all.
Which is most suitable for me given the conditions below?
The MySQL query cache.
I also intend to extend the same for an autocomplete box; which will be suitable in such a case?
Sphinx Search.
Related
Currently I am using shared hosting for my site, but we now have nearly 1,100,000 rows in one of the tables, so the webpage is taking a long time to load. We want to implement database caching techniques like APC or memcache for our site, but on shared hosting we don't have those facilities available; we only have eAccelerator, and eAccelerator does not cache DB calls, if I'm not wrong. So considering all these points, we want to move to a VPS. In that case, which caching technique should we use, APC or memcache, to decrease the page load time? Please advise on the VPS and on the better of the two caching techniques.
We have a similar website and we use APC.
APC will cache the opcode as well as the HTML that is generated. This helps avoid unnecessary hits to the page.
You should also also enable caching in MySQL to cache the results of your queries.
I had a task where I needed to fetch rows from a database table that had more than 100,000 records; it was a scrollable page. So what I did was fetch the first 50 records and cache the next 50 on the first call. On scroll-down events, an AJAX request checked whether the data was available in the cache; if not, it fetched the rows from the database and also cached the next 50. It worked pretty well and solved the inconvenient load time.
If you have a similar scenario, you might benefit from this approach.
PS: I used memcache.
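The prefetch idea above can be sketched roughly like this, with a plain array standing in for memcache and a fake fetch function standing in for the real paginated query (the names PAGE, fetch_rows, and get_page are illustrative, not from the answer):

```php
<?php
// Page size for both display and prefetch (illustrative value).
const PAGE = 50;

// Stand-in for: SELECT ... LIMIT $offset, $limit
function fetch_rows(int $offset, int $limit): array
{
    return range($offset, $offset + $limit - 1);
}

// Return one page, prefetching the next page into the cache so the
// upcoming scroll event is a cache hit instead of a database query.
function get_page(array &$cache, int $page): array
{
    $key = "page:$page";
    if (!isset($cache[$key])) {
        $cache[$key] = fetch_rows($page * PAGE, PAGE);
    }
    $next = "page:" . ($page + 1);
    if (!isset($cache[$next])) {
        $cache[$next] = fetch_rows(($page + 1) * PAGE, PAGE);
    }
    return $cache[$key];
}

$cache = [];
$first  = get_page($cache, 0);  // queries pages 0 and 1
$second = get_page($cache, 1);  // served from cache, prefetches page 2
```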
From your comment I take it you're doing a LIKE '%..%' query and want to paginate the result. First of all, investigate whether FULLTEXT indexes are an option for you, as they should perform better. If that's not an option, you can add a simple cache like so:
Treat each unique search term as an id, i.e. if in your URL you have ..?search=foobar, then "foobar" is the id of the result set. Keep that in all your links, e.g. ..?search=foobar&page=2.
If the result set does not yet exist (see below), create it:
Query the database with your slow query.
Get all the results into an array. Don't overdo it, you don't want to be storing hundreds of megabytes.
Create a unique filename per query, e.g. sha1($query), or maybe sha1(strtolower($query)).
serialize the data and store it in the file.
Get the data from the file, unserialize it, display the portion of the array corresponding to the requested page.
Occasionally, delete old cached results. You can do that with something like if (rand(0, 100) == 1) .., which will run the cleanup job every 100 queries on average. Strike a balance between server load and data freshness. Cache invalidation is a topic whole books can be written about, BTW.
That's a simple poor man's cache implementation. It's not great, but if you have absolutely nothing else to work with, it's better than running slow queries over and over.
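The steps above can be sketched in a few lines. This is only an illustration of the idea, assuming a run_slow_query() stand-in for the real database call and a one-hour freshness window:

```php
<?php
// Stand-in for the slow LIKE '%..%' query against the database.
function run_slow_query(string $term): array
{
    return ["result 1 for $term", "result 2 for $term", "result 3 for $term"];
}

function cached_search(string $term, int $page, int $perPage = 2): array
{
    // Unique filename per search term, as described above.
    $file = sys_get_temp_dir() . '/search_' . sha1(strtolower($term)) . '.cache';

    if (!is_file($file)) {
        file_put_contents($file, serialize(run_slow_query($term)));
    }
    $all = unserialize(file_get_contents($file));

    // Occasional cleanup: roughly 1 in 100 requests prunes stale files.
    if (rand(0, 100) === 1) {
        foreach (glob(sys_get_temp_dir() . '/search_*.cache') ?: [] as $old) {
            if (filemtime($old) < time() - 3600) {
                unlink($old);
            }
        }
    }

    // Return only the slice for the requested page.
    return array_slice($all, $page * $perPage, $perPage);
}

$page1 = cached_search('foobar', 0);  // runs the query and fills the cache
$page2 = cached_search('foobar', 1);  // served from the cache file
```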
APC is the Alternative PHP Cache and works only with PHP, whereas Memcached works independently with any language.
In my application, I try to grab all the data I need in as few queries as possible. This usually leads to large queries with many joins, which places limits on what you can cache using software like memcached or Redis (as far as I know). With large queries, you don't know which parts might already be cached. It seems like you have to query everything in smaller parts so that these small parts can be cached individually. The idea is that you would only have to do dozens of small queries in order to populate the caches, and that most of the time you would hit the caches rather than query the database. Is this how high-traffic PHP/MySQL websites handle this? Is there a good way to cache effectively even if you have large queries with many joins?
Example:
SELECT user.name, user.birthday
FROM follower
INNER JOIN user ON (user.id = follower.user)
WHERE follower.following = '1'
The results of this query include the names and birthdays of any users following user 1. The results of this query could be cached, but that would only be useful when getting followers of user 1.
The alternative:
SELECT follower.user
FROM follower
WHERE follower.following = '1'
For each result with ? populated by follower.user from the previous query:
SELECT name, birthday FROM user where user.id = ?
In this case, we can check whether user ?'s name and birthday are cached before querying MySQL for them. If they aren't cached, or some are cached and some aren't, we grab the missing ones and cache them. You could also cache the list of follower IDs, and then none of the queries need to be run the next time. The difference is that the names and birthdays of the users will be useful to any other user that ends up needing information about these followers in any other context.
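The per-user caching described above can be sketched like this, with a plain array standing in for memcached and hard-coded rows standing in for the two queries (all the names here are illustrative):

```php
<?php
// Stand-in for the user table.
$users = [
    2 => ['name' => 'Alice', 'birthday' => '1990-01-01'],
    3 => ['name' => 'Bob',   'birthday' => '1985-06-15'],
];
$cache  = [];
$dbHits = 0;

// Cache each user's row under its own key, so any page that needs this
// user later gets a cache hit instead of a query.
function get_user(array &$cache, array $users, int $id, int &$dbHits): array
{
    $key = "user:$id";
    if (!isset($cache[$key])) {
        $dbHits++;  // SELECT name, birthday FROM user WHERE user.id = ?
        $cache[$key] = $users[$id];
    }
    return $cache[$key];
}

// SELECT follower.user FROM follower WHERE follower.following = '1'
$followerIds = [2, 3];

$profiles = [];
foreach ($followerIds as $id) {
    $profiles[] = get_user($cache, $users, $id, $dbHits);
}

// A later request needing user 2 in some other context is a pure cache hit.
get_user($cache, $users, 2, $dbHits);
```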
Am I missing something on caching with larger queries? Or is the second way the right way?
The correct answer is: It depends.
Caching is a way of optimizing a recognized use pattern by shortcutting producing repeatedly expensive data with re-using the data from a previous run.
So the first question you should answer is: Is there an observed, repeated use pattern that has a noticeably "expensive" step of producing data? If not, don't add caching that you do not yet need; wait until you can observe something.
The second question you should be able to answer is: Can you measure how long it takes with and without the cache, and is the difference noticeable?
And the third important question to answer is: How can you clean the cache from outdated information if the original data gets changed, and you want that new data to be displayed instantly?
So in your case you are asking if using a cache for plenty of small, but seemingly more universal queries that then get combined is more beneficial than caching one big query. There is no theoretical answer, because it depends on how much faster a cache hit for a big result is compared to multiple cache hits for the combined result. Making multiple requests to the cache may very well be SLOWER than fetching the data from the original source, and combining the data into the needed complex result might also be slower than fetching ONE complex result directly from the cache.
Also, if using multiple cache entries for a combined result, you'll now have to deal with plenty of cases where only parts of the information are outdated, while others are not. So the result just gets more unreliable - you cannot really be sure if every part of the result is up to date, or how old it is.
@Sven makes the point! Let me add a few more rough suggestions.
@Barakat: big queries are usually not a big deal for MySQL; a well-designed DB, indexes, and tuned engine parameters usually give high performance.
Doing many little queries induces a lot of overhead (cached or not), so I usually avoid that.
If your big query gives big results (hundreds/thousands of rows), maybe you can avoid that by paging the results or limiting the answers to the best scores.
A very simple and effective tool to tune your MySQL server is MySQLTuner.pl, and with MySQL's internal cache you don't have to worry about coherence!
I have many users polling my PHP script on an Apache server, and the MySQL query they run is:
SELECT * FROM `table` WHERE `id` > 'example number'
where the example number can vary from user to user, but has a known lower bound which is updated every 10 minutes.
The server is getting polled twice a second by each user.
Can memcache be used? It's not crucial that the user is shown the most up-to-date information; if it's a second or so behind, that is fine.
The site has 200 concurrent users at peak times. It's hugely inefficient and costing a lot of resources.
To give an accurate answer, I would need more information:
Whether the query is pulling personalised information.
Whether the polling request has the 'example number' coming along with it.
Looking at the way you have structured your question, it doesn't seem like the user is polling for any personalised information. So I assume the 'example number' also comes as part of the polling request.
I agree with @roberttstephens and @Filippos Karapetis that you can use the ideal solutions:
Redis
NoSQL
Tune the MySQL
Memcache
But as you already have the application out there in the wild, implementing the above solutions will have a cost, so these are the practical solutions I would recommend:
Add indexes to your table for the relevant columns. [The first thing to check/do.]
Enable MySQL query caching.
Use a reverse proxy, e.g. Varnish. [Assuming the 'example number' comes as part of the request.]
It intercepts requests before they even hit your application server, so neither the MySQL query nor the memcache/Redis lookup happens.
Make sure you set specific cache headers on the response so that Varnish caches it.
So, of the 200 concurrent requests, if 100 of them query for the same number, Varnish takes the hit. [This is the same advantage that memcache can offer.]
Implementation-wise it doesn't cost much in terms of development/testing effort.
I understand this is not the answer to the exact question, but I am sure it could solve your problem.
If the 'example number' doesn't come as part of the request and you have to fetch it from the DB [by looking at the user table, maybe], then @roberttstephens' approach is the way to go. Just to give you the exact picture, I have refactored the code a little:
$m = new Memcached();
$m->addServer('localhost', 11211);

$inputNumber = 12345;
$cacheKey = "poll:" . $inputNumber;
$result = $m->get($cacheKey);
if ($result) {
    return unserialize($result);
}
$sth = $dbh->prepare("SELECT column1, column2 FROM poll WHERE id = ?");
$sth->execute([$inputNumber]);
$poll_results = $sth->fetch(PDO::FETCH_OBJ);
$m->set($cacheKey, serialize($poll_results));
In my opinion, you're trying to use the wrong tool for the job here.
memcached is a key/value store, so it can store and retrieve several values for a given set of keys very quickly. However, you don't seem to know the keys you want in advance, since you're looking for all records where the id is GREATER THAN a number, rather than a specific collection of IDs. So, in my opinion, memcached won't be appropriate for your scenario.
Here are your options:
Option 1: keep using MySQL and tune it properly
MySQL is quite fast if you tune it properly. You can:
add the appropriate indexes to each table
use prepared statements, which can help performance-wise in your case, as users are doing the same query over and over with different parameters
use query caching
Here's a guide with some hints on MySQL tuning, and mysqltuner, a Perl script that can guide you through the options needed to optimize your MySQL database.
Option 2: Use a more advanced key-value storage
There are alternatives to memcached, the best known being Redis. Redis allows more flexibility, but is more complex than memcached. For your scenario, you could use the Redis ZRANGE command to retrieve the results you want; have a look at the available Redis commands for more information.
Option 3: Use a document storage NoSQL database
You can use a document-storage NoSQL database, the best-known example being MongoDB.
You can use more complex queries in MongoDB (e.g. use operators, like "greater than", which you require) than you can do in memcached. Here's an example of how to search through results in a mongo collection using operators (check example 2).
Have a look at the PHP MongoDB manual for more information.
Also, this SO question is an interesting read regarding document storage NoSQL databases.
You can absolutely use memcached to cache the results. Alternatively, you could create a cache table in MySQL with less effort.
In either case, you would need to create an ID for the cache entry and retrieve the results based on that ID. You could use something like entity_name:entity_id, or namespace:entity_name:entity_id, or whatever works for you.
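A tiny helper for the namespace:entity_name:entity_id scheme mentioned above might look like this (the names are examples, not a fixed convention):

```php
<?php
// Build a namespaced cache key, e.g. "myapp:poll:12345".
function cache_key(string $namespace, string $entity, $id): string
{
    return "$namespace:$entity:$id";
}

$key = cache_key('myapp', 'poll', 12345);  // "myapp:poll:12345"
```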
Keep in mind, memcached is another service running on the server. You have to install it, set it up to start on reboot (or you should, at least), allocate memory for it, etc. You'll also need the php-memcached extension.
With that said, please view the php documentation on memcached. http://php.net/manual/en/memcached.set.php . Assuming your poll id is 12345, you could use memcached like so.
<?php
// Get your results however you normally would.
$sth = $dbh->prepare("SELECT column1, column2 FROM poll WHERE id = 12345");
$sth->execute();
$poll_results = $sth->fetch(PDO::FETCH_OBJ);
// Set up memcached. It should obviously be installed, configured, and running by now.
$m = new Memcached();
$m->addServer('localhost', 11211);
$m->set('poll:12345', serialize($poll_results));
This example doesn't have any error checking or anything, but it should explain how to do it. I also don't have PHP, MySQL, or memcached instances running right now, so the above hasn't been tested.
I just had this idea and think it's a good solution for this problem, but I'm asking whether there are downsides to this method. I have a webpage that queries the database often, as many as 3-5 queries per page load. Each query makes a dozen (literally) joins, and the results of each of these queries are then used in further queries to construct PHP objects. Needless to say, the load times are ridiculous even on the cloud, but that's the way it works now.
I thought about storing the already constructed objects as JSON, or in MongoDB's BSON format. Would MongoDB be a good cache engine of this type? Here is how I think it would work:
When the user opens the page, if there is no data in Mongo with the proper ID, the MySQL queries fire, each returning data that is converted into a properly constructed object. The object is sent to the views, converted to JSON, and saved in Mongo.
If there is data in Mongo with the corresponding ID, it is sent to PHP and converted.
When some of the data changes in MySQL (an administrator edits/deletes content), a delete function fires that deletes the edited/deleted object in MongoDB as well.
Is this a good way to use MongoDB? What are the downsides of this method? Is it better to use Redis for this task? I also need NoSQL for other parts of the project, which is why I'm considering using one of these two instead of memcache.
The question "MongoDB as a cache for frequent joins and queries from MySQL" has some information, but it's not really relevant here.
I think you would be better off using memcached or Redis to cache the query results. MongoDB is more of a full database than a cache, while memcached and Redis are optimized for caching.
However, you could implement your cache as a two-level cache. Memcached, for example, does not guarantee that data will stay in the cache (it might expire data when storage is full). This makes it hard to implement a system of tags (where, for example, you add a tag for a MySQL table and can then trigger expiration of all cached query results associated with that table). A common solution is to use memcached as the first level, with a second, slower but more reliable cache behind it, which should still be faster than MySQL. MongoDB could be a good candidate for that (as long as you keep the queries to MongoDB simple).
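The two-level lookup can be sketched like this, with plain arrays standing in for memcached (level 1), the slower reliable store (level 2), and MySQL itself; the point is the lookup order and the write-back on a miss:

```php
<?php
$l1     = [];                                 // fast, may evict at any time
$l2     = ['books:q1' => 'cached result'];    // slower, survives L1 eviction
$source = ['books:q1' => 'cached result'];    // the database itself

function cache_get(array &$l1, array &$l2, array $source, string $key): string
{
    if (isset($l1[$key])) {
        return $l1[$key];                     // L1 hit
    }
    if (isset($l2[$key])) {
        return $l1[$key] = $l2[$key];         // L2 hit: repopulate L1
    }
    $value = $source[$key];                   // full miss: hit MySQL
    $l2[$key] = $value;                       // write back to both levels
    return $l1[$key] = $value;
}

$v = cache_get($l1, $l2, $source, 'books:q1'); // L2 hit, now also in L1
```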
Well, you can go with memcached or Redis for caching objects. MongoDB can also be used as a cache. I use MongoDB for caching aggregation results, since unlike memcached it has the advantage of supporting a wide range of queries.
For example, in a tagging application, if I have to display the page count corresponding to each tag, that requires scanning the whole table for a GROUP BY query. So I have a cronjob which computes that GROUP BY query and caches the aggregation result in Mongo. This works perfectly well for me in production. You can do this for countless other complex computations as well.
Also, MongoDB's capped collections and TTL collections are perfect for caching.
I have about 10 tables with ~10,000 rows each which need to be pulled very often.
For example, a list of countries, a list of all schools in the world, etc.
PHP can't persist this stuff in memory (to my knowledge), so I would have to query the server with a SELECT * FROM TABLE every time. Should I use memcached here? At first thought it's a clear, absolute yes, but on second thought, wouldn't MySQL already be caching for me, making this almost redundant?
I don't have much understanding of how MySQL caches data (or whether it even caches entire tables).
You could use MySQL query cache, but then you are still using DB resources to establish the connection and execute the query. Another option is opcode caching if your pages are relatively static. However I think memcached is the most flexible solution. For example if you have a list of countries which need to be accessed from various code-points within your application, you could pull the data from the persistent store (mysql), and store them into memcached. Then the data is available to any part of your application (including batch processes and cronjobs) for any business requirement.
I'd suggest reading up on the MySQL query cache:
http://dev.mysql.com/doc/refman/5.6/en/query-cache.html
You do need some kind of a cache here, certainly; layers of caching within and surrounding the database are considerably less efficient than what memcached can provide.
That said, if you're jumping to the conclusion that the Right Thing is to cache the query itself, rather than to cache the content you're generating based on the query, I think you're jumping to conclusions -- more analysis is needed.
What data, other than the content of these queries, is used during output generation? Would a page cache or page fragment cache (or caching reverse-proxy in front) make more sense? Is it really necessary to run these queries "often"? How frequently does the underlying data change? Do you have any kind of a notification event when that happens?
Also, SELECT * queries without a WHERE clause are a "code smell" (indicating that something probably is being done the Wrong Way), especially if not all of the data pulled is directly displayed to the user.