In my database, there's a field named comments that says how many comments are in a post. Until now, I've been using LIMIT with that specific amount while retrieving all the comments, but I worry that it may lead to inaccuracies if it's a popular post in which comments are added/deleted quickly.
My question is: when retrieving comments, is using an upper boundary like that a good idea? Do websites just query the entire table? I used LIMIT because I thought it would be a tad more efficient, but is there really a difference?
Thanks in advance.
You can use SQL_CALC_FOUND_ROWS and FOUND_ROWS() instead of storing the number of comments.
Example of select statement:
SELECT SQL_CALC_FOUND_ROWS *
FROM comments
WHERE post_id = ?
ORDER BY id DESC
LIMIT 0, 10
When you run this query, you will get your comments; right after that, run this query:
SELECT FOUND_ROWS()
which will return the number of comments the previous query would have matched without the LIMIT.
You can query the first 30 comments (let's say) using LIMIT, and the total comments that belong to your post using COUNT. Now, build up pagination links based on that count.
Now, whenever the user clicks the page number n, use AJAX to fetch the new comments and replace the existing comments. Use LIMIT along with OFFSET for this purpose.
This way, you won't need to pull up all your comments for that post in one go, and at the same time, you will be able to provide your users with the required comments if they need to read more.
Most popular sites, and I think WordPress as well, use this process.
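For illustration, here is a minimal PHP/PDO sketch of that COUNT + LIMIT/OFFSET approach (the $pdo connection, $postId, and the table/column names are assumptions, not taken from the question):

// a rough sketch; table and column names are assumptions
$perPage = 30;
$page    = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$offset  = ($page - 1) * $perPage;

// total number of comments for the post, used to build the pagination links
$stmt = $pdo->prepare("SELECT COUNT(*) FROM comments WHERE post_id = ?");
$stmt->execute(array($postId));
$total = (int) $stmt->fetchColumn();
$pages = (int) ceil($total / $perPage);

// only the comments for the requested page; $perPage and $offset are plain integers,
// so interpolating them into the LIMIT clause is safe here
$stmt = $pdo->prepare("SELECT * FROM comments WHERE post_id = ? ORDER BY id DESC LIMIT $perPage OFFSET $offset");
$stmt->execute(array($postId));
$comments = $stmt->fetchAll();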
UPDATE
If you are using a "load more" kind of thing, you do not need to even run a COUNT query. Just load the first 30 comments, use an AJAX-based solution to fetch the next 30, and rinse and repeat until the SQL returns an empty result.
The offset should be incremented by 30 (in this case) on every ajax request, but the limit should stay at 30 each time.
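A rough sketch of the AJAX endpoint for that "load more" variant, assuming it receives the current offset as a request parameter ($pdo, $postId, and the names are assumptions):

// a sketch: fetch the next batch of 30 comments; the client sends back offset + 30 next time
$limit  = 30;
$offset = isset($_GET['offset']) ? max(0, (int) $_GET['offset']) : 0;

$stmt = $pdo->prepare("SELECT * FROM comments WHERE post_id = ? ORDER BY id DESC LIMIT $limit OFFSET $offset");
$stmt->execute(array($postId));
$batch = $stmt->fetchAll();

// an empty result tells the client there is nothing left to load
echo json_encode($batch);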
Related
I am making a simple message board in PHP with a MySQL database. I have limited messages to 20 a page with the 'LIMIT' operation.
An example of my URL is: http://www.example.com/?page=1
If page is not specified, it defaults to 1. As I mentioned earlier, there is a limit of 20 per page, however if that is out of a possible 30 and I wish to view page 2, I only end up with 10 results. In this case, the LIMIT part of my query resembles LIMIT 20,40 - How can I ensure in this case that 20 are returned?
I would prefer to try and keep this as much on the MySQL side as possible.
EDIT:
To clarify, if I am on page 2, I will be fetching rows 20-30, however this is only 10 rows, so I wish to select 10-30 instead.
EDIT:
I am currently using the following queries:
SELECT MOD(COUNT(`ID`),20) AS lmt FROM `msg_messages` WHERE `threadID`=2;
SELECT * FROM `msg_messages` WHERE `threadID`=2 LIMIT 20-(20-lmt) , 40-(20-lmt) ;
There are 30 records that this matches.
I'm not sure I really understand the question, but if I do, I think the best practice would be to prevent users from going to a page with no results. To do so, you can easily check how many rows you have in total, even when using the LIMIT clause, with SQL_CALC_FOUND_ROWS.
For example you could do:
Select SQL_CALC_FOUND_ROWS * from blog_posts where balblabla...
Then you have to run another query like this:
Select FOUND_ROWS() as posts_count
It will return the total rows very quickly. With this result and knowing the current page, you can decide if you display next/prev links to the user.
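For example, once you have posts_count you could decide on the links roughly like this (a sketch, assuming 20 posts per page and a $page variable holding the current page number):

// assumptions: $posts_count from the FOUND_ROWS() query, $page from the request
$perPage    = 20;
$totalPages = (int) ceil($posts_count / $perPage);

$showPrev = $page > 1;
$showNext = $page < $totalPages;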
You could do a:
SELECT COUNT(*)/20 AS pages FROM tbl;
...to get the maximum number of pages, then work out if you're going to be left with a partial set and adjust your paging query accordingly.
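A hedged sketch of that adjustment, using the asker's table: on the last page, slide the offset back so a full 20 rows are still returned (the $pdo connection and $page variable are assumptions):

// sketch: 30 matching rows, 20 per page -> page 2 shows rows 10-30 instead of only 20-30
$perPage = 20;

$total = (int) $pdo->query("SELECT COUNT(*) FROM msg_messages WHERE threadID = 2")->fetchColumn();
$pages = max(1, (int) ceil($total / $perPage));

$offset = ($page - 1) * $perPage;
if ($page >= $pages) {
    // last page: back up so that a full $perPage rows are returned
    $offset = max(0, $total - $perPage);
}

$rows = $pdo->query("SELECT * FROM msg_messages WHERE threadID = 2 ORDER BY ID LIMIT $perPage OFFSET $offset")->fetchAll();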
For example, if I have to count the comments belonging to an article, it's obvious I don't need to cache the comments total.
But what if I want to paginate a gallery (WHERE status = 1) containing 1 million photos? Should I save that in a table called counts, or is running SELECT count(id) AS total every time fine?
Are there other solutions?
Please advise. Thanks.
For MySQL, you don't need to store the counts; you can use SQL_CALC_FOUND_ROWS to avoid running the query twice.
E.g.,
SELECT SQL_CALC_FOUND_ROWS *
FROM Gallery
WHERE status = 1
LIMIT 10;
SELECT FOUND_ROWS();
From the manual:
In some cases, it is desirable to know how many rows the statement
would have returned without the LIMIT, but without running the
statement again. To obtain this row count, include a
SQL_CALC_FOUND_ROWS option in the SELECT statement, and then invoke
FOUND_ROWS() afterward.
It depends a bit on the number of queries that are run against that table with 1 million records. Consider just taking care of good indexes, especially multi-column indexes, because they are easily forgotten. That will do a lot. Also, make sure the queries are cached well on your server.
If you use this count very regularly, consider saving it (if it can't be cached by MySQL), as things could become slow. But most of the time, good indexing will take care of it.
Best bet: set up some tests to find out whether the query stays fast and performance does not drop when you execute it many times in a row.
EXPLAIN [QUERY]
Use that command (in MySQL) to get information about the way the query is performed and whether it can be improved.
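For example, against the gallery query from the earlier answer (the key and rows columns in the output show which index is used and roughly how many rows get examined):

EXPLAIN SELECT * FROM Gallery WHERE status = 1 LIMIT 10;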
Doing the count every time would be OK.
During paging, you can use SQL_CALC_FOUND_ROWS anyway
Note:
A denormalized count will become stale
No-one will page so many items
On Facebook, when a wall post has a lot of comments, you see only the last two comments. Right above them, however, is a link that says something like "View all 15 comments."
In MySQL, would this require two queries? One to get the total comment count, and the other to get the contents of the last two comments? Then, clicking that "View all" link requires a third in order to get the content of the remaining comments.
If this is the case, wouldn't it be more efficient to simply get the contents of all the comments in a single MySQL query? This would allow you to find the total number using PHP, and you wouldn't need to make the third query if you simply output the contents of all the comments to the page, using css to hide those that you don't want initially shown and JavaScript to show them when the user clicks the "View All" link.
I assume I'm wrong. Surely the Facebook developers are better at this than I am. But I'm developing a site that has a similar requirement and I'm trying to understand the most efficient way to achieve this kind of functionality.
In MySQL you could use a limited select to retrieve the latest comments specifying SQL_CALC_FOUND_ROWS followed by a FOUND_ROWS() query to fetch the count of all comments.
i.e. first do:
select SQL_CALC_FOUND_ROWS title, description from comments order by post_date desc limit 2;
to retrieve the 2 most recent comments, then do:
select FOUND_ROWS();
to retrieve the total number of comments.
Refer to the FOUND_ROWS() documentation on mysql.com for more information.
As far as performance is concerned, it should be faster than executing another COUNT(*) query (according to the docs :p)...
They probably know that something like 90% of all page viewers will never click on the "view all" link, so by doing it that way, they save the extra database traffic and the extra web-server-to-client bandwidth.
Performing two database queries back-to-back between the webserver and the database has negligible overhead. There is no need to set up another connection or for the DB server to create another thread to service the connection. This isn't really something ripe for optimization. Reducing the amount of data retrieved is prudent use of resources.
Ultimately, the best/fastest way to get the comments count is to store it within the news post table; you're almost caching the COUNT(*) query in a comments column.
You can then choose to load the comments with a second query if you need to, or skip it completely if there are no comments.
You could also opt for a LEFT JOIN query which will count the comments within a subquery, and return 0 if it needs to, keeping it as 1 query instead of 2.
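Something along these lines, a sketch assuming a news_posts table and a comments table with a post_id column (names are illustrative, not from the question):

-- sketch: count comments in a derived table, return 0 for posts with none
SELECT p.*, COALESCE(c.comment_count, 0) AS comment_count
FROM news_posts AS p
LEFT JOIN (
    SELECT post_id, COUNT(*) AS comment_count
    FROM comments
    GROUP BY post_id
) AS c ON c.post_id = p.id;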
What is the ideal way to keep track of the total count of items when dealing with paged results?
This seems like a simple question at first but it is slightly more complicated (to me... just bail now if you find this too stupid for words) when I actually start thinking about how to do it efficiently.
I need to get a count of items from the database. This is simple enough. I can then store this count in some variable (a $_SESSION variable for instance). I can check to see if this variable is set and if it isn't, get the count again. The trick part is deciding what is the best way to determine when I need to get a new count. It seems I would need to get a new count if I have added/deleted items to the total or if I am reloading or revisiting the grid.
So, how would I decide when to clear this $_SESSION variable? I can see clearing it and getting a new count after an update/delete (or even adding or subtracting to it to avoid the potentially expensive database hit) but (here comes the part I find tricky) what about when someone navigates away from the page or waits a variable amount of time before going to the next page of results or reloads the page?
Since we may be dealing with tens or hundreds of thousands of results, getting a count of them from the database could be quite expensive (right? Or is my assumption incorrect?). Since I need the total count to handle the total number of pages in the paged results... what's the most efficient way to handle this sort of situation and to persist it for... as long as might be needed?
BTW, I would get the count with an SQL query like:
SELECT COUNT(id) FROM foo;
I never use a session variable to store the total found by a query. I calculate the count as part of the regular query when I fetch the information, and the count itself comes from a second query:
// first query
SELECT SQL_CALC_FOUND_ROWS * FROM table LIMIT 0, 20;
// I don't actually use * but just select the columns I need...
// second query
SELECT FOUND_ROWS();
I've never noticed any performance degradation because of the second query but I guess you will have to measure that if you want to be sure.
By the way, I use this in PDO, I haven't tried it in plain MySQL.
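A rough PDO sketch of those two queries; FOUND_ROWS() only makes sense when run on the same connection, immediately after the first statement (the table and column names are assumptions):

// first query: fetch one page and ask MySQL to remember the unlimited row count
$stmt = $pdo->prepare("SELECT SQL_CALC_FOUND_ROWS id, title FROM posts LIMIT 0, 20");
$stmt->execute();
$rows = $stmt->fetchAll();

// second query: read back the count calculated by the first one
$total = (int) $pdo->query("SELECT FOUND_ROWS()")->fetchColumn();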
Why store it in a session variable? Will the result change per user? I'd rather store it in a user cache like APC or memcached, choose the cache key wisely, and then clear it when inserting or deleting a record related to the query.
A good way to do this would be to use an ORM that does it for you, like Doctrine, which has a result cache.
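A minimal sketch of that cache-and-invalidate idea with the Memcached extension (the key name, server address, and $pdo are assumptions; Doctrine's result cache wraps the same pattern for you):

// sketch: key name and server are assumptions
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$key   = 'foo_total_count';
$total = $memcached->get($key);

if ($total === false) {
    // cache miss: hit the database once, then cache the result for 5 minutes
    $total = (int) $pdo->query("SELECT COUNT(id) FROM foo")->fetchColumn();
    $memcached->set($key, $total, 300);
}

// in the code path that inserts or deletes a row in foo:
$memcached->delete($key);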
To get the count, I know that using COUNT(*) is not worse than using COUNT(id). (question: Is it even better?)
EDIT: interesting article about this on the MySQL performance blog
Most likely foo has a PRIMARY KEY index defined on the id column. Indexed COUNT() queries are usually quite easy on the DB.
However, if you want to go the extra mile, another option would be to insert a special hook into the code that deals with inserting and deleting rows in foo. Have it write the total number of records into a protected file after each insert/delete and read it from there. If every successful insert/delete gets accounted for, the number in the protected file is always up-to-date.
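A sketch of such a hook; the file path and function name are made up for illustration, and flock() is used so concurrent writers don't clobber each other:

// call adjust_cached_count(1) after a successful INSERT into foo,
// and adjust_cached_count(-1) after a successful DELETE
function adjust_cached_count($delta, $file = '/var/cache/app/foo_count.txt')
{
    $fp = fopen($file, 'c+');
    if ($fp === false) {
        return;
    }
    flock($fp, LOCK_EX);
    $count = (int) stream_get_contents($fp);
    $count = max(0, $count + $delta);
    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, (string) $count);
    flock($fp, LOCK_UN);
    fclose($fp);
}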
As some of you may know, use of the LIMIT keyword in MySQL does not preclude it from reading the preceding records.
For example:
SELECT * FROM my_table LIMIT 10000, 20;
This means that MySQL will still read the first 10,000 records and throw them away before producing the 20 we are after.
So, when paginating a large dataset, high page numbers mean long load times.
Does anyone know of any existing pagination class/technique/methodology that can paginate large datasets in a more efficient way i.e. that does not rely on the LIMIT MySQL keyword?
In PHP if possible as that is the weapon of choice at my company.
Cheers.
First of all, if you want to paginate, you absolutely have to have an ORDER BY clause. Then you simply have to use that clause to dig deeper in your data set. For example, consider this:
SELECT * FROM my_table ORDER BY id LIMIT 20
You'll have the first 20 records, let's say their id's are: 5,8,9,...,55,64. Your pagination link to page 2 will look like "list.php?page=2&id=64" and your query will be
SELECT * FROM my_table WHERE id > 64 ORDER BY id LIMIT 20
No offset, only 20 records read. It doesn't allow you to jump arbitrarily to any page, but most of the time people just browse the next/prev page. An index on "id" will improve the performance, even with big OFFSET values.
A solution might be to not use the limit clause, and use a join instead -- joining on a table used as some kind of sequence.
For more information: on SO, I found this question / answer, which gives an example -- that might help you ;-)
There are basically 3 approaches to this, each of which has its own trade-offs:
Send all 10000 records to the client, and handle pagination client-side via Javascript or the like. Obvious benefit is that only a single query is necessary for all of the records; obvious downside is that if the record size is in any way significant, the size of the page sent to the browser will be of proportionate size - and the user might not actually care about the full record set.
Do what you're currently doing, namely SQL LIMIT and grab only the records you need with each request, completely stateless. Benefit in that it only sends the records for the page currently requested, so requests are small, downsides in that a) it requires a server request for each page, and b) it's slower as the number of records/pages increases for later pages in the result, as you mentioned. Using a JOIN or a WHERE clause on a monotonically increasing id field can sometimes help in this regard, specifically if you're requesting results from a static table as opposed to a dynamic query.
Maintain some sort of state object on the server which caches the query results and can be referenced in future requests for a limited period of time. Upside is that it has the best query speed, since the actual query only needs to run once; downside is having to manage/store/cleanup those state objects (especially nasty for high-traffic websites).
SELECT * FROM my_table LIMIT 10000, 20;
This means: show 20 records starting from record #10000 in the result set. If you are using primary keys in the WHERE clause, there will not be a heavy load on MySQL.
Any other method of pagination, such as using a join, will create a really heavy load.
I'm not aware of the performance decrease that you've mentioned, and I don't know of any other solution for pagination; however, an ORDER BY clause might help you reduce the load time.
The best way is to define an index field in my_table and increment it for every newly inserted row. Then you can use WHERE YOUR_INDEX_FIELD BETWEEN 10000 AND 10020.
It will be much faster.
Some other options:
Partition the tables per page so you can ignore the LIMIT.
Store the results in a session (a good idea would be to create an md5 hash of that data, then use it to share the cached results between multiple users).