I never really considered pagination an issue until recently. When I sat down and zeroed in on it, I found myself facing plenty of problems.
What I'm building is a basic contacts management system where a user can add/update/delete/search contacts. The search part is where I need pagination to be implemented effectively.
What I have in mind (with pros and cons):
I can specify pageNo and offset while POSTing to my search.php page. This page fires a simple MySQL query to retrieve the results. Since the number of rows can easily run into the thousands, I need to paginate. Quite simple, but I have to fire the same query again and again for every different page: when a user goes from page 1 to page 2, the same MySQL query is fired (of course with a different offset), which feels redundant to me and is something I am trying to avoid.
Then I thought of capturing the entire result set and storing it in $_SESSION, but in this case, what if the results are huge? Will it affect performance in any way?
On similar lines to the second point, I thought of writing the results out to a file, which is plain crap! (I only put it here as a point; I know this is a REALLY bad way of doing things.)
My Questions:
A. Which of the above methods should I implement? Which one is better? Are there any other methods? I have googled it, but I find that most of the examples follow point 1 above.
B. My question for point 1: how can we rely on the order of the MySQL results? Suppose the user navigates to page 2 after some time; how can we be sure that, the second time around, the records from the first page aren't repeated? (Because we are doing a fresh query.)
C. What exactly is a MySQL resource? I understand that mysql_query() returns a resource. Is it global in the sense that it maintains its state between different calls to the PHP script? (Could I keep the resource in $_SESSION?)
Thanks a million! :-)
PS: I know this is a pretty long question. I just tried to put across, in a concise way, what's going around in my head.
Use your first suggestion. The one with offsets. It's the "standard" way of doing pagination. Putting the whole result set into session would be a bad idea, since every user would have his own private copy of the data. If you hit performance problems you can always add caching (memcache) which will benefit all users accessing the data.
MySQL will return your data the same way each time. The only way a record from page 1 would appear on page 2 is if a new record was inserted between the time the user navigates from page 1 to page 2. In other words: you have nothing to worry about.
A resource, in MySQL's case, is a pointer of sorts that points to the result set. You can then manipulate it (fetching data row by row, counting the number of rows returned, etc.). It is not global.
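If you do later add the caching mentioned in the first point, here is a minimal sketch with the PHP Memcached extension; the key scheme, the 60-second TTL and the fetchContactsPage() helper are illustrative assumptions, not anything from the question:

// Hypothetical sketch: cache one page of search results so repeated page views skip MySQL.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$key  = 'contacts_search_' . md5($searchTerm) . '_page_' . (int)$pageNo;
$rows = $cache->get($key);

if ($rows === false) {
    // Cache miss: run the usual LIMIT/OFFSET query, then keep the page for 60 seconds.
    $rows = fetchContactsPage($searchTerm, $pageNo); // assumed helper wrapping the paginated query
    $cache->set($key, $rows, 60);
}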
A. The first one, of course. There are other methods, as for everything on Earth, but one has to use the most usual and generic way first, simply to get familiar with it and because it will suit you for sure, just as it suits other webmasters.
Also note that your other proposed methods are not among the sensible ones.
B. Yes, records do move across pages. There is nothing bad in that.
C. Nothing in PHP maintains its state between calls. No resource can be saved in a session. Go for offset pagination.
From my experience (which is not much), I usually use the first method, because each time you go to another page you always get up-to-date data from MySQL. Yes, if you order by last_updated_time then results may move across pages.
But I think that's not what you have in mind. As you mention in your third point, perhaps you want some kind of buffer for your results, but that means you would have to create the buffer for every result set (which is why you mention using a file to store the MySQL results).
This probably isn't the answer you're looking for (if it can be considered an answer at all, LOL); my purpose was just to give some perspective.
When constructing your SQL you can do something like the following (0 is the offset, 10 is how many rows to return):
SELECT * FROM `your_table` LIMIT 0, 10
This will display the first 10 results from the database.
Alternative syntax, three queries, showing the first 30 results (1-10, 11-20, 21-30):
SELECT * FROM `your_table` LIMIT 10 OFFSET 0
SELECT * FROM `your_table` LIMIT 10 OFFSET 10
SELECT * FROM `your_table` LIMIT 10 OFFSET 20
Edit:
Okay, to clarify, option 1 is your best bet. Pass in the page number. The limit is the same for each query, and $offset = ($pageNum - 1) * 10;.
You will need an ORDER BY clause. However, if the contents of the database change between page loads, a user might notice discrepancies. It really depends on how frequently your data changes.
I've not tried to store the result of a mysql_query() in the session, but I suspect it won't work the way you are thinking of using it: when the script ends you can consider mysql_close() to be called implicitly, and the resources destroyed.
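Putting the pieces together, a rough sketch of search.php could look like the following. This uses PDO rather than the old mysql_* functions mentioned above, and the contacts table and its columns are only placeholders:

// Sketch only: offset-based pagination for the contact search.
$perPage = 10;
$pageNo  = isset($_POST['pageNo']) ? max(1, (int)$_POST['pageNo']) : 1;
$offset  = ($pageNo - 1) * $perPage;

// LIMIT/OFFSET are integers we computed ourselves, so they can go straight into the SQL;
// the search term still goes through a bound placeholder. A stable ORDER BY keeps pages
// from overlapping between requests (question B).
$sql = sprintf(
    'SELECT id, name, email FROM contacts WHERE name LIKE :term ORDER BY name, id LIMIT %d OFFSET %d',
    $perPage,
    $offset
);
$stmt = $pdo->prepare($sql);
$stmt->execute(array(':term' => '%' . $searchTerm . '%'));
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);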
Sorry, this may be a noob question, but I don't know how to search for this.
Use case
A full-site search function: when the user inputs a keyword and submits the form, the system should search both the title and content of forum posts, blog posts, and products. The search results for all those page types should be displayed in one single list with pagination. The user should also be able to choose to order the results by relevance or recency.
What I did
I am using LAMP. I have data tables for those three page types, and I have made the title and content columns index keys.
I knew that joining the tables would be a very bad idea, so I make three separate queries for searching the forum, blog, and products. I get all the data into PHP, put it into arrays, and write a function that computes a relevance value for every row of the search results. For recency, there is an "updateDate" column in all those tables, so that is fine.
Now I have three nice arrays. I can merge them and sort() them easily, and I can render pagination with array_slice().
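For what it's worth, that merge/sort/slice step could look roughly like this, assuming the three arrays are $forumRows, $blogRows and $productRows and each row already carries the relevance value your function computed (all names are illustrative):

// Rough sketch of the in-PHP approach described above.
$all = array_merge($forumRows, $blogRows, $productRows);

// Sort by relevance, highest first (swap the comparison to updateDate to sort by recency).
usort($all, function ($a, $b) {
    if ($a['relevance'] == $b['relevance']) {
        return 0;
    }
    return ($a['relevance'] < $b['relevance']) ? 1 : -1;
});

// Slice out one page for display.
$perPage  = 20;
$page     = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$pageRows = array_slice($all, ($page - 1) * $perPage, $perPage);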
What makes me frown
Unnecessary performance waste. Yes, what I did handles everything in the use case, but (I don't know how to do better, I am a beginner) I am sure the performance can be a lot better.
After the first query, all the data we need has already been fetched from the database. But with my solution, whenever the user clicks to another page of search results, or changes the "sort by", PHP starts over and does the [SQL queries, relevance function, merge] all over again. Can I somehow store the result array somewhere, so the system can save some work on the next user action?
Most users will not click through every page of search results. I would guess 90% of users will not keep looking after the 10th page, which means (maybe) the first 200 records. So, can I do anything to stop the SQL query at some point instead of fetching all the results?
Furthermore, as traffic grows, some keywords may become common and be searched for repeatedly; what can I do to reduce the repetition of those searches? (Please slap me if you think I am overthinking this.)
Thank you for reading this. Please correct me if my concept is incorrect, or tell me if I missed something in this use case. Thank you, and may God's love be with you.
Edit: I am not using any PHP framework.
Giving you the full story is probably like writing a book. Here are some extracted thoughts:
Fully blown page indicators cost you extra counts over the data set. Just present a "Next" button instead: select ... limit [nr_of_items_per_page + 1], and if (isset($result[nr_of_items_per_page])) output the "Next" button (see the sketch after this list).
These days net traffic costs are not as high as ten years ago and users demand more. Increase your nr_of_items_per_page to 100, 200, 500 (depending on the data size per record).
Zitty Yams' comments work out: I have loaded more than 10000 records in one go to a client and presented them piece by piece, and it just rocks. E.g. a list of 10000 names with 10 characters on average makes just 100000 bytes; most of the images you get on the net are bigger than that. Of course there are limits...
PHP caching via $_SESSION works as well; however, keep in mind that every byte reserved for PHP cannot be dedicated to the database (at least not on a shared server). As long as not all the data in the database fits into memory, it is in most cases more efficient to extend database memory rather than increase PHP caches or OS caches.
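For the first point, the "one extra row" trick might look like this in PHP; the query, the $pdo handle and the $page variable are assumed purely for the example:

// Fetch one row more than we intend to display; the extra row only tells us a next page exists.
$perPage = 100;
$offset  = ($page - 1) * $perPage;

$sql  = sprintf('SELECT id, title FROM blog LIMIT %d OFFSET %d', $perPage + 1, $offset);
$rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);

$hasNext = count($rows) > $perPage;
$rows    = array_slice($rows, 0, $perPage);   // drop the sentinel row before rendering

// ... render $rows ...
if ($hasNext) {
    echo '<a href="?page=' . ($page + 1) . '">Next</a>';
}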
I have a very common problem, but cannot seem to find a good answer for it.
I need to get a page's worth of rows from a table, as well as enough info to paginate that data. So in general I only need a very rough estimate of the total number of rows (all I really need to know is ceil(count()/50)).
So COUNT() is really overkill. And I already have SELECT * FROM table LIMIT 0, 50 running, so if the estimate can be appended to this command, all the better.
I have heard about SQL_CALC_FOUND_ROWS, but I have also heard that it is not particularly more efficient than just doing the count yourself: "Unfortunately, using SQL_CALC_FOUND_ROWS has the nasty consequence of blowing away any LIMIT optimization that might happen."
So, all in all, I kind of think using MySQL's row estimate is the way to go. But I do not know how to do that, or how far off this estimate might be.
Note 1: In my situation, most of the tables I am working with are updated only a few times a day, not all the time.
Note 2: I am using PDO with PHP.
Another interesting idea I found:
A better design is to convert the pager to a “next” link. Assuming there are 20 results per page, the query should then use a LIMIT of 21 rows and display only 20. If the 21st row exists in the results, there’s a next page, and you can render the “next” link.
If you don't need the total count of the table, this is indeed the fastest solution.
This is an old topic that has been beaten to death, many times. COUNT is the fastest way to get the number of rows in a typical table.
But if you never delete anything from the table (which is a weird assumption, but it will work in some cases), then you could simply get the ID of the last row, which may be faster, but not necessarily. This would also fit your need for an estimate, as it most likely won't be exactly correct anyway.
But then again, if you are using MyISAM for example, then nothing beats COUNT (which is true for most cases).
Fairly simple concept: I'm making an extremely basic message board system and I want users to have a post count. I was debating whether to keep a tally in their row that is incremented each time they create a post and decremented each time a post of theirs is deleted. However, I suspect that performing a COUNT query whenever the post count is requested would be more accurate, due to unforeseen circumstances (say a thread gets deleted and it doesn't lower their tally properly). On the other hand, that seems less efficient, since it would run a query EVERY time their post count is loaded, especially when they have 10 posts on the same page and their post count is listed next to each post.
Thoughts/Advice?
Thanks
post_count should definitely be a column in the user table. The little extra effort to get this right is minimal compared to the additional database load you produce by running a few COUNT queries on every thread view.
If you use some sort of ORM or database abstraction, it should be quite simple to add the counting to its create/delete filters.
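The counter maintenance itself is only a pair of statements, along these lines (assuming a post_count column on users and a user_id column on posts):

-- When a post is created:
UPDATE users SET post_count = post_count + 1 WHERE id = ?;

-- When a post (or a whole thread of posts) is deleted:
UPDATE users SET post_count = post_count - 1 WHERE id = ?;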
Just go for COUNT each time. Unless your load is going to be astronomical, COUNT shouldn't be a problem, and it reduces the amount of effort involved in saving and updating data.
Just make sure you put an index on your user_id column, so that you can filter the data with a WHERE clause efficiently.
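Concretely, something along these lines would do it (the table and index names are just examples):

-- One-time: index the column the count filters on.
ALTER TABLE posts ADD INDEX idx_posts_user_id (user_id);

-- Per request: an indexed count like this is cheap.
SELECT COUNT(*) FROM posts WHERE user_id = ?;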
If you get to the point where this doesn't do it for you, you can implement caching strategies, but given that it's a simple message board, you shouldn't encounter that problem for a while.
EDIT:
Just saw your second concern about the same query repeating 10 times on a page. Don't do that :) Just pull the data once and store it in a variable. No need to repeat the same query multiple times.
Just use COUNT. It will be more accurate and will avoid any possible missed cases.
The case you mention of displaying the post count multiple times on a page won't be a problem unless you have an extremely high traffic site.
In any other case, the query cache of your database server will execute the query, then keep a cache of the response until any of the tables that the query relies on change. In the course of a single page load, nothing else should change, so you will only be executing the query once.
If you really need to worry about it, you can just cache it yourself in a variable and just execute the query once.
Generally speaking, your database queries will always be extremely efficient compared to your app logic. As such, the time saved by maintaining a post_count in the user table will most probably be far, far less than the cost of the extra query needed to update the user table whenever a comment is posted.
Also, it is usually considered bad DB structure to have a field such as you are describing.
There are arguments for both, so ultimately it depends on the volume of traffic you expect. If your code is solid and properly layered, you can confidently keep a row count in your users' records without worrying about losing accuracy; over time, COUNT() will potentially get heavy, but updating a row count also adds overhead.
For a small site, it makes next to no difference, so if (and only if) you're a stickler for efficiency, the only way to get a useful answer is to run some benchmarks and find out for yourself. One way or another, it's going to be 3/10ths of 2/8ths of diddley squat, so do whatever feels right :)
It's totally reasonable to store the post counts in a column in your Users table. Then, to ensure that your post counts don't become increasingly inaccurate over time, run a scheduled task (e.g. nightly) to update them based on your Posts table.
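That scheduled correction can be a single statement, for example (assuming users.id matches posts.user_id and the counter column is called post_count):

-- Run nightly (cron) to correct any drift in the stored counters.
UPDATE users
   SET post_count = (SELECT COUNT(*) FROM posts WHERE posts.user_id = users.id);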
Background: I'm working on a system where the developers seem to be using a function which executes a MySQL query like "SELECT MAX(id) AS id FROM TABLE" whenever they need to get the id of the LAST inserted row (the table having an auto_increment column).
I know this is a horrible practice (because concurrent requests will mess up the records), and I'm trying to communicate that to the non-tech/management team, whose response is...
"Oh okay, we'll only face this problem when we have
(a) a lot of users, or
(b) it'll only happen when two people try doing something
at _exactly_ the same time"
I don't disagree with either point, and think we'll run into this problem much sooner than we plan. However, I'm trying to calculate (or figure a mechanism) to calculate how many users should be using the system before we start seeing messed up links.
Any mathematical insights into that? Again, I KNOW it's a horrible practice; I just want to understand the variables in this situation...
Update: Thanks for the comments folks - we're moving in the right direction and getting the code fixed!
The point is not whether potential bad situations are likely; the point is whether they are possible. As long as there's a non-trivial probability of the issue occurring and the issue is known, it should be avoided.
It's not like we're talking about changing a one-line function call into a 5000-line monster to deal with a remotely possible edge case. We're talking about actually shortening the call to a more readable, and more correct, usage.
I kind of agree with @Mark Baker that there is some performance consideration, but since id is a primary key, the MAX query will be very quick. Sure, LAST_INSERT_ID() will be faster (since it's just reading from a session variable), but only by a trivial amount.
And you don't need a lot of users for this to occur. All you need is a lot of concurrent requests (not even that many). If the time between the start of the insert and the start of the select is 50 milliseconds (assuming a transaction safe DB engine), then you only need 20 requests per second to start hitting an issue with this consistently. The point is that the window for error is non-trivial. If you say 20 requests per second (which in reality is not a lot), and assuming that the average person visits one page per minute, you're only talking 1200 users. And that's for it to happen regularly. It could happen once with only 2 users.
And right from the MySQL documentation on the subject:
You can generate sequences without calling LAST_INSERT_ID(), but the utility of using the function this way is that the ID value is maintained in the server as the last automatically generated value. It is multi-user safe because multiple clients can issue the UPDATE statement and get their own sequence value with the SELECT statement (or mysql_insert_id()), without affecting or being affected by other clients that generate their own sequence values.
Instead of using SELECT MAX(id) you should do as the documentation says:
Instead, use the internal MySQL SQL function LAST_INSERT_ID() in an SQL query
Even so, neither SELECT MAX(id) nor mysql_insert_id() is "thread-safe", and you could still have a race condition. The best option you have is to lock the tables before and after your requests, or even better, use transactions.
I don't have the math for it, but I would point out that response (a) is a little silly. Doesn't the company want a lot of users? Isn't that a goal? That response implies that they'd rather solve the problem twice, possibly at great expense the second time, instead of solving it once correctly the first time.
This will happen whenever someone adds something to the table between one user's insert and that query running. So to answer your question, two people using the system is enough for things to go wrong.
At least using LAST_INSERT_ID() will get the last ID for that particular connection, so it won't matter how many new entries have been added in between.
In addition to the risk of getting the wrong ID value returned, there's also the additional database query overhead of SELECT MAX(id), and it's more PHP code to actually execute than a simple mysql_insert_id(). Why deliberately code something to be slow?
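For comparison, here is a sketch of the two approaches with the old mysql_* API the codebase apparently relies on; the orders table and customer_id column are made up for the example:

// Racy: another client's INSERT can slip in between these two queries.
mysql_query("INSERT INTO orders (customer_id) VALUES (42)");
$result = mysql_query("SELECT MAX(id) AS id FROM orders");
$badId  = mysql_result($result, 0, 'id');

// Per the documentation quoted above: mysql_insert_id() returns the AUTO_INCREMENT value
// generated by the last INSERT on THIS connection, independent of other clients.
mysql_query("INSERT INTO orders (customer_id) VALUES (42)");
$goodId = mysql_insert_id();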
I am trying to paginate the results of an SQL query for use on a web page. The language and the database backend are PHP and SQLite.
The code I'm using works something like this (page numbering starts at 0):
http://example.com/table?page=0
page = request(page)
per = 10 // results per page
offset = page * per
// take one extra record so we know if a next link is needed
resultset = query(select columns from table where conditions limit offset, per + 1)
if(page > 0) show a previous link
if(count(resultset) > per) show a next link
unset(resultset[per])
display results
Are there more efficient ways to do pagination than this?
One problem that I can see with my current method is that I must store all 10 (or however many) results in memory before I start displaying them. I do this because PDO does not guarantee that the row count will be available.
Is it more efficient to issue a COUNT(*) query to learn how many rows exist, then stream the results to the browser?
Is this one of those "it depends on the size of your table, and whether the count(*) query requires a full table scan in the database backend", "do some profiling yourself" kind of questions?
I've opted to go with the two-query COUNT(*) method, because it allows me to create a link directly to the last page, which the other method does not. Performing the count first also allows me to stream the results, and so it should work well with higher numbers of records while using less memory.
Consistency between pages is not an issue for me. Thank you for your help.
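For reference, a minimal sketch of that two-query approach with PDO and SQLite; the database path, table and column names are placeholders:

// Sketch: count first so we can link to the last page, then stream one page of rows.
$pdo = new PDO('sqlite:/path/to/app.db');   // placeholder DSN

$per  = 10;
$page = isset($_GET['page']) ? max(0, (int)$_GET['page']) : 0;   // page numbering starts at 0

$total    = (int)$pdo->query('SELECT COUNT(*) FROM items')->fetchColumn();
$lastPage = max(0, (int)ceil($total / $per) - 1);                // enables a direct "last page" link

$stmt = $pdo->prepare('SELECT * FROM items ORDER BY id LIMIT :per OFFSET :off');
$stmt->bindValue(':per', $per, PDO::PARAM_INT);
$stmt->bindValue(':off', $page * $per, PDO::PARAM_INT);
$stmt->execute();

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // display each row as it is fetched, instead of buffering the whole page first
}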
There are several cases where I have a fairly complex (9-12 table join) query returning many thousands of rows which I need to paginate. Obviously, to paginate nicely, you need to know the total size of the result set. With MySQL databases, using the SQL_CALC_FOUND_ROWS directive in the SELECT can help you achieve this easily, although the jury is out on whether that will be more efficient for you.
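In the MySQL case that directive is used like this (your_table and conditions are placeholders); a second query then retrieves the total:

SELECT SQL_CALC_FOUND_ROWS * FROM your_table WHERE conditions LIMIT 0, 10;
SELECT FOUND_ROWS();  -- total number of rows the first query would have matched without the LIMIT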
However, since you are using SQLite, I recommend sticking with the 2 query approach. Here is a very concise thread on the matter.
I'd suggest just doing the count first. A count(primary key) is a very efficient query.
I doubt that it will be a problem for your users to wait for the backend to return ten rows. (You can make it up to them by being good about specifying image dimensions, making the webserver negotiate compressed data transfers when possible, etc.)
I don't think that it will be very useful for you to do a count(*) initially.
If you are up to some complicated coding: when the user is looking at page x, use AJAX-like magic to pre-load page x+1 for an improved user experience.
A general note about pagination:
If the data changes while the user browses through your pages, it may be a problem if your solution demands a very high level of consistency. I've written a note about that elsewhere.