Suggestion for an efficient pagination tutorial? - php

I'm looping over data from MySQL, and it's a pretty long list. What's the most efficient way to do pagination? Currently I am looking at this one: http://www.evolt.org/node/19340
Feel free to recommend a better one. Thanks!

Rather than fetching everything from the DB as in the article, you could SELECT only the rows you can actually display - i.e. if you have 10 items per page, just select 10 - and then select the total number of rows separately. If the DB is large this can be much more efficient, even though it's two queries.
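A minimal sketch of that two-query approach (table and column names are just placeholders):
-- fetch only the rows for the current page, e.g. page 3 with 10 items per page
SELECT id, title FROM items WHERE status = 1 ORDER BY created_at DESC LIMIT 20, 10;
-- separate query for the total, used to work out how many pages there are
SELECT COUNT(*) FROM items WHERE status = 1;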

As Sorin Mocanu said, if you want to order the results by some criterion, such as modified time or some frequency, then sorting can be a big performance penalty. Even though you only need 10 records, MySQL may still need to sort all (maybe millions of) records, unless you make sure an index is used, and used correctly.
Here is an excellent article regarding pagination with MySQL:
http://www.percona.com/ppc2009/PPC2009_mysql_pagination.pdf
Or from the MySQL website:
http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html
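For example, assuming you filter on a status flag and order by modified time (hypothetical table and column names), a composite index lets MySQL read the rows already in order instead of sorting millions of them:
-- the index lists the equality filter first, then the sort column
ALTER TABLE articles ADD INDEX idx_status_modified (status, modified_at);
-- this query can now walk the index instead of doing a filesort
SELECT id, title FROM articles WHERE status = 1 ORDER BY modified_at DESC LIMIT 10;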

I've been using the example in the book Wicked Cool PHP as my starting point. Very neat and well explained IMO.

You want something like
SELECT * FROM your_table WHERE condition = true ORDER BY some_field LIMIT 100, 10
Where 100 is the number of records to skip and 10 is the number of rows to retrieve.
Make sure you have an index covering the condition and ordering fields if you want maximum performance.
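In PHP the offset is usually derived from the requested page number. A minimal sketch, assuming $pdo is an open PDO connection and reusing the placeholder table and column names from above:
// 10 items per page; page number comes from the query string
$perPage = 10;
$page    = max(1, isset($_GET['page']) ? (int) $_GET['page'] : 1);
$offset  = ($page - 1) * $perPage;

$stmt = $pdo->prepare(
    'SELECT * FROM your_table WHERE condition = true
     ORDER BY some_field LIMIT :offset, :limit'
);
// LIMIT arguments must be bound as integers
$stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
$stmt->bindValue(':limit', $perPage, PDO::PARAM_INT);
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);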

This is a really nice function/class to have as part of your standard library. I would strongly recommend you roll your own along these lines:
Query to work out total items (rows).
Code to work out upper & lower limit based on number of items you want to display per page.
Second query LIMIT'ed accordingly.
I'd post some code, but that would take the fun out of it :)

Related

performance issue from 5 queries in one page

I am a junior PHP developer, growing day by day, and I am stuck on a performance problem, described here:
I am making a search engine in PHP. My database has one table with 41 columns and millions of rows, so it is obviously a very large dataset. In index.php I have a form for searching the data. When the user enters a search keyword and hits submit, the form goes to search.php, which shows the results. The query is like this:
SELECT * FROM TABLE WHERE product_description LIKE '%mobile%' ORDER BY id ASC LIMIT 10
That is the first query. After the results are shown, I have to run 4 other queries like these:
SELECT DISTINCT(weight_u) as weight from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(country_unit) as country_unit from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(country) as country from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(hs_code) as hscode from TABLE WHERE product_description LIKE '%mobile%'
These queries are for the FILTERS. The problem is that when I hit the search button, all of these queries run, which costs a lot of performance; it's very slow.
Is there any other, faster method to fetch weight, country, country_unit and hs_code, and how can I achieve it?
The same functionality is implemented here, where the filter bar appears after the table is filled with data. How can I achieve that? Please help.
The full functionality is implemented here.
I have tried to explain my whole problem; if there is any mistake, please let me know and I will improve the question. I am also new to Stack Overflow.
Firstly - are you sure this code is working as you expect? The first query retrieves 10 records matching your search term. Those records might have duplicate weight_u, country_unit, country or hs_code values, and the next 4 queries for your filter scan all matching rows, so it's entirely possible that you will get values back which are not in the first query's results - the filter might not make sense.
If that's true, I would create the filter values in your client code (PHP) - finding the unique values in 10 records is going to be quick and easy, and it reduces the number of database round trips.
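For instance, if the ten matching rows are already in a $results array (a hypothetical variable holding the first query's rows), the filter values can be derived without any extra queries:
// $results holds the ten rows from the first query (hypothetical variable name)
$weights   = array_unique(array_column($results, 'weight_u'));
$units     = array_unique(array_column($results, 'country_unit'));
$countries = array_unique(array_column($results, 'country'));
$hsCodes   = array_unique(array_column($results, 'hs_code'));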
Finally, the biggest improvement you can make is to use MySQL's full-text search features. The reason your app is slow is that your search term can't use an index - you're wild-carding the start as well as the end. It's like searching the phone book for people whose name contains "ishra" - you have to look at every record to check for a match. Full-text search indexes are designed for this - they also help with fuzzy matching.
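A rough sketch of what that could look like (the table name products is a placeholder; InnoDB supports full-text indexes from MySQL 5.6, MyISAM before that):
-- one-off: add a FULLTEXT index on the searched column
ALTER TABLE products ADD FULLTEXT INDEX ft_description (product_description);

-- then search with MATCH ... AGAINST instead of a leading-wildcard LIKE
SELECT * FROM products
WHERE MATCH(product_description) AGAINST('mobile' IN NATURAL LANGUAGE MODE)
ORDER BY id ASC
LIMIT 10;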
I'll give you some tips that will prove useful in many situations when querying a large dataset, or really almost any dataset.
Listing the fields you want instead of querying for '*' is better practice. The impact of this grows as you have more columns and more rows.
Always try to use the PKs to look up the data. The more specific the filter, the less it will cost.
An index in this kind of situation comes in pretty handy, as it will make the search much faster.
LIKE queries are generally slow and resource-heavy, even more so in your situation. So again, the more specific you are, the better.
I'd also add that if you just want to retrieve data from these tables again and again, a VIEW might fit nicely.
Those are just some tips that came to my mind to ease your problem.
Hope it helps.

When should I consider saving the total count in a field?

For example, if I have to count the comments belonging to an article, it's obvious I don't need to cache the comments total.
But what if I want to paginate a gallery (WHERE status = 1) containing 1 million photos? Should I save the count in a table called counts, or is it fine to SELECT COUNT(id) AS total every time?
Are there other solutions?
Please advise. Thanks.
For MySQL, you don't need to store the counts; you can use SQL_CALC_FOUND_ROWS so you don't have to run the filtering query twice.
E.g.,
SELECT SQL_CALC_FOUND_ROWS *
FROM Gallery
WHERE status = 1
LIMIT 10;
SELECT FOUND_ROWS();
From the manual:
In some cases, it is desirable to know how many rows the statement would have returned without the LIMIT, but without running the statement again. To obtain this row count, include a SQL_CALC_FOUND_ROWS option in the SELECT statement, and then invoke FOUND_ROWS() afterward.
Sample usage here.
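A rough PHP/PDO version of that (assuming $pdo is an open connection; both statements must run on the same connection):
// fetch one page and ask MySQL to remember the count without the LIMIT
$rows = $pdo->query(
    'SELECT SQL_CALC_FOUND_ROWS * FROM Gallery WHERE status = 1 LIMIT 10'
)->fetchAll(PDO::FETCH_ASSOC);

// total number of rows the query would have matched without the LIMIT
$total = (int) $pdo->query('SELECT FOUND_ROWS()')->fetchColumn();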
It depends a bit on the number of queries that are run against that table with 1 million records. Consider just taking care of good indexes, especially multi-column indexes (they are easily forgotten). That will do a lot. Also, make sure queries are cached well on your server.
If you need this count very regularly, consider saving it (if it can't be cached by MySQL), as things could become slow. But most of the time good indexing will take care of it.
Best bet: set up some tests to find out whether the query stays fast and performance doesn't drop when you execute it many times in a row.
EXPLAIN [QUERY]
Use that command (in MySQL) to get information about how the query is executed and whether it can be improved.
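For example, using the gallery table from the question:
EXPLAIN SELECT COUNT(*) FROM gallery WHERE status = 1;
-- check the "key" column (which index, if any, is used) and "rows"
-- (roughly how many rows MySQL expects to examine)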
Doing the count every time would be OK.
During paging, you can use SQL_CALC_FOUND_ROWS anyway.
Note:
A denormalized count will become stale.
No one will page through that many items anyway.

Tracking a total count of items over a series of paged results

What is the ideal way to keep track of the total count of items when dealing with paged results?
This seems like a simple question at first but it is slightly more complicated (to me... just bail now if you find this too stupid for words) when I actually start thinking about how to do it efficiently.
I need to get a count of items from the database. This is simple enough. I can then store this count in some variable (a $_SESSION variable, for instance). I can check to see if this variable is set and, if it isn't, get the count again. The tricky part is deciding the best way to determine when I need to get a new count. It seems I would need to get a new count if I have added/deleted items, or if I am reloading or revisiting the grid.
So, how would I decide when to clear this $_SESSION variable? I can see clearing it and getting a new count after an update/delete (or even adding or subtracting to it to avoid the potentially expensive database hit) but (here comes the part I find tricky) what about when someone navigates away from the page or waits a variable amount of time before going to the next page of results or reloads the page?
Since we may be dealing with tens or hundreds of thousands of results, getting a count of them from the database could be quite expensive (right? Or is my assumption incorrect?). Since I need the total count to handle the total number of pages in the paged results... what's the most efficient way to handle this sort of situation and to persist it for... as long as might be needed?
BTW, I would get the count with an SQL query like:
SELECT COUNT(id) FROM foo;
I never use a session variable to store the total found by a query. I ask for the count as part of the regular query that fetches the data, and then read the count itself with a second query:
// first query
SELECT SQL_CALC_FOUND_ROWS * FROM table LIMIT 0, 20;
// I don't actually use * but just select the columns I need...
// second query
SELECT FOUND_ROWS();
I've never noticed any performance degradation because of the second query, but I guess you will have to measure that if you want to be sure.
By the way, I use this with PDO; I haven't tried it in plain MySQL.
Why store it in a session variable? Will the result change per user? I'd rather store it in a user cache like APC or memcached, choose the cache key wisely, and then clear it when inserting or deleting a record related to the query.
A good way to do this would be to use an ORM that does it for you, like Doctrine, which has a result cache.
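A minimal sketch of the cache-the-count idea with APCu (the userland successor to APC); the cache key and TTL are arbitrary choices:
// try the cache first; fall back to the query and store the result
$cacheKey = 'foo_total_count';
$total = apcu_fetch($cacheKey, $hit);
if (!$hit) {
    $total = (int) $pdo->query('SELECT COUNT(id) FROM foo')->fetchColumn();
    apcu_store($cacheKey, $total, 300); // keep for 5 minutes at most
}
// call apcu_delete($cacheKey) wherever rows are inserted into or deleted from foo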
To get the count, I know that using COUNT(*) is not worse than using COUNT(id). (question: Is it even better?)
EDIT: interesting article about this on the MySQL performance blog
Most likely foo has a PRIMARY KEY index defined on the id column. Indexed COUNT() queries are usually quite easy on the DB.
However, if you want to go the extra mile, another option would be to insert a special hook into the code that deals with inserting and deleting rows in foo. Have it write the total number of records into a protected file after each insert/delete and read it from there. If every successful insert/delete gets accounted for, the number in the protected file is always up to date.
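A sketch of such a hook (the file path and function name are made up):
// call with +1 after each successful insert into foo, -1 after each delete
function adjustFooCount($delta, $file = '/var/data/foo_count.txt')
{
    $fp = fopen($file, 'c+'); // create the file if missing, don't truncate it
    if ($fp === false) {
        return;
    }
    flock($fp, LOCK_EX);      // serialise concurrent writers
    $current = (int) stream_get_contents($fp);
    rewind($fp);
    ftruncate($fp, 0);
    fwrite($fp, (string) ($current + $delta));
    fflush($fp);
    flock($fp, LOCK_UN);
    fclose($fp);
}
Reading the count back is then just a matter of (int) file_get_contents($file).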

How to efficiently paginate large datasets with PHP and MySQL?

As some of you may know, use of the LIMIT keyword in MySQL does not preclude it from reading the preceding records.
For example:
SELECT * FROM my_table LIMIT 10000, 20;
Means that MySQL will still read the first 10,000 records and throw them away before producing the 20 we are after.
So, when paginating a large dataset, high page numbers mean long load times.
Does anyone know of any existing pagination class/technique/methodology that can paginate large datasets in a more efficient way i.e. that does not rely on the LIMIT MySQL keyword?
In PHP if possible as that is the weapon of choice at my company.
Cheers.
First of all, if you want to paginate, you absolutely have to have an ORDER BY clause. Then you simply have to use that clause to dig deeper in your data set. For example, consider this:
SELECT * FROM my_table ORDER BY id LIMIT 20
You'll have the first 20 records, let's say their id's are: 5,8,9,...,55,64. Your pagination link to page 2 will look like "list.php?page=2&id=64" and your query will be
SELECT * FROM my_table WHERE id > 64 ORDER BY id LIMIT 20
No offset, only 20 records read. It doesn't allow you to jump arbitrarily to any page, but most of the time people just browse the next/prev page. An index on "id" will improve the performance, even with big OFFSET values.
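On the PHP side, the "next" link only needs to carry the last id of the current page (a sketch; variable names are illustrative):
// $rows is the current page of results, already ordered by id
$lastId   = $rows ? end($rows)['id'] : 0;
$nextLink = 'list.php?page=' . ($page + 1) . '&id=' . $lastId;
// the next request then runs: SELECT * FROM my_table WHERE id > $lastId ORDER BY id LIMIT 20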
A solution might be to not use the limit clause, and use a join instead -- joining on a table used as some kind of sequence.
For more information, I found this question/answer on SO, which gives an example - that might help you ;-)
There are basically 3 approaches to this, each of which has its own trade-offs:
Send all 10000 records to the client, and handle pagination client-side via Javascript or the like. Obvious benefit is that only a single query is necessary for all of the records; obvious downside is that if the record size is in any way significant, the size of the page sent to the browser will be of proportionate size - and the user might not actually care about the full record set.
Do what you're currently doing, namely SQL LIMIT and grab only the records you need with each request, completely stateless. Benefit in that it only sends the records for the page currently requested, so requests are small, downsides in that a) it requires a server request for each page, and b) it's slower as the number of records/pages increases for later pages in the result, as you mentioned. Using a JOIN or a WHERE clause on a monotonically increasing id field can sometimes help in this regard, specifically if you're requesting results from a static table as opposed to a dynamic query.
Maintain some sort of state object on the server which caches the query results and can be referenced in future requests for a limited period of time. Upside is that it has the best query speed, since the actual query only needs to run once; downside is having to manage/store/cleanup those state objects (especially nasty for high-traffic websites).
SELECT * FROM my_table LIMIT 10000, 20;
means: show 20 records, starting from record #10000 in the result set. If you are using primary keys in the WHERE clause, there will not be a heavy load on MySQL.
Any other method of pagination, such as using a join, will put a much heavier load on the server.
I'm not aware of the performance decrease you've mentioned, and I don't know of any other solution for pagination; however, an ORDER BY clause might help you reduce the load time.
The best way is to define an index field in my_table and, for every newly inserted row, increment this field. Then you can use WHERE YOUR_INDEX_FIELD BETWEEN 10000 AND 10020.
It will be much faster.
Some other options:
Partition the table per page, so the LIMIT can be ignored.
Store the results in a session (a good idea would be to create a hash of that data using md5, then use that hash as the cache key across multiple users).
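A rough sketch of that session idea (the key scheme is just one possibility; $sql, $page and $pdo are assumed to already exist):
// cache one page of results in the session, keyed by a hash of the query + page
$cacheKey = 'results_' . md5($sql . '|' . $page);

if (!isset($_SESSION[$cacheKey])) {
    $_SESSION[$cacheKey] = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
}
$rows = $_SESSION[$cacheKey];
Keep in mind that PHP sessions are per visitor, so this only saves repeated queries for the same user; a shared cache such as memcached would be needed to share results across users.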

What is the best way to paginate results in php

I need to display many pages of news in a site. Should I do the pagination in the database query using LIMIT or with the PHP script after getting all the results?
Use limit in SQL! Every time!
Otherwise you're throwing around considerably more data than you need to, which makes your scripts unnecessarily slow, and will lead to scalability problems as the amount of data in your tables increases.
Limit is your friend!
Use limit - you don't want to transfer masses of data from the database to the scripting engine if you can avoid it.
If you only ever want to work with a DBMS that supports this, then do it in the DBMS. If you want to support other DBMSs in the future, then add a layer in between that can handle it depending on the current DBMS.
You can use some existing libraries to help you:
Pear::Pager can help with the output, and to limit the database traffic to only what you need, you can use a wrapper provided in the examples that come with it.
Here's a tutorial I just googled that has it all...
In addition to using LIMIT, I'd suggest using an explicit WHERE clause to set the offset, and order the results on that column. For example:
--- First page (showing first 50 records)
SELECT * FROM people ORDER BY id LIMIT 50
--- Second page
SELECT * FROM people WHERE id > 50 ORDER BY id LIMIT 50
This further limits the number of rows returned to those within the desired range. Using the WHERE approach (as opposed to a LIMIT clause with a separate offset, e.g. LIMIT 50,50) also lets you deal effectively with paging through records by other natural keys, e.g. alphabetically by name, or in date order.
Personally, I would use the query to do it. Obviously, that can change if you're dealing with AJAX and such, but just doing a basic LIMIT in the query and outputting the results is simple and efficient.
