Select is much slower when selecting latest records - php

A table with about 70K records is displayed on a site, showing 50 records per page.
Pagination is done with limit offset,50 on the query, and the records can be ordered on different columns.
Browsing the latest pages (so the offset is around 60,000) makes the queries much slower than when browsing the first pages (about 10x)
Is this an issue of using the limit command?
Are there other ways to get the same results?

With large offsets, MySQL needs to browse more records.
Even if the plan uses filesort (which means that all records should be browsed), MySQL optimizes it so that only $offset + $limit top records are sorted, which makes it much more efficient for lower values of $offset.
The typical solution is to index the columns you are ordering on, record the last value of the columns and reuse it in the subsequent queries, like this:
SELECT *
FROM mytable
ORDER BY
value, id
LIMIT 0, 10
which outputs:
value id
1 234
3 57
4 186
5 457
6 367
8 681
10 366
13 26
15 765
17 345 -- this is the last one
To get to the next page, you would use:
SELECT *
FROM mytable
WHERE (value, id) > (17, 345)
ORDER BY
value, id
LIMIT 0, 10
, which uses the index on (value, id).
Of course this won't help with arbitrary access pages, but helps with sequential browsing.
Also, MySQL has certain issues with late row lookup. If the columns are indexed, it may be worth trying to rewrite your query like this:
SELECT *
FROM (
SELECT id
FROM mytable
ORDER BY
value, id
LIMIT $offset, $limit
) q
JOIN mytable m
ON m.id = q.id
See this article for more detailed explanations:
MySQL ORDER BY / LIMIT performance: late row lookups

It's how MySQL deals with limits. If it can sort on an index (and the query is simple enough) it can stop searching after finding the first offset + limit rows. So LIMIT 0,10 means that if the query is simple enough, it may only need to scan 10 rows. But LIMIT 1000,10 means that at minimum it needs to scan 1010 rows. Of course, the actual number of rows that need to be scanned depend on a host of other factors. But the point here is that the lower the limit + offset, the lower that the lower-bound on the number of rows that need to be scanned is...
As for workarounds, I would optimize your queries so that the query itself without the LIMIT clause is as efficient as possible. EXPLAIN is you friend in this case...

Related

MYSQL from the 20th row to the last row [duplicate]

I would like to construct a query that displays all the results in a table, but is offset by 5 from the start of the table. As far as I can tell, MySQL's LIMIT requires a limit as well as an offset. Is there any way to do this?
From the MySQL Manual on LIMIT:
To retrieve all rows from a certain
offset up to the end of the result
set, you can use some large number for
the second parameter. This statement
retrieves all rows from the 96th row
to the last:
SELECT * FROM tbl LIMIT 95, 18446744073709551615;
As you mentioned it LIMIT is required, so you need to use the biggest limit possible, which is 18446744073709551615 (maximum of unsigned BIGINT)
SELECT * FROM somewhere LIMIT 18446744073709551610 OFFSET 5
As noted in other answers, MySQL suggests using 18446744073709551615 as the number of records in the limit, but consider this: What would you do if you got 18,446,744,073,709,551,615 records back? In fact, what would you do if you got 1,000,000,000 records?
Maybe you do want more than one billion records, but my point is that there is some limit on the number you want, and it is less than 18 quintillion. For the sake of stability, optimization, and possibly usability, I would suggest putting some meaningful limit on the query. This would also reduce confusion for anyone who has never seen that magical looking number, and have the added benefit of communicating at least how many records you are willing to handle at once.
If you really must get all 18 quintillion records from your database, maybe what you really want is to grab them in increments of 100 million and loop 184 billion times.
Another approach would be to select an autoimcremented column and then filter it using HAVING.
SET #a := 0;
select #a:=#a + 1 AS counter, table.* FROM table
HAVING counter > 4
But I would probably stick with the high limit approach.
As others mentioned, from the MySQL manual. In order to achieve that, you can use the maximum value of an unsigned big int, that is this awful number (18446744073709551615). But to make it a little bit less messy you can the tilde "~" bitwise operator.
LIMIT 95, ~0
it works as a bitwise negation. The result of "~0" is 18446744073709551615.
You can use a MySQL statement with LIMIT:
START TRANSACTION;
SET #my_offset = 5;
SET #rows = (SELECT COUNT(*) FROM my_table);
PREPARE statement FROM 'SELECT * FROM my_table LIMIT ? OFFSET ?';
EXECUTE statement USING #rows, #my_offset;
COMMIT;
Tested in MySQL 5.5.44. Thus, we can avoid the insertion of the number 18446744073709551615.
note: the transaction makes sure that the variable #rows is in agreement to the table considered in the execution of statement.
I ran into a very similar issue when practicing LC#1321, in which I have to select all the dates but the first 6 dates are skipped.
I achieved this in MySQL with the help of ROW_NUMBER() window function and subquery. For example, the following query returns all the results with the first five rows skipped:
SELECT
fieldname1,
fieldname2
FROM(
SELECT
*,
ROW_NUMBER() OVER() row_num
FROM
mytable
) tmp
WHERE
row_num > 5;
You may need to add some more logics in the subquery, especially in OVER() to fit your need. In addition, RANK()/DENSE_RANK() window functions may be used instead of ROW_NUMBER() depending on your real offset logic.
Reference:
MySQL 8.0 Reference Manual - ROW_NUMBER()
Just today I was reading about the best way to get huge amounts of data (more than a million rows) from a mysql table. One way is, as suggested, using LIMIT x,y where x is the offset and y the last row you want returned. However, as I found out, it isn't the most efficient way to do so. If you have an autoincrement column, you can as easily use a SELECT statement with a WHERE clause saying from which record you'd like to start.
For example,
SELECT * FROM table_name WHERE id > x;
It seems that mysql gets all results when you use LIMIT and then only shows you the records that fit in the offset: not the best for performance.
Source: Answer to this question MySQL Forums. Just take note, the question is about 6 years old.
I know that this is old but I didnt see a similar response so this is the solution I would use.
First, I would execute a count query on the table to see how many records exist. This query is fast and normally the execution time is negligible. Something like:
SELECT COUNT(*) FROM table_name;
Then I would build my query using the result I got from count as my limit (since that is the maximum number of rows the table could possibly return). Something like:
SELECT * FROM table_name LIMIT count_result OFFSET desired_offset;
Or possibly something like:
SELECT * FROM table_name LIMIT desired_offset, count_result;
Of course, if necessary, you could subtract desired_offset from count_result to get an actual, accurate value to supply as the limit. Passing the "18446744073709551610" value just doesnt make sense if I can actually determine an appropriate limit to provide.
WHERE .... AND id > <YOUROFFSET>
id can be any autoincremented or unique numerical column you have...

MySQL Random rows in neighbourhood [duplicate]

This question already has answers here:
Doing a while / loop to get 10 random results
(3 answers)
Closed 9 years ago.
I have this table (PERSONS) with 25M rows:
ID int(10) PK
points int(6) INDEX
some other columns
I want to show the user 4 random rows which are somewhat close to each other in points. I found this query after some searching and tuning to generate random rows which is impressive fast:
SELECT person_id, points
FROM persons AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(person_id)
FROM persons)) AS id)
AS r2
WHERE r1.person_id>= r2.id and points > 0
ORDER BY r1.person_id ASC
LIMIT 4
So I query this in the PHP. Which gives me great and fast results (below 0.05 seconds when warmed up). But these rows are really just random (with at least 1 point since the points > 0). I would like to show some rows which are a little bit close, doesn't have to be every time, but let's say I do this query with limit 50 and than select a random row in PHP and the 3 closest rows (based on points) next to it. I would think you would need to sort the result, pick a random row and show the rows after/before it. But i have no idea how I can make this, since I am quite new to PHP.
Anyone suggestions, all feedback is welcome :)
Build an index on your points column (if it does not already exist), then perform your randomisation logic on that:
ALTER TABLE persons ADD INDEX (points);
SELECT person_id, points
FROM persons JOIN (
SELECT RAND() * MAX(points) AS pivot
FROM persons
WHERE points > 0
) t ON t.pivot <= points
ORDER BY points
LIMIT 4
Note that this approach will select the pivot using a uniform probability distribution over the range of points values; if points are very non-uniform, you can end up pivoting on some values a lot more often than others (thereby resulting in seemingly "non-random" outcomes).
To resolve that, you can select a random record by a more uniformly distributed column (maybe person_id?) and then use the points value of that random record as the pivot; that is, substitute the following for the subquery in the above statement:
SELECT points AS pivot
FROM persons JOIN (
SELECT FLOOR(
MIN(person_id)
+ RAND() * (MAX(person_id)-MIN(person_id))
) AS random
FROM persons
               WHERE  points > 0
) r ON r.random <= person_id
WHERE points > 0
ORDER BY person_id
LIMIT 1
Removing a subquery from it will drasticly improve the performance and caching so you could for example get list your IDs, put it in a file and then random from it (for example by reading random lines from file). This will improve it by a whole lot, as you can see if you will run EXPLAIN on this query and compare it by changing the query to load just data for the 4 (still random) ids.
I would suggest doing two separate sql queries in PHP and not join/subquery them. In many cases the optimizer can not simplify your query and has to perform each one separatly. So, in your case. if you have 1000 persons the optimizer will do the following wueries at worst case:
Get 1000 persons rows
Do Sub Select for each person which get's 1000 persons rows
Join 1000 persons with joined rows resulting in 1.000.000 rows
Filter all of them
In short:
1001 queries with 1.000.000 rows
My advice?
Perform two queries and NO joins or sub-selects as both (especially in combination have dramatic performance drops in most cases)
SELECT person_id, points
FROM persons
ORDER BY RAND() LIMIT 1
Now use the found points for your second query
SELECT person_id, points, ABS(points - <POINTS FROM ABOVE>) AS distance
FROM persons
ORDER BY distance ASC LIMIT 4

Getting random results from large tables

I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table that are filtered by category.
Now, as you would imagine doing random sorting on a table this large causes the queries to take a few seconds, which is not ideal.
One other method I thought of for the non-filtered result set would be to just get PHP to select some random numbers between 1 - 7,000,000 or so and then do an IN(...) with the query to only grab those rows - and yes, I know that this method has a caveat in that you may get less than 4 if a record with that id no longer exists.
However, the above method obviously will not work with the category filtering as PHP doesn't know which record numbers belong to which category and hence cannot select the record numbers to select from.
Are there any better ways I can do this? Only way I can think of would be to store the record id's for each category in another table and then select random results from that and then select only those record ID's from the main table in a secondary query; but I'm sure there is a better way!?
You could of course use the RAND() function on a query using a LIMIT and WHERE (for the category). That however as you pointed out, entails a scan of the database which takes time, especially in your case due to the volume of data.
Your other alternative, again as you pointed out, to store id/category_id in another table might prove a bit faster but again there has to be a LIMIT and WHERE on that table which will also contain the same amount of records as the master table.
A different approach (if applicable) would be to have a table per category and store in that the IDs. If your categories are fixed or do not change that often, then you should be able to use that approach. In that case you will effectively remove the WHERE from the clause and getting a RAND() with a LIMIT on each category table would be faster since each category table will contain a subset of records from your main table.
Some other alternatives would be to use a key/value pair database just for that operation. MongoDb or Google AppEngine can help with that and are really fast.
You could also go towards the approach of a Master/Slave in your MySQL. The slave replicates content in real time but when you need to perform the expensive query you query the slave instead of the master, thus passing the load to a different machine.
Finally you could go with Sphinx which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offset this expensive operation to a different layer and let MySQL continue with other operations.
Just some issues to consider.
Working off your random number approach
Get the max id in the database.
Create a temp table to store your matches.
Loop n times doing the following
Generate a random number between 1 and maxId
Get the first record with a record Id greater than the random number and insert it into your temp table
Your temp table now contains your random results.
Or you could dynamically generate sql with a union to do the query in one step.
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
Note: my sql may not be valid, as I'm not a mySql guy, but the theory should be sound
First you need to get number of rows ... something like this
select count(1) from tbl where category = ?
then select a random number
$offset = rand(1,$rowsNum);
and select a row with offset
select * FROM tbl LIMIT $offset, 1
in this way you avoid missing ids. The only problem is you need to run second query several times. Union may help in this case.
For MySQl you can use
RAND()
SELECT column FROM table
ORDER BY RAND()
LIMIT 4

QUERY speed with limit and milion records

Hi i have a 7milion records db table for testing query speed.
I tested up my 2 queries which are the same query with different limit parametres:
query 1 -
SELECT *
FROM table
LIMIT 20, 50;
query 2 -
SELECT *
FROM table
LIMIT 6000000, 6000030;
query exec times are:
query 1 - 0.006 sec
query 2 - 5.500 sec
In both of these queries, I am fetching same number of records, but in the second case it's taking more time. Can someone please explain the reasons behind this?
Without looking into it too closely, my assumption is that this occurs because the first query only has to read to the 50th record to return results, whereas the second query has to read six million before returning results. Basically, the first query just shorts out quicker.
I would assume that this has an incredible amount to do with the makeup of the table - field types and keys, etc.
If a record is made up of fixed-length fields (e.g. CHAR vs. VARCHAR), then the DBMS can just calculate where the nth record starts and jumps there. If its variable length, then you would have to read the records to determine where the nth record starts. Similarly, I'd further assume that tables which have appropriate primary keys would be quicker to query than those without such keys.
I think the slowdown is tied to the fact you are using limits with offsets and are querying the table with no additional context for indexing. Its possible the first is just faster because it can get to the offset quicker.
It's the difference between returning 50 rows and 6000030 rows (or ~1million rows since you said there were only 7million rows).
With two arguments, the first argument specifies the offset of the
first row to return, and the second specifies the maximum number of
rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
http://dev.mysql.com/doc/refman/5.0/en/select.html
Also, I think you're looking for 30 row pages so your queries should be using 30 as the second parameter in the limit clause.
SELECT *
FROM table
LIMIT 20, 30;
SELECT *
FROM table
LIMIT 6000000, 30;

Is it possible to have 2 limits in a MySQL query?

Ok here is the situation (using PHP/MySQL) you are getting results from a large mysql table,
lets say your mysql query returns 10,000 matching results and you have a paging script to show 20 results per page, your query might look like this
So page 1 query
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse=something else
LIMIT 0,20
So page 2 query
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse=something else
LIMIT 20,40
Now you are grabbing 20 results at a time but there are a total of 10,000 rows that match your search criteria,
How can you only return 3,000 of the 10,000 results and still do your paging of 20 per page with a LIMIT 20 in your query?
I thought this was impossible but myspace does it on there browse page somehow, I know they aren't using php/mysql but how can it be achieved?
UPDATE
I see some people have replied with a couple of methods, it seems none of these would actually improve the performance by limiting the number to 3000?
Program your PHP so that when it finds itself ready to issue a query that ends with LIMIT 3000, 20 or higher, it would just stop and don't issue the query.
Or I am missing something?
Update:
MySQL treats LIMIT clause nicely.
Unless you have SQL_CALC_FOUND_ROWS in your query, MySQL just stops parsing results, sorting etc. as soon as it finds enough records to satisfy your query.
When you have something like that:
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse='something else'
LIMIT 0, 20
, MySQL will fetch first 20 records that satisfy the criteria and stop.
Doesn't matter how many records match the criteria: 50 or 1,000,000, performance will be the same.
If you add an ORDER BY to your query and don't have an index, then MySQL will of course need to browse all the records to find out the first 20.
However, even in this case it will not sort all 10,000: it will have a "running window" of top 20 records and sort only within this window as soon as it finds a record with value large (or small) enough to get into the window.
This is much faster than sorting the whole myriad.
MySQL, however, is not good in pipelining recorsets. This means that this query:
SELECT column
FROM (
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse='something else'
LIMIT 3000
)
LIMIT 0, 20
is worse performance-wise than the first one.
MySQL will fetch 3,000 records, cache them in a temporary table (or in memory) and apply the outer LIMIT only after that.
Firstly, the LIMIT paramters are Offset and number of records, so the second parameter should always be 20 - you don't need to increment this.
Surely if you know the upper limit of rows you want to retrieve, you can just put this into your logic which runs the query, i.e. check that Offset + Limit <= 3000
As Sohnee said, or (depending on your requirements) you can get all the 3000 records by SQL and then use array_slice in php to get chunks of the array.
You could achieve this with a subquery...
SELECT name FROM (
SELECT name FROM tblname LIMIT 0, 3000
) `Results` LIMIT 20, 40
Or with a temporary table, whereby you select all 3000 rows into a temp table then page by the temporary row id, which will be sequential.
You can specify the limit as a function of the page number (20*p, 20*p+2) in your php code, and limit the value of the page number to 150.
Or you could get the 3000 records and them using jquery tabs, split the records on 20 per page.

Categories