OK, here is the situation (using PHP/MySQL): you are getting results from a large MySQL table.
Let's say your MySQL query returns 10,000 matching results and you have a paging script to show 20 results per page, so your queries might look like this.
So page 1 query
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse='something else'
LIMIT 0,20
So page 2 query
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse='something else'
LIMIT 20,40
Now you are grabbing 20 results at a time, but there are a total of 10,000 rows that match your search criteria.
How can you return only 3,000 of the 10,000 results and still do your paging of 20 per page with a LIMIT 20 in your query?
I thought this was impossible, but MySpace does it somehow on their browse page. I know they aren't using PHP/MySQL, but how can it be achieved?
UPDATE
I see some people have replied with a couple of methods. It seems none of these would actually improve performance by limiting the number to 3,000?
Program your PHP so that when it finds itself ready to issue a query that ends with LIMIT 3000, 20 or higher, it just stops and doesn't issue the query.
Or am I missing something?
Update:
MySQL treats the LIMIT clause nicely.
Unless you have SQL_CALC_FOUND_ROWS in your query, MySQL just stops scanning, sorting, etc. as soon as it has found enough records to satisfy your query.
When you have something like this:
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse='something else'
LIMIT 0, 20
, MySQL will fetch the first 20 records that satisfy the criteria and stop.
It doesn't matter whether 50 or 1,000,000 records match the criteria: performance will be the same.
If you add an ORDER BY to your query and don't have an index, then MySQL will of course need to browse all the records to find the first 20.
However, even in this case it will not sort all 10,000: it will keep a "running window" of the top 20 records and sort within this window only when it finds a record with a value large (or small) enough to get into the window.
This is much faster than sorting the whole lot.
MySQL, however, is not good at pipelining recordsets. This means that this query:
SELECT column
FROM (
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse='something else'
LIMIT 3000
) q
LIMIT 0, 20
is worse performance-wise than the first one.
MySQL will fetch 3,000 records, cache them in a temporary table (or in memory) and apply the outer LIMIT only after that.
Firstly, the LIMIT parameters are the offset and the number of records, so the second parameter should always be 20; you don't need to increment it.
Surely if you know the upper limit on the number of rows you want to retrieve, you can just build this into the logic that runs the query, i.e. check that offset + limit <= 3000.
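For example, a minimal PHP sketch of that check (a PDO connection in $pdo, a 0-based $page, and the column/table names are all assumptions):
$perPage = 20;
$maxRows = 3000;                               // hard cap on how deep paging may go
$offset  = $page * $perPage;
if ($offset + $perPage > $maxRows) {
    $rows = [];                                // past the cap: skip the query entirely
} else {
    $stmt = $pdo->prepare(
        'SELECT column_name FROM table_name
         WHERE userId = ? AND somethingelse = ?
         LIMIT ' . (int)$offset . ', ' . $perPage
    );
    $stmt->execute([1, 'something else']);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
}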
As Sohnee said; or (depending on your requirements) you can fetch all 3,000 records with SQL and then use array_slice in PHP to get chunks of the array.
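A minimal sketch of the array_slice variant (again, $pdo, a 0-based $page, and the names are assumptions; note this transfers up to 3,000 rows on every request):
$stmt = $pdo->prepare(
    'SELECT column_name FROM table_name
     WHERE userId = ? AND somethingelse = ?
     LIMIT 3000'
);
$stmt->execute([1, 'something else']);
$all  = $stmt->fetchAll(PDO::FETCH_ASSOC);   // at most 3,000 rows held in memory
$rows = array_slice($all, $page * 20, 20);   // the 20 rows for the requested page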
You could achieve this with a subquery...
SELECT name FROM (
SELECT name FROM tblname LIMIT 0, 3000
) `Results` LIMIT 20, 20
Or with a temporary table, whereby you select all 3000 rows into a temp table then page by the temporary row id, which will be sequential.
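Something along these lines, as a rough sketch (the table and column names are made up):
CREATE TEMPORARY TABLE results (
    rowId INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    column_name VARCHAR(255)
);
INSERT INTO results (column_name)
SELECT column_name FROM table_name
WHERE userId = 1 AND somethingelse = 'something else'
LIMIT 3000;
-- Page 2 (rows 21-40) by the sequential temporary row id:
SELECT column_name FROM results WHERE rowId BETWEEN 21 AND 40;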
You can specify the LIMIT as a function of the page number (LIMIT 20*p, 20) in your PHP code, and cap the page number at 150.
Or you could fetch the 3,000 records and then, using jQuery tabs, split the records into 20 per page.
I was using ORDER BY RAND() to fetch random rows from the database without any issue, but I realised that as the database grows this causes a heavy load on the server. So I looked for an alternative and tried generating one random number with PHP's rand() function and using it as the id in the MySQL query, which was very fast since MySQL already knows the row id.
The issue is that in my table not all ids are present; for example 1, 2, 5, 9, 12 and so on.
If PHP's rand() generates 3, 4, etc., the query returns nothing because there is no row with that id.
What is the best way to generate random numbers, preferably from PHP, so that they only hit ids that actually exist in that table? Please advise.
$id23=rand(1,100000000);
SELECT items FROM tablea where status='0' and id='$id23' LIMIT 1
The above query is fast, but it sometimes generates an id which is not available in the database.
SELECT items FROM tablea where status=0 order by rand() LIMIT 1
The above query is too slow and causes a heavy load on the server.
First of all, generate a random value from 1 to MAX(id), not 100,000,000.
Then there are at least a couple of good solutions:
Use > not =
SELECT items FROM tablea where status='0' and id>'$id23' LIMIT 1
Create an index on (status,id,items) to make this an index-only query.
Use =, but just try again with a different random value if you don't find a hit. Sometimes it will take several tries, but often it will take only one try. The = should be faster since it can use the primary key. And if it's faster and gets it in one try 90% of the time, that could make up for the other 10% of the time when it takes more than one try. Depends on how many gaps you have in your id values.
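A rough sketch of that retry loop in PHP (assuming a PDO connection in $pdo; the cap of 20 tries is just a safety guard I've added):
$maxId = (int)$pdo->query('SELECT MAX(id) FROM tablea')->fetchColumn();
$row = false;
for ($try = 0; $try < 20 && $row === false; $try++) {
    $id = rand(1, $maxId);                       // random candidate id
    $stmt = $pdo->prepare("SELECT items FROM tablea WHERE status = '0' AND id = ? LIMIT 1");
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);       // false when that id is a gap
}
Each attempt is a cheap primary-key lookup, so even a handful of retries is normally faster than ORDER BY RAND() on a big table.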
Use your DB to find the max value from the table, generate a random number less than or equal to that value, grab the first row in which the id is greater than or equal to your random number. No PHP necessary.
SELECT items
FROM tablea
WHERE status = '0' and
id >= FLOOR(1 + RAND() * (SELECT MAX(id) FROM tablea))
LIMIT 1
You are correct, ORDER BY RAND() is not a good solution if you are dealing with large datasets. Depending on how often the order needs to be re-randomized, what you can do is add a column holding a random number and then update that number at some predefined interval.
You would take that column and use it as your sort index. This works well for a heavy-read environment and produces a predictable random order for a certain period of time.
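As a sketch, assuming you add a column named rand_sort for this purpose:
-- One-off setup:
ALTER TABLE tablea ADD COLUMN rand_sort DOUBLE, ADD INDEX idx_rand_sort (rand_sort);
-- Re-shuffle at whatever interval suits you (e.g. from a cron job):
UPDATE tablea SET rand_sort = RAND();
-- Reads then sort on the pre-computed column instead of ORDER BY RAND():
SELECT items FROM tablea WHERE status = '0' ORDER BY rand_sort LIMIT 1;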
A possible solution is to use LIMIT with a random offset:
$id23 = rand(0, $numberOfRows - 1);
SELECT items FROM tablea WHERE status='0' LIMIT $id23, 1
This won't produce any missed rows, but (as hek2mgl mentioned) it requires knowing the number of rows in the result set.
I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table that are filtered by category.
Now, as you would imagine doing random sorting on a table this large causes the queries to take a few seconds, which is not ideal.
One other method I thought of for the non-filtered result set would be to get PHP to pick some random numbers between 1 and 7,000,000 or so and then do an IN(...) in the query to grab only those rows; and yes, I know this method has a caveat in that you may get fewer than 4 rows if a record with one of those ids no longer exists.
However, the above method obviously will not work with the category filtering, as PHP doesn't know which record ids belong to which category and hence cannot choose which ids to pick.
Are there any better ways I can do this? The only way I can think of would be to store the record IDs for each category in another table, select random IDs from that, and then fetch those records from the main table in a secondary query; but I'm sure there is a better way!?
You could of course use the RAND() function in a query with a LIMIT and a WHERE (for the category). That, however, as you pointed out, entails a scan of the table, which takes time, especially in your case due to the volume of data.
Your other alternative, again as you pointed out, of storing id/category_id in another table might prove a bit faster, but again there has to be a LIMIT and WHERE on that table, which will also contain the same number of records as the master table.
A different approach (if applicable) would be to have a table per category and store in that the IDs. If your categories are fixed or do not change that often, then you should be able to use that approach. In that case you will effectively remove the WHERE from the clause and getting a RAND() with a LIMIT on each category table would be faster since each category table will contain a subset of records from your main table.
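A very rough sketch of that layout (the table names are made up): keep one small id table per category and randomize over that before joining back to the master table by primary key.
CREATE TABLE category_news_ids (
    id INT UNSIGNED NOT NULL PRIMARY KEY
);
SELECT m.*
FROM (SELECT id FROM category_news_ids ORDER BY RAND() LIMIT 4) r
JOIN master_table m ON m.id = r.id;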
Some other alternatives would be to use a key/value pair database just for that operation. MongoDB or Google App Engine can help with that and are really fast.
You could also go towards the approach of a Master/Slave in your MySQL. The slave replicates content in real time but when you need to perform the expensive query you query the slave instead of the master, thus passing the load to a different machine.
Finally you could go with Sphinx which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offset this expensive operation to a different layer and let MySQL continue with other operations.
Just some issues to consider.
Working off your random number approach
Get the max id in the database.
Create a temp table to store your matches.
Loop n times doing the following
Generate a random number between 1 and maxId
Get the first record with a record Id greater than the random number and insert it into your temp table
Your temp table now contains your random results.
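A rough PHP sketch of that loop (the PDO connection, table, and column names are assumptions; INSERT IGNORE simply skips the rare case where the same row is picked twice):
$maxId = (int)$pdo->query('SELECT MAX(id) FROM myTable')->fetchColumn();
$pdo->exec('CREATE TEMPORARY TABLE random_picks LIKE myTable');
for ($i = 0; $i < 4; $i++) {
    $r = rand(1, $maxId);
    $pdo->exec("INSERT IGNORE INTO random_picks
                SELECT * FROM myTable WHERE id >= $r ORDER BY id LIMIT 1");
}
$rows = $pdo->query('SELECT * FROM random_picks')->fetchAll(PDO::FETCH_ASSOC);
For the category-filtered version you would add the category condition to the WHERE clause inside the loop.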
Or you could dynamically generate SQL with a UNION to do the query in one step.
(SELECT * FROM myTable WHERE id >= FLOOR(1 + RAND() * (SELECT MAX(id) FROM myTable)) AND category = 'zzz' LIMIT 1)
UNION
(SELECT * FROM myTable WHERE id >= FLOOR(1 + RAND() * (SELECT MAX(id) FROM myTable)) AND category = 'zzz' LIMIT 1)
UNION
(SELECT * FROM myTable WHERE id >= FLOOR(1 + RAND() * (SELECT MAX(id) FROM myTable)) AND category = 'zzz' LIMIT 1)
UNION
(SELECT * FROM myTable WHERE id >= FLOOR(1 + RAND() * (SELECT MAX(id) FROM myTable)) AND category = 'zzz' LIMIT 1)
Note: my SQL may not be valid, as I'm not a MySQL guy, but the theory should be sound.
First you need to get the number of rows, something like this:
select count(1) from tbl where category = ?
then select a random number
$offset = rand(0, $rowsNum - 1);
and then select a row at that offset:
SELECT * FROM tbl WHERE category = ? LIMIT $offset, 1
This way you avoid missing ids. The only problem is that you need to run the second query once per random row you want. A UNION may help in this case.
For MySQL you can use RAND():
SELECT column FROM table
ORDER BY RAND()
LIMIT 4
Hi, I have a 7-million-record table for testing query speed.
I tested 2 queries, which are the same query with different LIMIT parameters:
query 1 -
SELECT *
FROM table
LIMIT 20, 50;
query 2 -
SELECT *
FROM table
LIMIT 6000000, 6000030;
query exec times are:
query 1 - 0.006 sec
query 2 - 5.500 sec
In both of these queries I am fetching the same number of records, but the second one takes much more time. Can someone please explain the reasons behind this?
Without looking into it too closely, my assumption is that this occurs because the first query only has to read to the 50th record to return results, whereas the second query has to read six million before returning results. Basically, the first query just shorts out quicker.
I would assume that this has an incredible amount to do with the makeup of the table - field types and keys, etc.
If a record is made up of fixed-length fields (e.g. CHAR vs. VARCHAR), then the DBMS can just calculate where the nth record starts and jump there. If it's variable-length, then it has to read the records to determine where the nth record starts. Similarly, I'd further assume that tables with appropriate primary keys would be quicker to query than those without such keys.
I think the slowdown is tied to the fact that you are using LIMIT with offsets and are querying the table with no additional context for indexing. It's possible the first is just faster because it can get to the offset quicker.
It's the difference between returning 50 rows and 6,000,030 rows (or ~1 million rows, since you said there were only 7 million rows).
With two arguments, the first argument specifies the offset of the
first row to return, and the second specifies the maximum number of
rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
http://dev.mysql.com/doc/refman/5.0/en/select.html
Also, I think you're looking for 30-row pages, so your queries should be using 30 as the second parameter in the LIMIT clause:
SELECT *
FROM table
LIMIT 20, 30;
SELECT *
FROM table
LIMIT 6000000, 30;
I am building a PHP site using jQuery and the DataTables plugin. My page is laid out just as it needs to be, with pagination working, but when dealing with large datasets I have noticed the server is pulling ALL matching rows as opposed to the 10-row (can be more) limit stated for each 'page'.
Is it possible to limit the results of a query and yet keep say the ID numbers of those results in memory so when page 2 is hit (or the result number is changed) only new data is sought after?
Does it even make sense to do it this way?
I just don't want to query the DB for 2,000 rows and then have a front-end plugin make it look like the other results are hidden when they are actually on the page from the start.
The LIMIT clause in SQL has two parts -- the limit and the offset.
To get the first 10 rows:
SELECT ... LIMIT 0,10
To get the next 10 rows:
SELECT ... LIMIT 10,10
To get the next 10 rows:
SELECT ... LIMIT 20,10
As long as you ORDER the result set the same each time, you absolutely don't have to (and don't want to) first ask the database to send you all 2000 rows.
To display paging in conjunction with this, you still need to know how many total rows match your query. There are two ways to handle that --
1) Ask for a row count with a separate query
SELECT COUNT(*) FROM table WHERE ...
2) Use the SQL_CALC_FOUND_ROWS hint in your query, which will tell MySQL to calculate how many rows match the query before returning only the 10 you asked for. You then issue a SELECT FOUND_ROWS() query to get that result.
SELECT SQL_CALC_FOUND_ROWS column1, column2 ... LIMIT 0,10
Option 2 is preferable since it does not re-run the search just to get the count; the follow-up SELECT FOUND_ROWS() call is trivial.
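A minimal PHP sketch of option 2 (assuming a PDO connection in $pdo; column and table names are placeholders):
$stmt = $pdo->prepare(
    'SELECT SQL_CALC_FOUND_ROWS column1, column2
     FROM table_name
     WHERE userId = ?
     ORDER BY column1
     LIMIT 0, 10'
);
$stmt->execute([1]);
$pageRows  = $stmt->fetchAll(PDO::FETCH_ASSOC);
$totalRows = (int)$pdo->query('SELECT FOUND_ROWS()')->fetchColumn();   // full match count
$pageCount = (int)ceil($totalRows / 10);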
A table with about 70K records is displayed on a site, showing 50 records per page.
Pagination is done with limit offset,50 on the query, and the records can be ordered on different columns.
Browsing the latest pages (so the offset is around 60,000) makes the queries much slower (about 10x) than browsing the first pages.
Is this an issue of using the limit command?
Are there other ways to get the same results?
With large offsets, MySQL needs to browse more records.
Even if the plan uses filesort (which means that all records should be browsed), MySQL optimizes it so that only $offset + $limit top records are sorted, which makes it much more efficient for lower values of $offset.
The typical solution is to index the columns you are ordering on, record the last value of the columns and reuse it in the subsequent queries, like this:
SELECT *
FROM mytable
ORDER BY
value, id
LIMIT 0, 10
which outputs:
value id
1 234
3 57
4 186
5 457
6 367
8 681
10 366
13 26
15 765
17 345 -- this is the last one
To get to the next page, you would use:
SELECT *
FROM mytable
WHERE (value, id) > (17, 345)
ORDER BY
value, id
LIMIT 0, 10
, which uses the index on (value, id).
Of course this won't help with arbitrary access pages, but helps with sequential browsing.
Also, MySQL has certain issues with late row lookup. If the columns are indexed, it may be worth trying to rewrite your query like this:
SELECT *
FROM (
SELECT id
FROM mytable
ORDER BY
value, id
LIMIT $offset, $limit
) q
JOIN mytable m
ON m.id = q.id
See this article for more detailed explanations:
MySQL ORDER BY / LIMIT performance: late row lookups
It's how MySQL deals with limits. If it can sort on an index (and the query is simple enough) it can stop searching after finding the first offset + limit rows. So LIMIT 0,10 means that if the query is simple enough, it may only need to scan 10 rows. But LIMIT 1000,10 means that at minimum it needs to scan 1010 rows. Of course, the actual number of rows that need to be scanned depends on a host of other factors. But the point here is that the lower the offset + limit, the lower the lower bound on the number of rows that need to be scanned.
As for workarounds, I would optimize your queries so that the query itself, without the LIMIT clause, is as efficient as possible. EXPLAIN is your friend in this case...
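For instance (the query is just an illustration, not your actual schema):
EXPLAIN SELECT * FROM mytable ORDER BY value, id LIMIT 60000, 50;
-- Look at the chosen key, the "rows" estimate, and "Using filesort" / "Using index" in the Extra column.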