mysql order by rand() performance issue and solution

mysql order by rand() performance issue and solution - php

i was using order by rand() to generate random rows from database without any issue but i reaalised that as the database size increase this rand() causes heavy load on server so i was looking for an alternative and i tried by generating one random number using php rand() function and put that as id in mysql query and it was very very fast since mysql was knowing the row id
but the issue is in my table all numbers are not availbale.for example 1,2,5,9,12 like that.
if php rand() generate number 3,4 etc the query will be blank as there is no id with number 3 , 4 etc.
what is the best way to generate random numbers preferable from php but it should generate the available no in that table so it must check that table.please advise.
$id23=rand(1,100000000);
SELECT items FROM tablea where status='0' and id='$id23' LIMIT 1
the above query is fast but generate sometimes no which is not availabel in database.
SELECT items FROM tablea where status=0 order by rand() LIMIT 1
the above query is too slow and causes heavy load on server

First of, all generate a random value from 1 to MAX(id), not 100000000.
Then there are at least a couple of good solutions:
Use > not =
SELECT items FROM tablea where status='0' and id>'$id23' LIMIT 1
Create an index on (status,id,items) to make this an index-only query.
Use =, but just try again with a different random value if you don't find a hit. Sometimes it will take several tries, but often it will take only one try. The = should be faster since it can use the primary key. And if it's faster and gets it in one try 90% of the time, that could make up for the other 10% of the time when it takes more than one try. Depends on how many gaps you have in your id values.

Use your DB to find the max value from the table, generate a random number less than or equal to that value, grab the first row in which the id is greater than or equal to your random number. No PHP necessary.
SELECT items
FROM tablea
WHERE status = '0' and
id >= FLOOR(1 + RAND() * (SELECT MAX(id) FROM tablea))
LIMIT 1

You are correct, ORDER BY RAND() is not good solution if you are dealing with large datasets. Depending how often it needs to be randomized, what you can do is generate a column with a random number and then update that number at some predefined interval.
You would take that column and use it as your sort index. This works well for a heavy read environment and produces predicable random order for a certain period of time.

A possible solution is to use limit:
$id23=rand(1,$numberOfRows);
SELECT items FROM tablea where status='0' LIMIT $id23 1
This wont produce any missed rows (but as hek2mgl mentioned) requires knowing the number of rows in the select.

Related

MYSQL from the 20th row to the last row [duplicate]

I would like to construct a query that displays all the results in a table, but is offset by 5 from the start of the table. As far as I can tell, MySQL's LIMIT requires a limit as well as an offset. Is there any way to do this?

From the MySQL Manual on LIMIT:
To retrieve all rows from a certain
offset up to the end of the result
set, you can use some large number for
the second parameter. This statement
retrieves all rows from the 96th row
to the last:
SELECT * FROM tbl LIMIT 95, 18446744073709551615;

As you mentioned it LIMIT is required, so you need to use the biggest limit possible, which is 18446744073709551615 (maximum of unsigned BIGINT)
SELECT * FROM somewhere LIMIT 18446744073709551610 OFFSET 5

As noted in other answers, MySQL suggests using 18446744073709551615 as the number of records in the limit, but consider this: What would you do if you got 18,446,744,073,709,551,615 records back? In fact, what would you do if you got 1,000,000,000 records?
Maybe you do want more than one billion records, but my point is that there is some limit on the number you want, and it is less than 18 quintillion. For the sake of stability, optimization, and possibly usability, I would suggest putting some meaningful limit on the query. This would also reduce confusion for anyone who has never seen that magical looking number, and have the added benefit of communicating at least how many records you are willing to handle at once.
If you really must get all 18 quintillion records from your database, maybe what you really want is to grab them in increments of 100 million and loop 184 billion times.

Another approach would be to select an autoimcremented column and then filter it using HAVING.
SET #a := 0;
select #a:=#a + 1 AS counter, table.* FROM table
HAVING counter > 4
But I would probably stick with the high limit approach.

As others mentioned, from the MySQL manual. In order to achieve that, you can use the maximum value of an unsigned big int, that is this awful number (18446744073709551615). But to make it a little bit less messy you can the tilde "~" bitwise operator.
LIMIT 95, ~0
it works as a bitwise negation. The result of "~0" is 18446744073709551615.

You can use a MySQL statement with LIMIT:
START TRANSACTION;
SET #my_offset = 5;
SET #rows = (SELECT COUNT(*) FROM my_table);
PREPARE statement FROM 'SELECT * FROM my_table LIMIT ? OFFSET ?';
EXECUTE statement USING #rows, #my_offset;
COMMIT;
Tested in MySQL 5.5.44. Thus, we can avoid the insertion of the number 18446744073709551615.
note: the transaction makes sure that the variable #rows is in agreement to the table considered in the execution of statement.

I ran into a very similar issue when practicing LC#1321, in which I have to select all the dates but the first 6 dates are skipped.
I achieved this in MySQL with the help of ROW_NUMBER() window function and subquery. For example, the following query returns all the results with the first five rows skipped:
SELECT
fieldname1,
fieldname2
FROM(
SELECT
*,
ROW_NUMBER() OVER() row_num
FROM
mytable
) tmp
WHERE
row_num > 5;
You may need to add some more logics in the subquery, especially in OVER() to fit your need. In addition, RANK()/DENSE_RANK() window functions may be used instead of ROW_NUMBER() depending on your real offset logic.
Reference:
MySQL 8.0 Reference Manual - ROW_NUMBER()

Just today I was reading about the best way to get huge amounts of data (more than a million rows) from a mysql table. One way is, as suggested, using LIMIT x,y where x is the offset and y the last row you want returned. However, as I found out, it isn't the most efficient way to do so. If you have an autoincrement column, you can as easily use a SELECT statement with a WHERE clause saying from which record you'd like to start.
For example,
SELECT * FROM table_name WHERE id > x;
It seems that mysql gets all results when you use LIMIT and then only shows you the records that fit in the offset: not the best for performance.
Source: Answer to this question MySQL Forums. Just take note, the question is about 6 years old.

I know that this is old but I didnt see a similar response so this is the solution I would use.
First, I would execute a count query on the table to see how many records exist. This query is fast and normally the execution time is negligible. Something like:
SELECT COUNT(*) FROM table_name;
Then I would build my query using the result I got from count as my limit (since that is the maximum number of rows the table could possibly return). Something like:
SELECT * FROM table_name LIMIT count_result OFFSET desired_offset;
Or possibly something like:
SELECT * FROM table_name LIMIT desired_offset, count_result;
Of course, if necessary, you could subtract desired_offset from count_result to get an actual, accurate value to supply as the limit. Passing the "18446744073709551610" value just doesnt make sense if I can actually determine an appropriate limit to provide.

WHERE .... AND id > <YOUROFFSET>
id can be any autoincremented or unique numerical column you have...

using sql LIMIT 1 on specific queries

I was wondering if is more memory-efficient to include LIMIT 1 on queries
when we expect 1 result at all times.
So for example I have the following query
select * from clients where client_id = x
I don't have extensive db management skills but I'm assuming that
select * from clients where client_id = x limit 1
would be a bit more memory efficient because the query
wont have to iterate thru each row on the table once it finds
that row.
Am I right or it doesn't matter if I include limit in this specific case?

It could make a big difference if you do not have an index.
Consider the following with no index:
select *
from clients
where client_id = x
The engine will need to scan the entire table to determine that there is only one row. With limit 1 it could stop at the first match.
However, if you have an index, then the index would have the information for equality comparisons. So, the limit would not make a difference in terms of performance. There might be some slight difference, but it should be negligible.

If client_id is primary index means, no need to use LIMIT 1. Because it will not more effective.

Getting random results from large tables

I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table that are filtered by category.
Now, as you would imagine doing random sorting on a table this large causes the queries to take a few seconds, which is not ideal.
One other method I thought of for the non-filtered result set would be to just get PHP to select some random numbers between 1 - 7,000,000 or so and then do an IN(...) with the query to only grab those rows - and yes, I know that this method has a caveat in that you may get less than 4 if a record with that id no longer exists.
However, the above method obviously will not work with the category filtering as PHP doesn't know which record numbers belong to which category and hence cannot select the record numbers to select from.
Are there any better ways I can do this? Only way I can think of would be to store the record id's for each category in another table and then select random results from that and then select only those record ID's from the main table in a secondary query; but I'm sure there is a better way!?

You could of course use the RAND() function on a query using a LIMIT and WHERE (for the category). That however as you pointed out, entails a scan of the database which takes time, especially in your case due to the volume of data.
Your other alternative, again as you pointed out, to store id/category_id in another table might prove a bit faster but again there has to be a LIMIT and WHERE on that table which will also contain the same amount of records as the master table.
A different approach (if applicable) would be to have a table per category and store in that the IDs. If your categories are fixed or do not change that often, then you should be able to use that approach. In that case you will effectively remove the WHERE from the clause and getting a RAND() with a LIMIT on each category table would be faster since each category table will contain a subset of records from your main table.
Some other alternatives would be to use a key/value pair database just for that operation. MongoDb or Google AppEngine can help with that and are really fast.
You could also go towards the approach of a Master/Slave in your MySQL. The slave replicates content in real time but when you need to perform the expensive query you query the slave instead of the master, thus passing the load to a different machine.
Finally you could go with Sphinx which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offset this expensive operation to a different layer and let MySQL continue with other operations.
Just some issues to consider.

Working off your random number approach
Get the max id in the database.
Create a temp table to store your matches.
Loop n times doing the following
Generate a random number between 1 and maxId
Get the first record with a record Id greater than the random number and insert it into your temp table
Your temp table now contains your random results.
Or you could dynamically generate sql with a union to do the query in one step.
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
Note: my sql may not be valid, as I'm not a mySql guy, but the theory should be sound

First you need to get number of rows ... something like this
select count(1) from tbl where category = ?
then select a random number
$offset = rand(1,$rowsNum);
and select a row with offset
select * FROM tbl LIMIT $offset, 1
in this way you avoid missing ids. The only problem is you need to run second query several times. Union may help in this case.

For MySQl you can use
RAND()
SELECT column FROM table
ORDER BY RAND()
LIMIT 4

Select Random Row from SQL Using PHP

I want to request 5 random rows from my SQL table using php.
for instance, i need to:
mysql_query("SELECT * FROM catalogue >> not sure what goes here << LIMIT 5");

SELECT * FROM catalogue order by RAND() LIMIT 5
Edit:
For what its worth, Please note that using rand() on a table with large number of rows is going to be slow. This can actually crash your server.
Some Solution:
MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast. (And MediaWiki is written in PHP and developed for MySQL.)
But this is for a single random row only.

Assuming the table has AUTO INCREMENT on you could get the biggest ID with
SELECT id FROM catalogue ORDER BY id DESC LIMIT 1
and the smallest ID with
SELECT id FROM catalogue ORDER BY id ASC LIMIT 1
making it possible to do this
$randomId = mt_rand($smallestId, $biggestId);
$randomSql = mysql_query("SELECT * FROM catalogue WHERE id='$randomId'");
For five rows you could create the random ID five times.

If you're selecting random rows from a very large table, you may want to experiment with the approaches in the following link:
http://akinas.com/pages/en/blog/mysql_random_row/
note: just to share other options with everyone

Best way to have random order of elements in the table

I have got table with 300 000 rows. There is specially dedicated field (called order_number) in this table to story the number, which is later used to present the data from this table ordered by order_number field. What is the best and easy way to assign the random number/hash for each of the records in order to select the records ordered by this numbers? The number of rows in the table is not stable and may grow to 1 000 000, so the rand method should take it into the account.

Look at this tutorial on selecting random rows from a table.

If you don't want to use MySQL's built in RAND() function you could use something like this:
select max(id) from table;
$random_number = ...
select * from table where id > $random_number;
That should be a lot quicker.

UPDATE table SET order_number = sha2(id)
or
UPDATE table SET order_number = RAND(id)
sha2() is more random than RAND().

I know you've got enough answers but I would tell you how we did it in our company.
The first approach we use is with additional column for storing random number generated for each record/row. We have INDEX on this column, allowing us to order records by it.
id, name , ordering
1 , zlovic , 14
2 , silvas , 8
3 , jouzel , 59
SELECT * FROM table ORDER BY ordering ASC/DESC
POS: you have index and ordering is very fast
CONS: you will depend on new records to keep the randomization of the records
Second approach we have used is what Karl Roos gave an his answer. We retrieve the number of records in our database and using the > (greater) and some math we retrieve rows randomized. We are working with binary ids thus we need to keep autoincrement column to avoid random writings in InnoDB, sometimes we perform two or more queries to retrieve all of the data, keeping it randomized enough. (If you need 30 random items from 1,000,000 records you can run 3 simple SELECTs each for 10 items with different offset)
Hope this helps you. :)

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

mysql order by rand() performance issue and solution - php

A possible solution is to use limit: $id23=rand(1,$numberOfRows); SELECT items FROM tablea where status='0' LIMIT $id23 1 This wont produce any missed rows (but as hek2mgl mentioned) requires knowing the number of rows in the select.

Related

MYSQL from the 20th row to the last row [duplicate]

using sql LIMIT 1 on specific queries

Getting random results from large tables

Select Random Row from SQL Using PHP

Best way to have random order of elements in the table

Categories

Resources