Random content definitive method [duplicate]

How can I best write a query that selects 10 rows randomly from a total of 600k?

A great post handling several cases, from simple, to gaps, to non-uniform with gaps.
http://jan.kneschke.de/projects/mysql/order-by-rand/
For the most general case, here is how you do it:
SELECT name
FROM random AS r1 JOIN
(SELECT CEIL(RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1
This assumes that the ids are uniformly distributed, and it allows for gaps in the id list. See the article for more advanced examples.
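To see why the uniformity assumption matters, here is a small Python simulation of the "pick a random number, take the first id at or above it" rule. It is only a sketch: the id list is made up, and `pick_id` is a hypothetical stand-in for the SQL above, not code from the article.

```python
import bisect
import random
from collections import Counter

def pick_id(sorted_ids, rng):
    """Mimic: WHERE id >= CEIL(RAND() * MAX(id)) ORDER BY id LIMIT 1."""
    target = rng.randint(1, sorted_ids[-1])   # stands in for CEIL(RAND() * MAX(id))
    # bisect_left returns the index of the first id that is >= target.
    return sorted_ids[bisect.bisect_left(sorted_ids, target)]

# Hypothetical table: ids 1..3 exist, 4..9 were deleted, 10 exists.
ids = [1, 2, 3, 10]
rng = random.Random(0)
counts = Counter(pick_id(ids, rng) for _ in range(10_000))
# id 10 absorbs every target from 4 to 10, so it should win about 70% of the draws.
print(counts)
```

The row that follows a large gap is picked far more often than its neighbours, which is the bias the article's more advanced examples try to remove.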

SELECT column FROM table
ORDER BY RAND()
LIMIT 10
Not an efficient solution, but it works.

Simple query that has excellent performance and works with gaps:
SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY RAND() LIMIT 10) as t2 ON t1.id=t2.id
This query on a 200K table takes 0.08s and the normal version (SELECT * FROM tbl ORDER BY RAND() LIMIT 10) takes 0.35s on my machine.
This is fast because the sort phase only uses the indexed ID column. You can see this behaviour in the explain:
SELECT * FROM tbl ORDER BY RAND() LIMIT 10:
SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY RAND() LIMIT 10) as t2 ON t1.id=t2.id
Weighted Version: https://stackoverflow.com/a/41577458/893432

I am getting fast queries (around 0.5 seconds) on a slow CPU, selecting 10 random rows from a non-cached, 2 GB MySQL table with 400K records. Here is my code (from "Fast selection of random rows in MySQL"):
$time= microtime_float();
$sql='SELECT COUNT(*) FROM pages';
$rquery= BD_Ejecutar($sql);
list($num_records)=mysql_fetch_row($rquery);
mysql_free_result($rquery);
$sql="SELECT id FROM pages WHERE RAND()*$num_records<20
ORDER BY RAND() LIMIT 0,10";
$rquery= BD_Ejecutar($sql);
while(list($id)=mysql_fetch_row($rquery)){
if($id_in) $id_in.=",$id";
else $id_in="$id";
}
mysql_free_result($rquery);
$sql="SELECT id,url FROM pages WHERE id IN($id_in)";
$rquery= BD_Ejecutar($sql);
while(list($id,$url)=mysql_fetch_row($rquery)){
logger("$id, $url",1);
}
mysql_free_result($rquery);
$time= microtime_float()-$time;
logger("num_records=$num_records",1);
logger("$id_in",1);
logger("Time elapsed: <b>$time seconds</b>",1);

From a book:
Choose a Random Row Using an Offset
Still another technique that avoids problems found in the preceding
alternatives is to count the rows in the data set and return a random
number between 0 and the count. Then use this number as an offset
when querying the data set
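// FLOOR keeps the offset in 0..COUNT(*)-1; ROUND could return COUNT(*), which is past the last row.
$rand = "SELECT FLOOR(RAND() * (SELECT COUNT(*) FROM Bugs))";
$offset = (int) $pdo->query($rand)->fetchColumn();
$sql = "SELECT * FROM Bugs LIMIT 1 OFFSET :offset";
$stmt = $pdo->prepare($sql);
// Bind explicitly as an integer; a string-bound OFFSET fails when prepares are emulated.
$stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
$stmt->execute();
$rand_bug = $stmt->fetch();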
Use this solution when you can’t assume contiguous key values and
you need to make sure each row has an even chance of being selected.

It's a very simple, single-line query.
SELECT * FROM Table_Name ORDER BY RAND() LIMIT 0,10;

If your keys are all numeric and have no gaps, you can calculate random numbers and select those rows directly, but this will probably not be the case.
So one solution would be the following:
SELECT * FROM table WHERE key >= FLOOR(RAND()*MAX(id)) LIMIT 1
This picks a random number within the range of your keys and then selects the next key at or above it.
You have to do this 10 times.
However, this is NOT really random, because your keys will most likely not be distributed evenly.
It's a hard problem, and it is not easy to solve while fulfilling all the requirements. MySQL's RAND() is the best you can get if you really want 10 random rows.
There is, however, another solution that is fast but also trades away some randomness, and it may suit you better. Read about it here: How can i optimize MySQL's ORDER BY RAND() function?
The question is how random you need it to be.
Can you explain a bit more so I can give you a good solution?
For example, a company I worked with needed absolute randomness extremely fast. They ended up pre-populating the database with random values that were selected in descending order and then reset to different random values afterwards.
If you hardly ever update, you could also fill in an incrementing id so that you have no gaps and can simply calculate random keys before selecting. It depends on the use case!

How to select random rows from a table:
From here:
Select random rows in MySQL
A quick improvement over "table scan" is to use the index to pick up random ids.
SELECT *
FROM random, (
SELECT id AS sid
FROM random
ORDER BY RAND( )
LIMIT 10
) tmp
WHERE random.id = tmp.sid;

I improved the answer @Riedsio had. This is the most efficient query I can find on a large, uniformly distributed table with gaps (tested on getting 1000 random rows from a table that has > 2.6B rows).
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max := (SELECT MAX(id) FROM table)) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1)
Let me unpack what's going on.
@max := (SELECT MAX(id) FROM table)
I'm calculating and saving the max. For very large tables, there is a slight overhead for calculating MAX(id) each time you need a row.
SELECT FLOOR(RAND() * @max) + 1 as rand
Gets a random id
SELECT id FROM table INNER JOIN (...) on id > rand LIMIT 1
This fills in the gaps. Basically if you randomly select a number in the gaps, it will just pick the next id. Assuming the gaps are uniformly distributed, this shouldn't be a problem.
Doing the union helps you fit everything into 1 query so you can avoid doing multiple queries. It also lets you save the overhead of calculating MAX(id). Depending on your application, this might matter a lot or very little.
Note that this gets only the ids and gets them in random order. If you want to do anything more advanced I recommend you do this:
SELECT t.id, t.name -- etc, etc
FROM table t
INNER JOIN (
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max := (SELECT MAX(id) FROM table)) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1)
) x ON x.id = t.id
ORDER BY t.id

All the best answers have been already posted (mainly those referencing the link http://jan.kneschke.de/projects/mysql/order-by-rand/).
I want to point out another speed-up possibility: caching. Think about why you need random rows. You probably want to display a random post or a random ad on a website. If you are getting 100 req/s, does each visitor really need a different set of random rows? Usually it is completely fine to cache these X random rows for 1 second (or even 10 seconds). It doesn't matter if 100 unique visitors in the same second get the same random posts, because in the next second another 100 visitors will get a different set of posts.
With this caching you can also use one of the slower solutions for getting the random data, as it will be fetched from MySQL only once per second regardless of your req/s.
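A minimal sketch of that caching idea in Python, assuming a single process and a caller-supplied function that runs whichever random-row query you prefer. `TimedCache`, `loader`, and `clock` are illustrative names, not part of any library.

```python
import time

class TimedCache:
    """Hold one value and refresh it at most once per `ttl` seconds."""

    def __init__(self, loader, ttl=1.0, clock=time.monotonic):
        self._loader = loader      # hypothetical callable that runs the (slow) random query
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._expires = None       # None means "nothing cached yet"

    def get(self):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            self._value = self._loader()
            self._expires = now + self._ttl
        return self._value
```

Every request calls `get()`, and the database is hit only when the cached set is older than `ttl`. A multi-process deployment would need a shared cache instead, which this sketch does not cover.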

I've looked through all of the answers, and I don't think anyone mentions this possibility at all, and I'm not sure why.
If you want utmost simplicity and speed, at a minor cost, then to me it seems to make sense to store a random number against each row in the DB. Just create an extra column, random_number, and set its default to RAND(). Create an index on this column.
Then, when you want to retrieve a row, generate a random number in your code (PHP, Perl, whatever) and compare that to the column.
SELECT * FROM tbl WHERE random_number >= :random ORDER BY random_number LIMIT 1
Although it's very neat for a single row, for ten rows like the OP asked for you'd have to call it ten separate times (or come up with a clever tweak that escapes me at the moment).
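Here is a rough, self-contained illustration of the stored-random-number idea. It uses Python's built-in sqlite3 purely as a stand-in for MySQL, and the table contents, the index name `idx_random`, and the wrap-around fallback are all assumptions added for the sketch.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, random_number REAL)")
conn.execute("CREATE INDEX idx_random ON tbl (random_number)")
fill = random.Random(42)
conn.executemany("INSERT INTO tbl (id, random_number) VALUES (?, ?)",
                 [(i, fill.random()) for i in range(1, 1001)])

def random_row(conn, rng):
    """Return the row whose stored random_number is the first one >= a fresh draw."""
    r = rng.random()
    row = conn.execute(
        "SELECT id, random_number FROM tbl WHERE random_number >= ? "
        "ORDER BY random_number LIMIT 1", (r,)).fetchone()
    if row is None:
        # The draw landed above the largest stored value: wrap around to the smallest.
        row = conn.execute(
            "SELECT id, random_number FROM tbl ORDER BY random_number LIMIT 1").fetchone()
    return row

print(random_row(conn, random.Random()))
```

The ORDER BY matters: without it the database may return any row that satisfies the comparison rather than the nearest one.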

I needed a query to return a large number of random rows from a rather large table. This is what I came up with. First get the maximum record id:
SELECT MAX(id) FROM table_name;
Then substitute that value into:
SELECT * FROM table_name WHERE id > FLOOR(RAND() * max) LIMIT n;
Where max is the maximum record id in the table and n is the number of rows you want in your result set. The assumption is that there are no gaps in the record id's although I doubt it would affect the result if there were (haven't tried it though). I also created this stored procedure to be more generic; pass in the table name and number of rows to be returned. I'm running MySQL 5.5.38 on Windows 2008, 32GB, dual 3GHz E5450, and on a table with 17,361,264 rows it's fairly consistent at ~.03 sec / ~11 sec to return 1,000,000 rows. (times are from MySQL Workbench 6.1; you could also use CEIL instead of FLOOR in the 2nd select statement depending on your preference)
DELIMITER $$
USE [schema name] $$
DROP PROCEDURE IF EXISTS `random_rows` $$
CREATE PROCEDURE `random_rows`(IN tab_name VARCHAR(64), IN num_rows INT)
BEGIN
SET @t = CONCAT('SET @max=(SELECT MAX(id) FROM ',tab_name,')');
PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SET @t = CONCAT(
'SELECT * FROM ',
tab_name,
' WHERE id>FLOOR(RAND()*@max) LIMIT ',
num_rows);
PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END
$$
then
CALL [schema name].random_rows([table name], n);

Here is a game changer that may be helpful for many:
I have a table with 200k rows and sequential ids, and I needed to pick N random rows, so I opted to generate random values based on the biggest id in the table. I wrote this script to find out which operation is fastest:
logTime();
query("SELECT COUNT(id) FROM tbl");
logTime();
query("SELECT MAX(id) FROM tbl");
logTime();
query("SELECT id FROM tbl ORDER BY id DESC LIMIT 1");
logTime();
The results are:
Count: 36.8418693542479 ms
Max: 0.241041183472 ms
Order: 0.216960906982 ms
Based on these results, ORDER BY id DESC is the fastest way to get the max id.
Here is my answer to the question:
SELECT GROUP_CONCAT(n SEPARATOR ',') g FROM (
SELECT FLOOR(RAND() * (
SELECT id FROM tbl ORDER BY id DESC LIMIT 1
)) n FROM tbl LIMIT 10) a
...
SELECT * FROM tbl WHERE id IN ($result);
FYI: Getting 10 random rows from a 200k table took me 1.78 ms (including all the operations on the PHP side).

I used this http://jan.kneschke.de/projects/mysql/order-by-rand/ posted by Riedsio (I used the variant with a stored procedure that returns one or more random values):
DROP TEMPORARY TABLE IF EXISTS rands;
CREATE TEMPORARY TABLE rands ( rand_id INT );
loop_me: LOOP
IF cnt < 1 THEN
LEAVE loop_me;
END IF;
INSERT INTO rands
SELECT r1.id
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;
SET cnt = cnt - 1;
END LOOP loop_me;
In the article he solves the problem of gaps in ids causing not-so-random results by maintaining a table (using triggers, etc.; see the article).
I'm solving the problem by adding another column, populated with contiguous numbers starting from 1 (edit: this column is added to the temporary table created by the subquery at runtime, so it doesn't affect your permanent table):
DROP TEMPORARY TABLE IF EXISTS rands;
CREATE TEMPORARY TABLE rands ( rand_id INT );
loop_me: LOOP
IF cnt < 1 THEN
LEAVE loop_me;
END IF;
SET @no_gaps_id := 0;
INSERT INTO rands
SELECT r1.id
FROM (SELECT id, @no_gaps_id := @no_gaps_id + 1 AS no_gaps_id FROM random) AS r1 JOIN
(SELECT (RAND() *
(SELECT COUNT(*)
FROM random)) AS id)
AS r2
WHERE r1.no_gaps_id >= r2.id
ORDER BY r1.no_gaps_id ASC
LIMIT 1;
SET cnt = cnt - 1;
END LOOP loop_me;
In the article I can see he went to great lengths to optimize the code; I have no idea if or how much my changes impact performance, but it works very well for me.

You can easily use a random offset with a limit
PREPARE stm from 'select * from table limit 10 offset ?';
SET @total = (select count(*) from table);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;
You can also apply a where clause like so
PREPARE stm from 'select * from table where available=true limit 10 offset ?';
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;
Tested on a 600,000-row (700 MB) table on an HDD; query execution took ~0.016 sec.
EDIT: The offset might take a value close to the end of the table, which will result in the select statement returning fewer rows (or maybe only 1 row). To avoid this, we can check the offset again after declaring it, like so
SET @rows_count = 10;
PREPARE stm from "select * from table where available=true limit ? offset ?";
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
SET @_offset = (SELECT IF(@total-@_offset<@rows_count,@_offset-@rows_count,@_offset));
SET @_offset = (SELECT IF(@_offset<0,0,@_offset));
EXECUTE stm using @rows_count,@_offset;
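If the offset is computed in application code instead, a slightly different variant (not the adjustment shown above) is to draw it only from the range that leaves room for a full window. This is a sketch; `random_window_offset` is a hypothetical helper name.

```python
import random

def random_window_offset(total, rows_count, rng=random):
    """Return an offset with offset + rows_count <= total whenever total >= rows_count."""
    last_valid = max(total - rows_count, 0)   # largest offset that still yields a full window
    return rng.randrange(last_valid + 1)      # uniform over 0..last_valid

# Example: a 600,000-row table and a window of 10 rows.
offset = random_window_offset(600_000, 10)
print(offset)
```

Like the SQL version, this returns a block of consecutive rows starting at a random position, not 10 independently chosen rows.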

I know it is not what you want, but the answer I will give you is what I use in production on a small website.
Depending on how often you access the random value, it may not be worth using MySQL for it, simply because you won't be able to cache the answer. We have a button there to access a random page, and a user could click on it several times per minute if they want. This causes a large amount of MySQL usage and, at least for me, MySQL is the hardest part to optimize.
I would take another approach, where you can cache the answer. Do one call to your MySQL:
SELECT min(id) as min, max(id) as max FROM your_table
With your min and max Id, you can, in your server, calculate a random number. In python:
random.randint(min, max)
Then, with your random number, you can get a random Id in your Table:
SELECT *
FROM your_table
WHERE id >= %s
ORDER BY id ASC
LIMIT 1
With this method you make two calls to your database, but you can cache them and avoid hitting the database for a long period of time, which improves performance. Note that this is not truly random if you have holes in your table. Getting more than 1 row is easy, since you can generate the ids in Python and make one request per row, and since they are cached, that's OK.
If you have too many holes in your table, you can try the same approach, but now using the total number of records:
SELECT COUNT(*) as total FROM your_table
Then in python you go:
random.randint(0, total - 1)
And to fetch a random result you use LIMIT like below:
SELECT *
FROM your_table
ORDER BY id ASC
LIMIT %s, 1
Notice that it returns 1 row after skipping X random rows. Even if you have holes in your table, it will be completely random, but it will cost your database more.

If you want one random record (no matter if there are gaps between ids):
PREPARE stmt FROM 'SELECT * FROM `table_name` LIMIT 1 OFFSET ?';
SET @count = (SELECT
FLOOR(RAND() * COUNT(*))
FROM `table_name`);
EXECUTE stmt USING @count;
Source: https://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/#comment-1266

This is super fast and is 100% random even if you have gaps.
Count the number x of rows that you have available: SELECT COUNT(*) as rows FROM TABLE
Pick 10 distinct random numbers a_1,a_2,...,a_10 between 0 and x - 1
Query your rows like this: SELECT * FROM TABLE LIMIT 1 offset a_i for i=1,...,10
I found this hack in the book SQL Antipatterns by Bill Karwin.
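The three steps can be sketched as follows. This uses Python's built-in sqlite3 as a stand-in for MySQL; the table `t`, its `id` column, and the helper name `sample_rows_by_offset` are assumptions made for the example.

```python
import random
import sqlite3

def sample_rows_by_offset(conn, table, k, rng=random):
    """Fetch k distinct rows by drawing k distinct offsets in 0..COUNT(*)-1."""
    # `table` is interpolated into the SQL, so it must be a trusted identifier.
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    offsets = rng.sample(range(total), min(k, total))   # distinct by construction
    rows = []
    for off in offsets:
        # ORDER BY id (an assumed column) makes each offset refer to a stable position.
        rows.append(conn.execute(
            f"SELECT * FROM {table} ORDER BY id LIMIT 1 OFFSET ?", (off,)).fetchone())
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
# Ids with gaps: only multiples of 7 exist.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i * 7, f"row{i}") for i in range(1, 101)])
picked = sample_rows_by_offset(conn, "t", 10, random.Random(3))
print(picked)
```

Because the offsets index positions rather than id values, gaps in the ids have no effect on which rows can be chosen.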

The following should be fast, unbiased and independent of id column. However it does not guarantee that the number of rows returned will match the number of rows requested.
SELECT *
FROM t
WHERE RAND() < (SELECT 10 / COUNT(*) FROM t)
Explanation: assuming you want 10 rows out of 100 then each row has 1/10 probability of getting SELECTed which could be achieved by WHERE RAND() < 0.1. This approach does not guarantee 10 rows; but if the query is run enough times the average number of rows per execution will be around 10 and each row in the table will be selected evenly.
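A quick way to get a feel for how much the row count varies is to simulate the filter outside the database. The sketch below assumes a made-up table of 100,000 rows; `bernoulli_sample` is a hypothetical helper that mirrors WHERE RAND() < wanted / COUNT(*).

```python
import random

def bernoulli_sample(rows, wanted, rng=random):
    """Keep each row independently with probability wanted / len(rows)."""
    p = wanted / len(rows)
    return [row for row in rows if rng.random() < p]

rows = list(range(100_000))          # stand-in for the table
rng = random.Random(7)
sizes = [len(bernoulli_sample(rows, 10, rng)) for _ in range(20)]
print(sizes)                         # counts scatter around 10 from run to run
print(sum(sizes) / len(sizes))       # the average stays close to 10
```

If you need exactly 10 rows, one option is to ask for more than you need and trim the surplus, accepting that the query may still occasionally come up short.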

If you have just one Read-Request
Combine the answer of @redsio with a temp table (600K is not that much):
DROP TEMPORARY TABLE IF EXISTS tmp_randorder;
CREATE TABLE tmp_randorder (id int(11) not null auto_increment primary key, data_id int(11));
INSERT INTO tmp_randorder (data_id) select id from datatable;
And then take a version of @redsio's answer:
SELECT dt.*
FROM
(SELECT (RAND() *
(SELECT MAX(id)
FROM tmp_randorder)) AS id)
AS rnd
INNER JOIN tmp_randorder rndo on rndo.id between rnd.id - 10 and rnd.id + 10
INNER JOIN datatable AS dt on dt.id = rndo.data_id
ORDER BY abs(rndo.id - rnd.id)
LIMIT 1;
If the table is big, you can sieve on the first part:
INSERT INTO tmp_randorder (data_id) select id from datatable where rand() < 0.01;
If you have many read-requests
Version: You could keep the table tmp_randorder persistent, call it datatable_idlist. Recreate that table in certain intervals (day, hour), since it also will get holes. If your table gets really big, you could also refill holes
select l.data_id as whole
from datatable_idlist l
left join datatable dt on dt.id = l.data_id
where dt.id is null;
Version: Give your Dataset a random_sortorder column either directly in datatable or in a persistent extra table datatable_sortorder. Index that column. Generate a Random-Value in your Application (I'll call it $rand).
select l.*
from datatable l
order by abs(random_sortorder - $rand) desc
limit 1;
This solution discriminates the 'edge rows' with the highest and the lowest random_sortorder, so rearrange them in intervals (once a day).

Another simple solution would be ranking the rows and fetch one of them randomly and with this solution you won't need to have any 'Id' based column in the table.
SELECT d.* FROM (
SELECT t.*, @rownum := @rownum + 1 AS rank
FROM mytable AS t,
(SELECT @rownum := 0) AS r,
(SELECT @cnt := (SELECT RAND() * (SELECT COUNT(*) FROM mytable))) AS n
) d WHERE rank >= @cnt LIMIT 10;
You can change the limit value as per your need to access as many rows as you want but that would mostly be consecutive values.
However, if you don't want consecutive random values then you can fetch a bigger sample and select randomly from it. something like ...
SELECT * FROM (
SELECT d.* FROM (
SELECT c.*, @rownum := @rownum + 1 AS rank
FROM buildbrain.`commits` AS c,
(SELECT @rownum := 0) AS r,
(SELECT @cnt := (SELECT RAND() * (SELECT COUNT(*) FROM buildbrain.`commits`))) AS rnd
) d
WHERE rank >= @cnt LIMIT 10000
) t ORDER BY RAND() LIMIT 10;

One way that I find pretty good, if there's an autogenerated id, is to use the modulo operator '%'. For example, if you need 10,000 random records out of 70,000, you could simplify this by saying you need 1 out of every 7 rows. This can be expressed in this query:
SELECT * FROM
table
WHERE
id %
FLOOR(
(SELECT count(1) FROM table)
/ 10000
) = 0;
If the result of dividing the total available rows by the target number of rows is not an integer, you will get more rows than you asked for, so you should add a LIMIT clause to trim the result set like this:
SELECT * FROM
table
WHERE
id %
FLOOR(
(SELECT count(1) FROM table)
/ 10000
) = 0
LIMIT 10000;
This does require a full scan, but it is faster than ORDER BY RAND() and, in my opinion, simpler to understand than the other options mentioned in this thread. Also, if the system that writes to the DB creates rows in batches, you might not get as random a result as you were expecting.

I think this is a simple and faster way. I tested it on a live server against a few of the answers above, and it was faster.
SELECT * FROM `table_name` WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table_name` ) ORDER BY id LIMIT 30;
//Took 0.0014secs against a table of 130 rows
SELECT * FROM `table_name` WHERE 1 ORDER BY RAND() LIMIT 30
//Took 0.0042secs against a table of 130 rows
SELECT name
FROM random AS r1 JOIN
(SELECT CEIL(RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 30
//Took 0.0040secs against a table of 130 rows

SELECT
*
FROM
table_with_600k_rows
WHERE
RAND( )
ORDER BY
id DESC
LIMIT 30;
id is the primary key, and the result is sorted by id.
Running EXPLAIN on this query against table_with_600k_rows shows that it does not scan the entire table.

I use this query:
select floor(RAND() * (SELECT MAX(key) FROM table)) from table limit 10
query time:0.016s

This is how I do it:
select *
from table_with_600k_rows
where rand() < 10/600000
limit 10
I like it because it does not require other tables, it is simple to write, and it is very fast to execute.

Use the below simple query to get random data from a table.
SELECT user_firstname ,
COUNT(DISTINCT usr_fk_id) cnt
FROM userdetails
GROUP BY usr_fk_id
ORDER BY cnt ASC
LIMIT 10

I guess this is the best possible way..
SELECT id, id * RAND( ) AS random_no, first_name, last_name
FROM user
ORDER BY random_no

Related

Generate random MySQL rows quickly and run same sql query multiple times

I have found an example where it generates a random row quickly:
MySQL select 10 random rows from 600K rows fast
Now I would like to run that query 10 times but I'm getting exactly same output instead of different rows. Any ideas how to solve this:
Here is my code:
<?php
for ($e = 0; $e <= 14; $e++) {
$sql_users = "SELECT user_name, user_age, country, age_from, age_to, gender, profile_image, gender_search, kind_of_relationship
FROM users AS r1 JOIN
(SELECT CEIL(RAND() *
(SELECT MAX(id)
FROM users)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1";
$statement6 = $dbConn->prepare($sql_users);
$statement6->execute();
$more = $statement6->fetch(PDO::FETCH_BOTH);
?>
<?php echo $more['user_name'];?>
<?php } ?>

If you want ten rows, how bad is the performance of:
select u.*
from users u
order by rand()
limit 10;
This does do exactly what you want. And, getting all the rows in a single query saves lots of overhead in running multiple queries. So, despite the order by rand(), it might be faster than your approach. However, that depends on the number of users.
You can also do something like this:
select u.*
from users u cross join
(select count(*) as cnt from users u) x
where rand() < (10*5 / cnt)
order by rand()
limit 10;
The where clause randomly chooses about 50 rows -- give or take. But with a high confidence, there will be at least 10. This number sorts quickly and you can randomly choose 10 of them.
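The "high confidence" claim can be checked with a short binomial calculation. The table size below is an assumption (the question does not state how many users there are), and `prob_fewer_than` is a helper written for this sketch.

```python
from math import comb

def prob_fewer_than(k, n, p):
    """P(X < k) for X ~ Binomial(n, p): the chance the WHERE filter keeps fewer than k rows."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

n = 600_000            # assumed number of rows in users
p = 10 * 5 / n         # each row passes the filter with probability 10*5 / cnt
shortfall = prob_fewer_than(10, n, p)
print(shortfall)       # a very small number: coming up short is extremely unlikely
```

Lowering the multiplier from 5 makes the candidate set smaller and cheaper to sort, at the price of a higher chance of returning fewer than 10 rows.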

PHP MYSQL General Error returned when using LIMIT [duplicate]

I have this query with MySQL:
select * from table1 LIMIT 10,20
How can I do this with SQL Server?

Starting SQL SERVER 2005, you can do this...
USE AdventureWorks;
GO
WITH OrderedOrders AS
(
SELECT SalesOrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderDate) AS 'RowNumber'
FROM Sales.SalesOrderHeader
)
SELECT *
FROM OrderedOrders
WHERE RowNumber BETWEEN 10 AND 20;
or something like this for 2000 and below versions...
SELECT TOP 10 * FROM (SELECT TOP 20 * FROM Table ORDER BY Id) AS t ORDER BY Id DESC

Starting with SQL SERVER 2012, you can use the OFFSET FETCH Clause:
USE AdventureWorks;
GO
SELECT SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
ORDER BY SalesOrderID
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY;
GO
http://msdn.microsoft.com/en-us/library/ms188385(v=sql.110).aspx
This may not work correctly when the order by is not unique.
If the query is modified to ORDER BY OrderDate, the result set returned is not as expected.

This is how I limit the results in MS SQL Server 2012:
SELECT *
FROM table1
ORDER BY columnName
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY
NOTE: OFFSET can only be used with or in tandem to ORDER BY.
To explain the clause OFFSET xx ROWS FETCH NEXT yy ROWS ONLY:
xx is the number of rows to skip before the first row is returned. For example, if there are 40 records in table1, the code above skips the first 10 rows and starts pulling from row 11.
yy is the number of records/rows you want to pull from the table.
To build on the previous example: table1 has 40 records, you skip the first 10 and grab the NEXT set of 10 (yy).
That means the code above pulls rows 11 through 20 from table1.
Check out the link for more info on OFFSET
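Off-by-one slips are easy to make here, so a tiny Python model of the window may help. It counts rows from 1, so the first row returned after skipping ten is the 11th. `returned_row_range` is a hypothetical helper, and the list stands in for an already ordered 40-row table.

```python
def returned_row_range(offset, fetch):
    """1-based positions of the first and last rows returned by
    OFFSET <offset> ROWS FETCH NEXT <fetch> ROWS ONLY."""
    return offset + 1, offset + fetch

rows = list(range(1, 41))      # stand-in for a 40-row table, already ordered
page = rows[10:10 + 10]        # same window as OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY
print(returned_row_range(10, 10), page)
```

The same arithmetic maps MySQL's LIMIT offset, row_count onto OFFSET/FETCH: the first number is the count of skipped rows and the second is the count of returned rows.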

This is almost a duplicate of a question I asked in October:
Emulate MySQL LIMIT clause in Microsoft SQL Server 2000
If you're using Microsoft SQL Server 2000, there is no good solution. Most people have to resort to capturing the result of the query in a temporary table with an IDENTITY primary key, then querying against the primary key column using a BETWEEN condition.
If you're using Microsoft SQL Server 2005 or later, you have a ROW_NUMBER() function, so you can get the same result but avoid the temporary table.
SELECT t2.*
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY id) AS row, t1.*
FROM ( ...original SQL query... ) t1
) t2
WHERE t2.row BETWEEN @offset+1 AND @offset+@count;
You can also write this as a common table expression as shown in @Leon Tayson's answer.

SELECT *
FROM (
SELECT TOP 20
t.*, ROW_NUMBER() OVER (ORDER BY field1) AS rn
FROM table1 t
ORDER BY
field1
) t
WHERE rn > 10

Syntactically MySQL LIMIT query is something like this:
SELECT * FROM table LIMIT OFFSET, ROW_COUNT
This can be translated into Microsoft SQL Server like
SELECT * FROM
(
SELECT TOP #{OFFSET+ROW_COUNT} *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rnum
FROM table
) a
WHERE rnum > OFFSET
Now your query select * from table1 LIMIT 10,20 will be like this:
SELECT * FROM
(
SELECT TOP 30 *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rnum
FROM table1
) a
WHERE rnum > 10

SELECT TOP 10 * FROM table;
Is the same as
SELECT * FROM table LIMIT 0,10;
Here's an article about implementing LIMIT in MSSQL. It's a nice read, especially the comments.

This is one of the reasons I try to avoid using MS Server... but anyway. Sometimes you just don't have an option (yei! and I have to use an outdated version!!).
My suggestion is to create a virtual table:
From:
SELECT * FROM table
To:
CREATE VIEW v_table AS
SELECT ROW_NUMBER() OVER (ORDER BY table_key) AS row,* FROM table
Then just query:
SELECT * FROM v_table WHERE row BETWEEN 10 AND 20
If fields are added, or removed, "row" is updated automatically.
The main problem with this option is that ORDER BY is fixed. So if you want a different order, you would have to create another view.
UPDATE
There is another problem with this approach: if you try to filter your data, it won't work as expected. For example, if you do:
SELECT * FROM v_table WHERE field = 'test' AND row BETWEEN 10 AND 20
WHERE becomes limited to those data which are in the rows between 10 and 20 (instead of searching the whole dataset and limiting the output).

SQL Server has no LIMIT keyword. If you only need a limited number of rows, use the TOP keyword, which is similar to LIMIT.

Give this a try. In the query below you can see GROUP BY, ORDER BY, skipping rows, and limiting rows.
select emp_no , sum(salary_amount) from emp_salary
Group by emp_no
ORDER BY emp_no
OFFSET 5 ROWS -- skip the first 5 rows
FETCH NEXT 10 ROWS ONLY; -- return the next 10 rows after the skipped ones

Easy way
MYSQL:
SELECT 'fields' FROM 'table' WHERE 'where' LIMIT 'offset','per_page'
MSSQL:
SELECT 'fields' FROM 'table' WHERE 'where' ORDER BY 'any' OFFSET 'offset'
ROWS FETCH NEXT 'per_page' ROWS ONLY
ORDER BY is mandatory

This is a multi step approach that will work in SQL2000.
-- Create a temp table to hold the data
CREATE TABLE #foo(rowID int identity(1, 1), myOtherColumns)
INSERT INTO #foo (myColumns) SELECT myData order By MyCriteria
Select * FROM #foo where rowID > 10

SELECT
*
FROM
(
SELECT
top 20 -- ($a) number of records to show
*
FROM
(
SELECT
top 29 -- ($b) last record position
*
FROM
table -- replace this for table name (i.e. "Customer")
ORDER BY
2 ASC
) AS tbl1
ORDER BY
2 DESC
) AS tbl2
ORDER BY
2 ASC;
-- Examples:
-- Show 5 records from position 5:
-- $a = 5;
-- $b = (5 + 5) - 1
-- $b = 9;
-- Show 10 records from position 4:
-- $a = 10;
-- $b = (10 + 4) - 1
-- $b = 13;
-- To calculate $b:
-- $b = ($a + position) - 1
-- For the present exercise we need to:
-- Show 20 records from position 10:
-- $a = 20;
-- $b = (20 + 10) - 1
-- $b = 29;

If your ID is a uniqueidentifier type, or the ids in your table are not sorted, you must do it like this:
select * from
(select ROW_NUMBER() OVER (ORDER BY (select 0)) AS RowNumber,* from table1) a
where a.RowNumber between 2 and 5
This returns rows 2 through 5, which is equivalent to the MySQL query
select * from table1 limit 1,4

This works well in SQL Server Express 2017:
SELECT * FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) as [Count], * FROM table1
) as a
WHERE [Count] BETWEEN 10 and 20;
-- Adds a [Count] column that gives every row a unique number without ordering by anything, then selects again with the limits you provide.

One possible way to get the result is shown below; hope this helps.
declare @start int
declare @end int
SET @start = 5000; -- 0 , 5000 ,
SET @end = 10000; -- 5001, 10001
SELECT * FROM (
SELECT TABLE_NAME,TABLE_TYPE, ROW_NUMBER() OVER (ORDER BY TABLE_NAME) as row FROM information_schema.tables
) a WHERE a.row > @start and a.row <= @end

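If I remember correctly (it's been a while since I dabbled with SQL Server), you may be able to use something like this (2005 and up). The row number has to be computed in a subquery, because a column alias cannot be referenced in the WHERE clause of the same SELECT:
SELECT *
FROM (
SELECT
*
,ROW_NUMBER() OVER(ORDER BY SomeFields) AS [RowNum]
FROM SomeTable
) AS t
WHERE RowNum BETWEEN 10 AND 20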

Better way than current query to assemble random categorized entries?

I am trying to display exactly 6 random 'entertainment' entries, but my current query picks a random number between 1 and 6 and displays that many entries. How do I update this query so that it displays exactly 6 random entertainment entries from my Articles table? Also, I don't want to use ORDER BY RAND(), because my table will grow over time. Here's my current query:
SELECT
r1.*
FROM
Articles AS r1
INNER JOIN (SELECT(RAND() * (SELECT MAX(id) FROM Articles)) AS id) AS r2
WHERE
r1.id >= r2.id
AND r1.category = 'entertainment'
LIMIT 6;
Table structure:
table Articles
- id (int)
- category (varchar)
- title (varchar)
- image (varchar)
- link (varchar)
- Counter (int)
- dateStamp (datetime)

Your 'entertainment' entries should all have unique id's which should be integers.
If this is the case you could generate 6 random int's between 1 and the amount of entries you have using PHP's rand() function. Here is a function I've written which may be useful.
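function selectSixRandomEntries() {
    // Assumes ids run from 1 to 200; adjust the upper bound to your table.
    $ids = array();
    while (count($ids) < 6) {
        $randomNumber = rand(1, 200);
        // Skip ids that were already picked so that all 6 are distinct.
        if (in_array($randomNumber, $ids))
            continue;
        $ids[] = $randomNumber;
    }
    // Parenthesized so that a following AND applies to the whole id list.
    return "(r1.id = " . implode(" OR r1.id = ", $ids) . ")";
}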
And to use it you could try
$query = "SELECT
r1.*
FROM
Articles AS r1
INNER JOIN (SELECT(RAND() * (SELECT MAX(id) FROM Articles)) AS id) AS r2
WHERE
" . selectSixRandomEntries() . "
AND r1.category = 'entertainment'
LIMIT 6";

With
select floor(rand() * m.maxId + 1) as randomId
from Articles a
join (SELECT MAX(id) maxId FROM Articles) m
limit 100
you will create 100 random ids. I use 100 because you have gaps in your id column, so the probability of not getting enough existing ids will be (very) small. Then you can use that result to select only 6 rows with those ids:
select distinct a.*
from (
select id, floor(rand() * m.maxId + 1) as randomId
from Articles a
join (SELECT MAX(id) maxId FROM Articles) m
limit 100
) r
join Articles a on a.id = r.randomId
order by r.id -- only need it for small tables. will slow down the query on big tables
limit 6
The best value for LIMIT in the subselect depends on percentage of gaps in your ids. 100 should be enough and fast.
Update
If you need to filter by category you can add a WHERE a.category = 'entertainment' clause before ORDER BY and LIMIT. But in that case you will need to adjust the number of generated random ids.
For example: if you have inserted 1M articles but 10% of them have been deleted, then on average 90 out of 100 randomly generated ids really exist. If 10% of the articles have category = 'entertainment', then on average 9 random rows will match the condition. Average means it might be 3 and it might also be 16. So you need to generate more random ids to be sure that you get at least 6 articles. With LIMIT 1000 in the subselect you will get an average of 90 random entertainment articles, which makes it very unlikely that you get fewer than 6. So you need to know the statistics of your table in order to pick a good LIMIT.
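To put rough numbers on that estimate, here is a small Python sketch. It treats every generated id as an independent trial that succeeds when the id exists and has the wanted category, which is a simplifying assumption (it ignores duplicate random ids). The function names and the ratios passed in are just the example figures from above, not measured values.

```python
from math import comb

def expected_matches(generated, exists_ratio, category_ratio):
    """Average number of generated ids that exist and match the category."""
    return generated * exists_ratio * category_ratio

def prob_fewer_than(needed, generated, exists_ratio, category_ratio):
    """Binomial probability of getting fewer than `needed` matching rows."""
    p = exists_ratio * category_ratio  # chance that one random id is a hit
    return sum(
        comb(generated, k) * p**k * (1 - p) ** (generated - k)
        for k in range(needed)
    )

# 10% of ids deleted, 10% of the articles in the wanted category.
avg_100 = expected_matches(100, 0.9, 0.1)
avg_1000 = expected_matches(1000, 0.9, 0.1)
risk_100 = prob_fewer_than(6, 100, 0.9, 0.1)
risk_1000 = prob_fewer_than(6, 1000, 0.9, 0.1)
```

Under these assumptions, LIMIT 100 falls short of 6 rows roughly one time in ten, while LIMIT 1000 practically never does.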
Another issue with the WHERE clause is that MySQL might reverse the join order to use an index for filtering. This might be faster for a small number of generated random ids, but might be slower if the LIMIT in the subselect is huge. You can force the join order by using STRAIGHT_JOIN instead of JOIN, but in my test with LIMIT 10000 it didn't make a measurable difference.
If your condition is too selective (e.g. only 1% of articles have category='entertainment') a simple ORDER BY RAND() can be faster, because otherwise you would need to create too many random ids. But up to 10K rows matching your condition ORDER BY RAND() will be fast enough.

PHP MySQL select two random rows but not with rand()

I need to select 2 random rows, but it's known that rand() is too slow. So I tried some code from a website:
SELECT *
FROM bilder AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM bilder)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 2
But this way I get the same 2 rows multiple times and parsing is also not correct, so this is completely useless. Is there a working solution that is better than rand()? The table name is bilder and the fields are id, userid, nickname. id is the primary key and auto-increments. Some rows have also been deleted, so the ids are not 1 2 3 4 5 but 1 2 4 5 6..., which means the solution of generating random numbers and selecting them won't work.

There are multiple solutions to this problem, but something like the following often has good enough performance:
SELECT b.*
FROM bilder b CROSS JOIN
(SELECT COUNT(*) as cnt FROM bilder) v
WHERE rand() <= 100 / cnt
ORDER BY rand()
LIMIT 2;
The WHERE clause keeps each row with probability 100 / cnt, so about 100 rows survive. Sorting such a small number of rows is usually pretty fast. The query then chooses two of them.
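The same two-step idea (keep each row with a small probability, then shuffle the survivors and take two) can be simulated in plain Python to see how it behaves. This is only a sketch: the list of ids stands in for the table, and `pick_two` and its `target` parameter are names invented for the illustration.

```python
import random

def pick_two(rows, target=100, rng=random):
    """Keep each row with probability target / len(rows), then pick 2 of the survivors."""
    cnt = len(rows)
    # Mirrors: WHERE rand() <= 100 / cnt
    candidates = [row for row in rows if rng.random() <= target / cnt]
    # Mirrors: ORDER BY rand() LIMIT 2, applied to the small candidate set only.
    rng.shuffle(candidates)
    return candidates[:2]

rng = random.Random(42)          # seeded so the run is repeatable
rows = list(range(1, 600_001))   # hypothetical ids standing in for the table
picked = pick_two(rows, rng=rng)
```

With a very small table the filter can leave fewer than two candidates, so a real implementation would want to retry or fall back in that case.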

The most likely cause of your consternation was failing to wrap the RAND() * (SELECT MAX(id) FROM bilder) in a call to CEIL(), resulting in a float instead of an integer:
SELECT *
FROM bilder AS r1 JOIN
(SELECT ceil(RAND() *
(SELECT MAX(id)
FROM bilder)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 2

There are much faster methods of choosing one random row. Both of these methods below choose only one random row. You asked for two random rows. But these methods are orders of magnitude faster than doing a table-scan, so it's worth using these methods even if it takes multiple tries to get a second distinct random row.
The fastest way is to do it in two queries (I'll show in pseudocode):
$max = SELECT MAX(id) FROM bilder
$rand1 = rand(1..$max)-1
SELECT * FROM bilder WHERE id > $rand1 ORDER BY id LIMIT 1
$id1 = id of the first row chosen
$rand2 = rand(1..$max)-1
SELECT * FROM bilder WHERE id > $rand2 AND id <> $id1 ORDER BY id LIMIT 1
$id2 = id of the second row chosen
if no second row was found, then choose a new $rand2 and query again
The problem with this is that if there are large gaps due to deleted rows, you get a higher chance of choosing the row that follows the gap.
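Here is a runnable sketch of that pseudocode, using Python's built-in sqlite3 module instead of MySQL so it can be tried anywhere. The helper names `next_row_after` and `pick_two_rows` and the sample data are made up, and the table is assumed to hold at least two rows (otherwise the retry loop would never finish).

```python
import random
import sqlite3

def next_row_after(conn, threshold, exclude_id=-1):
    """First row with id > threshold, skipping exclude_id (ids are assumed positive)."""
    sql = ("SELECT id, nickname FROM bilder "
           "WHERE id > ? AND id <> ? ORDER BY id LIMIT 1")
    return conn.execute(sql, (threshold, exclude_id)).fetchone()

def pick_two_rows(conn, rng=random):
    """Pick two distinct rows with cheap primary-key lookups."""
    (max_id,) = conn.execute("SELECT MAX(id) FROM bilder").fetchone()
    # rand(1..max) - 1 is at most max - 1, so the first lookup always finds a row.
    first = next_row_after(conn, rng.randint(1, max_id) - 1)
    second = None
    while second is None:  # retry when nothing but the first row lies above the threshold
        second = next_row_after(conn, rng.randint(1, max_id) - 1, first[0])
    return first, second

# Hypothetical sample table with gaps left by deleted rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bilder (id INTEGER PRIMARY KEY, nickname TEXT)")
existing_ids = [i for i in range(1, 21) if i not in (3, 7, 8, 9, 15)]
conn.executemany("INSERT INTO bilder (id, nickname) VALUES (?, ?)",
                 [(i, f"user{i}") for i in existing_ids])
first, second = pick_two_rows(conn, random.Random(1))
```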
Another fast method if you don't update the table very often is to add a column for consecutive ordering, then assign sequential values to that column in random order:
ALTER TABLE bilder ADD COLUMN rank INT UNSIGNED, ADD KEY (rank);
SET @r := 0;
UPDATE bilder SET rank = (@r:=@r+1) ORDER BY RAND();
Do this ranking once. It will be slow. Then once the rows are ranked, you can pick random value(s) fast:
$max = SELECT MAX(rank) FROM bilder;
$rand1 = rand(1..$max)
$rand2 = rand(1..$max) until $rand2 != $rand1
SELECT * FROM bilder WHERE rank IN ($rand1, $rand2);
Of course, if you add or delete any rows from the table, you have to renumber the rows. But you can do this more efficiently than re-ranking everything:
If you insert, give the new row a random rank and move the existing row that held that rank to $max+1.
If you delete, note the rank of the deleted row and give that rank to the row whose rank is $max.
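The two maintenance rules can be sketched like this, again with Python's sqlite3 module so it runs without a MySQL server. The column is called `rnd_rank` here instead of `rank`, and `insert_ranked` / `delete_ranked` are hypothetical helper names; treat it as an outline rather than production code (there are no transactions or locking).

```python
import random
import sqlite3

def insert_ranked(conn, nickname, rng=random):
    """Insert a row at a random rank; the previous holder of that rank moves to max+1."""
    (max_rank,) = conn.execute(
        "SELECT COALESCE(MAX(rnd_rank), 0) FROM bilder").fetchone()
    new_rank = rng.randint(1, max_rank + 1)
    # No row is updated when new_rank is max+1; the new row simply goes last.
    conn.execute("UPDATE bilder SET rnd_rank = ? WHERE rnd_rank = ?",
                 (max_rank + 1, new_rank))
    conn.execute("INSERT INTO bilder (nickname, rnd_rank) VALUES (?, ?)",
                 (nickname, new_rank))

def delete_ranked(conn, row_id):
    """Delete a row and fill its rank with the row that held the highest rank."""
    (old_rank,) = conn.execute(
        "SELECT rnd_rank FROM bilder WHERE id = ?", (row_id,)).fetchone()
    (max_rank,) = conn.execute("SELECT MAX(rnd_rank) FROM bilder").fetchone()
    conn.execute("DELETE FROM bilder WHERE id = ?", (row_id,))
    conn.execute("UPDATE bilder SET rnd_rank = ? WHERE rnd_rank = ?",
                 (old_rank, max_rank))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bilder "
             "(id INTEGER PRIMARY KEY, nickname TEXT, rnd_rank INTEGER)")
rng = random.Random(7)
for i in range(10):
    insert_ranked(conn, f"user{i}", rng)
delete_ranked(conn, 3)
ranks = sorted(r for (r,) in conn.execute("SELECT rnd_rank FROM bilder"))
```

After ten inserts and one delete the ranks are still the consecutive values 1 to 9, so picking a random rank always hits exactly one row.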

Mysql Query to fetch records

I have a table structure that looks as follows:
In the above table I need the team_id for which win + runs_scored is maximum.
I know the task can be accomplished in PHP code, but a query for this would be easier for me. The main thing is that the real table contains more than 15000 rows, so if someone can provide a better solution, that would be great.

select t.team_id
from YourTable t
order by t.win + t.run_scored desc
limit 1

SELECT teamid FROM
(
SELECT max(win+run_scored),teamid FROM YOUR_TABLE GROUP BY teamid
ORDER BY max(win+run_scored) desc
)
WHERE rownum <= 1

select max(t.win + t.run_scored) ,t.team_id
from YourTable t
group by t.team_id
order by max(t.win + t.run_scored) desc
Limit 1
