There are two types of questions: 1. Passage and 2. Normal questions.
Usually in a test I want to pick random questions, which consist of type_id=0; but if a type_id=1 question comes up, the next questions should relate to that passage (comprehension questions should come sequentially). Using the query below I am able to get the questions:
SELECT *
FROM tbl_testquestion
ORDER BY
CASE
WHEN type_id=0 THEN RAND()
WHEN type_id=1 THEN qu_id
END ASC
All the passage questions are coming last. I have a limit of 40 questions for the test, and in the table I have 50 passage questions and 70 normal questions.
How can I write a query to call passage questions in between normal questions?
EXAMPLE
1. Who is the president of America? (type_id=0)
2. A, B, C are 3 students. A's name is "Arun", B's name is "Mike", C's name is "Jhon". (type_id=1)
Who is C from the above passage?
3. A, B, C are 3 students. A's name is "Arun", B's name is "Mike", C's name is "Jhon". (type_id=1)
Who is A from the above passage?
4. Who is the CEO of Facebook? (type_id=0)
From the above 4 questions we pick at random. If question 1 comes up in the RAND() order, no problem; but when question 2 comes up, the next question should be sequential, i.e. question 3. Once the passage questions are completed, it should switch back to the RAND() behaviour.
I think that the design of your database should be improved, but I’m going to answer your question as it stands.
I think I have a rather simple solution, which I can express in portable SQL without CTEs.
It works this way: let's assign two numbers to each row; call them major (an integer; just to be safe, let's make it a multiple of ten) and minor (a float between 0 and 1). For type 0 questions, minor is always 0. Each type 1 question relating to the same passage gets the same major (we do this with a join against a grouped subselect). We then order the table by the sum of the two values.
It will be slow, because it joins using a text field. It would be better if each distinct passage_description had an integer id to be used for the join.
I assume that all type 0 questions have an empty or NULL passage_description, while type 1 questions have it non-empty (it would make no sense otherwise).
I assume you have a RAND() function which yields floating values between 0 and 1.
Here we go:
SELECT u.qu_id, u.type_id,
       u.passage_description, u.passage_image,
       u.cat_id, u.subcat_id,
       u.question, u.q_instruction, u.qu_status
FROM (
    SELECT grouped.major, RAND()+0.001 AS minor, t1.*
    FROM tbl_testquestion t1
    JOIN (SELECT 10*FLOOR(1000*RAND()) major, passage_description
          FROM tbl_testquestion WHERE type_id = 1
          GROUP BY passage_description) grouped
    USING (passage_description)
    -- LIMIT 39
  UNION
    SELECT 10*FLOOR(1000*RAND()) major, 0 minor, t0.*
    FROM tbl_testquestion t0 WHERE type_id = 0
) u ORDER BY u.major+u.minor ASC LIMIT 40;
With the above query as it stands, there is still a small probability that you get questions of only one type. If you want to be sure that you have at least one type 0 question, uncomment the LIMIT 39 on the first part of the UNION; if you want at least two, say LIMIT 38, and so on. All type 1 questions related to the same passage will be grouped together in one test; it is not guaranteed that all questions in the database related to that passage will be in the test, but in a comment above you mention that this rule can be “broke”.
Edited:
I added a small amount to minor, just to bypass the rare but possible case in which RAND() returns exactly zero. Since major goes by tens, the fact that minor might now be greater than one is immaterial.
Use the following. I haven't tested it, so if there are any errors please report back and I will correct them. $r is a random value produced by PHP for this query; you could do $r = rand(); before calling the query.
SELECT * FROM (
    (SELECT *, RAND()*(SELECT COUNT(*) FROM tbl_testquestion) AS orderid
     FROM tbl_testquestion
     WHERE type_id=0
     ORDER BY orderid
     LIMIT 20)
    UNION
    (SELECT *, MD5(CONCAT('$r', passage_description)) AS orderid
     FROM tbl_testquestion
     WHERE type_id=1
     ORDER BY orderid
     LIMIT 20)
) AS t1
ORDER BY orderid
Explanation: orderid will keep type_id=1 entries together, as it produces the same random value for all questions of the same passage.
Warning: unless you add a passage_id to the table, this query will run quite slowly.
Edit: fixed the ordering (I hope); I forgot that MySQL generates random numbers between 0 and 1.
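For illustration (a hedged sketch with $r fixed at an arbitrary value), you can check that every question of a passage hashes to the same orderid, which is why they sort adjacently:

SELECT qu_id, passage_description,
       MD5(CONCAT('12345', passage_description)) AS orderid
FROM tbl_testquestion
WHERE type_id=1
ORDER BY orderid, qu_id;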
This is the solution for MySQL.
Sorry it is not very readable; MySQL does not support CTEs the way SQL Server does.
You can compare it with the SQL Server CTE syntax at the bottom to better understand how it works.
select
d.*
, o.q_ix, rnd_ord -- this is only for your reference
from (
select *, floor(rand()*1000) as rnd_ord -- this is main order for questions and groups
from (
select * from (
select
(@r1 := @r1 - 1) as q_ix, -- this is row_number() (negative so we can keep groups separated)
passage_description, 0 qu_id, type_id
from (
select distinct passage_description, type_id
from tbl_testquestion,
(SELECT @r1 := 0) v, -- this is the trick for row_number()
(SELECT @rnd_limit := -floor(rand()*3)) r -- this is the trick for a dynamic random limit
where type_id=1
) p
order by passage_description -- order by for row_number()
) op
where q_ix >= @rnd_limit -- keeps the randomly chosen number of groups (0-2)
union all
select * from (
select
(@r2 := @r2 + 1) as q_ix, -- again row_number()
'' as passage_description, qu_id, type_id
from tbl_testquestion,
(SELECT @r2 := 0) v -- var for row_number()
where type_id=0
order by qu_id -- order by for row_number()
) oq
) q
) o
-- look at double join for questions and groups
join tbl_testquestion d on
((d.passage_description = o.passage_description) and (d.type_id=1))
or
((d.qu_id=o.qu_id) and (d.type_id=0))
order by rnd_ord
limit 40
and this is the more readable SQL Server syntax:
;with
p as (
-- select a random number of groups (0-2) and label groups (-1,-2)
select top (abs(checksum(NEWID())) % 3) -ROW_NUMBER() over (order by passage_description) p_id, passage_description
from (
select distinct passage_description
from d
where type_id=1
) x
),
q as (
-- label questions (1..n)
select ROW_NUMBER() over (order by qu_id) q_ix, qu_id
from d
where type_id=0
),
o as (
-- calculate final order
select *, ROW_NUMBER() over (order by newid()) rnd_ord
from (
select p.p_id as q_ix, passage_description, 0 qu_id from p
union all
select q.q_ix, '', qu_id from q
) x
)
select top 40
d.*
, o.rnd_ord, o.q_ix
from o
join d on
((d.passage_description = o.passage_description) and (d.type_id=1))
or
((d.qu_id = o.qu_id) and (d.type_id=0))
order by
rnd_ord
that's all
How can I best write a query that selects 10 rows randomly from a total of 600k?
A great post handling several cases, from simple, to gaps, to non-uniform with gaps:
http://jan.kneschke.de/projects/mysql/order-by-rand/
For the most general case, here is how you do it:
SELECT name
FROM random AS r1 JOIN
(SELECT CEIL(RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1
This supposes that the distribution of ids is equal, and that there can be gaps in the id list. See the article for more advanced examples.
SELECT column FROM table
ORDER BY RAND()
LIMIT 10
Not an efficient solution, but it works.
Simple query that has excellent performance and works with gaps:
SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY RAND() LIMIT 10) as t2 ON t1.id=t2.id
This query on a 200K table takes 0.08s and the normal version (SELECT * FROM tbl ORDER BY RAND() LIMIT 10) takes 0.35s on my machine.
This is fast because the sort phase only uses the indexed ID column. You can see this behaviour by running EXPLAIN on the two queries:
SELECT * FROM tbl ORDER BY RAND() LIMIT 10;
SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY RAND() LIMIT 10) as t2 ON t1.id=t2.id;
Weighted Version: https://stackoverflow.com/a/41577458/893432
I am getting fast queries (around 0.5 seconds) with a slow CPU, selecting 10 random rows from a 400K-row, 2GB, non-cached MySQL database. See my code here: Fast selection of random rows in MySQL
$time = microtime_float();

// total number of rows
$sql = 'SELECT COUNT(*) FROM pages';
$rquery = BD_Ejecutar($sql);
list($num_records) = mysql_fetch_row($rquery);
mysql_free_result($rquery);

// pre-filter to roughly 20 random candidates, then keep 10 of them
$sql = "SELECT id FROM pages WHERE RAND()*$num_records<20
        ORDER BY RAND() LIMIT 0,10";
$rquery = BD_Ejecutar($sql);
$id_in = '';
while(list($id) = mysql_fetch_row($rquery)){
    if($id_in) $id_in .= ",$id";
    else $id_in = "$id";
}
mysql_free_result($rquery);

// fetch the selected rows by id
$sql = "SELECT id,url FROM pages WHERE id IN($id_in)";
$rquery = BD_Ejecutar($sql);
while(list($id,$url) = mysql_fetch_row($rquery)){
    logger("$id, $url",1);
}
mysql_free_result($rquery);

$time = microtime_float() - $time;
logger("num_records=$num_records",1);
logger("$id_in",1);
logger("Time elapsed: <b>$time seconds</b>",1);
From the book:
Choose a Random Row Using an Offset
Still another technique that avoids problems found in the preceding alternatives is to count the rows in the data set and return a random number between 0 and the count. Then use this number as an offset when querying the data set:
$rand = "SELECT ROUND(RAND() * (SELECT COUNT(*) FROM Bugs)) AS offset";
$offset = $pdo->query($rand)->fetch(PDO::FETCH_ASSOC);
$sql = "SELECT * FROM Bugs LIMIT 1 OFFSET :offset";
$stmt = $pdo->prepare($sql);
$stmt->execute( $offset );
$rand_bug = $stmt->fetch();
Use this solution when you can’t assume contiguous key values and
you need to make sure each row has an even chance of being selected.
It's a very simple, single-line query:
SELECT * FROM Table_Name ORDER BY RAND() LIMIT 0,10;
Well, if you have no gaps in your keys and they are all numeric, you can calculate random numbers and select those lines. But this will probably not be the case.
So one solution would be the following:
SELECT * FROM table
WHERE id >= (SELECT FLOOR(RAND() * MAX(id)) FROM table)
ORDER BY id
LIMIT 1;
which basically ensures that you get a random number in the range of your keys and then select the next best row whose key is greater or equal.
You have to do this 10 times.
However, this is NOT really random, because your keys will most likely not be distributed evenly.
It's really a big problem and not easy to solve while fulfilling all the requirements; MySQL's RAND() is the best you can get if you really want 10 random rows.
There is, however, another solution which is fast but has a trade-off in randomness; it may suit you better. Read about it here: How can I optimize MySQL's ORDER BY RAND() function?
The question is how random you need it to be.
Can you explain a bit more, so I can give you a good solution?
For example, a company I worked with had a solution where they needed absolute randomness extremely fast. They ended up pre-populating the database with random values that were selected in descending order and set to new random values again afterwards.
If you hardly ever update, you could also fill in an incrementing id so you have no gaps and can just calculate random keys before selecting, as in the sketch below. It depends on the use case!
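A hedged sketch of that last idea, assuming a rarely-updated table tbl with a gapless helper column seq_id (a hypothetical name) kept renumbered 1..COUNT(*) at write time:

-- pick a uniform random position; with no gaps this is an exact index lookup
SET @pick := FLOOR(RAND() * (SELECT COUNT(*) FROM tbl)) + 1;
SELECT * FROM tbl WHERE seq_id = @pick;
-- repeat with ten distinct @pick values to get ten rows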
How to select random rows from a table:
From here:
Select random rows in MySQL
A quick improvement over a full table scan is to use the index to pick random ids:
SELECT *
FROM random, (
SELECT id AS sid
FROM random
ORDER BY RAND( )
LIMIT 10
) tmp
WHERE random.id = tmp.sid;
I improved the answer @Riedsio gave. This is the most efficient query I could find on a large, uniformly distributed table with gaps (tested by getting 1000 random rows from a table that has more than 2.6B rows).
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * (@max := (SELECT MAX(id) FROM table))) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1)
Let me unpack what's going on.
@max := (SELECT MAX(id) FROM table)
I'm calculating and saving the max. For very large tables, there is a slight overhead in calculating MAX(id) each time you need a row.
SELECT FLOOR(RAND() * @max) + 1 as rand
This gets a random id.
SELECT id FROM table INNER JOIN (...) on id > rand LIMIT 1
This fills in the gaps: if the random number falls in a gap, it just picks the next existing id. Assuming the gaps are uniformly distributed, this shouldn't be a problem.
Doing the union helps you fit everything into one query, so you can avoid doing multiple queries. It also lets you pay the overhead of calculating MAX(id) only once. Depending on your application, this might matter a lot or very little.
Note that this gets only the ids and gets them in random order. If you want to do anything more advanced I recommend you do this:
SELECT t.id, t.name -- etc, etc
FROM table t
INNER JOIN (
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * (@max := (SELECT MAX(id) FROM table))) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
(SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1)
) x ON x.id = t.id
ORDER BY t.id
All the best answers have been already posted (mainly those referencing the link http://jan.kneschke.de/projects/mysql/order-by-rand/).
I want to point out another speed-up possibility: caching. Think about why you need random rows. Probably you want to display a random post or a random ad on a website. If you are getting 100 requests per second, is it really necessary that each visitor gets different random rows? Usually it is completely fine to cache these X random rows for 1 second (or even 10 seconds). It doesn't matter if 100 unique visitors in the same second get the same random posts, because the next second another 100 visitors will get a different set.
With this caching in place you can even use one of the slower solutions for getting the random data, since it will be fetched from MySQL only once per second regardless of your request rate.
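To illustrate the idea on the MySQL side (a hedged sketch; the table and event names are assumptions, the source table posts is a stand-in, the refresh could equally live in application code or cron, and the event scheduler must be enabled with event_scheduler=ON):

CREATE TABLE random_cache LIKE posts;

DELIMITER $$
CREATE EVENT refresh_random_cache
ON SCHEDULE EVERY 1 SECOND
DO BEGIN
  -- regenerate the cached sample once per interval, however slow the method
  DELETE FROM random_cache;
  INSERT INTO random_cache SELECT * FROM posts ORDER BY RAND() LIMIT 10;
END $$
DELIMITER ;

-- every request then reads the cheap cached copy:
SELECT * FROM random_cache;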
I've looked through all of the answers, and I don't think anyone mentions this possibility at all, and I'm not sure why.
If you want utmost simplicity and speed, at a minor cost, then to me it makes sense to store a random number against each row in the DB. Just create an extra column, random_number, set its default to RAND(), and create an index on this column.
Then when you want to retrieve a row, generate a random number in your code (PHP, Perl, whatever) and compare it to the column:
SELECT * FROM tbl WHERE random_number >= :random LIMIT 1
I guess although it's very neat for a single row, for the ten rows the OP asked for you'd have to call it ten separate times (or come up with a clever tweak that escapes me immediately).
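For reference, the setup might look like this (a hedged sketch; the column and index names are assumptions, and because older MySQL versions cannot default a column to RAND(), the values are backfilled with an UPDATE):

ALTER TABLE tbl ADD COLUMN random_number DOUBLE;
UPDATE tbl SET random_number = RAND();   -- backfill; set the value in application code for new rows
CREATE INDEX idx_random_number ON tbl (random_number);

-- :random is generated in application code for each lookup
SELECT * FROM tbl
WHERE random_number >= :random
ORDER BY random_number
LIMIT 1;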
I needed a query to return a large number of random rows from a rather large table. This is what I came up with. First get the maximum record id:
SELECT MAX(id) FROM table_name;
Then substitute that value into:
SELECT * FROM table_name WHERE id > FLOOR(RAND() * max) LIMIT n;
Where max is the maximum record id in the table and n is the number of rows you want in your result set. The assumption is that there are no gaps in the record ids, although I doubt it would affect the result if there were (I haven't tried it, though). I also created this stored procedure to be more generic: pass in the table name and the number of rows to be returned. I'm running MySQL 5.5.38 on Windows 2008 (32GB, dual 3GHz E5450), and on a table with 17,361,264 rows it's fairly consistent at ~0.03 sec / ~11 sec to return 1,000,000 rows. (Times are from MySQL Workbench 6.1; you could also use CEIL instead of FLOOR in the second SELECT statement, depending on your preference.)
DELIMITER $$
USE [schema name] $$
DROP PROCEDURE IF EXISTS `random_rows` $$
CREATE PROCEDURE `random_rows`(IN tab_name VARCHAR(64), IN num_rows INT)
BEGIN
SET @t = CONCAT('SET @max=(SELECT MAX(id) FROM ',tab_name,')');
PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SET @t = CONCAT(
'SELECT * FROM ',
tab_name,
' WHERE id>FLOOR(RAND()*@max) LIMIT ',
num_rows);
PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END
$$
then
CALL [schema name].random_rows([table name], n);
Here is a game changer that may be helpful to many:
I have a table with 200K rows and sequential ids, and I needed to pick N random rows. I chose to generate random values based on the biggest id in the table, so I created this script to find out which is the fastest way to obtain it:
logTime();
query("SELECT COUNT(id) FROM tbl");
logTime();
query("SELECT MAX(id) FROM tbl");
logTime();
query("SELECT id FROM tbl ORDER BY id DESC LIMIT 1");
logTime();
The results are:
Count: 36.8418693542479 ms
Max: 0.241041183472 ms
Order: 0.216960906982 ms
Based on these results, ORDER BY id DESC is the fastest way to get the max id.
Here is my answer to the question:
SELECT GROUP_CONCAT(n SEPARATOR ',') g FROM (
SELECT FLOOR(RAND() * (
SELECT id FROM tbl ORDER BY id DESC LIMIT 1
)) n FROM tbl LIMIT 10) a
...
SELECT * FROM tbl WHERE id IN ($result);
FYI: to get 10 random rows from a 200K table, it took me 1.78 ms (including all the operations on the PHP side).
I used http://jan.kneschke.de/projects/mysql/order-by-rand/ posted by Riedsio (specifically the case of a stored procedure that returns one or more random values):
DROP TEMPORARY TABLE IF EXISTS rands;
CREATE TEMPORARY TABLE rands ( rand_id INT );
loop_me: LOOP
IF cnt < 1 THEN
LEAVE loop_me;
END IF;
INSERT INTO rands
SELECT r1.id
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;
SET cnt = cnt - 1;
END LOOP loop_me;
In the article he solves the problem of gaps in the ids causing not-so-random results by maintaining a table (using triggers, etc.; see the article).
I solve the problem by adding another column to the query, populated with contiguous numbers starting from 1 (edit: this column is added to the derived table created by the subquery at runtime and doesn't affect your permanent table):
DROP TEMPORARY TABLE IF EXISTS rands;
CREATE TEMPORARY TABLE rands ( rand_id INT );
loop_me: LOOP
IF cnt < 1 THEN
LEAVE loop_me;
END IF;
SET @no_gaps_id := 0;
INSERT INTO rands
SELECT r1.id
FROM (SELECT id, @no_gaps_id := @no_gaps_id + 1 AS no_gaps_id FROM random) AS r1 JOIN
(SELECT (RAND() *
(SELECT COUNT(*)
FROM random)) AS id)
AS r2
WHERE r1.no_gaps_id >= r2.id
ORDER BY r1.no_gaps_id ASC
LIMIT 1;
SET cnt = cnt - 1;
END LOOP loop_me;
In the article I can see he went to great lengths to optimize the code; I have no idea if or how much my changes impact performance, but it works very well for me.
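For context, both loops assume a surrounding stored procedure along the lines of the one in the linked article; a hedged sketch (the procedure name and the final SELECT are assumptions), shown here with the gapless loop body:

DELIMITER $$
DROP PROCEDURE IF EXISTS get_rands $$
CREATE PROCEDURE get_rands(IN cnt INT)
BEGIN
  DROP TEMPORARY TABLE IF EXISTS rands;
  CREATE TEMPORARY TABLE rands ( rand_id INT );
  loop_me: LOOP
    IF cnt < 1 THEN
      LEAVE loop_me;
    END IF;
    SET @no_gaps_id := 0;
    INSERT INTO rands
    SELECT r1.id
    FROM (SELECT id, @no_gaps_id := @no_gaps_id + 1 AS no_gaps_id FROM random) AS r1 JOIN
         (SELECT (RAND() * (SELECT COUNT(*) FROM random)) AS id) AS r2
    WHERE r1.no_gaps_id >= r2.id
    ORDER BY r1.no_gaps_id ASC
    LIMIT 1;
    SET cnt = cnt - 1;
  END LOOP loop_me;
  SELECT rand_id FROM rands;   -- or join rands back to the base table for full rows
END $$
DELIMITER ;

-- usage:
CALL get_rands(10);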
You can easily use a random offset with a limit:
PREPARE stm from 'select * from table limit 10 offset ?';
SET @total = (select count(*) from table);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;
You can also apply a where clause like so
PREPARE stm from 'select * from table where available=true limit 10 offset ?';
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
EXECUTE stm using @_offset;
Tested on a 600,000-row (700MB) table; query execution took ~0.016 sec on an HDD.
EDIT: The offset might take a value close to the end of the table, which would make the SELECT return fewer rows (perhaps only 1), so we can sanity-check the offset after computing it, like so:
SET @rows_count = 10;
PREPARE stm from "select * from table where available=true limit ? offset ?";
SET @total = (select count(*) from table where available=true);
SET @_offset = FLOOR(RAND() * @total);
SET @_offset = (SELECT IF(@total-@_offset<@rows_count, @_offset-@rows_count, @_offset));
SET @_offset = (SELECT IF(@_offset<0, 0, @_offset));
EXECUTE stm using @rows_count, @_offset;
I know it is not what you want, but the answer I will give you is what I use in production on a small website.
Depending on how often you access the random value, it is not worthwhile to use MySQL, just because you won't be able to cache the answer. We have a button there to access a random page, and a user could click it several times per minute if he wants. This causes a massive amount of MySQL usage and, at least for me, MySQL is the biggest thing to optimize.
I would take another approach, where you can cache the answer. Do one call to MySQL:
SELECT min(id) as min, max(id) as max FROM your_table
With your min and max id, you can calculate a random number on your server. In Python:
random.randint(min, max)
Then, with your random number, you can get a random id from your table:
SELECT *
FROM your_table
WHERE id >= %s
ORDER BY id ASC
LIMIT 1
With this method you do two calls to your database, but you can cache them and skip the database for a long period of time, improving performance. Note that this is not truly random if you have holes in your table. Getting more than 1 row is easy: create the ids in Python and do one request per row; since they are cached, it's OK.
If you have too many holes in your table, you can try the same approach, but now going by the total number of records:
SELECT COUNT(*) as total FROM your_table
Then in Python you go:
random.randint(0, total)
And to fetch a random result you use LIMIT, like below:
SELECT *
FROM your_table
ORDER BY id ASC
LIMIT %s, 1
Notice it will get 1 value after X random rows. Even if you have holes in your table, it will be completely random, but it will cost your database more.
If you want one random record (no matter whether there are gaps between ids):
PREPARE stmt FROM 'SELECT * FROM `table_name` LIMIT 1 OFFSET ?';
SET @count = (SELECT
FLOOR(RAND() * COUNT(*))
FROM `table_name`);
EXECUTE stmt USING @count;
Source: https://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/#comment-1266
This is super fast and is 100% random even if you have gaps.
Count the number x of rows that you have available: SELECT COUNT(*) as rows FROM TABLE
Pick 10 distinct random numbers a_1, a_2, ..., a_10 between 0 and x
Query your rows like this: SELECT * FROM TABLE LIMIT 1 OFFSET a_i for i = 1, ..., 10
I found this hack in the book SQL Antipatterns by Bill Karwin.
The following should be fast, unbiased, and independent of the id column. However, it does not guarantee that the number of rows returned will match the number of rows requested.
SELECT *
FROM t
WHERE RAND() < (SELECT 10 / COUNT(*) FROM t)
Explanation: assuming you want 10 rows out of 100, each row has a 1/10 probability of being selected, which is achieved by WHERE RAND() < 0.1. This approach does not guarantee exactly 10 rows, but if the query is run enough times, the average number of rows per execution will be around 10, and each row in the table will be selected with equal probability.
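If you need an exact count, one hedged variant is to oversample and trim: the WHERE pre-filters cheaply, and ORDER BY RAND() then only has to sort the small candidate set.

SELECT *
FROM t
WHERE RAND() < (SELECT 20 / COUNT(*) FROM t)   -- expect ~20 candidate rows
ORDER BY RAND()
LIMIT 10;                                      -- trim to exactly 10 (can still fall short in rare cases)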
If you have just one read request
Combine the answer of @Riedsio with a temp table (600K is not that much):
DROP TEMPORARY TABLE IF EXISTS tmp_randorder;
CREATE TABLE tmp_randorder (id int(11) not null auto_increment primary key, data_id int(11));
INSERT INTO tmp_randorder (data_id) select id from datatable;
And then take a version of @Riedsio's answer:
SELECT dt.*
FROM
(SELECT (RAND() *
(SELECT MAX(id)
FROM tmp_randorder)) AS id)
AS rnd
INNER JOIN tmp_randorder rndo on rndo.id between rnd.id - 10 and rnd.id + 10
INNER JOIN datatable AS dt on dt.id = rndo.data_id
ORDER BY abs(rndo.id - rnd.id)
LIMIT 1;
If the table is big, you can sieve on the first part:
INSERT INTO tmp_randorder (data_id) select id from datatable where rand() < 0.01;
If you have many read requests
Version: You could keep the table tmp_randorder persistent; call it datatable_idlist. Recreate that table at certain intervals (day, hour), since it will also get holes. If your table gets really big, you could also refill the holes:
select l.data_id as whole
from datatable_idlist l
left join datatable dt on dt.id = l.data_id
where dt.id is null;
Version: Give your dataset a random_sortorder column, either directly in datatable or in a persistent extra table datatable_sortorder. Index that column. Generate a random value in your application (I'll call it $rand):
select l.*
from datatable l
order by abs(random_sortorder - $rand) desc
limit 1;
This solution discriminates against the 'edge rows' with the highest and the lowest random_sortorder, so rearrange them at intervals (once a day), as in the sketch below.
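The simplest form of that rearrangement is a full reshuffle (a hedged sketch, run from cron once a day):

UPDATE datatable SET random_sortorder = RAND();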
Another simple solution is ranking the rows and fetching some of them randomly; with this solution you won't need any id-based column in the table.
SELECT d.* FROM (
SELECT t.*, @rownum := @rownum + 1 AS rank
FROM mytable AS t,
(SELECT @rownum := 0) AS r,
(SELECT @cnt := (SELECT RAND() * (SELECT COUNT(*) FROM mytable))) AS n
) d WHERE rank >= @cnt LIMIT 10;
You can change the limit value to fetch as many rows as you want, but they will mostly be consecutive values.
However, if you don't want consecutive random values, you can fetch a bigger sample and select randomly from it, something like:
SELECT * FROM (
SELECT d.* FROM (
SELECT c.*, @rownum := @rownum + 1 AS rank
FROM buildbrain.`commits` AS c,
(SELECT @rownum := 0) AS r,
(SELECT @cnt := (SELECT RAND() * (SELECT COUNT(*) FROM buildbrain.`commits`))) AS rnd
) d
WHERE rank >= @cnt LIMIT 10000
) t ORDER BY RAND() LIMIT 10;
One way that I find pretty good, if the table has an auto-generated id, is to use the modulo operator '%'. For example, if you need 10,000 random records out of 70,000, you could simplify this to needing 1 of every 7 rows, which simplifies to this query:
SELECT * FROM
table
WHERE
id %
FLOOR(
(SELECT count(1) FROM table)
/ 10000
) = 0;
If the result of dividing the target rows by the total available is not an integer, you will get some extra rows beyond what you asked for, so you should add a LIMIT clause to trim the result set, like this:
SELECT * FROM
table
WHERE
id %
FLOOR(
(SELECT count(1) FROM table)
/ 10000
) = 0
LIMIT 10000;
This does require a full scan, but it is faster than ORDER BY RAND(), and in my opinion simpler to understand than the other options mentioned in this thread. Also, if the system that writes to the DB creates sets of rows in batches, you might not get as random a result as you were expecting.
I think this is a simple and yet faster way; I tested it on a live server against a few of the answers above, and it was faster:
SELECT * FROM `table_name` WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table_name` ) ORDER BY id LIMIT 30;
//Took 0.0014secs against a table of 130 rows
SELECT * FROM `table_name` WHERE 1 ORDER BY RAND() LIMIT 30
//Took 0.0042secs against a table of 130 rows
SELECT name
FROM random AS r1 JOIN
(SELECT CEIL(RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 30
//Took 0.0040secs against a table of 130 rows
SELECT
*
FROM
table_with_600k_rows
WHERE
RAND( )
ORDER BY
id DESC
LIMIT 30;
id is the primary key and the result is sorted by id; EXPLAIN on table_with_600k_rows shows that the query does not scan the entire table.
I use this query:
select floor(RAND() * (SELECT MAX(key) FROM table)) from table limit 10
query time:0.016s
This is how I do it:
select *
from table_with_600k_rows
where rand() < 10/600000
limit 10
I like it because it does not require other tables, it is simple to write, and it is very fast to execute.
Use the simple query below to get random data from a table:
SELECT user_firstname ,
COUNT(DISTINCT usr_fk_id) cnt
FROM userdetails
GROUP BY usr_fk_id
ORDER BY cnt ASC
LIMIT 10
I guess this is the best possible way:
SELECT id, id * RAND( ) AS random_no, first_name, last_name
FROM user
ORDER BY random_no
I programmed a filter which generates a query to show particular employees.
I have a table employees and a lot of 1:1, 1:n and n:m relationships, e.g. for the employees' skills and languages, like this:
Employees
id name
1 John
2 Mike
Skills
id skill experience
1 PHP 3
2 SQL 1
Employee_Skills
eid sid
1 1
1 2
Now I want to filter for employees who have at least 2 years of experience using PHP and 1 year of SQL.
My filter always generates a correctly working query for every table, relationship and field.
But my problem is that when I filter the same field of a related table multiple times with an AND, it does not work,
e.g.
John PHP 3
John SQL 1
PHP and SQL are in different rows, so AND cannot work.
I tried using GROUP_CONCAT and FIND_IN_SET, but with FIND_IN_SET I cannot filter for experience over 2 years, and FIND_IN_SET does not know that PHP is 3 and SQL is 1.
I also tried
WHERE emp.id IN (SELECT eid FROM Employee_Skills WHERE sid IN (SELECT id FROM Skills WHERE skill = 'PHP' AND experience > 1)) AND emp.id IN (SELECT eid FROM Employee_Skills WHERE sid IN (SELECT id FROM Skills WHERE skill = 'SQL' AND experience > 0))
which works for this example, but it only works for n:m, and it is too complex to determine the relationship type.
I have the final query with
ski.skill = 'PHP' AND ski.experience > 1 AND ski.skill = 'SQL' AND ski.experience > 0
and I would like to manipulate the query to make it work.
What does a query have to look like to deal with relational division?
You can try the following approach:
select * from Employees
where id in (
select eid
from Employee_Skills as a
inner join
Skills as ski
on (a.sid = ski.id)
where
(ski.skill = 'PHP' AND ski.experience > 2) OR
(ski.skill = 'SQL' AND ski.experience > 1)
group by eid
having count(*) = 2
)
So, for every filter you add another OR condition; the HAVING clause then keeps only employees for whom all filters passed, so just pass the appropriate number.
You could make a kind of pivot query, where you put the experience for each of the known skills in columns. This could be a long query, but you could build it dynamically in PHP, so that it adds all skills as columns to the final query, which would look like this:
SELECT e.*, php_exp, sql_exp
FROM Employee e
INNER JOIN (
SELECT es.eid,
SUM(CASE s.skill WHEN 'PHP' THEN s.experience END) php_exp,
SUM(CASE s.skill WHEN 'SQL' THEN s.experience END) sql_exp,
SUM(CASE s.skill WHEN 'JS' THEN s.experience END) js_exp
-- do the same for other skills here --
FROM Employee_Skills es
INNER JOIN Skills s ON es.sid = s.id
GROUP BY es.eid
) pivot ON pivot.eid = e.id
WHERE php_exp > 2 AND sql_exp > 0;
The WHERE clause is then very concise and intuitive: you use the logical operators like in other circumstances.
If the set of skills is rather static, you could even create a view for the sub-query, as sketched below. Then the final SQL is quite concise.
Here is a fiddle.
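If you go that route, the view might look like this (a hedged sketch; the view name is an assumption, and the skill columns would be generated from the known skill list):

CREATE VIEW employee_skill_pivot AS
SELECT es.eid,
       SUM(CASE s.skill WHEN 'PHP' THEN s.experience END) php_exp,
       SUM(CASE s.skill WHEN 'SQL' THEN s.experience END) sql_exp,
       SUM(CASE s.skill WHEN 'JS'  THEN s.experience END) js_exp
FROM Employee_Skills es
INNER JOIN Skills s ON es.sid = s.id
GROUP BY es.eid;

-- the final query then reduces to:
SELECT e.*, p.php_exp, p.sql_exp
FROM Employee e
INNER JOIN employee_skill_pivot p ON p.eid = e.id
WHERE p.php_exp > 2 AND p.sql_exp > 0;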
Alternative
Using the same principle, but with the SUM in the HAVING clause, you can avoid gathering all the skills' experiences:
SELECT e.*
FROM Employee e
INNER JOIN (
SELECT es.eid
FROM Employee_Skills es
INNER JOIN Skills s ON es.sid = s.id
GROUP BY es.eid
HAVING SUM(CASE s.skill WHEN 'PHP' THEN s.experience END) > 2
AND SUM(CASE s.skill WHEN 'SQL' THEN s.experience END) > 0
) pivot ON pivot.eid = e.id;
Here is a fiddle.
You can also replace the CASE construct by the IF function, like this:
HAVING SUM(IF(s.skill='PHP', s.experience, 0)) > 2
... etc.
But it comes down to the same.
The straightforward way would be to repeatedly JOIN the skills:
SELECT e.*
FROM Employees AS e
JOIN Employee_Skills AS j1 ON (e.id = j1.eid)
JOIN Skills AS s1 ON (j1.sid = s1.id AND s1.skill = 'PHP' AND s1.experience > 3)
JOIN Employee_Skills AS j2 ON (e.id = j2.eid)
JOIN Skills AS s2 ON (j2.sid = s2.id AND s2.skill = 'SQL' AND s2.experience > 1)
...
Since all the clauses are required, this translates to a straight JOIN.
You will need to add two JOINs for each clause, but they're quite fast joins.
A more hackish way would be to compress the skills into a code in a 1:1 relation with the employees. If experience never exceeds, say, 30, then you can multiply the first condition's experience by 1, the second by 30, the third by 30*30, the fourth by 30*30*30... and never get an overflow:
SELECT eid, SUM(CASE skill
WHEN 'PHP' THEN 30*experience
WHEN 'SQL' THEN 1*experience
ELSE 0 END) AS code
FROM Employees_Skills JOIN Skills ON (Skills.id = Employees_Skills.sid)
GROUP BY eid HAVING code > 0;
Actually, since you want 3 years of PHP, you can write HAVING code > 91. If you had three conditions with experiences 2, 3 and 5, you would request more than x = 2*30*30 + 3*30 + 5. This only serves to whittle down the results, since 3*30*30 + 2*30 + 4 still passes the filter but is of no use to you. But since you want a restriction on code anyway, and "> x" costs the same as "> 0" and gives better results... (if you need more complex filtering than a series of ANDs, "> 0" is safer, though).
The table above you join with Employees, and then on the result you perform the true filtering, requiring
((code DIV (30*30)) % 30) > 7 // for instance :-)
AND
((code DIV 30) % 30) > 3 // for PHP
AND
((code DIV 1) % 30) > 1 // for SQL
(the *1 and DIV 1 are superfluous, and only inserted to clarify)
This solution requires a full table scan on Skills, with no real possibility of automatic optimization, so it is slower than the other solution. On the other hand, its cost grows much more slowly, so if you have complex queries, or need OR operators or conditional expressions instead of ANDs, the hackish solution may be more convenient to implement.
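Putting the pieces together, an end-to-end sketch of the hackish approach for the two-skill example might look like this (hedged; DIV is MySQL integer division, and the thresholds mirror the digit filters above):

SELECT e.*
FROM Employees e
JOIN (
  SELECT eid, SUM(CASE skill
                    WHEN 'PHP' THEN 30 * experience
                    WHEN 'SQL' THEN  1 * experience
                    ELSE 0
                  END) AS code
  FROM Employees_Skills JOIN Skills ON (Skills.id = Employees_Skills.sid)
  GROUP BY eid
) c ON c.eid = e.id
WHERE ((c.code DIV 30) % 30) > 3   -- the PHP "digit"
  AND ( c.code        % 30) > 1;  -- the SQL "digit"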
Can anyone help me optimise this query? I have the following table:
cdu_user_progress:
--------------------------------------------------------------
|id |uid |lesson_id |game_id |date |score |
--------------------------------------------------------------
For each user, I'm trying to obtain the difference between the best and first scores for a particular game_id within a particular lesson_id, and order the results by that difference ('progress' in my query):
SELECT ms.uid AS id, ms.max_score - fs.first_score AS progress
FROM (
SELECT up.uid, MAX(CASE WHEN game_id = 3 THEN score ELSE NULL END) AS max_score
FROM cdu_user_progress up
WHERE (up.uid IN ('1671', '1672', '1673', '1674', '1675', '1676', '1679', '1716', '1725', '1726', '1937', '1964', '1996', '2062', '2065', '2066', '2085', '2086')) AND (up.lesson_id = '65') AND (up.score > '-1')
GROUP BY up.uid
) ms
LEFT JOIN (
SELECT up.uid, up.score AS first_score
FROM cdu_user_progress up
INNER JOIN (
SELECT up.uid, MIN(CASE WHEN game_id = 3 THEN date ELSE NULL END) AS first_date
FROM cdu_user_progress up
WHERE (up.uid IN ('1671', '1672', '1673', '1674', '1675', '1676', '1679', '1716', '1725', '1726', '1937', '1964', '1996', '2062', '2065', '2066', '2085', '2086')) AND (up.lesson_id = '65') AND (up.score > '-1')
GROUP BY up.uid
) fd ON fd.uid = up.uid AND fd.first_date = up.date
) fs ON fs.uid = ms.uid
ORDER BY progress DESC
Any help would be most appreciated!
Absent any EXPLAIN output or index definitions, we can't make any recommendations. (I noted in a comment that it looks like some join predicates are missing: if we don't have guaranteed uniqueness on the (uid,date) tuple in cdu_user_progress, there's potential that we'll get rows for a different lesson_id, or with a score that isn't greater than '-1'.)
In the query text, immediately before ) fs, I'd add:
AND up.lesson_id = '65'
AND up.score > '-1'
GROUP BY up.uid
I'd also wrap the up.score column (in the SELECT list of the fd view) in an aggregate function, either MIN() or MAX(), for compliance with the ANSI standard (even though MySQL doesn't require it when SQL_MODE doesn't include ONLY_FULL_GROUP_BY).
If I didn't have a suitable index defined, I'd consider adding an index:
... ON cdu_user_progress (lesson_id, uid, score, game_id, date)
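Spelled out as DDL (the index name is an assumption):

CREATE INDEX ix_cdu_user_progress_lesson
  ON cdu_user_progress (lesson_id, uid, score, game_id, date);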
There's some overhead for the derived tables (materializing the inline views), and those derived tables won't have indexes on them (in MySQL 5.5 and earlier). But the GROUP BY in each inline view ensures that we'll have fewer than 20 rows, so that's not really going to be a problem.
So, if there's a performance issue, it's in the view queries. Again, we'd really need to see the output from EXPLAIN and the index definitions, and some cardinality estimates, in order to make recommendations.
FOLLOWUP
Given that there's no unique constraint on (uid,date), I'd add those predicates in the fs view query. I'd also use unique table aliases (for each reference to cdu_user_progress) to make both the statement and the EXPLAIN output easier to read. Also, adding the GROUP BY clause and the aggregate function in the fd view, I'd write the query like this:
SELECT ms.uid AS id
, ms.max_score - fs.first_score AS progress
FROM ( SELECT up.uid
, MAX(CASE WHEN up.game_id = 3 THEN up.score ELSE NULL END) AS max_score
FROM cdu_user_progress up
WHERE up.uid IN ('1671','1672','1673','1674','1675','1676','1679','1716','1725','1726','1937','1964','1996','2062','2065','2066','2085','2086')
AND up.lesson_id = '65'
AND up.score > '-1'
GROUP BY up.uid
) ms
LEFT
JOIN ( SELECT uo.uid
, MIN(uo.score) AS first_score
FROM ( SELECT un.uid
, MIN(CASE WHEN un.game_id = 3 THEN un.date ELSE NULL END) AS first_date
FROM cdu_user_progress un
WHERE un.uid IN ('1671','1672','1673','1674','1675','1676','1679','1716','1725','1726','1937','1964','1996','2062','2065','2066','2085','2086')
AND un.lesson_id = '65'
AND un.score > '-1'
GROUP BY un.uid
) fd
JOIN cdu_user_progress uo
ON uo.uid = fd.uid
AND uo.date = fd.first_date
AND uo.lesson_id = '65'
AND uo.score > '-1'
GROUP BY uo.uid
) fs
ON fs.uid = ms.uid
ORDER BY progress DESC
And I believe that would make the index I recommended above suitable for all of the references to cdu_user_progress.
I have a table in a MySQL database (level_records) which has 3 columns (id, date, reading). I want to put the differences between the most recent 20 readings (by date) into an array and then average them, to find the average difference.
I have looked everywhere, but no one seems to have a scenario quite like mine.
I will be very grateful for any help. Thanks.
SELECT AVG(difference)
FROM (
SELECT @next_reading - reading AS difference, @next_reading := reading
FROM (SELECT reading
FROM level_records
ORDER BY date DESC
LIMIT 20) AS recent20
CROSS JOIN (SELECT @next_reading := NULL) AS var
) AS recent_diffs
DEMO
If we consider "differences" to be signed, and if we ignore/exclude any rows that have NULL values of reading...
If you want to return just the values of the difference between each reading and the immediately preceding reading (to get the latest nineteen differences), then you could do something like this:
SELECT d.diff
FROM ( SELECT e.reading - @prev_reading AS diff
, @prev_reading AS prev_reading
, @prev_reading := e.reading AS reading
FROM ( SELECT r.date
, r.reading
FROM level_records r
CROSS
JOIN (SELECT @prev_reading := NULL) p
ORDER BY r.date DESC
LIMIT 20
) e
ORDER BY e.date ASC
) d
That'll get you the rows returned from MySQL, and you can monkey with them in PHP however you want. (The question of how to manipulate arrays in PHP doesn't really have anything to do with MySQL.)
If you want to know how to return rows from a SQL resultset into a PHP array, that doesn't really have anything to do with "latest twenty", "difference", or "average" at all. You'd use the same pattern you'd use for returning the result from any query. There's nothing at all unique about that, and there are plenty of examples (most of them unfortunately using the deprecated mysql_ interface; for new development, you want to use either PDO or mysqli_).
If you mean by "all 19 sets of differences" that you want the difference between each reading and every other reading, for each reading, such that you get a total of 380 rows ( = 20 * (20-1) rows ), then:
SELECT a.reading - b.reading AS diff
, a.id AS a_id
, a.date AS a_date
, a.reading AS a_reading
, b.id AS b_id
, b.date AS b_date
, b.reading AS b_reading
FROM ( SELECT aa.id
, aa.date
, aa.reading
FROM level_records aa
WHERE aa.reading IS NOT NULL
ORDER BY aa.date DESC, aa.id DESC
LIMIT 20
) a
JOIN ( SELECT bb.id
, bb.date
, bb.reading
FROM level_records bb
WHERE bb.reading IS NOT NULL
ORDER BY bb.date DESC, bb.id DESC
LIMIT 20
) b
WHERE a.id <> b.id
ORDER BY a.date DESC, b.date DESC
Sometimes we only want differences in one direction; that is, if we have the difference between r13 and r15, we essentially already have its inverse, the difference between r15 and r13. And sometimes it's more convenient to have the inverse copies.
Which query you run really depends on what result set you want returned.
If the goal is to get an "average", then rather than monkeying with PHP arrays, note that the average of the differences between the latest twenty readings is the same as the difference between the first and last of those readings divided by nineteen: the intermediate readings cancel out in the telescoping sum.
If we only want to return a row if there are at least twenty readings available, then something like this:
SELECT (l.reading - f.reading)/19 AS avg_difference
FROM ( SELECT ll.reading
FROM level_records ll
WHERE ll.reading IS NOT NULL
ORDER BY ll.date DESC LIMIT 1
) l
CROSS
JOIN (SELECT ff.reading
FROM level_records ff
WHERE ff.reading IS NOT NULL
ORDER BY ff.date DESC LIMIT 19,1
) f
NOTE: That query will only return a row only if there are at least twenty rows with non-NULL values of reading in the level_records table.
For the more general case, if there are fewer than twenty rows in the table (i.e. fewer than nineteen differences) and we want an average of the differences between the latest available rows, we can do something like this:
SELECT (l.reading - f.reading)/f.cnt AS avg_difference
FROM ( SELECT ll.reading
FROM level_records ll
WHERE ll.reading IS NOT NULL
ORDER BY ll.date DESC
LIMIT 1
) l
CROSS
JOIN (SELECT ee.reading
, ee.cnt
FROM ( SELECT e.date
, e.reading
, (@i := @i + 1) AS cnt
FROM level_records e
CROSS
JOIN (SELECT @i := -1) i
WHERE e.reading IS NOT NULL
ORDER BY e.date DESC
LIMIT 20
) ee
ORDER BY ee.date ASC
LIMIT 1
) f
But if we need to treat "differences" as unsigned (that is, taking the absolute value of the differences between readings), then we need the actual consecutive differences so we can average their absolute values. We can make use of a MySQL user variable to keep track of the "previous" reading and have it available when we process the next row, so we can take the difference, something like this:
SELECT AVG(d.abs_diff)
FROM ( SELECT ABS(e.reading - @prev_reading) AS abs_diff
, @prev_reading AS prev_reading
, @prev_reading := e.reading AS reading
FROM ( SELECT r.date
, r.reading
FROM level_records r
CROSS
JOIN (SELECT @prev_reading := NULL) p
ORDER BY r.date DESC
LIMIT 20
) e
ORDER BY e.date ASC
) d
I'm currently experimenting with a rough ranking/sorting query that will "score" users according to the data they submit.
Someone with "president" exactly once in the Role/Position field will be given a score of 100, and anyone with "%vice%" (as in vice president) in the Role/Position field will be scored about half of what is given to those with just "president".
SELECT *, sum(relevance)
FROM (
SELECT a.*,
100 AS relevance
FROM application a,
document d
WHERE d.`Role/Position` LIKE 'president'
AND d.`AppID` = a.`AppId`
AND `AwardID` != 'NULL'
AND `Schoolyear` = '2013-2014'
UNION
SELECT a.*,
50 AS relevance
FROM application a,
document d
WHERE d.`Role/Position` LIKE '%vice%'
AND d.`AppID` = a.`AppId`
AND `AwardID` != 'NULL'
AND `Schoolyear` = '2013-2014'
) results
GROUP BY AppID
ORDER BY sum(relevance) DESC
My problem is that if I omit the UNION SELECT portion, I get a total of 200 for someone with two "president" rows. If the UNION SELECT portion is kept in the query, relevance only comes to 100.
A person with two "president" rows is supposed to have 200, and someone with "%vice%" and "president" should supposedly have 150 in their sum(relevance) value. It also does not go beyond 150 for someone with two "president" and two "%vice%" rows. Could someone point out what I am doing wrong?
I have a lot to learn in regards to SQL and web design, which is why I am asking for help in determining where I've gone wrong in my query. I based my query on this guide.
UNION does a DISTINCT, which eliminates duplicate rows. Since you want multiple hits per row in application to be possible, you should use UNION ALL instead:
SELECT *, sum(relevance)
FROM (
SELECT a.*, 100 AS relevance
FROM application a, document d
WHERE d.`Role/Position` LIKE 'president' AND d.`AppID` = a.`AppId`
AND `AwardID` != 'NULL' AND `Schoolyear` = '2013-2014'
UNION ALL
SELECT a.*, 50 AS relevance
FROM application a, document d
WHERE d.`Role/Position` LIKE '%vice%' AND d.`AppID` = a.`AppId`
AND `AwardID` != 'NULL' AND `Schoolyear` = '2013-2014'
) results
GROUP BY AppID
ORDER BY SUM(relevance) DESC