Get 5 random rows from MySQL DB [duplicate] - php

This question already has answers here:
MySQL select 10 random rows from 600K rows fast
(28 answers)
Closed 8 years ago.
I have searched all over for an answer, and although people say not to use the ORDER BY RAND() clause, I think it is OK for my purposes: each competition rarely has more than a few hundred records at a time.
So basically I need to retrieve 5 random records from a competition entries table. However, any loyalty customer receives an EXTRA entry, for example:
compEntryid | firstName | lastName | compID |
1 | bob | smith | 100
2 | bob | smith | 100
3 | jane | doe | 100
4 | sam | citizen | 100
etc
So we are giving the loyalty members a better chance of winning a prize. However, I'm a little worried that the result of a usual ORDER BY RAND() can include 2 entries from the SAME person. What is an optimised method to ensure that we truly have 5 random records while still giving those extra entrants a better (weighted) chance? I'm happy to use multiple queries, sub-queries or even a mix of MySQL and PHP. Any advice is deeply appreciated, thank you!
Bass
EDIT:
These 2 queries both work!
query1
SELECT concat(firstName, " ", lastName) name,id, email
FROM t WHERE
RAND()<(SELECT ((5/COUNT(id))*10) FROM t)
group by email ORDER BY RAND() limit 5;
query2
select distinct
email, id, firstName, lastName from
(
select id ,
email, firstName , lastName , compID, rand()/(select count(*) from t where
email=t1.email
) as rank
from t t1
where compID = 100
order by rank) t2 limit 5;
http://sqlfiddle.com/#!2/73470c/2

If you have a few hundred records, I think the ORDER BY RAND() solution should be fine.
The subquery orders the rows by a random value weighted by the number of entries, but duplicates remain. The parent SELECT then takes the first 5 distinct rows.
SELECT DISTINCT firstName ,
lastName ,
compID
FROM
( SELECT compEntryid ,firstName , lastName , compID, rand()/(select count(*)
FROM t
WHERE firstName=t1.firstName AND
lastName = t1.lastName) AS rank
FROM t t1
WHERE compID = 100
ORDER BY rank) t2
LIMIT 5

I think you will need to use a sub query if you want to return a compEntryid.
SELECT t.firstName, t.lastName, t.compID, MIN(compEntryid)
FROM t
INNER JOIN
(
SELECT DISTINCT firstName, lastName, compID
FROM t
ORDER by rand()
LIMIT 5
) t2
ON t.firstName = t2.firstName
AND t.lastName = t2.lastName
AND t.compID = t2.compID
GROUP BY t.firstName, t.lastName, t.compID;
This uses a sub query to get 5 random firstName / lastName / compID combinations, then joins against the table to get the MIN compEntryid.
However, I am not certain about this. I think it will eliminate the duplicates in the sub query before performing the ORDER BY / LIMIT, which would prevent someone with more entries from having more chances.
EDIT
After more of a play, I think I have found a solution, although efficiency is not one of its strong points.
SELECT MIN(compEntryid), firstName, lastName, compID
FROM
(
SELECT firstName, lastName, compID, compEntryid, @seq:=@seq+1 AS seq
FROM
(
SELECT firstName, lastName, compID, compEntryid
FROM t
ORDER by rand()
) sub0
CROSS JOIN (SELECT @seq:=0) sub1
) sub2
GROUP BY sub2.firstName, sub2.lastName, sub2.compID
ORDER BY MIN(seq)
LIMIT 5
This has an inner sub query that gets all the records in a random order. Around that, another sub query adds a sequence number to the records. The outer query groups by the name, etc., and orders by the minimum sequence number for that name. The compEntryid is just taken as the MIN for the name / competition (I am assuming you don't care too much about this).
This way, if someone had 5 entries, the inner sub query would mix them up in the list and the next sub query would add a sequence number. In the extreme case those 5 entries could be sequence numbers 1 to 5. The outer query orders by the lowest sequence number for the name and ignores the others, so of those 5 only sequence number 1 would be used and 2 to 5 ignored, with the next selected person being the one with sequence number 6.
This way, the more entries someone has the more likely they are to be a winner, but they can't be 2 of the 5 winners.
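The shuffle-and-take-first-occurrence idea described above can also be written without user variables: give every entry its own random key and rank each person by their smallest key. This is only a sketch, assuming the table is called t as in the queries above, that a person is identified by firstName + lastName within a compID, and that 100 is the competition being drawn. The names keyed and draw_key are made up for illustration.

```sql
-- Each entry gets an independent random key. A person with more entries
-- holds more keys, so they are more likely to hold one of the smallest.
SELECT firstName,
       lastName,
       compID,
       MIN(compEntryid) AS compEntryid   -- any one entry id for the person
FROM (
    SELECT firstName, lastName, compID, compEntryid, RAND() AS draw_key
    FROM t
    WHERE compID = 100                   -- assumed competition id
) keyed
GROUP BY firstName, lastName, compID     -- collapse to one row per person
ORDER BY MIN(draw_key)                   -- rank people by their best key
LIMIT 5;
```

Because the grouping collapses each person to a single row before the LIMIT is applied, nobody can appear twice among the five.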
With thanks to kiks73 for setting up some sqlfiddle data:-
http://sqlfiddle.com/#!2/cd777/1
EDIT
A solution based on the one above by @kiks73, tweaked to use a non-correlated sub query for the counts and to eliminate a few uncertainties. For example, with his solution I am not quite sure whether MySQL will choose to do the DISTINCT by implicitly doing a GROUP BY, which would also implicitly order the results prior to applying the LIMIT (it doesn't seem to, but I am not sure this behaviour is defined).
SELECT t.firstName ,
t.lastName ,
t.compID,
MIN(rand() / t1.entry_count) AS rank
FROM
(
SELECT firstName, lastName, compID, COUNT(*) AS entry_count
FROM t
GROUP BY firstName, lastName, compID
) t1
INNER JOIN t
ON t.firstName=t1.firstName
AND t.lastName = t1.lastName
AND t.compID = t1.compID
GROUP BY t.firstName, t.lastName, t.compID
ORDER BY rank
LIMIT 5

Related

MySQL, getting all top users

I have a table named items which has 3 columns : id, user_id, item_name.
I want to select and show all users that have most submitted items in that table.
For instance :
User-1 has 3 items,
User-2 has 8 items,
User-3 has 5 items,
User-4 has 8 items, and
User-5 has 8 items too.
Based on what I need, the query should be outputting User-2, User-4 and User-5.
My knowledge of MySQL is not thorough unfortunately and I can't get this done by myself.
Your help is much appreciated.
EDIT #1 :
Here's the query that I tried and didn't output my desired result :
SELECT COUNT(id) AS total_count
, user_id
FROM ".DB_PREFIX."items
GROUP
BY user_id
It shows all users and their total number of items submitted. As I mentioned earlier, I need all top users.
E.g.:
SELECT a.*
FROM
( SELECT user_id
, COUNT(*) total
FROM my_table
GROUP
BY user_id
) a
JOIN
( SELECT COUNT(*) total
FROM my_table
GROUP
BY user_id
ORDER
BY total DESC
LIMIT 1
) b
ON b.total = a.total;
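A variation on the same idea uses a HAVING clause instead of a join. This is a sketch, assuming the table is named items as in the question (the query above uses my_table as a stand-in); the alias per_user is illustrative.

```sql
-- Keep only the users whose item count equals the highest count of any user.
SELECT user_id, COUNT(*) AS total
FROM items
GROUP BY user_id
HAVING COUNT(*) = (
    SELECT MAX(total)
    FROM (
        SELECT COUNT(*) AS total         -- item count per user
        FROM items
        GROUP BY user_id
    ) per_user
);
```

With the sample figures from the question, the highest count is 8, so this should return the rows for User-2, User-4 and User-5.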

Issue with SQL Join & Exclude Query

I have a set of queries that I am trying to run but I am having issues getting them to run together.
My set up is as follows with column names in parentheses:
Table 1 (Email / Date)
Table 2 (Email / Date_Submitted)
I have written 3 queries which each work perfectly, independent of each other, but I cannot seem to figure out how to connect them.
Query 1 - Distinct Emails from Table 1 (rfi_log)
SELECT DISTINCT email, date_submitted
FROM rfi_log
WHERE date_submitted BETWEEN '[start_date]' AND '[end_date]'
Query 2 - Distinct Emails from Table 2 (mastersstudies)
SELECT DISTINCT email
FROM orutrimdb.mastersstudies
WHERE date BETWEEN '[start_date]' AND '[end_date]'
Query 3 - Join Query looking for duplicate emails from Table 1 & Table 2
SELECT rfi_log.email as emails, orutrimdb.mastersstudies.email
FROM rfi_log
CROSS JOIN orutrimdb.mastersstudies
ON orutrimdb.mastersstudies.email=rfi_log.email
WHERE date_submitted BETWEEN '[start_date]' AND '[end_date]';
My issue now is that I need to combine these queries in some fashion so that I can get a count of DISTINCT emails from both tables during the date range while EXCLUDING the emails identified by Query 3.
I need the following:
Query 3 = Count of Distinct Emails
Query 2 = Count of Distinct Emails (not identified in Query 3)
Query 1 = Count of Distinct Emails (not identified in Query 3)
Ultimately I need to get a total count of distinct emails during the date range that is "de-duplicated" since there are duplicates located in both tables.
How can this be accomplished?
One method for doing this is union all with aggregation. The following gets duplication information about each email:
select email, sum(isrfi) as numrfi, sum(isms) as numms
from ((select email, 1 as isrfi, 0 as isms
from rfi_log
) union all
(select email, 0, 1
from orutrimdb.mastersstudies
)
) e
group by email;
An aggregation on top gives you the information you are looking for:
select numrfi, numms, count(*), min(email), max(email)
from (select email, sum(isrfi) as numrfi, sum(isms) as numms
from ((select email, 1 as isrfi, 0 as isms
from rfi_log
) union all
(select email, 0, 1
from orutrimdb.mastersstudies
)
) e
group by email
) e
group by numrfi, numms;
Note that this also finds duplicates within a single table.
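To get the counts the question asks for in a single row, the per-email flags can be summed. This is a sketch only: it assumes the table and column names from the question (rfi_log.date_submitted, orutrimdb.mastersstudies.date), keeps the question's '[start_date]' and '[end_date]' placeholders as-is, and uses made-up aliases (in_rfi, in_ms, per_email).

```sql
SELECT SUM(in_rfi = 1 AND in_ms = 0) AS only_rfi,        -- Query 1 minus Query 3
       SUM(in_rfi = 0 AND in_ms = 1) AS only_ms,         -- Query 2 minus Query 3
       SUM(in_rfi = 1 AND in_ms = 1) AS in_both,         -- Query 3
       COUNT(*)                      AS total_distinct   -- de-duplicated total
FROM (
    -- One row per email, flagged by which table(s) it appears in.
    SELECT email, MAX(is_rfi) AS in_rfi, MAX(is_ms) AS in_ms
    FROM (
        SELECT email, 1 AS is_rfi, 0 AS is_ms
        FROM rfi_log
        WHERE date_submitted BETWEEN '[start_date]' AND '[end_date]'
        UNION ALL
        SELECT email, 0, 1
        FROM orutrimdb.mastersstudies
        WHERE date BETWEEN '[start_date]' AND '[end_date]'
    ) e
    GROUP BY email
) per_email;
```

MySQL evaluates a comparison to 1 or 0, so SUM over a condition counts the rows for which it holds.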

GROUP_CONCAT with ordering and missing fields

I have a series of tables that I want to get rows returned from in the following format:
Student ID | Last Name | First Name | Quiz Scores
-------------------------------------------------
xxxxxxx | Snow | Jon | 0,0,0,0,0,0,0,0
There are 3 relevant tables (changing any existing DB structure is not an option):
person - table of all people in the organization
enrollment - table of student and faculty enrollment data
tilt.quiz - table of quiz scores, with each row storing an individual score
The tricky part of this is the Quiz Scores. A row for the quiz score only exists if the student has taken the quiz. Each quiz row has a module, 1 - 8. So possible quiz data for a student could be (each of these being a separate row):
person_id | module | score
---------------------------
223355 | 1 | 100
223355 | 2 | 95
223355 | 4 | 80
223355 | 7 | 100
I need the quiz scores returned in proper order with 8 comma separated values, regardless if any or all of the quizzes are missing.
I currently have the following query:
SELECT
person.id,
first_name,
last_name,
GROUP_CONCAT(tilt.quiz.score) AS scores
FROM person
LEFT JOIN enrollment ON person.id = enrollment.person_id
LEFT JOIN tilt.quiz ON person.id = tilt.quiz.person_id
WHERE
enrollment.course_id = '$num' AND enrollment_status_id = 1
GROUP BY person.id
ORDER BY last_name
The problems with this are:
It does not order the quizzes by module
If any of the quizzes are missing it simply returns fewer values
So I need the GROUP_CONCAT scores to at least include commas for missing quiz values, and have them ordered correctly.
The one solution I considered was creating a temporary table of the quiz scores, but I'm not sure this is the most efficient method or exactly how to go about it.
EDIT: Another solution would be to execute a query to check for the existence of each quiz individually but this seems clunky (a total of 9 queries instead of 1); I was hoping there was a more elegant way.
How would this be accomplished?
There are some assumptions here about your data structure, but this should be pretty close to what you're after. Take a look at the documentation for GROUP_CONCAT and COALESCE.
SELECT `person`.`id`, `person`.`first_name`, `person`.`last_name`,
GROUP_CONCAT(
COALESCE(`tilt`.`quiz`.`score`, 'N/A')
ORDER BY `tilt`.`quiz`.`module_id`
) AS `scores`
FROM `person`
CROSS JOIN `modules`
LEFT JOIN `enrollment` USING (`person_id`)
LEFT JOIN `tilt`.`quiz` USING (`person_id`, `module_id`)
WHERE (`enrollment`.`course_id` = '$num')
AND (`enrollment`.`enrollment_status_id` = 1)
GROUP BY `person`.`id`
ORDER BY `person`.`last_name`
The first thing to do is use the IFNULL() function on the score.
Then, use ORDER BY inside the GROUP_CONCAT.
Here is my proposed query:
SELECT
person.id,
first_name,
last_name,
GROUP_CONCAT(IFNULL(tilt.quiz.score,0) ORDER BY tilt.quiz.module) AS scores
FROM person
LEFT JOIN enrollment ON person.id = enrollment.person_id
LEFT JOIN tilt.quiz ON person.id = tilt.quiz.person_id
WHERE
enrollment.course_id = '$num' AND enrollment_status_id = 1
GROUP BY person.id
ORDER BY last_name
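Neither COALESCE nor IFNULL can create a row for a quiz that was never taken, so to always get 8 values the join needs a source for the module numbers 1 to 8. The first answer above relies on a modules table for that; the sketch below generates the numbers inline instead. It assumes the column names from the question (tilt.quiz.module, enrollment.enrollment_status_id), one matching enrollment row per student, and 0 as the filler for a missing quiz. The aliases p, e, m and q are illustrative.

```sql
SELECT p.id,
       p.first_name,
       p.last_name,
       -- Exactly one value per module, in module order; 0 where no quiz row exists.
       GROUP_CONCAT(COALESCE(q.score, 0) ORDER BY m.module) AS scores
FROM person p
JOIN enrollment e
  ON e.person_id = p.id
CROSS JOIN (
    -- Inline list of the 8 module numbers (stands in for a modules table).
    SELECT 1 AS module UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
    UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
) m
LEFT JOIN tilt.quiz q
  ON q.person_id = p.id
 AND q.module = m.module
WHERE e.course_id = '$num'               -- value interpolated by the PHP code
  AND e.enrollment_status_id = 1
GROUP BY p.id, p.first_name, p.last_name
ORDER BY p.last_name;
```

For the sample student in the question (modules 1, 2, 4 and 7 taken), this should produce 100,95,0,80,0,0,100,0.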

PDO and prepared statements on dynamic sized queries

I am developing a small gaming website for a college fest where users attend a few contests and, based on their ranks in the result table, points are updated in the user table. Then the result table is truncated for the next event. The schemas are as follows:
user
-------------------------------------------------------------
user_id | name | college | points |
-------------------------------------------------------------
result
---------------------------
user_id | score
---------------------------
Now, the first 3 students are given 100 points, the next 15 are given 50 points, and the others are given 10 points each.
I am having a problem building the queries because I don't know how many users will attempt the contest, so I have to append that many ? placeholders to the query. Secondly, I also need to put ) at the end.
My queries are like
$query_top3 = "update user set points = points + 100 where id in (?,?,?)";
$query_next5 = "update user set points = points + 50 where id in (?,?,?,?,?)";
$query_others = "update user set points = points + 50 where id in (?,?...........,?)";
How can I prepare those queries dynamically? Or, is there any better approach?
EDIT
Though it's similar to this question, in my scenario I have 3 different dynamic queries.
If I understand your requirements correctly, you can rank the results and update the users table (adding points) all in one query:
UPDATE users u JOIN
(
SELECT user_id,
(
SELECT 1 + COUNT(*)
FROM result
WHERE score >= r.score
AND user_id <> r.user_id
) rank
FROM result r
) q
ON u.user_id = q.user_id
SET points = points +
CASE
WHEN q.rank BETWEEN 1 AND 3 THEN 100
WHEN q.rank BETWEEN 4 AND 18 THEN 50
ELSE 10
END;
It is totally dynamic, based on the contents of the result table. You no longer need to deal with each user_id individually.
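On MySQL 8.0 or later, the same single-statement approach can be written with a window function instead of a correlated sub query. This is a sketch under assumptions: it uses the table name user from the question's schema, and the aliases q and standing are made up. Note that RANK() gives tied scores the same, better position, which differs slightly from the counting sub query above.

```sql
UPDATE user u
JOIN (
    -- Position of every participant, highest score first (ties share a position).
    SELECT user_id,
           RANK() OVER (ORDER BY score DESC) AS standing
    FROM result
) q ON q.user_id = u.user_id
SET u.points = u.points +
    CASE
        WHEN q.standing <= 3  THEN 100   -- first 3
        WHEN q.standing <= 18 THEN 50    -- next 15
        ELSE 10                          -- everyone else
    END;
```

No placeholders are needed at all, so the statement can be run as-is after each contest.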

Get multiple GROUP BY results per group, or use separate concatenated table

I am working on an auction web application. I have a table with bids, and from this table I want to select the last 10 bids per auction.
Now I know I can get the last bid by using something like:
SELECT bids.id FROM bids WHERE * GROUP BY bids.id ORDER BY bids.created
I have read that limiting the number of rows returned per GROUP BY group is not an easy thing to do. I have found no easy solution; if there is one, I would like to hear it.
I have come up with some solutions to tackle this problem, but I am not sure if I am doing this well.
Alternative
The first idea is to create a new table called bids_history. In this table I store a string of the last items.
example:
bids_history
================================================================
auction_id bid_id bidders times
1 20,25,40 user1,user2,user1 time1,time2,time3
I have to store the names and the times too, because I have found no easy way of taking the string used in bid_id (20,25,40) and just using it in a join.
This way I can just join on auction id, and I have the latest result.
Now when a new bid is placed, these are the steps:
insert the bid into bids and get the last inserted id
get the bids_history string for this auction product
explode the string
insert the new values
check if there are more than 3
implode the array, and insert the string again
This all seems to me not a very good solution.
I really don't know which way to go. Please keep in mind this is a website with a lot of bidding; there can be up to 15,000 bids per auction item. Maybe with this volume GROUPING and ORDERING is not a good way to go. Please correct me if I am wrong.
After the auction is over I do clean up the bids table, removing all the bids, and store them in a separate table.
Can someone please help me tackle this problem?
And if you have read this far, thanks for reading.
EDIT
The tables i use are:
bids
======================
id (prim_key)
aid (auction id)
uid (user id)
cbid (current bid)
created (time created)
======================
auction_products
====================
id (prim_key)
pid (product id)
closetime (time the auction closes)
What i want as the result of the query:
result
===============================================
auction_products.id bids.uid bids.created
2 6 time1
2 8 time2
2 10 time3
5 3 time1
5 4 time2
5 9 time3
7 3 time1
7 2 time2
7 1 time3
So that is the latest bids per auction, with the number selectable (3 or 10).
Using user variables and control flow, I ended up with this (just replace the <=3 with <=10 if you want the ten most recent bids per auction):
SELECT a.*
FROM
(SELECT aid, uid, created FROM bids ORDER BY aid, created DESC) a,
(SELECT @prev:=-1, @count:=1) b
WHERE
CASE WHEN @prev<>a.aid THEN
CASE WHEN @prev:=a.aid THEN
@count:=1
END
ELSE
@count:=@count+1
END <= 3
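On MySQL 8.0 or later, the per-auction counter that the user variables maintain by hand is available directly as ROW_NUMBER(). This is a sketch, assuming the bids columns listed in the question (id, aid, uid, created); the aliases ranked and rn are illustrative.

```sql
SELECT aid, uid, created
FROM (
    SELECT aid, uid, created,
           -- Restart the numbering for every auction, newest bid first;
           -- id breaks ties between bids with the same timestamp.
           ROW_NUMBER() OVER (PARTITION BY aid
                              ORDER BY created DESC, id DESC) AS rn
    FROM bids
) ranked
WHERE rn <= 3                            -- use 10 for the last ten bids
ORDER BY aid, created DESC;
```

Unlike the user-variable version, this does not depend on the order in which MySQL happens to evaluate expressions while scanning the rows.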
Why do this in one query?
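$sql = "SELECT id FROM auctions ORDER BY created DESC LIMIT 10";
// Run the query once; calling mysql_query() inside the loop condition
// would re-run it on every iteration and never finish.
// (The mysql_* extension was removed in PHP 7; mysqli or PDO replaces it.)
$result = mysql_query($sql);
$auctions = array();
while ($row = mysql_fetch_assoc($result))
$auctions[] = $row['id'];
$auctions = implode(', ', $auctions);
$sql = "SELECT id FROM bids WHERE auction_id IN ($auctions) ORDER BY created LIMIT 10";
// ...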
You should obviously handle the case where, e.g. $auctions is empty, but I think this should work.
EDIT: This is wrong :-)
You will need to use a subquery:
SELECT latest.id
FROM ( SELECT bids1.id, bids1.created
FROM bids AS bids1 LEFT JOIN
bids AS bids2 ON bids1.created < bids2.created
AND bids1.AuctionId = bids2.AuctionId
WHERE bids2.id IS NULL) AS latest
ORDER BY latest.created DESC
LIMIT 10
So the subquery performs a left join from bids to itself, pairing each record with all records that have the same AuctionId and a created date that is after its own. For the most recent record there is no other record with a greater created date, so an inner join would drop it; because we use a LEFT JOIN it is kept, with all the bids2 fields being NULL, hence the WHERE bids2.id IS NULL condition.
So the sub query has one row per auction, containing the data from the most recent bid. Then simply select the top ten using ORDER BY and LIMIT.
If your database engine doesn't support subqueries, you can use a view just as well.
Ok, this one should work:
SELECT bids1.id
FROM bids AS bids1 LEFT JOIN
bids AS bids2 ON bids1.created < bids2.created
AND bids1.AuctionId = bids2.AuctionId
GROUP BY bids1.AuctionId, bids1.created, bids1.id
HAVING COUNT(bids2.created) < 10
So, like before, left join bids with itself so we can compare each bid with all the others. Then group first by auction (we want the last ten bids per auction) and then by created (plus id, so that the selected column is part of the grouping). Because the left join pairs each bid with all later bids in the same auction, COUNT(bids2.created) per group gives the number of bids placed after that bid. If this count is < 10 (the most recent bid has a count of 0, so the ten most recent have counts 0 to 9), it is one of the ten most recent bids, and we want to select it.
To select last 10 bids for a given auction, just create a normalized bids table (1 record per bid) and issue this query:
SELECT bids.id
FROM bids
WHERE auction = ?
ORDER BY
bids.created DESC
LIMIT 10
To select last 10 bids per multiple auctions, use this:
SELECT bo.*
FROM (
SELECT a.id,
COALESCE(
(
SELECT bi.created
FROM bids bi
WHERE bi.auction = a.id
ORDER BY
bi.auction DESC, bi.created DESC, bi.id DESC
LIMIT 1 OFFSET 9
), '1900-01-01'
) AS mcreated,
COALESCE(
(
SELECT bi.id
FROM bids bi
WHERE bi.auction = a.id
ORDER BY
bi.auction DESC, bi.created DESC, bi.id DESC
LIMIT 1 OFFSET 9
), 0)
AS mid
FROM auctions a
) q
JOIN bids bo
ON bo.auction >= q.id
AND bo.auction <= q.id
AND (bo.created, bo.id) >= (q.mcreated, q.mid)
Create a composite index on bids (auction, created, id) for this to work fast.
