I have about 5 million records in a database table (MySQL 5.6), and I wanted to get the last 2 dates, grouped by ID. Looking around the web, I found examples that allowed me to cobble together the following...
SELECT id, date
FROM
(
SELECT *,
@id_rank := IF(@current_id = id, @id_rank + 1, 1) AS id_rank,
@current_id := id
FROM `data`
ORDER BY id DESC, date DESC
) ranked
WHERE id_rank <= 2
ORDER BY id ASC, date DESC
Running this code from MySQL Workbench returned 5,700 rows, which is what I expected. I then tried to call this SQL as-is from PHP, using the following SQL string...
$subSql =
"SELECT *, " .
" @id_rank := IF(@current_id = id, @id_rank + 1, 1) AS id_rank, " .
" @current_id := id " .
"FROM `data` " .
"ORDER BY id DESC, date DESC";
$sql =
"SELECT id, date " .
"FROM ($subSql) ranked " .
"WHERE id_rank <= 2 " .
"ORDER BY id ASC, date DESC";
However, running this code resulted in an out-of-memory error. I then modified the code to return a count of the rows instead, and rather than the expected 5,700 it returned 4,925,479. So my question is: what do I have to change in my PHP SQL above to get the same results that I was getting from MySQL Workbench?
Assigning user variables inside expressions is deprecated as of MySQL 8.0. On MySQL 8.0 or later, you should be using window functions:
SELECT id, date
FROM (SELECT d.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) as id_rank
FROM `data` d
) d
WHERE id_rank <= 2
ORDER BY id ASC, date DESC;
Even in older versions your code is not guaranteed to work because MySQL does not guarantee the order of evaluation of expressions. You are assigning the variable in one expression and using it in another -- and those can go in either order.
If your dates are unique per id, you can do this without variables:
select id, date
from (select d.*,
(select d2.date
from data d2
where d2.id = d.id and d2.date >= d.date
order by d2.date desc
limit 1 offset 1
) as date_2
from data d
) d
where date >= date_2 or date_2 is null;
For performance, you want an index on data(id, date).
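The window-function approach above can be tried out without a MySQL server. The following is a minimal sketch in Python that uses the bundled SQLite engine as a stand-in (it assumes SQLite 3.25 or later for ROW_NUMBER()); the sample rows, the index name idx_data_id_date and the helper latest_two_per_id are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8 here; ROW_NUMBER() needs SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO data (id, date) VALUES (?, ?)",
    [
        (1, "2020-01-01"), (1, "2020-01-05"), (1, "2020-01-03"),
        (2, "2020-02-01"),
        (3, "2020-03-02"), (3, "2020-03-01"), (3, "2020-03-04"), (3, "2020-03-03"),
    ],
)
# Composite index on (id, date), as suggested in the answer (name is hypothetical).
conn.execute("CREATE INDEX idx_data_id_date ON data (id, date)")


def latest_two_per_id(connection):
    """Return the two most recent (id, date) rows for every id."""
    sql = """
        SELECT id, date
        FROM (SELECT d.*,
                     ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS id_rank
              FROM data d) ranked
        WHERE id_rank <= 2
        ORDER BY id ASC, date DESC
    """
    return connection.execute(sql).fetchall()


rows = latest_two_per_id(conn)
for row in rows:
    print(row)
```

Because the numbering restarts inside each PARTITION BY id group, the result does not depend on the order in which the engine evaluates the select list, which is the weakness of the user-variable version.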
I have a PHP application with a workouts table where I store users' daily workout data by activity (for example run, walk, cycling). At the end of the month I have to create a leaderboard that ranks users by highest workout points.
SELECT total_distance, user_id,date_created
FROM workouts
WHERE date_created LIKE '%2019-11%' AND activity_type='Run'
With the above query I get the output shown below.
I then refined the query by adding GROUP BY user_id.
SELECT SUM(total_distance) as points, user_id,date_created
FROM workouts
WHERE date_created LIKE '%2019-11%' AND activity_type='Run'
GROUP BY user_id
I want my final output as shown in the 2nd image, with the rank as a new column.
How do I write a query for that? I have not been able to work it out.
SELECT
CASE WHEN @l=points THEN @rank ELSE @rank:=@rank+1 END as rank, @l:=points, points, user_id
FROM (SELECT SUM(total_distance) as points, user_id, date_created
FROM workouts
WHERE date_created LIKE '%2019-07%'
GROUP BY user_id ORDER BY points desc) a, (SELECT @rank := 0) vars;
SQL Demo here http://sqlfiddle.com/#!9/0e8634/2
I would change this condition
date_created LIKE '%2019-11%'
into
month(date_created) = '11' and year(date_created) = '2019'
Final query: order the rows by points descending and use ROW_NUMBER() or DENSE_RANK() to create the ranking.
ROW_NUMBER() and DENSE_RANK() are available in MySQL 8.0 and later and in MariaDB 10.2.0 and later.
For newer versions (MySQL >= 8 or MariaDB >= 10.2.0):
SELECT SUM(total_distance) as points, user_id, date_created,
-- the window cannot see the alias "points", and rank is a reserved word in MySQL 8
ROW_NUMBER() OVER (ORDER BY SUM(total_distance) DESC) as `rank`
FROM workouts
WHERE month(date_created) = '11' AND year(date_created) = '2019' AND activity_type='Run'
GROUP BY user_id
ORDER BY points desc
For older versions (MySQL < 8 or MariaDB < 10.2.0):
This also works on MySQL >= 8 and MariaDB >= 10.2.0, provided rank is quoted with backticks, because it is a reserved word in MySQL 8.
SET @rank=0;
SELECT SUM(total_distance) as points, user_id, date_created,
@rank:=@rank+1 as `rank`
FROM workouts
WHERE month(date_created) = '11' AND year(date_created) = '2019' AND activity_type='Run'
GROUP BY user_id
ORDER BY points desc
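To see how ROW_NUMBER() and DENSE_RANK() differ when two users tie on points, here is a small sketch in Python, with in-memory SQLite standing in for MySQL 8 (SQLite 3.25 or later is assumed). The sample rows and the leaderboard helper are invented. Two choices here are mine rather than the answer's: the totals are computed in a derived table so the window can order by points, and the month filter is written as a half-open date range instead of month()/year().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workouts (user_id INTEGER, total_distance REAL, "
    "date_created TEXT, activity_type TEXT)"
)
conn.executemany(
    "INSERT INTO workouts VALUES (?, ?, ?, ?)",
    [
        (1, 5.0, "2019-11-02", "Run"),
        (1, 7.0, "2019-11-15", "Run"),
        (2, 12.0, "2019-11-03", "Run"),
        (3, 4.0, "2019-11-09", "Run"),
        (3, 9.0, "2019-11-10", "Walk"),  # different activity, filtered out
        (4, 30.0, "2019-10-31", "Run"),  # previous month, filtered out
    ],
)


def leaderboard(connection, start, end, activity):
    """Rank users by summed distance for one activity in [start, end)."""
    sql = """
        SELECT user_id,
               points,
               ROW_NUMBER() OVER (ORDER BY points DESC, user_id) AS row_num,
               DENSE_RANK() OVER (ORDER BY points DESC)          AS dense_rnk
        FROM (SELECT user_id, SUM(total_distance) AS points
              FROM workouts
              WHERE date_created >= ? AND date_created < ? AND activity_type = ?
              GROUP BY user_id) totals
        ORDER BY points DESC, user_id
    """
    return connection.execute(sql, (start, end, activity)).fetchall()


board = leaderboard(conn, "2019-11-01", "2019-12-01", "Run")
for row in board:
    print(row)
```

Users 1 and 2 both total 12.0, so ROW_NUMBER() gives them 1 and 2 while DENSE_RANK() gives both 1 and the next user 2. Which one you want depends on whether tied users should share a leaderboard position.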
I would like to better optimize my code. I'd like to have a single query that allows an alias name to have its own limit and also include a result with no limit.
Currently I'm using two queries like this:
// ALL TIME //
$mikep = mysqli_query($link, "SELECT tasks.EID, reports.how_did_gig_go FROM tasks INNER JOIN reports ON tasks.EID=reports.eid WHERE `priority` IS NOT NULL AND `partners_name` IS NOT NULL AND mike IS NOT NULL GROUP BY EID ORDER BY tasks.show_date DESC;");
$num_rows_mikep = mysqli_num_rows($mikep);
$rating_sum_mikep = 0;
while ($row = mysqli_fetch_assoc($mikep)) {
$rating_mikep = $row['how_did_gig_go'];
$rating_sum_mikep += $rating_mikep;
}
$average_mikep = $rating_sum_mikep/$num_rows_mikep;
// AND NOW WITH A LIMIT 10 //
$mikep_limit = mysqli_query($link, "SELECT tasks.EID, reports.how_did_gig_go FROM tasks INNER JOIN reports ON tasks.EID=reports.eid WHERE `priority` IS NOT NULL AND `partners_name` IS NOT NULL AND mike IS NOT NULL GROUP BY EID ORDER BY tasks.show_date DESC LIMIT 10;");
$num_rows_mikep_limit = mysqli_num_rows($mikep_limit);
$rating_sum_mikep_limit = 0;
while ($row = mysqli_fetch_assoc($mikep_limit)) {
$rating_mikep_limit = $row['how_did_gig_go'];
$rating_sum_mikep_limit += $rating_mikep_limit;
}
$average_mikep_limit = $rating_sum_mikep_limit/$num_rows_mikep_limit;
This allows me to show an all-time average and also an average over the last 10 reviews. Is it really necessary for me to set up two queries?
Also, I understand I could get the sum in the query, but not all the values are numbers, so I've actually converted them in PHP, but left out that code in order to try and simplify what is displayed in the code.
All-time average and average over the last 10 reviews
In the best case, where the column how_did_gig_go is 100% numeric, a single query like this could work:
SELECT
AVG(how_did_gig_go) AS avg_how_did_gig_go
, SUM(CASE
WHEN rn <= 10 THEN how_did_gig_go
ELSE 0
END) / 10 AS latest10_avg
FROM (
SELECT
@num := @num + 1 AS rn
, tasks.show_date
, reports.how_did_gig_go
FROM tasks
INNER JOIN reports ON tasks.EID = reports.eid
CROSS JOIN ( SELECT @num := 0 AS n ) AS v
WHERE priority IS NOT NULL
AND partners_name IS NOT NULL
AND mike IS NOT NULL
ORDER BY tasks.show_date DESC
) AS d
However, unless all the "numbers" are in fact numeric, you are stuck with sending every row back from the server for PHP to process, unless you can clean up the data in MySQL somehow.
You might avoid sending all that data twice if your PHP uses only the top 10 rows from the whole list. There are probably ways of doing that in PHP.
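The client-side idea can be sketched as follows: fetch the rows once, newest first, then compute both averages from the same list. This is written in Python rather than PHP only to keep it self-contained. The helper names and sample ratings are made up, and non-numeric ratings are simply skipped, which is an assumption, since the question does not show how they are converted.

```python
def to_number(value):
    """Convert a raw rating to float, or return None when it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def average(values):
    """Mean of the numeric entries in values; None when there are none."""
    numbers = [n for n in (to_number(v) for v in values) if n is not None]
    return sum(numbers) / len(numbers) if numbers else None


def all_time_and_recent(ratings_newest_first, recent_count=10):
    """Return (all-time average, average over the newest recent_count rows)."""
    return (
        average(ratings_newest_first),
        average(ratings_newest_first[:recent_count]),
    )


# One result set, already ordered by show_date DESC (hypothetical values).
ratings = ["5", "4", "n/a", "3", "5", "4", "4", "2", "5", "3", "1", "1", "great", "2"]
overall, recent = all_time_and_recent(ratings)
print(overall, recent)
```

Note that "recent" here means the newest 10 rows, of which only the numeric ones count. In PHP the same slicing could be done with array_slice() on the fetched rows.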
If you want assistance from SQL to do that, then having 2 columns might help; it would reduce the number of table scans.
SELECT
EID
, how_did_gig_go
, CASE
WHEN rn <= 10 THEN how_did_gig_go
ELSE 0
END AS latest10_how_did_gig_go
FROM (
SELECT
@num := @num + 1 AS rn
, tasks.EID
, reports.how_did_gig_go
FROM tasks
INNER JOIN reports ON tasks.EID = reports.eid
CROSS JOIN ( SELECT @num := 0 AS n ) AS v
WHERE priority IS NOT NULL
AND partners_name IS NOT NULL
AND mike IS NOT NULL
ORDER BY tasks.show_date DESC
) AS d
In future (MySQL 8.x), ROW_NUMBER() OVER(order by tasks.show_date DESC) would be a better method than the "roll your own" row numbering (using the @num variable) shown before.
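As a rough illustration of the ROW_NUMBER() variant, here is a sketch in Python with in-memory SQLite standing in for MySQL 8 (SQLite 3.25 or later is assumed). A single hypothetical ratings table replaces the tasks/reports join, and the values are invented. Unlike the SUM(...) / 10 form above, it uses AVG over a CASE without ELSE, so it still gives a sensible result when fewer than 10 rows exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single table standing in for the tasks/reports join.
conn.execute("CREATE TABLE ratings (show_date TEXT, how_did_gig_go INTEGER)")
# 12 rows: the oldest two are rated 1, the newest ten are rated 5.
sample = [("2020-01-%02d" % day, 1 if day <= 2 else 5) for day in range(1, 13)]
conn.executemany("INSERT INTO ratings VALUES (?, ?)", sample)

sql = """
    SELECT AVG(how_did_gig_go) AS avg_all,
           -- rows with rn > 10 yield NULL, which AVG ignores
           AVG(CASE WHEN rn <= 10 THEN how_did_gig_go END) AS avg_latest10
    FROM (SELECT how_did_gig_go,
                 ROW_NUMBER() OVER (ORDER BY show_date DESC) AS rn
          FROM ratings) d
"""
avg_all, avg_latest10 = conn.execute(sql).fetchone()
print(avg_all, avg_latest10)
```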
I currently have the following table:
TABLE klusbonnen_deelnemers:
bonnummer (varchar) - order number
adres (varchar) - order address
deelnemer (varchar) - user
binnen (date) - date order received
klaar (date) - original order milestone
datum_gereed (date) - date order completed
gereed (varchar) - YES or NO (YES= completed NO= Not yet completed)
datum_factuur (date) - date when user marked order completed (button clicked)
factuur (varchar) - weeknumber order completed
One order(bonnummer) can have multiple users (deelnemer) who all have to mark the order "completed" (datum_gereed). Only when ALL users (deelnemer) have marked an order (bonnummer) "completed" (datum_gereed) the order IS "completed".
I am trying to write a query that gives me:
All completed orders (bonnummer) in a given timespan (last month).
However...
The completion date (datum_gereed) should hold the LAST date (as that is the actual total completion date).
The list should have the order (bonnummer) with the latest "marked completed" date (datum_factuur) on top (sorted DESC), and of course only include orders where all users (deelnemer) have completed the order (all users having gereed="YES").
So far I have this:
SELECT DISTINCT tbl1.bonnummer AS 'KLUSBONNUMMER', tbl1.adres AS 'ADRES',
tbl1.binnen AS 'BINNENGEKOMEN OP', tbl1.klaar AS 'ORIGINELE STREEFDATUM',
tbl1.datum_gereed AS 'GEREEDGEKOMEN OP', tbl1.factuur AS 'WEEKNUMMER'
FROM klusbonnen_deelnemers AS tbl1
INNER JOIN
( SELECT tbl2.bonnummer
FROM klusbonnen_deelnemers AS tbl2
WHERE tbl2.bonnummer NOT IN (
SELECT tbl3.bonnummer
FROM klusbonnen_deelnemers AS tbl3
WHERE tbl3.gereed = 'NEE')
) AS tbl4 ON tbl1.bonnummer = tbl4.bonnummer
INNER JOIN
( SELECT bonnummer, MAX(datum_gereed) AS 'MAXDATUM'
FROM klusbonnen_deelnemers
GROUP BY bonnummer
) MAXFILTER ON tbl1.bonnummer = MAXFILTER.bonnummer
AND tbl1.datum_gereed = MAXFILTER.MAXDATUM
WHERE tbl1.datum_factuur BETWEEN NOW() - INTERVAL 2 MONTH AND NOW()
ORDER BY tbl1.bonnummer DESC
This query DOES work; however, I think this can be done in a much simpler way.
On top of that, the query only works in my Navicat editor. Calling this query on my "live" website gives an error (subquery in WHERE clause...), even though my login details are correct, as other queries DO work.
Can anyone help simplify this query? Thanks.
This part:
INNER JOIN (SELECT tbl2.bonnummer
FROM klusbonnen_deelnemers AS tbl2
WHERE tbl2.bonnummer NOT IN
(SELECT tbl3.bonnummer
FROM klusbonnen_deelnemers AS tbl3
WHERE tbl3.gereed = 'NEE')) AS tbl4
ON tbl1.bonnummer = tbl4.bonnummer
seems unnecessary. Try using gereed <> 'NEE' in the outermost WHERE clause instead:
SELECT DISTINCT
kd.bonnummer AS 'KLUSBONNUMMER',
kd.adres AS 'ADRES',
kd.binnen AS 'BINNENGEKOMEN OP',
kd.klaar AS 'ORIGINELE STREEFDATUM',
kd.datum_gereed AS 'GEREEDGEKOMEN OP',
kd.factuur AS 'WEEKNUMMER'
FROM klusbonnen_deelnemers AS kd
INNER JOIN (
SELECT bonnummer, MAX(datum_gereed) AS 'MAXDATUM'
FROM klusbonnen_deelnemers
GROUP BY bonnummer
) AS MAXFILTER
ON (kd.bonnummer = MAXFILTER.bonnummer AND kd.datum_gereed = MAXFILTER.MAXDATUM)
WHERE
kd.gereed <> 'NEE'
AND kd.datum_factuur BETWEEN NOW() - INTERVAL 2 MONTH AND NOW()
ORDER BY
kd.bonnummer DESC
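A different way to express "an order is complete only when no participant is still on NEE" is to group by order and filter with HAVING. This is not the rewrite above but an alternative technique, sketched in Python with in-memory SQLite. Only a subset of the columns is used, the sample rows and the 'JA' value are assumptions, and a fixed date range replaces NOW() - INTERVAL 2 MONTH so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE klusbonnen_deelnemers ("
    "bonnummer TEXT, deelnemer TEXT, gereed TEXT, "
    "datum_gereed TEXT, datum_factuur TEXT)"
)
conn.executemany(
    "INSERT INTO klusbonnen_deelnemers VALUES (?, ?, ?, ?, ?)",
    [
        ("B1", "anna", "JA", "2024-03-01", "2024-03-02"),
        ("B1", "bob", "JA", "2024-03-05", "2024-03-06"),
        ("B2", "anna", "JA", "2024-03-03", "2024-03-04"),
        ("B2", "carl", "NEE", None, None),  # B2 is not finished yet
        ("B3", "dirk", "JA", "2024-03-10", "2024-03-11"),
    ],
)

sql = """
    SELECT bonnummer,
           MAX(datum_gereed)  AS laatste_gereed,
           MAX(datum_factuur) AS laatste_factuur
    FROM klusbonnen_deelnemers
    GROUP BY bonnummer
    -- keep only orders where no participant row is still 'NEE'
    HAVING SUM(CASE WHEN gereed = 'NEE' THEN 1 ELSE 0 END) = 0
       AND MAX(datum_factuur) BETWEEN ? AND ?
    ORDER BY laatste_factuur DESC
"""
completed = conn.execute(sql, ("2024-03-01", "2024-03-31")).fetchall()
for row in completed:
    print(row)
```

One row per order comes back, carrying the last completion date, so neither DISTINCT nor the self-join on MAX(datum_gereed) is needed. Columns such as adres could be added with MAX() or by joining back on bonnummer.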
I have tried the code provided here on Stack Overflow, but I still get an error. Can someone tell me where I am going wrong? Thank you! This is my code:
$allOrdersFromToday = $Microinvest->MSelectList('SELECT * ROW_NUMBER() OVER (PARTITION BY Acct ORDER BY ID) AS RowNumber FROM Operations WHERE Date = "' . $todayDate . '" AND OperType = 2 ORDER BY Acct DESC) AS a', '*', 'a.RowNumber = 1');
which should output as:
SELECT *
FROM (SELECT *,
ROW_NUMBER() OVER (PARTITION BY Acct ORDER BY ID) AS RowNumber
FROM Operations
WHERE Date = "' . $todayDate . '" AND OperType = 2 ORDER BY Acct DESC) AS a
WHERE a.RowNumber = 1
but I am getting an error:
Warning: mssql_query(): message: Incorrect syntax near the keyword
'SELECT'. (severity 15) in /var/www/functions/MssqlLibry.php on line
29
SQL Server does not support order by in subqueries, under most circumstances. Try this:
SELECT a.*
FROM (SELECT o.*,
ROW_NUMBER() OVER (PARTITION BY Acct ORDER BY ID) AS RowNumber
FROM Operations o
WHERE Date = "' . $todayDate . '" AND
OperType = 2
) a
WHERE a.RowNumber = 1
ORDER BY a.Acct DESC;
In addition, you could have a problem because of the format of the date. You should use parameterized queries rather than substituting values into strings.
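The parameterized form can be sketched like this. It uses Python's sqlite3 module with ? placeholders purely as a self-contained stand-in; the SQL Server driver in the question has its own binding API, which is not shown here. The sample rows and the helper name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Operations (ID INTEGER, Acct INTEGER, Date TEXT, OperType INTEGER)"
)
conn.executemany(
    "INSERT INTO Operations VALUES (?, ?, ?, ?)",
    [
        (1, 100, "2024-05-01", 2),
        (2, 100, "2024-05-01", 2),
        (3, 200, "2024-05-01", 2),
        (4, 200, "2024-05-01", 1),  # other OperType, filtered out
        (5, 300, "2024-04-30", 2),  # other day, filtered out
    ],
)


def first_operation_per_account(connection, day, oper_type):
    """Return (ID, Acct) of the lowest-ID matching operation per account."""
    sql = """
        SELECT a.ID, a.Acct
        FROM (SELECT o.*,
                     ROW_NUMBER() OVER (PARTITION BY Acct ORDER BY ID) AS RowNumber
              FROM Operations o
              WHERE Date = ? AND OperType = ?) a
        WHERE a.RowNumber = 1
        ORDER BY a.Acct DESC
    """
    # Values are bound by the driver, never concatenated into the SQL text.
    return connection.execute(sql, (day, oper_type)).fetchall()


first_ops = first_operation_per_account(conn, "2024-05-01", 2)
print(first_ops)
```

Binding the date as a parameter sidesteps both the quoting problem and any ambiguity about date string formats.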
I have a table that is sorted first by reminder date, then by ID.
Table Looks like:
ID | remind_date
1 2011-01-23
2 2010-02-21
4 2011-04-04
5 2011-04-04
6 2009-05-04
I am using a PHP front end to move forward and back through the records. I want to have forward and back buttons, but I am running into a problem with the 2 reminder dates that are the same.
Just to note: the IDs are NOT in order. They are here, but in the actual database they are mixed up when sorting by remind_date.
The select statement i am using is: ($iid is the current record i am on)
SELECT id FROM myDB.reminders where remind_date > (SELECT remind_date FROM myDB.reminders where id=$iid) order by remind_date ASC LIMIT 1
So when I get to the dates that are the same, it skips over one, because the query asks for remind_date >.
If I use remind_date >= it returns the current record. My solution was then to use LIMIT 2 and check in PHP whether the 1st record equals my current ID and, if so, use the next one. But what if there are 3 or 4 identical dates?
I also thought about using the ID field, but since the IDs are out of order I can't simply add ID > $iid.
Any ideas? It works great except when 2 dates are the same.
You might be able to use this:
SELECT ID, remind_date
FROM
(
SELECT @prev_id := -1
) AS vars
STRAIGHT_JOIN
(
SELECT
ID,
remind_date,
@prev_id AS prev_id,
@prev_id := id
FROM myDB.reminders
ORDER BY remind_date, ID
) T1
WHERE prev_id = $iid
Here is a test of the above with your test data from your comment:
CREATE TABLE Table1 (ID INT NOT NULL, remind_date DATE NOT NULL);
INSERT INTO Table1 (ID, remind_date) VALUES
(45, '2011-01-14'),
(23, '2011-01-22'),
(48, '2011-01-23'),
(25, '2011-01-23'),
(63, '2011-02-19');
SELECT ID, remind_date
FROM
(
SELECT @prev_id := -1
) AS vars
STRAIGHT_JOIN
(
SELECT
ID,
remind_date,
@prev_id AS prev_id,
@prev_id := id
FROM Table1
ORDER BY remind_date, ID
) T1
WHERE prev_id = 25
Result:
ID remind_date
48 2011-01-23
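The same navigation can also be done without user variables by comparing on the pair (remind_date, ID), so that ID only breaks ties between equal dates. This is a different technique from the query above, sketched here in Python with in-memory SQLite and the test data from this answer; the reminders table definition and the helper names next_id and previous_id are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reminders (id INTEGER PRIMARY KEY, remind_date TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO reminders (id, remind_date) VALUES (?, ?)",
    [(45, "2011-01-14"), (23, "2011-01-22"), (48, "2011-01-23"),
     (25, "2011-01-23"), (63, "2011-02-19")],
)


def next_id(connection, current_id):
    """ID of the record after current_id in (remind_date, id) order, or None."""
    sql = """
        SELECT r.id
        FROM reminders r
        JOIN reminders cur ON cur.id = ?
        WHERE r.remind_date > cur.remind_date
           OR (r.remind_date = cur.remind_date AND r.id > cur.id)
        ORDER BY r.remind_date, r.id
        LIMIT 1
    """
    row = connection.execute(sql, (current_id,)).fetchone()
    return row[0] if row else None


def previous_id(connection, current_id):
    """ID of the record before current_id in (remind_date, id) order, or None."""
    sql = """
        SELECT r.id
        FROM reminders r
        JOIN reminders cur ON cur.id = ?
        WHERE r.remind_date < cur.remind_date
           OR (r.remind_date = cur.remind_date AND r.id < cur.id)
        ORDER BY r.remind_date DESC, r.id DESC
        LIMIT 1
    """
    row = connection.execute(sql, (current_id,)).fetchone()
    return row[0] if row else None


print(next_id(conn, 25), previous_id(conn, 48))
```

The IDs do not need to follow the date order for this to work, because ID is only compared between rows that share the same remind_date. It also copes with any number of identical dates.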
Add a condition WHERE ID<>MY_LAST_ID. This does not work with three or more identical dates, so you can collect the IDs already taken in an array such as (4,5,6) (see array_push()), implode it with "," to convert it to a string (let's call it YOUR_IDS_STRING), and add it to your query:
WHERE id NOT IN( YOUR_IDS_STRING )
After each query, check whether the date has changed; if it has, you can unset your array and start from the beginning (this is not necessary, but performs better because YOUR_IDS_STRING will only be as long as needed).
If your page refreshes between queries, try storing YOUR_IDS_STRING in a session variable, _GET or cookies, and simply append the next IDs with the .= operator.
I used the code provided by Mark Byers and with small changes I adapted it to navigate in opposite directions (and to pass other columns too, not only the date and ID):
$results = $mysqli->query("SELECT * FROM (SELECT @prev_id := -1) AS vars STRAIGHT_JOIN (SELECT *, @prev_id AS prev_id, @prev_id := ID FROM my_table ORDER BY data, ID) T1 WHERE prev_id = ".$ID);
$results = $mysqli->query("SELECT * FROM (SELECT @next_id := 1) AS vars STRAIGHT_JOIN (SELECT *, @next_id AS next_id, @next_id := ID FROM my_table ORDER BY data DESC, ID DESC) T1 WHERE next_id = ".$ID);
I tested it on duplicate dates and it navigates well through a list of records displayed with:
$results = $mysqli->query("SELECT * FROM my_table ORDER BY data DESC, ID DESC");