I have a question. Suppose I have this table in SQL:
date user_id
2015-03-17 00:06:12 143
2015-03-17 01:06:12 143
2015-03-17 02:06:12 143
2015-03-17 09:06:12 143
2015-03-17 10:10:10 200
I want to get the number of consecutive hours. For example, for user 143, I want to get 2 hours, for user 200 0 hours. I tried like this :
select user_id, TIMESTAMPDIFF(HOUR,min(date), max(date)) as hours
from myTable
group by user_id
But this query fetches all non-consecutive hours. Is it possible to solve the problem with a query, or do I need to post-process the results in PHP?
Use a variable to compare with the previous row.
SELECT user_id, SUM(cont_hour) FROM (
    SELECT
    user_id,
    IF(CONCAT(DATE(@prev_date), ' ', HOUR(@prev_date), ':00:00') - INTERVAL 1 HOUR = CONCAT(DATE(t.date), ' ', HOUR(t.date), ':00:00')
       AND @prev_user = t.user_id, 1, 0) AS cont_hour
    , @prev_date := t.date
    , @prev_user := t.user_id
    FROM
    table t
    , (SELECT @prev_date := NULL, @prev_user := NULL) var_init_subquery
    WHERE t.date BETWEEN <this> AND <that>
    ORDER BY t.date
) sq
GROUP BY user_id;
I made the comparison a bit more complicated than you might expect, because it is not enough to compare the hour alone: the date must match as well (or be the previous day, when the rows straddle midnight).
You can read more about user variables in the MySQL manual.
As a short explanation: the ORDER BY is very important, as is the order of the expressions in the SELECT clause. @prev_date holds the value of the previous row, because we assign the value of the current row only after we have made our comparison.
Another version using temporary variables:
SET @u := 0;
SET @pt := 0;
SET @s := 0;
SELECT `user_id`, MAX(`s`) `conseq` FROM
(
    SELECT
    @s := IF(@u = `user_id`,
             IF(UNIX_TIMESTAMP(`date`) - @pt = 3600, @s + 1, @s),
             0) s,
    @u := `user_id` `user_id`,
    @pt := UNIX_TIMESTAMP(`date`) pt
    FROM `users`
    ORDER BY `date`
) AS t
GROUP BY `user_id`
The subquery sorts the rows by date, then compares user_id with the previous value. If the user IDs are equal, it calculates the difference between date and the previous timestamp @pt. If the difference is exactly an hour (3600 seconds), the @s counter is incremented by one; if not, the counter keeps its current value. When the user ID changes, the counter is reset to 0:
s user_id pt
0 143 1426529172
1 143 1426532772
2 143 1426536372
2 143 1426561572
0 200 1426565410
The outer query collects the maximum counter values per user_id, since the maximum counter value corresponds to the last counter value per user_id.
Output
user_id conseq
143 2
200 0
Note, the query accepts the difference of exactly 1 hour. If you want a more flexible condition, simply adjust the comparison. For example, you can accept a difference in interval between 3000 and 4000 seconds as follows:
@s := IF(@u = `user_id`,
         IF( (UNIX_TIMESTAMP(`date`) - @pt) BETWEEN 3000 AND 4000, @s + 1, @s),
         0) s
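The counter logic described above can also be sketched outside the database. The following Python is only an illustration, not part of either answer: the function name `consecutive_hours` and its `low`/`high` parameters are made up for this example, and unlike the SQL it sorts by user first and then by date, so rows of different users cannot interleave.

```python
from datetime import datetime

# Sample rows from the question: (timestamp, user_id).
rows = [
    ("2015-03-17 00:06:12", 143),
    ("2015-03-17 01:06:12", 143),
    ("2015-03-17 02:06:12", 143),
    ("2015-03-17 09:06:12", 143),
    ("2015-03-17 10:10:10", 200),
]

def consecutive_hours(rows, low=3600, high=3600):
    """Per user, count rows that follow the previous row by low..high seconds.

    Hypothetical helper; low/high play the role of the BETWEEN bounds.
    """
    counts = {}
    prev_user, prev_time = None, None
    # Sort by user, then by time (an assumption; the SQL orders by date only).
    parsed = sorted(
        (user, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for ts, user in rows
    )
    for user, when in parsed:
        counts.setdefault(user, 0)
        # Compare with the previous row first ...
        if user == prev_user and low <= (when - prev_time).total_seconds() <= high:
            counts[user] += 1
        # ... then remember the current row for the next iteration.
        prev_user, prev_time = user, when
    return counts

result = consecutive_hours(rows)
print(result)
```

For the sample data this gives 2 for user 143 and 0 for user 200, matching the output table above. Passing `low=3000, high=4000` mirrors the more flexible BETWEEN condition.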
Related
I have a database with columns I am working on. What I am looking for is the date associated with the row where the running sum of the number column reaches 6. The query I have now gives the date when the number in the column is six, but does not take the sum of the previous rows into account. Example below:
Date number
---- ------
6mar16 1
8mar16 4
10mar16 6
12mar16 2
I would like a query that returns the 10mar16 date, because on that date the running total first reaches at least 6. The earlier dates do not add up to six.
Here is an example of a query I have been working on:
SELECT max(date) FROM `numbers` WHERE `number` > 60
You could use this query, which tracks the accumulated sum and then returns the first one that meets the condition:
select date
from (select * from mytable order by date) as base,
     (select @sum := 0) init
where (@sum := @sum + number) >= 6
limit 1
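As a cross-check of the accumulate-then-stop idea, here is a small Python sketch using the sample data from the question. The function name `first_date_reaching` is invented for this example, and the rows are assumed to be in date order already.

```python
from itertools import accumulate

# Sample rows from the question, assumed to be sorted by date.
rows = [("6mar16", 1), ("8mar16", 4), ("10mar16", 6), ("12mar16", 2)]

def first_date_reaching(rows, threshold):
    """Return the first date whose running total is >= threshold, else None."""
    running = accumulate(number for _, number in rows)
    for (date, _), total in zip(rows, running):
        if total >= threshold:
            return date
    return None

answer = first_date_reaching(rows, 6)
print(answer)
```

The running totals are 1, 5, 11, 13, so the first date that reaches 6 is 10mar16.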
Most databases support ANSI standard window functions. In this case, cumulative sum is your friend:
select t.*
from (select t.*, sum(number) over (order by date) as sumnumber
from t
) t
where sumnumber >= 6
order by sumnumber
fetch first 1 row only;
In MySQL versions before 8.0, which lack window functions, you need variables:
select t.*
from (select t.*, (@sumn := @sumn + number) as sumnumber
      from t cross join (select @sumn := 0) params
      order by date
     ) t
where sumnumber >= 6
order by sumnumber
limit 1;
Awesome!!!! It seems to be working great. Here is the code that I used.
SELECT date, id, crewname
FROM (select * FROM flightrecord WHERE `crewname` = 'brayn'
      ORDER BY dutyTimeArrive DESC) as base,
     (select @sum := 0) init
WHERE (@sum := @sum + tankDropCount) >= 6
limit 1
I have no idea how to solve the following problem: I have several rows in my database with one timestamp per row. Now I would like to filter all rows for entries until the date interval for any two dates is bigger than 30 days. I have no defined date interval for specific dates, like between 12/01/2017 and 11/01/2017, that would be easy, even for me. All I know is that the timestamp interval from one row to the next row (query must be sorted by timestamp desc) must not be bigger than 30 days.
Please see my db at http://sqlfiddle.com/#!9/55a521/2
In this case the last entry shown should be the one with id 65404844. I would appreciate if you might give me a small hint for this.
Thank you very much!
You can use this query to build a filter.
SELECT
t.id,
from_unixtime(timestamp)
, IF(@pt < timestamp - 30*24*60*60, 1, 0) AS filter
, @pt := timestamp
FROM
t
, (SELECT @pt := MIN(timestamp) FROM t) v
ORDER BY timestamp
see it working live in an sqlfiddle
It is important to order by timestamp. The @pt variable is initialized with the lowest timestamp value. It is equally important that the expressions in the select clause appear in the right order.
First you compare the current record with the variable in the IF() function. Then you assign the current record to the variable. This way, when the next row is evaluated, the variable still holds the value of the previous row inside the IF() function.
To get the rows you want, use above query in a subquery to filter.
SELECT id, ts FROM (
    SELECT
    t.id,
    from_unixtime(timestamp) as ts
    , IF(@pt < timestamp - 30*24*60*60, 1, 0) AS filter
    , @pt := timestamp
    FROM
    t
    , (SELECT @pt := MIN(timestamp) FROM t) v
    ORDER BY timestamp
) sq
WHERE sq.filter = 1
This filters out the rows that have a more than 30 days difference from the previous rows.
(1st solution) - only works if the id column has consecutive values
SELECT t.id, t.timestamp, DATEDIFF(FROM_UNIXTIME(t1.timestamp), FROM_UNIXTIME(t.timestamp)) AS days_diff
FROM tbl t
LEFT JOIN tbl t1
ON t.id = t1.id + 1
HAVING days_diff <= 30
ORDER BY t.timestamp DESC;
This filters all the results that are within 30 days of each of the other entries.
SELECT *
FROM tbl t
WHERE EXISTS (
SELECT id
FROM tbl t1
WHERE DATEDIFF(FROM_UNIXTIME(t1.timestamp), FROM_UNIXTIME(t.timestamp)) < 30
AND t1.id <> t.id
)
ORDER BY t.timestamp desc;
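The "compare first, then assign" pattern from the first answer in this thread can be illustrated with a short Python sketch. The data below is hypothetical (it is not the data from the linked fiddle), and the function name `flag_gaps` is made up for this example.

```python
DAY = 24 * 60 * 60

# Hypothetical (id, unix_timestamp) rows; not the data from the linked fiddle.
rows = [(1, 0), (2, 10 * DAY), (3, 50 * DAY), (4, 55 * DAY)]

def flag_gaps(rows, max_gap=30 * DAY):
    """Flag each row with 1 if it is more than max_gap after the previous row."""
    flagged = []
    ordered = sorted(rows, key=lambda r: r[1])
    # Mirrors initializing the variable with MIN(timestamp).
    prev_ts = ordered[0][1] if ordered else None
    for row_id, ts in ordered:
        # Compare against the previous row first ...
        flagged.append((row_id, ts, 1 if prev_ts < ts - max_gap else 0))
        # ... and only then store the current row's timestamp.
        prev_ts = ts
    return flagged

flags = flag_gaps(rows)
print([flag for _, _, flag in flags])
```

Only row 3 is flagged here, because it is 40 days after row 2; all other gaps are 10 days or less.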
I have a MYSQL table for tasks where each task has a date, start time,end time and user_id. I want to calculate total number of hours on specific date.
Table Structure
CREATE TABLE tasks
(`id` int,`user_id` int, `title` varchar(30), `task_date` datetime, `start` time, `end` time)
;
INSERT INTO tasks
(`id`,`user_id`, `title`,`task_date`, `start`, `end`)
VALUES
(1,10, 'Task one','2013-04-02', '02:00:00', '04:00:00'),
(2,10, 'Task two','2013-04-02', '03:00:00', '06:00:00'),
(3,10, 'Task three.','2013-04-02','06:00:00', '07:00:00');
MYSQL Query
select TIME_FORMAT(SEC_TO_TIME(sum(TIME_TO_SEC(TIMEDIFF( end, start)))), "%h:%i") AS diff
FROM tasks
where task_date="2013-04-02"
The result I am getting is "06:00" hours, which is fine, but I want to exclude the overlapping hours. In the example I gave, the result should be "05:00" hours, because the hour between 3 and 4 in the 2nd record should be excluded: it is already covered by the 1st record (2 to 4).
1st record 2=>4 = 2 Hours
2nd record 3=>6 = 3 hours 3-1 hour=2 (The 1 hour is the overlap hour between 1st and 2nd record )
3rd 6=>7=1
Total is 5 Hours
I hope I made my question clear. Example http://sqlfiddle.com/#!2/05dd8/2
Here is an idea that uses variables (in other databases, CTEs and window functions would make this much easier). The idea is to first list all the times -- starts and ends. Then, keep track of the cumulative number of starts and stops.
When the cumulative number is greater than 0, then include the difference from the previous time. If equal to 0, then it is the beginning of a new time period, so nothing is added.
Here is an example of the query, which is simplified a bit for your data by not keeping track of changes in user_id:
select user_id, TIME_FORMAT(SEC_TO_TIME(sum(secs)), '%h:%i')
from (select t.*,
             @time := if(@sum = 0, 0, TIME_TO_SEC(TIMEDIFF(start, @prevtime))) as secs,
             @prevtime := start,
             @sum := @sum + isstart
      from ((select user_id, start, 1 as isstart
             from tasks t
            ) union all
            (select user_id, end, -1
             from tasks t
            )
           ) t cross join
           (select @sum := 0, @time := 0, @prevtime := 0) vars
      order by 1, 2
     ) t
group by user_id;
Here is a SQL Fiddle showing it working.
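The start/stop counting idea from this answer can be checked with a small Python sketch on the three sample tasks. This is only an illustration under the same simplification as the query (a single user and a single date); the names `to_seconds` and `covered_seconds` are invented for the example.

```python
# (start, end) times of the three sample tasks from the question.
tasks = [
    ("02:00:00", "04:00:00"),
    ("03:00:00", "06:00:00"),
    ("06:00:00", "07:00:00"),
]

def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def covered_seconds(tasks):
    """Total time covered by at least one task, counting overlaps once."""
    events = []
    for start, end in tasks:
        events.append((to_seconds(start), 1))   # a task opens
        events.append((to_seconds(end), -1))    # a task closes
    events.sort()
    total = 0
    open_count = 0
    prev_time = None
    for time, delta in events:
        # Time since the previous event counts only while a task is open.
        if open_count > 0:
            total += time - prev_time
        prev_time = time
        open_count += delta
    return total

total = covered_seconds(tasks)
print(total / 3600)
```

For the sample data the covered time is 18000 seconds, i.e. the 5 hours the question asks for.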
I am using a query that takes an average of all the records for each given id...
$query = "SELECT bline_id, AVG(flow) as flowavg
FROM blf
WHERE bline_id BETWEEN 1 AND 30
GROUP BY bline_id
ORDER BY bline_id ASC";
These records are each updated once daily. I would like to use only the 10 most recent records for each id in my average.
Any help would be greatly appreciated.
blf table structure is:
id | bline_id | flow | date
If these are really updated every day, then use date arithmetic:
SELECT bline_id, AVG(flow) as flowavg
FROM blf
WHERE bline_id BETWEEN 1 AND 30 and
date >= date_sub(now(), interval 10 day)
GROUP BY bline_id
ORDER BY bline_id ASC
Otherwise, you have to put in a counter, which you can do with a correlated subquery:
SELECT bline_id, AVG(flow) as flowavg
FROM (select blf.*,
(select COUNT(*) from blf blf2 where blf2.bline_id = blf.bline_id and blf2.date >= blf.date
) seqnum
from blf
) blf
WHERE bline_id BETWEEN 1 AND 30 and
seqnum <= 10
GROUP BY bline_id
ORDER BY bline_id ASC
Another option is to simulate ROW_NUMBER().
This statement creates a counter and resets it every time it encounters a new bline_id. It then filters out any records that aren't in the first 10 rows.
SELECT bline_id,
       Avg(flow) avg
FROM (SELECT id,
             bline_id,
             flow,
             date,
             CASE
               WHEN @previous IS NULL
                     OR @previous = bline_id THEN @rownum := @rownum + 1
               ELSE @rownum := 1
             end rn,
             @previous := bline_id
      FROM blf,
           (SELECT @rownum := 0,
                   @previous := NULL) t
      WHERE bline_id > 0 and bline_id < 31
      ORDER BY bline_id,
               date DESC,
               id) t
WHERE rn < 11
GROUP BY bline_id
It's worthwhile seeing this in action by removing the group by and looking at intermediate results
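For comparison, the same "number the rows per bline_id, keep the first 10, then average" idea can be sketched in Python. The data is hypothetical and the function name `average_of_latest` is made up; dates are assumed to be ISO strings so that they sort chronologically.

```python
from collections import defaultdict

def average_of_latest(rows, limit=10):
    """rows: iterable of (bline_id, date, flow).

    Returns {bline_id: average flow of the newest `limit` records}.
    """
    by_id = defaultdict(list)
    for bline_id, date, flow in rows:
        by_id[bline_id].append((date, flow))
    averages = {}
    for bline_id, records in by_id.items():
        # Newest first, then keep at most `limit` records (like rn < 11).
        newest = sorted(records, reverse=True)[:limit]
        averages[bline_id] = sum(flow for _, flow in newest) / len(newest)
    return averages

# Hypothetical data: 12 daily readings for id 1, 3 readings for id 2.
rows = [(1, f"2024-01-{day:02d}", float(day)) for day in range(1, 13)]
rows += [(2, f"2024-01-{day:02d}", 10.0) for day in range(1, 4)]

averages = average_of_latest(rows)
print(averages)
```

For id 1 only the readings for days 3 to 12 are used, so the average is 7.5; id 2 has fewer than 10 records, so all of them are averaged.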
//This is my query
SELECT bline_id, ROUND(Avg(flow),3) avg
FROM (SELECT id, bline_id, flow, date, CASE
               WHEN @previous IS NULL
                     OR @previous = bline_id THEN @rownum := @rownum + 1
               ELSE @rownum := 1
             end rn,
             @previous := bline_id
      FROM blf,
           (SELECT @rownum := 0,
                   @previous := NULL) t
      WHERE bline_id > 0 and bline_id < 31
      ORDER BY bline_id,
               date DESC,
               id) t
WHERE rn < 11
GROUP BY bline_id
This query takes the average of the last 10 records. I would like to be able to save these results back into the db, and compare them to the next group of 10 when a new record is added.
The end result I am looking for is to be able to tell whether the average has changed by plus or minus 2%.
Does this make sense?
You could create a table with the following fields:
id, bline_id, avg, timestamp
Every time you add a record, insert the results of your query above into this table.
You can then compare the latest record in this table with the previous one.
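The comparison step itself could look roughly like the following Python sketch. It assumes the two most recent stored averages for a bline_id have already been read from the table; the function names and the 2% default are illustrative only.

```python
def percent_change(previous_avg, latest_avg):
    """Relative change from previous_avg to latest_avg, in percent."""
    return (latest_avg - previous_avg) / previous_avg * 100.0

def changed_beyond(previous_avg, latest_avg, threshold_pct=2.0):
    """True if the average moved by at least threshold_pct in either direction."""
    if previous_avg == 0:
        # A percentage change from zero is undefined; treat any move as a change.
        return latest_avg != 0
    return abs(percent_change(previous_avg, latest_avg)) >= threshold_pct

print(changed_beyond(100.0, 103.0))  # 3% increase
print(changed_beyond(100.0, 101.0))  # 1% increase
print(changed_beyond(100.0, 95.0))   # 5% decrease
```

Using the absolute value covers both an increase and a decrease with a single threshold.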