Calculate overlapping durations in MySQL/PHP

This is making my head hurt! :P
I have an assignments table, and I'd like to calculate a member's duration based on their assignments. In its simplified form, this would be relatively straightforward.
-------------------------------------------------------------------------
| id | member_id | unit_id | start_date | end_date |
-------------------------------------------------------------------------
| 1 | 2 | 23 | 2013-01-01 | 2013-02-01 |
-------------------------------------------------------------------------
| 2 | 2 | 25 | 2013-02-01 | 2013-03-01 |
-------------------------------------------------------------------------
| 3 | 2 | 27 | 2013-03-01 | NULL |
-------------------------------------------------------------------------
This would just be a matter of doing a SUM() of the DATEDIFF() on start_date and end_date. The issue is that members have the potential to have concurrent assignments.
-------------------------------------------------------------------------
| id | member_id | unit_id | start_date | end_date |
-------------------------------------------------------------------------
| 1 | 2 | 23 | 2013-01-01 | 2013-02-01 |
-------------------------------------------------------------------------
| 2 | 2 | 25 | 2013-02-01 | 2013-03-01 |
-------------------------------------------------------------------------
| 3 | 2 | 30 | 2013-02-15 | 2013-03-01 |*
-------------------------------------------------------------------------
| 4 | 2 | 27 | 2013-03-01 | NULL |
-------------------------------------------------------------------------
Now I have to somehow realize that #3 occurred during the same time as #2, so I shouldn't add it to the SUM().
Going further, what if the member has gaps in their duration?
-------------------------------------------------------------------------
| id | member_id | unit_id | start_date | end_date |
-------------------------------------------------------------------------
| 1 | 2 | 23 | 2013-01-01 | 2013-02-01 |
-------------------------------------------------------------------------
| 2 | 2 | 25 | 2013-02-01 | 2013-02-05 |*
-------------------------------------------------------------------------
| 3 | 2 | 30 | 2013-02-15 | 2013-03-01 |*
-------------------------------------------------------------------------
| 4 | 2 | 27 | 2013-03-01 | NULL |
-------------------------------------------------------------------------
Also, NULL means "current" so that would be CURDATE().
Any ideas?

Here is the idea. Break each record into two to get a list of dates when assignments start and stop. Then determine how many assignments are active on a given date -- basically adding "1" for each start and "-1" for each end and taking the cumulative sum.
Next, you need to determine when the next date is to get periods before doing the final aggregation.
The first part is handled by this query:
select member_id, thedate,
       @sumstart := if(@prevmemberid = member_id, @sumstart + isstart, isstart) as sumstart,
       @prevmemberid := member_id
from (select member_id, start_date as thedate, 1 as isstart
      from assignments
      union all
      select member_id, end_date, -1 as isstart
      from assignments
      order by member_id, thedate
     ) a cross join
     (select @sumstart := 0, @prevmemberid := NULL) const;
The rest then uses more variables:
select member_id,
       sum(case when sumstart > 0 then datediff(nextdate, thedate) end) as daysactive
from (select member_id, thedate, sumstart,
             if(@nextmemberid = member_id, @nextdate, NULL) as nextdate,
             @nextmemberid := member_id,
             @nextdate := thedate
      from (select member_id, thedate,
                   @sumstart := if(@prevmemberid = member_id, @sumstart + isstart, isstart) as sumstart,
                   @prevmemberid := member_id
            from (select member_id, start_date as thedate, 1 as isstart
                  from assignments
                  union all
                  select member_id, coalesce(end_date, CURDATE()), -1 as isstart
                  from assignments
                  order by member_id, thedate
                 ) a cross join
                 (select @sumstart := 0, @prevmemberid := NULL) const
           ) a cross join
           (select @nextmemberid := NULL, @nextdate := NULL) const
      order by member_id, thedate desc
     ) a
group by member_id;
I don't like using variables in this way, because MySQL does not guarantee the ordering of variable assignments in a given select. In practice, though, they are evaluated in the order written (which this query depends on). Although this could be written without variables, without the with statement, window functions, or even views that take subqueries in the from clause, the resulting SQL would be much uglier.
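The same +1/-1 idea can also be sketched outside the database. The following is a minimal Python illustration, not the query above: the function name `active_days` and the `(start, end)` tuple layout are assumptions made for the example, and an end date of `None` is treated as "today", as the question describes for NULL.

```python
from datetime import date

def active_days(assignments, today):
    """Count the days covered by at least one assignment.

    assignments: list of (start_date, end_date) tuples (hypothetical layout);
    an end_date of None means the assignment is still open.
    """
    events = []
    for start, end in assignments:
        events.append((start, 1))           # an assignment starts
        events.append((end or today, -1))   # an assignment ends
    events.sort(key=lambda e: e[0])         # order by date only

    total = 0
    active = 0    # running sum of +1/-1, like sumstart in the query
    prev = None   # date of the previous event
    for day, delta in events:
        if active > 0:
            total += (day - prev).days      # the period before this event was covered
        active += delta
        prev = day
    return total
```

Events that share a date contribute zero days between them, so their relative order does not change the total.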

I think it's easier to filter out the overlapping assignments in code rather than in SQL.
You can retrieve all the assignments for a certain member_id, ordered by start_date:
select * from assignments where member_id='2' order by start_date asc
You can then loop over these assignments and filter out the overlapping assignments.
Two assignments A and B are non-overlapping if A ends before B starts or if B ends before A starts.
Because we ordered the results according to start date, we can safely ignore the second case: B will never start before A, so it cannot end before A starts.
We then get something like:
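for i = 0 .. assignments.length - 1
    if (assignments[i] == null) continue; // already merged into an earlier assignment
    for j = i+1 .. assignments.length - 1
        if (assignments[j] != null && assignments[j].start_date < assignments[i].end_date) {
            // it overlaps -> keep the later end date, then get rid of it
            assignments[i].end_date = max(assignments[i].end_date, assignments[j].end_date);
            assignments[j] = null;
        }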
Then loop over the assignments and sum the durations of the non-null assignments. This should be easy.
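As a rough illustration of doing this in application code, here is a Python sketch that sorts the assignments by start date and merges overlapping ones before summing. It is a variant of the loop described above rather than a literal translation: instead of discarding an overlapping assignment, it extends the current block to the later end date, so partial overlaps are not lost. The function name and tuple layout are assumptions, and `None` stands in for a NULL end_date.

```python
from datetime import date

def merged_duration_days(assignments, today):
    """Sum the days covered by assignments, counting overlaps only once.

    assignments: list of (start_date, end_date) tuples (hypothetical layout).
    """
    # Treat an open-ended assignment (end is None) as ending today.
    spans = sorted((start, end or today) for start, end in assignments)

    total = 0
    cur_start, cur_end = None, None
    for start, end in spans:
        if cur_end is None or start > cur_end:
            # Gap: close the previous block and start a new one.
            if cur_end is not None:
                total += (cur_end - cur_start).days
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current block if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += (cur_end - cur_start).days
    return total
```

Because the list is sorted first, a single pass is enough and the nested loop is not needed.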

Related

Get rows above and below (neighbouring rows) a certain row, based on two criteria SQL

Say I have a table like so:
+---+-------+------+---------------------+
|id | level |score | timestamp |
+---+-------+------+---------------------+
| 4 | 1 | 70 | 2021-01-14 21:50:38 |
| 3 | 1 | 90 | 2021-01-12 15:38:0 |
| 1 | 1 | 20 | 2021-01-14 13:10:12 |
| 5 | 1 | 50 | 2021-01-13 12:32:11 |
| 7 | 1 | 50 | 2021-01-14 17:15:20 |
| 8 | 1 | 55 | 2021-01-14 09:20:00 |
| 10| 2 | 99 | 2021-01-15 10:50:38 |
| 2 | 1 | 45 | 2021-01-15 10:50:38 |
+---+-------+------+---------------------+
What I want to do is show 5 of these rows in a table (in html), with a certain row (e.g. where id=5) in the middle and have the two rows above and below it (in the correct order). Also where level=1. This will be like a score board but only showing the user's score with the two above and two below.
So because scores can be the same, the timestamp column will also need to be used - so if two scores are equal, then the first person to get the score is shown above the other person.
E.g. say the user is id=5, I want to show
+---+-------+------+---------------------+
|id | level |score | timestamp |
+---+-------+------+---------------------+
| 4 | 1 | 70 | 2021-01-14 21:50:38 |
| 8 | 1 | 55 | 2021-01-14 09:20:00 |
| 5 | 1 | 50 | 2021-01-13 12:32:11 |
| 7 | 1 | 50 | 2021-01-14 17:15:20 |
| 2 | 1 | 45 | 2021-01-15 10:50:38 |
| 1 | 1 | 20 | 2021-01-14 13:10:12 |
+---+-------+------+---------------------+
Note that id=7 is below id=5
I am wondering does anyone know a way of doing this?
I have tried this below but it is not outputting what I need (it is outputting where level_id=2 and id=5, and the other rows are not in order)
((SELECT b.* FROM table a JOIN table b ON b.score > a.score OR (b.score = a.score AND b.timestamp < a.timestamp)
WHERE a.level_id = 1 AND a.id = 5 ORDER BY score ASC, timestamp DESC LIMIT 3)
UNION ALL
(SELECT b.* FROM table a JOIN table b ON b.score < a.score OR (b.score = a.score AND b.timestamp > a.timestamp)
WHERE a.level_id = 1 AND a.id = 5 ORDER BY score DESC, timestamp ASC LIMIT 2))
order by score
If it is easier to output all rows in the table, say where level = 1, so it is a full score board.. and then do the getting a certain row and two above and below it using PHP I'd also like to know please :) ! (possibly thinking this may keep the SQL simpler)?
You can use a CTE and a self-join as follows:
With cte as
  (select t.*,
          dense_rank() over (order by score) as dr
   from your_table t)
Select c.*
From cte c join cte cu on c.dr between cu.dr - 2 and cu.dr + 2
Where cu.id = 5
Order by c.dr, c.timestamp
I would suggest window functions:
select t.*
from (select t.*,
max(case when id = 7 then score_rank end) over () as id_rank
from (select t.*,
dense_rank() over (order by score) as score_rank
from t
where level = 1
) t
) t
where score_rank between id_rank - 2 and id_rank + 2;
Note: This returns 5 distinct score values, which may result in more rows depending on duplicates.
EDIT:
If you want exactly 5 rows using the timestamp, then:
select t.*
from (select t.*,
max(case when id = 7 then score_rank end) over () as id_rank
from (select t.*,
dense_rank() over (order by score, timestamp) as score_rank
from t
where level = 1
) t
) t
where score_rank between id_rank - 2 and id_rank + 2
order by score;
Note: This still treats equivalent timestamps as the same, but they seem to be unique in your data.
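The question also asks whether it would be simpler to fetch the whole score board for one level and pick the neighbouring rows in application code. A minimal Python sketch of that idea follows (the same logic carries over to PHP). The function name `neighbours` and the dict-based row layout are assumptions for illustration; timestamps are assumed to be strings in `YYYY-MM-DD HH:MM:SS` form, which sort correctly as text.

```python
def neighbours(rows, user_id, level=1, span=2):
    """Return the user's row with up to `span` rows above and below it.

    rows: list of dicts with keys id, level, score, timestamp (hypothetical layout).
    """
    # Highest score first; on equal scores the earlier timestamp ranks higher.
    board = sorted(
        (r for r in rows if r["level"] == level),
        key=lambda r: (-r["score"], r["timestamp"]),
    )
    # Raises StopIteration if the user has no row on this level.
    pos = next(i for i, r in enumerate(board) if r["id"] == user_id)
    return board[max(0, pos - span): pos + span + 1]
```

Near the top or bottom of the board this returns fewer than five rows, since there are not two rows on both sides.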

php mysql select by month between records

I've this MySQL table my_table:
+-------+------------+-----------+
|Student| Date | Classroom |
+-------+------------+-----------+
| 1 | 2018-01-01 | 101 |
| 2 | 2018-01-01 | 102 |
| 3 | 2018-01-01 | 103 |
| 1 | 2018-03-01 | 104 |
| 2 | 2018-06-01 | 103 |
| 3 | 2018-09-01 | 104 |
| 1 | 2018-11-01 | 106 |
| 2 | 2018-12-01 | 101 |
+-------+------------+-----------+
The students stay in the assigned classroom till changed.
I'm trying to get which classroom they were in for a certain month.
For example in October(10), student 1 was in 104, 2 was in 103, and 3 was in 104.
I'm really unsure on how to proceed with this one so any help is appreciated.
Currently using this query, based on Strawberry's answer:
SELECT x.*
FROM my_table x
LEFT OUTER JOIN my_table y
ON y.student = x.student
AND y.date < x.date
WHERE x.date <= LAST_DAY('2018-10-01')
GROUP BY student
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(Student INT NOT NULL, Date DATE NOT NULL, Classroom INT NOT NULL,PRIMARY KEY(student,classroom));
INSERT INTO my_table VALUES
(1,'2018-01-01',101),
(2,'2018-01-01',102),
(3,'2018-01-01',103),
(1,'2018-03-01',104),
(2,'2018-06-01',103),
(3,'2018-09-01',104),
(1,'2018-11-01',106),
(2,'2018-12-01',101);
SELECT x.*
FROM my_table x
JOIN
( SELECT student
, MAX(date) date
FROM my_table
WHERE date <= LAST_DAY('2018-10-01')
GROUP
BY student
) y
ON y.student = x.student
AND y.date = x.date;
+---------+------------+-----------+
| Student | Date | Classroom |
+---------+------------+-----------+
| 1 | 2018-03-01 | 104 |
| 2 | 2018-06-01 | 103 |
| 3 | 2018-09-01 | 104 |
+---------+------------+-----------+
Here's a go at it (snippet to go in a stored procedure; assumes table called example & output to table months). It produces a row per student for each month of the range.
drop table months;
create table months (month date, student integer, classroom integer);
set @month = (select min(date) from example);
start_loop: LOOP
  insert into months select @month, s1.student, classroom from
    (select student, max(date) as maxdate from example where date <= @month group by student) s1
    join example s2 on s1.student = s2.student and maxdate = date;
  if @month >= (select max(date) from example) then
    leave start_loop;
  end if;
  set @month = @month + interval 1 month;
END LOOP start_loop;
Let's break the problem into two parts. Firstly, find all the rooms which have been allocated to student A so far and sort them using the date. Next, find the record which is just before or equal to the required month.
For example:
Consider student 1. We get
+-------+------------+-----------+
|Student| Date | Classroom |
+-------+------------+-----------+
| 1 | 2018-01-01 | 101 |
| 1 | 2018-03-01 | 104 |
| 1 | 2018-11-01 | 106 |
+-------+------------+-----------+
Now, let's say the required month is June: we look for the latest date that is less than or equal to 2018-06-01 (here 2018-03-01) to get the required room number, 104. I hope this helps.
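The two steps described above (collect each student's history, then keep the latest record on or before the required date) can be sketched in Python as follows. This is only an illustration under assumed names: `classroom_on` and the `(student, start_date, classroom)` tuple layout are not from the original posts.

```python
from datetime import date

def classroom_on(history, day):
    """Return {student: classroom} as of `day`.

    history: list of (student, start_date, classroom) tuples (hypothetical layout).
    A student stays in a classroom until a later record replaces it.
    """
    latest = {}
    for student, start, classroom in history:
        # Keep the most recent record that started on or before `day`.
        if start <= day and (student not in latest or start > latest[student][0]):
            latest[student] = (start, classroom)
    return {student: room for student, (start, room) in latest.items()}
```

To ask about a whole month, pass the last day of that month, which mirrors the `LAST_DAY('2018-10-01')` condition in the SQL above.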

MySQL, Merge selects in order of one record from each select

I have a table that contains too many records and each bunch of records belong to someone:
---------------------
id | data | username
---------------------
1 | 10 | ali
2 | 11 | ali
3 | 12 | ali
4 | 20 | omid
5 | 21 | omid
6 | 30 | reza
Now I want to create a query that returns results like this:
1-10-ali
4-20-omid
6-30-reza
2-11-ali
5-21-omid
3-12-ali
Is there any way to create a query that returns one record per username, then the next record for each username, and so on to the end?
Unfortunately, MySQL versions before 8.0 don't have ranking (window) functions, so you can use user-defined variables (UDVs) to rank your records like so:
SELECT id, `data`, name
FROM
( SELECT
    id, `data`, name,
    @rank := if(@name = name, @rank + 1, 1) as rank,
    @name := name
  FROM test
  CROSS JOIN (SELECT @rank := 1, @name := '') temp
  ORDER BY name, `data`
) t
ORDER BY t.rank, t.name, t.data
Output:
+----+------+------+
| id | data | name |
+----+------+------+
| 1  | 10   | ali  |
| 4  | 20   | omid |
| 6  | 30   | reza |
| 2  | 11   | ali  |
| 5  | 21   | omid |
| 3  | 12   | ali  |
+----+------+------+
The classic SQL approach is a self join and grouping that lets you determine a row's ranking position by counting the number of rows that come before it. As this is probably slower I doubt I could talk you out of the proprietary method but I mention it to give you an alternative.
select t.id, min(t.`data`), min(t.username)
from test t inner join test t2
on t2.username = t.username and t2.id <= t.id
group by t.id
order by count(*), min(t.username)
Your example would work with
SELECT id, `data`, username
FROM tbl
ORDER BY `data` % 10,
username,
`data`;
If data and username do not have the desired pattern, then improve on the example.
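The ranking idea behind the answers above (number each user's rows 1, 2, 3, ... and then sort by that number) can also be sketched in Python. The function name `interleave` and the `(id, data, username)` tuple layout are assumptions for the example; within one user, rows are ranked by id.

```python
from collections import defaultdict

def interleave(rows):
    """Order rows round-robin: first row of each user, then second, and so on.

    rows: list of (id, data, username) tuples (hypothetical layout).
    """
    seen = defaultdict(int)   # how many rows we have ranked per username
    ranked = []
    for row in sorted(rows, key=lambda r: (r[2], r[0])):
        seen[row[2]] += 1
        ranked.append((seen[row[2]], row[2], row))   # (rank within user, username, row)
    ranked.sort(key=lambda t: (t[0], t[1]))          # by rank, then username
    return [row for _, _, row in ranked]
```

This mirrors `ORDER BY rank, name` in the variable-based query, with the per-user rank computed in a dictionary instead of session variables.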

MySQL get the sum of all rows without retrieving all of them

This may be a little confusing but please bear with me. Here's the thing:
I have a database that contains ~1000 records, as the following table illustrates:
+------+----------+----------+
| id | date | amount |
+------+----------+----------+
| 0001 | 14/01/15 | 100 |
+------+----------+----------+
| 0002 | 14/02/04 | 358 |
+------+----------+----------+
| 0003 | 14/05/08 | 1125 |
+------+----------+----------+
What I want to do is this:
Retrieve all the records beginning at 2014 and until yesterday:
WHERE `date` > '14-01-01' AND `date` < CURDATE()
But also get the sum of amount up to the current date, this is:
WHERE `date` < CURDATE()
I've already got this working by just selecting all the records based on the second condition, getting the sum, and then excluding those which don't match the first condition. Something like this:
SELECT `id`, `date`, `amount` FROM `table`
WHERE `date` < CURDATE()
And then:
$rows = fetchAll($PDOStatement);
$sum = 0;
$valid_rows = [];
foreach ($rows as $row) {
    $sum += $row->amount;
    if (
        strtotime($row->date) > strtotime('14-01-01') &&
        strtotime($row->date) < strtotime(date('Y-m-d'))
    ) {
        $valid_rows[] = $row;
    }
}
unset($rows);
Is there a way to achieve this in a single query, efficiently? Would a transaction be more efficient than sorting out the records in PHP? This has to be SQL-standard compliant (I'll be doing this on MySQL and SQLite).
Update:
It doesn't matter if the result ends up being something like this:
+------+----------+----------+-----+
| id | date | amount | sum |
+------+----------+----------+-----+
| 0001 | 14/01/15 | 100 | 458 |
+------+----------+----------+-----+
| 0002 | 14/02/04 | 358 | 458 |
+------+----------+----------+-----+
| 0003 | 14/05/08 | 1125 | 458 |
+------+----------+----------+-----+
The worst case would be when the resulting set ends up being the same as the set that gives the sum (in this case appending the sum would be irrelevant and would cause an overhead), but for any other regular case the bandwidth savings would be huge.
You can create a special record with your sum and add it at the end of your first query
SELECT * FROM `table` WHERE `date` > '14-01-01' AND `date` < CURDATE()
UNION
SELECT 9999, CURDATE(), SUM(`amount`) FROM `table` WHERE `date` < CURDATE()
You will then get all the desired records, plus one extra record with id 9999 (or any other id you choose) whose amount column holds the sum.
This could be achieved by correlated subquery, something like below:
SELECT *, (SELECT SUM(amount) FROM t WHERE t.date < t1.date) AS PrevAmount
FROM t AS t1
WHERE `date` > '14-01-01' AND `date` < CURDATE()
However, it is very inefficient if the number of records is large.
It's hackish, but:
> select * from foo;
+------+------+
| id | val |
+------+------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
+------+------+
5 rows in set (0.02 sec)
> select * from foo
left join (
select sum(val)
from foo
where id < 3
) AS bar ON 1=1
where id < 4;
+------+------+----------+
| id | val | sum(val) |
+------+------+----------+
| 1 | 1 | 3 |
| 2 | 2 | 3 |
| 3 | 3 | 3 |
+------+------+----------+
Basically, do your summing in a joined subquery. That'll attach the sum result to every row in the outer table's results. You'll waste a bit of bandwidth sending that duplicated value out with every row, but it does get you the results in a "single" query.
EDIT:
You can get the SUM using a LEFT OUTER JOIN.
SELECT t1.`id`, t1.`date`, t2.sum_amount
FROM
`table` t1
LEFT OUTER JOIN
(
SELECT SUM(`amount`) sum_amount
FROM `table`
WHERE `date` < CURDATE()
) t2
ON 1 = 1
WHERE t1.`date` > STR_TO_DATE('01,1,2014','%d,%m,%Y') AND t1.`date` < CURDATE();
This will do what you want it to do... optimizing the subquery is the real challenge:
SELECT `id`, `date`, `amount`, (SELECT SUM(`amount`) FROM `table` WHERE `date` < CURDATE()) AS total_amount
FROM `table`
WHERE `date` BETWEEN '14-01-01' AND DATE_ADD(CURDATE(), INTERVAL -1 DAY)
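Since the question mentions SQLite as a second target, here is a small Python sketch using the standard `sqlite3` module that attaches the total to every returned row with a joined subquery, in the spirit of the answers above. The table name `payments`, the function name, and the ISO `YYYY-MM-DD` text dates are assumptions for the example; the cutoff date is passed in as a parameter rather than taken from `CURDATE()`, which SQLite does not provide.

```python
import sqlite3

def rows_with_total(conn, start, today):
    """Rows dated in [start, today), each carrying the sum of all amounts before today.

    Assumes a hypothetical table payments(id, date, amount) with ISO text dates.
    """
    sql = """
        SELECT t.id, t.date, t.amount, s.total
        FROM payments AS t
        CROSS JOIN (SELECT SUM(amount) AS total
                    FROM payments
                    WHERE date < :today) AS s
        WHERE t.date >= :start AND t.date < :today
        ORDER BY t.date
    """
    return conn.execute(sql, {"start": start, "today": today}).fetchall()
```

The subquery is evaluated once and its single row is joined to every filtered row, so rows outside the date range contribute to the total without being transferred.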

Select column values as column headers in SQL?

I have a table like this:
------------------------------------------------------
ID | Date | ClientName | TransactionAmount |
------------------------------------------------------
1 | 6/16/13 | C1 | 15 |
------------------------------------------------------
2 | 6/16/13 | C1 | 10 |
------------------------------------------------------
3 | 6/16/13 | C2 | 10 |
------------------------------------------------------
4 | 6/17/13 | C2 | 20 |
------------------------------------------------------
And I would like to get something like this:
------------------------------------------------------------------------
Date | C1_Total_Amount_Transacted | C2_Total_Amount_Transacted |
------------------------------------------------------------------------
6/16/13 | 25 | 10 |
------------------------------------------------------------------------
6/17/13 | 0 | 20 |
In the second table, Date is unique. Also, if there are x clients in the database, the
result table will have x + 1 columns (one for the date and one for each client).
It may be necessary to write some PHP code and more queries; any working solution
is perfect, I don't need a full SQL solution.
Thanks
I presume that you are rather new to SQL. This type of query requires conditional summation. And it is quite easy to express in SQL:
select `date`,
       sum(case when ClientName = 'C1' then TransactionAmount else 0 end) as C1,
       sum(case when ClientName = 'C2' then TransactionAmount else 0 end) as C2
from t
group by `date`
But, you have to list each client in the query. You always have to specify the exact column headers for a SQL query. If you don't know them, then you need to create the SQL as a string and then execute it separately. This is a rather cumbersome process.
You can often get around that by using group_concat(). This puts the values in a single column, with a separator of your choice (default is a comma):
select `date`, group_concat(amount)
from (select `date`, ClientName, sum(TransactionAmount) as amount
from t
group by `date`, ClientName
) t
group by `date`
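Because the set of clients is not known in advance, another option is to fetch the grouped rows and build the pivot in application code, as the question allows. Below is a minimal Python sketch of that step (the same structure works in PHP with nested arrays). The function name `pivot_totals` and the `(date, client, amount)` tuple layout are assumptions for illustration.

```python
from collections import defaultdict

def pivot_totals(rows):
    """Pivot transaction rows into one row per date with one column per client.

    rows: list of (date, client, amount) tuples (hypothetical layout).
    Returns (clients, table), where table maps each date to a list of totals
    in the same order as `clients`; clients without transactions get 0.
    """
    clients = sorted({client for _, client, _ in rows})
    totals = defaultdict(lambda: defaultdict(int))
    for day, client, amount in rows:
        totals[day][client] += amount
    table = {
        day: [per_client.get(c, 0) for c in clients]
        for day, per_client in totals.items()
    }
    return clients, table
```

The returned `clients` list gives the column headers, so the number of columns follows the data instead of being hard-coded in the query.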
