Basically, I have a MySQL table which contains a datetime column and a category column. I want to write a SQL query that retrieves all the values present in the category column and counts the occurrences of each category value, grouped by month/year of the datetime column. If possible, I'd also like totals to be returned: a total of all occurrences in each month, and a total for each category.
Note: the category values cannot be hardcoded because they are set by the user and stored in another table.
The table has the following structure:
datetime | category
2009-01-05 | fish
2009-01-06 | fish
2009-01-06 | potato
2009-01-16 | fish
2009-02-08 | pineapple
2009-02-15 | potato
I would like the query to return:
Month | fish | potato | pineapple | total
2009-01 | 3 | 1 | 0 | 4
2009-02 | 0 | 1 | 1 | 2
Total | 3 | 2 | 1 | 6
I think (hope) it can be done in a single SQL query but I can't figure out how.
Can anyone help me?
Thanks!
Let me first say that I think this feels more like an issue to handle in your presentation logic (PHP code). However, SQL can produce such a result. You are trying to accomplish two different things.
First, you're looking for a PIVOT table. MySQL does not support the PIVOT command, but you can simulate it with an aggregate function and CASE (or IF). This works well when you know the potential categories in advance, but won't work in your case because the categories are user-defined.
Next, you want to have row totals and then a final total row. Again, this is more appropriate to handle in the presentation layer.
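As a rough illustration of handling this in the presentation layer, here is a minimal sketch in Python (the asker uses PHP; the function name pivot_counts and the in-memory rows list are assumptions made for the example, standing in for the rows fetched by a plain SELECT). It counts occurrences per month and category, then appends row totals and a final total row.

```python
from collections import Counter


def pivot_counts(rows):
    """Pivot (date_string, category) pairs into a month-by-category count table.

    rows: iterable of ('YYYY-MM-DD', category) tuples, e.g. fetched with a
    plain SELECT. Returns a list of lists, header row first.
    """
    counts = Counter()
    months, categories = [], []
    for day, category in rows:
        month = day[:7]  # 'YYYY-MM'
        if month not in months:
            months.append(month)
        if category not in categories:
            categories.append(category)  # keep first-seen order
        counts[(month, category)] += 1
    months.sort()

    table = [["Month"] + categories + ["total"]]
    for month in months:
        cells = [counts[(month, c)] for c in categories]
        table.append([month] + cells + [sum(cells)])  # row total
    col_totals = [sum(counts[(m, c)] for m in months) for c in categories]
    table.append(["Total"] + col_totals + [sum(col_totals)])  # final total row
    return table


# Sample data from the question.
rows = [
    ("2009-01-05", "fish"), ("2009-01-06", "fish"), ("2009-01-06", "potato"),
    ("2009-01-16", "fish"), ("2009-02-08", "pineapple"), ("2009-02-15", "potato"),
]
for line in pivot_counts(rows):
    print(" | ".join(str(cell) for cell in line))
```

Because the category list is discovered from the data, nothing needs to be hardcoded, which is the constraint the question states.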
However, using Dynamic SQL, you can achieve both a PIVOT table and row totals. Here is some sample code:
First build your PIVOT variable @sql:
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'COUNT(IF(category = ''', category, ''',1,NULL)) AS ', category)
) INTO @sql
FROM (
SELECT *,
@rn:=IF(@prevMonthYear=CONCAT(YEAR(datetime),'-',MONTH(datetime)),@rn+1,1) rn,
@prevMonthYear:=CONCAT(YEAR(datetime),'-',MONTH(datetime)) dt
FROM yourtable JOIN (SELECT @rn:=0,@prevParent:=0) t
) t
;
Now build your Row Summary variable @totsql:
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'SUM(', category, ') AS sum_', category)
) INTO @totsql
FROM (
SELECT *,
@rn:=IF(@prevMonthYear=CONCAT(YEAR(datetime),'-',MONTH(datetime)),@rn+1,1) rn,
@prevMonthYear:=CONCAT(YEAR(datetime),'-',MONTH(datetime)) dt
FROM yourtable JOIN (SELECT @rn:=0,@prevParent:=0) t
) t
;
Put it all together:
SET @sql = CONCAT('SELECT dt,
', @sql, ', COUNT(1) total
FROM (
SELECT *,
@rn:=IF(@prevMonthYear=CONCAT(YEAR(datetime),''-'',MONTH(datetime)),@rn+1,1) rn,
@prevMonthYear:=CONCAT(YEAR(datetime),''-'',MONTH(datetime)) dt
FROM yourtable JOIN (SELECT @rn:=0,@prevParent:=0) t
) t
GROUP BY dt
UNION
SELECT ''Totals'',', @totsql, ', SUM(total)
FROM (
SELECT dt,
', @sql, ', COUNT(1) total
FROM (
SELECT *,
@rn:=IF(@prevMonthYear=CONCAT(YEAR(datetime),''-'',MONTH(datetime)),@rn+1,1) rn,
@prevMonthYear:=CONCAT(YEAR(datetime),''-'',MONTH(datetime)) dt
FROM yourtable JOIN (SELECT @rn:=0,@prevParent:=0) t
) t
GROUP BY dt
) t2
;');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SQL Fiddle Demo
Results:
MONTH FISH POTATO PINEAPPLE TOTAL
2009-1 3 1 0 4
2009-2 0 1 1 2
Totals 3 2 1 6
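The core idea of the dynamic SQL approach (read the distinct categories first, then generate one conditional-aggregate column per category) can also be sketched from application code. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database instead of MySQL, so strftime replaces CONCAT(YEAR(...), MONTH(...)), and the helper names quote_ident and build_pivot_sql are hypothetical. The table name yourtable is taken from the answer above.

```python
import sqlite3


def quote_ident(name):
    """Quote a value for use as an SQL identifier (column alias)."""
    return '"' + name.replace('"', '""') + '"'


def build_pivot_sql(categories):
    """Build a pivot query with one counting column per category.

    Category values are bound as parameters (one '?' per category, in
    order); only the column aliases are interpolated, after quoting.
    """
    select_list = ["strftime('%Y-%m', \"datetime\") AS month"]
    select_list += [
        "SUM(CASE WHEN category = ? THEN 1 ELSE 0 END) AS " + quote_ident(c)
        for c in categories
    ]
    select_list.append("COUNT(*) AS total")
    return ("SELECT " + ", ".join(select_list) +
            " FROM yourtable GROUP BY month ORDER BY month")


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE yourtable ("datetime" TEXT, category TEXT)')
conn.executemany("INSERT INTO yourtable VALUES (?, ?)", [
    ("2009-01-05", "fish"), ("2009-01-06", "fish"), ("2009-01-06", "potato"),
    ("2009-01-16", "fish"), ("2009-02-08", "pineapple"), ("2009-02-15", "potato"),
])

# Step 1: discover the categories; step 2: generate and run the pivot query.
categories = [row[0] for row in conn.execute(
    "SELECT DISTINCT category FROM yourtable ORDER BY category")]
result = conn.execute(build_pivot_sql(categories), categories).fetchall()
for row in result:
    print(row)
```

This sketch leaves the final totals row to the caller; it could be computed by summing each column of result.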
You can use multiple nested queries, or MySQL loops, in this case. The easiest thing would be to fetch all the data you need and process it in PHP.
I have this table structure:
EDIT more complex example: add hidden range
category| day | a |
--------|------------|-------|
1 | 2012-01-01 | 4 |
1 | 2012-01-02 | 4 |
1 | 2012-01-03 | 4 |
1 | 2012-01-04 | 4 |
1 | 2012-01-05 | 5 |
1 | 2012-01-06 | 5 |
1 | 2012-01-07 | 5 |
1 | 2012-01-08 | 4 |
1 | 2012-01-09 | 4 |
1 | 2012-01-10 | 4 |
1 | 2012-01-11 | 5 |
1 | 2012-01-12 | 5 |
1 | 2012-01-16 | 5 |
1 | 2012-01-17 | 5 |
1 | 2012-01-18 | 5 |
1 | 2012-01-19 | 5 |
...
with (category, day) as a unique key. I would like to extract ranges of dates for each category, grouped by the value of column "a" and within a given date limit, like so:
1,2012-01-01|2012-01-04,4
1,2012-01-05|2012-01-07,5
1,2012-01-08|2012-01-10,4
1,2012-01-11|2012-01-12,5
1,2012-01-13|2012-01-15,0
1,2012-01-16|2012-01-19,5
or similar.
I am looking for the best way to do this, preferably using only MySQL, but a little PHP is also acceptable.
NOTE 1: Not every day is present in the table: there may be gaps between two non-contiguous days. In that case I would like the output to include the missing range with column "a" = 0.
NOTE 2: I did it with a simple query and a few lines of PHP, but I don't like it because my simple algorithm needs one loop iteration for each day in the range, multiplied by each category found. If the range is large and there are many categories, that does not perform well.
FINAL EDIT: OK! After reading all the comments and answers, I think there is no solution that is valid, efficient and, at the same time, readable. So Mosty Mostacho's answer is not a 100% valid solution, but it contains 100% valid suggestions. Thank you all.
New edit:
As I told you in a comment, I strongly recommend that you use the quick query and then process the missing dates in PHP, as that would be faster and more readable:
select
concat(@category := category, ',', min(day)) col1,
concat(max(day), ',', @a := a) col2
from t, (select @category := '', @a := '', @counter := 0) init
where @counter := @counter + (category != @category or a != @a)
group by @counter, category, a
However, if you still want to use the query version, then try this:
select
@counter := @counter + (category != @category or a != @a) counter,
concat(@category := category, ',', min(day)) col1,
concat(max(day), ',', @a := a) col2
from (
select distinct s.day, s.category, coalesce(t1.a, 0) a
from (
select (select min(day) from t) + interval val - 1 day day, c.category
from seq s, (select distinct category from t) c
having day <= (select max(day) from t)
) s
left join t t1 on s.day = t1.day and s.category = t1.category
where s.day between (
select min(day) from t t2
where s.category = t2.category) and (
select max(day) from t t2
where s.category = t2.category)
order by s.category, s.day
) t, (select @category := '', @a := '', @counter := 0) init
group by counter, category, a
order by category, min(day)
Note that MySQL won't allow you to create data on the fly, unless you hardcode UNIONs, for example. That is an expensive process, which is why I strongly suggest that you create a table with a single integer column holding the values from 1 to X, where X is at least the number of days between min(day) and max(day) in your table. If you're not sure about that number, just add 100,000 numbers and you'll be able to generate date ranges spanning over 200 years. In the previous query, this table is seq and its column is val.
This results in:
+--------------+--------------+
| COL1 | COL2 |
+--------------+--------------+
| 1,2012-01-01 | 2012-01-04,4 |
| 1,2012-01-05 | 2012-01-07,5 |
| 1,2012-01-08 | 2012-01-10,4 |
| 1,2012-01-11 | 2012-01-12,5 |
| 1,2012-01-13 | 2012-01-15,0 |
| 1,2012-01-16 | 2012-01-19,5 |
+--------------+--------------+
Ok, I'm lying. The result is actually returning a counter column. Just disregard it, as removing it (using a derived table) would be even less performant!
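The seq numbers table suggested above can be tried out quickly with the following sketch. It is an illustration only and makes several assumptions: it runs against an in-memory SQLite database through Python's sqlite3 module rather than MySQL (so date(..., '+N days') stands in for + interval N day), it handles a single category, and the three sample rows are a cut-down version of the question's data. The table and column names seq, val and t come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Numbers table: a single integer column holding 1..1000.
conn.execute("CREATE TABLE seq (val INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO seq VALUES (?)", [(n,) for n in range(1, 1001)])

# Cut-down sample data with a gap between 2012-01-12 and 2012-01-16.
conn.execute("CREATE TABLE t (category INTEGER, day TEXT, a INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "2012-01-11", 5), (1, "2012-01-12", 5), (1, "2012-01-16", 5),
])

# Generate every day from MIN(day) to MAX(day), then left join the real
# rows so that missing days show up with a = 0.
query = """
SELECT d.day, COALESCE(t.a, 0) AS a
FROM (
    SELECT date((SELECT MIN(day) FROM t), '+' || (val - 1) || ' days') AS day
    FROM seq
    WHERE val <= (SELECT julianday(MAX(day)) - julianday(MIN(day)) + 1 FROM t)
) AS d
LEFT JOIN t ON t.day = d.day
ORDER BY d.day
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

Once every day is present, the grouping queries shown above no longer need special handling for gaps; the missing days simply form their own range with a = 0.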
and here's a one liner brutality for you :) (Note: Change the "datt" table name.)
select dd.category,
dd.day as start_day,
(select dp.day from
(
select 1 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
select * from datt where day = d1.day - INTERVAL 1 DAY and a=d1.a
)
union
select 2 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
select * from datt where day = d1.day + INTERVAL 1 DAY and a=d1.a
)
) dp where dp.day >= dd.day - INTERVAL (n-2) DAY order by day asc limit 0,1)
as end_day,
dd.a from (
select 1 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
select * from datt where day = d1.day - INTERVAL 1 DAY and a=d1.a
)
union
select 2 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
select * from datt where day = d1.day + INTERVAL 1 DAY and a=d1.a
)
) dd
where n=1
and its output is:
|| 1 || 2012-01-01 || 2012-01-01 || 4 ||
|| 1 || 2012-01-03 || 2012-01-04 || 4 ||
|| 1 || 2012-01-05 || 2012-01-07 || 5 ||
|| 1 || 2012-01-08 || 2012-01-10 || 4 ||
|| 1 || 2012-01-11 || 2012-01-12 || 5 ||
Note: That's the result when 2012-01-02 is missing from a table covering days 01 to 12.
No need for PHP or temporary tables or anything.
DISCLAIMER: I did this just for fun. This stunt may be too crazy to be used in a production environment. Therefore I'm not posting this as a "real" solution. Also I'm not willing to explain how it works :) And I didn't rethink / refactor it. There might be more elegant ways and names / aliases could be more informative. So please no flame or anything.
Here's my solution. Looks more complicated than it is. I think it may be easier to understand than other answers, no offense :)
Setting up test data:
drop table if exists test;
create table test(category int, day date, a int);
insert into test values
(1 , '2012-01-01' , 4 ),
(1 , '2012-01-02' , 4 ),
(1 , '2012-01-03' , 4 ),
(1 , '2012-01-04' , 4 ),
(1 , '2012-01-05' , 5 ),
(1 , '2012-01-06' , 5 ),
(1 , '2012-01-07' , 5 ),
(1 , '2012-01-08' , 4 ),
(1 , '2012-01-09' , 4 ),
(1 , '2012-01-10' , 4 ),
(1 , '2012-01-11' , 5 ),
(1 , '2012-01-12' , 5 ),
(1 , '2012-01-16' , 5 ),
(1 , '2012-01-17' , 5 ),
(1 , '2012-01-18' , 5 ),
(1 , '2012-01-19' , 5 );
And here it comes:
SELECT category, MIN(`day`) AS firstDayInRange, max(`day`) AS lastDayInRange, a
, COUNT(*) as howMuchDaysInThisRange /*<-- as a little extra*/
FROM
(
SELECT
IF(@prev != qr.a, @is_a_changing:=@is_a_changing+1, @is_a_changing) AS is_a_changing, @prev:=qr.a, qr.* /*See if column a has changed. If yes, increment, so we can GROUP BY it later*/
FROM
(
SELECT
test.category, q.`day`, COALESCE(test.a, 0) AS a /*When there is no a, replace NULL with 0*/
FROM
test
RIGHT JOIN
(
SELECT
DATE_SUB(CURDATE(), INTERVAL number_days DAY) AS `day` /*<-- Create dates from now back 999 days. This query is surprisingly fast. And adding more numbers to create more dates, i.e. 10000 dates is also no problem. Therefore a temporary dates table might not be necessary?*/
FROM
(
SELECT (a + 10*b + 100*c) AS number_days FROM
(SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) aa
, (SELECT 0 AS b UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) bb
, (SELECT 0 AS c UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) cc
)sq /*<-- This generates numbers 0 to 999*/
)q USING(`day`)
, (SELECT @is_a_changing:=0, @prev:=0) r
/*This WHERE clause is just to beautify. It may not be necessary*/
WHERE q.`day` >= (SELECT MIN(test.`day`) FROM test) AND q.`day` <= (SELECT MAX(test.`day`) FROM test)
)qr
)asdf
GROUP BY is_a_changing
ORDER BY 2
Result looks like this:
category firstDayInRange lastDayInRange a howMuchDaysInThisRange
--------------------------------------------------------------------------
1 2012-01-01 2012-01-04 4 4
1 2012-01-05 2012-01-07 5 3
1 2012-01-08 2012-01-10 4 3
1 2012-01-11 2012-01-12 5 2
2012-01-13 2012-01-15 0 3
1 2012-01-16 2012-01-19 5 4
To make this work as you want it to, you should have two tables:
for periods
for days
Each period can then have many days related to it through a FOREIGN KEY. With the current table structure, the best you can do is to detect the continuous periods on the PHP side.
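Detecting the continuous periods on the application side needs only one pass over the rows, not one iteration per calendar day. The sketch below shows the idea in Python rather than PHP; the function name date_ranges and the tuple layout are assumptions for the example, and the rows are expected to arrive already sorted by category and day (for instance via ORDER BY category, day).

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def date_ranges(rows):
    """Collapse (category, day, a) rows into (category, start, end, a) ranges.

    rows must be sorted by category, then day. A gap between two rows of
    the same category is emitted as its own range with a = 0.
    """
    ranges = []
    for category, day, a in rows:
        if ranges and ranges[-1][0] == category:
            _, start, end, prev_a = ranges[-1]
            if day - end > ONE_DAY:
                # Days are missing: record the gap as a range with a = 0.
                ranges.append((category, end + ONE_DAY, day - ONE_DAY, 0))
            elif a == prev_a:
                # Contiguous day with the same value: extend the current range.
                ranges[-1] = (category, start, day, a)
                continue
        # New category, changed value, or first day after a gap.
        ranges.append((category, day, day, a))
    return ranges


# Sample data from the question: days 1-12 and 16-19 of January 2012.
values = [4] * 4 + [5] * 3 + [4] * 3 + [5] * 2
rows = [(1, date(2012, 1, d), a) for d, a in enumerate(values, start=1)]
rows += [(1, date(2012, 1, d), 5) for d in range(16, 20)]

ranges = date_ranges(rows)
for category, start, end, a in ranges:
    print(f"{category},{start}|{end},{a}")
```

The work is proportional to the number of rows returned by the query, regardless of how long the gaps are.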
Firstly, this is an extension of @Mosty's solution.
To enable Mosty's solution to include category/date combinations that do not exist in the table, I took the following approach -
Start by getting a distinct list of categories and then join this to the entire date range -
SELECT category, `start` + INTERVAL id DAY AS `day`
FROM dummy,(SELECT DISTINCT category FROM t) cats, (SELECT MIN(day) `start`, MAX(day) `end` FROM t) tmp
WHERE id <= DATEDIFF(`end`, `start`)
ORDER BY category, `day`
The above query builds the full date range using the table dummy with a single field id. The id field contains 0,1,2,3,.... - it needs to have enough values to cover every day in the required date range. This can then be joined back to the original table to create a complete list of all categories for all dates and the appropriate value for a -
SELECT cj.category, cj.`day`, IFNULL(t.a, 0) AS a
FROM (
SELECT category, `start` + INTERVAL id DAY AS `day`
FROM dummy,(SELECT DISTINCT category FROM t) cats, (SELECT MIN(day) `start`, MAX(day) `end` FROM t) tmp
WHERE id <= DATEDIFF(`end`, `start`)
ORDER BY category, `day`
) AS cj
LEFT JOIN t
ON cj.category = t.category
AND cj.`day` = t.`day`
This can then be applied to Mosty's query in place of table t -
SELECT
CONCAT(@category := category, ',', MIN(`day`)) col1,
CONCAT(MAX(`day`), ',', @a := a) col2
FROM (
SELECT cj.category, cj.day, IFNULL(t.a, 0) AS a
FROM (
SELECT category, `start` + INTERVAL id DAY AS `day`
FROM dummy,(SELECT DISTINCT category FROM t) cats, (SELECT MIN(day) `start`, MAX(day) `end` FROM t) tmp
WHERE id <= DATEDIFF(`end`, `start`)
ORDER BY category, `day`
) AS cj
LEFT JOIN t
ON cj.category = t.category
AND cj.`day` = t.day) AS t, (select @category := '', @a := '', @counter := 0) init
WHERE @counter := @counter + (category != @category OR a != @a)
GROUP BY @counter, category, a
Doing it completely on the MySQL side has a performance advantage:
Once the procedure has been created, it runs within 0.35 - 0.37 sec
create procedure fetch_range()
begin
declare min date;
declare max date;
create table testdate(
date1 date
);
select min(day) into min
from category;
select max(day) into max
from category;
while min <= max do
insert into testdate values(min);
set min = adddate(min,1);
end while;
select concat(category,',',min(day)),concat(max(day),',',a)
from(
SELECT if(isNull(category),@category,category) category,if(isNull(day),date1,day) day,@a,if(isNull(a) || isNull(@a),if(isNull(a) && isNull(@a),@grp,@grp:=@grp+1),if(@a!=a,@grp:=@grp+1,@grp)) as sor_col,if(isNull(a),0,a) as a,@a:=a,@category:= category
FROM `category`
RIGHT JOIN testdate ON date1 = category.day) as table1
group by sor_col;
drop table testdate;
end
Output:
1,2012-01-01|2012-01-04,4
1,2012-01-05|2012-01-07,5
1,2012-01-08|2012-01-10,4
1,2012-01-11|2012-01-12,5
1,2012-01-13|2012-01-15,0
1,2012-01-16|2012-01-19,5
Here is a MySQL solution that gives the desired result except for the missing ranges.
PHP:
The missing ranges can be added in PHP.
// Note: the mysql_* functions were removed in PHP 7; use mysqli or PDO in new code.
$sql = "set @a=0,@grp=0,@datediff=0,@category=0,@day='';";
mysql_query($sql);
$sql= "select category,min(day)min,max(day) max,a
from(
select category,day,a,concat(if(@a!=a,@grp:=@grp+1,@grp),if(datediff(@day,day) < -1,@datediff:=@datediff+1,@datediff)) as grp_datediff,datediff(@day,day)diff, @day:= day,@a:=a
FROM category
order by day)as t
group by grp_datediff";
$result = mysql_query($sql);
$data = array();
$indx = 0;
while($row = mysql_fetch_object($result)){
    // Number of days between the previous range's end and this range's start.
    $gapDays = 0;
    if(isset($data[$indx - 1]['max'])){
        $date1 = new DateTime($data[$indx - 1]['max']);
        $date2 = new DateTime($row->min);
        $gapDays = $date1->diff($date2)->days;
    }
    if ($gapDays > 1) {
        // Missing range: from the day after the previous max to the day before the current min.
        $min = new DateTime($data[$indx-1]['max']);
        $min->add(new DateInterval("P1D"));
        $max = new DateTime($row->min);
        $max->sub(new DateInterval("P1D"));
        $data[$indx]['category'] = $data[$indx-1]['category'];
        $data[$indx]['min'] = $min->format('Y-m-d');
        $data[$indx]['max'] = $max->format('Y-m-d');
        $data[$indx++]['a'] = 0;
    }
    // The range returned by the query itself.
    $data[$indx]['category'] = $row->category;
    $data[$indx]['min'] = $row->min;
    $data[$indx]['max'] = $row->max;
    $data[$indx]['a'] = $row->a;
    $indx++;
}
Is this what you mean?
SELECT
category,
MIN(t1.day),
MAX(t2.day),
a
FROM
`table` AS t1
INNER JOIN `table` AS t2 USING (category, a)
GROUP BY category, a
If I understand your question correctly, I would use something to the effect of:
SELECT MAX(day), MIN(day) FROM `YourTable` WHERE `category`= $cat AND `A`= $increment;
... and ...
$dateRange = $cat.","."$min"."|"."$max".",".$increment;