I'm working on a project written in Laravel 5.2 that has two tables that need to be queried to produce a summary of the number of records created per year. Here is a simplified layout of the schema with some sample data:
Matters Table
id created_at
-----------------
1 2016-01-05 10:00:00
2 2016-03-09 11:00:00
3 2017-01-03 10:00:00
4 2015-05-06 11:00:00
Notes Table
id created_at
-----------------
1 2015-07-08 10:00:00
2 2016-03-16 11:00:00
3 2017-09-03 10:00:00
4 2017-11-06 11:00:00
Each table has several hundred thousand records, so I'd like to be able to (efficiently) query my data to produce the following results with the counts of each table by year:
year matters notes
----------------------------
2015 1 1
2016 2 1
2017 1 2
I need each column to be sortable. Currently, the fastest way I can think of to do this is to have two queries like the following and then combine the results of the two via PHP:
SELECT YEAR(matters.created_at) AS 'year', COUNT(1) AS 'matters'
FROM matters
GROUP BY YEAR(matters.created_at);
SELECT YEAR(notes.created_at) AS 'year', COUNT(1) AS 'notes'
FROM notes
GROUP BY YEAR(notes.created_at);
But I'm wondering if there is a better way, especially since I have to support sorting by each column based on the user's needs.
Any thoughts?
The year can be used as the key for a JOIN between the two tables' grouped result sets, producing combined output that can then be sorted by any field of choice:
SELECT t1.year_rec, IFNULL(t1.matters, 0) AS matters, IFNULL(t2.notes, 0) AS notes
FROM
(
SELECT DATE_FORMAT(created_at, "%Y" ) AS year_rec, COUNT(id) AS matters
FROM matters
GROUP BY year_rec
) t1
LEFT JOIN
(
SELECT DATE_FORMAT(created_at, "%Y" ) AS year_rec, COUNT(id) AS notes
FROM notes
GROUP BY year_rec
) t2
ON t1.year_rec = t2.year_rec
-- ORDER BY --
Caveat:
Keep in mind the nature of JOINs, and that MySQL doesn't provide a direct way to write a FULL OUTER JOIN: if a year has no records in the first (left) table of the LEFT JOIN, that year will be omitted from the final output, even if the second table has records for it. Switching to a RIGHT JOIN just moves the same problem to the second table. If that can't happen in your case (i.e. every year will always have records in both tables), you needn't worry about it, but it's still good to be aware of this limitation.
For further reading: Full Outer Join in MySQL
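If you instead keep the two single-table queries and combine them in application code, the merge should behave like a FULL OUTER JOIN so that no year is lost. The following is a minimal Python sketch standing in for the PHP side; the sample data is the one from the question, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

# created_at values from the sample Matters and Notes tables above.
matters = ["2016-01-05 10:00:00", "2016-03-09 11:00:00",
           "2017-01-03 10:00:00", "2015-05-06 11:00:00"]
notes = ["2015-07-08 10:00:00", "2016-03-16 11:00:00",
         "2017-09-03 10:00:00", "2017-11-06 11:00:00"]


def counts_by_year(timestamps):
    """Same idea as SELECT YEAR(created_at), COUNT(1) ... GROUP BY YEAR(created_at)."""
    return Counter(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").year for ts in timestamps)


def merge_year_counts(left, right):
    """Full-outer-join semantics: a year present on only one side gets 0 on the other."""
    return [{"year": y, "matters": left.get(y, 0), "notes": right.get(y, 0)}
            for y in sorted(set(left) | set(right))]


rows = merge_year_counts(counts_by_year(matters), counts_by_year(notes))
# Any column can serve as the sort key, e.g. notes descending, then year ascending.
rows.sort(key=lambda r: (-r["notes"], r["year"]))
```

Because each side is only a handful of rows (one per year), this merge is cheap regardless of how many records the tables hold; the heavy lifting stays in the two GROUP BY queries.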
mysql table: stats
columns:
date | stats
05-05-2015 22:25:00 | 78
05-05-2015 09:25:00 | 21
05-05-2015 05:25:00 | 25
05-04-2015 09:25:00 | 29
05-04-2015 05:25:00 | 15
sql query:
SELECT MAX(date) as date, stats FROM stats GROUP BY date(date) ORDER BY date DESC
When I do this, it does select one row per date (grouped by date, regardless of the time) and picks the latest datetime with MAX, but it does not select the stats value from that same row.
For example, it returns 05-05-2015 22:25:00 as the date and 25 as the stats, when it should return 78 as the stats. I've done my research and it seems solutions to this are out there, but I am not familiar with JOINs or other less common MySQL features, and I find other examples and solutions hard to follow, so I decided to post my own specific scenario.
This question is asked every single day on SO. Sometimes it's even answered correctly. Anyway, purists won't like it, but here's one option:
Select x.* from stats x join (SELECT MAX(date) max_date FROM stats GROUP BY date(date)) y on y.max_date = x.date;
Obviously, for this to work dates need to be stored using a datetime data type.
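The query works in two steps: the derived table y finds the latest datetime per calendar day, and the join pulls back the complete row for each of those datetimes. A Python sketch of the same two steps, using the sample rows from the question (the dates are assumed to be MM-DD-YYYY; the function name is hypothetical):

```python
from datetime import datetime

# Rows from the stats table in the question; the dates are assumed to be MM-DD-YYYY.
rows = [
    ("05-05-2015 22:25:00", 78),
    ("05-05-2015 09:25:00", 21),
    ("05-05-2015 05:25:00", 25),
    ("05-04-2015 09:25:00", 29),
    ("05-04-2015 05:25:00", 15),
]


def latest_per_day(rows):
    parsed = [(datetime.strptime(d, "%m-%d-%Y %H:%M:%S"), s) for d, s in rows]
    # Step 1, like the derived table y: MAX(date) for each calendar day.
    max_per_day = {}
    for dt, _ in parsed:
        day = dt.date()
        if day not in max_per_day or dt > max_per_day[day]:
            max_per_day[day] = dt
    # Step 2, like the join: keep the full rows whose datetime equals a daily maximum.
    latest = set(max_per_day.values())
    return sorted(((dt, s) for dt, s in parsed if dt in latest), reverse=True)


result = latest_per_day(rows)
```

As with the SQL join, two rows sharing exactly the same maximum timestamp on a day would both be returned.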
I have to compute some statistics for my application, so I need an algorithm with the best possible performance. I have several questions.
I have a data structure like this in the mysql database:
user_id group_id date
1 5 2012-11-20
1 2 2012-11-01
1 4 2012-11-01
1 3 2012-10-15
1 9 2013-01-18
...
So I need to find the group(s) of a user at a specific date. For example, for user 1 at 2012-11-15 (15 November 2012) the query should return the most recent groups, which are 2 and 4 (a user can be in several groups at the same time), from 2012-11-01 (the closest earlier date).
Normally I could do a SELECT ... WHERE date <= chosen_date ORDER BY date DESC, etc., but that's not the point: with 1000 users, it would need 1000 queries to get all the results.
So here are my questions:
I am already using PHP to loop through the array to avoid the high number of MySQL queries, but it's still not good because the array may hold 10,000+ elements, and looping over it with foreach (or for) is quite costly.
First: given an array ordered by date (desc or asc), what's the fastest way, other than a for or foreach loop over every element, to find the index of the element with the closest date smaller (or greater) than a given date?
Second: if there is no solution to the first question, what kind of data structure would you suggest for this kind of problem?
Note: the date is in MySQL format; it is not converted to a timestamp when stored in the array.
EDIT: this is a sql fiddle http://sqlfiddle.com/#!2/dc28d/1
For dos_id = 6 and t = "2012-11-01", it should return only 2 and 5, at date "2010-12-10 13:16:58".
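For the array question itself: because the array is already sorted, a binary search finds the closest index in O(log n) comparisons instead of a linear scan. The sketch below uses Python's standard bisect module (in PHP the same binary search would have to be written by hand); the sample dates are user 1's dates from the question sorted ascending, and the helper names are hypothetical. For a descending array, reverse it once or mirror the comparisons.

```python
from bisect import bisect_left, bisect_right

# Dates in MySQL's 'YYYY-MM-DD' format sort chronologically as plain strings,
# so they can be compared without converting them to timestamps.
# Sample: user 1's dates from the question, sorted ascending.
dates = ["2012-10-15", "2012-11-01", "2012-11-01", "2012-11-20", "2013-01-18"]


def last_index_on_or_before(sorted_dates, target):
    """Index of the last element <= target, or -1 if every element is later."""
    return bisect_right(sorted_dates, target) - 1


def first_index_on_or_after(sorted_dates, target):
    """Index of the first element >= target, or len(sorted_dates) if none."""
    return bisect_left(sorted_dates, target)


i = last_index_on_or_before(dates, "2012-11-15")
# Every element sharing that closest date (several groups on the same day).
same_date = range(bisect_left(dates, dates[i]), i + 1) if i >= 0 else range(0)
```

Here `same_date` covers indexes 1 and 2, i.e. both group rows dated 2012-11-01.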
Not sure why you'd want to do this in PHP. Here's some SQL using joins instead, to get the most recent group(s) for all users for a given date. Make sure you have indexes on date and userid.
SELECT *
FROM test t1
LEFT JOIN test t2
ON t1.userid = t2.userid AND t2.thedate <= '2012-11-15' AND t2.thedate > t1.thedate
WHERE t1.thedate <= '2012-11-15' AND t2.userid IS NULL;
Or, using the schema from your SQLFiddle:
SELECT t1.*
FROM dossier_dans_groupe t1
LEFT JOIN dossier_dans_groupe t2
ON t1.dos_id = t2.dos_id AND t2.updated_at <= '2012-11-01'
AND t2.updated_at > t1.updated_at
WHERE t1.updated_at <= '2012-11-01' AND t2.dos_id IS NULL;
This would give you a list of all users and their groups (one row per group) for the latest date on or before the one you specify (2012-11-15 below). Here `table` stands for your table name; since TABLE is a reserved word in MySQL, it has to be backtick-quoted if used literally:
SELECT t.user_id, t.group_id, t.date
FROM `table` t
WHERE t.date <= '2012-11-15'
AND NOT EXISTS (SELECT 1 FROM `table` test
WHERE test.user_id = t.user_id
AND test.date > t.date
AND test.date <= '2012-11-15')
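Both the LEFT JOIN ... IS NULL query and the NOT EXISTS query keep a row only when no later row on or before the cutoff exists for the same user. A Python sketch of that condition for all users in one pass over the data (the user 2 row is hypothetical, added only to show several users; the function name is also hypothetical):

```python
# (user_id, group_id, date) rows from the question's example; the user 2 row is hypothetical.
rows = [
    (1, 5, "2012-11-20"),
    (1, 2, "2012-11-01"),
    (1, 4, "2012-11-01"),
    (1, 3, "2012-10-15"),
    (1, 9, "2013-01-18"),
    (2, 7, "2012-11-10"),
]


def groups_at(rows, cutoff):
    """Group(s) of every user on that user's latest date <= cutoff."""
    # Pass 1: latest qualifying date per user ('YYYY-MM-DD' strings compare chronologically).
    latest = {}
    for user, _, d in rows:
        if d <= cutoff and (user not in latest or d > latest[user]):
            latest[user] = d
    # Pass 2: a row survives only if no later row <= cutoff exists for the same user,
    # which is the condition the LEFT JOIN ... IS NULL and NOT EXISTS queries express.
    return [(u, g, d) for u, g, d in rows if d <= cutoff and d == latest[u]]
```

For cutoff 2012-11-15 this yields groups 2 and 4 for user 1 (both dated 2012-11-01), matching the expected answer in the question.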
Problem - Retrieve sum of subtotals on a half hour interval efficiently
I am using MySQL and I have a table containing subtotals with different times. I want to retrieve the sum of these sales in half-hour intervals from 7 AM through 12 AM. My current solution (below) works, but it takes 13 seconds to query about 150,000 records. I intend to have several million records in the future, and my current method is too slow.
How can I make this more efficient, or, if possible, replace the PHP component with pure SQL? Also, would it help make your solution even more efficient if I used Unix timestamps instead of separate date and time columns?
Table Name - Receipts
subtotal date time sale_id
--------------------------------------------
6 09/10/2011 07:20:33 1
5 09/10/2011 07:28:22 2
3 09/10/2011 07:40:00 3
5 09/10/2011 08:05:00 4
8 09/10/2011 08:44:00 5
...............
10 09/10/2011 18:40:00 6
5 09/10/2011 23:05:00 7
Desired Result
An array like this:
Half hour 1 ::: (7:00 to 7:30) => Sum of Subtotal is 11
Half hour 2 ::: (7:30 to 8:00) => Sum of Subtotal is 3
Half hour 3 ::: (8:00 to 8:30) => Sum of Subtotal is 5
Half hour 4 ::: (8:30 to 9:00) => Sum of Subtotal is 8
Current Method
The current way uses a for loop which starts at 7 am and increments 1800 seconds, equivalent to a half hour. As a result, this makes about 34 queries to the database.
for($n = strtotime("07:00:00"), $e = strtotime("23:59:59"); $n <= $e; $n += 1800) {
    $timeA = date("H:i:s", $n);
    $timeB = date("H:i:s", $n+1799);
    $query = $mySQL->query("SELECT SUM(subtotal)
        FROM Receipts WHERE time > '$timeA'
        AND time < '$timeB'");
    while ($row = $query->fetch_object()) {
        $sum[] = $row;
    }
}
Current Output
Output is just an array where:
[0] represents 7 am to 7:30 am
[1] represents 7:30 am to 8:00 am
[33] represents 11:30 pm to 11:59:59 pm.
array ("0" => 10000,
"1" => 20000,
..............
"33" => 5000);
You can also try this single query; it should return a result set with the totals in 30-minute groupings:
SELECT date, MIN(time) as time, SUM(subtotal) as total
FROM `Receipts`
WHERE `date` = '2012-07-30'
GROUP BY hour(time), floor(minute(time)/30)
To run this efficiently, add a composite index on the date and time columns.
You should get back a result set like:
+---------------------+--------------------+
| time | total |
+---------------------+--------------------+
| 2012-07-30 00:00:00 | 0.000000000 |
| 2012-07-30 00:30:00 | 0.000000000 |
| 2012-07-30 01:00:00 | 0.000000000 |
| 2012-07-30 01:30:00 | 0.000000000 |
| 2012-07-30 02:00:00 | 0.000000000 |
| 2012-07-30 02:30:00 | 0.000000000 |
| 2012-07-30 03:00:00 | 0.000000000 |
| 2012-07-30 03:30:00 | 0.000000000 |
| 2012-07-30 04:00:00 | 0.000000000 |
| 2012-07-30 04:30:00 | 0.000000000 |
| 2012-07-30 05:00:00 | 0.000000000 |
| ...
+---------------------+--------------------+
First, I would use a single DATETIME column, but separate DATE and TIME columns will also work.
You can do all the work in one pass using a single query:
select date,
hour(`time`) hour_num,
IF(MINUTE(`time`) < 30, 0, 1) interval_num,
min(`time`) interval_begin,
max(`time`) interval_end,
sum(subtotal) sum_subtotal
from receipts
where date='2012-07-31'
group by date, hour_num, interval_num;
UPDATE:
Since you aren't concerned with any "missing" rows, I'm also going to assume (probably wrongly) that you aren't concerned that the query might possibly return rows for periods that are not from 7AM to 12AM. This query will return your specified result set:
SELECT (HOUR(r.time)-7)*2+(MINUTE(r.time) DIV 30) AS i
, SUM(r.subtotal) AS sum_subtotal
FROM Receipts r
GROUP BY i
ORDER BY i
This returns the period index (i) derived from an expression referencing the time column. For best performance of this query, you probably want to have a "covering" index available, for example:
ON Receipts(`time`,`subtotal`)
If you are going to include an equality predicate on the date column (which does not appear in your solution, but does appear in the "selected" answer's solution), then it would be good to have that column as the leading column of the "covering" index.
ON Receipts(`date`,`time`,`subtotal`)
If you want to ensure that you are not returning any rows for periods before 7AM, then you could simply add a HAVING i >= 0 clause to the query. (Rows for periods before 7AM would generate a negative number for i.)
SELECT (HOUR(r.time)-7)*2+(MINUTE(r.time) DIV 30) AS i
, SUM(r.subtotal) AS sum_subtotal
FROM Receipts r
GROUP BY i
HAVING i >= 0
ORDER BY i
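The period index expression maps each time straight to its half-hour bucket, so a single pass over the rows is enough. A Python sketch of the same arithmetic applied to the sample Receipts rows from the question (function names are hypothetical):

```python
from collections import defaultdict

# Sample Receipts rows from the question: (subtotal, time).
receipts = [(6, "07:20:33"), (5, "07:28:22"), (3, "07:40:00"), (5, "08:05:00"),
            (8, "08:44:00"), (10, "18:40:00"), (5, "23:05:00")]


def period_index(hhmmss):
    """(HOUR(time) - 7) * 2 + (MINUTE(time) DIV 30): 0 for 07:00-07:29:59, 1 for 07:30-07:59:59, ..."""
    hour, minute, _ = (int(part) for part in hhmmss.split(":"))
    return (hour - 7) * 2 + minute // 30


def half_hour_sums(rows):
    sums = defaultdict(int)
    for subtotal, t in rows:
        i = period_index(t)
        if i >= 0:  # same effect as HAVING i >= 0: nothing before 7 AM
            sums[i] += subtotal
    return dict(sorted(sums.items()))  # ORDER BY i
```

Only periods that actually contain receipts appear in the result, which is the "missing rows" behaviour discussed below.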
PREVIOUSLY:
I've assumed that you want a result set similar to the one you are currently returning, but in one fell swoop. This query will return the same 34 rows you are currently retrieving, but with an extra column identifying the period (0 - 33). This is as close to your current solution as I could get:
SELECT t.i
, IFNULL(SUM(r.subtotal),0) AS sum_subtotal
FROM (SELECT (d1.i + d2.i + d4.i + d8.i + d16.i + d32.i) AS i
, ADDTIME('07:00:00',SEC_TO_TIME((d1.i+d2.i+d4.i+d8.i+d16.i+d32.i)*1800)) AS b_time
, ADDTIME('07:30:00',SEC_TO_TIME((d1.i+d2.i+d4.i+d8.i+d16.i+d32.i)*1800)) AS e_time
FROM (SELECT 0 i UNION ALL SELECT 1) d1 CROSS
JOIN (SELECT 0 i UNION ALL SELECT 2) d2 CROSS
JOIN (SELECT 0 i UNION ALL SELECT 4) d4 CROSS
JOIN (SELECT 0 i UNION ALL SELECT 8) d8 CROSS
JOIN (SELECT 0 i UNION ALL SELECT 16) d16 CROSS
JOIN (SELECT 0 i UNION ALL SELECT 32) d32
HAVING i <= 33
) t
LEFT
JOIN Receipts r ON r.time >= t.b_time AND r.time < t.e_time
GROUP BY t.i
ORDER BY t.i
Some important notes:
It looks like your current solution may be "missing" rows from Receipts whose time falls exactly on a period boundary (e.g. 07:00:00 or 07:29:59), because both comparisons are strict.
It also looks like you aren't concerned with the date component, you are just getting a single value for all dates. (I may have misread that.) If so, the separation of the DATE and TIME columns helps with this, because you can reference the bare TIME column in your query.
It's easy to add a WHERE clause on the date column, e.g. to get the subtotal rollups for just a single day, add a WHERE clause like this before the GROUP BY:
WHERE r.date = '2011-09-10'
A covering index ON Receipts(time,subtotal) (if you don't already have one) may help with performance. If you include an equality predicate on the date column (as in the WHERE clause above), the most suitable covering index would likely be ON Receipts(date,time,subtotal).
I've made an assumption that the time column is of datatype TIME. (If it isn't, then a small adjustment to the query in the inline view aliased as t is probably called for, so that the datatype of the derived b_time and e_time columns matches the datatype of the time column in Receipts.)
Some of the proposed solutions in other answers are not guaranteed to return 34 rows when there are no rows in Receipts within a given time period. "Missing rows" may not be an issue for you, but it is a frequent issue with time-series and time-period data.
I've made the assumption that you would prefer a guarantee of 34 rows returned. The query above returns a subtotal of zero when no rows are found matching a time period. (I note that your current solution will return a NULL in that case. I've wrapped the SUM aggregate in an IFNULL function, so that it returns 0 when the SUM is NULL.)
So, the inline query aliased as t is an ugly mess, but it works fast. What it's doing is generating 34 rows, with distinct integer values 0 through 33. At the same time, it derives a "begin time" and an "end time" that will be used to "match" each period to the time column on the Receipts table.
We take care not to wrap the time column from the Receipts table in any functions, but reference just the bare column. And we want to ensure we don't have any implicit conversion going on (which is why we want the datatypes of b_time and e_time to match). The ADDTIME and SEC_TO_TIME functions both return the TIME datatype. (We can't get around doing the matching and the GROUP BY operations.)
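The inline view t builds every integer from 0 to 63 by picking 0 or 2^k from each of six one-bit sets, keeps 0 through 33, and turns each into a half-open [b_time, e_time) range; the LEFT JOIN with IFNULL then guarantees one row per period. A Python sketch of that idea, using timedelta as a stand-in for MySQL's TIME type and the question's sample receipts (all names here are hypothetical). The nested loop is a plain scan shown only for the matching logic; the SQL version relies on the bare time column and an index instead.

```python
from datetime import timedelta
from itertools import product


def to_time(hhmmss):
    """Parse 'HH:MM:SS' into a timedelta, the closest stdlib analogue of MySQL's TIME type."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)


def periods():
    """Mimic the inline view t: one of {0, 2**k} from each of six sets, summed, kept if <= 33."""
    rows = []
    for bits in product(*[(0, 2 ** k) for k in range(6)]):  # 64 combinations, sums 0..63
        i = sum(bits)
        if i <= 33:
            b_time = to_time("07:00:00") + timedelta(seconds=i * 1800)
            rows.append((i, b_time, b_time + timedelta(seconds=1800)))
    return sorted(rows)


def sums_per_period(receipts):
    """LEFT JOIN plus IFNULL(SUM(...), 0): every period is returned, empty ones as 0."""
    return [(i, sum(sub for sub, t in receipts if b_time <= t < e_time))
            for i, b_time, e_time in periods()]


receipts = [(6, to_time("07:20:33")), (5, to_time("07:28:22")), (3, to_time("07:40:00")),
            (5, to_time("08:05:00")), (8, to_time("08:44:00")), (10, to_time("18:40:00")),
            (5, to_time("23:05:00"))]
result = sums_per_period(receipts)
```

The last period's end time comes out as 24 hours, the same "24:00:00" boundary checked with MAKETIME below.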
The "end time" value for that last period is returned as "24:00:00", and we verify that this is a valid time for matching by running this test:
SELECT MAKETIME(23,59,59) < MAKETIME(24,0,0)
which is successful (returns a 1) so we're good there.
The derived columns (t.b_time and t.e_time) could be included in the resultset as well, but they aren't needed to create your array, and it's (likely) more efficient if you don't include them.
And one final note: for optimal performance, it may be beneficial to load the inline view aliased as t into an actual table (a temporary table would be fine) and then reference that table in place of the inline view. The advantage of doing that is that you can create an index on that table.
One way to make it pure SQL is to use a lookup table. I don't know MySQL that well, so there may be a lot of room for improvement in the code. All my code is MS SQL (T-SQL).
I would do it something like this:
/* Mock salesTable */
DECLARE @SalesTable TABLE (SubTotal int, SaleDate datetime)
INSERT INTO @SalesTable (SubTotal, SaleDate) VALUES (1, '2012-08-01 12:00')
INSERT INTO @SalesTable (SubTotal, SaleDate) VALUES (2, '2012-08-01 12:10')
INSERT INTO @SalesTable (SubTotal, SaleDate) VALUES (3, '2012-08-01 12:15')
INSERT INTO @SalesTable (SubTotal, SaleDate) VALUES (4, '2012-08-01 12:30')
INSERT INTO @SalesTable (SubTotal, SaleDate) VALUES (5, '2012-08-01 12:35')
INSERT INTO @SalesTable (SubTotal, SaleDate) VALUES (6, '2012-08-01 13:00')
INSERT INTO @SalesTable (SubTotal, SaleDate) VALUES (7, '2012-08-01 14:00')
/* input data */
DECLARE @From datetime, @To datetime, @Interval int
SET @From = '2012-08-01'
SET @To = '2012-08-02'
SET @Interval = 30
/* Create lookup table: one row per interval from @From up to (not including) @To */
DECLARE @Lookup TABLE (StartTime datetime, EndTime datetime)
DECLARE @TmpTime datetime
SET @TmpTime = @From
WHILE (@TmpTime < @To)
BEGIN
    INSERT INTO @Lookup (StartTime, EndTime) VALUES (@TmpTime, DATEADD(mi, @Interval, @TmpTime))
    SET @TmpTime = DATEADD(mi, @Interval, @TmpTime)
END
/* Get data: lower bound inclusive, upper bound exclusive */
SELECT l.StartTime, l.EndTime, SUM(SalesTable.SubTotal) AS SubTotal
FROM @SalesTable AS SalesTable
JOIN @Lookup AS l ON SalesTable.SaleDate >= l.StartTime AND SalesTable.SaleDate < l.EndTime
GROUP BY l.StartTime, l.EndTime
In my query, I'm assuming a single datetime field named date. This numbers the half-hour groups counting from whatever start datetime you give it (group 0 is the first half hour after the start):
SELECT
FLOOR(TIMESTAMPDIFF(MINUTE, '2011-08-01 00:00:00', date) / 30) AS half_hour_num
, SUM(subtotal) AS subtotals
FROM
Receipts
WHERE
date >= '2011-08-01 00:00:00'
GROUP BY
half_hour_num
ORDER BY
half_hour_num
Always use the proper datatypes for your data. For your date/time columns, it's best to store them as (preferably UTC) timestamps. This is especially true because some local times don't exist on some dates (in some time zones, hence UTC). You will want an index on this column.
Also, your date/time range isn't going to give you what you want; namely, you're missing anything exactly on a period boundary such as 07:00:00 (because you use strict comparisons). Always define ranges as 'lower-bound inclusive, upper-bound exclusive' (so, time >= '07:00:00' AND time < '07:30:00'). This is especially important for timestamps, which have more fields (the date parts) to deal with.
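To see the difference concretely, here is a small Python sketch comparing the question's strict bounds (start + 1799 seconds, both ends exclusive) with half-open ranges for the first two periods; the helper names are hypothetical. With strict bounds, a receipt stamped exactly on a boundary lands in no period at all, while half-open ranges place it in exactly one.

```python
from datetime import time


def in_strict(t, lo, hi):
    """The question's test: time > lo AND time < hi."""
    return lo < t < hi


def in_half_open(t, lo, hi):
    """Lower bound inclusive, upper bound exclusive: time >= lo AND time < hi."""
    return lo <= t < hi


# First two periods as the question's loop builds them (end = start + 1799 seconds) ...
strict_periods = [(time(7, 0, 0), time(7, 29, 59)), (time(7, 30, 0), time(7, 59, 59))]
# ... and as half-open ranges sharing their boundaries.
half_open_periods = [(time(7, 0), time(7, 30)), (time(7, 30), time(8, 0))]


def hits(stamp):
    """How many periods each scheme assigns this timestamp to: (strict, half-open)."""
    return (sum(in_strict(stamp, lo, hi) for lo, hi in strict_periods),
            sum(in_half_open(stamp, lo, hi) for lo, hi in half_open_periods))
```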
Because MySQL (before version 8.0) doesn't have recursive queries, you're going to want a couple of extra tables to pull this off. I'm referencing them as 'permanent' tables, but it would certainly be possible to define them inline, if necessary.
You're going to want a Calendar table. These are useful for a number of reasons, but here we want it for its listing of dates, which lets us show dates with a subtotal of 0 if necessary. You're also going to want a Clock table of times in half-hour increments, for the same reason.
This should allow you to query your data like so:
SELECT division, COALESCE(SUM(subtotal), 0)
FROM (SELECT TIMESTAMP(calendar_date, clock_time) as division
FROM Calendar
CROSS JOIN Clock
WHERE calendar_date >= DATE('2011-09-10')
AND calendar_date < DATE('2011-09-11')) as divisions
LEFT JOIN Sales_Data
ON occurredAt >= division
AND occurredAt < division + INTERVAL 30 MINUTE
GROUP BY division
(Working example on SQLFiddle, which uses a regular JOIN for brevity)
I found a different solution too, and I'm posting it here for reference should anyone stumble upon this. It groups by half-hour intervals.
SELECT SUM(total), time, date
FROM tableName
GROUP BY (2*HOUR(time) + FLOOR(MINUTE(time)/30))
Link for more info
http://www.artfulsoftware.com/infotree/queries.php#106
I have a table on my PHP page, populated with data from a MySQL query on a single table.
The table looks like:
Location Jan Feb Mar etc… Total
Location1 5 13 7 25
Location2 10 10 6 26
Location3 22 1 7 29
Etc…
The ‘Total’ is calculated by the query using SUM, i.e. SUM(IF(month = '1', 1, 0)) AS 'jan', SUM(IF(month = '2', 1, 0)) AS feb, SUM(IF(month = '3', 1, 0))
What I want are the totals for each column (Jan, Feb, Mar etc)
I can do this in PHP by adding up the values as they are extracted, but can I do it in the query, and is there an advantage to doing so?
Thanks in advance for help rendered.
To me this is a strange database layout; I normally use a row-based approach along these lines:
location | month | total |
loc1 | 1 | 3 |
loc2 | 1 | 4 |
loc1 | 2 | 7 |
etc...
which makes totals and the like MUCH easier to perform.
However, given your structure, you could either continue doing what you are doing or use a little trickery. If you are doing more work with the data in PHP (aside from just the totals), you could easily do a CONCAT() in SQL and get a preformatted string that is ready to be explode()-ed into an array. From there you can use all the wonderful array functions in PHP to get totals or anything else you like.
Edit: Righto, that makes some difference.
If you are using a normal row-based approach to storing the data, you can probably make the database do all the work. If you need a query to get the individual months AND the totals, you can of course do it in PHP, but you can also do a little trick like this:
select
month, -- assuming 1-12, for example
sum(someData)
from
yourTable
where
someCondition=1
group by
month
union all
select
13 as month, -- 13 marks the grand-total row
sum(someData)
from
yourTable
where
someCondition=1
This is handy if you have a DB that has loads of free CPU, but a webserver that is struggling or you have a very LARGE amount of data that needs to be tallied up for example. Just make the database do the crunching and return it as some specific number you can identify in your PHP code - months 1-12 will be normal, month 13 is the total.
Too hard to say, it depends on the rest of the code. What are you doing with the results? Just outputting them? How complex is the query?
In any case I would GROUP BY location, month ORDER BY month then loop the result rows in PHP to build the table. (If any month is missing in a location, nothing will be returned, so make sure to watch out for that.)
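A Python sketch of that approach (standing in for the PHP loop): pivot the grouped (location, month, count) rows into the table, default missing months to 0, and compute both the per-location Total and the per-month column totals. The numbers are the Jan-Mar sample from the question, except that Location3's March row is hypothetically left out to show a missing month; the function name is also hypothetical.

```python
from collections import defaultdict

# Rows as returned by a query like
#   SELECT location, month, COUNT(*) FROM ... GROUP BY location, month ORDER BY month
# Numbers are the Jan-Mar sample from the question, except that Location3's March row
# has been left out (hypothetically) to show how a missing month is handled.
grouped = [("Location1", 1, 5), ("Location2", 1, 10), ("Location3", 1, 22),
           ("Location1", 2, 13), ("Location2", 2, 10), ("Location3", 2, 1),
           ("Location1", 3, 7), ("Location2", 3, 6)]


def pivot(rows, months=range(1, 4)):
    # Every location starts with 0 for every month, so a missing row shows up as 0.
    table = defaultdict(lambda: {m: 0 for m in months})
    for location, month, count in rows:
        table[location][month] = count
    row_totals = {loc: sum(by_month.values()) for loc, by_month in table.items()}
    col_totals = {m: sum(by_month[m] for by_month in table.values()) for m in months}
    return dict(table), row_totals, col_totals


table, row_totals, col_totals = pivot(grouped)
```

The column totals come from the same pass that builds the table, so computing them in application code costs little extra.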
Please, I need help with this (for a better understanding, please see the attached image), because I am completely helpless.
As you can see, I have users who store their starting and ending datetimes in my DB as YYYY-mm-dd H:i:s. Now I need to find the overlaps across all users, ranked by how many users share each time range. I would like to get the 3 most frequent datetime overlaps (those shared by the most users). How can I do it?
I have no idea which MySQL query I should use, or whether it would be better to select all datetimes (start and end) from the database and process them in PHP (but how?). As stated in the image, a result would be, for example, 8:30 - 10:00 for users A+B+C+D.
Table structure:
UserID | Start datetime | End datetime
--------------------------------------
A | 2012-04-03 4:00:00 | 2012-04-03 10:00:00
A | 2012-04-03 16:00:00 | 2012-04-03 20:00:00
B | 2012-04-03 8:30:00 | 2012-04-03 14:00:00
B | 2012-04-06 21:30:00 | 2012-04-06 23:00:00
C | 2012-04-03 12:00:00 | 2012-04-03 13:00:00
D | 2012-04-01 01:00:01 | 2012-04-05 12:00:59
E | 2012-04-03 8:30:00 | 2012-04-03 11:00:00
E | 2012-04-03 21:00:00 | 2012-04-03 23:00:00
What you effectively have is a collection of sets and want to determine if any of them have non-zero intersections. This is the exact question one asks when trying to find all the ancestors of a node in a nested set.
We can prove that for every overlap, at least one time window will have a start time that falls within all other overlapping time windows. Using this tidbit, we don't need to actually construct artificial timeslots in the day. Simply take a start time and see if it intersects any of the other time windows and then just count up the number of intersections.
So what's the query?
/*SELECT*/
SELECT DISTINCT
MAX(overlapping_windows.start_time) AS overlap_start_time,
MIN(overlapping_windows.end_time) AS overlap_end_time ,
(COUNT(overlapping_windows.id) - 1) AS num_overlaps
FROM user_times AS windows
INNER JOIN user_times AS overlapping_windows
ON windows.start_time BETWEEN overlapping_windows.start_time AND overlapping_windows.end_time
GROUP BY windows.id
ORDER BY num_overlaps DESC;
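A Python sketch of the same start-point idea, run on the sample rows from the question (the ids 1-8 are assumed, since the question's table has no id column): for each window, collect every window that contains its start, take the latest start and earliest end as the shared range, and count the others.

```python
from datetime import datetime as dt

# (id, user, start, end) rows from the question's table; the ids are assumed.
windows = [
    (1, "A", dt(2012, 4, 3, 4, 0), dt(2012, 4, 3, 10, 0)),
    (2, "A", dt(2012, 4, 3, 16, 0), dt(2012, 4, 3, 20, 0)),
    (3, "B", dt(2012, 4, 3, 8, 30), dt(2012, 4, 3, 14, 0)),
    (4, "B", dt(2012, 4, 6, 21, 30), dt(2012, 4, 6, 23, 0)),
    (5, "C", dt(2012, 4, 3, 12, 0), dt(2012, 4, 3, 13, 0)),
    (6, "D", dt(2012, 4, 1, 1, 0, 1), dt(2012, 4, 5, 12, 0, 59)),
    (7, "E", dt(2012, 4, 3, 8, 30), dt(2012, 4, 3, 11, 0)),
    (8, "E", dt(2012, 4, 3, 21, 0), dt(2012, 4, 3, 23, 0)),
]


def overlaps(windows):
    results = set()  # the set plays the role of SELECT DISTINCT
    for _, _, start, _ in windows:
        # Self-join: all windows whose range contains this start (BETWEEN is inclusive).
        hits = [(s, e) for _, _, s, e in windows if s <= start <= e]
        overlap_start = max(s for s, _ in hits)
        overlap_end = min(e for _, e in hits)
        results.add((overlap_start, overlap_end, len(hits) - 1))
    # ORDER BY num_overlaps DESC (start time as a tie-breaker to keep the order stable).
    return sorted(results, key=lambda r: (-r[2], r[0]))


result = overlaps(windows)
```

In this sample data the top row is 2012-04-03 08:30 - 10:00 with 3 other overlapping windows (users A, B, D and E); the first three rows answer the "three most frequent overlaps" part of the question.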
Depending on your table size and how often you plan on running this query, it might be worthwhile to add a spatial index (see below).
UPDATE
If you're running this query often, you'll need to use a spatial index. Because of the range-based traversal (i.e. does start_time fall within the start/end range), a BTREE index will not do anything for you. IT HAS TO BE SPATIAL.
ALTER TABLE user_times ADD COLUMN time_window GEOMETRY;
UPDATE user_times SET time_window = ST_GeomFromText(CONCAT('LineString(-1 ', UNIX_TIMESTAMP(start_time), ', 1 ', UNIX_TIMESTAMP(end_time), ')'));
ALTER TABLE user_times MODIFY time_window GEOMETRY NOT NULL;
CREATE SPATIAL INDEX time_window ON user_times (time_window);
Then you can update the ON clause in the above query to read
ON MBRWithin( Point(0, UNIX_TIMESTAMP(windows.start_time)), overlapping_windows.time_window )
This will get you an indexed traversal for the query. Again, only do this if you're planning on running the query often.
Credit for the spatial index to Quassnoi's blog.
Something like this should get you started -
SELECT slots.time_slot, COUNT(DISTINCT user_bookings.user_id) AS num_users, GROUP_CONCAT(DISTINCT user_bookings.user_id ORDER BY user_bookings.user_id) AS user_list
FROM (
SELECT CURRENT_DATE + INTERVAL ((id-1)*30) MINUTE AS time_slot
FROM dummy
WHERE id BETWEEN 1 AND 48
) AS slots
LEFT JOIN user_bookings
ON slots.time_slot BETWEEN `user_bookings`.`start` AND `user_bookings`.`end`
GROUP BY slots.time_slot
ORDER BY num_users DESC
The idea is to create a derived table that consists of time slots for the day. In this example I have used dummy (which can be any table with an AUTO_INCREMENT id that is contiguous for the required set) to create a list of time slots by adding 30 minutes incrementally. The result of this is then joined to bookings to count the number of bookings for each time slot.
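A Python sketch of the slot approach for one day, using the question's sample rows: generate 48 half-hour slot starts, count the users whose booking contains each slot start (BETWEEN is inclusive at both ends), and sort by that count. The function name and the choice of 2012-04-03 as the day are assumptions for illustration.

```python
from datetime import datetime, timedelta

# (user, start, end) bookings: the sample rows from the question.
bookings = [
    ("A", datetime(2012, 4, 3, 4, 0), datetime(2012, 4, 3, 10, 0)),
    ("A", datetime(2012, 4, 3, 16, 0), datetime(2012, 4, 3, 20, 0)),
    ("B", datetime(2012, 4, 3, 8, 30), datetime(2012, 4, 3, 14, 0)),
    ("B", datetime(2012, 4, 6, 21, 30), datetime(2012, 4, 6, 23, 0)),
    ("C", datetime(2012, 4, 3, 12, 0), datetime(2012, 4, 3, 13, 0)),
    ("D", datetime(2012, 4, 1, 1, 0, 1), datetime(2012, 4, 5, 12, 0, 59)),
    ("E", datetime(2012, 4, 3, 8, 30), datetime(2012, 4, 3, 11, 0)),
    ("E", datetime(2012, 4, 3, 21, 0), datetime(2012, 4, 3, 23, 0)),
]


def users_per_slot(bookings, day, slot_minutes=30):
    # Derived table of slot start times, like CURRENT_DATE + INTERVAL ((id-1)*30) MINUTE.
    slots = [day + timedelta(minutes=slot_minutes * k) for k in range(24 * 60 // slot_minutes)]
    result = []
    for slot in slots:
        # LEFT JOIN ... ON slot BETWEEN start AND end, then count the distinct users.
        users = sorted({user for user, start, end in bookings if start <= slot <= end})
        result.append((slot, len(users), ",".join(users)))
    # ORDER BY num_users DESC; Python's sort is stable, so ties keep slot order.
    return sorted(result, key=lambda r: r[1], reverse=True)


result = users_per_slot(bookings, datetime(2012, 4, 3))
```

In this sample the busiest slots are 08:30 through 10:00, each with users A, B, D and E.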
UPDATE: For the entire date/time range, you could use a query like this to get the other data required:
SELECT MIN(`start`) AS `min_start`, MAX(`end`) AS `max_end`, DATEDIFF(MAX(`end`), MIN(`start`)) + 1 AS `num_days`
FROM user_bookings
These values can then be substituted into the original query or the two can be combined -
SELECT slots.time_slot, COUNT(DISTINCT user_bookings.user_id) AS num_users, GROUP_CONCAT(DISTINCT user_bookings.user_id ORDER BY user_bookings.user_id) AS user_list
FROM (
SELECT DATE(tmp.min_start) + INTERVAL ((id-1)*30) MINUTE AS time_slot
FROM dummy
INNER JOIN (
SELECT MIN(`start`) AS `min_start`, MAX(`end`) AS `max_end`, DATEDIFF(MAX(`end`), MIN(`start`)) + 1 AS `num_days`
FROM user_bookings
) AS tmp
WHERE dummy.id BETWEEN 1 AND (48 * tmp.num_days)
) AS slots
LEFT JOIN user_bookings
ON slots.time_slot BETWEEN `user_bookings`.`start` AND `user_bookings`.`end`
GROUP BY slots.time_slot
ORDER BY num_users DESC
EDIT: I have added DISTINCT and ORDER BY clauses in the GROUP_CONCAT() in response to your last query.
Please note that you will need a much greater range of ids in the dummy table. I have not tested this query, so it may have syntax errors.
I would not do much of this in SQL; it is much simpler in a programming language, and SQL is not made for something like this.
Of course, it's sensible to break the day down into "timeslots"; this is statistics. But as soon as you start handling dates that cross the 00:00 border, things get icky when you use joins and inner selects, especially with MySQL, which does not handle inner selects well.
Here's a possible SQL query
SELECT COUNT(*) FROM `times`
WHERE
( DATEDIFF(`End`, `Start`) = 0 AND
TIME(`Start`) < TIME('$SLOT_HIGH') AND
TIME(`End`) > TIME('$SLOT_LOW'))
OR
( DATEDIFF(`End`, `Start`) > 0 AND
( TIME(`Start`) < TIME('$SLOT_HIGH') OR
TIME(`End`) > TIME('$SLOT_LOW')))
Here's some pseudo code
granularity = 30*60; // 30 minutes
numslots = 24*60*60 / granularity;
stats = CreateArray(numslots);
for i=0, i < numslots, i++ do
stats[i] = GetCountFromSQL(i*granularity, (i+1)*granularity); // low, high
end
Yes, that makes numslots queries, but with no joins or anything else, so it should be quite fast. Also, you can easily change the resolution.
And another positive thing is, you could "ask yourself", "I have two possible timeslots, and I need the one where more people are here, which one should I use?" and just run the query twice with respective ranges and you are not stuck with predefined time slots.
To only find full overlaps (an entry only counts if it covers the full slot) you have to switch low and high ranges in the query.
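The two tests can be written as small predicates. This Python sketch covers same-day entries only (entries crossing midnight need the DATEDIFF branch of the query); the function names and the 2012-04-03 helper date are hypothetical, and the full-overlap version uses non-strict comparisons, which is an assumption so that an entry exactly matching the slot still counts.

```python
from datetime import datetime


def overlaps_slot(start, end, slot_low, slot_high):
    """Partial overlap, as in the query: TIME(Start) < SLOT_HIGH AND TIME(End) > SLOT_LOW."""
    return start < slot_high and end > slot_low


def covers_slot(start, end, slot_low, slot_high):
    """Full overlap: low and high swapped (non-strict, so an exact match still counts)."""
    return start <= slot_low and end >= slot_high


def at(hh, mm):
    return datetime(2012, 4, 3, hh, mm)  # all sample times fall on 2012-04-03
```

For example, user E's 08:30-11:00 entry both overlaps and covers the 10:30-11:00 slot, while user C's 12:00-13:00 entry only partially overlaps 12:30-13:30.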
You might have noticed that I do not add the full days in between for entries that span multiple days; however, adding a whole day would just increase every slot by one, which makes it fairly useless.
You could, however, add them by selecting SUM(GREATEST(DATEDIFF(`End`, `Start`) - 1, 0)) (the number of full days strictly between start and end) and adding the returned value to every slot.
The table seems pretty simple. I would keep your SQL query simple too:
SELECT * FROM tablename
Then, once you have the info in your PHP variables, do the processing in PHP using loops and comparisons.
In simplest form:
for ($x = 0, $numrows = mysql_num_rows($query); $x < $numrows; $x++) {
    /*Grab a row*/
    $row = mysql_fetch_assoc($query);
    /*store userID, START, END*/
    $userID = $row['userID'];
    $start = $row['START'];
    $end = $row['END'];
    /*Have an array for each user in which you store start and end times*/
    if (!strcmp($userID, "A"))
    {
        /*Store info in array_a*/
    }
    else if (!strcmp($userID, "B"))
    {
        /*etc......*/
    }
}
/*Now you have an array for each user with their start/stop times*/
/*Do your loops and comparisons to find common time slots. */
/*Also, use strtotime() to switch date/time entries into comparable values*/
Of course this is in very basic form. You'll probably want to do one loop through the array to first get all of the userIDs before you compare them in the loop shown above.