Creating hour groups for time series data MySQL

I have a MySQL database with data recorded every 15 minutes. For simplicity, let's assume there are 2 fields:
DATETIME Created
Double Value
I would like to draw a chart that needs the opening, minimum, maximum, and closing values for each hour. To do this I need to return results from my MySQL query to PHP to build a JSON response. I would like to compute this in the MySQL query so that the response is cached.
Here is an example of the problem, given 9 data points trying to get 2 hour groups:
Created Value
2014-03-25 12:15:00 413.17011
2014-03-25 12:00:00 414
2014-03-25 11:45:00 415
2014-03-25 11:30:00 415
2014-03-25 11:15:00 415.5
2014-03-25 11:00:00 415.5
2014-03-25 10:45:00 416
2014-03-25 10:30:00 416
2014-03-25 10:15:00 415.99
I would need:
Hour 1 (11:15:00 to 12:15:00)
Open: 415.5
Close: 413.17011
High: 415.5
Low: 413.17011
Hour 2 (10:15:00 to 11:15:00)
Open: 415.99
Close: 415.5
High: 416
Low: 415.5
Of course for the full 24 hours this would need repeating, this is just an example.
Any help is really appreciated!
Here is the current MySQL dump for the example (Using MySQL version 2.6.4-pl3):
--
-- Table structure for table `exampleTable`
--
CREATE TABLE `exampleTable` (
`created` datetime NOT NULL,
`value` double NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
--
-- Dumping data for table `exampleTable`
--
INSERT INTO `exampleTable` VALUES ('2014-03-25 12:15:00', 413.17011);
INSERT INTO `exampleTable` VALUES ('2014-03-25 12:00:00', 414);
INSERT INTO `exampleTable` VALUES ('2014-03-25 11:45:00', 415);
INSERT INTO `exampleTable` VALUES ('2014-03-25 11:30:00', 415);
INSERT INTO `exampleTable` VALUES ('2014-03-25 11:15:00', 415.5);
INSERT INTO `exampleTable` VALUES ('2014-03-25 11:00:00', 415.5);
INSERT INTO `exampleTable` VALUES ('2014-03-25 10:45:00', 416);
INSERT INTO `exampleTable` VALUES ('2014-03-25 10:30:00', 416);
INSERT INTO `exampleTable` VALUES ('2014-03-25 10:15:00', 415.99);

Get it to work
You might try
SELECT
DATE(created) AS day,
HOUR(created) AS hour,
(
SELECT Value FROM `table` AS b
WHERE DATE(a.created) = DATE(b.created)
AND HOUR(a.created) = HOUR(b.created)
ORDER BY created ASC LIMIT 1
) AS Open,
(
SELECT Value FROM `table` AS b
WHERE DATE(a.created) = DATE(b.created)
AND HOUR(a.created) = HOUR(b.created)
ORDER BY created DESC LIMIT 1
) AS Close,
MIN(value) AS Low,
MAX(value) AS High
FROM `table` AS a
GROUP BY DATE(created), HOUR(created)
This groups all your rows by date and hour and computes MIN and MAX as Low and High, respectively. To find the first and last rows for Open and Close, the easiest approach in SQL is a subselect: it selects all rows that share the date and hour of the current row, sorts them ascending or descending, and takes the first row.
Please note that this groups only by clock hour. Instead of
Hour 1 (11:15:00 to 12:15:00)
Hour 2 (10:15:00 to 11:15:00)
this groups like
Hour 1 (11:00:00 to 11:59:59)
Hour 2 (10:00:00 to 10:59:59)
If you want to keep the 15-minute offset, you can subtract it from your created timestamp (created - INTERVAL 15 MINUTE) at every occurrence of created in the SQL query above.
I created a working sqlfiddle for you.
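The offset grouping described above can also be sketched outside the database. The following Python is only an illustration, not part of the original answer: the helper name ohlc_by_hour and the in-memory ROWS list are assumptions. It shifts each timestamp back by 15 minutes before truncating to the hour, mirroring the created - INTERVAL 15 MINUTE suggestion.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (created, value).
ROWS = [
    (datetime(2014, 3, 25, 12, 15), 413.17011),
    (datetime(2014, 3, 25, 12, 0), 414.0),
    (datetime(2014, 3, 25, 11, 45), 415.0),
    (datetime(2014, 3, 25, 11, 30), 415.0),
    (datetime(2014, 3, 25, 11, 15), 415.5),
    (datetime(2014, 3, 25, 11, 0), 415.5),
    (datetime(2014, 3, 25, 10, 45), 416.0),
    (datetime(2014, 3, 25, 10, 30), 416.0),
    (datetime(2014, 3, 25, 10, 15), 415.99),
]


def ohlc_by_hour(rows, offset=timedelta(minutes=15)):
    """Group rows into hour buckets that start `offset` past the hour (hypothetical helper)."""
    buckets = defaultdict(list)
    for created, value in sorted(rows):  # oldest first, so v[0] is the open
        shifted = created - offset
        key = shifted.replace(minute=0, second=0, microsecond=0)
        buckets[key].append(value)
    # Report each bucket under its real start time (truncated hour + offset).
    return {
        key + offset: {"open": v[0], "close": v[-1], "high": max(v), "low": min(v)}
        for key, v in sorted(buckets.items())
    }


result = ohlc_by_hour(ROWS)
for start, bar in result.items():
    print(start, bar)
```

With the sample data, the bucket starting at 10:15 covers 10:15 through 11:00. The buckets do not overlap, so the 11:15 row belongs only to the 11:15 bucket, unlike the boundaries written in the question, where 11:15 appears in both hours.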
Performance
Just a hint: if you can, you might want to split date and time into two columns (of types DATE and TIME). That way you do not need to call DATE() on created every time, but can use the new date column instead. You can then also add a combined index on these new columns, which speeds up your query. See this sqlfiddle for an example.

To get your grouping right, you can use
FLOOR(( UNIX_TIMESTAMP(myTable.dateCreated) - 900 ) / 3600)
where 3600 sets the interval at 1 hour and the - 900 sets the offset at 00:15
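As a rough check of this arithmetic, here is a small Python sketch of the same bucket formula. The helper name hour_bucket is hypothetical, and treating the timestamps as UTC is an assumption; MySQL's UNIX_TIMESTAMP() interprets the value in the session time zone, so the absolute bucket numbers may differ on a real server.

```python
from datetime import datetime, timezone


def hour_bucket(created, offset_seconds=900, interval_seconds=3600):
    """Mirror FLOOR((UNIX_TIMESTAMP(created) - 900) / 3600), assuming UTC."""
    ts = int(created.replace(tzinfo=timezone.utc).timestamp())
    return (ts - offset_seconds) // interval_seconds


a = hour_bucket(datetime(2014, 3, 25, 11, 15))
b = hour_bucket(datetime(2014, 3, 25, 12, 0))
c = hour_bucket(datetime(2014, 3, 25, 12, 15))

# 11:15 and 12:00 share a bucket; 12:15 opens the next one.
print(a == b, c - a)
```

Rows with the same bucket number fall into the same 15-past-the-hour window, which is what the GROUP BY expressions in the query rely on.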
Since you need a MIN() or MAX() for each of your four values, you'll need to JOIN the main table to itself, grouped by the min or max (depending on the column).
Finally, each sub-query (joined table) calculates the grouping hour above so you can use it to join them. Here's what I came up with (with slightly different column names):
SELECT openDate, Open, Close, High, Low
FROM (
    SELECT FLOOR((UNIX_TIMESTAMP(myTable.dateCreated) - 900) / 3600) AS theHour,
           myTable.value AS Open,
           myTable.dateCreated AS openDate
    FROM myTable
    JOIN (
        SELECT value, MIN(dateCreated) AS dateCreated
        FROM myTable
        GROUP BY FLOOR((UNIX_TIMESTAMP(dateCreated) - 900) / 3600)
    ) AS aggTable
      ON aggTable.dateCreated = myTable.dateCreated
) AS openTable
LEFT JOIN (
    SELECT FLOOR((UNIX_TIMESTAMP(myTable.dateCreated) - 900) / 3600) AS theHour,
           myTable.value AS Close,
           myTable.dateCreated AS closeDate
    FROM myTable
    JOIN (
        SELECT value, MAX(dateCreated) AS dateCreated
        FROM myTable
        GROUP BY FLOOR((UNIX_TIMESTAMP(dateCreated) - 900) / 3600)
    ) AS aggTable
      ON aggTable.dateCreated = myTable.dateCreated
) AS closeTable
  ON openTable.theHour = closeTable.theHour
LEFT JOIN (
    SELECT FLOOR((UNIX_TIMESTAMP(myTable.dateCreated) - 900) / 3600) AS theHour,
           MAX(value) AS High
    FROM myTable
    GROUP BY theHour
) AS highTable
  ON closeTable.theHour = highTable.theHour
LEFT JOIN (
    SELECT FLOOR((UNIX_TIMESTAMP(myTable.dateCreated) - 900) / 3600) AS theHour,
           MIN(value) AS Low
    FROM myTable
    GROUP BY theHour
) AS lowTable
  ON highTable.theHour = lowTable.theHour

Related

MySQL query to check the common time interval and display the interval

I'm looking for some help with the following problem.
I need to check whether all selected entries share a common time interval and, if so, what that interval is.
To visualize the problem:
id openingTime closingTime
1 09:00 18:00
2 11:00 15:00
3 12:00 20:00
4 21:00 23:00
Desired output is to get either an empty result or one result with the overlapping interval.
Examples:
selected id openingTime closingTime
1,2 => 11:00 15:00
1,2,3 => 12:00 15:00
1,3 => 12:00 18:00
1,2,3,4 => empty empty
Having IDs with overlapping intervals, the SQL command is easy:
SELECT MAX(openingTime), MIN(closingTime) FROM table WHERE id IN (ids)
But this SQL query doesn't deal with the cases when one or more entries are not sharing the same interval.
Here is some sample data and DB fiddle to try it out:
CREATE TABLE `mytable` (
`id` int(11) NOT NULL,
`openingtime` time NOT NULL,
`closingtime` time NOT NULL
);
INSERT INTO `mytable` (`id`, `openingtime`, `closingtime`) VALUES
(1, '09:00:00', '18:00:00'),
(2, '11:00:00', '15:00:00'),
(3, '12:00:00', '20:00:00'),
(4, '21:00:00', '23:00:00');
Thank you for your help.
D.

I am thinking exists and aggregation:
select max(openingtime), min(closingtime)
from (
select t.*,
exists (
select 1
from mytable t1
where t1.openingtime > t.closingtime or t1.closingtime < t.openingtime
) flag
from mytable t
) t
having max(flag) = 0
The subquery checks if any other row in the table does not overlap with the current row. Then the outer query aggregates, and uses having to filter out the whole result if any row was flagged.
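The underlying rule (latest opening time, earliest closing time, and an empty result when they cross) can be sketched in a few lines of Python. This is only an illustration of the logic, not the answer's query: OPENING_HOURS and common_interval are hypothetical names, and intervals that merely touch are treated as overlapping, matching the strict > and < comparisons above.

```python
from datetime import time

# Sample data from the question: id -> (openingtime, closingtime).
OPENING_HOURS = {
    1: (time(9, 0), time(18, 0)),
    2: (time(11, 0), time(15, 0)),
    3: (time(12, 0), time(20, 0)),
    4: (time(21, 0), time(23, 0)),
}


def common_interval(ids, hours=OPENING_HOURS):
    """Return the interval shared by all selected ids, or None if there is none."""
    selected = [hours[i] for i in ids]
    start = max(opening for opening, _ in selected)  # latest opening
    end = min(closing for _, closing in selected)    # earliest closing
    return (start, end) if start <= end else None


print(common_interval([1, 2]))
print(common_interval([1, 2, 3, 4]))
```

The first call yields the 11:00 to 15:00 interval from the question's examples; the second yields None because id 4 opens after the others have closed.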

A common interval is going to start at an opening time and end at the next closing time. So, you can test each starting time, counting the number of overlaps.
select o.openingtime, min(t.closingtime)
from (select distinct openingtime from t) o join
t
on o.openingtime >= t.openingtime and o.openingtime <= t.closingtime
group by o.openingtime
having count(*) = (select count(*) from t);
This returns no rows if there are no overlaps. It returns all overlapping periods if there is more than one (which I don't think is possible with one row per id).

check dates on database

On my database table I have 2 columns, start_date and end_date.
Sample data would be:
-------------------------------
start_date | end_date
-------------------------------
2017-11-01 2017-11-02
2017-11-03 2017-11-07
2017-11-20 2017-11-28
2017-11-13 2017-12-02
-------------------------------
I need to find if there are 5 consecutive days that are not yet used, which in this case, there is:
(2017-11-08 to 2017-11-13)
I'm using PHP and MySQL.
Thanks in advance!

You'd need to check for edge cases depending on your actual data and if there were no overlap dates, but this is a good start for the provided data.
Assuming table and data as defined as below:
CREATE TABLE
`appointments`
(
`appointment_id` INT PRIMARY KEY AUTO_INCREMENT,
`start_date` DATE,
`end_date` DATE
);
INSERT INTO
`appointments`
(`start_date`, `end_date`)
VALUES
('2017-11-01', '2017-11-02'),
('2017-11-03', '2017-11-07'),
('2017-11-20', '2017-11-28'),
('2017-11-13', '2017-12-02');
Order the rows and, for each row, take the lag from the end date of the row before it, then keep any gaps of 5 or more days. SQL Server has LAG functions, but here's a way of doing the same with a user variable. Once you have a table of all rows and their corresponding gaps, you take the start date of the row that follows the gap and build the gap period from the number of days between. Since TIMESTAMPDIFF is inclusive, you need to subtract a day.
SET @end_date = NULL;
SELECT
DATE_ADD(`start_date`, INTERVAL -(`gap_from_last`-1) DAY) AS `start_date`,
`start_date` AS `end_date`
FROM
(
SELECT
`appointment_id`,
CASE
WHEN @end_date IS NULL THEN NULL
ELSE TIMESTAMPDIFF(DAY, @end_date, `start_date`)
END AS `gap_from_last`,
`start_date`,
@end_date := `end_date` AS `end_date` -- Save the lag date from the row before
FROM
`appointments`
ORDER BY
`start_date`,
`end_date`
) AS `date_gap` -- Build table that has the dates and the number of days between
WHERE
`gap_from_last` > 5;
Provides:
start_date | end_date
------------------------
2017-11-08 | 2017-11-13
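For comparison, the same gap search can be sketched in Python. This is an illustration under stated assumptions rather than a port of the query: free_gaps is a hypothetical helper, both start_date and end_date are treated as occupied days, and a running maximum of the end dates is kept so that overlapping rows (one of the edge cases mentioned above) do not produce false gaps.

```python
from datetime import date, timedelta

# Sample rows from the question: (start_date, end_date), both inclusive.
APPOINTMENTS = [
    (date(2017, 11, 1), date(2017, 11, 2)),
    (date(2017, 11, 3), date(2017, 11, 7)),
    (date(2017, 11, 20), date(2017, 11, 28)),
    (date(2017, 11, 13), date(2017, 12, 2)),
]


def free_gaps(appointments, min_days=5):
    """Return (first_free_day, last_free_day) for every gap of at least min_days."""
    gaps = []
    latest_end = None
    for start, end in sorted(appointments):
        if latest_end is not None:
            # Days strictly between the latest end so far and this start.
            free_days = (start - latest_end).days - 1
            if free_days >= min_days:
                gaps.append((latest_end + timedelta(days=1), start - timedelta(days=1)))
        latest_end = end if latest_end is None else max(latest_end, end)
    return gaps


print(free_gaps(APPOINTMENTS))
```

Under these assumptions the only gap is 2017-11-08 through 2017-11-12, which is exactly 5 free days. The query above reports 2017-11-13 as the end because it uses the start date of the next row.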
Edit: Oops! Forgot the SQLFiddle (http://sqlfiddle.com/#!9/09cfce/16)

SELECT x.end_date + INTERVAL 1 DAY unused_start
, MIN(y.start_date) unused_end
FROM appointments x
JOIN appointments y
ON y.start_date >= x.end_date
GROUP
BY x.start_date
HAVING DATEDIFF(MIN(y.start_date),unused_start) >= 5;

SQL for grouping by timeframe with empty frames

My goal is it to create a SQL-Query that counts all items in a certain time frame (e.g. 5min)
That's my code so far:
SELECT FROM_UNIXTIME(FLOOR(timestamp_stop/(5*60))*(5*60), '%h:%i') AS timekey, timestamp_stop, count(item) AS performance
FROM task
WHERE done = 1
GROUP BY timekey
ORDER BY timestamp_stop ASC
That works great, but doesn't include time frames in which there aren't any records in the database.
I would like to also get these 0-count-ones, up to the current time.
Currently I have no simple/elegant solution in my mind. Any ideas?
Some little post processing in php would also be possible.

As Gordon mentioned, you probably want a secondary table as a basis for ALL 5-minute intervals. I have done similar with a query to self-build using MySQL variables.
select
YourTable.WhateverFields
from
( select
@startTime RangeStart,
@startTime := date_add( @startTime, interval 5 MINUTE ) RangeEnd
from
( select @startTime := '2014-10-20' ) sqlvars,
AnyTableThatHasAsManyDaysYouExpectToReport
limit
12 * numberOfHoursYouNeed * numberOfDaysYouNeed ) DynamicTimeRange
LEFT JOIN YourTable
on YourTable.DateTimeField >= DynamicTimeRange.RangeStart
AND YourTable.DateTimeField < DynamicTimeRange.RangeEnd
So, in this example, the innermost query declares a variable "startTime" set to Oct 20, 2014, which defaults to midnight (00:00:00). Then the query one level out creates a result set of two columns, RangeStart and RangeEnd, which might look something like...
RangeStart RangeEnd
2014-10-20 00:00 2014-10-20 00:05
2014-10-20 00:05 2014-10-20 00:10
2014-10-20 00:10 2014-10-20 00:15
2014-10-20 00:15 2014-10-20 00:20
2014-10-20 00:20 2014-10-20 00:25
The table reference "AnyTableThatHasAsManyDaysYouExpectToReport" is just that... any table in your database that has at least as many records as you need to generate your 5-minute intervals for however many hours and days. For 1 day's worth, that is 12 records per hour (60 minutes / 5 minutes) * 24 hours = 288 records. If you wanted a week, multiply that by 7; my sample just has placeholders to help clarify the intent...
But with the LEFT JOIN, you get all the intervals...
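Since the question allows a little post-processing outside SQL, the same "generate every frame, then attach counts" idea can be sketched in application code. The Python below is only an illustration: counts_per_frame is a hypothetical helper, and it assumes the timestamps are Unix seconds, like timestamp_stop in the question.

```python
from collections import Counter


def counts_per_frame(timestamps, start, end, frame=300):
    """Count timestamps per `frame`-second bucket, including empty buckets.

    `start` and `end` are Unix timestamps; `end` is exclusive.
    """
    # Floor every timestamp to the start of its frame.
    counts = Counter(ts // frame * frame for ts in timestamps)
    first = start // frame * frame
    # Walk every frame in the range so that empty ones show up with 0.
    return [(t, counts.get(t, 0)) for t in range(first, end, frame)]


# Three finished tasks, one empty frame in between.
print(counts_per_frame([10, 20, 610], start=0, end=900))
```

Passing the current time as end gives the zero-count frames up to now, which plays the role of the LEFT JOIN against the generated ranges.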

If such time frames exist, but the where clause filters out their records, you can do:
SELECT FROM_UNIXTIME(FLOOR(timestamp_stop/(5*60))*(5*60), '%h:%i') AS timekey,
timestamp_stop,
sum(item is not null and done = 1) AS performance
FROM task
GROUP BY timekey
ORDER BY timestamp_stop ASC;
If you still have gaps, then you need to generate a table (or subquery) containing the list of the time frames that you want and use left join.
EDIT:
A subquery is not pleasant. You have to list all the time values. Something like:
SELECT q.timekey, t.timestamp_stop, coalesce(t.performance, 0) as performance
FROM (SELECT '00:00' as timekey UNION ALL
SELECT '00:05' UNION ALL
. . .
) q LEFT JOIN
(SELECT FROM_UNIXTIME(FLOOR(timestamp_stop/(5*60))*(5*60), '%h:%i') AS timekey,
timestamp_stop,
COUNT(item) AS performance
FROM task
WHERE done = 1
GROUP BY timekey
) t
ON t.timekey = q.timekey
ORDER BY timestamp_stop ASC;

postgresql max(count(*)) - php

I have a problem in postgresql.
I have one cohort (a gathering of people) and I would like to count the persons in this cohort.
Begin date : "2014-09-01", End date : "2014-11-30".
I have 5 persons between 09/01 and 09/22
I have 5 persons between 09/20 and 09/25
I have 5 persons between 09/26 and 10/05
I have 5 persons between 10/01 and 11/30
I want the maximum number of persons present for each month between the begin date and the end date, in SQL (or PHP). Expected max person count:
September(09) => 10
October(10) => 10
November(11) => 5

Find the maximum of simultaneously present persons on a single day for every month in a given period.
I suggest generate_series() to produce the series of days in your period. Then aggregate twice:
First, to get a count for each day. A single day can be handled with a plain BETWEEN. Your ranges are obviously meant to have inclusive bounds.
Second, to get the maximum per month.
SELECT date_trunc('month', day)::date AS month, max(ct) AS max_ct
FROM (
SELECT g.day, count(*) AS ct
FROM cohorte
,generate_series('2014-09-01'::date -- first of Sept.
,'2014-11-30'::date -- last of Nov.
,'1 day'::interval) g(day)
WHERE g.day BETWEEN t_begin AND t_end
GROUP BY 1
) sub
GROUP BY 1
ORDER BY 1;
Returns:
month | max_ct
-----------+--------
2014-09-01 | 10
2014-10-01 | 10
2014-11-01 | 5
Use to_char() to prettify the month output.
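The two aggregation steps (count per day, then maximum per month) can also be sketched procedurally. The Python below is an illustration only, with hypothetical names (STAYS, max_per_month) and the question's four date ranges entered by hand, five persons each, with inclusive bounds.

```python
from datetime import date, timedelta

# One (t_begin, t_end) pair per person, inclusive bounds, from the question.
STAYS = (
    [(date(2014, 9, 1), date(2014, 9, 22))] * 5
    + [(date(2014, 9, 20), date(2014, 9, 25))] * 5
    + [(date(2014, 9, 26), date(2014, 10, 5))] * 5
    + [(date(2014, 10, 1), date(2014, 11, 30))] * 5
)


def max_per_month(stays, first, last):
    """Return {(year, month): max persons present on any single day}."""
    result = {}
    day = first
    while day <= last:
        # Step 1: count the persons present on this day.
        present = sum(1 for begin, end in stays if begin <= day <= end)
        # Step 2: keep the highest daily count seen in this month.
        key = (day.year, day.month)
        result[key] = max(result.get(key, 0), present)
        day += timedelta(days=1)
    return result


print(max_per_month(STAYS, date(2014, 9, 1), date(2014, 11, 30)))
```

The day loop plays the role of generate_series(), and the dictionary update corresponds to the outer max(ct) grouped by month.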
SQL Fiddle .. is down ATM. Here is my test case (that you should have provided):
CREATE TEMP TABLE cohorte (
cohorte_id serial PRIMARY KEY
,person_id int NOT NULL
,t_begin date NOT NULL -- inclusive
,t_end date NOT NULL -- inclusive
);
INSERT INTO cohorte(person_id, t_begin, t_end)
SELECT g, '2014-09-01'::date, '2014-09-22'::date
FROM generate_series (1,5) g
UNION ALL
SELECT g+5, '2014-09-20', '2014-09-25'
FROM generate_series (1,5) g
UNION ALL
SELECT g+10, '2014-09-26', '2014-10-05'
FROM generate_series (1,5) g
UNION ALL
SELECT g+15, '2014-10-01', '2014-11-30'
FROM generate_series (1,5) g;
For more complex checks I'd suggest the OVERLAPS operator:
Find overlapping date ranges in PostgreSQL
For more complex scenarios I'd also consider range types:
Preventing adjacent/overlapping entries with EXCLUDE in PostgreSQL

Can't you use a window function?
I'd try something like this (I haven't tested this code; it just shows my idea)
SELECT max(count) FROM (
SELECT count(*) OVER (PARTITION BY ???) as count
FROM contract
WHERE daterange(dateStart, dateEnd, '[]') && daterange('2014-09-01', '2014-10-01', '[)')
) as max
Here, my remaining problem is that I can't find a way to partition by each day of the interval. Maybe this is the wrong approach, but I would be interested in a solution based on window functions.
Edit: with this query, you get the maximum number of people present simultaneously, but over the whole period, not only for a given month
with presence as (
SELECT id, generate_series(begin_date, end_date, '1 day'::interval) AS date
FROM test
),
presents as (
SELECT count(*) OVER (PARTITION BY date) AS count
FROM presence
)
SELECT max(count) from presents;
Here we come, I think
Imagine your person table has 3 columns :
id
entrance_date
leaving_date
the query would look like
WITH presents as (
SELECT id,
daterange(entrance_date, leaving_date, '[]') * daterange('2014-09-01', '2014-11-30', '[]') as range
FROM person
WHERE daterange(entrance_date, leaving_date, '[]') && daterange('2014-09-01', '2014-11-30', '[]')
),
present_per_day as (
SELECT id,
generate_series(lower(range), upper(range), '1 day'::interval) AS date
FROM presents
),
count_per_day as (
SELECT count(*) OVER (PARTITION BY date) AS count,
date
FROM present_per_day
)
SELECT max(count) OVER (PARTITION BY date_part('year', date), date_part('month', date)) as max,
date_part('year', date),
date_part('month', date)
FROM count_per_day;
(I have to leave, I hope I'll have time to test it later)
In fact, @erwin's solution is much easier and more efficient than this one.

Timetable Stats PHP Page

I have a field called dPostTime, which has the date and time stored of an entry.
If I wanted to create a page that showed me the number of entries per hour, would I need to have 24 queries, one for each hour, or what's the best way to do this.
I'd like the chart to appear as follows:
5-6pm - 24 posts
6-7pm - 56 posts
7-8pm - 34 posts
8-9pm - 35 posts
etc......

MySQL (before version 8.0, which added recursive CTEs) doesn't have recursive functionality, so you're left with using the NUMBERS table trick -
Create a table that only holds incrementing numbers - easy to do using an auto_increment:
DROP TABLE IF EXISTS `example`.`numbers`;
CREATE TABLE `example`.`numbers` (
`id` int(10) unsigned NOT NULL auto_increment,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Populate the table using:
INSERT INTO NUMBERS
(id)
VALUES
(NULL)
...for as many values as you need.
Use DATE_ADD to construct a list of dates, increasing the hours based on the NUMBERS.id value. Replace "2010-01-01" and "2010-01-02" with your respective start and end dates (but use the same format, YYYY-MM-DD HH:MM:SS) -
SELECT x.dt
FROM (SELECT DATE_FORMAT(DATE_ADD('2010-01-01', INTERVAL (n.id - 1) HOUR), '%H:%i') AS dt
FROM numbers n
WHERE DATE_ADD('2010-01-01', INTERVAL (n.id - 1) HOUR) <= '2010-01-02' ) x
LEFT JOIN onto your table of data based on the datetime portion.
SELECT x.dt,
COALESCE(COUNT(a.dPostTime), 0) AS numPerHour
FROM (SELECT DATE_FORMAT(DATE_ADD('2010-01-01', INTERVAL (n.id - 1) HOUR), '%H') AS dt
FROM numbers n
WHERE DATE_ADD('2010-01-01', INTERVAL (n.id - 1) HOUR) <= '2010-01-02' ) x
LEFT JOIN YOUR_TABLE a ON DATE_FORMAT(a.dPostTime, '%H') = x.dt
GROUP BY x.dt
ORDER BY x.dt
Why Numbers, not Dates?
Simple - dates can be generated based on the number, like in the example I provided. It also means using a single table, vs say one per data type.

SELECT COUNT(*),
HOUR(`dPostTime`) AS `hr`
FROM `table`
GROUP BY `hr`
After that, format hr in PHP as a range label running from hr to hr + 1 (for example, hr = 17 becomes 5-6pm).
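The grouping and labelling step can be sketched as follows. This is an illustration in Python rather than PHP, with a hypothetical helper name (posts_per_hour) and 24-hour labels instead of the am/pm style in the question; it also fills in hours that have no posts, which the plain GROUP BY query leaves out.

```python
from collections import Counter
from datetime import datetime


def posts_per_hour(post_times):
    """Return 24 (label, count) pairs, one per clock hour, including empty hours."""
    counts = Counter(t.hour for t in post_times)
    return [
        (f"{h:02d}:00-{(h + 1) % 24:02d}:00", counts.get(h, 0))
        for h in range(24)
    ]


posts = [
    datetime(2010, 1, 1, 17, 5),
    datetime(2010, 1, 1, 17, 50),
    datetime(2010, 1, 1, 18, 10),
]
table = posts_per_hour(posts)
for label, count in table[17:19]:
    print(label, "-", count, "posts")
```

A single pass over the data is enough, so there is no need for 24 separate queries.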
