Average elapsed time over many records - php

Using PHP to access MySQL.
I've got 10,000+ records, each of which has a Start-time and an End-time.
I'll want to work with these records a lot, so I'd like to understand how I can optimize queries as I write PHP functions to explore the data.
For this first one, I want the average elapsed time over all records, i.e. the average of all (end minus start) values.
[EDIT]
Apologies: the question is: can I write a single query that will return the average of all time differences across my records? Or do I do my averaging in PHP after getting the TIMEDIFFs back (in a loop of 10,000 iterations)?
If I can do it in the mySQL query, how do I construct it? Clearly, it will involve TIMEDIFF() and AVG(), but I'm not sure if I can do it in one query or if I need more.
This is malformed:
SELECT AVG(SELECT TIMEDIFF(startdatetime,enddatetime) from myTable) from myTable
Assume this is my table:
myTable:
ID | startdatetime       | enddatetime
---|---------------------|--------------------
1  | 2014-05-06 12:31:00 | 2014-05-06 12:41:00
2  | 2014-05-06 12:51:00 | 2014-05-06 12:55:00
I want to get back the average: ((41-31)+(55-51))/2 = (10+4)/2 = 7 minutes.
(I imagine I'll have to convert the elapsed time to seconds, then average it, then convert it back to minutes.)

Ah OK. Once I figured out what I was after, the searching around got a lot easier. I've been able to piece together this:
SELECT SEC_TO_TIME(AVG(TIME_TO_SEC(TIMEDIFF(enddatetime,startdatetime)))) from myTable WHERE (enddatetime IS NOT NULL AND startdatetime IS NOT NULL)
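A variation on the same idea, for what it's worth: TIMESTAMPDIFF(SECOND, ...) computes the elapsed seconds directly, which sidesteps TIMEDIFF's intermediate TIME value (TIME values are capped at 838:59:59, so very long spans would be clipped). A sketch of the equivalent query:
SELECT SEC_TO_TIME(AVG(TIMESTAMPDIFF(SECOND, startdatetime, enddatetime))) AS avg_elapsed
FROM myTable
WHERE startdatetime IS NOT NULL AND enddatetime IS NOT NULL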

Related

MySQL SUM time(3) with milliseconds

I have a MySQL column of type time(3) and it stores time values correctly,
but when I want to sum two times, the result comes out in a bad format.
I have two records:
id | time
---|-----------
1 | 00:00:15.490
2 | 00:02:14.900
So I should really get: 00:02:30.390,
but I get 230.390.
Is there any way to get the correct answer from MySQL?
P.S. I am using PHP for functions but don't want to use it for this unless there is no other way.
I need to sum times with MILLISECONDS.
For now I am using the query: SELECT SUM(time) AS total_time FROM times WHERE 1
Provided your table definition is something like this:
create table test (
id integer,
`time` time(3) -- important to specify precision
);
You can do this:
select time(sum(`time`))
from test;
Note: this requires MySQL 5.6+, which introduced fractional-seconds precision.
edit
Actually, time is the wrong function to use, as it doesn't have many smarts.
Use sec_to_time instead, i.e.:
select sec_to_time(sum(`time`))
from test;
time extracts a time value, whereas sec_to_time calculates one -- i.e., time(70) returns NULL because there is no valid time with 70 in the seconds position, whereas sec_to_time will correctly return '00:01:10' for the same input.
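To see the difference described above in isolation (a small illustration of the same example values):
SELECT TIME(70);        -- NULL: 70 is read as the time literal 00:00:70, which is invalid
SELECT SEC_TO_TIME(70); -- '00:01:10': 70 is interpreted as a count of seconds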
edit
Turns out I'm still wrong. Let's try treating the milliseconds separately from the rest of the time:
select sec_to_time(sum(time_to_sec(`time`)) + sum(microsecond(`time`))/1000000)
from test;
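As a quick sanity check against the two sample rows from the question (a sketch; it assumes the test table defined above):
INSERT INTO test VALUES (1, '00:00:15.490'), (2, '00:02:14.900');
SELECT sec_to_time(sum(time_to_sec(`time`)) + sum(microsecond(`time`))/1000000) AS total_time
FROM test; -- expected: 00:02:30.390, per the question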
Wrap your output in the time function. So:
time(sum(`time`))
where the outer time is the TIME() function and `time` is your summed column.

MYSQL Group By date + 3 hours

I have a query that counts the "Xp" difference per day from my database. This all works as it should; however, it groups from midnight to midnight, and what I would like to do is group from 3am to 3am.
Another issue I may have is that my rows may not land at the exact second of 3am, because the job has to run a huge query and retrieve data from another website per user profile. So it should pick up all data after 3am but before maybe 4am or so, to give it enough time to fetch all of the rows.
My current MySQL query is:
SELECT FROM_UNIXTIME(date, '%Y%m%d') AS YYYYMMDD, MAX(xp)-MIN(xp) AS xp_gain
FROM skills
WHERE userID = '$checkID'
AND skill = '$skill'
AND date >= '$date'
GROUP BY YYYYMMDD
ORDER BY date ASC
The best way to handle this is to add (if you can) another column that is just a DATE (not a DATETIME) and have this field roll over from one day to the next at 3am (you can do this by subtracting 3 hours from the current time when doing the INSERT).
This gives you a couple of benefits, especially with a large number of rows:
- It is much faster to query or group by a DATE than a range of DATETIMEs.
- It will always query the rows at the exact second of 3am, regardless of how long the query takes.
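If adding a column is not an option, here is a sketch of an in-query alternative: shift the Unix timestamp back three hours before formatting it, so the day rolls over at 3am inside the GROUP BY itself (this keeps the schema unchanged but gives up the DATE column's speed benefit):
SELECT FROM_UNIXTIME(date - 3*3600, '%Y%m%d') AS YYYYMMDD, MAX(xp)-MIN(xp) AS xp_gain
FROM skills
WHERE userID = '$checkID'
AND skill = '$skill'
AND date >= '$date'
GROUP BY YYYYMMDD
ORDER BY YYYYMMDD ASC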

Getting temperature difference between intervals

My question is more "theoretical" than practical - in other words, I'm not really looking for particular code for how to do something, but more for advice about how to do it. I've been thinking about it for some time but cannot come up with a feasible solution.
So basically, I have a MySQL database that saves weather information from my weather station.
Column one contains the date and time of measurement (a DATETIME field), then there is a whole range of other columns like temp, humidity etc. The one I am interested in now is the temperature. The data is sorted by date and time ascending, meaning the most recent value is always inserted at the end.
Now, what I want to do, using a PHP script, is connect to the db, find temperature changes within a certain interval, and then find the maximum. For example, let's say I choose an interval of 3h. Then I would like to find the time, across all the values, where there was the most significant temperature change within those 3h (or 5h, 1 day, etc.).
The problem is that I don't really know how to do this. If I just get the values from the db, I'm getting them one by one, and I can't think of a way of getting the value that is, say, 3h in the past relative to the current one. With that value it would be easy - just subtract the two and read the date from the datetime field - but how do I get the values that are 3h apart? (It also cannot simply be a fixed number of rows into the past, as the save intervals are not regular and range between 5-10 mins, so 3h in the past could be various numbers of rows.)
Any ideas how this could be done?
Thanks a lot.
Not terribly hard, actually. I'll assume it's a two-column table with time and temp fields, where time is a DATETIME field:
SELECT MAX(temp) FROM records
WHERE time >= "2013-10-14 12:00:00" and time <= "2013-10-14 15:00:00"
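A possible refinement of the same single-window query (a sketch under the same assumed schema): the spread between the extremes in the window is one way to quantify the change within it:
SELECT MAX(temp) - MIN(temp) AS swing
FROM records
WHERE time >= "2013-10-14 12:00:00" and time <= "2013-10-14 15:00:00"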
SELECT t1.*, ABS(t1.temperature - t2.temperature) AS temp_change -- CHANGE is a reserved word in MySQL, hence the alias
FROM tablename t1
JOIN tablename t2
  ON t2.timecolumn <= (t1.timecolumn - INTERVAL 3 HOUR)
LEFT JOIN tablename t3
  ON t3.timecolumn <= (t1.timecolumn - INTERVAL 3 HOUR)
  AND t3.timecolumn > t2.timecolumn -- an eligible candidate newer than t2
WHERE
  t3.some_non_nullable_column IS NULL -- no newer candidate exists, so t2 is the closest
ORDER BY ABS(t1.temperature - t2.temperature) DESC
LIMIT 1;
One table joined twice to itself: t2 is the closest record at least 3h before t1, and the LEFT JOIN on t3 guarantees no eligible record sits between t2 and that 3h offset. With the proper indexes and a limited amount of data (where "limited" is in the eye of the beholder), this can be quite performant. However, if you need a lot of these queries on a big dataset, it is a prime candidate for denormalization, where you create a table that also stores the calculated change compared to the earlier entry.
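A rough sketch of that denormalization (all names here are illustrative, not from the question): compute the 3h change once, when a reading is inserted, and store it beside the reading, so the expensive self-join runs once per row instead of once per query:
CREATE TABLE temp_change_3h (
  timecolumn  DATETIME PRIMARY KEY,
  temperature DECIMAL(5,2),
  change_3h   DECIMAL(5,2)  -- temperature minus the closest reading at least 3h older
);
-- finding the biggest change then becomes a cheap single-table query:
SELECT * FROM temp_change_3h ORDER BY ABS(change_3h) DESC LIMIT 1;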

Query optimization: 1800 queries -> 50

I'm querying a PostgreSQL database which holds an agenda table:
agenda: id (int) | start (timestamp) | end (timestamp) | facname | ...
I want to make a kind of summary of one day in the form of a 'timeline' consisting of a small picture for every 15 minutes interval: on / off according to the availability of the facility.
Now, it is relatively simple to query the database for every 15 minutes, check whether a reservation is present, and change the img source accordingly.
But if you want to make an overview of 10 days and 5 different facilities, you'll end up querying the database
10 (days) * 36 (quarters a day) * 5 (facilities) = 1800 database queries per page load.
This results in a very heavy load.
Is there a way I can reduce the number of queries, and so the load?
To solve this issue, I think we should first find a way, given a timestamp, to determine which quarter of an hour it belongs to. For instance, the time 08:38 belongs to quarter 08:30, 08:51 to 08:45, and so on.
To do that, we can use a function like this:
CREATE FUNCTION date_trunc_quarter(timestamp)
RETURNS TIMESTAMP
LANGUAGE SQL
IMMUTABLE
AS $$
  SELECT gen.quarter
  FROM generate_series(
    date_trunc('hour', $1),
    date_trunc('hour', $1) + interval '1 hour',
    interval '15 min'
  ) AS gen(quarter)
  WHERE gen.quarter <= $1  -- <=, so an exact boundary like 08:30 maps to itself
  ORDER BY gen.quarter DESC
  LIMIT 1
$$;
It uses the generate_series function to generate all the quarters within the same hour as the given timestamp (e.g. 08:00, 08:15, 08:30 and 08:45 for 08:38); to get the containing hour it uses the well-known date_trunc function. Then it keeps only the quarters that are not later than the given timestamp, sorts them, and takes the largest. As there are only ever a handful of values, sorting them is not a big issue.
Now, with that you can easily query like this (tstart standing for the agenda table's start column):
SELECT date_trunc_quarter(tstart) AS quarter, count(*)
FROM agenda
GROUP BY quarter
ORDER BY quarter;
I think it is fast enough, and to make it even faster, you can create an expression index on agenda:
CREATE INDEX idx_agenda_quarter ON agenda ((date_trunc_quarter(tstart)));
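For what it's worth, the same quarter can also be computed with plain arithmetic, with no generate_series and no sort; a sketch that could serve as an alternative function body:
SELECT date_trunc('hour', ts) + floor(extract(minute FROM ts) / 15) * interval '15 min' AS quarter
FROM (VALUES (timestamp '2014-05-06 08:38')) AS v(ts); -- returns 2014-05-06 08:30:00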

Best way to query calendar events?

I'm creating a calendar that displays a timetable of events for a month. Each day has several parameters that determine whether more events can be scheduled for that day (how many staff are available, how many time slots are available, etc.).
My database is set up using three tables:
Regular Schedule - this is used to create an array for each day of the week that outlines how many staff are available, what hours they are available, etc.
Schedule Variations - If there are variations for a date, this overrides the information from the regular schedule array.
Events - Existing events, referenced by the date.
At this stage, the code loops through the days in the month and checks two to three things for each day.
1. Are there any variations in the schedule (public holiday, shorter hours, etc.)?
2. What hours/number of staff are available for this day?
3. (If staff are available) How many events have already been scheduled for this day?
Step 1 and step 3 require a database query - assuming 30 days a month, that's 60 queries per page view.
I'm worried about how this could scale: for a few users I don't imagine it would be much of a problem, but if 20 people try to load the page at the same time, it jumps to 1,200 queries...
Any ideas or suggestions on how to do this more efficiently would be greatly appreciated!
Thanks!
I can't think of a good reason you'd need to limit each query to one day. Surely you can just select all the values between a pair of dates.
Similarly, you could use a join to get the number of events scheduled for a given day.
Then do the loop (for each day) on the array returned by the database query.
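A sketch of what that single query could look like (the events table and its date column are assumed here, since the schema isn't shown): one round trip fetches the per-day counts for the whole month, and the PHP loop then reads from the result set instead of hitting the database once per day:
SELECT DATE(event_date) AS day, COUNT(*) AS events_scheduled
FROM events
WHERE event_date >= '2009-03-01' AND event_date < '2009-04-01'
GROUP BY day
ORDER BY day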
Create a table:
t_month (day INT)
INSERT
INTO t_month
VALUES
(1),
(2),
...
(31)
Then query:
SELECT *
FROM t_month, t_schedule
WHERE schedule_date = '2009-03-01' + INTERVAL (t_month.day - 1) DAY -- day 1 maps to March 1
AND schedule_date < '2009-03-01' + INTERVAL 1 MONTH
AND ...
Instead of 30 queries you get just one with a JOIN.
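One caveat with the comma join above: days that have no t_schedule row disappear from the result. If the calendar needs every day of the month to appear, a LEFT JOIN variant of the same idea keeps them (a sketch):
SELECT '2009-03-01' + INTERVAL (t_month.day - 1) DAY AS cal_day,
       t_schedule.*
FROM t_month
LEFT JOIN t_schedule
  ON schedule_date = '2009-03-01' + INTERVAL (t_month.day - 1) DAY
WHERE t_month.day <= DAY(LAST_DAY('2009-03-01'))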
Other RDBMSs allow you to generate rowsets on the fly, but MySQL doesn't.
You can, though, replace t_month with this ugly construct:
SELECT 1 AS month_day
UNION ALL
SELECT 2
UNION ALL
...
SELECT 31
I faced the same sort of issue with http://rosterus.com and we just load most of the data into arrays at the top of the page, and then query the array for the relevant data. Pages loaded 10x faster after that.
So run one or two wide queries that gather all the data you need, choose appropriate keys, and store each result set in an array. Then access the array instead of the database. PHP is very flexible with array indexing; you can use all sorts of things as keys... or several indexes.
