PHP: report table with date gaps - php

I have a table in the DB that contains daily summaries. Some days may have no values.
I need to display a table of results where each column is a day from a user-selected range.
I tried working with timestamps ((end_date - start_date) / 86400 gives the number of days in the report, then DATEDIFF(row_date, 'user_entered_start_date') gives the index of each row in an array), but now I have a whole bunch of workarounds for summer time (DST) :( Any examples or ideas on how to do this correctly?
P.S. I need to do this on PHP side, because DB is highly loaded.

Try the DateTime object:
$reportdate=date_create($user_startdate);
$interval=new DateInterval('P1D');//1 day interval
//NOTE: escape or bind $user_startdate and $user_enddate before putting them in SQL
$query="SELECT s.start_date, s.end_date, s.info
    FROM summary s
    WHERE s.start_date>='$user_startdate'
    AND s.end_date<='$user_enddate'
    ORDER BY s.start_date";//the loop below needs rows in date order
$r=mysqli_query($db,$query);
while($row=$r->fetch_assoc()){
    $rowdate=date_create($row['start_date']);
    while($reportdate < $rowdate) {//won't work if $rowdate is a timestamp!
        //output $reportdate and blank row
        $reportdate->add($interval); //increment date
    }
    //output $rowdate and $row[info]
    $reportdate->add($interval); //increment date
}
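If the rows are first collected into an array keyed by date, the same gap filling can be sketched with DatePeriod, which steps in calendar days rather than multiples of 86400 seconds. This is only a sketch: the function name fillDateGaps, the 'Y-m-d' array keys and the use of UTC are assumptions, not part of the original answer.

```php
<?php
// Sketch: one report column per calendar day, with null for days that have no row.
// $rowsByDate is a hypothetical array keyed by 'Y-m-d' strings.
function fillDateGaps(array $rowsByDate, string $startDate, string $endDate): array
{
    $utc = new DateTimeZone('UTC');
    $start = new DateTimeImmutable($startDate, $utc);
    // DatePeriod excludes its end date, so step one day past the requested end.
    $end = (new DateTimeImmutable($endDate, $utc))->modify('+1 day');

    $report = [];
    foreach (new DatePeriod($start, new DateInterval('P1D'), $end) as $day) {
        $key = $day->format('Y-m-d');
        $report[$key] = $rowsByDate[$key] ?? null; // null marks a gap
    }
    return $report;
}

// Usage with made-up data: two days have values, two are gaps.
$rows = ['2012-03-24' => 'a', '2012-03-26' => 'b'];
$report = fillDateGaps($rows, '2012-03-24', '2012-03-27');
```

Because the loop never divides by 86400, a day that is 23 or 25 hours long in the local time zone does not shift the columns.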
ETA another option:
Based on your comments, it may be easier to dynamically generate the missing dates. For this you'll need an integer table, the number of dates that should appear in your report output, a start date and a date increment.
In your db create a table called numbers and insert the numbers 0 through 9:
CREATE TABLE numbers (
  num int(10) unsigned NOT NULL,
  PRIMARY KEY (num)
);
INSERT INTO numbers (num) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
The numbers table can be used for making sequences of integers. For instance, to get a sequence from 1 to 20:
SELECT i FROM (
SELECT 10*n1.num + n2.num AS i
FROM numbers n1 CROSS JOIN numbers n2) nums
WHERE i BETWEEN 1 AND 20
ORDER BY i ASC;
If you left join a sequence query like the above to your regular query, you should be able to generate both real and blank rows. e.g.
SELECT alldates.d, mystuff.* FROM
  (SELECT date_add('$start_date', interval i day) AS d FROM
    (SELECT 10*n1.num + n2.num AS i
     FROM numbers n1 CROSS JOIN numbers n2
     ORDER BY i ASC) nums
   WHERE i <= DATEDIFF('$end_date','$start_date')) alldates
LEFT JOIN mystuff
ON alldates.d = mystuff.somedate
ORDER BY $whatever;

You could "pre-load" the database with blank values. Then do UPDATE queries instead of inserts. All your days with no data will be pre-populated. You can create a simple script that creates a month's/year's/decade's worth of (blank) data and run it as often as you need. Then you never have to worry about how the days with no data get into the database - you start with your data "zeroed" out.

Related

Getting temperature difference between intervals

My question is more "theoretical" than practical. In other words, I'm not really looking for particular code for how to do something, but more for advice about how to do it. I've been thinking about it for some time but cannot come up with a feasible solution.
So basically, I have a MySQL database that saves weather information from my weather station.
Column one contains date and time of measurement (Datetime format field), then there is a whole range of various columns like temp, humidity etc. The one I am interested in now is the one with the temperature. The data is sorted by date and time ascending, meaning the most recent value is always inserted to the end.
Now, what I want to do is use a PHP script to connect to the db, find the temperature changes within a certain interval, and then find the maximum. For example, let's say I choose an interval of 3h. Then I would like to find the time, from all the values, where there was the most significant temperature change in those 3h (or 5h, 1 day etc.).
The problem is that I don't really know how to do this. If I just get the values from the db, I'm getting them one by one, but I can't think of a way of getting a value that is, let's say, 3h in the past from the current one. With that it would be easy: just subtract them and take the date from the datetime field at that time. But how do I get values that are, for example, 3h apart? (It cannot simply be a fixed number of rows back, because the data is saved at irregular intervals of 5-10 min, so 3h in the past could be a varying number of rows.)
Any ideas how this could be done?
Thanks a lot
Not terribly hard actually. So I would assume it's a two column table with time and temp fields, where time is a DATETIME field
SELECT MAX(temp) FROM records
WHERE time >= "2013-10-14 12:00:00" and time <= "2013-10-14 15:00:00"
SELECT t1.*, ABS(t1.temperature - t2.temperature) AS temp_change
FROM tablename t1
JOIN tablename t2
ON t2.timecolumn <= (t1.timecolumn - INTERVAL 3 HOUR)
LEFT JOIN tablename t3
ON t3.timecolumn <= (t1.timecolumn - INTERVAL 3 HOUR)
AND t2.timecolumn > t3.timecolumn
WHERE
t3.some_non_nullable_column IS NULL
ORDER BY ABS(t1.temperature - t2.temperature) DESC
LIMIT 1;
One table joined to itself twice: t2 is the closest record at least 3 hours before t1, and the t3 anti-join guarantees that no later record also qualifies. With the proper indexes and a limited amount of data (where "limited" is in the eye of the beholder) this can be quite performant. However, if you need a lot of these queries on a big dataset, this is a prime candidate for denormalization, where you create a table that also stores the calculated offsets compared to the previous entry.
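Since the question asks for a PHP script, the same "closest record at least 3h earlier" pairing can also be sketched in PHP with two indexes that move forward together, so irregular spacing between rows does not matter. This is a sketch only: the function name, the 'time' and 'temp' array keys and the UTC assumption are illustrative, and the readings must already be sorted ascending by time.

```php
<?php
// Sketch: largest temperature change between each reading and the closest
// reading that is at least $intervalSeconds older. Names are hypothetical.
function maxChangeOverInterval(array $readings, int $intervalSeconds): ?array
{
    $utc = new DateTimeZone('UTC');
    $times = [];
    foreach ($readings as $r) {
        $times[] = (new DateTimeImmutable($r['time'], $utc))->getTimestamp();
    }

    $best = null;
    $j = -1; // index of the latest reading that is old enough for reading $i
    $n = count($readings);
    for ($i = 0; $i < $n; $i++) {
        // Advance $j while the next reading is still at least the interval older.
        while ($j + 1 < $i && $times[$j + 1] <= $times[$i] - $intervalSeconds) {
            $j++;
        }
        if ($j < 0) {
            continue; // nothing old enough yet
        }
        $change = abs($readings[$i]['temp'] - $readings[$j]['temp']);
        if ($best === null || $change > $best['change']) {
            $best = [
                'from' => $readings[$j]['time'],
                'to' => $readings[$i]['time'],
                'change' => $change,
            ];
        }
    }
    return $best;
}

// Usage with made-up readings at irregular spacing.
$readings = [
    ['time' => '2013-10-14 12:00:00', 'temp' => 10.0],
    ['time' => '2013-10-14 12:10:00', 'temp' => 10.5],
    ['time' => '2013-10-14 15:00:00', 'temp' => 14.0],
    ['time' => '2013-10-14 15:05:00', 'temp' => 9.0],
    ['time' => '2013-10-14 15:12:00', 'temp' => 9.5],
];
$best = maxChangeOverInterval($readings, 3 * 3600);
```

Both indexes only ever move forward, so the pass is linear in the number of rows once they are loaded.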

Optimising PHP/mysql algorithm

I have to compute some statistics for my application, so I need an algorithm that performs as well as possible. I have several questions.
I have a data structure like this in the mysql database:
user_id group_id date
1 5 2012-11-20
1 2 2012-11-01
1 4 2012-11-01
1 3 2012-10-15
1 9 2013-01-18
...
So I need to find the group of a given user at a specific date. For example, for user 1 at date 2012-11-15 (15 November 2012) it should return the most recent groups, which are 2 and 4 (several groups at the same time) at date 2012-11-01 (the closest earlier date).
Normally, I could do a SELECT where date <= chosen date, order by date desc, etc., but that's not the point, because if I have 1000 users it will take 1000 requests to get all the results.
So here are some questions:
I am already using PHP to loop through the array to avoid the high number of MySQL requests, but it's still not good because the array size may be 10000+. Using a foreach (or for?) is quite costly.
So my question is: given an array ordered by date (desc or asc), what's the fastest way to find the closest index of the element whose date is smaller (or greater) than a given date, other than using a for or foreach loop over every element?
If there is no solution for the first question, what kind of data structure would you suggest for this kind of problem?
Note: the date is in MySQL format; it is not converted to a timestamp when it is stored in the array.
EDIT: this is a sql fiddle http://sqlfiddle.com/#!2/dc28d/1
For dos_id = 6, t="2012-11-01" it should return only 2 and 5 at date "2010-12-10 13:16:58"
Not sure why you'd want to do this in php. Here's some SQL using joins instead to get most recent group(s) for all users given a date. Make sure you've got indexes on date and userid.
SELECT *
FROM test t1
LEFT JOIN test t2
ON t1.userid = t2.userid AND t2.thedate <= '2012-11-15' AND t2.thedate > t1.thedate
WHERE t1.thedate <= '2012-11-15' AND t2.userid IS NULL;
SQLfiddle
Or using your SQLFiddle
SELECT t1.*
FROM dossier_dans_groupe t1
LEFT JOIN dossier_dans_groupe t2
ON t1.dos_id = t2.dos_id AND t2.updated_at <= '2012-11-01'
AND t2.updated_at > t1.updated_at
WHERE t1.updated_at <= '2012-11-01' AND t2.dos_id IS NULL;
This would give you a list of all users and their groups (1 row per group) for the latest date that is smaller than the one you specify (2012-11-15 below).
SELECT user_id, group_id, date
FROM `table`
WHERE date <= '2012-11-15'
AND NOT EXISTS (
    SELECT 1 FROM `table` test
    WHERE test.user_id = `table`.user_id
    AND test.date > `table`.date
    AND test.date <= '2012-11-15');
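On the PHP side of the question (finding the closest earlier date in a sorted array without looping over every element), a binary search is the usual tool. The sketch below assumes one user's rows sorted ascending by date, with 'date' and 'group_id' keys; the function names are made up for illustration. MySQL 'Y-m-d' strings are fixed width, so plain string comparison orders them correctly and no timestamp conversion is needed.

```php
<?php
// Sketch: index of the last row whose 'date' is <= $target, or -1 if none.
// $rows must be sorted ascending by 'date' (hypothetical layout).
function lastIndexOnOrBefore(array $rows, string $target): int
{
    $lo = 0;
    $hi = count($rows) - 1;
    $found = -1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        if (strcmp($rows[$mid]['date'], $target) <= 0) {
            $found = $mid;   // candidate; look for a later one
            $lo = $mid + 1;
        } else {
            $hi = $mid - 1;
        }
    }
    return $found;
}

// All groups that share the closest date on or before $target.
function groupsAt(array $rows, string $target): array
{
    $i = lastIndexOnOrBefore($rows, $target);
    if ($i < 0) {
        return [];
    }
    $date = $rows[$i]['date'];
    $groups = [];
    for (; $i >= 0 && $rows[$i]['date'] === $date; $i--) {
        $groups[] = $rows[$i]['group_id'];
    }
    sort($groups);
    return $groups;
}

// Usage with the rows for user 1 from the question, sorted by date.
$rows = [
    ['date' => '2012-10-15', 'group_id' => 3],
    ['date' => '2012-11-01', 'group_id' => 2],
    ['date' => '2012-11-01', 'group_id' => 4],
    ['date' => '2012-11-20', 'group_id' => 5],
    ['date' => '2013-01-18', 'group_id' => 9],
];
$groups = groupsAt($rows, '2012-11-15'); // [2, 4]
```

With rows grouped per user first, each lookup costs about log2(n) comparisons instead of a full scan.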

how to show one record per day order by id?

I have a little script that shows one wisdom each day.
So I have three columns:
Id | wisdom   | timestamp
-------------------------
1  | wisdom 1 | 4/1/2012
2  | wisdom 2 | 4/1/2012
3  | wisdom 3 | 4/2/2012
and I want to fetch array of one wisdom for each day
I looked around your website, but unfortunately I didn't find something similar to what I want.
also I got this code
$sql = mysql_query("SELECT DISTINCT id FROM day_table group by timestamp");
but this is also not working.
Any ideas?
Is it possible to make a 24-hour counter that updates the wisdom date?
Please give me some help.
You can make another table that is called wisdom_of_day
The table would have the following columns, id, wisdom_id, date
Basically each day you can randomly select a wisdom from your wisdom table and insert it into the wisdom day table. You can also add a constraint to your date column so it is distinct. It is important that it is a date column and not a timestamp since you don't care about time.
Then you can retrieve the wisdom of the day by querying based on the date.
It's possible I read your question wrong and you just want to select one wisdom for each day, but you want to show multiple days and you want to get the data from your table.
If so, the reason your query is not working is because you are grouping by a timestamp which includes the date and time. You need to group it by date for it to group like you want.
Here is a query that groups by day correctly. It only works if you have a TIMESTAMP (or DATETIME) column and are not storing a Unix timestamp in an INT column.
select id, wisdom, date(timestamp) date_only from day_table group by date_only order by date_only asc;
Hmm, I noticed that your timestamp values are in some kind of date format, maybe as a string? If so the above query probably won't work.
First compute number of days since 1970
SELECT DATEDIFF(CURDATE(), '1970-01-01')
Then insert this number inside RAND, for example:
SELECT * FROM table ORDER BY RAND(15767) LIMIT 1;
Rand with number as argument is deterministic.
Full query:
SELECT * FROM table ORDER BY RAND((SELECT DATEDIFF(CURDATE(), '1970-01-01'))) LIMIT 1;
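If the pick has to happen in PHP instead, a related but different technique is to use the day number directly as a rotating index rather than as a random seed. Note that this is a swapped-in approach: it cycles through the wisdoms in order instead of shuffling them. The function names and the use of UTC are assumptions for illustration.

```php
<?php
// Sketch: whole days since 1970-01-01 for a 'Y-m-d' date, computed in UTC.
function dayNumber(string $date): int
{
    $utc = new DateTimeZone('UTC');
    $epoch = new DateTimeImmutable('1970-01-01', $utc);
    return (int) $epoch->diff(new DateTimeImmutable($date, $utc))->days;
}

// The same date always maps to the same entry; consecutive days rotate.
function pickForDay(array $wisdoms, string $date): string
{
    return $wisdoms[dayNumber($date) % count($wisdoms)];
}

// Usage with made-up data.
$wisdoms = ['wisdom 1', 'wisdom 2', 'wisdom 3'];
$today = pickForDay($wisdoms, '2013-03-03');
```

For 2013-03-03 the day number is 15767, the value used as the example seed above.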

PHP/MYSQL datetime ranges overlapping for users

Please, I need help with this (for better understanding please see the attached image) because I am completely helpless.
As you can see, I have users who store their start and end datetimes in my DB as YYYY-mm-dd H:i:s. Now I need to find the time ranges where the most users overlap. I would like to get the 3 most frequent datetime overlaps, i.e. the ones shared by the most users. How can I do it?
I have no idea which MySQL query I should use, or whether it would be better to select all datetimes (start and end) from the database and process them in PHP (but how?). As shown in the image, the result should be, for example, that time 8.30 - 10.00 is shared by users A+B+C+D.
Table structure:
UserID | Start datetime | End datetime
--------------------------------------
A | 2012-04-03 4:00:00 | 2012-04-03 10:00:00
A | 2012-04-03 16:00:00 | 2012-04-03 20:00:00
B | 2012-04-03 8:30:00 | 2012-04-03 14:00:00
B | 2012-04-06 21:30:00 | 2012-04-06 23:00:00
C | 2012-04-03 12:00:00 | 2012-04-03 13:00:00
D | 2012-04-01 01:00:01 | 2012-04-05 12:00:59
E | 2012-04-03 8:30:00 | 2012-04-03 11:00:00
E | 2012-04-03 21:00:00 | 2012-04-03 23:00:00
What you effectively have is a collection of sets and want to determine if any of them have non-zero intersections. This is the exact question one asks when trying to find all the ancestors of a node in a nested set.
We can prove that for every overlap, at least one time window will have a start time that falls within all other overlapping time windows. Using this tidbit, we don't need to actually construct artificial timeslots in the day. Simply take a start time and see if it intersects any of the other time windows and then just count up the number of intersections.
So what's the query?
/*SELECT*/
SELECT DISTINCT
MAX(overlapping_windows.start_time) AS overlap_start_time,
MIN(overlapping_windows.end_time) AS overlap_end_time ,
(COUNT(overlapping_windows.id) - 1) AS num_overlaps
FROM user_times AS windows
INNER JOIN user_times AS overlapping_windows
ON windows.start_time BETWEEN overlapping_windows.start_time AND overlapping_windows.end_time
GROUP BY windows.id
ORDER BY num_overlaps DESC;
Depending on your table size and how often you plan on running this query, it might be worthwhile to drop a spatial index on it (see below).
UPDATE
If you're running this query often, you'll need to use a spatial index. Because of the range-based traversal (i.e. does start_time fall within the start/end range), a BTREE index will not do anything for you. IT HAS TO BE SPATIAL.
ALTER TABLE user_times ADD COLUMN time_window GEOMETRY NOT NULL DEFAULT 0;
UPDATE user_times SET time_window = GeomFromText(CONCAT('LineString( -1 ', start_time, ', 1 ', end_time, ')'));
CREATE SPATIAL INDEX time_window ON user_times (time_window);
Then you can update the ON clause in the above query to read
ON MBRWithin( Point(0,windows.start_time), overlapping_windows.time_window )
This will get you an indexed traversal for the query. Again, only do this if you're planning on running the query often.
Credit for the spatial index to Quassnoi's blog.
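The start-time observation behind that query (every overlap contains the start time of at least one of its windows) can also be checked in PHP once the rows are loaded. The sketch below counts, for each window's start, which users have a window containing that instant. The function names, the 'user', 'start' and 'end' keys and the UTC assumption are illustrative only, and the nested loop is quadratic, so it suits modest row counts.

```php
<?php
// Sketch: parse a 'Y-m-d H:i:s' string as UTC and return a Unix timestamp.
function toTs(string $s): int
{
    return (new DateTimeImmutable($s, new DateTimeZone('UTC')))->getTimestamp();
}

// For each window start, collect the users whose windows contain that instant
// (bounds inclusive, like SQL BETWEEN), then sort by user count descending.
function countOverlapsAtStarts(array $windows): array
{
    $out = [];
    foreach ($windows as $w) {
        $point = toTs($w['start']);
        $users = [];
        foreach ($windows as $other) {
            if (toTs($other['start']) <= $point && $point <= toTs($other['end'])) {
                $users[$other['user']] = true;
            }
        }
        $names = array_keys($users);
        sort($names);
        $out[] = ['at' => $w['start'], 'users' => $names];
    }
    usort($out, fn($a, $b) => count($b['users']) <=> count($a['users']));
    return $out;
}

// Usage with a few rows shaped like the question's table.
$windows = [
    ['user' => 'A', 'start' => '2012-04-03 04:00:00', 'end' => '2012-04-03 10:00:00'],
    ['user' => 'B', 'start' => '2012-04-03 08:30:00', 'end' => '2012-04-03 14:00:00'],
    ['user' => 'C', 'start' => '2012-04-03 12:00:00', 'end' => '2012-04-03 13:00:00'],
    ['user' => 'D', 'start' => '2012-04-01 01:00:01', 'end' => '2012-04-05 12:00:59'],
    ['user' => 'E', 'start' => '2012-04-03 08:30:00', 'end' => '2012-04-03 11:00:00'],
];
$ranked = countOverlapsAtStarts($windows);
```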
Something like this should get you started -
SELECT slots.time_slot, COUNT(*) AS num_users, GROUP_CONCAT(DISTINCT user_bookings.user_id ORDER BY user_bookings.user_id) AS user_list
FROM (
SELECT CURRENT_DATE + INTERVAL ((id-1)*30) MINUTE AS time_slot
FROM dummy
WHERE id BETWEEN 1 AND 48
) AS slots
LEFT JOIN user_bookings
ON slots.time_slot BETWEEN `user_bookings`.`start` AND `user_bookings`.`end`
GROUP BY slots.time_slot
ORDER BY num_users DESC
The idea is to create a derived table that consists of time slots for the day. In this example I have used dummy (which can be any table with an auto-increment id that is contiguous for the required set) to create a list of time slots by adding 30 minutes incrementally. The result is then joined to the bookings in order to count the number of bookings for each time slot.
UPDATE For entire date/time range you could use a query like this to get the other data required -
SELECT MIN(`start`) AS `min_start`, MAX(`end`) AS `max_end`, DATEDIFF(MAX(`end`), MIN(`start`)) + 1 AS `num_days`
FROM user_bookings
These values can then be substituted into the original query or the two can be combined -
SELECT slots.time_slot, COUNT(*) AS num_users, GROUP_CONCAT(DISTINCT user_bookings.user_id ORDER BY user_bookings.user_id) AS user_list
FROM (
SELECT DATE(tmp.min_start) + INTERVAL ((id-1)*30) MINUTE AS time_slot
FROM dummy
INNER JOIN (
SELECT MIN(`start`) AS `min_start`, MAX(`end`) AS `max_end`, DATEDIFF(MAX(`end`), MIN(`start`)) + 1 AS `num_days`
FROM user_bookings
) AS tmp
WHERE dummy.id BETWEEN 1 AND (48 * tmp.num_days)
) AS slots
LEFT JOIN user_bookings
ON slots.time_slot BETWEEN `user_bookings`.`start` AND `user_bookings`.`end`
GROUP BY slots.time_slot
ORDER BY num_users DESC
EDIT I have added DISTINCT and ORDER BY clauses in the GROUP_CONCAT() in response to your last query.
Please note that you will need a much greater range of ids in the dummy table. I have not tested this query so it may have syntax errors.
I would not do much in SQL, this is so much simpler in a programming language, SQL is not made for something like this.
Of course, it's just sensible to break the day down into "timeslots" - this is statistics. But as soon as you start handling dates over the 00:00 border, things start to get icky when you use joins and inner selects. Especially with MySQL which does not quite like inner selects.
Here's a possible SQL query
SELECT count(*) FROM `times`
WHERE
( DATEDIFF(`End`,`Start`) = 0 AND
TIME(`Start`) < TIME('$SLOT_HIGH') AND
TIME(`End`) > TIME('$SLOT_LOW'))
OR
( DATEDIFF(`End`,`Start`) > 0 AND
( TIME(`Start`) < TIME('$SLOT_HIGH') OR
TIME(`End`) > TIME('$SLOT_LOW')))
Here's some pseudo code
granularity = 30*60; // 30 minutes
numslots = 24*60*60 / granularity;
stats = CreateArray(numslots);
for i=0, i < numslots, i++ do
stats[i] = GetCountFromSQL(i*granularity, (i+1)*granularity); // low, high
end
Yes, that makes numslots queries, but no joins no nothing, hence it should be quite fast. Also you can easily change the resolution.
And another positive thing is, you could "ask yourself", "I have two possible timeslots, and I need the one where more people are here, which one should I use?" and just run the query twice with respective ranges and you are not stuck with predefined time slots.
To only find full overlaps (an entry only counts if it covers the full slot) you have to switch low and high ranges in the query.
You might have noticed that I do not add times between entries that could span multiple days, however, adding a whole day, will just increase all slots by one, making that quite useless.
You could however add them by selecting sum(DAY(End) - DAY(Start)) and just add the return value to all slots.
Table seems pretty simple. I would keep your SQL query pretty simple:
SELECT * FROM tablename
Then, once you have the info saved in your PHP object, do the processing with PHP using loops and comparisons.
In simplest form:
/*$query holds the result of running the SELECT above, e.g. from mysqli_query()*/
for($x = 0, $numrows = mysqli_num_rows($query); $x < $numrows; $x++){
    /*Grab a row*/
    $row = mysqli_fetch_assoc($query);
    /*store userID, START, END*/
    $userID = $row['userID'];
    $start = $row['START'];
    $end = $row['END'];
    /*Have an array for each user in which you store start and end times*/
    if(!strcmp($userID, "A"))
    {
        /*Store info in array_a*/
    }
    else if(!strcmp($userID, "B"))
    {
        /*etc......*/
    }
}
/*Now you have an array for each user with their start/stop times*/
/*Do your loops and comparisons to find common time slots. */
/*Also, use strtotime() to switch date/time entries into comparable values*/
Of course this is in very basic form. You'll probably want to do one loop through the array to first get all of the userIDs before you compare them in the loop shown above.
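One way to do the "loops and comparisons" step without fixed time slots is a sweep over the sorted start and end points, tracking which users are active between consecutive points. This is a sketch of that general technique, not the answer's own method; the function name, the row keys and the UTC assumption are illustrative.

```php
<?php
// Sketch: sweep over start/end events and return the $limit segments
// with the most simultaneously active users. Names are hypothetical.
function busiestSegments(array $rows, int $limit = 3): array
{
    $utc = new DateTimeZone('UTC');
    $events = [];
    foreach ($rows as $r) {
        $events[] = [(new DateTimeImmutable($r['start'], $utc))->getTimestamp(), 1, $r['user']];
        $events[] = [(new DateTimeImmutable($r['end'], $utc))->getTimestamp(), -1, $r['user']];
    }
    // Sort by time; at equal times, ends (-1) come before starts (+1).
    usort($events, fn($a, $b) => [$a[0], $a[1]] <=> [$b[0], $b[1]]);

    $active = [];   // user => number of currently open windows
    $segments = [];
    $prev = null;
    foreach ($events as [$time, $delta, $user]) {
        // Close the segment that ran from the previous event time to this one.
        if ($prev !== null && $time > $prev && count($active) > 0) {
            $users = array_keys($active);
            sort($users);
            $segments[] = [
                'start' => gmdate('Y-m-d H:i:s', $prev),
                'end' => gmdate('Y-m-d H:i:s', $time),
                'users' => $users,
            ];
        }
        $active[$user] = ($active[$user] ?? 0) + $delta;
        if ($active[$user] === 0) {
            unset($active[$user]);
        }
        $prev = $time;
    }
    usort($segments, fn($a, $b) => count($b['users']) <=> count($a['users']));
    return array_slice($segments, 0, $limit);
}

// Usage with a few rows shaped like the question's table.
$rows = [
    ['user' => 'A', 'start' => '2012-04-03 04:00:00', 'end' => '2012-04-03 10:00:00'],
    ['user' => 'B', 'start' => '2012-04-03 08:30:00', 'end' => '2012-04-03 14:00:00'],
    ['user' => 'C', 'start' => '2012-04-03 12:00:00', 'end' => '2012-04-03 13:00:00'],
    ['user' => 'D', 'start' => '2012-04-01 01:00:01', 'end' => '2012-04-05 12:00:59'],
    ['user' => 'E', 'start' => '2012-04-03 08:30:00', 'end' => '2012-04-03 11:00:00'],
];
$top = busiestSegments($rows, 3);
```

Sorting dominates the cost, so this stays near n log n even when windows span several days.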

Finding empty time blocks between two dates?

I have 2 MySQL tables, 'scheduled_time' and 'appointments'
'scheduled_time' has 2 DateTime fields, 'start' and 'end' - this is a time range of when I am available for appointments.
'appointments' contains appointment details but also a 'start' and 'end' field, this will ultimately be within the range specified in 'scheduled_time'.
What is the best way for me to find empty time blocks when taking into account both tables?
Lets say I have 'scheduled_time' starting 11/9/2010 from 8am to 2pm. and I have one 'appointment' from 8am to 10am and one from 1pm to 2pm. How can I find the next available block of say 1 hour?
I did this a while ago. We had a similar structure:
Available (contained all working hours for an employee, flexible working hours)
Appointments (similar to yours)
What I did was basically this (steps):
Get all start and end datetimes for employee < x >, sorted by startdate
let startAvailable = start of the time search (in your case 11/9/2010 @ 8am)
let appointment = first appointment in the list of appointments
get the startdate of the first appointment. If the difference between these is big enough, there's your block
if not, let startAvailable = enddate of appointment
remove appointment from the list, let appointment be the next appointment
repeat the process of checking for an available block
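The steps above can be sketched in PHP roughly as follows. The function name, the 'start' and 'end' array keys and the UTC assumption are made up for illustration; the max() call is a small addition so that overlapping appointments do not move the cursor backwards.

```php
<?php
// Sketch: first free block of $minutes inside a scheduled range,
// walking the appointments in start order. Names are hypothetical.
function nextFreeBlock(string $scheduleStart, string $scheduleEnd, array $appointments, int $minutes): ?array
{
    $utc = new DateTimeZone('UTC');
    $ts = fn(string $s): int => (new DateTimeImmutable($s, $utc))->getTimestamp();
    $need = $minutes * 60;

    usort($appointments, fn($a, $b) => $ts($a['start']) <=> $ts($b['start']));

    $startAvailable = $ts($scheduleStart);
    foreach ($appointments as $appt) {
        if ($ts($appt['start']) - $startAvailable >= $need) {
            break; // the gap before this appointment is big enough
        }
        // Otherwise continue searching from the end of this appointment.
        $startAvailable = max($startAvailable, $ts($appt['end']));
    }
    if ($ts($scheduleEnd) - $startAvailable < $need) {
        return null; // no room left in the scheduled range
    }
    return [
        'start' => gmdate('Y-m-d H:i:s', $startAvailable),
        'end' => gmdate('Y-m-d H:i:s', $startAvailable + $need),
    ];
}

// Usage with the example from the question: available 8am to 2pm,
// appointments 8am to 10am and 1pm to 2pm, looking for 1 hour.
$appointments = [
    ['start' => '2010-11-09 08:00:00', 'end' => '2010-11-09 10:00:00'],
    ['start' => '2010-11-09 13:00:00', 'end' => '2010-11-09 14:00:00'],
];
$block = nextFreeBlock('2010-11-09 08:00:00', '2010-11-09 14:00:00', $appointments, 60);
// $block is ['start' => '2010-11-09 10:00:00', 'end' => '2010-11-09 11:00:00']
```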
First, create a number table. Here as example named "Numbers" with a column "number".
Then you can do something like
select chour as [Free Hour] from Numbers n
inner join Scheduled s on n.chour >= s.start and n.chour < s.[end]
where chour not in
(
select chour from Numbers n
inner join Appointments a ON n.chour >= a.start and n.chour < a.[end]
)
Numbers is my Numbers table, chour is a computed column defined as
DATE_ADD('2010-11-08', INTERVAL number HOUR)
You can also store it as a persisted column of course.
Sorry, if the syntax isn't completely right, I do T-SQL normally :-)
Edit: This technique only works for fixed blocks of time (hour is just an example, you could do half hours just as easily), but it's quite efficient and readable in this case. A usual application in business is mapping dates, because that's the granularity where most contracts live and you can cover a lot of days with a small number table.
This is a great place to use a dummy integer table, which helps you to create data from nothing. Here is an example:
create table ints(i tinyint);
insert into ints values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
You can then use a few cross joins with this table to generate a table of hour-long windows represented by your scheduled_time table, and left join the result against the appointments table to find windows that do not have something already scheduled.
SELECT
h.HourWindowStart, h.HourWindowEnd
FROM (
SELECT
s.start + INTERVAL t.i*100 + u.i*10 + v.i HOUR AS HourWindowStart,
s.start + INTERVAL t.i*100 + u.i*10 + v.i + 1 HOUR AS HourWindowEnd
FROM scheduled_time s
JOIN ints AS t
JOIN ints AS u
JOIN ints AS v
WHERE s.start + INTERVAL t.i*100 + u.i*10 + v.i HOUR < s.end
ORDER BY HourWindowStart
) as h
LEFT JOIN appointments a ON a.end > h.HourWindowStart AND a.start < h.HourWindowEnd
WHERE a.start IS NULL
You can tweak various parts of this process to calculate larger/smaller availability windows (by the half hour, by the day, etc), use more or less cross joins of the integer table based on the maximum number of availability windows that could be represented in a single start/end range in scheduled_time, pre-create a date calendar and join against the two tables, etc.
