I'm trying to figure out an efficient query for a project I'm working on.
We're recording a switch state in a table: each time it changes, a row is added with the new value (0 or 1).
Here's a simplified structure of the table:
day | hour | state
-----+------+-------
10 | 1 | 1 # day 10
10 | 6 | 0
10 | 21 | 1
11 | 3 | 0 # day 11
11 | 6 | 1
13 | 13 | 0 # day 13
....
Now we need to make a daily overview, something like this:
Day 11 : Switch was on during 0-3, 6-24
SELECT * FROM log WHERE day = 11 will give us only [3,0] and [6,1]. From those we can guess that it started ON and ended ON, but how about day 12?
SELECT * FROM log WHERE day = 12 gives nothing, obviously - there's no clue to guess from.
What is an efficient and reliable way to get the starting and ending state for a given day? Something like "Select one entry before day 12 and one after day 12"?
SELECT
day,
hour,
state
FROM
log
WHERE
day*100+hour
BETWEEN
(SELECT max(day*100+hour) FROM log WHERE day < 12)
AND
(SELECT min(day*100+hour) FROM log WHERE day > 12)
This will give you everything between (and including) the last entry before day 12 and the first entry after day 12.
The second part might be unnecessary if you don't need to know when the state changed, and it's enough to know the state didn't change until at least midnight of the selected day.
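A minimal sketch of the "one entry before the day" idea, in Python with an in-memory SQLite table standing in for the MySQL `log` table. The function name `on_intervals` is hypothetical, and treating a switch with no earlier history as off is an assumption, not something stated in the question.

```python
import sqlite3

# Sample rows from the question: (day, hour, state)
ROWS = [(10, 1, 1), (10, 6, 0), (10, 21, 1), (11, 3, 0), (11, 6, 1), (13, 13, 0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (day INTEGER, hour INTEGER, state INTEGER)")
conn.executemany("INSERT INTO log VALUES (?, ?, ?)", ROWS)


def on_intervals(conn, day):
    """Return the (start_hour, end_hour) spans during which the switch was on."""
    # The last change strictly before the day tells us the state at 00:00.
    prev = conn.execute(
        "SELECT state FROM log WHERE day < ? "
        "ORDER BY day DESC, hour DESC LIMIT 1",
        (day,),
    ).fetchone()
    state = prev[0] if prev else 0  # assumption: off when there is no history
    start = 0
    intervals = []
    changes = conn.execute(
        "SELECT hour, state FROM log WHERE day = ? ORDER BY hour", (day,)
    )
    for hour, new_state in changes:
        if state == 1 and hour > start:
            intervals.append((start, hour))
        state, start = new_state, hour
    # Whatever state we end in holds until midnight.
    if state == 1 and start < 24:
        intervals.append((start, 24))
    return intervals


print(on_intervals(conn, 11))  # day with changes
print(on_intervals(conn, 12))  # day without any rows
```

Only the entry before the day is needed here, which matches the remark that the "first entry after" lookup can be dropped when the exact time of the next change does not matter.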
My PHP/MySQL code currently gets the entire month's sales and groups them by day. I am now trying to take that further and separate AM vs PM sales. The AM shift is 10am-7pm and the PM shift is 7pm-2am. I know I can group by day, then by hour, and iterate through to get the AM totals that way, but I am sure there is a better way directly in SQL.
Thanks for any insight.
SELECT DATE(a_tabs.strDate - INTERVAL 16 HOUR) as day ,
DATE_FORMAT(a_tabs.strDate, '%h') AS hour ,
sum(a_invoices.Total) as total
FROM a_tabs
Right JOIN a_invoices on a_tabs.TabId = a_invoices.TabId
WHERE a_tabs.strDate BETWEEN '2022-03-01 09:00:00' and '2022-03-31 18:00:00'
AND a_invoices.status='c'
and a_tabs.status<>'v'
GROUP BY day , hour
result from this query
So in a given day you have 3 periods:
02:00 to 10:00 [8h, no shift]
10:00 to 19:00 [9h, AM shift]
19:00 to 02:00 [7h, PM shift]
But the trouble with a naive solution is that the PM shift crosses over the date boundary.
Assuming a simplified table like:
CREATE TABLE sales (
id INTEGER UNSIGNED AUTO_INCREMENT PRIMARY KEY,
rep_id INTEGER UNSIGNED,
dt DATETIME,
amount INTEGER
);
We can correctly align the shifts with the date boundary with dt - INTERVAL 2 HOUR and then something like:
SELECT
DATE(dt - INTERVAL 2 HOUR) 'day',
IF( HOUR(dt - INTERVAL 2 HOUR) BETWEEN 0 AND 7, 'UN',
IF( HOUR(dt - INTERVAL 2 HOUR) BETWEEN 8 AND 16, 'AM', 'PM' )
) AS 'shift',
SUM(amount) AS 'sales'
FROM sales
GROUP BY day, shift;
Sample data omitted for brevity, see on sqlfiddle: http://sqlfiddle.com/#!9/d4fbcb/22/0, but for a sale every 15 minutes on the dot, beginning at 2022-01-01 00:00:00:
| day | shift | sales |
|------------|-------|-------|
| 2021-12-31 | PM | 8 |
| 2022-01-01 | AM | 36 |
| 2022-01-01 | PM | 28 |
| 2022-01-01 | UN | 32 |
| 2022-01-02 | AM | 36 |
| 2022-01-02 | PM | 28 |
| 2022-01-02 | UN | 32 |
You can see that it correctly assigns the sales between 00:00 and 02:00 to the previous day's PM shift, and sales outside the defined shifts as "UN" for undefined.
However, with regard to maintainability, extensibility, and performance: I would not really recommend this approach of calculating the shift during report generation at all.
Maintainability: At some point in the future the shift boundary changes, and now this query returns incorrect shift data going forward. The naive fix is to just change the hours in the query, but then it returns incorrect results for past data.
Extensibility: At some point in the future a "cover" shift is added for 4PM to 10PM to account for demand. It is no longer possible to compute the shift using a query like this, because the shifts overlap.
Performance: All of those dt - INTERVAL 2 HOUR and IF() expressions add overhead and make it difficult or impossible to use indexes, depending on what the requirements of your query are.
What I would suggest is making the "shift" into metadata that is associated with the sale record and calculated at insert time. Depending on your particular requirements, it might just be a string in the sale record, eg: 20220101_AM, or a foreign key relation into more robust schema.
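As a rough illustration of computing the shift at insert time, here is a small Python sketch. The function name `shift_key` is hypothetical, and the `YYYYMMDD_SHIFT` label format simply follows the `20220101_AM` example above; the shift boundaries are the ones from the question (AM 10:00-19:00, PM 19:00-02:00).

```python
from datetime import datetime, timedelta


def shift_key(dt: datetime) -> str:
    """Build a label such as '20220101_AM' for the shift a sale belongs to."""
    # Move the 02:00 day boundary to midnight so the PM shift stays on one date.
    shifted = dt - timedelta(hours=2)
    if shifted.hour <= 7:      # 02:00-09:59 wall-clock time: no shift
        shift = "UN"
    elif shifted.hour <= 16:   # 10:00-18:59 wall-clock time
        shift = "AM"
    else:                      # 19:00-01:59 wall-clock time
        shift = "PM"
    return f"{shifted:%Y%m%d}_{shift}"


print(shift_key(datetime(2022, 1, 1, 10, 0)))
print(shift_key(datetime(2022, 1, 2, 1, 30)))
```

Storing this value with each sale means a later change to the shift boundaries only affects new rows, and a report can group on the stored column directly.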
Given that both of your GROUP BY clauses are related to time, start by shifting the date so that AM is truly AM and PM is truly PM.
a_tabs.strDate - INTERVAL 7 HOUR
(7 chosen as 7pm end of AM shift).
Use UNIX_TIMESTAMP to get this down to a second value (hours would be better but there isn't a function for that). And then div by a 12 hr interval.
So
SELECT ...
GROUP BY UNIX_TIMESTAMP(a_tabs.strDate - INTERVAL 7 HOUR) DIV (60*60*12)
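To see what that grouping expression does, here is a small Python sketch of the same arithmetic. The name `half_day_bucket` is hypothetical, and the sketch assumes timestamps are interpreted as UTC; in MySQL, UNIX_TIMESTAMP() depends on the session time zone, so the 12-hour boundaries would move accordingly.

```python
from datetime import datetime, timedelta, timezone

HALF_DAY = 60 * 60 * 12  # seconds in a 12-hour bucket


def half_day_bucket(dt: datetime) -> int:
    """Mirror UNIX_TIMESTAMP(dt - INTERVAL 7 HOUR) DIV (60*60*12), assuming UTC."""
    shifted = dt - timedelta(hours=7)
    epoch_seconds = int(shifted.replace(tzinfo=timezone.utc).timestamp())
    return epoch_seconds // HALF_DAY


am_start = half_day_bucket(datetime(2022, 3, 1, 10, 0))
am_end = half_day_bucket(datetime(2022, 3, 1, 18, 59))
pm_start = half_day_bucket(datetime(2022, 3, 1, 19, 0))
pm_end = half_day_bucket(datetime(2022, 3, 2, 1, 59))
print(am_start == am_end, pm_start == pm_end, pm_start - am_start)
```

Under this assumption, each bucket covers 07:00-19:00 or 19:00-07:00 of the original time, so any sales between 02:00 and 10:00 would be folded into one of the two shifts rather than reported separately.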
Here is my query - it mostly works, but I can see it failing on one condition - explained after the query:
$firstDay = '2020-03-01' ;
$lastDay = '2020-03-31' ;
SELECT * FROM clubEventsCal
WHERE ceFreq!=1
AND (ceDate>='$firstDay' AND ceDate<='$lastDay')
UNION SELECT * FROM clubEventsCal
WHERE ceFreq=1
AND (ceDate>='$firstDay' AND ceDate<='$lastDay')
GROUP BY ceStopDate ORDER BY ceID,ceDate ;
The first select gives me all Event records between the two dates. The second select gives me grouped/summarized Event records between the two dates. The problem, though, is when the value ceDate spans days across two months, i.e. 2020-03-30 through 2020-04-02. When I pull the records for March, all is good: the above query pulls the 2020-03-30 record (grouped) as the first instance of the 4 days/records, allowing us to charge for a single 4-day event. But when I pull the records for April, it's also going to pull 2020-04-01 as a new grouped Event record for the last two days of the 4-day event and try to charge the customer for a new Event, when in fact those two days were already a part of March's bill.
How can I write the query so that when ceDate starts in Month X but ends in Month Y, pulling the records for Month Y does not pull records that actually belong to an Event that started in Month X?
Examples of an Event record would look like this:
rid | ceID | ceActive | ceFreq | ceDate | ceStopDate
------------------------------------------------
1 1108 1 3 2020-03-09 | 2020-03-09
2 1111 1 2 2020-03-15 | 2020-03-15
3 1112 1 2 2020-03-17 | 2020-03-17
4 1117 1 1 2020-03-30 | 2020-04-02
5 1117 1 1 2020-03-31 | 2020-04-02
6 1106 1 3 2020-03-21 | 2020-03-21
7 1110 1 2 2020-03-05 | 2020-03-05
8 1113 1 2 2020-03-24 | 2020-03-24
9 1117 1 1 2020-04-01 | 2020-04-02
10 1117 1 1 2020-04-02 | 2020-04-02
The above query pulls all records where ceFreq != 1, and it pulls a single record for the ceFreq = 1 records (rids: 4 & 5). For March, we don't necessarily care that ceID 1117 spills into April. But when we pull records for April - we need to exclude rid 9 & 10, because the Event (ceID=1117), was already accounted for in March.
SELECT * FROM clubEventsCal
...
GROUP BY ceStopDate
This is gibberish.
MySQL (depending on configuration) allows it without choking - but it's semantically wrong and stands out as an anti-pattern.
There are some edge cases where the values returned might contain significant data, but they are very unusual. Trying to explain a problem with code which does not work is perhaps not a good strategy.
Looking at your code, it's possible that you don't need a union - but there's not enough information in your example records to say if this would actually give the result you expect (it will be significantly faster depending on your indexes):
SELECT IF(cefreq=1, rid, null) AS consolidator
, ceid
, cefreq
, MIN(cedate), MAX(cedate)
, ceStopDate
FROM clubEventsCal
WHERE cID=1001
AND ceActive!=2
AND (ceDate>='$firstDay' AND ceDate<='$lastDay')
GROUP BY IF(cefreq=1, rid, null)
, ceid
, cefreq
, ceStopDate
;
I would have added the ORDER BY - but I don't know where clId came from. Also, this will give different results from what I think you were trying to achieve for any record where cefreq is null (if you really do want to exclude them, add a predicate in the WHERE clause).
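For the cross-month billing problem itself, one possible approach (not taken from the answer above) is to treat MIN(ceDate) of each ceFreq = 1 group as the event's start and bill the event only in the month that contains that start. The sketch below tries this on the sample rows, using Python with in-memory SQLite in place of MySQL; the function name `billable_events` is hypothetical, and it assumes an event is identified by ceID plus ceStopDate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clubEventsCal (rid INTEGER, ceID INTEGER, ceActive INTEGER, "
    "ceFreq INTEGER, ceDate TEXT, ceStopDate TEXT)"
)
conn.executemany(
    "INSERT INTO clubEventsCal VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1108, 1, 3, "2020-03-09", "2020-03-09"),
        (2, 1111, 1, 2, "2020-03-15", "2020-03-15"),
        (3, 1112, 1, 2, "2020-03-17", "2020-03-17"),
        (4, 1117, 1, 1, "2020-03-30", "2020-04-02"),
        (5, 1117, 1, 1, "2020-03-31", "2020-04-02"),
        (6, 1106, 1, 3, "2020-03-21", "2020-03-21"),
        (7, 1110, 1, 2, "2020-03-05", "2020-03-05"),
        (8, 1113, 1, 2, "2020-03-24", "2020-03-24"),
        (9, 1117, 1, 1, "2020-04-01", "2020-04-02"),
        (10, 1117, 1, 1, "2020-04-02", "2020-04-02"),
    ],
)

SQL = """
SELECT ceID, ceFreq, ceDate, ceStopDate
FROM clubEventsCal
WHERE ceFreq != 1 AND ceDate BETWEEN :first AND :last
UNION ALL
SELECT ceID, ceFreq, MIN(ceDate), ceStopDate
FROM clubEventsCal
WHERE ceFreq = 1
GROUP BY ceID, ceFreq, ceStopDate
HAVING MIN(ceDate) BETWEEN :first AND :last  -- bill only where the event starts
ORDER BY 1, 3
"""


def billable_events(first_day, last_day):
    """Rows to bill for the period; multi-day events appear once, at their start."""
    return conn.execute(SQL, {"first": first_day, "last": last_day}).fetchall()


print(billable_events("2020-03-01", "2020-03-31"))
print(billable_events("2020-04-01", "2020-04-30"))
```

With the sample data, event 1117 shows up once for March and not at all for April. Because the second SELECT lists every grouped column or aggregates it, it also avoids the SELECT * with GROUP BY pattern criticized above.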
I have a requirement based on dates, where I need to show the count for the staff on the particular date range, Sundays & government holidays should not be counted.
------------------------------------------
|Staff name| less | Full | more | Absent |
------------------------------------------
| name 1 | 1 | 3 | 1 | 1 |
------------------------------------------
| name 2 | 2 | 2 | 2 | 0 |
------------------------------------------
This is my requirement:
Sundays should not be counted
Saturdays will have 4 hours of effort
Week days will have 8 hours of effort
But staff can enter lower or higher than the fixed one.
In the table, if a staff member has worked fewer than 8 hours (for example 7), +1 is counted in the less column; if he works more than 8 hours, +1 is counted in the more column; if he works exactly 8 hours, +1 is counted in the full column; and if he is absent on the particular date, +1 is counted in the absent column.
For Saturday: same as above, but the 8 hours become 4 hours.
Sunday should not be counted in any way.
For this requirement do I need to have a new MySQL table consists of all the 365 days of the year with Sundays & govt holidays with in active status??
or we can do it without having table? if so how to do it, pls explain.
Thanks in advance..
$day_of_week = date("D");
That will show you the current day of the week so that will help you know if it is "Sat" or "Sun".
As for the govt holidays, you will need to store those in a table. Probably the easiest way is to have day, month and year columns. Then you can find out the current day like this:
$month = date("m");
$day = date("d");
$year = date("Y");
Just use those variables to construct a query and make sure the current date isn't a holiday:
$sql = "SELECT * FROM holidays WHERE day='".$day."' AND month='".$month."' AND year='".$year."'";
....execute query
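Putting the rules from the question together, the per-day classification could look roughly like the Python sketch below. The function name `classify_day`, the `holidays` set (which would be loaded from the holidays table) and the choice to treat zero or missing hours as absent are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional, Set


def classify_day(day: date, hours: Optional[float], holidays: Set[date]) -> Optional[str]:
    """Return 'less', 'full', 'more' or 'absent', or None if the day is not counted."""
    # Sundays (weekday() == 6) and government holidays are skipped entirely.
    if day.weekday() == 6 or day in holidays:
        return None
    expected = 4 if day.weekday() == 5 else 8  # Saturday: 4 hours, weekdays: 8
    if not hours:  # assumption: no entry or 0 hours means absent
        return "absent"
    if hours < expected:
        return "less"
    if hours > expected:
        return "more"
    return "full"


holidays = {date(2024, 1, 26)}  # hypothetical holiday, for the example only
print(classify_day(date(2024, 1, 5), 7, holidays))   # a Friday
print(classify_day(date(2024, 1, 6), 4, holidays))   # a Saturday
print(classify_day(date(2024, 1, 7), 8, holidays))   # a Sunday
```

Counting the returned labels per staff member over the date range gives the less/full/more/absent columns, and only the holidays need a table; Sundays and Saturdays can be derived from the date itself.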
I'm writing a top 10 polling system. Pollsters vote weekly on their top 10. How should I store their poll for each week? That is, how do I control what week the poll is in storage (mySQL) or in my PHP (5.x+) calculations?
(System #1) I've previously done this by having a file "week.txt" on the server that I set at 0 and then ran a cron job weekly to update +1. When I'm storing the poll data in the database, I'd just load the file and know what week it was. I'm looking for something more elegant.
The system must:
1. Be able to start at any time of the year.
2. Be able to skip weeks.
3. Not require shuffling of week numbers during calculations.
4. Be maintenance free by a human, other than a 1-off event (like saying "this is the start date", "this is the end date", once in a blue moon).
5. Use PHP, mySQL, file or other "standard" server items (except other programming languages or databases).
6. Not require other software (e.g. "Install Software X, it does this!").
7. Pollsters are probably non-technical people, so asking them anything other than "Enter your top 10" or "edit your top 10" is not allowed.
8. Be able to go over the end of the calendar year smoothly (e.g. Start in November and end in March).
Other Information:
Pollsters will only be allowed to vote on a single day.
I'll be running multiple polls at once that have no bearing on each other and thus may have different skip weeks.
My system I've used before won't work because in order to skip weeks, it would need interaction and violate #4 and otherwise can't skip weeks and thus violate #2.
I've thought of 2 systems but they have failures of parts of the above:
(System #2) Use PHP's date("W") when the pollster votes. Thus, the first week they all get week #48 (for example), second week #49, so it would be easily to tell which week is what. The problem is that some polls will go over the calendar year, thus I would end up with 48, 49, 50, 51, 52, 1, 2, 3, 4 and violate #3 above. Also, if we skipped weeks, we could end up with 48, 49, 50, 1, 2, 3 which violates #2 and #8 above.
(System #3) Then, I had the idea to just store the date they enter the poll. I would set a date to calculate from the week prior to the first poll, thus, it would just need to calculate the difference between weeks and I'd know the week number. But there's no easy way to skip weeks violating #2 unless we shuffle days which violates #3.
(System #4) I then had the idea that when a pollster first votes, we just record it as their week 1 vote. When they next vote, it's week 2, and so forth. If they wanted to edit their poll (the same day), they'd just use the edit button and we wouldn't record a new poll, because they'd have signaled it's an edit. The only problem is if a pollster forgets a week, meaning I'd have to go in and correct the data (add a blank week or change the week number they voted but violate #4). This handles the skip weeks just fine. Maybe a cron job would solve this? If someone forgot, a cron job that runs after the poll closes would enter in a blank week. Could be programmed to see the max week number entered, if any userid didn't have that week number, just enter in blank data.
If you can adapt any system above to meet all the criteria, that would be fine as well. I'm looking for a simple and elegant and hands-free solution.
Please ask for any other clarifying information.
When working with week numbers, you should keep in mind that 01.01.2012 is in week 52 (of 2011), not week 1. The question is whether you want your polls to be fixed on calendar weeks, or on 7-day offsets from the poll start date. Consider a poll that started on a Friday and ended exactly 7 days later. You'd be crossing the calendar week boundary and thus have 2 "weeks" in which your users may vote.
I'd probably prefer the offset-approach, as strict calendar binding is usually not helpful anyways. Do you want to answer the question "what are the votes in calendar week 34" or "what are the votes in the third week of polling"?
Calculating the offset is quite simple:
// 0-based
$week_offset = floor((time() - strtotime("2011-11-02")) / (7 * 24 * 60 * 60));
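The same offset arithmetic, sketched in Python with whole dates so the behaviour across a year boundary is easy to check. The function name `week_offset` is hypothetical; the start date is the one used in the line above.

```python
from datetime import date

POLL_START = date(2011, 11, 2)  # start date from the example above


def week_offset(vote_day: date, start: date = POLL_START) -> int:
    """0-based number of whole 7-day periods between the poll start and the vote."""
    return (vote_day - start).days // 7


print(week_offset(date(2011, 11, 8)))   # still inside the first 7 days
print(week_offset(date(2011, 11, 9)))   # first day of the second period
print(week_offset(date(2012, 1, 4)))    # after the calendar year has rolled over
```

Because the offset is measured from the poll's own start date, it keeps increasing across New Year instead of wrapping from 52 back to 1.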
I don't know your polling algorithm. I'll just demonstrate with a weighted poll (1-3 stars, 3 being best):
| poll_id | user_id | week_offset | vote |
| 7 | 3 | 0 | 1 |
| 7 | 4 | 0 | 3 |
| 7 | 5 | 0 | 2 |
| 7 | 3 | 1 | 2 |
| 7 | 4 | 1 | 2 |
| 7 | 5 | 2 | 3 |
| 7 | 5 | 5 | 1 |
Running a query like
SELECT
poll_id,
week_offset,
SUM(vote) as `value`,
COUNT(user_id) as `count`,
AVG(vote) as `average`
FROM votes_table
WHERE poll_id = 7
GROUP BY poll_id, week_offset
ORDER BY poll_id, week_offset;
would give you something like
| poll_id | week_offset | value | count | average |
| 7 | 0 | 6 | 3 | 2 |
| 7 | 1 | 4 | 2 | 2 |
| 7 | 2 | 3 | 1 | 3 |
| 7 | 5 | 1 | 1 | 1 |
By now you'll probably have noticed the gap 0, 1, 2, [3], [4], 5.
When grabbing that data from MySQL you have to iterate the results anyways. So where's the problem extending that loop for a gap-filler?
<?php
// your database accessor of heart (mine is PDO)
$query = $pdo->query($above_statement);
$results = array();
// start one step before offset 0 so that gaps at the very beginning are filled too
$previous_offset = -1;
foreach ($query as $row) {
    // calculate offset distance (cast, since PDO may return numbers as strings)
    $diff = (int) $row['week_offset'] - $previous_offset;
    // if distance is greater than a single step, fill the gaps
    for (; $diff > 1; $diff--) {
        $results[] = array(
            'value' => 0,
            'count' => 0,
            'average' => 0,
        );
    }
    // add data from db
    $results[] = array(
        'value' => $row['value'],
        'count' => $row['count'],
        'average' => $row['average'],
    );
    // remember where we were
    $previous_offset = (int) $row['week_offset'];
}
// 0 based list of voting weeks, enjoy
var_dump($results);
You might also be able to do the above right in MySQL using a function.
I need this for rent a car price calculation. Cars prices are different according to seasons.
I have a season_dates table like this
id slug start end
1 low 2011-01-01 00:00:00 2011-04-30 00:00:00
2 mid 2011-05-01 00:00:00 2011-06-30 00:00:00
3 high 2011-07-01 00:00:00 2011-08-31 00:00:00
4 mid 2011-09-01 00:00:00 2011-10-31 00:00:00
5 low 2011-11-01 00:00:00 2011-12-31 00:00:00
Users selecting days, for example:
start_day 08/20 end_day 08/25
My query like that:
SELECT * from arac_donemler
where DATE_FORMAT(start, '%m/%d') <= '08/20'
and DATE_FORMAT(end, '%m/%d') >= '08/25'
This gives me the high season, which is correct.
But what I couldn't handle is: what if the user selects a date range that spans 2 seasons?
For example, from 20 August to 05 September.
In that case I have to find which seasons the date range belongs to,
and I have to calculate how many days fall in each season.
For the example above,
high season ending at 31 August. So 31-20 = 11 days for high season, 5 days for mid season.
How can I provide this separation?
I hope I could explain it.
I tried so many things like join table inside but couldn't succeed it.
I'll let others chime in with the right way to do date comparisons in SQL (yours almost certainly kills indexing for the table), but for a start, you can get exactly the seasons that are relevant by
select * from arac_donemler
where end >= [arrival-date]
and start <= [departure-date]
Then you should do the rest of your processing (figure out how many days in each season and so forth) in the business logic instead of in the database query.
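As a sketch of that business-logic step, the Python below clips the rental period against each season and counts the days in the overlap. The name `days_per_season` is hypothetical, the season rows are copied from the question, and counting both the first and last day (inclusive) is an assumption: it yields 12 high-season days for 20-31 August, where the question's 31 - 20 arithmetic gives 11.

```python
from datetime import date

# Season rows from the question: (slug, start, end)
SEASONS = [
    ("low", date(2011, 1, 1), date(2011, 4, 30)),
    ("mid", date(2011, 5, 1), date(2011, 6, 30)),
    ("high", date(2011, 7, 1), date(2011, 8, 31)),
    ("mid", date(2011, 9, 1), date(2011, 10, 31)),
    ("low", date(2011, 11, 1), date(2011, 12, 31)),
]


def days_per_season(start: date, end: date):
    """Split a rental period into (slug, day_count) pairs, one per overlapping season."""
    result = []
    for slug, s_start, s_end in SEASONS:
        first = max(start, s_start)  # later of the two start dates
        last = min(end, s_end)       # earlier of the two end dates
        if first <= last:            # the periods overlap
            result.append((slug, (last - first).days + 1))  # inclusive count
    return result


print(days_per_season(date(2011, 8, 20), date(2011, 9, 5)))
print(days_per_season(date(2011, 8, 20), date(2011, 8, 25)))
```

Dropping the `+ 1` switches to exclusive counting if the business rule is that the return day is not charged.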
I would store all single days within a table.
This is a simple example.
create table dates (
id int not null auto_increment primary key,
pday date,
slug tinyint,
price int);
insert into dates (pday,slug,price)
values
('2011-01-01',1,10),
('2011-01-02',1,10),
('2011-01-03',2,20),
('2011-01-04',2,20),
('2011-01-05',2,20),
('2011-01-06',3,30),
('2011-01-07',3,30),
('2011-01-08',3,30);
select
concat(min(pday),'/',max(pday)) as period,
count(*) as days,
sum(price) as price_per_period
from dates
where pday between '2011-01-02' and '2011-01-07'
group by slug
+-----------------------+------+------------------+
| period | days | price_per_period |
+-----------------------+------+------------------+
| 2011-01-02/2011-01-02 | 1 | 10 |
| 2011-01-03/2011-01-05 | 3 | 60 |
| 2011-01-06/2011-01-07 | 2 | 60 |
+-----------------------+------+------------------+
3 rows in set (0.00 sec)
EDIT. Version with grandtotal
select
case
when slug is null then 'Total' else concat(min(pday),'/',max(pday)) end as period,
count(*) as days,
sum(price) as price_per_period
from dates
where pday between '2011-01-02' and '2011-01-07'
group by slug
with rollup;
+-----------------------+------+------------------+
| period | days | price_per_period |
+-----------------------+------+------------------+
| 2011-01-02/2011-01-02 | 1 | 10 |
| 2011-01-03/2011-01-05 | 3 | 60 |
| 2011-01-06/2011-01-07 | 2 | 60 |
| Total | 6 | 130 |
+-----------------------+------+------------------+
4 rows in set (0.00 sec)
edit. Stored procedure to populate table
delimiter $$
create procedure calendario(in anno int)
begin
declare i,ultimo int;
declare miadata date;
set i = 0;
select dayofyear(concat(anno,'-12-31')) into ultimo;
while i < ultimo do
select concat(anno,'-01-01') + interval i day into miadata;
insert into dates (pday) values (miadata);
set i = i + 1;
end while;
end $$
delimiter ;
call calendario(2011);
If you have a table RENTAL too (the real version would need a lot of other details in it):
CREATE TABLE Rental
(
start DATE NOT NULL,
end DATE NOT NULL
);
and you populate it with:
INSERT INTO rental VALUES('2011-08-20', '2011-09-05');
INSERT INTO rental VALUES('2011-08-20', '2011-08-25');
then this query produces a plausible result:
SELECT r.start AS r_start, r.end AS r_end,
s.start AS s_start, s.end AS s_end,
GREATEST(r.start, s.start) AS p_start,
LEAST(r.end, s.end) AS p_end,
DATEDIFF(LEAST(r.end, s.end), GREATEST(r.start, s.start)) + 1 AS days,
s.id, s.slug
FROM rental AS r
JOIN season_dates AS s ON r.start <= s.end AND r.end >= s.start;
It yields:
r_start r_end s_start s_end p_start p_end days id slug
2011-08-20 2011-09-05 2011-07-01 2011-08-31 2011-08-20 2011-08-31 12 3 high
2011-08-20 2011-09-05 2011-09-01 2011-10-31 2011-09-01 2011-09-05 5 4 mid
2011-08-20 2011-08-25 2011-07-01 2011-08-31 2011-08-20 2011-08-25 6 3 high
Note that I'm counting 12 days instead of 11; that's the +1 in the days expression. It gets tricky: you have to decide whether a car returned on the same day it was rented counts as one day's rental. What if it is returned the next day? Maybe the time matters? But that gets into detailed business rules rather than general principles. Maybe the duration is the larger of the raw DATEDIFF() and 1? Also note that only the rental start and end dates identify the rental in this schema; a real schema would have some sort of Rental Agreement Number in the rental table.
(Confession: simulated using IBM Informix 11.70.FC2 on MacOS X 10.7.1, but MySQL is documented as supporting LEAST, GREATEST, and DATEDIFF and I simulated those in Informix. The most noticeable difference might be that Informix has a DATE type without any time component, so there are no times needed or displayed.)
But [...] seasons period always same every year. So I thought to compare only days and months. 2011 isn't important. Next years just 2011 will be used. This time problem occurs. For example low season includes November, December and then go to January, February, March, April. If a user selects a date range 01.05.2011 to ...2011 There is no problem. I just compare month and day with DATE_FORMAT(end, '%m/%d'). But if he chooses a range from December to next year January, how am I gonna calculate days?
Notice that 5 entries per year in the Season_Dates table is not going to make an 8" floppy disk break a sweat over storage capacity for a good few years, let alone a 500 GiB monster disk. So, by far the simplest thing is to define the entries for 2012 in 5 new rows in the Season_Dates table. That also allows you to handle the case where, in December, the powers-that-be decide the rules will be different (20th December to 4th January will be 'mid', not 'low' season, for example).