I'm writing a script using PHP & MySQL where I can record the shifts I work (HGV driver).
Upon posting the form data PHP calculates shift duration, wages accumulated, overtime, distance driven, etc, and stores it in the MySQL database.
I want to then display all shifts in a table but group them by my pay week which unfortunately starts on a Sunday.
If the pay week was Mon-Sun I wouldn't have this problem as I could use week numbers but I can't due to the week starting on a Sunday.
My code is as follows:
// DB Connection //
// Return the earliest shift in the database //
$result = $db->query("SELECT * FROM `shifts` ORDER BY `shift_start` ASC LIMIT 1");
$data = $result->fetch_assoc();
// Establish the previous Sunday //
$week_from = strtotime(date('Y-m-d',mktime(0,0,0,date('m',$data['shift_start']),date('d',$data['shift_start']),date('y',$data['shift_start']))) . 'last sunday');
// PHP Loop Goes Here //
Firstly, is the above code the most efficient way of getting the start date (previous Sunday)?
Secondly, what's the best way to loop through the weeks where there are shifts?
TIA
This is a two part question, so I will try to cover them separately.
Regarding your first question, I would suggest using the MIN() function when selecting the smallest or earliest value in a database, and ensuring you have an index on the "shift_start" column. More information on the difference between MIN() and ORDER BY/LIMIT can be found here.
Then your query would look something like this:
SELECT MIN(`shift_start`) FROM `shifts`;
Personally, I also find MIN() far more readable.
Now, for the other (and far more complicated) question:
You've not provided much detail on what your database (or its contents) looks like. Since you're using the PHP date function, I am assuming you're saving the timestamps as UNIX timestamps rather than in MySQL TIMESTAMP/DATETIME columns.
Firstly, I would suggest you migrate to using a TIMESTAMP/DATETIME column type. It'll simplify the query you're attempting to run.
If you're unable to change to a TIMESTAMP/DATETIME column, then you can convert a UNIX timestamp to a DATETIME with FROM_UNIXTIME().
MySQL has a YEARWEEK() function that you can use to group by. In its default mode, weeks start on Sunday, which matches your pay week:
SELECT STR_TO_DATE(CONCAT(YEARWEEK(`shift_start`), ' Monday'), '%X%V %W') AS `date`, SUM(`wage`) AS `wage` FROM `shifts` GROUP BY YEARWEEK(`shift_start`);
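This will output something similar to the following, where each Sunday-based week is labelled with the Monday that falls in it (change ' Monday' to ' Sunday' in the query to label each week with its first day instead):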
+------------+------+
| date       | wage |
+------------+------+
| 2021-11-29 |   50 |
| 2021-12-06 |   15 |
+------------+------+
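To see what the Sunday-based grouping amounts to, here is a minimal Python sketch of the same idea outside the database. It assumes each shift is a (shift_start, wage) pair; the function names and the three sample shifts are made up for illustration and are not part of the original schema.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def pay_week_start(day: date) -> date:
    """Return the Sunday on or before `day` (the start of the pay week)."""
    # date.weekday() is Monday=0 ... Sunday=6, so (weekday + 1) % 7 is the
    # number of days elapsed since the most recent Sunday.
    return day - timedelta(days=(day.weekday() + 1) % 7)


def wages_by_pay_week(shifts):
    """Sum wages per pay week; `shifts` is an iterable of (datetime, wage)."""
    totals = defaultdict(int)
    for shift_start, wage in shifts:
        totals[pay_week_start(shift_start.date())] += wage
    return dict(sorted(totals.items()))


# Hypothetical sample data, not taken from the question.
shifts = [
    (datetime(2021, 11, 29, 6, 0), 30),  # Monday
    (datetime(2021, 12, 4, 6, 0), 20),   # Saturday, same pay week
    (datetime(2021, 12, 5, 6, 0), 15),   # Sunday, starts the next pay week
]
weekly = wages_by_pay_week(shifts)
```

Keying each total by the Sunday itself avoids week-number edge cases around New Year, because the key is a plain date rather than a (year, week) pair.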
I'm working on a project written in Laravel 5.2 that has two tables that need to be queried to produce a summary of the number of records created per year. Here is a simplified layout of the schema with some sample data:
Matters Table
id  created_at
--  -------------------
1   2016-01-05 10:00:00
2   2016-03-09 11:00:00
3   2017-01-03 10:00:00
4   2015-05-06 11:00:00
Notes Table
id  created_at
--  -------------------
1   2015-07-08 10:00:00
2   2016-03-16 11:00:00
3   2017-09-03 10:00:00
4   2017-11-06 11:00:00
Each table has several hundred thousand records, so I'd like to be able to (efficiently) query my data to produce the following results with the counts of each table by year:
year  matters  notes
----  -------  -----
2015  1        1
2016  2        1
2017  1        2
I need each column to be sortable. Currently, the fastest way I can think of to do this is to have two queries like the following and then combine the results of the two via PHP:
SELECT YEAR(matters.created_at) AS 'year', COUNT(1) AS 'matters'
FROM matters
GROUP BY YEAR(matters.created_at);

SELECT YEAR(notes.created_at) AS 'year', COUNT(1) AS 'notes'
FROM notes
GROUP BY YEAR(notes.created_at);
But I'm wondering if there is a better way, especially since I have to work in sorting each column based on the user's needs.
Any thoughts?
The year bit could be used to create a JOIN between the tables (and their respective result sets) to produce the combined output that can further be sorted by any field of choice:
SELECT t1.year_rec, IFNULL(t1.matters, 0) AS matters, IFNULL(t2.notes, 0) AS notes
FROM
(
SELECT DATE_FORMAT(created_at, "%Y" ) AS year_rec, COUNT(id) AS matters
FROM matters
GROUP BY year_rec
) t1
LEFT JOIN
(
SELECT DATE_FORMAT(created_at, "%Y" ) AS year_rec, COUNT(id) AS notes
FROM notes
GROUP BY year_rec
) t2
ON t1.year_rec = t2.year_rec
-- ORDER BY --
Demo
Caveat:
Keeping in mind the nature of JOINs and that MySQL doesn't provide a direct means to write FULL OUTER JOIN, you'd notice that if a certain year doesn't have any records in the first table participating in the LEFT JOIN, then that year will be omitted from the final output produced. If we changed the query to have a RIGHT JOIN, then the omission will be carried out based on the second table instead. However, if that is not going to happen in your case (i.e. if you think there will always be records for every year in both the tables), you needn't worry about this situation, but it'll still be good to be aware of this loophole.
For further reading: Full Outer Join in MySQL
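If the two grouped queries are run separately, as in the question, the missing-year problem can also be avoided on the application side by merging on the union of years. The following is only a sketch in Python (the project itself is PHP/Laravel); the function names and the extra 2018 entry are hypothetical, added to show a year that exists in one table only.

```python
def merge_year_counts(matters, notes):
    """Combine two {year: count} mappings, keeping years found in either one."""
    years = set(matters) | set(notes)
    return [(year, matters.get(year, 0), notes.get(year, 0)) for year in sorted(years)]


def sort_rows(rows, column, descending=False):
    """Sort (year, matters, notes) rows by the named column."""
    index = {"year": 0, "matters": 1, "notes": 2}[column]
    return sorted(rows, key=lambda row: row[index], reverse=descending)


# Counts from the question, plus a hypothetical 2018 that has notes only.
matters = {2015: 1, 2016: 2, 2017: 1}
notes = {2015: 1, 2016: 1, 2017: 2, 2018: 3}
rows = merge_year_counts(matters, notes)
```

Because the merge starts from the union of the keys, a year present in only one mapping still appears, with 0 for the other count, which is the behaviour a FULL OUTER JOIN would give.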
I'm querying a postgresql database which holds an agenda-table:
agenda |> id (int) | start (timestamp) | end (timestamp) | facname | .....
I want to make a kind of summary of one day in the form of a 'timeline' consisting of a small picture for every 15-minute interval: on / off according to the availability of the facility.
Now it is relatively simple to query the database for every 15 minutes, check if a reservation is present and change the img source.
But if you want to make an overview of 10 days and 5 different facilities you'll end up querying the database
10 (days) * 36 (quarters a day) * 5 (facilities) = 1800 database queries/page load.
So this results in a very heavy payload.
Is there a way I can reduce the number of queries and so the payload?
To solve this issue, I think we first need a way to find, given a timestamp, which quarter of an hour it belongs to. For instance, 08:38 belongs to quarter 08:30, 08:51 to 08:45, and so on.
To do that, we can use a function like this:
CREATE FUNCTION date_trunc_quarter(timestamp)
RETURNS TIMESTAMP
LANGUAGE SQL
IMMUTABLE
AS $$
    SELECT * FROM
        generate_series(
            date_trunc('hour', $1),
            date_trunc('hour', $1) + interval '1hour',
            interval '15min'
        ) AS gen(quarter)
    WHERE gen.quarter <= $1
    ORDER BY gen.quarter DESC
    LIMIT 1
$$;
It uses the generate_series function to generate the quarter-hour marks (e.g. 08:00, 08:15, 08:30, 08:45 and 09:00) around the given timestamp (e.g. 08:38); to get the start of the hour it uses the well-known date_trunc function. Then it keeps only the quarters that are not later than the given timestamp, sorts them in descending order and takes the first (latest) one. As there are always at most five values, sorting them is not a big issue.
Now, with that you can easily query like this:
SELECT date_trunc_quarter(tstart) AS quarter, count(*)
FROM agenda
GROUP BY quarter
ORDER BY quarter;
I think it is fast enough, and to make it even faster, you can create an expression index on agenda:
CREATE INDEX idx_agenda_quarter ON agenda ((date_trunc_quarter(tstart)));
See this fiddle with a self-contained test case of it all.
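Once the reservations for the whole overview have been fetched in a single query, the on/off timeline can be built in application code. Below is a minimal Python sketch of that step, assuming reservations arrive as (start, end) datetime pairs; the function names are hypothetical and the truncation uses plain arithmetic rather than the SQL function above.

```python
from datetime import datetime, timedelta

QUARTER = timedelta(minutes=15)


def trunc_quarter(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its quarter of an hour."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def busy_quarters(reservations):
    """Return the set of quarter starts touched by any (start, end) reservation."""
    busy = set()
    for start, end in reservations:
        slot = trunc_quarter(start)
        while slot < end:  # end is treated as exclusive
            busy.add(slot)
            slot += QUARTER
    return busy


# Hypothetical reservation from 09:10 to 09:50.
busy = busy_quarters([(datetime(2024, 1, 1, 9, 10), datetime(2024, 1, 1, 9, 50))])
```

Rendering the page then becomes a set lookup per 15-minute slot instead of one query per slot.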
I have to produce some statistics for my application, so I need an algorithm that performs as well as possible. I have several questions.
I have a data structure like this in the mysql database:
user_id group_id date
1 5 2012-11-20
1 2 2012-11-01
1 4 2012-11-01
1 3 2012-10-15
1 9 2013-01-18
...
I need to find the group(s) a user belonged to on a specific date. For example, for user 1 on 2012-11-15 (15 November 2012), the query should return the most recent groups, which are 2 and 4 (a user can be in several groups at once), dated 2012-11-01 (the closest earlier date).
Normally, I could do a SELECT ... WHERE date <= chosen date ORDER BY date DESC, etc., but that's not the point: with 1000 users, it would take 1000 requests to get all the results.
So here are my questions:
I have already tried looping through the array in PHP to avoid the high number of MySQL requests, but that's still not good because the array may hold 10000+ elements. Using foreach (or for?) is quite costly.
So my first question is: given an array ordered by date (desc or asc), what's the fastest way to find the index of the closest element whose date is smaller (or greater) than a given date, other than looping through every element with for or foreach?
If there is no solution to the first question, what kind of data structure would you suggest for this kind of problem?
Note: the dates are in MySQL format; they are not converted to timestamps when stored in the array.
EDIT: this is a sql fiddle http://sqlfiddle.com/#!2/dc28d/1
For dos_id = 6, t="2012-11-01" it should return only 2 and 5 at date "2010-12-10 13:16:58"
Not sure why you'd want to do this in PHP. Here's some SQL that uses a self-join to get the most recent group(s) for all users as of a given date. Make sure you've got indexes on date and userid.
SELECT *
FROM test t1
LEFT JOIN test t2
ON t1.userid = t2.userid AND t2.thedate <= '2012-11-15' AND t2.thedate > t1.thedate
WHERE t1.thedate <= '2012-11-15' AND t2.userid IS NULL;
SQLfiddle
Or using your SQLFiddle
SELECT t1.*
FROM dossier_dans_groupe t1
LEFT JOIN dossier_dans_groupe t2
ON t1.dos_id = t2.dos_id AND t2.updated_at <= '2012-11-01'
AND t2.updated_at > t1.updated_at
WHERE t1.updated_at <= '2012-11-01' AND t2.dos_id IS NULL;
This would give you a list of all users and their groups (1 row per group) for the latest date that is not later than the one you specify (2012-11-15 below). Note that table is a reserved word, so the placeholder table name has to be quoted with backticks (or replaced with your real table name).
SELECT t.user_id, t.group_id, t.date FROM `table` t WHERE t.date <= '2012-11-15' AND NOT EXISTS (SELECT 1 FROM `table` t2 WHERE t2.user_id = t.user_id AND t2.date > t.date AND t2.date <= '2012-11-15');
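As for the in-memory variant asked about in the question (an array already sorted by date), a binary search finds the position in O(log n) comparisons instead of scanning every element. Here is a minimal sketch in Python using the standard bisect module; the function name and the parallel-list layout are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def groups_on(dates, groups, cutoff):
    """Return the groups recorded on the latest date that is <= cutoff.

    `dates` must be sorted ascending and `groups` must be in the same order.
    ISO 'YYYY-MM-DD' strings compare correctly as plain strings.
    """
    end = bisect_right(dates, cutoff)  # index of the first date > cutoff
    if end == 0:
        return []  # nothing on or before the cutoff
    start = bisect_left(dates, dates[end - 1])  # first row sharing that date
    return groups[start:end]


# User 1's rows from the question, sorted by date.
dates = ["2012-10-15", "2012-11-01", "2012-11-01", "2012-11-20", "2013-01-18"]
groups = [3, 2, 4, 5, 9]
```

Since MySQL date strings are zero-padded, they sort correctly without being converted to timestamps first.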
Problem - Retrieve sum of subtotals on a half hour interval efficiently
I am using MySQL and I have a table containing subtotals with different times. I want to retrieve the sum of these sales on a half hour interval from 7 am through 12 am. My current solution (below) works but takes 13 seconds to query about 150,000 records. I intend to have several million records in the future and my current method is too slow.
How can I make this more efficient or, if possible, replace the PHP component with pure SQL? Also, would your solution be even more efficient if I used Unix timestamps instead of separate date and time columns?
Table Name - Receipts
subtotal date time sale_id
--------------------------------------------
6 09/10/2011 07:20:33 1
5 09/10/2011 07:28:22 2
3 09/10/2011 07:40:00 3
5 09/10/2011 08:05:00 4
8 09/10/2011 08:44:00 5
...............
10 09/10/2011 18:40:00 6
5 09/10/2011 23:05:00 7
Desired Result
An array like this:
Half hour 1 ::: (7:00 to 7:30) => Sum of Subtotal is 11
Half hour 2 ::: (7:30 to 8:00) => Sum of Subtotal is 3
Half hour 3 ::: (8:00 to 8:30) => Sum of Subtotal is 5
Half hour 4 ::: (8:30 to 9:00) => Sum of Subtotal is 8
Current Method
The current way uses a for loop which starts at 7 am and increments 1800 seconds, equivalent to a half hour. As a result, this makes about 34 queries to the database.
for ($n = strtotime("07:00:00"), $e = strtotime("23:59:59"); $n <= $e; $n += 1800) {
    $timeA = date("H:i:s", $n);
    $timeB = date("H:i:s", $n + 1799);
    $query = $mySQL->query("SELECT SUM(subtotal)
                            FROM Receipts WHERE time > '$timeA'
                            AND time < '$timeB'");
    while ($row = $query->fetch_object()) {
        $sum[] = $row;
    }
}
Current Output
Output is just an array where:
[0] represents 7 am to 7:30 am
[1] represents 7:30 am to 8:00 am
[33] represents 11:30 pm to 11:59:59 pm.
array ("0" => 10000,
"1" => 20000,
..............
"33" => 5000);
You can try this single query as well, it should return a result set with the totals in 30 minute groupings:
SELECT date, MIN(time) as time, SUM(subtotal) as total
FROM `Receipts`
WHERE `date` = '2012-07-30'
GROUP BY hour(time), floor(minute(time)/30)
To run this efficiently, add a composite index on the date and time columns.
You should get back a result set like:
+---------------------+--------------------+
| time | total |
+---------------------+--------------------+
| 2012-07-30 00:00:00 | 0.000000000 |
| 2012-07-30 00:30:00 | 0.000000000 |
| 2012-07-30 01:00:00 | 0.000000000 |
| 2012-07-30 01:30:00 | 0.000000000 |
| 2012-07-30 02:00:00 | 0.000000000 |
| 2012-07-30 02:30:00 | 0.000000000 |
| 2012-07-30 03:00:00 | 0.000000000 |
| 2012-07-30 03:30:00 | 0.000000000 |
| 2012-07-30 04:00:00 | 0.000000000 |
| 2012-07-30 04:30:00 | 0.000000000 |
| 2012-07-30 05:00:00 | 0.000000000 |
| ...
+---------------------+--------------------+
First, I would use a single DATETIME column, but using a DATE and TIME column will work.
You can do all the work in one pass using a single query:
select date,
hour(`time`) hour_num,
IF(MINUTE(`time`) < 30, 0, 1) interval_num,
min(`time`) interval_begin,
max(`time`) interval_end,
sum(subtotal) sum_subtotal
from receipts
where date='2012-07-31'
group by date, hour_num, interval_num;
UPDATE:
Since you aren't concerned with any "missing" rows, I'm also going to assume (probably wrongly) that you aren't concerned that the query might possibly return rows for periods that are not from 7AM to 12AM. This query will return your specified result set:
SELECT (HOUR(r.time)-7)*2+(MINUTE(r.time) DIV 30) AS i
, SUM(r.subtotal) AS sum_subtotal
FROM Receipts r
GROUP BY i
ORDER BY i
This returns the period index (i) derived from an expression referencing the time column. For best performance of this query, you probably want to have a "covering" index available, for example:
ON Receipts(`time`,`subtotal`)
If you are going to include an equality predicate on the date column (which does not appear in your solution, but which does appear in the solution of the "selected" answer), then it would be good to have that column as the leading column in the "covering" index.
ON Receipts(`date`,`time`,`subtotal`)
If you want to ensure that you are not returning any rows for periods before 7AM, then you could simply add a HAVING i >= 0 clause to the query. (Rows for periods before 7AM would generate a negative number for i.)
SELECT (HOUR(r.time)-7)*2+(MINUTE(r.time) DIV 30) AS i
, SUM(r.subtotal) AS sum_subtotal
FROM Receipts r
GROUP BY i
HAVING i >= 0
ORDER BY i
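The period-index arithmetic can be checked by hand against the sample rows from the question. The sketch below is in Python rather than SQL, and the function names are hypothetical; it only mirrors the expression (HOUR-7)*2 + (MINUTE DIV 30) and the HAVING filter.

```python
from collections import defaultdict
from datetime import time


def period_index(t: time) -> int:
    """Half-hour period number counted from 07:00 (07:00:00-07:29:59 is 0)."""
    return (t.hour - 7) * 2 + t.minute // 30


def sums_by_period(receipts):
    """Sum subtotals per period; `receipts` is an iterable of (time, subtotal)."""
    totals = defaultdict(int)
    for t, subtotal in receipts:
        i = period_index(t)
        if i >= 0:  # same role as HAVING i >= 0
            totals[i] += subtotal
    return dict(sorted(totals.items()))


# The sample rows from the question's Receipts table.
receipts = [
    (time(7, 20, 33), 6), (time(7, 28, 22), 5), (time(7, 40, 0), 3),
    (time(8, 5, 0), 5), (time(8, 44, 0), 8), (time(18, 40, 0), 10),
    (time(23, 5, 0), 5),
]
totals = sums_by_period(receipts)
```

The first four periods come out as 11, 3, 5 and 8, matching the desired result listed in the question.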
PREVIOUSLY:
I've assumed that you want a result set similar to the one you are currently returning, but in one fell swoop. This query will return the same 34 rows you are currently retrieving, but with an extra column identifying the period (0 - 33). This is as close to your current solution as I could get:
SELECT t.i
, IFNULL(SUM(r.subtotal),0) AS sum_subtotal
FROM (SELECT (d1.i + d2.i + d4.i + d8.i + d16.i + d32.i) AS i
, ADDTIME('07:00:00',SEC_TO_TIME((d1.i+d2.i+d4.i+d8.i+d16.i+d32.i)*1800)) AS b_time
, ADDTIME('07:30:00',SEC_TO_TIME((d1.i+d2.i+d4.i+d8.i+d16.i+d32.i)*1800)) AS e_time
FROM (SELECT 0 i UNION ALL SELECT 1) d1 CROSS
JOIN (SELECT 0 i UNION ALL SELECT 2) d2 CROSS
JOIN (SELECT 0 i UNION ALL SELECT 4) d4 CROSS
JOIN (SELECT 0 i UNION ALL SELECT 8) d8 CROSS
JOIN (SELECT 0 i UNION ALL SELECT 16) d16 CROSS
JOIN (SELECT 0 i UNION ALL SELECT 32) d32
HAVING i <= 33
) t
LEFT
JOIN Receipts r ON r.time >= t.b_time AND r.time < t.e_time
GROUP BY t.i
ORDER BY t.i
Some important notes:
It looks like your current solution may be "missing" rows from Receipts whenever the time falls exactly on a period boundary (for example 07:00:00 or 07:29:59), because both comparisons are strict.
It also looks like you aren't concerned with the date component, you are just getting a single value for all dates. (I may have misread that.) If so, the separation of the DATE and TIME columns helps with this, because you can reference the bare TIME column in your query.
It's easy to restrict the query to a single date, e.g. to get the subtotal rollups for just one day. Because Receipts is on the outer side of the LEFT JOIN, add the predicate to the ON clause rather than to a WHERE clause; a WHERE predicate on r.date would discard the periods that have no matching receipts.
AND r.date = '2011-09-10'
A covering index ON Receipts(time,subtotal) (if you don't already have a covering index) may help with performance. (If you include an equality predicate on the date column, as in the predicate above, the most suitable covering index would likely be ON Receipts(date,time,subtotal).)
I've made an assumption that the time column is of datatype TIME. (If it isn't, then a small adjustment to the query (in the inline view aliased as t) is probably called for, to have the datatype of the (derived) b_time and e_time columns match the datatype of the time column in Receipts.)
Some of the proposed solutions in other answers are not guaranteed to return 34 rows when there are no rows in Receipts within a given time period. "Missing rows" may not be an issue for you, but it is a frequent issue with timeseries and timeperiod data.
I've made the assumption that you would prefer to have a guarantee of 34 rows returned. The query above returns a subtotal of zero when no rows are found matching a time period. (I note that your current solution will return a NULL in that case. I've gone and wrapped that SUM aggregate in an IFNULL function, so that it will return a 0 when the SUM is NULL.)
So, the inline query aliased as t is an ugly mess, but it works fast. What it's doing is generating 34 rows, with distinct integer values 0 thru 33. At the same time, it derives a "begin time" and an "end time" that will be used to "match" each period to the time column on the Receipts table.
We take care not to wrap the time column from the Receipts table in any functions, but reference just the bare column. And we want to ensure we don't have any implicit conversion going on (which is why we want the datatypes of b_time and e_time to match). The ADDTIME and SEC_TO_TIME functions both return TIME datatype. (We can't get around doing the matching and the GROUP BY operations.)
The "end time" value for that last period is returned as "24:00:00", and we verify that this is a valid time for matching by running this test:
SELECT MAKETIME(23,59,59) < MAKETIME(24,0,0)
which is successful (returns a 1) so we're good there.
The derived columns (t.b_time and t.e_time) could be included in the resultset as well, but they aren't needed to create your array, and it's (likely) more efficient if you don't include them.
And one final note: for optimal performance, it may be beneficial to load the inline view aliased as t into an actual table (a temporary table would be fine), and then reference that table in place of the inline view. The advantage of doing that is that you could create an index on that table.
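To make the row generator less opaque, here is a small Python sketch of what the inline view t computes and of how the LEFT JOIN zero-fills empty periods. It is an illustration only: the function names are hypothetical, times of day are modelled as timedelta offsets from midnight, and the three receipts are made-up sample values.

```python
from datetime import timedelta
from itertools import product

HALF_HOUR = timedelta(minutes=30)


def period_table(last_index=33):
    """Rows of (i, begin, end) for half-hour periods starting at 07:00."""
    rows = []
    # Each factor contributes 0 or one power of two, like d1 ... d32 in the
    # query, so the sums cover every integer from 0 to 63 exactly once.
    for bits in product((0, 1), (0, 2), (0, 4), (0, 8), (0, 16), (0, 32)):
        i = sum(bits)
        if i <= last_index:  # same role as HAVING i <= 33
            begin = timedelta(hours=7) + i * HALF_HOUR
            rows.append((i, begin, begin + HALF_HOUR))
    return sorted(rows)


def zero_filled_sums(receipts, periods):
    """Sum subtotals per period; periods without receipts yield 0."""
    return [
        sum(sub for t, sub in receipts if begin <= t < end)
        for _, begin, end in periods
    ]


periods = period_table()
# Hypothetical receipts as (time of day, subtotal).
receipts = [
    (timedelta(hours=7, minutes=20, seconds=33), 6),
    (timedelta(hours=7, minutes=40), 3),
    (timedelta(hours=23, minutes=5), 5),
]
sums = zero_filled_sums(receipts, periods)
```

The nested scan in zero_filled_sums is only there to mirror the join condition; it is not meant as an efficient replacement for the indexed query.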
One way to make it pure SQL is to use a lookup table. I don't know MySQL that well, so there may be a lot of room for improvement in the code. All my code is for MS SQL Server.
I would do it something like this:
/* Mock salesTable */
DECLARE @SalesTable TABLE (SubTotal int, SaleDate datetime)
INSERT INTO @SalesTable (SubTotal, SaleDate) VALUES (1, '2012-08-01 12:00')
INSERT INTO @SalesTable (SubTotal, SaleDate) VALUES (2, '2012-08-01 12:10')
INSERT INTO @SalesTable (SubTotal, SaleDate) VALUES (3, '2012-08-01 12:15')
INSERT INTO @SalesTable (SubTotal, SaleDate) VALUES (4, '2012-08-01 12:30')
INSERT INTO @SalesTable (SubTotal, SaleDate) VALUES (5, '2012-08-01 12:35')
INSERT INTO @SalesTable (SubTotal, SaleDate) VALUES (6, '2012-08-01 13:00')
INSERT INTO @SalesTable (SubTotal, SaleDate) VALUES (7, '2012-08-01 14:00')
/* input data */
DECLARE @From datetime, @To datetime, @intervall int
SET @From = '2012-08-01'
SET @To = '2012-08-02'
SET @intervall = 30
/* Create lookup table */
DECLARE @lookup TABLE (StartTime datetime, EndTime datetime)
DECLARE @tmpTime datetime
SET @tmpTime = @From
WHILE (@tmpTime <= @To)
BEGIN
    INSERT INTO @lookup (StartTime, EndTime) VALUES (@tmpTime, DATEADD(mi, @intervall, @tmpTime))
    SET @tmpTime = DATEADD(mi, @intervall, @tmpTime)
END
/* Get data */
SELECT l.StartTime, l.EndTime, SUM(SubTotal) FROM @SalesTable AS SalesTable
JOIN @lookup AS l ON SalesTable.SaleDate >= l.StartTime AND SalesTable.SaleDate < l.EndTime
GROUP BY l.StartTime, l.EndTime
In my query, I'm assuming one datetime field named date. This will give you all the groups starting at whatever datetime you give it to start with:
SELECT
FLOOR(TIMESTAMPDIFF(MINUTE, '2011-08-01 00:00:00', date) / 30) AS period
, SUM(subtotal) AS subtotals
FROM
Receipts
GROUP BY
FLOOR(TIMESTAMPDIFF(MINUTE, '2011-08-01 00:00:00', date) / 30)
ORDER BY
period
Always use the proper datatypes for your data. In the case of your date/time columns, it's best to store them as (preferably UTC-zoned) timestamps. This is especially true because some times don't exist on some dates in some time zones (hence UTC). You will want an index on this column.
Also, your date/time range isn't going to give you what you want: you're missing anything that falls exactly on a range boundary (because you use strict greater-than and less-than comparisons). Always define ranges as 'lower-bound inclusive, upper-bound exclusive' (so, time >= '07:00:00' AND time < '07:30:00'). This is especially important for timestamps, which have an additional number of fields to deal with.
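The difference between the two range styles is easy to demonstrate on the boundary values. This is a minimal Python sketch, with hypothetical names, that counts how many periods claim each time under the question's strict bounds versus half-open bounds.

```python
from datetime import time

# Bounds as produced by the question's loop: strict on both sides.
strict_periods = [
    (time(7, 0, 0), time(7, 29, 59)),
    (time(7, 30, 0), time(7, 59, 59)),
]
# Lower-bound inclusive, upper-bound exclusive.
half_open_periods = [
    (time(7, 0), time(7, 30)),
    (time(7, 30), time(8, 0)),
]


def strict_hits(t):
    """Number of strict periods that contain t."""
    return sum(1 for lower, upper in strict_periods if lower < t < upper)


def half_open_hits(t):
    """Number of half-open periods that contain t."""
    return sum(1 for lower, upper in half_open_periods if lower <= t < upper)


boundary_times = [time(7, 0, 0), time(7, 29, 59), time(7, 30, 0)]
strict_counts = [strict_hits(t) for t in boundary_times]
half_open_counts = [half_open_hits(t) for t in boundary_times]
```

With half-open ranges every time lands in exactly one period, so nothing is dropped and nothing is counted twice.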
Because MySQL doesn't have recursive queries (recursive CTEs only arrived in MySQL 8.0), you're going to want a couple of extra tables to pull this off. I'm referencing them as 'permanent' tables, but it would certainly be possible to define them in-line, if necessary.
You're going to want a Calendar table. These are useful for a number of reasons, but here we want them for their listing of dates. This will allow us to show dates that have subtotals of 0, if necessary. You're also going to want a table of times in half-hour increments (Clock in the query), for the same reasons.
This should allow you to query your data like so:
SELECT division, COALESCE(SUM(subtotal), 0)
FROM (SELECT TIMESTAMP(calendar_date, clock_time) as division
FROM Calendar
CROSS JOIN Clock
WHERE calendar_date >= DATE('2011-09-10')
AND calendar_date < DATE('2011-09-11')) as divisions
LEFT JOIN Sales_Data
ON occurredAt >= division
AND occurredAt < division + INTERVAL 30 MINUTE
GROUP BY division
(Working example on SQLFiddle, which uses a regular JOIN for brevity)
I found a different solution too and am posting it here for reference should anyone stumble upon this. It groups by half-hour intervals.
SELECT SUM(total), time, date
FROM tableName
GROUP BY (2*HOUR(time) + FLOOR(MINUTE(time)/30))
Link for more info
http://www.artfulsoftware.com/infotree/queries.php#106