Breaking up a month into individual days on a large table

I've been trying to figure out an effective way to break up a month of click data into individual days for a graph, but most of the queries I've put together so far are taking 20-30 seconds because I'm having trouble thinking of a way to do it without subtables \ subqueries. Best I've come up with so far is:
SELECT
SUM(CASE WHEN ( TIME BETWEEN '2018/04/09' AND '2018/04/10') THEN 1 ELSE 0 END) 9th,
SUM(CASE WHEN ( TIME BETWEEN '2018/04/08' AND '2018/04/09') THEN 1 ELSE 0 END) 8th,
SUM(CASE WHEN ( TIME BETWEEN '2018/04/07' AND '2018/04/08') THEN 1 ELSE 0 END) 7th,
SUM(CASE WHEN ( TIME BETWEEN '2018/04/06' AND '2018/04/07') THEN 1 ELSE 0 END) 6th,
SUM(CASE WHEN ( TIME BETWEEN '2018/04/05' AND '2018/04/06') THEN 1 ELSE 0 END) 5th,
SUM(CASE WHEN ( TIME BETWEEN '2018/04/04' AND '2018/04/05') THEN 1 ELSE 0 END) 4th,
SUM(CASE WHEN ( TIME BETWEEN '2018/04/03' AND '2018/04/04') THEN 1 ELSE 0 END) 3rd,
SUM(CASE WHEN ( TIME BETWEEN '2018/04/02' AND '2018/04/03') THEN 1 ELSE 0 END) 2nd
FROM
(
SELECT TIME, BIN_IP FROM CLICKS_IN WHERE USER_GROUP = 4 AND TIME BETWEEN '2018/04/02' AND '2018/04/10'
)a;
Explain:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | CLICKS_IN | NULL | ref | USER_GROUP,TIME | USER_GROUP | 2 | const | 1614964 | 26.65 | Using where
Or variations thereof, but it's still terribly inefficient given the potential record count (I can potentially get 100k+ clicks a day). Also, the actual code uses prepared statements, so no need to point that out; I put the values in here for clarity's sake.
Edit: I've found using the following is a -lot- faster but I'm concerned it will run into problems when I have more user groups on the system.
SELECT SUM(TIME >= '2018/04/09' AND TIME < '2018/04/10') as 9th,
SUM(TIME >= '2018/04/08' AND TIME < '2018/04/09') as 8th,
SUM(TIME >= '2018/04/07' AND TIME < '2018/04/08') as 7th,
SUM(TIME >= '2018/04/06' AND TIME < '2018/04/07') as 6th,
SUM(TIME >= '2018/04/05' AND TIME < '2018/04/06') as 5th,
SUM(TIME >= '2018/04/04' AND TIME < '2018/04/05') as 4th,
SUM(TIME >= '2018/04/03' AND TIME < '2018/04/04') as 3rd
FROM CLICKS_IN USE INDEX (TIME)
WHERE TIME BETWEEN '2018/04/02' AND '2018/04/10'
AND USER_GROUP = 4

SELECT SUM(TIME >= '2018-04-09' AND TIME < '2018-04-10') as 9th,
SUM(TIME >= '2018-04-08' AND TIME < '2018-04-09') as 8th
FROM CLICKS_IN
WHERE USER_GROUP = 4
AND TIME >= '2018-04-02'
AND TIME < '2018-04-11'
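Also make sure the TIME and USER_GROUP columns are indexed, ideally with one composite index on (USER_GROUP, TIME) so that the equality filter and the date range can use the same index. Then it should run in a few milliseconds.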
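One reason to prefer the half-open ranges (>= start AND < next day) over BETWEEN is that BETWEEN includes both end points, so a click stamped exactly at midnight is counted on two days. Here is a minimal sketch of that effect, using Python's built-in sqlite3 module as a stand-in for MySQL. The table contents are made up, and the bounds are written as full timestamps because SQLite compares these values as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks_in (user_group INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO clicks_in VALUES (?, ?)",
    [
        (4, "2018-04-08 13:00:00"),
        (4, "2018-04-09 00:00:00"),  # exactly midnight: belongs to the 9th only
        (4, "2018-04-09 18:30:00"),
        (5, "2018-04-09 09:00:00"),  # other user group, filtered out
    ],
)

# Half-open ranges: each click falls into exactly one day.
half_open = conn.execute(
    """
    SELECT SUM(time >= '2018-04-09 00:00:00' AND time < '2018-04-10 00:00:00') AS d9,
           SUM(time >= '2018-04-08 00:00:00' AND time < '2018-04-09 00:00:00') AS d8
    FROM clicks_in
    WHERE user_group = 4
    """
).fetchone()

# BETWEEN is inclusive on both ends: the midnight click lands in both days.
between = conn.execute(
    """
    SELECT SUM(time BETWEEN '2018-04-09 00:00:00' AND '2018-04-10 00:00:00') AS d9,
           SUM(time BETWEEN '2018-04-08 00:00:00' AND '2018-04-09 00:00:00') AS d8
    FROM clicks_in
    WHERE user_group = 4
    """
).fetchone()

print(half_open, between)
```

With three clicks for group 4, the half-open version accounts for three clicks in total while the BETWEEN version reports four.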

You might do this where each day is returned as a row. Pivoting it from rows to columns could be done more efficiently in the calling PHP code.
SELECT
DAYOFMONTH(TIME) as `day`,
COUNT(*) as `numclicks`
FROM `CLICKS_IN`
WHERE USER_GROUP = 4 AND TIME BETWEEN '2018/04/02' AND '2018/04/10'
GROUP BY DAYOFMONTH(TIME)
ORDER BY DAYOFMONTH(TIME)
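The pivot in the calling code could look roughly like the sketch below. It is written in Python rather than PHP so it can be run standalone; the function name pivot_daily_counts and the sample rows are made up for illustration. It also fills in days that the GROUP BY query omits because they had no clicks, and it assumes the date range stays within a single month, since the query groups by day of month only.

```python
from datetime import date, timedelta


def pivot_daily_counts(rows, first_day, last_day):
    """Map every day in [first_day, last_day] to its click count (0 if absent).

    rows is a list of (day_of_month, numclicks) tuples, as returned by the
    GROUP BY DAYOFMONTH(TIME) query.
    """
    counts = {day: n for day, n in rows}
    result = {}
    day = first_day
    while day <= last_day:
        result[day.isoformat()] = counts.get(day.day, 0)
        day += timedelta(days=1)
    return result


# Made-up query result: the 4th, 6th, 7th and 8th had no clicks.
rows = [(2, 120), (3, 95), (5, 210), (9, 87)]
daily = pivot_daily_counts(rows, date(2018, 4, 2), date(2018, 4, 9))
print(daily)
```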

Related

Get query result of every month between two dates

create table link : https://drive.google.com/file/d/1EEqpW2Y8UkplfQcp_fw0j2byXAxBQXOW/view?usp=sharing
I have a query
SELECT
DATE_FORMAT(seized_date,'%Y-%m') as 'Seized Date',
sum(case when seized_remarks = 'Temporary Seized' then 1 else 0 end) AS seized,
sum(case when (DATE_FORMAT(release_date, '%Y-%m') BETWEEN '2021-01' AND '2021-07') then 1 else 0 end) AS released,
sum(case when (DATE_FORMAT(stock_return_date, '%Y-%m') BETWEEN '2021-01' AND '2021-07') then 1 else 0 end) AS stock_return
FROM mahindra
where
(DATE_FORMAT(seized_date, '%Y-%m') BETWEEN '2021-01' AND '2021-07')
GROUP BY DATE_FORMAT(seized_date,'%Y-%m')
which gives result as
Date | Seized | Release | Stock Return
2021-01 | 1 | 0 | 0
2021-03 | 1 | 0 | 0
2021-04 | 1 | 0 | 0
2021-05 | 5 | 1 | 0
2021-06 | 6 | 0 | 1
2021-07 | 2 | 0 | 0
Here I don't get a row for February 2021. I want a result for every month between these two dates, even if no seized_date exists in that month.

Looks like you need something like
SELECT
dates.`Seized Date`,
COALESCE(SUM(mahindra.seized_remarks = 'Temporary Seized'), 0) AS seized,
COALESCE(COUNT(mahindra.release_date), 0) AS released,
COALESCE(COUNT(mahindra.stock_return_date), 0) AS stock_return
FROM ( SELECT '2021-01' `Seized Date` UNION ALL
SELECT '2021-02' UNION ALL
SELECT '2021-03' UNION ALL
SELECT '2021-04' UNION ALL
SELECT '2021-05' UNION ALL
SELECT '2021-06' UNION ALL
SELECT '2021-07' ) dates
LEFT JOIN mahindra ON DATE_FORMAT(mahindra.seized_date, '%Y-%m') = dates.`Seized Date`
GROUP BY dates.`Seized Date`

As @Akina says, if the seized_date value does not exist anywhere in your table, you cannot expect it to be present in your results at all.
You need to create a column containing all required dates and then you can do something like perform a join onto that column.
Here's an example of how I might do it.
SELECT TO_CHAR(DATEADD('MONTH', -n, (CURRENT_DATE+1)),'YYYY-MM') AS seized_date
FROM (
SELECT ROW_NUMBER() OVER () AS n
FROM mahindra LIMIT 10
)
ORDER BY seized_date DESC
This produces a single column of YYYY-MM values, one row per month.
Code explanation:
The inner query is based on a window function.
The window function itself is simple. ROW_NUMBER() numbers the rows it is given 1, 2, 3 and so on, and LIMIT 10 keeps ten of them, so in practice n runs from 1 to 10. The contents of mahindra do not matter here; the table is only used as a source of rows to count, so it must hold at least as many rows as the number of months you want. You can change the LIMIT to get more or fewer than 10 months.
In the outer query we take today's date, CURRENT_DATE, and subtract n months from it, as in DATEADD('MONTH', -n, (CURRENT_DATE+1)).
In our case that gives one date per month, going back 10 months.
Now you have a column of dates, formatted as per your requirements (YYYY-MM).
You can write the query such that you LEFT JOIN your processed data set to these dates on their corresponding values.
The reason why this is a good way of doing things is that you don't have to manually enter any dates or use a UNION. You let the window function do the work for you, meaning you can go back in time as far as you need or want very easily by changing the LIMIT value. This is more convenient when you need to go back over multiple years, for example.
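A runnable sketch of the same idea (generate the months, then LEFT JOIN the data onto them) is shown below. It uses a recursive CTE instead of ROW_NUMBER(), and it runs against SQLite through Python's sqlite3 module, so date() and strftime() are SQLite functions; MySQL 8 also supports WITH RECURSIVE but would need DATE_ADD and DATE_FORMAT instead (DATEADD in the answer above is not a MySQL function). The three sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mahindra (seized_date TEXT, seized_remarks TEXT)")
conn.executemany(
    "INSERT INTO mahindra VALUES (?, ?)",
    [
        ("2021-01-14", "Temporary Seized"),
        ("2021-03-02", "Temporary Seized"),
        ("2021-03-20", "Temporary Seized"),
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE months(first_day) AS (
        SELECT '2021-01-01'
        UNION ALL
        SELECT date(first_day, '+1 month')
        FROM months
        WHERE first_day < '2021-04-01'   -- last month to generate
    )
    SELECT strftime('%Y-%m', months.first_day) AS ym,
           COUNT(mahindra.seized_date) AS seized  -- counts only matched rows
    FROM months
    LEFT JOIN mahindra
      ON strftime('%Y-%m', mahindra.seized_date) = strftime('%Y-%m', months.first_day)
    GROUP BY ym
    ORDER BY ym
    """
).fetchall()

print(rows)
```

February and April have no matching rows, yet they still appear with a count of 0 because the generated months are on the left side of the join.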

How can I fill missing date-time values for an sql result in php

I have an sql request that returns record counts for every 5 minutes (or 15 minutes).
So I can show that on a line graph. But the data by itself produces a wrong graph: some time spans have no records, so no rows are returned for those intervals.
Here is my sql code.
SELECT
YEAR(postdate) as Y, MONTH(postdate) as M, DAY(postdate) as D, HOUR(postdate) as H, MINUTE(postdate),
FLOOR(MINUTE(postdate) / 5) * 5 AS MinIntVal,
SUM(CASE type WHEN 'ins' THEN 1 ELSE 0 END) AS Instagram,
SUM(CASE type WHEN 'twi' THEN 1 ELSE 0 END) AS Twitter,
SUM(1) as TotalPost
FROM
entries
WHERE
postdate IS NOT NULL
AND postdate >= DATE_ADD(CURDATE(), INTERVAL -5 DAY)
GROUP BY
YEAR(postdate), MONTH(postdate), DAY(postdate), HOUR(postdate), MinIntVal
ORDER BY D DESC, H DESC, MinIntVal Desc
And result is below
But the desired / expected result should be looked like below
So in PHP, how can I add the missing 'empty' date values? Someone suggested I add another table that contains only dates, but I have no idea how that should work.
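One way to do the filling in application code is to walk the time axis in fixed steps and look each bucket up in the query result, defaulting to zero. The sketch below is in Python rather than PHP so that it can be run as-is; the function name fill_buckets and the sample counts are hypothetical, and it assumes the query rows have already been turned into a mapping from bucket start time to count.

```python
from datetime import datetime, timedelta


def fill_buckets(counts, start, end, step_minutes=5):
    """Return (bucket_start, count) for every bucket in [start, end).

    counts maps a bucket's start datetime to its record count; buckets
    missing from the mapping are emitted with a count of 0.
    """
    step = timedelta(minutes=step_minutes)
    filled = []
    bucket = start
    while bucket < end:
        filled.append((bucket, counts.get(bucket, 0)))
        bucket += step
    return filled


# Made-up result: nothing was posted between 10:05 and 10:15.
counts = {
    datetime(2015, 1, 1, 10, 0): 4,
    datetime(2015, 1, 1, 10, 15): 2,
}
series = fill_buckets(counts, datetime(2015, 1, 1, 10, 0), datetime(2015, 1, 1, 10, 20))
for bucket, n in series:
    print(bucket.isoformat(), n)
```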

MySQL fill empty dates

I have been trying to get my head around this for a time now and can't find a solution: I am querying for time-entries with a result like this:
2015-02-10: 13
2015-02-11: 16
2015-02-13: 11
As you can see, days are missing from the array (2015-02-12, for example) because there are no entries for them. My google-fu brought me some solutions for this problem but none seem to work for my specific code:
SELECT
DATE(time_entries.start) AS date,
COUNT(time_entries.id) AS entries,
SUM(CASE WHEN user_id = 4 THEN TIMESTAMPDIFF(MINUTE, start, end) ELSE 0 END) AS me,
SUM(CASE WHEN user_id = 3 THEN TIMESTAMPDIFF(MINUTE, start, end) ELSE 0 END) AS ph
FROM time_entries
LEFT JOIN calendar_table
ON time_entries.start=calendar_table.dt
WHERE start BETWEEN CURRENT_DATE - INTERVAL 1 MONTH AND CURDATE()
GROUP BY date
ORDER BY date
I created the calendar_table with this help: https://www.brianshowalter.com/calendar_tables
Please help!
Best,
Chris

Try with right join. Your query is using your time_entries records to match the calendar table, and finds nothing because they're not there.
By using right join, you'll use calendar_table records first.
SELECT
calendar_table.dt AS date,
COUNT(time_entries.id) AS entries,
SUM(CASE WHEN user_id = 4 THEN TIMESTAMPDIFF(MINUTE, start, end) ELSE 0 END) AS me,
SUM(CASE WHEN user_id = 3 THEN TIMESTAMPDIFF(MINUTE, start, end) ELSE 0 END) AS ph
FROM time_entries
RIGHT JOIN calendar_table
ON DATE(time_entries.start) = calendar_table.dt
-- filter and group on the calendar column so days without entries are kept
WHERE calendar_table.dt BETWEEN CURRENT_DATE - INTERVAL 1 MONTH AND CURDATE()
GROUP BY calendar_table.dt
ORDER BY calendar_table.dt
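The calendar-table join can be tried out in isolation with the sketch below, which uses SQLite via Python's sqlite3 module and made-up sample rows. It is written as a LEFT JOIN starting from calendar_table, which is equivalent to the RIGHT JOIN above and also works on SQLite versions without RIGHT JOIN support. The important details are that the date, the filter, and the grouping all come from the calendar column, and that the join compares dates rather than full timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE calendar_table (dt TEXT PRIMARY KEY);
    CREATE TABLE time_entries (id INTEGER PRIMARY KEY, start TEXT);
    INSERT INTO calendar_table VALUES
        ('2015-02-10'), ('2015-02-11'), ('2015-02-12'), ('2015-02-13');
    INSERT INTO time_entries (start) VALUES
        ('2015-02-10 09:00:00'), ('2015-02-10 14:00:00'),
        ('2015-02-11 08:30:00'), ('2015-02-13 10:00:00');
    """
)

rows = conn.execute(
    """
    SELECT c.dt, COUNT(t.id) AS entries   -- COUNT skips the NULLs of empty days
    FROM calendar_table AS c
    LEFT JOIN time_entries AS t ON date(t.start) = c.dt
    WHERE c.dt BETWEEN '2015-02-10' AND '2015-02-13'
    GROUP BY c.dt
    ORDER BY c.dt
    """
).fetchall()

print(rows)
```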

Working out the amount of free dates in a given time period

I have a fun one for you. I have a database with the date columns free_from and free_until. What I need to find is the number of days between now and one month from today which are free. For example, if the current date was 2013/01/15 and the columns were as follows:
free_from | free_until
2013/01/12| 2013/01/17
2013/01/22| 2013/01/26
2013/01/29| 2013/02/04
2013/02/09| 2013/02/11
2013/02/14| 2013/02/17
2013/02/19| 2013/02/30
The answer would be 16
as 2 + 4 + 6 + 2 + 2 + 0 = 16
The first row only starts counting at the 15th rather than the 12th
since the 15th is the current date.
The last row is discounted because none of the dates are within a
month of the current date.
The dates must be counted as if the free_from date is inclusive and
the free_until date is exclusive.
I'm assuming DATEDIFF() will be used somewhere along the line, but I can't, for the life of me, work this one out.
Thanks for your time!
Edit: This is going into PHP mysql_query so that might restrict you a little concerning what you can do with MYSQL.

SET @today = '2013-01-15';
SET @nextm = DATE_ADD(@today, INTERVAL 1 month);
SET @lastd = DATE_ADD(@nextm, INTERVAL 1 day);
SELECT
DATEDIFF(
IF(@lastd > free_until, free_until, @lastd),
IF(@today > free_from, @today, free_from)
)
FROM `test`
WHERE free_until >= @today AND free_from < @nextm
That should work. At least for your test data. But what day is 2013/02/30? :-)
Don't forget to change the first line to SET @today = CURDATE();
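The clipping logic (take the later of the two start dates, the earlier of the two end dates, and count the days in between) can be checked against the question's sample data with a small Python sketch. The function name free_days is made up. The window end is passed in explicitly as 2013-02-16, matching @lastd above, and the impossible date 2013/02/30 is replaced by 2013-02-28 as an assumption; that row lies outside the window either way.

```python
from datetime import date


def free_days(periods, window_start, window_end):
    """Sum the days of each [free_from, free_until) period inside the window.

    window_end is exclusive, like free_until.
    """
    total = 0
    for free_from, free_until in periods:
        start = max(free_from, window_start)
        end = min(free_until, window_end)
        total += max(0, (end - start).days)  # periods outside the window add 0
    return total


periods = [
    (date(2013, 1, 12), date(2013, 1, 17)),
    (date(2013, 1, 22), date(2013, 1, 26)),
    (date(2013, 1, 29), date(2013, 2, 4)),
    (date(2013, 2, 9), date(2013, 2, 11)),
    (date(2013, 2, 14), date(2013, 2, 17)),
    (date(2013, 2, 19), date(2013, 2, 28)),  # stands in for the invalid 2013/02/30
]
total = free_days(periods, date(2013, 1, 15), date(2013, 2, 16))
print(total)
```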

The best I can think of is something like:
WHERE free_until > CURDATE()
AND free_from < CURDATE() + INTERVAL '1' MONTH
That will get rid of any unnecessary rows. Then on the first row do in PHP:
date_diff(date(), free_until)
On the last row, do:
date_diff(free_from, strtotime(date("Y-m-d", strtotime($todayDate)) . "+1 month"))
Then on intermediate dates do:
date_diff(free_from, free_until)
Something to that effect, but this seems extremely clunky and convoluted...

Off the top of my head... first do a:
SELECT a.free_from AS a_from, a.free_until AS a_until, b.free_from AS b_from
FROM availability a
INNER JOIN availability b ON b.free_from > a.free_until
ORDER BY a_from, b_from
This probably will return a set of rows where for each row interval you have next i.e. greater intervals. The results are ordered strategically. You can then wrap the results in a partial group by:
SELECT * FROM (
SELECT a.free_from AS a_from, a.free_until AS a_until, b.free_from AS b_from, b.free_until AS b_until
FROM availability a
INNER JOIN availability b ON b.free_from > a.free_until
ORDER BY a_from, b_from
) AS NextInterval
GROUP BY a_from, b_until
In the above query, add a DATEDIFF expression (wrap it in SUM() if necessary):
DATEDIFF(b_until, a_from)

Mysql GROUP BY and COUNT for multiple WHERE clauses

Simplified Table structure:
CREATE TABLE IF NOT EXISTS `hpa` (
`id` bigint(15) NOT NULL auto_increment,
`core` varchar(50) NOT NULL,
`hostname` varchar(50) NOT NULL,
`status` varchar(255) NOT NULL,
`entered_date` int(11) NOT NULL,
`active_date` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `hostname` (`hostname`),
KEY `status` (`status`),
KEY `entered_date` (`entered_date`),
KEY `core` (`core`),
KEY `active_date` (`active_date`)
)
For this, I have the following SQL query which simply totals up all records with the defined status.
SELECT core,COUNT(hostname) AS hostname_count, MAX(active_date) AS last_active
FROM `hpa`
WHERE
status != 'OK' AND status != 'Repaired'
GROUP BY core
ORDER BY core
This query has been simplified to remove the INNER JOINS to unrelated data and extra columns that shouldn't affect the question.
MAX(active_date) is the same for all records of a particular day, and should always select the most recent day, or allow an offset from NOW(). (it's a UNIXTIME field)
I want both the count of: (status != 'OK' AND status != 'Repaired')
AND the inverse... count of: (status = 'OK' OR status = 'Repaired')
AND the first answer divided by the second, for 'percentage_dead' (Probably just as fast to do in post processing)
FOR the most recent day or an offset ( - 86400 for yesterday, etc..)
Table contains about 500k records and grows by about 5000 a day so a single SQL query as opposed to looping would be real nice..
I imagine some creative IFs could do this. Your expertise is appreciated.
EDIT: I'm open to using a different SQL query for either today's data, or data from an offset.
EDIT: Query works, is fast enough, but I currently can't let the users sort on the percentage column (the one derived from bad and good counts). This is not a show stopper, but I allow them to sort on everything else. The ORDER BY of this:
SELECT h1.core, MAX(h1.entered_date) AS last_active,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS good_host_count,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS bad_host_count
FROM `hpa` h1
LEFT OUTER JOIN `hpa` h2 ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date)
WHERE h2.hostname IS NULL
GROUP BY h1.core
ORDER BY ( bad_host_count / ( bad_host_count + good_host_count ) ) DESC,h1.core
Gives me:
#1247 - Reference 'bad_host_count' not supported (reference to group function)
EDIT: Solved for a different section. The following works and allows me to ORDER BY percentage_dead
SELECT c.core, c.last_active,
SUM(CASE WHEN d.dead = 1 THEN 0 ELSE 1 END) AS good_host_count,
SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) AS bad_host_count,
( SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) * 100/
( (SUM(CASE WHEN d.dead = 1 THEN 0 ELSE 1 END) )+(SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) ) ) ) AS percentage_dead
FROM `agent_cores` c
LEFT JOIN `dead_agents` d ON c.core = d.core
WHERE d.active = 1
GROUP BY c.core
ORDER BY percentage_dead

If I understand, you want to get a count of the status of OK vs. not OK hostnames, on the date of the last activity. Right? And then that should be grouped by core.
SELECT h1.core, MAX(h1.active_date) AS last_active,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2
ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date)
WHERE h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core;
This is a variation of the "greatest-n-per-group" problem that I see a lot in SQL questions on StackOverflow.
First we want to choose only the rows that have the latest activity date per hostname, which we can do with an outer join to rows with the same hostname and a greater active_date. Where we find no such match, we already have the latest row for the given hostname.
Then group by core and count the rows by status.
That's the solution for today's date (assuming no row has an active_date in the future). To restrict the result to rows N days ago, you have to restrict both tables.
SELECT h1.core, MAX(h1.active_date) AS last_active,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2
ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date
AND h2.active_date <= CURDATE() - INTERVAL 1 DAY)
WHERE h1.active_date <= CURDATE() - INTERVAL 1 DAY AND h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core;
Regarding the ratio between OK and broken hostnames, I'd recommend just calculating that in your PHP code. SQL doesn't allow you to reference column aliases in other select-list expressions, so you'd have to wrap the above as a subquery and that's more complex than it's worth in this case.
I forgot you said you're using a UNIX timestamp. Do something like this:
SELECT h1.core, MAX(h1.active_date) AS last_active,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2
ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date
AND h2.active_date <= UNIX_TIMESTAMP() - 86400)
WHERE h1.active_date <= UNIX_TIMESTAMP() - 86400 AND h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core;
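To see the self-join and the application-side percentage working together, here is a small sketch using SQLite through Python's sqlite3 module (Python stands in for the PHP post-processing). The sample rows and the trimmed-down hpa table are made up. The query keeps only the latest row per hostname, counts good and bad hosts per core, and the percentage and its sort order are then computed outside SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hpa (core TEXT, hostname TEXT, status TEXT, active_date INTEGER)"
)
conn.executemany(
    "INSERT INTO hpa VALUES (?, ?, ?, ?)",
    [
        ("c1", "h1", "Dead", 100),      # superseded by the row below
        ("c1", "h1", "OK", 200),
        ("c1", "h2", "Dead", 200),
        ("c2", "h3", "Repaired", 200),
    ],
)

rows = conn.execute(
    """
    SELECT h1.core,
           SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS good,
           SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS bad
    FROM hpa AS h1
    LEFT JOIN hpa AS h2
      ON h1.hostname = h2.hostname AND h1.active_date < h2.active_date
    WHERE h2.hostname IS NULL          -- no newer row exists for this hostname
    GROUP BY h1.core
    """
).fetchall()

# Percentage of bad hosts per core, sorted worst first, then by core name.
report = [(core, good, bad, 100.0 * bad / (good + bad)) for core, good, bad in rows]
report.sort(key=lambda r: (-r[3], r[0]))
print(report)
```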
