MySQL GROUP BY and COUNT for multiple WHERE clauses - php

Simplified Table structure:
CREATE TABLE IF NOT EXISTS `hpa` (
`id` bigint(15) NOT NULL auto_increment,
`core` varchar(50) NOT NULL,
`hostname` varchar(50) NOT NULL,
`status` varchar(255) NOT NULL,
`entered_date` int(11) NOT NULL,
`active_date` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `hostname` (`hostname`),
KEY `status` (`status`),
KEY `entered_date` (`entered_date`),
KEY `core` (`core`),
KEY `active_date` (`active_date`)
)
For this, I have the following SQL query which simply totals up all records with the defined status.
SELECT core,COUNT(hostname) AS hostname_count, MAX(active_date) AS last_active
FROM `hpa`
WHERE
status != 'OK' AND status != 'Repaired'
GROUP BY core
ORDER BY core
This query has been simplified to remove the INNER JOINS to unrelated data and extra columns that shouldn't affect the question.
MAX(active_date) is the same for all records of a particular day, and should always select the most recent day, or allow an offset from NOW(). (it's a UNIXTIME field)
I want both the count of: (status != 'OK' AND status != 'Repaired')
AND the inverse... count of: (status = 'OK' OR status = 'Repaired')
AND the first answer divided by the second, for 'percentage_dead' (Probably just as fast to do in post processing)
FOR the most recent day or an offset ( - 86400 for yesterday, etc..)
Table contains about 500k records and grows by about 5000 a day so a single SQL query as opposed to looping would be real nice..
I imagine some creative IFs could do this. Your expertise is appreciated.
EDIT: I'm open to using a different SQL query for either todays data, or data from an offset.
EDIT: Query works, is fast enough, but I currently can't let the users sort on the percentage column (the one derived from bad and good counts). This is not a show stopper, but I allow them to sort on everything else. The ORDER BY of this:
SELECT h1.core, MAX(h1.entered_date) AS last_active,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS good_host_count,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS bad_host_count
FROM `hpa` h1
LEFT OUTER JOIN `hpa` h2 ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date)
WHERE h2.hostname IS NULL
GROUP BY h1.core
ORDER BY ( bad_host_count / ( bad_host_count + good_host_count ) ) DESC,h1.core
Gives me:
#1247 - Reference 'bad_host_count' not supported (reference to group function)
EDIT: Solved for a different section. The following works and allows me to ORDER BY percentage_dead
SELECT c.core, c.last_active,
SUM(CASE WHEN d.dead = 1 THEN 0 ELSE 1 END) AS good_host_count,
SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) AS bad_host_count,
( SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) * 100/
( (SUM(CASE WHEN d.dead = 1 THEN 0 ELSE 1 END) )+(SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) ) ) ) AS percentage_dead
FROM `agent_cores` c
LEFT JOIN `dead_agents` d ON c.core = d.core
WHERE d.active = 1
GROUP BY c.core
ORDER BY percentage_dead

If I understand, you want to get a count of the status of OK vs. not OK hostnames, on the date of the last activity. Right? And then that should be grouped by core.
SELECT h1.core, MAX(h1.active_date) AS last_active,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2
ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date)
WHERE h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core;
This is a variation of the "greatest-n-per-group" problem that I see a lot in SQL questions on StackOverflow.
First we want to choose only the rows that have the latest activity date per hostname, which we can do with an outer join to rows that have the same hostname and a greater active_date. Where no such match exists, the h1 row is already the latest row for that hostname.
Then group by core and count the rows by status.
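To see the two steps together, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database instead of MySQL. The sample rows, the status value 'Dead', and the core and host names are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data: (core, hostname, status, active_date)
rows = [
    ("core-a", "host1", "Dead",     1000),
    ("core-a", "host1", "OK",       2000),  # latest row for host1
    ("core-a", "host2", "OK",       1000),
    ("core-a", "host2", "Dead",     2000),  # latest row for host2
    ("core-b", "host3", "Repaired", 2000),  # only row for host3
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hpa (core TEXT, hostname TEXT, status TEXT, active_date INTEGER)"
)
conn.executemany("INSERT INTO hpa VALUES (?, ?, ?, ?)", rows)

query = """
SELECT h1.core,
       MAX(h1.active_date) AS last_active,
       SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS ok_host_count,
       SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM hpa h1
LEFT OUTER JOIN hpa h2
  ON h1.hostname = h2.hostname AND h1.active_date < h2.active_date
WHERE h2.hostname IS NULL   -- no newer row exists, so h1 is the latest row per host
GROUP BY h1.core
ORDER BY h1.core
"""
result = conn.execute(query).fetchall()
print(result)
```

Only the newest row of each host survives the anti-join, so host1 counts as OK and host2 as broken even though each has an older row with the opposite status.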
That's the solution for today's date (assuming no row has an active_date in the future). To restrict the result to rows N days ago, you have to restrict both tables.
SELECT h1.core, MAX(h1.active_date) AS last_active,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2
ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date
AND h2.active_date <= CURDATE() - INTERVAL 1 DAY)
WHERE h1.active_date <= CURDATE() - INTERVAL 1 DAY AND h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core;
Regarding the ratio between OK and broken hostnames, I'd recommend just calculating that in your PHP code. SQL doesn't allow you to reference column aliases in other select-list expressions, so you'd have to wrap the above as a subquery and that's more complex than it's worth in this case.
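As a rough sketch of that post-processing step (written in Python here; the same few lines port directly to PHP), the percentage can be computed per row and then used as a sort key, which also sidesteps the ORDER BY problem from the question. The function name, the dictionary keys, and the sample counts are all made up for illustration.

```python
def add_percentage_dead(rows):
    """rows: iterable of (core, ok_count, broken_count) tuples from the query."""
    out = []
    for core, ok_count, broken_count in rows:
        total = ok_count + broken_count
        # Guard against a core with no hosts at all to avoid dividing by zero.
        pct = 100.0 * broken_count / total if total else 0.0
        out.append(
            {"core": core, "ok": ok_count, "broken": broken_count, "percentage_dead": pct}
        )
    return out


# Hypothetical query results.
sample = [("core-a", 3, 1), ("core-b", 1, 1), ("core-c", 0, 0)]

# Sort by percentage descending, then by core name.
report = sorted(
    add_percentage_dead(sample),
    key=lambda r: (-r["percentage_dead"], r["core"]),
)
for r in report:
    print(r["core"], r["percentage_dead"])
```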
I forgot you said you're using a UNIX timestamp. Do something like this:
SELECT h1.core, MAX(h1.active_date) AS last_active,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2
ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date
AND h2.active_date <= UNIX_TIMESTAMP() - 86400)
WHERE h1.active_date <= UNIX_TIMESTAMP() - 86400 AND h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core;

Related

Breaking up a month into individual days on a large table

I've been trying to figure out an effective way to break up a month of click data into individual days for a graph, but most of the queries I've put together so far are taking 20-30 seconds because I'm having trouble thinking of a way to do it without subtables \ subqueries. Best I've come up with so far is:
SELECT
SUM(CASE WHEN ( TIME BETWEEN '2018/04/09' AND '2018/04/10') THEN 1 ELSE 0 END) 9th,
SUM(CASE WHEN ( TIME BETWEEN '2018/04/08' AND '2018/04/09') THEN 1 ELSE 0 END) 8th,
SUM(CASE WHEN ( TIME BETWEEN '2018/04/07' AND '2018/04/08') THEN 1 ELSE 0 END) 7th,
SUM(CASE WHEN ( TIME BETWEEN '2018/04/06' AND '2018/04/07') THEN 1 ELSE 0 END) 6th,
SUM(CASE WHEN ( TIME BETWEEN '2018/04/05' AND '2018/04/06') THEN 1 ELSE 0 END) 5th,
SUM(CASE WHEN ( TIME BETWEEN '2018/04/04' AND '2018/04/05') THEN 1 ELSE 0 END) 4th,
SUM(CASE WHEN ( TIME BETWEEN '2018/04/03' AND '2018/04/04') THEN 1 ELSE 0 END) 3rd,
SUM(CASE WHEN ( TIME BETWEEN '2018/04/02' AND '2018/04/03') THEN 1 ELSE 0 END) 2nd
FROM
(
SELECT TIME, BIN_IP FROM CLICKS_IN WHERE USER_GROUP = 4 AND TIME BETWEEN '2018/04/02' AND '2018/04/10'
)a;
Explain:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE CLICKS_IN NULL ref USER_GROUP,TIME USER_GROUP 2 const 1614964 26.65 Using where
Or variations thereof, but it's still terribly inefficient given the potential record count (can potentially get 100k+ clicks a day). Also the actual code uses prepared statements so no need to point that out, I put the values in here for clarity sake.
Edit: I've found using the following is a -lot- faster but I'm concerned it will run into problems when I have more user groups on the system.
SELECT SUM(TIME >= '2018/04/09' AND TIME < '2018/04/10') as 9th,
SUM(TIME >= '2018/04/08' AND TIME < '2018/04/09') as 8th,
SUM(TIME >= '2018/04/07' AND TIME < '2018/04/08') as 7th,
SUM(TIME >= '2018/04/06' AND TIME < '2018/04/07') as 6th,
SUM(TIME >= '2018/04/05' AND TIME < '2018/04/06') as 5th,
SUM(TIME >= '2018/04/04' AND TIME < '2018/04/05') as 4th,
SUM(TIME >= '2018/04/03' AND TIME < '2018/04/04') as 3rd
FROM CLICKS_IN USE INDEX (TIME)
WHERE TIME BETWEEN '2018/04/02' AND '2018/04/10'
AND USER_GROUP = 4
SELECT SUM(TIME >= '2018-04-09' AND TIME < '2018-04-10') as 9th,
SUM(TIME >= '2018-04-08' AND TIME < '2018-04-09') as 8th
FROM CLICKS_IN
WHERE USER_GROUP = 4
AND TIME >= '2018-04-02'
AND TIME < '2018-04-11'
And make sure you have indexes on the time and user_group columns. Then it should run in a few milliseconds.
You might do this where each day is returned as a row. Pivoting it from rows to columns could be done more efficiently in the calling PHP code.
SELECT
DAYOFMONTH(TIME) as `day`,
COUNT(*) as `numclicks`
FROM `CLICKS_IN`
WHERE USER_GROUP = 4 AND TIME BETWEEN '2018/04/02' AND '2018/04/10'
GROUP BY DAYOFMONTH(TIME)
ORDER BY DAYOFMONTH(TIME)
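A minimal sketch of that pivot in the calling code, shown in Python (the PHP version would be the same loop over an array). It assumes the query returns (day, numclicks) pairs and that the range stays inside one month, since DAYOFMONTH alone does not identify the month. The function name and sample counts are made up.

```python
def pivot_daily_counts(rows, first_day, last_day):
    """Turn (day_of_month, numclicks) rows into one value per day.

    Days that returned no row get 0, so the graph has no gaps.
    """
    counts = {day: numclicks for day, numclicks in rows}
    return [counts.get(day, 0) for day in range(first_day, last_day + 1)]


# Hypothetical result set: no clicks were recorded on the 4th or the 6th.
rows = [(2, 120), (3, 95), (5, 40)]
series = pivot_daily_counts(rows, 2, 6)
print(series)  # [120, 95, 0, 40, 0]
```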

Get sql by consecutive hours

I have a question. Suppose I have this table in SQL:
date user_id
2015-03-17 00:06:12 143
2015-03-17 01:06:12 143
2015-03-17 02:06:12 143
2015-03-17 09:06:12 143
2015-03-17 10:10:10 200
I want to get the number of consecutive hours. For example, for user 143, I want to get 2 hours, for user 200 0 hours. I tried like this :
select user_id, TIMESTAMPDIFF(HOUR,min(date), max(date)) as hours
from myTable
group by user_id
But this query fetches all non-consecutive hours. Is it possible to solve the problem with a query, or do I need to post-process the results in PHP?
Use a variable to compare with the previous row.
SELECT user_id, SUM(cont_hour) FROM (
SELECT
user_id,
IF(CONCAT(DATE(@prev_date), ' ', HOUR(@prev_date), ':00:00') - INTERVAL 1 HOUR = CONCAT(DATE(t.date), ' ', HOUR(t.date), ':00:00')
AND @prev_user = t.user_id, 1, 0) AS cont_hour
, @prev_date := t.date
, @prev_user := t.user_id
FROM
table t
, (SELECT @prev_date := NULL, @prev_user := NULL) var_init_subquery
WHERE t.date BETWEEN <this> AND <that>
ORDER BY t.date
) sq
GROUP BY user_id;
I made the comparison a bit more complicated than you expected, but I thought it's necessary, that you don't just compare the hour, but also, that it's the same date (or the previous day, when it's around midnight).
You can read more about user variables in the MySQL manual.
As a short explanation: the ORDER BY is very important, as is the order of the expressions in the SELECT clause. @prev_date holds the value from the previous row, because we assign the current row's value only after we have made the comparison.
Another version using temporary variables:
SET @u := 0;
SET @pt := 0;
SET @s := 0;
SELECT `user_id`, MAX(`s`) `conseq` FROM
(
SELECT
@s := IF(@u = `user_id`,
IF(UNIX_TIMESTAMP(`date`) - @pt = 3600, @s + 1, @s),
0) s,
@u := `user_id` `user_id`,
@pt := UNIX_TIMESTAMP(`date`) pt
FROM `users`
ORDER BY `date`
) AS t
GROUP BY `user_id`
The subquery sorts the rows by date, then compares user_id with the previous value. If the user IDs are equal, it calculates the difference between date and the previous timestamp @pt. If the difference is an hour (3600 seconds), the @s counter is incremented by one; otherwise it keeps its value. When the user ID changes, the counter is reset to 0:
s user_id pt
0 143 1426529172
1 143 1426532772
2 143 1426536372
2 143 1426561572
0 200 1426565410
The outer query collects the maximum counter values per user_id, since the maximum counter value corresponds to the last counter value per user_id.
Output
user_id conseq
143 2
200 0
Note, the query accepts the difference of exactly 1 hour. If you want a more flexible condition, simply adjust the comparison. For example, you can accept a difference in interval between 3000 and 4000 seconds as follows:
@s := IF(@u = `user_id`,
IF( (UNIX_TIMESTAMP(`date`) - @pt) BETWEEN 3000 AND 4000, @s + 1, @s),
0) s
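If you would rather post-process the rows in application code, as the question suggests, the counting is short. This is only a sketch in Python (the same approach works in PHP), and it assumes "consecutive" means two neighbouring rows of the same user fall into adjacent clock hours; the function name is made up, and the sample rows are the ones from the question.

```python
from datetime import datetime
from itertools import groupby


def consecutive_hours(rows):
    """rows: iterable of (timestamp_string, user_id). Returns {user_id: count}."""
    parsed = sorted(
        (
            user_id,
            # Truncate to the hour so 00:06:12 and 01:59:59 still count as adjacent.
            datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(minute=0, second=0),
        )
        for ts, user_id in rows
    )
    result = {}
    for user_id, group in groupby(parsed, key=lambda r: r[0]):
        hours = [hour for _, hour in group]
        # Count neighbouring rows that are exactly one clock hour apart.
        result[user_id] = sum(
            1
            for prev, cur in zip(hours, hours[1:])
            if (cur - prev).total_seconds() == 3600
        )
    return result


rows = [
    ("2015-03-17 00:06:12", 143),
    ("2015-03-17 01:06:12", 143),
    ("2015-03-17 02:06:12", 143),
    ("2015-03-17 09:06:12", 143),
    ("2015-03-17 10:10:10", 200),
]
hours_per_user = consecutive_hours(rows)
print(hours_per_user)  # {143: 2, 200: 0}
```

Sorting by user first and then by time means rows of different users can be interleaved in the input without breaking the comparison.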

PHP / MysQl limit user posts in month - day - hours

I am working on a PHP script with an admin control panel for adding users, and I need to add a few per-user options: monthly posts, daily posts, and hourly posts. Say I set a user's monthly posts to 30, daily posts to 10, and hourly posts to 5.
Then the user can post only 5 posts per hour and 10 posts per day, out of the 30-post monthly limit. Once the monthly limit is used up, he can't add more posts that month, and the next month I want him to automatically get another 30 posts.
My user table name is (user):
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(255) DEFAULT NULL,
`password` varchar(255) DEFAULT NULL,
`monthly` int(2) unsigned NOT NULL,
`daily` int(10) unsigned NOT NULL,
`hourly` int(10) unsigned NOT NULL,
And my post table name is user_post:
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`created_dt` datetime NOT NULL,
`user` int(10) unsigned NOT NULL,
I need to know :
How to reset the user's monthly column to 30 each month, if I registered the user with a monthly limit of 30 posts.
When the user is logged in and wants to post, how to check whether he has monthly, daily, and hourly posts remaining.
Can anyone help me to see how I can do that, thank you my friends
Add a "datetime posted" column to your "user posts" table. For this example, we'll call the new column `created_dt` (with a datatype of DATETIME). We'll also assume that the name of the table is `user_post`.
When a row is inserted, populate the new column with the current date and time.
When a user attempts to post another row, you could perform a check whether any limit has been exceeded.
SELECT SUM(1) AS cnt_past_month
, SUM(p.created_dt >= NOW() + INTERVAL -7 DAY) AS cnt_past_week
, SUM(p.created_dt >= NOW() + INTERVAL -1 HOUR) AS cnt_past_hour
FROM user_post p
WHERE p.user_id = ?
AND p.created_dt >= NOW() - INTERVAL 1 MONTH
You can then compare the values returned to the limits for the user, to see if any limits have been exceeded, or would be exceeded if another post is added.
For optimal performance of this query, you will want an index
ON user_post(user_id, created_dt)
You could get the limits for the user within the query itself...
SELECT q.count_past_month
, m.limit_past_month
, q.count_past_week
, m.limit_past_week
, q.count_past_hour
, m.limit_past_hour
FROM ( SELECT p.user_id
, SUM(1) AS count_past_month
, SUM(p.created_dt >= NOW() - INTERVAL 7 DAY) AS count_past_week
, SUM(p.created_dt >= NOW() - INTERVAL 1 HOUR) AS count_past_hour
FROM user_post p
WHERE p.user_id = ?
AND p.created_dt >= NOW() - INTERVAL 1 MONTH
) q
CROSS
JOIN ( SELECT MIN(l.limit_per_month) AS limit_past_month
, MIN(l.limit_per_week) AS limit_past_week
, MIN(l.limit_per_hour) AS limit_past_hour
FROM user_limit l
WHERE l.user_id = ?
) m
With this approach, you won't need a bunch of unnecessary DML to increment and reset counters. Any change you make to the limits for a user would take effect immediately.
And you could use a value of "0" to specify "no limit". Your logic for doing comparisons would need to take that into account.
That's how I would do it.
I could also do the comparisons of the count to the limit in the query itself, returning the "number of posts remaining" until the limit is exceeded.
SELECT m.limit_past_month-IFNULL(q.count_past_month,0) AS remaining_past_month
, m.limit_past_week -IFNULL(q.count_past_week ,0) AS remaining_past_week
, m.limit_past_hour -IFNULL(q.count_past_hour ,0) AS remaining_past_hour
FROM (
The mechanics of the "ban" (no posts allowed) and "unlimited" (no limits on posts) would need to be worked out. For example using 0 to represent a ban, and a NULL to represent "no limit".
With that, we'd know that when the query returns a column with value less than or equal to zero, it would mean that a limit has been exceeded (or would be exceeded by another post.) All other values (NULL or positive integer) in the column would mean a "next post" would be allowed.
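To make that convention concrete, here is a small sketch of the check in application code (Python here; the PHP equivalent would compare against null the same way). It assumes the values have already been fetched, with NULL arriving as None, and it uses 0 for a ban and None for "no limit" as proposed above. The function names and dictionary keys are made up.

```python
def remaining(limit, count):
    """Posts left before the limit is hit; None means the period is unlimited."""
    if limit is None:           # NULL limit: no restriction for this period
        return None
    return limit - (count or 0)  # a NULL count (no posts yet) is treated as 0


def may_post(limits, counts):
    """True when no period's limit is already used up."""
    for period, limit in limits.items():
        left = remaining(limit, counts.get(period))
        if left is not None and left <= 0:
            return False
    return True


limits = {"month": 30, "week": 10, "hour": 5}
print(may_post(limits, {"month": 12, "week": 9, "hour": 1}))   # True
print(may_post(limits, {"month": 12, "week": 10, "hour": 1}))  # False
```

A limit of 0 always yields a remaining value of 0 or less, so a banned user is rejected without any special case.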
Before inserting a post, you can run a query like this one to get the user's hourly, daily, and monthly post counts, then check them in code and decide whether to allow the post.
Without knowing your schema, I made up some plausible column names for your post table. And of course the userid (1) should be replaced with the ID of the user you're checking.
SELECT SUM(CASE WHEN created_at > CURRENT_TIMESTAMP() - INTERVAL 1 HOUR THEN 1
ELSE 0
END) AS hourly_posts,
SUM(CASE WHEN created_at > CURRENT_TIMESTAMP() - INTERVAL 1 DAY THEN 1
ELSE 0
END) AS daily_posts,
SUM(CASE WHEN created_at > CURRENT_TIMESTAMP() - INTERVAL 1 MONTH THEN 1
ELSE 0
END) AS monthly_posts
FROM posts
WHERE userid = 1
I have updated Tin Tran's code:
SELECT SUM(CASE WHEN (created_at > CURRENT_TIMESTAMP() - INTERVAL 1 HOUR and date_format(CURRENT_TIMESTAMP(),'%H') = date_format(created_at,'%H')) THEN 1
ELSE 0
END) AS hourly_posts,
SUM(CASE WHEN (created_at > CURRENT_TIMESTAMP() - INTERVAL 1 DAY and date_format(CURRENT_TIMESTAMP(),'%D') = date_format(created_at,'%D')) THEN 1
ELSE 0
END) AS daily_posts,
SUM(CASE WHEN (created_at > CURRENT_TIMESTAMP() - INTERVAL 1 MONTH and date_format(CURRENT_TIMESTAMP(),'%m') = date_format(created_at,'%m')) THEN 1
ELSE 0
END) AS monthly_posts
FROM posts
WHERE userid = 1

MySQL fill empty dates

I have been trying to get my head around this for a time now and can't find a solution: I am querying for time-entries with a result like this:
2015-02-10: 13
2015-02-11: 16
2015-02-13: 11
As you can see I am missing two days in the array because there are no entries for these days. My google-fu brought me some solutions for this problem but none seem to work for my specific code:
SELECT
DATE(time_entries.start) AS date,
COUNT(time_entries.id) AS entries,
SUM(CASE WHEN user_id = 4 THEN TIMESTAMPDIFF(MINUTE, start, end) ELSE 0 END) AS me,
SUM(CASE WHEN user_id = 3 THEN TIMESTAMPDIFF(MINUTE, start, end) ELSE 0 END) AS ph
FROM time_entries
LEFT JOIN calendar_table
ON time_entries.start=calendar_table.dt
WHERE start BETWEEN CURRENT_DATE - INTERVAL 1 MONTH AND CURDATE()
GROUP BY date
ORDER BY date
I created the calendar_table with this help: https://www.brianshowalter.com/calendar_tables
Please help!
Best,
Chris
Try a RIGHT JOIN. Your query starts from the time_entries records and looks them up in the calendar table, so a day without entries never produces a row.
With a RIGHT JOIN, every calendar_table row is kept. The date you select, filter, and group on also has to come from calendar_table, because time_entries.start is NULL on the empty days (this assumes calendar_table.dt is a DATE column):
SELECT
calendar_table.dt AS date,
COUNT(time_entries.id) AS entries,
SUM(CASE WHEN user_id = 4 THEN TIMESTAMPDIFF(MINUTE, start, end) ELSE 0 END) AS me,
SUM(CASE WHEN user_id = 3 THEN TIMESTAMPDIFF(MINUTE, start, end) ELSE 0 END) AS ph
FROM time_entries
RIGHT JOIN calendar_table
ON DATE(time_entries.start) = calendar_table.dt
WHERE calendar_table.dt BETWEEN CURRENT_DATE - INTERVAL 1 MONTH AND CURDATE()
GROUP BY calendar_table.dt
ORDER BY calendar_table.dt
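A quick way to check the gap-filling idea is to run it on a tiny data set. The sketch below uses an in-memory SQLite database rather than MySQL, drives the query from the calendar table, and filters on the calendar's date column so that days without entries are kept. The sample rows are made up and the schema is cut down to the columns needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar_table (dt TEXT PRIMARY KEY);
CREATE TABLE time_entries (id INTEGER PRIMARY KEY, user_id INTEGER, start TEXT);
INSERT INTO calendar_table VALUES
    ('2015-02-10'), ('2015-02-11'), ('2015-02-12'), ('2015-02-13');
INSERT INTO time_entries (user_id, start) VALUES
    (4, '2015-02-10 09:00:00'),
    (3, '2015-02-10 13:30:00'),
    (4, '2015-02-11 08:15:00'),
    (4, '2015-02-13 10:00:00');
""")

query = """
SELECT c.dt AS date,
       COUNT(t.id) AS entries      -- counts only matched rows, so empty days give 0
FROM calendar_table c
LEFT JOIN time_entries t ON DATE(t.start) = c.dt
WHERE c.dt BETWEEN '2015-02-10' AND '2015-02-13'
GROUP BY c.dt
ORDER BY c.dt
"""
result = conn.execute(query).fetchall()
print(result)
```

The 12th has no entries but still appears with a count of 0, because both the WHERE clause and the GROUP BY use the calendar column instead of the entry's start time.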

Mysql addition and add them as new column

I want to fetch the counts of 2 columns and add their total as a new column.
How can I do this?
I wrote this query, but it returns the wrong total.
SELECT count(case when `status`='1' then 1 else 0 end) AS HOT,
count(case when `status`='5' then 1 end)
AS Special_Case,count(case when 1=1 then 1 end) AS TOTAL
FROM `tbl_customer_conversation` group by
date(`dt_added`),user_id
COUNT counts every non-NULL value, not just the rows where the condition matched. Your CASE returns either 1 or 0, and both are non-NULL, so every row is counted: COUNT(1) and COUNT(0) each add one.
Since you want the number of HOT and Special_Case rows, use SUM instead.
SELECT
SUM(case when `status`='1' then 1 else 0 end) AS HOT,
SUM(case when `status`='5' then 1 else 0 end) AS Special_Case,
SUM(case when `status` = '1' or `status` = '5' then 1 else 0 end) AS TOTAL
FROM `tbl_customer_conversation`
group by date(`dt_added`),user_id
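The difference between COUNT and SUM over a CASE expression is easy to see on four rows. This sketch uses an in-memory SQLite database, where both aggregates treat NULL the same way as in MySQL for this purpose; the table name conv and the sample statuses are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conv (status TEXT)")
conn.executemany("INSERT INTO conv VALUES (?)", [("1",), ("1",), ("5",), ("2",)])

row = conn.execute("""
SELECT COUNT(CASE WHEN status = '1' THEN 1 ELSE 0 END) AS count_with_else,
       COUNT(CASE WHEN status = '1' THEN 1 END)        AS count_without_else,
       SUM(CASE WHEN status = '1' THEN 1 ELSE 0 END)   AS hot,
       SUM(CASE WHEN status = '5' THEN 1 ELSE 0 END)   AS special_case,
       SUM(CASE WHEN status IN ('1', '5') THEN 1 ELSE 0 END) AS total
FROM conv
""").fetchone()
print(row)  # (4, 2, 2, 1, 3)
```

COUNT with ELSE 0 returns 4 because 0 is still a non-NULL value. Dropping the ELSE makes unmatched rows NULL, so COUNT then agrees with SUM.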
