I would like to implement a fidelity program, similar to the one on stackoverflow, on my website.
I want to be able to give some kind of reward to users who have visited my website for 30 days in a row.
[MySQL] What would be the best table architecture?
[PHP] What kind of algorithm should I use to optimize this task?
I prefer keeping more raw data in the database than the approach that @Matt H. advocates. Make a table that records all logins to the site (or, if you prefer, new session initiations) along with their time and date:
CREATE TABLE LoginLog (
    UserId INT NOT NULL,
    LoginTime TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    -- MySQL parses but ignores inline REFERENCES, so declare the key separately
    FOREIGN KEY (UserId) REFERENCES Users (UserId)
);
CREATE INDEX IX_LoginLog USING BTREE ON LoginLog (UserId ASC, LoginTime DESC);
Just insert the UserId into the table on login. I, of course, made some assumptions about your database, but I think you will be able to adapt.
Then, to check for discrete logins for each of the preceding thirty days:
SELECT COUNT(*)
FROM (SELECT DATE(log.LoginTime) AS LoginDate,
             COUNT(*) AS LoginCount
      FROM LoginLog log
      WHERE log.UserId = ?  -- the user being checked
        AND log.LoginTime >= DATE_SUB(CURRENT_DATE, INTERVAL 29 DAY)  -- today plus the 29 days before it
      GROUP BY LoginDate) a;
If the result is 30, you're golden.
I will admit that I haven't touched MySQL in a while (working mainly on SQL Server and on PostgreSQL lately), so if the syntax is off a bit, I apologize. I hope the concept makes sense, though.
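The same check is easy to sketch outside SQL. This is a minimal Python sketch under assumptions, not a tested implementation against a real schema: the function name `has_30_day_streak` is hypothetical, and the login timestamps for one user are assumed to have been fetched into a list already.

```python
from datetime import date, datetime, timedelta

def has_30_day_streak(login_times, today):
    """Return True if there is at least one login on each of the 30 days ending today."""
    window_start = today - timedelta(days=29)  # today plus the 29 days before it
    # Collapse timestamps to calendar dates so several logins on one day count once.
    days_seen = {t.date() for t in login_times if window_start <= t.date() <= today}
    return len(days_seen) == 30

# Hypothetical data: one login per day from 1 March to 30 March 2024.
today = date(2024, 3, 30)
daily = [datetime(2024, 3, 1, 9, 0) + timedelta(days=i) for i in range(30)]
print(has_30_day_streak(daily, today))       # True
print(has_30_day_streak(daily[:-1], today))  # False (no login today)
```

Counting distinct dates rather than rows is what makes repeated logins on the same day harmless.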
From your description above, this could be accomplished fairly simply and with one table.
| ID | Table PK, auto-incrementing
| EMAIL | website visitor unique ID. Ostensibly an email, but could be any piece of data that uniquely ID's the visitor
| FIRST_CONSECUTIVE_DAY | timestamp
| LAST_CONSECUTIVE_DAY | timestamp
| HAS_BEEN_REWARDED | bool, default false(0)
That's it for the table. :)
The algorithm is in three parts:
When a user logs in, once they have been verified...
1) Check the user's LAST_CONSECUTIVE_DAY. If it is today, do nothing. If it is yesterday, set LAST_CONSECUTIVE_DAY to today's date. Otherwise, set both FIRST_CONSECUTIVE_DAY and LAST_CONSECUTIVE_DAY to today's date.
2) Use TIMESTAMPDIFF(DAY, FIRST_CONSECUTIVE_DAY, LAST_CONSECUTIVE_DAY) to compare the two dates. Thirty consecutive days, counting both the first and the last, give a difference of 29, so if it returns 29 or more go on to step 3; otherwise move on with the application.
3) Your user has visited the website every single day for 30 days in a row! Congratulations! Check HAS_BEEN_REWARDED to see if they have done it before; if it is still false, give them a prize and mark HAS_BEEN_REWARDED as true.
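The three steps can be sketched roughly as follows. This is a Python sketch with assumptions: the `Streak` class and `record_login` function are hypothetical stand-ins for the table row and the login hook, and the streak length is counted with both endpoints included (so 30 visited days correspond to a 29-day difference between first and last).

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Streak:
    first_consecutive_day: date
    last_consecutive_day: date
    has_been_rewarded: bool = False

def record_login(streak, today):
    """Update the streak for a login on `today`; return True if a reward is due now."""
    if streak.last_consecutive_day == today:
        pass  # already counted today
    elif streak.last_consecutive_day == today - timedelta(days=1):
        streak.last_consecutive_day = today  # streak continues
    else:
        streak.first_consecutive_day = today  # streak broken, start over
        streak.last_consecutive_day = today
    days_in_row = (streak.last_consecutive_day - streak.first_consecutive_day).days + 1
    if days_in_row >= 30 and not streak.has_been_rewarded:
        streak.has_been_rewarded = True
        return True
    return False

# Hypothetical usage: a user logs in every day for 31 days.
s = Streak(date(2024, 1, 1), date(2024, 1, 1))
rewards = [record_login(s, date(2024, 1, 1) + timedelta(days=i)) for i in range(31)]
print(rewards.index(True))  # 29, i.e. the reward fires on the 30th day
print(rewards.count(True))  # 1
```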
Related
I have a MySQL query question.
In my web app I record the active listeners on my Shoutcast server in a MySQL database table, which includes "created", a datetime field for when they tuned in, and "updated", a datetime field for the latest time the server polled the Shoutcast server (each minute). I also retrieve the duration in seconds of their listening session, plus the uid (aka session id), which is unique to each session.
What I would like to do is count the number of listeners per hour, for example 13:00 = 20 listeners. I would like to include not only those who tuned in during that hour (the "created" datetime field), but also any listeners who were still listening from the previous hour (the "updated" datetime field).
What query would I need to achieve this? I would only generate one day's worth of results at a time.
I understand that it would use something similar to "COUNT(id) AS hits" and "GROUP BY", but I'm not sure how to factor in the datetime fields, as the "updated" datetime field is constantly updated as long as the user is still listening, and some users can remain listening for 3 hours or more.
Edit
The main parts of the database schema are: id (int 20), created (datetime), updated (datetime), uid (int 20), duration (int 10).
The desired result would look something similar to:
(Time / hits) 0900 => 10, 1000 => 15, 1100 => 5, 1200 => 8, 1300 => 25
and so on...
This is a query I've used to filter results by country, which uses group by and count():-
SELECT country, COUNT(id) AS hits FROM listeners_log WHERE YEAR(created) = YEAR(NOW()) AND MONTH(created) = MONTH(NOW()) AND duration >= 60 GROUP BY country
The query also has an added condition in the WHERE clause to filter out sessions that are less than 60 seconds long.
To elaborate a little more, the created field reflects when a user connects/starts a listening session. For example, they tune in at 2016-03-21 15:00:00 that is reflected in the created field. But if they're still listening in 1 hour's time, the update field will read 2016-03-21 16:00:00, but the created field will remain the same.
Update:
I've come up with the following SQL, but this only counts the initial connection, indicated by the created field, and ignores whether a user remains connected from one hour to the next.
SELECT HOUR(created), COUNT(id) AS hits FROM listeners_log WHERE DATE(created) = CURDATE() group by HOUR(created)
You only need to query on the 'updated' datetime field. Even when a new listener arrives, the row is created with 'created' set and is then refreshed under 'updated' on the next poll, so for any particular hour your only concern is the 'updated' field.
SELECT COUNT(id) FROM TABLE_NAME
WHERE update_Field BETWEEN NOW() - INTERVAL 1 HOUR AND NOW();
This is a sample query to help you out; I don't have the schema at hand to test it. Modify it to suit your needs, and let me know if it doesn't work.
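Another way to look at the question, sketched here in Python rather than SQL: a session counts toward an hour whenever its created/updated interval overlaps that hour. This is only an illustration under assumptions; the function name `listeners_per_hour` and the (created, updated) tuple layout are hypothetical, and the rows are assumed to be already loaded for the day in question.

```python
from datetime import datetime, timedelta

def listeners_per_hour(sessions, day):
    """Count sessions active at any point in each hour of `day` (a midnight datetime).

    `sessions` is a list of (created, updated) datetime pairs.
    """
    counts = {}
    for hour in range(24):
        start = day + timedelta(hours=hour)
        end = start + timedelta(hours=1)
        # A session overlaps the hour if it began before the hour ended
        # and was last seen at or after the hour began.
        counts[hour] = sum(1 for created, updated in sessions
                           if created < end and updated >= start)
    return counts

# Hypothetical sessions on 2016-03-21.
day = datetime(2016, 3, 21)
sessions = [
    (datetime(2016, 3, 21, 15, 0), datetime(2016, 3, 21, 16, 30)),   # spans two hours
    (datetime(2016, 3, 21, 15, 20), datetime(2016, 3, 21, 15, 40)),
    (datetime(2016, 3, 21, 16, 5), datetime(2016, 3, 21, 16, 10)),
]
hits = listeners_per_hour(sessions, day)
print(hits[15], hits[16])  # 2 2
```

The first session is counted in both 15:00 and 16:00, which is the "still listening from the previous hour" case from the question.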
I have a structural MySQL question about storing events in a database with dates.
Say that an organiser would select a range of dates, eg:
["19/12/2014","20/12/2014","26/12/2014","27/12/2014","02/01/2015","03/01/2015","09/01/2015","10/01/2015"]
The event needs to be saved into a table, I'm thinking about creating a many-to-many table with the structure:
event_id | start_date | end_date
Thinking about it, this would mean that I'd need to convert the date array into an array of objects, each with a start and an end date.
Now the alternative would be to just create a table that looks like this:
event_id | event_date
And create a separate record for every date.
The purpose is obviously to check which events should be sent back to the client within a given date range.
Which of the two options seems more common / viable?
It is pretty crucial for the setup.
Depends. If the first event ends on the date of the second event, you can go with event_id | event_date, but otherwise I'd go with the first option.
If you don't have the end date somehow, then how will you be able to tell the client the range of dates for the event?
I would go with a setup that contains the event duration (in seconds); it's flexible.
event_id (int) | start_date (datetime) | duration (int)
When the event duration does not matter, put 0 there; otherwise put the number of seconds, so you will be able to store events that last days or just a few hours or minutes.
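For the first option in the question (start_date / end_date rows), the conversion from a flat list of dates into ranges is short. A Python sketch, assuming a "range" means a run of consecutive calendar days; the function name `to_ranges` is hypothetical.

```python
from datetime import date, timedelta

def to_ranges(dates):
    """Collapse dates into (start, end) pairs of consecutive days."""
    ranges = []
    for d in sorted(set(dates)):
        if ranges and d - ranges[-1][1] == timedelta(days=1):
            ranges[-1] = (ranges[-1][0], d)  # extend the current run
        else:
            ranges.append((d, d))            # start a new run
    return ranges

# The first four dates from the question's example array.
picked = [date(2014, 12, 19), date(2014, 12, 20), date(2014, 12, 26), date(2014, 12, 27)]
print(to_ranges(picked))
# [(datetime.date(2014, 12, 19), datetime.date(2014, 12, 20)),
#  (datetime.date(2014, 12, 26), datetime.date(2014, 12, 27))]
```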
(sorry for bad english and poor skill)
Hello! I've got a MySQL database table with four columns, and a cron job that runs a script requesting the status of each user every 10 minutes.
DataBase columns:
ID UID STATUS CHECK_AT
ID - just a sequence number (1, 2, 3 and so on). Each time the script writes something into the DB, the number goes up.
UID - key value. Let's say it's the ID of a user. The whole DB contains about 3-5 different UIDs.
STATUS - with values 1 or 0. Let's say 1 is online, 0 is offline. The online status timeout is 10 minutes.
CHECK_AT - Time and date of script work, like 2013-10-01 00:30:01
Logic: every 10 minutes the script checks specific UIDs (listed in another table) for online (1) or offline (0).
What I'm trying to do:
Output the total online time of specific UIDs for a day, a week, a month, etc.
I guess it should be elementary, like
select count(id) from DB_NAME where date(check_at) = '2013-10-01';
for a one day
select count(uid) from user_activity where date(check_at) between '2013-10-01' and '2013-10-07';
For a few days and so on.
But my skill is too low to know how I can count only the online time (status=1) for a date.
Can you give me some advice, please?
You could add your condition to the WHERE clause, like:
select count(id) from your_table where date(check_at) = '2013-10-01' AND status = 1;
OR
select count(uid) from user_activity where
date(check_at) between '2013-10-01' AND '2013-10-07'
AND status = 1;
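Since the script polls every 10 minutes, each row with status = 1 stands for roughly one 10-minute interval, so the count can be turned into an estimate of online time. A small Python sketch of that arithmetic; the function name `online_minutes` and the (uid, status, check_at) tuple layout are assumptions for illustration.

```python
from datetime import datetime

POLL_MINUTES = 10  # polling interval stated in the question

def online_minutes(samples, uid, start, end):
    """Estimate online time: each status=1 sample counts as one polling interval."""
    hits = sum(1 for s_uid, status, check_at in samples
               if s_uid == uid and status == 1 and start <= check_at < end)
    return hits * POLL_MINUTES

# Hypothetical samples: (uid, status, check_at).
samples = [
    (7, 1, datetime(2013, 10, 1, 0, 30, 1)),
    (7, 1, datetime(2013, 10, 1, 0, 40, 1)),
    (7, 0, datetime(2013, 10, 1, 0, 50, 1)),
    (8, 1, datetime(2013, 10, 1, 0, 30, 1)),
]
print(online_minutes(samples, 7, datetime(2013, 10, 1), datetime(2013, 10, 2)))  # 20
```

The result is only as precise as the polling interval: a user who was online for two minutes between polls is either missed or counted as ten.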
I'm working on a module where the system would be able to determine where the logs of a flexi-time schedule belong...
Here's what I'm trying to do. I have a table called office_schedule with fields and values:
emp_ID time_in time_out
1 8:00:00 9:00:00
1 9:30:00 12:00:00
1 13:30:00 17:00:00
The example table above, 'office_schedule', contains the schedule of a single employee for a single day. Suppose I have another table called 'office_logs' with the row:
emp_ID log_in log_out
1 8:40:00 11:30:00
I'm searching for a query that would take the employee's logs and determine which row in the 'office_schedule' table the logs belong to, by calculating which schedule entry they cover the most time of.
For example, if I query using the logs in the 'office_logs' table, it would match the second row of the 'office_schedule' table, because the logs cover a longer span of time in the second row than in the others.
I hope this is understandable enough.
Please help...
Assuming the time cells are defined as TIME and not as VARCHAR, I would try something like that (but maybe there is a better way):
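SELECT *
FROM `office_logs` AS log
LEFT JOIN `office_schedule` AS sched ON log.`emp_ID` = sched.`emp_ID`
WHERE log.`emp_ID` = 1
-- TIME_TO_SEC avoids subtracting TIME values as plain numbers (e.g. 90000 - 84000)
ORDER BY (ABS(TIME_TO_SEC(sched.`time_in`) - TIME_TO_SEC(log.`log_in`))
        + ABS(TIME_TO_SEC(sched.`time_out`) - TIME_TO_SEC(log.`log_out`))) ASC
LIMIT 1;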
It calculates the absolute difference between the log in and log out times of an employee to each of his scheduled time in and time out. The return is ordered by the smallest difference.
Maybe this helps.
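The question asks for the schedule entry that the log covers the most, which is a slightly different measure from the endpoint distance used in the query above. A Python sketch of that overlap calculation, using the example rows from the question; the function names are hypothetical.

```python
from datetime import time

def seconds(t):
    """Seconds since midnight for a datetime.time value."""
    return t.hour * 3600 + t.minute * 60 + t.second

def overlap_seconds(log_in, log_out, time_in, time_out):
    """Length of the intersection of [log_in, log_out] and [time_in, time_out]."""
    return max(0, min(seconds(log_out), seconds(time_out))
                  - max(seconds(log_in), seconds(time_in)))

def best_slot(log, schedule):
    """Return the schedule entry with the largest overlap with the log."""
    return max(schedule, key=lambda slot: overlap_seconds(log[0], log[1], slot[0], slot[1]))

schedule = [(time(8, 0), time(9, 0)), (time(9, 30), time(12, 0)), (time(13, 30), time(17, 0))]
log = (time(8, 40), time(11, 30))
print(best_slot(log, schedule))  # (datetime.time(9, 30), datetime.time(12, 0))
```

Here the log overlaps the first entry for 20 minutes, the second for 2 hours and the third not at all, so the second entry wins, as the question expects.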
Please, I need help with this (for better understanding please see attached image) because I am completely helpless.
As you can see, I have users and they store their starting and ending datetimes in my DB as YYYY-mm-dd H:i:s. Now I need to find the time ranges where the periods of the most users overlap. I would like to get the 3 datetime ranges shared by the largest number of users. How can I do it?
I have no idea which MySQL query I should use, or whether it would be better to select all datetimes (start and end) from the database and process them in PHP (but how?). As shown in the image, the result should be, for example, that 8.30 - 10.00 is the result for users A+B+C+D.
Table structure:
UserID | Start datetime | End datetime
--------------------------------------
A | 2012-04-03 4:00:00 | 2012-04-03 10:00:00
A | 2012-04-03 16:00:00 | 2012-04-03 20:00:00
B | 2012-04-03 8:30:00 | 2012-04-03 14:00:00
B | 2012-04-06 21:30:00 | 2012-04-06 23:00:00
C | 2012-04-03 12:00:00 | 2012-04-03 13:00:00
D | 2012-04-01 01:00:01 | 2012-04-05 12:00:59
E | 2012-04-03 8:30:00 | 2012-04-03 11:00:00
E | 2012-04-03 21:00:00 | 2012-04-03 23:00:00
What you effectively have is a collection of sets and want to determine if any of them have non-zero intersections. This is the exact question one asks when trying to find all the ancestors of a node in a nested set.
We can prove that for every overlap, at least one time window will have a start time that falls within all other overlapping time windows. Using this tidbit, we don't need to actually construct artificial timeslots in the day. Simply take a start time and see if it intersects any of the other time windows and then just count up the number of intersections.
So what's the query?
/*SELECT*/
SELECT DISTINCT
MAX(overlapping_windows.start_time) AS overlap_start_time,
MIN(overlapping_windows.end_time) AS overlap_end_time ,
(COUNT(overlapping_windows.id) - 1) AS num_overlaps
FROM user_times AS windows
INNER JOIN user_times AS overlapping_windows
ON windows.start_time BETWEEN overlapping_windows.start_time AND overlapping_windows.end_time
GROUP BY windows.id
ORDER BY num_overlaps DESC;
Depending on your table size and how often you plan on running this query, it might be worthwhile to add a spatial index to it (see below).
UPDATE
If you're running this query often, you'll need to use a spatial index. Because of the range-based traversal (i.e. does start_time fall within the start/end range), a BTREE index will not do anything for you. IT HAS TO BE SPATIAL.
ALTER TABLE user_times ADD COLUMN time_window GEOMETRY NOT NULL DEFAULT 0;
UPDATE user_times SET time_window = GeomFromText(CONCAT('LineString( -1 ', UNIX_TIMESTAMP(start_time), ', 1 ', UNIX_TIMESTAMP(end_time), ')'));
CREATE SPATIAL INDEX time_window ON user_times (time_window);
Then you can update the ON clause in the above query to read
ON MBRWithin( Point(0, UNIX_TIMESTAMP(windows.start_time)), overlapping_windows.time_window )
This will get you an indexed traversal for the query. Again, only do this if you're planning on running the query often.
Credit for the spatial index to Quassnoi's blog.
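The start-time idea behind the query (take each window's start and count the windows that contain it) can be sketched in plain Python. This is an illustration under assumptions, not a translation of the SQL: the function name `overlaps_at_starts` and the (user, start, end) tuple layout are hypothetical, and the sample rows are a subset of the question's table.

```python
from datetime import datetime

def overlaps_at_starts(windows):
    """For each window start, find the windows containing it.

    Returns unique (overlap_start, overlap_end, num_overlaps) tuples, most overlaps first.
    """
    results = []
    for _, start, _ in windows:
        covering = [(s, e) for _, s, e in windows if s <= start <= e]
        results.append((max(s for s, _ in covering),
                        min(e for _, e in covering),
                        len(covering) - 1))  # minus one for the window itself
    return sorted(set(results), key=lambda r: r[2], reverse=True)

windows = [
    ("A", datetime(2012, 4, 3, 4, 0), datetime(2012, 4, 3, 10, 0)),
    ("B", datetime(2012, 4, 3, 8, 30), datetime(2012, 4, 3, 14, 0)),
    ("C", datetime(2012, 4, 3, 12, 0), datetime(2012, 4, 3, 13, 0)),
    ("D", datetime(2012, 4, 1, 1, 0, 1), datetime(2012, 4, 5, 12, 0, 59)),
    ("E", datetime(2012, 4, 3, 8, 30), datetime(2012, 4, 3, 11, 0)),
]
top = overlaps_at_starts(windows)[0]
print(top)  # overlap from 08:30 to 10:00 with 3 other windows
```

This is quadratic in the number of windows, which is fine for small tables; the spatial index above addresses the same cost on the database side.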
Something like this should get you started -
SELECT slots.time_slot, COUNT(DISTINCT user_bookings.user_id) AS num_users, GROUP_CONCAT(DISTINCT user_bookings.user_id ORDER BY user_bookings.user_id) AS user_list
FROM (
SELECT CURRENT_DATE + INTERVAL ((id-1)*30) MINUTE AS time_slot
FROM dummy
WHERE id BETWEEN 1 AND 48
) AS slots
LEFT JOIN user_bookings
ON slots.time_slot BETWEEN `user_bookings`.`start` AND `user_bookings`.`end`
GROUP BY slots.time_slot
ORDER BY num_users DESC
The idea is to create a derived table that consists of time slots for the day. In this example I have used dummy (which can be any table with an auto-increment id that is contiguous for the required set) to create a list of time slots by adding 30 minutes incrementally. The result of this is then joined to the bookings table to count the number of bookings for each time slot.
UPDATE For entire date/time range you could use a query like this to get the other data required -
SELECT MIN(`start`) AS `min_start`, MAX(`end`) AS `max_end`, DATEDIFF(MAX(`end`), MIN(`start`)) + 1 AS `num_days`
FROM user_bookings
These values can then be substituted into the original query or the two can be combined -
SELECT slots.time_slot, COUNT(DISTINCT user_bookings.user_id) AS num_users, GROUP_CONCAT(DISTINCT user_bookings.user_id ORDER BY user_bookings.user_id) AS user_list
FROM (
SELECT DATE(tmp.min_start) + INTERVAL ((id-1)*30) MINUTE AS time_slot
FROM dummy
INNER JOIN (
SELECT MIN(`start`) AS `min_start`, MAX(`end`) AS `max_end`, DATEDIFF(MAX(`end`), MIN(`start`)) + 1 AS `num_days`
FROM user_bookings
) AS tmp
WHERE dummy.id BETWEEN 1 AND (48 * tmp.num_days)
) AS slots
LEFT JOIN user_bookings
ON slots.time_slot BETWEEN `user_bookings`.`start` AND `user_bookings`.`end`
GROUP BY slots.time_slot
ORDER BY num_users DESC
EDIT I have added DISTINCT and ORDER BY clauses in the GROUP_CONCAT() in response to your last query.
Please note that you will need a much greater range of ids in the dummy table. I have not tested this query, so it may have syntax errors.
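The time-slot approach is perhaps easier to see outside SQL. A Python sketch under assumptions: the function name `busiest_slots` and the (user_id, start, end) tuple layout are hypothetical, and the slots cover a single day only.

```python
from datetime import datetime, timedelta

def busiest_slots(bookings, day_start, slot_minutes=30, slots=48):
    """Return (slot, num_users, user_list) rows, busiest slot first."""
    rows = []
    for i in range(slots):
        slot = day_start + timedelta(minutes=i * slot_minutes)
        # Distinct users whose booking contains the slot's start time.
        users = sorted({u for u, start, end in bookings if start <= slot <= end})
        rows.append((slot, len(users), users))
    rows.sort(key=lambda r: r[1], reverse=True)  # stable: earlier slots win ties
    return rows

bookings = [
    ("A", datetime(2012, 4, 3, 4, 0), datetime(2012, 4, 3, 10, 0)),
    ("B", datetime(2012, 4, 3, 8, 30), datetime(2012, 4, 3, 14, 0)),
    ("C", datetime(2012, 4, 3, 12, 0), datetime(2012, 4, 3, 13, 0)),
    ("D", datetime(2012, 4, 1, 1, 0, 1), datetime(2012, 4, 5, 12, 0, 59)),
    ("E", datetime(2012, 4, 3, 8, 30), datetime(2012, 4, 3, 11, 0)),
]
rows = busiest_slots(bookings, datetime(2012, 4, 3))
print(rows[0][1], rows[0][2])  # 4 ['A', 'B', 'D', 'E']
```

Changing `slot_minutes` and `slots` together adjusts the resolution, at the cost of more slots to scan.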
I would not do much of this in SQL. It is so much simpler in a programming language; SQL is not made for something like this.
Of course, it's just sensible to break the day down into "timeslots" - this is statistics. But as soon as you start handling dates over the 00:00 border, things start to get icky when you use joins and inner selects. Especially with MySQL which does not quite like inner selects.
Here's a possible SQL query
SELECT count(*) FROM `times`
WHERE
( DATEDIFF(`End`,`Start`) = 0 AND
TIME(`Start`) < TIME('$SLOT_HIGH') AND
TIME(`End`) > TIME('$SLOT_LOW'))
OR
( DATEDIFF(`End`,`Start`) > 0 AND
( TIME(`Start`) < TIME('$SLOT_HIGH') OR
TIME(`End`) > TIME('$SLOT_LOW')))
Here's some pseudo code
granularity = 30*60; // 30 minutes
numslots = 24*60*60 / granularity;
stats = CreateArray(numslots);
for i=0, i < numslots, i++ do
stats[i] = GetCountFromSQL(i*granularity, (i+1)*granularity); // low, high
end
Yes, that makes numslots queries, but no joins no nothing, hence it should be quite fast. Also you can easily change the resolution.
And another positive thing is, you could "ask yourself", "I have two possible timeslots, and I need the one where more people are here, which one should I use?" and just run the query twice with respective ranges and you are not stuck with predefined time slots.
To only find full overlaps (an entry only counts if it covers the full slot) you have to switch low and high ranges in the query.
You might have noticed that I do not add times between entries that could span multiple days, however, adding a whole day, will just increase all slots by one, making that quite useless.
You could however add them by selecting sum(DAY(End) - DAY(Start)) and just add the return value to all slots.
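The difference between counting partial and full overlaps comes down to which bounds are compared. A small Python sketch for same-day entries, with times given as seconds since midnight; the function name `count_in_slot` and the `full` flag are assumptions for illustration.

```python
def count_in_slot(entries, low, high, full=False):
    """Count (start, end) entries that touch the slot low..high,
    or that cover it completely when full=True."""
    if full:
        # Entry must begin at or before the slot start and end at or after the slot end.
        return sum(1 for start, end in entries if start <= low and end >= high)
    # Entry must begin before the slot ends and end after the slot begins.
    return sum(1 for start, end in entries if start < high and end > low)

# Hypothetical entries: 08:00-10:00 and 09:15-09:20.
entries = [(8 * 3600, 10 * 3600), (9 * 3600 + 900, 9 * 3600 + 1200)]
low, high = 9 * 3600, 9 * 3600 + 1800  # the 09:00-09:30 slot
print(count_in_slot(entries, low, high))             # 2
print(count_in_slot(entries, low, high, full=True))  # 1
```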
Table seems pretty simple. I would keep your SQL query pretty simple:
SELECT * FROM tablename
Then, once you have the info saved in your PHP object, do the processing in PHP using loops and comparisons.
In simplest form:
for ($x = 0, $numrows = mysqli_num_rows($query); $x < $numrows; $x++) {
/*Grab a row ($query is assumed to be a mysqli result; the mysql_* functions were removed in PHP 7)*/
$row = mysqli_fetch_assoc($query);
/*store userID, START, END*/
$userID = $row['userID'];
$start = $row['START'];
$end = $row['END'];
/*Have an array for each user in which you store start and end times*/
if (!strcmp($userID, "A"))
{
/*Store info in array_a*/
}
else if (!strcmp($userID, "B"))
{
/*etc......*/
}
}
/*Now you have an array for each user with their start/stop times*/
/*Do your loops and comparisons to find common time slots. */
/*Also, use strtotime() to switch date/time entries into comparable values*/
Of course this is in very basic form. You'll probably want to do one loop through the array to first get all of the userIDs before you compare them in the loop shown above.
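One way to do the "loops and comparisons" step once the rows are in memory is a sweep over the sorted start and end times. This is a Python sketch of a different technique from the per-user arrays above, offered as an assumption-laden illustration; the function name `peak_overlap` is hypothetical and the input is a plain list of (start, end) pairs.

```python
from datetime import datetime

def peak_overlap(intervals):
    """Return (start, end, count) for the first span where the most intervals overlap."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # an interval opens
        events.append((end, -1))    # an interval closes
    events.sort(key=lambda e: (e[0], -e[1]))  # at the same instant, opens come before closes
    best = (None, None, 0)
    active = 0
    for i, (t, delta) in enumerate(events):
        active += delta
        if delta == 1 and active > best[2]:
            # The current count holds until the next event in the sorted list.
            best = (t, events[i + 1][0], active)
    return best

intervals = [
    (datetime(2012, 4, 3, 4, 0), datetime(2012, 4, 3, 10, 0)),
    (datetime(2012, 4, 3, 8, 30), datetime(2012, 4, 3, 14, 0)),
    (datetime(2012, 4, 3, 12, 0), datetime(2012, 4, 3, 13, 0)),
    (datetime(2012, 4, 3, 8, 30), datetime(2012, 4, 3, 11, 0)),
]
print(peak_overlap(intervals))  # 08:30 to 10:00 with 3 intervals
```

Sorting dominates the cost, so this scales better than comparing every pair of rows.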