I have to write a cron job that emails subscribers weekly, on the day of the week they subscribed. For example, if user A subscribed on a Thursday and user B subscribed on a Wednesday, then user A will get mail every Thursday and user B every Wednesday.
My approach is as follows:
1- Get the day of the week for the current (today's) date and assign it to a variable.
2- Run a SELECT query to fetch the IDs of all subscribers whose subscription weekday matches today's. I am planning to use MySQL's DAYOFWEEK() to extract the weekday from the subscription date.
3- Once I have the IDs, email those subscribers the last 7 days of activity.
What puzzles me is that DAYOFWEEK() has to be evaluated on the column for every row, which looks costly. What alternative would you suggest? (Assume the table will hold lots of data.)
Per-row functions rarely scale well as the database table grows.
The first thing you should do is make sure there's actually a performance problem to solve. Always start with third normal form and denormalize only if you find such a problem; otherwise your effort is wasted. It may be that the speed is not that bad, in which case stick with 3NF.
If it turns out there is a performance problem, one way to solve it is to add an indexed column called weekday that holds the day of the week the user subscribed.
This technically breaks 3NF, since that attribute depends on the subscription date, which is unlikely to be part of the key. The weekday may also come to disagree with the subscription date if you update one without the other.
But you can mitigate the problem by having an insert/update trigger which forces the weekday column to agree with the subscription date, ensuring that they never disagree.
Then your query simply becomes something like:
dow = Now.dayOfWeek()
rowSet = executeQuery ("select sub_id from subscribers where weekday = ?", dow)
and then process each of those subscribers (or handle them all in one big honkin' query if you wish).
Not having to retrieve every row, compute getWeekDay(subscription_date) on it and then filter should massively improve query speed, since the database can go straight to the matching rows through the index on weekday.
The vast majority of databases are read far more often than written and, by shifting the cost of the calculation to the insert/update, you effectively amortise that cost over all selects.
Assuming your subscribers subscribe for more than a week (since you send out their stuff once a week), that will be more efficient than calculating on the select.
And, although this takes up more space in your table (due to the extra column and index), have a look at the ratio of "My query isn't fast enough" questions compared to "My database is too big" questions. The former far outweigh the latter.
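A minimal sketch of the weekday-column approach, using Python's built-in sqlite3 module so it runs without a MySQL server. The table and column names (subscribers, sub_id, email, subscribed_on, weekday) are assumptions, and the weekday values follow MySQL's DAYOFWEEK() convention (1 = Sunday ... 7 = Saturday). MySQL trigger syntax differs (for example, a BEFORE INSERT trigger there can assign NEW.weekday directly), but the idea is the same: the cost is paid once per write, and the weekly select only touches the index.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscribers (
    sub_id        INTEGER PRIMARY KEY,
    email         TEXT NOT NULL,
    subscribed_on TEXT NOT NULL,  -- ISO date, e.g. '2024-05-02'
    weekday       INTEGER         -- 1 = Sunday ... 7 = Saturday, like DAYOFWEEK()
);
CREATE INDEX idx_subscribers_weekday ON subscribers (weekday);

-- strftime('%w') gives 0 = Sunday ... 6 = Saturday, so add 1.
CREATE TRIGGER subscribers_weekday_ins AFTER INSERT ON subscribers
BEGIN
    UPDATE subscribers
       SET weekday = CAST(strftime('%w', NEW.subscribed_on) AS INTEGER) + 1
     WHERE sub_id = NEW.sub_id;
END;

-- Recompute whenever either column is written, so they can never disagree.
-- (SQLite does not re-fire a trigger from inside itself unless
-- PRAGMA recursive_triggers is on, so this does not loop.)
CREATE TRIGGER subscribers_weekday_upd AFTER UPDATE OF subscribed_on, weekday ON subscribers
BEGIN
    UPDATE subscribers
       SET weekday = CAST(strftime('%w', NEW.subscribed_on) AS INTEGER) + 1
     WHERE sub_id = NEW.sub_id;
END;
""")

def subscriber_ids_for(day):
    """IDs of subscribers whose subscription weekday matches `day`."""
    dow = day.isoweekday() % 7 + 1  # isoweekday: Mon=1..Sun=7 -> DAYOFWEEK: Sun=1..Sat=7
    rows = conn.execute(
        "SELECT sub_id FROM subscribers WHERE weekday = ? ORDER BY sub_id", (dow,))
    return [row[0] for row in rows]

with conn:
    conn.executemany(
        "INSERT INTO subscribers (email, subscribed_on) VALUES (?, ?)",
        [("a@example.com", "2024-05-02"),   # user A: a Thursday
         ("b@example.com", "2024-05-01")])  # user B: a Wednesday

print(subscriber_ids_for(date(2024, 5, 9)))  # run on a Thursday
```

The cron job would call subscriber_ids_for(date.today()) and mail each returned subscriber.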
Related
I am creating a system that requires a scheduler for a particular task. Users may pick any hour, 24 hours a day, 7 days a week.
I came up with two options for storing this in the database, but I don't think either one is the most efficient design, so I'm hoping for some alternatives that may be more efficient.
On the user side, I created a grid of buttons with two nested loops (one for the days, one for the times) and gave each button a unique value: $timeValue = "d".$j."-t".$i;
So d1-t0 is Sunday at midnight, d3-t12 is Tuesday at noon, and so forth.
So in the database, I was first going to have a simple ID, day, time setup, but that could mean up to 168 rows per event (7 days × 24 hours).
Then I tried an ID, a day, and a column for each hour of the day (time0 to time23), each holding a boolean: 0 if not selected, 1 if selected.
This would result in 7 rows per event, but I think querying that data might be a pain.
I need to perform a few operations on this data. For each day, I need the selected times as an array, e.g. times=(0,3,5,6,7...). But I don't believe a statement like SELECT * FROM schedule WHERE time0 = 1 OR time1 = 1 ... will work, nor will it produce the desired array.
So this isn't going to work well.
My overall system will also need to know every event that has a given time selected, for a mass posting:
SELECT * FROM table WHERE time = $time AND day = $day   -- $time: 0-23, $day: 1-7
Do action with data...
So with this requirement, I'm going to assume that storing the times as an array within the database is likely not the most efficient way either.
So am I stuck with needing up to 168 rows of data per event, or is there a better way I am missing? Thanks
Update:
To give a little more clarity on what I need to accomplish:
Users will be creating event campaigns in which other users can bid on various time slots for something to happen. There will likely be 10,000 to 100,000 of these campaigns at any one time, and they run until the creator stops them. The campaign creators can define the time slots available for their campaign.
At the designated time each day the system will find every campaign that has an event scheduled and perform the event.
So the first requirement is to know which time slots are available for the campaign, and then I need the system to quickly identify campaigns that have an event on each hour and day and perform it automatically.
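One way to check whether the first layout (one row per selected slot) is workable is to sketch both required queries against it. The following uses Python's sqlite3 module as a stand-in for MySQL; the table name campaign_slots, its columns and the helper functions are hypothetical. Rows exist only for slots a creator actually selects, so 168 per campaign is an upper bound, and the composite primary key plus a (day, hour) index are there so that both the per-campaign lookup and the "what runs now?" lookup can use an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE campaign_slots (
    campaign_id INTEGER NOT NULL,
    day         INTEGER NOT NULL CHECK (day BETWEEN 1 AND 7),
    hour        INTEGER NOT NULL CHECK (hour BETWEEN 0 AND 23),
    PRIMARY KEY (campaign_id, day, hour)   -- per-campaign lookups
);
CREATE INDEX idx_slots_day_hour ON campaign_slots (day, hour);  -- "what runs now?"
""")

def parse_slot(value):
    """Turn a button value such as 'd3-t12' into (day, hour) = (3, 12)."""
    d, t = value.split("-")
    return int(d[1:]), int(t[1:])

def set_slots(campaign_id, button_values):
    """Replace a campaign's slots with the selected button values."""
    rows = [(campaign_id, *parse_slot(v)) for v in button_values]
    with conn:
        conn.execute("DELETE FROM campaign_slots WHERE campaign_id = ?", (campaign_id,))
        conn.executemany(
            "INSERT INTO campaign_slots (campaign_id, day, hour) VALUES (?, ?, ?)", rows)

def hours_for_day(campaign_id, day):
    """Selected hours for one campaign on one day, e.g. [0, 3, 5]."""
    rows = conn.execute(
        "SELECT hour FROM campaign_slots WHERE campaign_id = ? AND day = ? ORDER BY hour",
        (campaign_id, day))
    return [r[0] for r in rows]

def campaigns_due(day, hour):
    """Every campaign with a slot at this day and hour (the mass-posting query)."""
    rows = conn.execute(
        "SELECT campaign_id FROM campaign_slots WHERE day = ? AND hour = ? ORDER BY campaign_id",
        (day, hour))
    return [r[0] for r in rows]

set_slots(1, ["d1-t0", "d3-t12", "d3-t13"])
set_slots(2, ["d3-t12"])
print(hours_for_day(1, 3))
print(campaigns_due(3, 12))
```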
I'm working on a calendar application where you can set events which can last multiple days. On a given date there can be multiple events, already started or starting that day. The user will view everything as a calendar.
Now, my question is: which of the following is the right approach? Is there an even better approach?
1) use 2 tables, events and events_days, and store the days of every event in the events_days table
2) use just one table, storing each event with a date_from field and a date_to field, and every time generate with MySQL and PHP the list of days in the desired range along with their events
Obviously the first option will require much more DB storage, while the second one will require more work from the server to generate the list (every single time the user asks for it).
The DB storage shouldn't be a problem for now, but I don't know whether that will stay true in the future. And I fear the second option will need too many resources.
I have used both approaches. Here is a list of pros and cons that I have noticed:
Two tables: events(id) and events_dates(eventid, date)
Pros:
Query to check if there are events on a given date or between given dates is trivial:
SELECT eventid FROM events_dates WHERE date BETWEEN '2015-01-01' AND '2015-01-10'
Query to select the list of dates for an event is trivial:
SELECT date FROM events_dates WHERE eventid = 1
Cons:
While selecting the list of dates is trivial, inserting the list of dates for an event requires scripting (or a table-of-dates)
Additional measures are required to make sure data remains consistent; for example, inserting an event spanning three days takes four insert queries (one for the event and one for each day)
This structure is not suitable in situations where time is involved, for example, a meeting schedule
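The "requires scripting" point for the two-table layout can be sketched as follows, again with Python's sqlite3 standing in for MySQL. The title column and the add_event helper are assumptions; wrapping the inserts in one transaction is one way to provide the "additional measures" for consistency, since either all of an event's rows are written or none are.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE events_dates (
    eventid INTEGER NOT NULL REFERENCES events (id),
    date    TEXT    NOT NULL,          -- ISO 'YYYY-MM-DD', so string order = date order
    PRIMARY KEY (eventid, date)
);
CREATE INDEX idx_events_dates_date ON events_dates (date);
""")

def add_event(title, first_day, last_day):
    """Insert one events row plus one events_dates row per day (inclusive)."""
    days = [(first_day + timedelta(days=i)).isoformat()
            for i in range((last_day - first_day).days + 1)]
    with conn:  # commit all rows together, or roll back on error
        event_id = conn.execute(
            "INSERT INTO events (title) VALUES (?)", (title,)).lastrowid
        conn.executemany(
            "INSERT INTO events_dates (eventid, date) VALUES (?, ?)",
            [(event_id, d) for d in days])
    return event_id

add_event("Conference", date(2015, 1, 1), date(2015, 1, 3))  # 1 + 3 inserts
# DISTINCT because a multi-day event matches once per day in the range.
rows = conn.execute(
    "SELECT DISTINCT eventid FROM events_dates WHERE date BETWEEN ? AND ?",
    ("2015-01-01", "2015-01-10")).fetchall()
print(rows)
```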
One table: events(id, start, end)
Cons:
Query to check if there are events on a given date or between given dates is tricky.
Query to select the list of dates for an event is tricky.
Pros:
Inserting an event is trivial
This structure is suitable in situations where time is involved
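For the one-table layout, the "tricky" range query reduces to the usual interval-overlap test: an event touches the range [range_from, range_to] exactly when it starts on or before range_to and ends on or after range_from. The list of days is then generated in code, as option 2 in the question suggests. This is another sqlite3 sketch; the column names date_from and date_to are borrowed from the question, and the calendar helper is hypothetical.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE events (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    date_from TEXT NOT NULL,   -- ISO 'YYYY-MM-DD', inclusive
    date_to   TEXT NOT NULL    -- ISO 'YYYY-MM-DD', inclusive
)""")

def calendar(range_from, range_to):
    """Map every day in [range_from, range_to] to the ids of events on that day."""
    days = {range_from + timedelta(days=i): []
            for i in range((range_to - range_from).days + 1)}
    rows = conn.execute(
        "SELECT id, date_from, date_to FROM events "
        "WHERE date_from <= ? AND date_to >= ? ORDER BY id",   # interval overlap
        (range_to.isoformat(), range_from.isoformat()))
    for event_id, d_from, d_to in rows:
        # Clip the event to the requested range, then list its days.
        first = max(date.fromisoformat(d_from), range_from)
        last = min(date.fromisoformat(d_to), range_to)
        for i in range((last - first).days + 1):
            days[first + timedelta(days=i)].append(event_id)
    return days

conn.executemany(
    "INSERT INTO events (title, date_from, date_to) VALUES (?, ?, ?)",
    [("Trip", "2015-01-01", "2015-01-03"),
     ("Course", "2015-01-03", "2015-01-05"),
     ("Holiday", "2015-02-01", "2015-02-02")])

for day, ids in calendar(date(2015, 1, 2), date(2015, 1, 4)).items():
    print(day, ids)
```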
I have a website where people record (or log) the distance of their runs. I want to create a leaderboard that automatically resets to zero after the month is over. If I could also save each user's total distance for that month, that would be ideal. Every run is tied to a user ID, and I have a variable that adds up the monthly distance, but obviously that changes whenever they log a new run. I don't know how to make it count only this month's runs and not break if they log runs in advance.
Any help would be appreciated.
I have tried keeping a monthly distance value in MySQL so that each time they log a run in the current month, it gets added. But how should I make it reset?
You're tracking the date of each run, right? Then you should be able to do something like this and avoid having to store the totals:
SELECT SUM(Runs.Miles) as MonthlyTotal
FROM Runs
where MONTH(Runs.Date) = MONTH(CURDATE()) and YEAR(Runs.Date) = YEAR(CURDATE())
(Presumably you'd also filter by or group by the UserID. Also, I suspect it may be more efficient to pre-calculate the beginning and end of the month and use BETWEEN in the where clause. Consult our friend EXPLAIN.)
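A sketch of that suggestion with the month boundaries pre-calculated in code and the totals grouped by UserID to form the leaderboard. It uses sqlite3 in place of MySQL, and the RunID column, the index and the helper names are assumptions. It uses a half-open range (>= first day of this month, < first day of next month) instead of BETWEEN; for a DATE column the two are equivalent, and the half-open form also stays correct if Date is ever a DATETIME. Because the filter compares the raw column rather than MONTH(Date) and YEAR(Date), an index on Date can be used, and runs logged in advance for a later month simply fall outside the range until that month arrives.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Runs (
    RunID  INTEGER PRIMARY KEY,
    UserID INTEGER NOT NULL,
    Date   TEXT    NOT NULL,   -- ISO 'YYYY-MM-DD'
    Miles  REAL    NOT NULL
);
CREATE INDEX idx_runs_date ON Runs (Date);
""")

def month_bounds(today):
    """First day of today's month and first day of the following month."""
    start = today.replace(day=1)
    if start.month == 12:
        return start, start.replace(year=start.year + 1, month=1)
    return start, start.replace(month=start.month + 1)

def monthly_leaderboard(today):
    """(UserID, MonthlyTotal) pairs for the current month, highest first."""
    start, end = month_bounds(today)
    return conn.execute(
        "SELECT UserID, SUM(Miles) AS MonthlyTotal FROM Runs "
        "WHERE Date >= ? AND Date < ? "
        "GROUP BY UserID ORDER BY MonthlyTotal DESC, UserID",
        (start.isoformat(), end.isoformat())).fetchall()

conn.executemany("INSERT INTO Runs (UserID, Date, Miles) VALUES (?, ?, ?)", [
    (1, "2015-02-27", 20.0),   # last month: ignored
    (1, "2015-03-02", 3.0),
    (1, "2015-03-15", 5.0),
    (2, "2015-03-10", 10.0),
    (2, "2015-04-01", 7.0),    # logged in advance: ignored until April
])
print(monthly_leaderboard(date(2015, 3, 20)))
```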
If your read load is too high to make this practical, you could store the monthly total in a different table and update it with a trigger every time a run is added or updated, and via cron at the start of every month. This kind of de-normalization always makes things more complicated, so I avoid it until I have reason to believe it's necessary.
I am developing a website which will have 200,000 pages. There is also a browse section, which shows the most popular, highest rated, etc. documents. However, this section will become almost static a couple of weeks after launch. So I would also like to implement a filtering system that shows today's, this week's and this month's most popular items, just like YouTube.
Just like this:
http://www.youtube.com/videos?c=2
How should I implement this function? Do I need another table, which will have a new entry for every document each day?
docid, date, view_count, rating
So I would read today's row when filtering by day, or sum 7 rows when filtering by week? That doesn't seem efficient. Do you have any suggestions?
I am using a LAMP stack, by the way.
Thanks,
Assuming you timestamp the records in your table, you should be able to put a where clause that limits the timestamp to whatever timeframe you want.
You can cache the result, especially the longer ones, for long enough to make the request inconsequential.
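A minimal sketch of that caching idea, as a tiny in-process cache with a per-entry time-to-live. The class name, the TTL values and the injectable clock are assumptions; in a LAMP deployment the same pattern would more likely sit on a shared cache such as memcached, so that every web process sees the same entries.

```python
import time

class TTLCache:
    """In-process cache where each entry expires `ttl` seconds after it is stored."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock        # injectable so the expiry logic can be tested
        self._entries = {}         # key -> (expires_at, value)

    def get_or_compute(self, key, ttl, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: skip the expensive query
        value = compute()
        self._entries[key] = (now + ttl, value)
        return value

# Hypothetical TTLs: the longer the period, the less a few minutes' staleness matters.
TTL_SECONDS = {"today": 60, "week": 15 * 60, "month": 60 * 60}

cache = TTLCache()

def most_popular(period, run_query):
    """Return the cached chart for `period`, running `run_query()` only on a miss."""
    return cache.get_or_compute(period, TTL_SECONDS[period], run_query)
```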
EDIT
But perhaps you mean most popular today, not most popular that was added today?
In which case, I don't have an answer.
The most direct approach is to save the resource id and a timestamp in a table recent_views(what, when) each time the resource is shown. Daily/weekly/monthly charts can then be created with appropriate WHERE clauses like WHERE when > $beginOfPeriod AND when < $endOfPeriod.
For performance reasons you can aggregate the values each night, save the sums in separate tables like daily_views(what, sum) and truncate the source table.
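A sketch of the log-then-aggregate approach with sqlite3. Two deviations from the answer, both assumptions: the timestamp column is named viewed_at rather than when (WHEN is a reserved word in MySQL), and daily_views gets a day column so that weekly and monthly charts can still be summed after the raw rows are gone. The nightly job deletes only the rows it has rolled up instead of truncating, so views logged while it runs are not lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recent_views (
    what      INTEGER NOT NULL,   -- resource id
    viewed_at TEXT    NOT NULL    -- 'YYYY-MM-DD HH:MM:SS'
);
CREATE TABLE daily_views (
    what INTEGER NOT NULL,
    day  TEXT    NOT NULL,        -- 'YYYY-MM-DD'
    sum  INTEGER NOT NULL
);
CREATE INDEX idx_daily_views_day ON daily_views (day);
""")

def log_view(what, viewed_at):
    conn.execute("INSERT INTO recent_views (what, viewed_at) VALUES (?, ?)",
                 (what, viewed_at))

def nightly_rollup(before_day):
    """Fold every view before `before_day` into per-day sums, then delete those rows."""
    with conn:  # both statements commit together
        conn.execute("""
            INSERT INTO daily_views (what, day, sum)
            SELECT what, substr(viewed_at, 1, 10), COUNT(*)
              FROM recent_views WHERE viewed_at < ?
             GROUP BY what, substr(viewed_at, 1, 10)""", (before_day,))
        conn.execute("DELETE FROM recent_views WHERE viewed_at < ?", (before_day,))

def chart(from_day, to_day, limit=10):
    """Most viewed resources for days in [from_day, to_day)."""
    return conn.execute("""
        SELECT what, SUM(sum) AS total FROM daily_views
         WHERE day >= ? AND day < ?
         GROUP BY what ORDER BY total DESC, what LIMIT ?""",
        (from_day, to_day, limit)).fetchall()

for what, ts in [(7, "2015-03-01 10:00:00"), (7, "2015-03-01 11:00:00"),
                 (7, "2015-03-02 09:00:00"), (8, "2015-03-02 12:00:00"),
                 (8, "2015-03-03 08:00:00")]:
    log_view(what, ts)
nightly_rollup("2015-03-03")
print(chart("2015-03-01", "2015-03-03"))
```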
I guess I would calculate the dates in code and then pass them as arguments to the SQL you are using.
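As a sketch of that suggestion: compute the period boundaries in application code and bind them as query parameters, rather than building date arithmetic into the SQL string. The function name, the page_view table and its columns in the SQL string are placeholders, and weeks are assumed to start on Monday.

```python
from datetime import datetime, timedelta

def period_bounds(period, now):
    """(start, end) of the 'today', 'week' or 'month' period containing `now`; end is exclusive."""
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "today":
        return day_start, day_start + timedelta(days=1)
    if period == "week":                       # assumption: weeks start on Monday
        start = day_start - timedelta(days=day_start.weekday())
        return start, start + timedelta(days=7)
    if period == "month":
        start = day_start.replace(day=1)
        if start.month == 12:
            return start, start.replace(year=start.year + 1, month=1)
        return start, start.replace(month=start.month + 1)
    raise ValueError(f"unknown period: {period!r}")

start, end = period_bounds("week", datetime(2015, 3, 4, 15, 30))
sql = ("SELECT docid, COUNT(*) AS views FROM page_view "
       "WHERE date >= ? AND date < ? GROUP BY docid")
params = (start.isoformat(sep=" "), end.isoformat(sep=" "))
print(params)  # ('2015-03-02 00:00:00', '2015-03-09 00:00:00')
# cursor.execute(sql, params) works with DB-API drivers using the ? placeholder
# style; MySQL drivers typically use %s instead.
```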
I would do it using a compiler. YouTube probably does that too, considering the amount of traffic and the response times.
The principle is easy to understand. You log every view or rating in a page_view table. You define periods at which the compilation occurs (hourly, daily, weekly, monthly). Every time you reach the right moment (e.g., the end of the day), you run the compiler, which essentially executes a query à la...
SELECT * FROM page_view WHERE date > $from_date AND date < $to_date
... and store the result. This probably works better in a cron job.
The next time you need to display the information, you can just fetch the stored result and display it without re-computation. There are a variety of storage methods you can use: a MySQL table (e.g.: page_view_compiled), memcached, etc.
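A sketch of such a compiler job, with sqlite3 standing in for MySQL and the compiled result stored in a page_view_compiled table as the answer suggests. The column names, the top-N limit and the helper functions are assumptions, and the window is half-open (>= from, < to) so that a view exactly on a boundary is counted in one period only. Cron would call compile_period at the end of each period; page requests only call chart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page_view (page_id INTEGER NOT NULL, date TEXT NOT NULL);
CREATE TABLE page_view_compiled (
    period  TEXT    NOT NULL,   -- e.g. 'daily', 'weekly', 'monthly'
    pos     INTEGER NOT NULL,   -- 1 = most viewed
    page_id INTEGER NOT NULL,
    views   INTEGER NOT NULL,
    PRIMARY KEY (period, pos)
);
""")

def compile_period(period, from_date, to_date, limit=10):
    """Aggregate page_view for [from_date, to_date) and store the ranked result."""
    ranked = conn.execute("""
        SELECT page_id, COUNT(*) AS views FROM page_view
         WHERE date >= ? AND date < ?
         GROUP BY page_id ORDER BY views DESC, page_id LIMIT ?""",
        (from_date, to_date, limit)).fetchall()
    with conn:  # replace the old chart in a single transaction
        conn.execute("DELETE FROM page_view_compiled WHERE period = ?", (period,))
        conn.executemany(
            "INSERT INTO page_view_compiled (period, pos, page_id, views) VALUES (?, ?, ?, ?)",
            [(period, pos, page_id, views)
             for pos, (page_id, views) in enumerate(ranked, start=1)])

def chart(period):
    """Read the precomputed chart: no aggregation at request time."""
    return conn.execute(
        "SELECT page_id, views FROM page_view_compiled WHERE period = ? ORDER BY pos",
        (period,)).fetchall()

conn.executemany("INSERT INTO page_view (page_id, date) VALUES (?, ?)",
                 [(1, "2015-03-02 09:00:00")] * 3 + [(3, "2015-03-02 10:00:00")] * 2
                 + [(2, "2015-03-02 11:00:00"), (1, "2015-03-03 08:00:00")])
compile_period("daily", "2015-03-02", "2015-03-03")
print(chart("daily"))
```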
So, I've previously developed an employee scheduling system in PHP. It was VERY inefficient. When I created a new schedule, I generated a row in a table called 'schedules' and, for every employee affected by that schedule, a row in a table called 'schedule_days' that gave their start and stop time for that specific date. Editing the schedules was a wreck too. On the editing page, I pulled every user for the specific schedule from the database and printed it out on the page. It was very logical, but it was very slow.
You can imagine how long it takes to load around 15 employees for a week-long schedule: 1 query for the schedule, 1 query for each user, and 7 queries (one per day) for every user. With 15 users that's 1 + 15 + 7 × 15 = 121 queries, which is too many. So I'm simply asking: what's someone else's view on the best way to do this?
For rotation based schedules, you want to use an exclusion based system. If you know that employee x works in rotation y within date range z, then you can calculate the individual days for that employee on the fly. If they're off sick/on course/etc., add an exclusion to the employee for that day. This will make the database a lot smaller than tracking each day for each employee.
table employee {EmployeeID}
table employeeRotations {EmployeeRotationID, EmployeeID, RotationID, StartDate, EndDate}
table rotation {RotationID, NumberOfDays, StartDate}
table rotationDay {RotationDayID, RotationID, ScheduledDay, StartTime, EndTime}
table employeeExceptions {EmployeeExceptionID, EmployeeID, ExceptionDate, ExceptionTypeID (or whatever you want here)}
From there, you can write a function that returns On/Off/Exception for any given date or any given week.
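A minimal in-memory sketch of that function, with Python dataclasses standing in for the tables above. It assumes ScheduledDay is a 0-based position within the rotation cycle counted from the rotation's StartDate, that EndDate is inclusive, and that each exception row carries the employee it applies to; the type names, the string exception type and the day_status/week_status helpers are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, time, timedelta

@dataclass
class Rotation:
    rotation_id: int
    number_of_days: int   # length of the repeating cycle
    start_date: date      # day 0 of the cycle

@dataclass
class RotationDay:
    rotation_id: int
    scheduled_day: int    # assumption: 0-based position within the cycle
    start_time: time
    end_time: time

@dataclass
class EmployeeRotation:
    employee_id: int
    rotation_id: int
    start_date: date
    end_date: date        # inclusive

@dataclass
class EmployeeException:
    employee_id: int
    exception_date: date
    exception_type: str   # stands in for ExceptionTypeID, e.g. "sick"

@dataclass
class Schedule:
    rotations: dict       # rotation_id -> Rotation
    rotation_days: list
    employee_rotations: list
    exceptions: list

def day_status(s, employee_id, day):
    """('Exception', type), ('On', (start, end)) or ('Off', None) for one date."""
    for ex in s.exceptions:
        if ex.employee_id == employee_id and ex.exception_date == day:
            return ("Exception", ex.exception_type)
    for er in s.employee_rotations:
        if er.employee_id == employee_id and er.start_date <= day <= er.end_date:
            rot = s.rotations[er.rotation_id]
            position = (day - rot.start_date).days % rot.number_of_days
            for rd in s.rotation_days:
                if rd.rotation_id == rot.rotation_id and rd.scheduled_day == position:
                    return ("On", (rd.start_time, rd.end_time))
    return ("Off", None)

def week_status(s, employee_id, first_day):
    """Statuses for the 7 days starting at first_day."""
    return [day_status(s, employee_id, first_day + timedelta(days=i)) for i in range(7)]

nine, five = time(9), time(17)
s = Schedule(
    rotations={1: Rotation(1, 4, date(2015, 3, 2))},          # 2 days on, 2 days off
    rotation_days=[RotationDay(1, 0, nine, five), RotationDay(1, 1, nine, five)],
    employee_rotations=[EmployeeRotation(10, 1, date(2015, 3, 1), date(2015, 12, 31))],
    exceptions=[EmployeeException(10, date(2015, 3, 3), "sick")],
)
print([status for status, _ in week_status(s, 10, date(2015, 3, 2))])
```

Only the rotation pattern and the exceptions are stored; every individual day is derived on demand.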
Sounds like you need to learn how to do a JOIN rather than doing many round trips to the server for each item.
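A sketch of the JOIN approach against the question's tables, using sqlite3: one query fetches every employee's days for a schedule, and the grouping by employee happens in code. The users table and all column names are assumptions, since the question only names schedules and schedule_days.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE schedules (id INTEGER PRIMARY KEY, week_start TEXT NOT NULL);
CREATE TABLE schedule_days (
    schedule_id INTEGER NOT NULL REFERENCES schedules (id),
    user_id     INTEGER NOT NULL REFERENCES users (id),
    work_date   TEXT    NOT NULL,
    start_time  TEXT    NOT NULL,
    stop_time   TEXT    NOT NULL
);
CREATE INDEX idx_schedule_days_schedule ON schedule_days (schedule_id);
""")

def load_schedule(schedule_id):
    """All shifts for one schedule in a single round trip, grouped by employee name."""
    rows = conn.execute("""
        SELECT u.name, d.work_date, d.start_time, d.stop_time
          FROM schedule_days AS d
          JOIN users AS u ON u.id = d.user_id
         WHERE d.schedule_id = ?
         ORDER BY u.name, d.work_date""", (schedule_id,))
    by_employee = defaultdict(list)
    for name, work_date, start, stop in rows:
        by_employee[name].append((work_date, start, stop))
    return dict(by_employee)

conn.executescript("""
INSERT INTO users (id, name) VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO schedules (id, week_start) VALUES (1, '2015-03-02');
INSERT INTO schedule_days VALUES
    (1, 1, '2015-03-02', '09:00', '17:00'),
    (1, 2, '2015-03-02', '12:00', '20:00'),
    (1, 1, '2015-03-03', '09:00', '17:00');
""")
print(load_schedule(1))
```

However many employees the schedule has, this stays one query instead of one per user per day.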