Not sure of the best way to describe what I'm trying to do, so bear with me. I'm working in PHP with a MySQL database.
I have a database of 10,000 records. Suppose I want to update 10,000 / 365 of them each day throughout the year. Each record gets updated once per year, nicely spread throughout the year.
One easy way to do this is to select all records, then for each one, if ID % 365 = $day_of_year, update that record. I'm not worried about leap years.
Is there a way I can select only the records from the database that I need (around 27), rather than selecting all 10,000 and looping through each? This is a cron job that I will run in the middle of the night, so maybe this is a moot point. Still, it bugs me that I have to brute-force my way through all 10,000. I'd love to find a more elegant solution that only pulls the tiny fraction needed.
Add a column to your table indicating the day of the year the record should be updated.
Then add an event that runs once a year that resets that column and recalculates it, so the 10,000 records are spread across the days of the year.
Then add another event that runs every night updating the records for the day.
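A minimal sketch of what those two events could look like, assuming a table named records with an id column and a new update_day column, plus a stand-in last_refreshed column for the actual nightly work (all of these names are hypothetical):

-- requires the event scheduler to be on: SET GLOBAL event_scheduler = ON;

-- once a year: spread the records evenly over days 0-364
CREATE EVENT spread_update_days
ON SCHEDULE EVERY 1 YEAR STARTS '2025-01-01 00:05:00'
DO
  UPDATE records SET update_day = id % 365;

-- every night: touch only that day's slice (roughly 10000 / 365 ≈ 27 rows)
CREATE EVENT nightly_update
ON SCHEDULE EVERY 1 DAY STARTS '2025-01-01 03:00:00'
DO
  UPDATE records
  SET last_refreshed = NOW()  -- stand-in for the real per-record update
  WHERE update_day = DAYOFYEAR(CURDATE()) - 1;  -- 0-364 (365 only on Dec 31 of a leap year)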
Thanks for the response, especially Juergen and Mike Pomax. Here's the solution.
I added a new column and populated each row with a random value between 0 and 364. I could have populated it some other way but this was easy and close enough.
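That population step can be done in a single statement; a sketch, assuming the table is named records and the new column update_day (both names hypothetical):

UPDATE records SET update_day = FLOOR(RAND() * 365);  -- random value 0-364 per row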
Here is my pseudocode:
$day_of_year = date('z'); // 0-364 (365 only on Dec 31 of a leap year)
if ($day_of_year < 365) { // don't run on Dec 31 of a leap year
    $qry = "SELECT ... WHERE update_day = $day_of_year"; // update_day is the new column
    ... // do my thing
}
Works great. Thanks again.
Here is the scenario.
I have a scheduled job running every minute which inserts data into a MySQL table "demo". The total number of records per day is 60 * 24 = 1,440.
Table demo already has 55,000 records.
I want to delete records older than today's date, so I am using the code below to do the work daily at 10:00 AM.
// everything created before today (Carbon::today() is midnight today)
$demo = Demo::whereDate('created_at', '<', Carbon::today());
if ($demo->count() > 0)
{
    $demo->delete();
}
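For context, that builder call boils down to roughly the following SQL (assuming the usual created_at column on the demo table; the exact statement Laravel generates may differ slightly):

DELETE FROM demo WHERE DATE(created_at) < CURDATE();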
At some point, then, I will be inserting into and deleting from the same table at the same time.
I want to know whether this is safe, or whether there will be an error or some other impact.
I don't think this will be an issue, since Carbon::today() returns 00:00:00 as the time and you are executing the deletion job at 10:00:00. Only records inserted more than ten hours earlier will be deleted.
In my opinion, there should not be any problem. If two requests arrive at the same time, the MySQL server will handle them by itself. The same thing happens when a website is loaded: there are a lot of calls at the same time.
There is no problem. Even if you made 1,000 requests per minute, none of them would overwrite another.
How should I store Birthdates in MySQL so that I can easily update everyone's Age on a daily basis via a Cron Job?
Does it even make sense to store the Age AND the Birthdate so that when searches involving the Age are made, I don't have to calculate each Age on-the-fly and waste CPU resources?
If so, how should I 1) store the Birthdate, and 2) calculate the Age each day?
I can imagine the daily cron script first filtering out the users whose Birthdate month is not the current month, then filtering out the users whose Birthdate day is not the current day, and then incrementing by one the Age of each user that is left.
Does this make sense? If so, how would I do that? Is there a better way to do all of this?
The simple answer is don't; never store a person's age. It changes for each person yearly but, as you say, you have to check that it's correct for every person daily.
Only store the date of birth, and then calculate the age when selecting from the database. It's only today minus date of birth, so it takes almost no CPU at all.
EDIT:
To expand upon my comment in ManseUK's answer, there's also the possibility of failure. What happens if your server / database is down? Or your update fails to run at its specified time? Or someone comes along and runs it manually after the update has already been run for that date? Or someone turns off your scheduler? There's no danger of any of this if you calculate Age as you select from the database.
To select where age is between 25 and 30 years, and assuming a DATE column dateofbirth, your query would be something like:
select *
from users
where dateofbirth between date_add( curdate(), interval -30 year )
and date_add( curdate(), interval -25 year )
Ensure users is indexed on dateofbirth.
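If you also need the age itself in a result set, it can be computed in the same SELECT; a minimal sketch, assuming the same users table and dateofbirth column:

SELECT *,
       TIMESTAMPDIFF(YEAR, dateofbirth, CURDATE()) AS age  -- full years elapsed
FROM users;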
No, don't store age, just calculate it in your queries. As for the birthday, I prefer to keep all my dates/times as Unix timestamps (because I hate dealing with portability across date-format-changing locale settings).
Does it even make sense to store the Age
No.
I don't have to calculate each Age on-the-fly and waste CPU resources?
As a matter of fact, you'd waste a zillion more "CPU resources" (of which you have too vague an idea to be concerned about) with your everyday-update approach.
Is there a better way to do all of this?
Store the birthdate and calculate the age at select time
what if you want to find out all the ones whose Age is greater than 25 but less than 30?
that's quite a trivial query, something like this:
WHERE birth_date BETWEEN date_sub(curdate(), INTERVAL 30 YEAR)
                     AND date_sub(curdate(), INTERVAL 25 YEAR)
the query would use an index (if any) and thus be blazingly fast, without any [unnecessary] denormalization
I'm going to go against the majority of the answers here.
I would store both ...
updating the age is quick and simple - a single MySQL query could run every day and it's done
calculating the age is time-consuming when you have lots of page views - the number of times it's viewed far outweighs the number of changes
Just imagine a table scenario - a table with 100 or 1,000 rows that shows the age of a person ... how long is that going to take to compute?
I always thought that Stack Overflow calculated the Reputation dynamically, but you can see on the Stack Overflow Data Explorer that they don't - see the User object in the schema on the right. It's recorded and updated each time it changes - I would guess that this is purely because the number of times it's viewed far outweighs the number of changes.
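For the record, the single daily query this answer has in mind might look something like the sketch below, reusing the users table and dateofbirth column from the earlier answer and assuming an added age column (Feb 29 birthdays would need separate handling):

UPDATE users
SET age = age + 1
WHERE MONTH(dateofbirth) = MONTH(CURDATE())
  AND DAY(dateofbirth) = DAY(CURDATE());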
I don't think it's totally true that computing age dynamically takes a lot of memory.
Why not create a table CALENDAR with 365 rows, one row for each day of the year, and store a list of user IDs against the day corresponding to their birthday?
For each day, just refer to the table entry for that day and refresh the age of only those selected users.
This will reduce the complexity greatly even when the user base increases.
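A rough sketch of that idea, with hypothetical names (calendar_birthdays, day_of_year, user_id) and assuming the users table has id and age columns:

CREATE TABLE calendar_birthdays (
    day_of_year SMALLINT NOT NULL,  -- 1-365
    user_id     INT      NOT NULL,
    PRIMARY KEY (day_of_year, user_id)
);

-- nightly: bump only the users whose birthday falls today
UPDATE users u
JOIN calendar_birthdays c ON c.user_id = u.id
SET u.age = u.age + 1
WHERE c.day_of_year = DAYOFYEAR(CURDATE());
-- (leap years shift DAYOFYEAR; storing month and day instead would avoid that)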
I am creating a system that requires a scheduler for a particular task. Users may pick from times 24 hours a day, 7 days a week.
I came up with a few options for the database storage, but I don't think either one is the most efficient design, so I'm hoping for some possible alternatives that may be more efficient.
On the user side I created a grid of buttons with 2 loops to create the days and the times, and I gave each a unique value of $timeValue = "d".$j."-t".$i;
So d1-t0 is Saturday at midnight, d3-t12 is Tuesday at noon, and so forth.
So, in the database I was first going to simply have an ID, day, time setup, but that would result in a possible 168 rows per event.
Then I tried an ID, day, and times 0-23 (a column for each hour of the day), and I was simply going to have a boolean setup: 0 if not selected, 1 if it is.
This would result in 7 rows per event, but I think querying that data might be a pain.
I need to perform a few functions on this data. For each day, list the selected times in an array. But I don't believe a select statement of SELECT * FROM schedule WHERE time0 = 1 OR time1 = 1 ... etc. will work, nor will it produce the desired array (times = (0, 3, 5, 6, 7, ...)).
So this isn't going to work well.
My overall system will need to also know every event that has each time selected for a mass posting.
"Select * from table where time = $time (0-23) and day= $day (1-7)
Do action with data...
So with this requirement, I'm going to assume that storing the times as an array within the database is likely not the most efficient way either.
So am I stuck with needing up to 168 rows of data per event, or is there a better way I am missing? Thanks
Update:
To give a little more clarity on what I need to accomplish:
Users will be creating event campaigns in which other users can bid on various time slots for something to happen. There will likely be 10-100 thousand of these campaigns at any one time and they are ongoing until the creator stops them. The campaign creators can define the time slots available for their campaign.
At the designated time each day the system will find every campaign that has an event scheduled and perform the event.
So the first requirement is to know which time slots are available for the campaign, and then I need the system to quickly identify campaigns that have an event on each hour and day and perform it automatically.
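One way to picture the row-per-slot layout the question describes (a sketch only; campaign_slots and its column names are hypothetical, and MySQL's DAYOFWEEK numbering of 1 = Sunday would need mapping to the d1 = Saturday scheme above):

CREATE TABLE campaign_slots (
    campaign_id INT     NOT NULL,
    day_of_week TINYINT NOT NULL,  -- 1-7
    hour_of_day TINYINT NOT NULL,  -- 0-23
    PRIMARY KEY (campaign_id, day_of_week, hour_of_day),
    KEY idx_slot (day_of_week, hour_of_day)
);

-- the slots selected for one campaign (at most 168 rows each)
SELECT day_of_week, hour_of_day FROM campaign_slots WHERE campaign_id = 42;

-- hourly job: every campaign with an event in the current slot
SELECT campaign_id
FROM campaign_slots
WHERE day_of_week = DAYOFWEEK(CURDATE()) AND hour_of_day = HOUR(NOW());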
I am trying to set up a database to record the last 30 days of information for each user. The data will be recorded once a day (i.e. by a cron job) and will be the value of an item (i.e. constantly changes).
What would be the best way to structure this? I was thinking of setting up a table, storing the 30 days in it, and deleting the 31st day as I add the new day with the cron job (shifting all of the others up one day), but this doesn't seem very efficient...
Thanks for the help.
What you can do is store the current date with each entry, then, in your cron job, delete all entries that are more than thirty days old.
For example (with MySQL),
DELETE FROM user_statistics WHERE DATEDIFF(NOW(), date_of_record) > 30;
Store the user data with its own date and delete the oldest when you exceed your limit. No need to shift anything.
I'd log by actual date using a DATE column. You can query up "last 30 days" pretty easily in MySQL.
As for purging, the cron job can delete anything older than 30 days pretty easily as well. Or, since it's so easy to ignore anything older than 30 days, you might even choose to not delete older records (at least not every day).
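A sketch of that "last 30 days" query, reusing the user_statistics table and date_of_record column from the example above:

SELECT *
FROM user_statistics
WHERE date_of_record >= CURDATE() - INTERVAL 30 DAY;

Comparing the raw column to a computed date like this also lets MySQL use an index on date_of_record.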
I'm creating a calendar that displays a timetable of events for a month. Each day has several parameters that determine if more events can be scheduled for this day (how many staff are available, how many times are available etc).
My database is set up using three tables:
Regular Schedule - this is used to create an array for each day of the week that outlines how many staff are available, what hours they are available etc
Schedule Variations - If there are variations for a date, this overrides the information from the regular schedule array.
Events - Existing events, referenced by the date.
At this stage, the code loops through the days in the month and checks two to three things for each day.
Are there any variations in the schedule (public holiday, shorter hours etc)?
What hours/number of staff are available for this day?
(If staff are available) How many events have already been scheduled for this day?
Steps 1 and 3 each require a database query - assuming 30 days a month, that's 60 queries per page view.
I'm worried about how this could scale, for a few users I don't imagine that it would be much of a problem, but if 20 people try and load the page at the same time, then it jumps to 1200 queries...
Any ideas or suggestions on how to do this more efficiently would be greatly appreciated!
Thanks!
I can't think of a good reason you'd need to limit each query to one day. Surely you can just select all the values between a pair of dates.
Similarly, you could use a join to get the number of events scheduled for a given day.
Then do the loop (for each day) on the array returned by the database query.
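A sketch of those two "wide" queries, with guessed table and column names (schedule_variations.variation_date, events.event_date) since the actual schema isn't shown:

-- all variations for the month in one query
SELECT *
FROM schedule_variations
WHERE variation_date >= '2009-03-01' AND variation_date < '2009-04-01';

-- event counts per day in one query
SELECT DATE(event_date) AS day, COUNT(*) AS num_events
FROM events
WHERE event_date >= '2009-03-01' AND event_date < '2009-04-01'
GROUP BY DATE(event_date);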
Create a table:
t_month (day INT)
INSERT
INTO t_month
VALUES
(1),
(2),
...
(31)
Then query:
SELECT *
FROM t_month, t_schedule
WHERE schedule_date = '2009-03-01' + INTERVAL (t_month.day - 1) DAY
  AND schedule_date < '2009-03-01' + INTERVAL 1 MONTH
  AND ...
Instead of 30 queries you get just one with a JOIN.
Other RDBMSs allow you to generate rowsets on the fly, but MySQL doesn't.
You can, though, replace t_month with the ugly
SELECT 1 AS month_day
UNION ALL
SELECT 2
UNION ALL
...
SELECT 31
I faced the same sort of issue with http://rosterus.com and we just load most of the data into arrays at the top of the page, and then query the array for the relevant data. Pages loaded 10x faster after that.
So run one or two wide queries that gather all the data you need, choose appropriate keys, and store each result in an array. Then access the array instead of the database. PHP is very flexible with array indexing; you can use all sorts of things as keys... or several indexes.
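A minimal sketch of that pattern in PHP, assuming a PDO connection in $pdo and the events table with an event_date column guessed at earlier:

// one wide query for the whole month, keyed by date
$stmt = $pdo->prepare(
    "SELECT DATE(event_date) AS day, COUNT(*) AS num_events
     FROM events
     WHERE event_date >= :start AND event_date < :end
     GROUP BY DATE(event_date)"
);
$stmt->execute([':start' => '2009-03-01', ':end' => '2009-04-01']);

$eventsPerDay = [];
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $eventsPerDay[$row['day']] = (int) $row['num_events'];
}

// later, inside the per-day loop: an array lookup instead of a query
$count = $eventsPerDay['2009-03-15'] ?? 0;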