I have a table in MySQL as follows:
+----+---------------+--------------+----------------------+---------------------+
| id | report_name | report_id | rinterval | last_run |
+----+---------------+--------------+----------------------+---------------------+
| 1 | test report 1 | 434234234234 | every morning | 2016-05-20 12:55:07 |
| 2 | test report 2 | 3434232 | every sunday morning | 2016-05-20 12:55:07 |
| 3 | test report 3 | 342423423 | never | 2016-05-20 12:55:07 |
| 4 | test report 4 | 4324234 | every morning | 2016-05-20 12:55:07 |
+----+---------------+--------------+----------------------+---------------------+
I am trying to create a PHP script (preferably) that, when called, runs the appropriate reports. I would like some suggestions on the best way to do that.
Let's assume I set a cron job to call the script every morning, and the intervals are as above or similar (every morning, every Sunday, twice a month, etc.). Let's also assume that a report should not run automatically if less than 24 hours have passed since its last run. A manual call can also be initiated.
I was thinking something like this :
Call script
Find what day and time it is
Select * from above where last_run is more than 24 hours ago
Iterate the above records and run the reports (report is run like http://example.com/report_name/report_id)
If rinterval reads "every morning" - run the report
If rinterval reads "every Sunday morning" - run the report if it is Sunday (and similar for other days using a case)
If rinterval reads "never" - do not run the report
if rinterval reads "twice a month" - find the last day run and see if the interval is more than 15 - if yes run it. (or similar)
In all the above cases, on a successful run, update the last_run timestamp.
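The decision steps above can be sketched as a single function. This is only a rough illustration in Python (the same logic ports directly to PHP); the name is_due, the MIN_GAP constant and the exact rule strings are assumptions made up for the example, not anything from an existing library.

```python
from datetime import datetime, timedelta

# Assumption: automatic runs need at least 24 hours since last_run.
MIN_GAP = timedelta(hours=24)
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def is_due(rinterval: str, last_run: datetime, now: datetime) -> bool:
    """Return True if a report with this rinterval should run automatically now."""
    rule = rinterval.strip().lower()
    if rule == "never":
        return False
    if now - last_run < MIN_GAP:
        return False  # ran too recently
    if rule == "every morning":
        return True
    if rule == "twice a month":
        # Roughly every 15 days, as suggested in the steps above.
        return now - last_run >= timedelta(days=15)
    # "every <weekday> morning", e.g. "every sunday morning"
    parts = rule.split()
    if (len(parts) == 3 and parts[0] == "every"
            and parts[1] in WEEKDAYS and parts[2] == "morning"):
        return WEEKDAYS[now.weekday()] == parts[1]
    return False  # unknown rule: do nothing rather than guess
```

One thing to watch with a strict 24-hour gap: if yesterday's run finished a few seconds after the cron time, today's run is skipped. Comparing calendar dates instead of elapsed hours may be more robust, depending on what you need.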
One of my problems is what happens if I run a manual call - or if I want to run 2 manual calls 2 minutes apart for testing. If I run the report say on Monday afternoon, I still want it to be run on Tuesday morning. Should I introduce another column that indicates if this is manual call or automatic? I still need to know that the run was made if it is manual, but do not want to break the schedule as the report must be run before 08.00 in the morning.
What are your thoughts? I am sure there is a more efficient way to do this. I am open to all suggestions, I am doing this from scratch.
One of my problems is what happens if I run a manual call - or if I want to run 2 manual calls 2 minutes apart for testing
That depends on what your scripts are doing.
I still need to know that the
Then simply keep a log of invocations (in a separate table) with, e.g., the name of the script, the date of the invocation, and the way it was invoked.
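A minimal sketch of such a log table, using Python with an in-memory SQLite database as a stand-in for MySQL. The table name report_runs, the column trigger_type and both helper functions are hypothetical names chosen for the example. The idea is that manual runs are recorded but ignored when the scheduler asks when the last scheduled run happened.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL connection
conn.execute("""
    CREATE TABLE report_runs (
        id           INTEGER PRIMARY KEY,
        report_id    INTEGER NOT NULL,
        run_at       TEXT NOT NULL,   -- 'YYYY-MM-DD HH:MM:SS'
        trigger_type TEXT NOT NULL CHECK (trigger_type IN ('cron', 'manual'))
    )
""")


def log_run(report_id, run_at, trigger_type):
    """Record one invocation, whether it came from cron or from a manual call."""
    conn.execute(
        "INSERT INTO report_runs (report_id, run_at, trigger_type) VALUES (?, ?, ?)",
        (report_id, run_at, trigger_type),
    )
    conn.commit()


def last_scheduled_run(report_id):
    """Latest cron-triggered run, or None. Manual runs do not move the schedule."""
    row = conn.execute(
        "SELECT MAX(run_at) FROM report_runs "
        "WHERE report_id = ? AND trigger_type = 'cron'",
        (report_id,),
    ).fetchone()
    return row[0]
```

With this layout, two manual calls two minutes apart on Monday afternoon just add two rows, and Tuesday morning's cron run still sees the previous cron run as the last scheduled one.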
Related
I'm trying to write a function to move a scheduled task. The schedule cannot overlap with any other event. My user inputs are as follows:
schedule_id (int)
new_start_time (DATETIME)
My table structure is as follows:
Schedules
| schedule_id | start_time | end_time | task_id
| 1 | 2015-12-21 02:00:00 | 2015-12-21 04:00:00 | 1
| 2 | 2015-12-21 08:30:00 | 2015-12-21 09:30:00 | 1
| 3 | 2015-12-22 01:00:00 | 2015-12-22 02:00:00 | 2
Tasks
| task_id | name | max_duration
| 1 | do things | 2
| 2 | do stuff | 1
A user has between start_time and end_time to start a "task". The user can not begin the "task" until that window. Once that user begins the task they have whatever the max_duration for that task ID is to complete it. There is also a 15 minute window to set up for the next task. That means a user who starts a task 1 second before the end of the window still has max_duration amount of time to complete the task. Therefore the "actual window" that nothing can be scheduled in is start_time to (end_time+max_duration+15). I would like to move an event (or insert a new one) but I must check for overlaps. Essentially I must ensure:
Does the start_time from user input run into any other schedule's end_time+max_duration+15?
Does the new end_time+max_duration+15 run into any other schedule's start_time? The new end_time is simply obtained by taking the new start_time and adding the original duration: end_time = (orig_end_time - orig_start_time) + start_time.
For example, the above table is valid for schedule_id's 1 and 2 because a user can start any time between 2:00 and 4:00. Assuming he starts right at the end, 3:59:59 the event will last at max until 5:59:59. Even with the cleanup window of 15 minutes this still leads to 6:14:59 and since the next schedule starts at 8:30 this is ok.
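The two checks above collapse into one standard interval-overlap test if every schedule is first widened to its "actual window". A rough Python sketch follows; the function names and the dict layout are made up for illustration, max_duration is taken to be in hours (as the 3:59:59 to 5:59:59 example implies), and windows that merely touch are assumed to be acceptable.

```python
from datetime import datetime, timedelta

SETUP = timedelta(minutes=15)  # setup time before the next task


def blocked_window(start, end, max_duration_hours):
    """Widen a schedule to the span in which nothing else may be scheduled."""
    return start, end + timedelta(hours=max_duration_hours) + SETUP


def overlaps(a_start, a_end, b_start, b_end):
    # Two intervals overlap iff each one starts before the other ends.
    return a_start < b_end and b_start < a_end


def can_move(schedules, schedule_id, new_start):
    """schedules maps schedule_id -> (start_time, end_time, max_duration_hours)."""
    start, end, duration = schedules[schedule_id]
    new_end = new_start + (end - start)  # keep the original window length
    candidate = blocked_window(new_start, new_end, duration)
    for other_id, (s, e, d) in schedules.items():
        if other_id == schedule_id:
            continue  # do not compare the schedule with itself
        if overlaps(*candidate, *blocked_window(s, e, d)):
            return False
    return True
```

The same comparison can probably be written as a single WHERE clause in MySQL (new start before the other row's blocked end, and the other row's start before the new blocked end), so there should be no need to fetch a day or two of rows on either side.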
I've been wrapping my head around this for hours. I would like to do it in pure MySQL however I am considering using PHP if I really have to. Even in PHP this problem seems difficult. Sure I could grab every schedule with a start time a day or two earlier and an end time a day or two later then compare my interval but that seems very hacky.
Any ideas?
I have a mysql queue that manages tasks for several php workers that run every minute via cron job.
I'll simplify everything to make it more understandable.
For the mysql part I have 2 tables:
worker_info
worker_id | name | hash | last_used
1 | worker1 | d8f9zdf8z | 2014-03-03 13:00:01
2 | worker2 | odfi9dfu8 | 2014-03-03 13:01:01
3 | worker3 | sdz7std74 | 2014-03-03 13:02:03
4 | worker4 | duf8s763z | 2014-03-03 13:02:01
...
tasks
task_id | times_run | task_id | workers_used
1 | 3 | 2932 | 1,6,3
2 | 2 | 3232 | 6,8
3 | 6 | 5321 | 3,2,6,10,5,20
4 | 1 | 8321 | 3
...
Tasks is a table to keep track of the tasks:
task_id identifies each task, times_run is the number of times a task has been successfully executed. task_id is a number the php script needs for its routines.
workers_used is a text field that holds the ids of all worker_infos that have been processed for this task. I don't want the same worker_info multiple times per task, only one time.
worker_info is a table that holds some infos the php script needs to do its job along with last_used which is a global indicator for when this worker was last used.
Several php scripts work on the same tasks and I need the values to be precise as each worker_info should be used only 1 time for each task.
The PHP cron jobs include all the same routines:
the script performs a mysql query to get a task.
1. SELECT * FROM tasks ORDER BY times_run ASC LIMIT 1 We are always working with 1 job at a time
The script locks the worker_info table to avoid that one worker_info gets selected multiple times from a tasks query
2. LOCK TABLES worker_info WRITE
Then it gets a list of all worker_infos not used for this task, sorted by last_used
3. SELECT * FROM worker_info WHERE worker_id NOT IN($workers_used) ORDER BY last_used ASC LIMIT 1
Then it updates the last_used parameter so the same worker_info won't get selected in the meantime when the task still runs
4. UPDATE worker_info SET last_used = NOW() WHERE worker_id = $id
Finally the lock gets released
5. UNLOCK TABLES
The php script performs its routines and if the task was successful it gets updated
6. UPDATE tasks SET times_run = times_run + 1, workers_used = IF(workers_used = '', '$worker_id', CONCAT(workers_used,', $worker_id')) I know it's very bad practice to perform the workers_used this way not using a second table to declare the dependencies but I'm a bit scared of the space it would take.
One Task can have several thousand workers_used and I have several thousand tasks themselves. This way the table would quickly become bigger than 1 million entries and I fear that this could slow down things a lot so I went with this way of storage.
Then the script performs step 2-6 10 times for each task before going back to step 1 selecting a new task and doing everything again.
Now this setup has served me well for about one year but now that I need to have 50+ php scripts active on this queue system, I get more and more problems in terms of performance.
PHP queries take up to 20 seconds and I cannot scale any further like I need to; if I just run more PHP scripts, the MySQL server crashes.
I want no data loss if the system crashes, therefore I'm writing every change into the db as it happens. Also, when I created the system I had problems with workers_used, because when 10 PHP scripts work on 1 task it occurred very often that one worker_info row was used multiple times in the same task, which I do not want.
Therefore I introduced the LOCK which fixed this but I suspect it to be the bottleneck of the system. If one worker locks the table to perform its actions, all other 49 php workers need to wait for that which is bad.
Now my questions are:
Is this implementation even good? Should I stick with it or throw it out and do something else?
Is this LOCK even my problem, or might something else be slowing down the system?
How can I improve this setup to make it a lot faster?
//Edit As suggested by jeremycole:
I suppose I need to update the worker_info table in order to implement the changes:
worker_info
worker_id | name | hash | tasks_owner | last_used
1 | worker1 | d8f9zdf8z | 1 | 2014-03-03 13:00:01
2 | worker2 | odfi9dfu8 | NULL | 2014-03-03 13:01:01
3 | worker3 | sdz7std74 | NULL | 2014-03-03 13:02:03
4 | worker4 | duf8s763z | NULL | 2014-03-03 13:02:01
...
And then change the routine to:
SET autocommit=0 Set autocommit to 0 so the queries won't get autocommitted
1. SELECT * FROM tasks ORDER BY times_run ASC LIMIT 1 Select a Task to process
2. START TRANSACTION
3. SELECT * FROM worker_info WHERE worker_id NOT IN($workers_used) AND tasks_owner IS NULL ORDER BY last_used ASC LIMIT 1 FOR UPDATE
4. UPDATE worker_info SET last_used = NOW(), tasks_owner = $task_id WHERE worker_id = $worker_id
5. COMMIT
Do PHP routine and if successful:
6. UPDATE tasks SET times_run = times_run + 1, workers_used = IF(workers_used = '', '$worker_id', CONCAT(workers_used,', $worker_id'))
That should be it or am I wrong at some point?
Is the tasks_owner really needed or would it be sufficient to change the last_used date?
It may be useful to read my answer to another question about how to implement a job queue in MySQL here:
MySQL deadlocking issue with InnoDB
In short, using LOCK TABLES for this is quite unnecessary and unlikely to yield good results.
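To illustrate one way of avoiding a table lock (not necessarily what the linked answer recommends), here is a sketch of an optimistic claim: pick a candidate row, then claim it with a conditional UPDATE and check the affected-row count. It uses Python with in-memory SQLite as a stand-in for MySQL. The task_workers junction table and both function names are assumptions for the example; tasks_owner is the column from the question's edit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL connection
conn.executescript("""
    CREATE TABLE worker_info (
        worker_id INTEGER PRIMARY KEY, last_used TEXT, tasks_owner INTEGER);
    -- One row per (task, worker) pair; the primary key forbids duplicates.
    CREATE TABLE task_workers (
        task_id INTEGER, worker_id INTEGER, PRIMARY KEY (task_id, worker_id));
    INSERT INTO worker_info VALUES
        (1, '2014-03-03 13:00:01', NULL), (2, '2014-03-03 13:01:01', NULL);
""")


def claim_worker(task_id, now):
    """Claim the least recently used free worker not yet used for task_id."""
    while True:
        row = conn.execute(
            "SELECT worker_id FROM worker_info w "
            "WHERE tasks_owner IS NULL AND NOT EXISTS ("
            "  SELECT 1 FROM task_workers t "
            "  WHERE t.task_id = ? AND t.worker_id = w.worker_id) "
            "ORDER BY last_used ASC LIMIT 1",
            (task_id,),
        ).fetchone()
        if row is None:
            return None  # nothing left to claim
        cur = conn.execute(
            "UPDATE worker_info SET tasks_owner = ?, last_used = ? "
            "WHERE worker_id = ? AND tasks_owner IS NULL",
            (task_id, now, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0]  # the claim succeeded
        # Otherwise another process claimed the row first: pick again.


def finish(task_id, worker_id):
    """Record the pair as used and release the worker."""
    conn.execute("INSERT INTO task_workers VALUES (?, ?)", (task_id, worker_id))
    conn.execute(
        "UPDATE worker_info SET tasks_owner = NULL WHERE worker_id = ?",
        (worker_id,),
    )
    conn.commit()
```

Only the single claimed row is ever contended, so the other processes are not blocked on the whole table. A junction table with a few million narrow, indexed rows is usually not a problem for MySQL, and it removes the need to parse a comma-separated workers_used field.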
In the project (in codeigniter) I am working, a user can create a task and set its repeat mode as (Once/Daily/Weekly) where
Daily - Task will appear for the same time everyday in future
Weekly - Task will appear every Monday (say if task is being added on Monday)
Once - Task will get added only for today
Now every task created by user creates a record in database,
For example, suppose a task is created today(13-01-2014) from 2:00-3:00 with repeat mode as Daily, this will create a record against this (13-01-2014) date but I can't add the same task at that time for all future dates.
The user can also change the mode of a task at any time, after which it should no longer repeat.
Can anyone please explain the concept of how this repeat mode works? I mean, when should I actually create the tasks for future dates, and how should I maintain them in the database?
"Explain the concept of repeat mode" is a pretty vague request. However, I think I understand what piece is missing.
I assume you have some kind of taskId, which is a unique key for each task. What you need is a batchId as well. Your end result would look something like this:
+----------+----------+----------------------+
|taskId |batchId |description |
|----------|----------|----------------------|
| 1 | | Some meeting |
| 2 | | Another meeting |
| 3 | 1 | Daily meeting |
| 4 | 1 | Daily meeting |
| 5 | 1 | Daily meeting |
| 6 | 2 | Go to the gym! |
| 7 | 2 | Go to the gym! |
| 8 | 2 | Go to the gym! |
| 9 | 2 | Go to the gym! |
| 10 | | Yet another meeting |
+----------+----------+----------------------+
Having a batchId lets you group these events in the case you need to modify all the tasks at once, but still lets you modify each task individually if need be, thanks to the taskId.
The actual implementation of this batchId is up to you. For example, it can be:
a random string generated on-the-fly
a hash of the first taskId to ensure that they're always unique
a foreign key in a separate table that auto-generates a batchId as its key
Use the one that best suits your needs, or make one up yourself.
I just made up taskId and batchId. Replace those with whatever makes sense to you.
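As a rough sketch of how rows sharing a batchId might be generated and later trimmed, here is some Python. The function names, the 30-day horizon and the use of a random UUID as the batchId (the first option in the list above) are all assumptions for illustration; in CodeIgniter you would insert the resulting rows into your task table instead of returning a list.

```python
import uuid
from datetime import date, timedelta

# Repeat step per mode; "once" does not repeat.
STEP = {"once": None, "daily": timedelta(days=1), "weekly": timedelta(weeks=1)}


def expand_task(description, first_day, mode, horizon_days=30):
    """Create one row per occurrence up to a horizon, all sharing one batchId."""
    step = STEP[mode]
    if step is None:
        return [{"batchId": None, "date": first_day, "description": description}]
    batch_id = uuid.uuid4().hex  # random string generated on the fly
    rows, day = [], first_day
    last_day = first_day + timedelta(days=horizon_days)
    while day <= last_day:
        rows.append({"batchId": batch_id, "date": day, "description": description})
        day += step
    return rows


def stop_repeating(rows, batch_id, from_day):
    """Drop the occurrences of one batch after from_day, e.g. when the mode changes."""
    return [r for r in rows
            if not (r["batchId"] == batch_id and r["date"] > from_day)]
```

Because the horizon is finite, a periodic job would have to extend each batch as time passes. Storing only the repeat rule and computing occurrences on read is another option.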
So I have a database with a set of events that are supposed to start and end at certain times. Say my table is as follows (an event here is an actual event, not a code thing):
Event | TimeStart | TimeEnd | Day
A | 0800 | 1400 | 2
B | 2000 | 2300 | 3
C | 1200 | 1900 | 4
What I want is for the event to trigger only if the current time is between TimeEnd and TimeEnd + 3 hours. The problem I run into is: what if TimeEnd is 2300? I store the days using date('N'), which corresponds to Monday (1) through Sunday (7).
So for Event B, I need code that takes the current time and subtracts 3 hours (so that it doesn't roll over to day 4), then takes TimeEnd and adds 3 hours to it.
The problem I found when testing is: how do I keep Event B on day 3? Even if I use strtotime(), Event B still shows day 4 once the time passes 2400.
To make the matter clearer, I am creating a voting poll that only starts after the event has ended and only lasts for 3 hours.
Edit: found the answer. I didn't know that using "Monday this week" made sense for date() :D
Your representation of times in the table is kinda weird. If each event will happen only one time (it doesn't repeat, for instance, every Monday), I suggest you use Unix timestamps to represent the times. Then you will have no such problems.
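To show why timestamps avoid the midnight rollover, here is a small Python sketch (PHP's time() gives you the same kind of integer). The function name and the choice of a half-open 3-hour window are assumptions for the example.

```python
from datetime import datetime, timezone

POLL_SECONDS = 3 * 60 * 60  # the poll lasts 3 hours after the event ends


def poll_is_open(event_end_ts: int, now_ts: int) -> bool:
    """True from the event's end until 3 hours later (end of window excluded)."""
    return event_end_ts <= now_ts < event_end_ts + POLL_SECONDS
```

An event ending at 23:00 simply has an end timestamp, and "end + 3 hours" is plain integer addition, so there is no day number to wrap around.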
I have a table with 200 rows. I'm running a cron job every 10 minutes to perform some kind of insert/update operation on the table. The operation needs to be performed on only 5 rows at a time, every time the cron job runs. So in the first 10 minutes records 1-5 are updated, records 6-10 in the 20th minute, and so on.
When the cron job runs for the 40th time, all the records in the table will have been updated exactly once. This is what is to be achieved, at least. And the next cron job should repeat the process again.
The problem is that, every time a cron job runs, the insert/update operation should be performed on N rows (not just 5 rows). So, if N is 100, all records would've been updated by just 2 cron jobs. And the next cron job would repeat the process again.
Here's an example:
This is the table I currently have (200 records). Every time a cron job executes, it needs to pick N records (which I set as a variable in PHP) and update the time_md5 field with the current time's MD5 value.
+---------+-------------------------------------+
| id | time_md5 |
+---------+-------------------------------------+
| 10 | 971324428e62dd6832a2778582559977 |
| 72 | 1bd58291594543a8cc239d99843a846c |
| 3 | 9300278bc5f114a290f6ed917ee93736 |
| 40 | 915bf1c5a1f13404add6612ec452e644 |
| 599 | 799671e31d5350ff405c8016a38c74eb |
| 56 | 56302bb119f1d03db3c9093caf98c735 |
| 798 | 47889aa559636b5512436776afd6ba56 |
| 8 | 85fdc72d3b51f0b8b356eceac710df14 |
| .. | ....... |
| .. | ....... |
| .. | ....... |
| .. | ....... |
| 340 | 9217eab5adcc47b365b2e00bbdcc011a | <-- 200th record
+---------+-------------------------------------+
So, the first record(id 10) should not be updated more than once, till all 200 records are updated once - the process should start over once all the records are updated once.
I have some idea on how this could be achieved, but I'm sure there are more efficient ways of doing it.
Any suggestions?
You could use a Red/Black system (like for cluster management).
Basically, all your rows start out as black. When you run your cron, it will mark the rows it updated as "Red". Once all the rows are red, you switch, and now start turning all the red rows to be black. You keep this alternation going, and it should allow you to effectively mark rows so that you do not update them twice. (You could store whatever color goal you want in a file or something so that it is shared between crons)
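A rough sketch of that alternation, in Python with in-memory SQLite standing in for MySQL and 10 rows instead of 200. The color column, the one-row rotation table that holds the current goal colour (instead of a file) and the function names are assumptions for the example.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL connection
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, time_md5 TEXT, "
             "color TEXT NOT NULL DEFAULT 'black')")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 11)])
conn.execute("CREATE TABLE rotation (goal TEXT NOT NULL)")
conn.execute("INSERT INTO rotation VALUES ('red')")  # all rows start out black
conn.commit()


def pending(goal, n):
    """Up to n ids that have not been turned to the goal colour yet."""
    rows = conn.execute(
        "SELECT id FROM items WHERE color <> ? ORDER BY id LIMIT ?", (goal, n))
    return [r[0] for r in rows]


def run_batch(n, stamp):
    """Update up to n not-yet-visited rows; flip the goal once a pass is complete."""
    goal = conn.execute("SELECT goal FROM rotation").fetchone()[0]
    ids = pending(goal, n)
    if not ids:  # every row already has the goal colour: start the next pass
        goal = "black" if goal == "red" else "red"
        conn.execute("UPDATE rotation SET goal = ?", (goal,))
        ids = pending(goal, n)
    digest = hashlib.md5(stamp.encode()).hexdigest()
    conn.executemany(
        "UPDATE items SET color = ?, time_md5 = ? WHERE id = ?",
        [(goal, digest, i) for i in ids],
    )
    conn.commit()
    return ids
```

If N does not divide the row count, the last run of a pass simply updates fewer rows, so no row is touched twice within one pass.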
I would just run the PHP script every 10/5 minutes with cron, and then use PHP's time and date functions to perform the rest of the logic. If you cannot time it, you could store a position marking variable in a small file.
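The position marker idea could look roughly like this in Python. The function name, the JSON file format and the wrap-around behaviour are assumptions for the example, and it only works if the ids are always fetched in the same stable order (e.g. ORDER BY id).

```python
import json
import os


def next_batch(ids, n, state_path):
    """Return the next n ids after the stored position, restarting at the end."""
    pos = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            pos = json.load(f)["pos"]
    if pos >= len(ids):
        pos = 0  # a full pass is done: start over
    batch = ids[pos:pos + n]
    with open(state_path, "w") as f:
        json.dump({"pos": pos + len(batch)}, f)  # remember where to resume
    return batch
```

Each cron run calls this once with whatever N is configured and updates only the returned ids.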