Time Rescheduling Logic - php

I am working on scheduler-like code (in PHP, if that matters) and ran into an interesting problem: it's easy to reschedule a recurring task, but what if, for some reason, it ran significantly later than it was supposed to?
For example, let's say a job needs to run every hour and its next scheduled run is 13.05.2021 18:00, but it actually runs at 13.05.2021 20:00. The normal rescheduling logic takes the original scheduled time and adds the recurrence frequency (1 hour in this case), but that would make the new time 13.05.2021 19:00, which is already in the past and can cause the job to run twice. We could, theoretically, use the time of the "last run" instead, but that can be something like 13.05.2021 20:03, which would make the new time 13.05.2021 21:03.
Now my question is: what logic can we use so that in this case the next time would be 13.05.2021 21:00? I've tried googling this, but was not able to find anything. And I do see that Event Scheduler in Windows, for example, reschedules jobs the way I want.

I actually found a pretty easy way to do what I needed, so posting it as an answer.
If we have the frequency as a value in seconds (in my case, at least) and the original nextrun, which is when the task was supposed to run initially, then the logic is as follows:
1. Get the current time (time(), UTC_TIMESTAMP() or whatever).
2. Compare the current time against nextrun and get the difference between them in seconds.
3. Calculate how many iterations of the task could have been completed in that many seconds by dividing the time difference by frequency.
4. Round the resulting value up (ceil()). If the value is lower than 1, we may want to sanitize it.
5. Multiply this rounded-up value by frequency, which gives a different result than in step 2; this is the crux of the method.
6. Add the resulting number of seconds to nextrun.
And that's it. This does not guarantee that a task will never run twice if it finished just a few seconds before the time calculated in step 6, but to my knowledge MS Event Scheduler has the same "flaw".
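The steps above can be sketched in PHP roughly as follows. This is a minimal sketch rather than the author's actual code: the function name nextRunAfterDelay and its parameters are hypothetical, and all times are assumed to be Unix timestamps in seconds.

```php
<?php

/**
 * Hypothetical helper: moves $nextRun forward by whole multiples of
 * $frequency, so the new time keeps the original alignment.
 */
function nextRunAfterDelay(int $nextRun, int $frequency, int $now): int
{
    if ($frequency <= 0) {
        throw new InvalidArgumentException('frequency must be positive');
    }
    // Step 2: seconds between the planned run and the current time.
    $diff = $now - $nextRun;
    // Steps 3-4: iterations that fit into the delay, rounded up and
    // sanitized so that the schedule always moves by at least one slot.
    $iterations = max(1, (int) ceil($diff / $frequency));
    // Steps 5-6: convert back to seconds and add to the planned time.
    return $nextRun + $iterations * $frequency;
}

$planned = gmmktime(18, 0, 0, 5, 13, 2021); // 13.05.2021 18:00 UTC
$now     = gmmktime(20, 3, 0, 5, 13, 2021); // 13.05.2021 20:03 UTC
echo gmdate('d.m.Y H:i', nextRunAfterDelay($planned, 3600, $now)), "\n"; // prints 13.05.2021 21:00
```

Note that when the current time falls exactly on a slot boundary (for example exactly 2 hours late), the result equals the current time, so the job becomes due again immediately.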
Since I am doing this calculation in SQL, here's how this would look in SQL (at least for MySQL/MariaDB):
UPDATE `cron__schedule` SET `nextrun`=TIMESTAMPADD(SECOND, IF(CEIL(TIMESTAMPDIFF(SECOND, `nextrun`, UTC_TIMESTAMP())/`frequency`) > 0, CEIL(TIMESTAMPDIFF(SECOND, `nextrun`, UTC_TIMESTAMP())/`frequency`), 1)*`frequency`, `nextrun`)
To explain by referencing the logic above:
1. UTC_TIMESTAMP()
2. TIMESTAMPDIFF(SECOND, `nextrun`, UTC_TIMESTAMP()) - time comparison in seconds.
3. TIMESTAMPDIFF(...)/`frequency`
4. CEIL(...) to round up the value. IF(...) is used to sanitize, since we can get 0, which would result in the time not changing at all.
5. CEIL(...)*`frequency`
6. TIMESTAMPADD(...)
I do not like having to use TIMESTAMPDIFF(...) twice because of IF(...), but I do not know a way to avoid that without moving to a stored procedure, which feels like overkill. Besides, as far as I know, MySQL should calculate this value only once regardless. But if someone can suggest a cleaner approach, I'll update the answer.

There isn't a right or wrong in this situation; it really depends on your business logic and how you want to build this.
WordPress and Drupal, two of the largest CMSs out there, have faced this problem, too, and it boils down to "poor man's cron" versus "system cron". With a "poor man's cron", these systems rely on someone hitting the website in order to "wake" the scheduler up, and if no one visits your site in a month, your tasks don't run, either. Both of these systems instead recommend using the system's cron to be more consistent and "wake up" the scheduler at certain intervals. I would encourage you to explore this in your system, too.
The next problem is how you are storing your recurrence. Do you have (effectively) a table with every possible run time, so that for an hourly run there are 24 entries? Or is there just a single task that has an ideal run date/time? The latter is generally easier to control than the former, which stores a lot of duplicated data.
Then, do tasks reschedule themselves, does the scheduler do that, or is there a middle ground where the scheduler asks the task for the next best run? Figuring this out is very important and there are some nuances.
Another thing to think about: what happens if a task runs earlier than planned? For instance, does the world break if a task runs at 01:00 and 01:15, or is it just sub-optimal?
Generally when I build these types of systems, my tasks conform to a pattern (an interface in OOP) and support a "next run time". The scheduler pulls all of the tasks that have an expired "next run time" from a data store and runs them. Doing this, there's no chance for a single task to exist at both 01:00 and 02:00, because it will only exist in the data store once, for instance at 01:00. If the scheduler then wakes up at 01:15, it finds the 01:00 task, which has expired, runs it, and then asks the task for the next run. The task looks at the clock (or the time as provided by the scheduler if you are running in a distributed environment) and performs its own logic to determine that. If the logic is every hour, you can add 60 minutes to "now" and then remove the minutes portion, so 01:15 becomes 02:00.
Throw some exception handling and possibly database transactions into this mix, too, to guarantee that a task can't fail and still get rescheduled.
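The pattern described in this answer could look roughly like the sketch below. Everything here is an assumption made for illustration: the ScheduledTask interface, the HourlyTask class, the runDueTasks function and the in-memory array standing in for the data store are hypothetical names, not part of any framework.

```php
<?php

/** Hypothetical contract: each task computes its own next run time. */
interface ScheduledTask
{
    public function run(): void;
    public function nextRunAfter(DateTimeImmutable $now): DateTimeImmutable;
}

/** Example task that should run at the top of every hour. */
final class HourlyTask implements ScheduledTask
{
    public int $runs = 0;

    public function run(): void
    {
        $this->runs++; // the real work would go here
    }

    public function nextRunAfter(DateTimeImmutable $now): DateTimeImmutable
    {
        // Add 60 minutes to "now", then drop the minutes and seconds.
        $next = $now->modify('+60 minutes');
        return $next->setTime((int) $next->format('G'), 0, 0);
    }
}

/**
 * Runs every task whose stored next run time has expired and returns the
 * updated schedule. Each entry is ['task' => ScheduledTask, 'nextRun' => DateTimeImmutable].
 */
function runDueTasks(array $schedule, DateTimeImmutable $now): array
{
    foreach ($schedule as $i => $entry) {
        if ($entry['nextRun'] > $now) {
            continue; // not due yet
        }
        try {
            $entry['task']->run();
            // Reschedule only after a successful run.
            $schedule[$i]['nextRun'] = $entry['task']->nextRunAfter($now);
        } catch (Throwable $e) {
            // Assumption: a failed task keeps its old time and is retried on the next wake-up.
            error_log($e->getMessage());
        }
    }
    return $schedule;
}

$utc = new DateTimeZone('UTC');
$schedule = [['task' => new HourlyTask(), 'nextRun' => new DateTimeImmutable('2021-05-13 01:00:00', $utc)]];
$schedule = runDueTasks($schedule, new DateTimeImmutable('2021-05-13 01:15:00', $utc));
echo $schedule[0]['nextRun']->format('H:i'), "\n"; // prints 02:00
```

Because each task exists in the schedule only once, a scheduler that wakes up late simply runs the expired entry once and moves it to the next aligned slot.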

Related

What is the most efficient way to record JSON data per second

Reason
I've been building a system that pulls data from multiple JSON sources. The data being pulled is constantly changing and I'm recording what the changes are to a SQL database via a PHP script. 9 times out of 10 the data is different and therefore needs recording.
The JSON needs to be checked every single second. I've been successfully using a cron task every minute with a PHP function that loops 60 times over.
The problem I'm now having is that the more JSON sources I want to check, the slower the PHP file runs, meaning the next cron gets triggered before the previous one has finished. It's all starting to feel way too unstable and hacky.
Question
Assuming the PHP script is already the most efficient it can be, what else can be done?
Should I be using multiple cron tasks?
Should something other than PHP be used?
Are cron tasks even suitable for this sort of problem?
Any experience, best practices or just plain old help will be very much appreciated.
Overview
I'm monitoring for active race sessions and recording each driver and then each lap a driver completes. Laps are recorded only once a driver crosses the start/finish line and I do not know when race sessions may or may not be active or when a driver crosses the line. Therefore I have been checking every second for new data to record.
Each venue where a race session may be active has a separate URL to receive JSON data from. The more venues I add to my system to monitor, the longer the script takes to run.
I currently have 19 venues and the script takes circa 12 seconds to complete. Since I'm running a cron job every minute and looping the script every second, I'm assuming I have at the very least 12 scripts running every second. It just doesn't seem like the most efficient way to do it to me. Of course, it worked like a charm back when I was only checking a single venue.
There's a cycle to your operations. It goes like this:
1. Start your process by reading the time with $starttime = time();.
2. Compute the next scheduled time by taking the time plus 60 seconds: $nexttime = $starttime + 60;
3. Do the operations you must do (read a mess of JSON feeds).
4. Compute how much of the minute is left: $timeleft = $nexttime - time();.
5. Sleep until the next scheduled time: if ($timeleft > 0) sleep($timeleft);
6. Set $starttime = $nexttime.
7. Jump back to step 2.
Obviously, if $timeleft is ever negative, you're not keeping up with your measurements. If $timeleft is always negative, you will get further and further behind.
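The cycle described above can be written as a small loop. This is only a sketch: the function name runCycles and the injectable $clock and $sleep parameters are assumptions added here so the loop can be exercised without real waiting; by default they fall back to PHP's time() and sleep().

```php
<?php

/**
 * Runs $work once per $period seconds for $cycles cycles and returns the
 * $timeleft value measured in each cycle (negative means "running late").
 */
function runCycles(callable $work, int $period, int $cycles, ?callable $clock = null, ?callable $sleep = null): array
{
    $clock = $clock ?? 'time';
    $sleep = $sleep ?? 'sleep';

    $timeLeftLog = [];
    $startTime = $clock();                 // step 1
    for ($i = 0; $i < $cycles; $i++) {
        $nextTime = $startTime + $period;  // step 2
        $work();                           // step 3
        $timeLeft = $nextTime - $clock();  // step 4
        $timeLeftLog[] = $timeLeft;
        if ($timeLeft > 0) {
            $sleep($timeLeft);             // step 5
        }
        $startTime = $nextTime;            // step 6, then back to step 2
    }
    return $timeLeftLog;
}
```

Because each cycle is anchored to the previous $nexttime rather than to the moment the work finished, a single slow cycle does not shift all later ones, and the returned log shows directly whether the process is keeping up.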
The use of cron every minute is probably wasteful, because it takes resources to fire up a new process and get it going. You probably want to make your process run forever, and use a shell script that monitors it and restarts it if it crashes.
This is all pretty obvious. What's not so obvious is that you should keep track of your individual $timeleft values for each minute over your cycle of measurements. If they vary daily, you should track for a whole day. If they vary weekly you should track for a week.
Then you should look at the worst (smallest) values of $timeleft. If your 95th percentile is less than about 15 seconds, you're running out of resources and you need to take action. You need a margin like 15 seconds, so your system doesn't move into overload.
If your system has zero tolerance for late sampling of data, you should look at the single worst value of $timeleft, not the 95th percentile. You should give yourself a more generous margin than 15 seconds.
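One way to evaluate the recorded $timeleft values is sketched below. It rests on an assumption about what the answer means by "95th percentile": here it is read as the level that about 95% of cycles meet or exceed, which is the nearest-rank 5th percentile of the sorted samples. The function names and the default margin parameter are hypothetical.

```php
<?php

/**
 * Summarizes recorded $timeleft samples (in seconds).
 * 'worst' is the smallest sample; 'p5' is the nearest-rank 5th percentile,
 * so roughly 95% of the samples are at least this large.
 */
function summarizeTimeLeft(array $samples): array
{
    if ($samples === []) {
        throw new InvalidArgumentException('no samples recorded');
    }
    sort($samples);
    // Integer form of ceil(0.05 * n), giving a 1-based rank.
    $rank = intdiv(5 * count($samples) + 99, 100);
    return [
        'worst' => $samples[0],
        'p5'    => $samples[max(1, $rank) - 1],
    ];
}

/** With zero tolerance for late samples, judge by the single worst value instead. */
function hasEnoughMargin(array $samples, int $marginSeconds = 15, bool $zeroTolerance = false): bool
{
    $summary = summarizeTimeLeft($samples);
    $value = $zeroTolerance ? $summary['worst'] : $summary['p5'];
    return $value >= $marginSeconds;
}
```

For a system with zero tolerance, the margin passed in should be larger than the 15-second default, in line with the advice above.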
So-called hard real time systems allocate a time slot to each operation, and crash if the operation exceeds the time slot. In your case the time slot is 60 seconds and the operation is reading a certain number of feeds. Crashing is pretty drastic, but measuring is mandatory.
The simplest action to take is to start running multiple worker processes. Give some of your feeds to each process. PHP runs single-threaded, so multiple processes will probably help, at least until you get to three or four of them.
Then you will need to add another computer, and divide your feeds among worker processes on those multiple computers.
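Dividing the feeds among workers can be as simple as a round-robin split. The helper below is a hypothetical sketch; how the worker processes are started and handed their group (command-line arguments, a config file, and so on) is left open.

```php
<?php

/** Splits a list of feed URLs round-robin into $workers groups. */
function partitionFeeds(array $feeds, int $workers): array
{
    if ($workers < 1) {
        throw new InvalidArgumentException('need at least one worker');
    }
    $groups = array_fill(0, $workers, []);
    foreach (array_values($feeds) as $i => $feed) {
        $groups[$i % $workers][] = $feed;
    }
    return $groups;
}

// 19 venues over 4 workers gives groups of 5, 5, 5 and 4 feeds.
print_r(array_map('count', partitionFeeds(range(1, 19), 4)));
```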
A language environment that parses JSON faster than PHP does might help, but only if the time it takes to parse the JSON matters more than the time spent waiting for it to arrive.

Running a PHP script or function at an exact point in the future

I'm currently working on a browser game with a PHP backend that needs to perform certain checks at specific, changing points in the future. Cron jobs don't really cut it for me as I need precision at the level of seconds. Here's some background information:
The game is multiplayer and turn-based
On creation of a game room the game creator can specify the maximum amount of time taken per action (30 seconds - 24 hours)
Once a player performs an action, they should only have the specified amount of time to perform the next, or the turn goes to the player next in line.
For obvious reasons I can't just keep track of time through JavaScript, as this would be far too easy to manipulate. I also can't schedule a cron job every minute, as it may be up to 30 seconds late.
What would be the most efficient way to tackle this problem? I can't imagine querying a database every second would be very server-friendly, but it is the direction I am currently leaning towards[1].
Any help or feedback would be much appreciated!
[1]:
A user makes a move
A PHP function is called that sets 'switchTurnTime' in the MySQL table's game row to 'TIMESTAMP'
A PHP script that is always running in the background queries the table for any games where the 'switchTurnTime' has passed, switches the turn and resets the time.
You can always use a queue or daemon. This only works if you have shell access to the server.
https://stackoverflow.com/a/858924/890975
Every time you need an action to occur at a specific time, add it to a queue with a delay. I've used beanstalkd with varying levels of success.
You have lots of options this way. Here are two examples with 6-second intervals:
Use a cron job every minute to add 10 jobs, each delayed 6 seconds more than the previous one
Write a simple PHP script that runs in the background (a daemon) and adds a new job to the queue every 6 seconds
I'm going with the following approach for now, since it seems to be the easiest to implement and test, as well as to deploy on different kinds of servers and hosting, while still being reliable.
Set up a cron job to run a PHP script every minute.
Within that script, first do a query to find candidates that will have their endtime within this minute.
Start a while-loop, that runs until 59 seconds have passed.
Inside this loop, check the remaining time for each candidate.
If the time limit has passed, do another query on that specific candidate to ensure the endtime hasn't changed.
If it has, re-add it to the candidates queue as necessary. If not, act accordingly (in my case: switch the turn to the next player).
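The loop described in these steps might be sketched as follows. The function name watchCandidates and all of its callables are hypothetical hooks standing in for the real database queries and game logic; the rule that a changed end time is re-queued only if it still falls inside the current minute is also an assumption.

```php
<?php

/**
 * $candidates:   game id => end time (Unix seconds), loaded when the script starts.
 * $fetchEndTime: re-reads the current end time of one game from the data store.
 * $onExpire:     performs the action, e.g. switches the turn to the next player.
 * $clock/$sleep: injectable so the loop can be tested without waiting.
 */
function watchCandidates(array $candidates, callable $fetchEndTime, callable $onExpire, callable $clock, callable $sleep, int $budget = 59): void
{
    $start = $clock();
    // Stop after $budget seconds, or earlier once nothing is left to watch.
    while ($candidates !== [] && $clock() - $start < $budget) {
        $now = $clock();
        foreach ($candidates as $id => $endTime) {
            if ($endTime > $now) {
                continue; // time limit not reached yet
            }
            $current = $fetchEndTime($id); // confirm the end time is unchanged
            if ($current === $endTime) {
                $onExpire($id);
                unset($candidates[$id]);
            } elseif ($current < $start + 60) {
                $candidates[$id] = $current; // moved, but still within this minute
            } else {
                unset($candidates[$id]);     // a later cron run will pick it up
            }
        }
        $sleep(1);
    }
}
```

In a real cron script, $clock and $sleep would simply be 'time' and 'sleep'.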
Hope this will help somebody in the future, cheers!

Daemon- running tasks at certain times

I've created a PHP daemon whose main concern is polling FTP servers at a set interval.
Now there is a need to add the same functionality, but at set times as well (say, 7 PM on Monday).
How would I modify the service to perform tasks at certain times of the day?
I know I could do something like IF date() == date task should run then ..., but if one of the loops takes longer than normal, it might miss running the task.
Any ideas of how to achieve this?

Start a php code (competition in which some users are) for users, logged in or not

Ok I know the title doesn't really tell you what my problem is but I'll try it now.
I am developing a game. People can subscribe their animals for a race. That race starts at a specific time. It is a race for which ALL users can subscribe. The calculation of which animal is first, second, etc. happens in a PHP file that is executed every 2 minutes for about 1 hour, so there are 30 calculations. Of course, this code is not connected to the logged-in user. The logged-in user can click on the LIVE button to see the current result.
Example: There is a race at 17.00 later today. 15 animals subscribed, from 4 players and they can all check how their animals are doing.
I do not want someone to post the full code for me, but I want to know how I should let PHP code run for about 1 hour on my server (execute code, sleep 2 minutes, new calculation, sleep 2 minutes and so on), so that it is not connected to the user.
I thought about cron jobs but that is really not the solution for this I believe.
Thank you for reading :p
Two approaches:
You use an algorithm which will always come to the same conclusion, regardless of when it is run and who runs it. You just define the starting parameters, then at any time you can calculate the result (or the intermediate result at any point in time between start and finish) when needed. So any user can at any time visit your site and the algorithm will calculate the current standings on the fly from some fixed starting condition.
Alternatively, you keep all data in a central data store and actually update the data in certain intervals; any user can request the current standings at any time and the latest data from the datastore will be used. You will still need an algorithm that has traits of the one described above, since you're likely explicitly not actually running the simulation in real time. Just every x seconds, you run your calculations again, calculating what is supposed to have changed from the last time you ran them.
In essence, any algorithm you use needs this approach. Even a "realtime" program simply keeps looping, changing values little by little from their previous state. The interval between these changes can be arbitrarily stretched out, to the point where you calculate nothing until it becomes necessary. In the meantime, you just store all the data you need in a database.
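As an illustration of the first approach, the sketch below derives the standings purely from fixed starting parameters and the current time, so any request at any moment gets the same answer. The function name standingsAt, the seed string and the 1 to 10 progress rule are invented for this example; a real game would substitute its own race rules.

```php
<?php

/**
 * Returns animal => distance covered at time $now, leader first.
 * Progress per step comes from a hash of (seed, animal, step), so the
 * result depends only on the inputs, never on who asks or how often.
 */
function standingsAt(array $animals, string $raceSeed, int $startTime, int $now, int $stepSeconds = 120, int $totalSteps = 30): array
{
    $elapsedSteps = intdiv(max(0, $now - $startTime), $stepSeconds);
    $steps = min($totalSteps, $elapsedSteps); // the race stops after the last step

    $distances = [];
    foreach ($animals as $animal) {
        $distance = 0;
        for ($step = 1; $step <= $steps; $step++) {
            $hash = md5($raceSeed . ':' . $animal . ':' . $step);
            $distance += 1 + hexdec(substr($hash, 0, 4)) % 10; // 1..10 per step
        }
        $distances[$animal] = $distance;
    }
    arsort($distances); // highest distance first, keys preserved
    return $distances;
}
```

With this shape, the LIVE button only needs to call the function with the current time; no background process has to stay alive for the hour.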
Cron jobs are the right way, I think. If you are not so comfortable with algorithms, check out "How To: PHP with Cron job". You may need to use several different cron jobs.

Update mysql data while not actually using it

How can I set up a program in which a certain piece of data for a user is updated every hour? One example I can give is Mafia Wars. When you obtain property, your money is incremented at set intervals, depending on which property it is. I'm not asking anyone to spit out code for me, but rather to point me in the right direction for a solution. I tried looking into cron jobs, but those only run a script at a set time. Different users are going to be using this, and they may have different times at which their information should update. So cron jobs do not seem applicable here.
You could still have cron jobs, just lots of them (not one per user, but maybe one per minute).
Also, Mafia Wars strikes me as not very interactive, so it may be enough to just update the data (after the fact) when the user (or some other part of the system) next looks at it. So when you log in after 37 hours, you get all the updates for the last 37 hours retroactively applied. Cheap trick, but if there is no need for a consistent global view, that might work, too.
A solution that I came up with when wondering how to implement such a thing is that whenever the player saves the game, the game saves the current time. Then, when the player loads the game back up, it calculates how many minutes have passed and figures out how much money the game should give the player. Then, you could update the SQL database to reflect the changes.
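The "update it when it is next looked at" idea can be sketched as below. The function name applyAccruedIncome and its parameters are hypothetical; timestamps are assumed to be Unix seconds, and the stored last-update time is advanced only by whole intervals so that a partially elapsed interval is not lost.

```php
<?php

/**
 * Credits income for every whole interval elapsed since $lastUpdate.
 * Returns [new balance, new last-update timestamp].
 */
function applyAccruedIncome(int $balance, int $lastUpdate, int $now, int $intervalSeconds, int $incomePerInterval): array
{
    if ($intervalSeconds <= 0) {
        throw new InvalidArgumentException('interval must be positive');
    }
    $intervals = intdiv(max(0, $now - $lastUpdate), $intervalSeconds);
    return [
        $balance + $intervals * $incomePerInterval,
        $lastUpdate + $intervals * $intervalSeconds,
    ];
}

// A player returns after 37 hours and 2 minutes; the property pays 50 per hour.
[$balance, $lastUpdate] = applyAccruedIncome(1000, 0, 37 * 3600 + 120, 3600, 50);
echo $balance, "\n"; // prints 2850
```

Both returned values would then be written back to the database in the same update.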
Why do you dismiss cron jobs? Have a cron job that runs a script in short intervals. Within this script, include logic to check which specific updates on the database have to be done.
A cron job that runs something is your friend.
What that something is, is up to you. It could be a PHP script that runs some MySQL queries or procedures, or it could be a plain mysql command run from the command line.
Either way, cron (and other similar tools) fits the bill exactly for these tasks. It's lightweight, it's on nearly every server in the land, there is lots of help available for it, and 99.9999% of the time it just works!
