Context
I'm currently implementing a feature to schedule notifications for a specific period through a web form using PHP and Firebase.
To send the notifications I use Firebase, which delivers them to Android/iOS devices.
To schedule the notification I use the Linux at service, since it seems a better fit than cron: cron runs jobs at recurring intervals, whereas at runs a job once at a specific time.
See the man page for at(1).
Sample code
echo "/usr/bin/php send_notification.php" | at 15:40 2021-07-11
This creates a job file on Linux that runs exactly once, at 2021-07-11 15:40.
Problems
The AT service, like CRON, creates files inside a directory on the operating system that represent the jobs.
1 - If a machine on AWS is scaled, jobs would likely be duplicated and consequently send notifications more than once. (Note: I don't know much about machine scaling, but I believe it should happen)
2 - If the machine is down at the scheduled time (for example while a new feature is being deployed), I believe the job would not be executed with the current setup.
3 - Another problem, but not the main one, would be if I was using a docker container. As Ubuntu + PHP are inside the container, the job files would probably be lost if I restarted the container, so in this case I believe that a solution would be to use volume, but that would not be my problem now, as currently the application uses only one machine on AWS EB with the PHP image.
Questions
Is there any solution I can apply to solve this duplicate job problem using PHP?
Is the approach using at the most suitable? I see a lot of people recommending cron, but cron will run the job several times and that's not what I'm looking for.
I think you need a place where scheduled and finished notifications are persisted, regardless of whether you use cron or at.
If I had such a task, I would go with a solution like this: run a dedicated script, "scheduler.php", from cron every minute (or less often, e.g. every 5 minutes). It checks some log file (or a remote database, if there are several machines) for new lines. If a line has a timestamp in the past and the status "scheduled", the script locks it and runs your "sender.php", then marks the line as "done". Each line in the storage should contain the timestamp to run at and one of three statuses: "scheduled", "running" or "done".
With this approach you plan a new notification by adding a line with the desired time and the status "scheduled" to the storage. Note that there can be a small delay between the scheduled time and the actual notification, depending on the cron interval, but I suppose that is not critical.
This lets you run any number of crons on different machines while guaranteeing that each job is done only once.
Important: if you adopt this scheme, make sure scheduler.php reads and updates the storage in a single atomic operation, to prevent race conditions between several crons. File locks or "select for update" will do.
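A minimal sketch of the atomic claim step, under stated assumptions: the notifications table, its columns, and the function names are hypothetical, and an in-memory SQLite database (needs the pdo_sqlite extension) stands in for the shared remote database. Instead of "select for update" it uses a conditional UPDATE, which only one scheduler can win for a given row.

```php
<?php
// Sketch only: table, columns and function names are hypothetical.
// In-memory SQLite stands in for the shared database.

function createStore(): PDO
{
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec("CREATE TABLE notifications (
        id INTEGER PRIMARY KEY,
        run_at INTEGER NOT NULL,
        status TEXT NOT NULL DEFAULT 'scheduled'
    )");
    return $db;
}

// Atomically claim one job: the UPDATE only changes a row that is still
// 'scheduled', so two schedulers cannot both claim the same job.
function claim(PDO $db, int $id): bool
{
    $stmt = $db->prepare(
        "UPDATE notifications SET status = 'running'
         WHERE id = :id AND status = 'scheduled'"
    );
    $stmt->execute([':id' => $id]);
    return $stmt->rowCount() === 1;
}

// One scheduler pass: find due jobs, claim each, send, mark as done.
// Returns the number of notifications this pass actually sent.
function runDue(PDO $db, int $now, callable $send): int
{
    $stmt = $db->prepare(
        "SELECT id FROM notifications
         WHERE status = 'scheduled' AND run_at <= :now"
    );
    $stmt->execute([':now' => $now]);
    $ids = $stmt->fetchAll(PDO::FETCH_COLUMN);

    $sent = 0;
    foreach ($ids as $id) {
        if (!claim($db, (int) $id)) {
            continue; // another scheduler got there first
        }
        $send((int) $id);
        $done = $db->prepare("UPDATE notifications SET status = 'done' WHERE id = :id");
        $done->execute([':id' => $id]);
        $sent++;
    }
    return $sent;
}
```

In scheduler.php, the $send callback would be whatever invokes your sender.php; a job left in "running" after a crash would need separate handling, which this sketch does not cover.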
Related
Question
Using PHP and jQuery, how would you execute code after a given amount of time, say 1 month (even after the user has closed the browser etc.)?
Scenario
I've wanted to build an application that does something after an amount of time specified by the user, "sort of like hootsuite". But I can't get my head around how it would work.
I know you can use node.js (I struggle to understand and implement this in any of my Laravel projects...) but even then, wouldn't the server be under stress if, say, 1000 people had something waiting to be executed on the server for a whole month or even a year while it is still handling other user requests?
I've looked around a bit and cron jobs came up, but this doesn't sound like what I was looking for! I'm not sure; I'll be grateful if anyone can explain how they think I could go about it.
Essentially what you're looking for is a scheduling system. The reason why the UNIX cron tool has come up in your searches is because it is a scheduling tool; it allows UNIX users to schedule tasks to happen at certain times. Other operating systems also have task schedulers.
Schedulers
The principal implementation strategy for a scheduler is some kind of polling mechanism, i.e., a software component which periodically checks to see if there are any scheduled tasks which are now due to be executed and, if so, executes them.
Implementation strategies
In order to implement something like this you would need a way to store information about scheduled tasks (e.g. when they're supposed to happen, who they belong to, what they're supposed to do). For example, you might use a database management system, or a file on disk.
You would also need a component to do the polling. This could be a daemon process (i.e. a process which is always running in the background) which includes a sleep (or wait or timeout) call, so that it checks for scheduled tasks at intervals rather than constantly (and thereby most likely consuming all the CPU cycles!). Or it could be a program (in PHP if you like) which is itself run by cron on the host system, say, every five minutes, and which checks for scheduled tasks and then executes them, perhaps in separate processes. If you were to use cron, there are numerous PHP wrappers to help, such as https://packagist.org/packages/peppeocchi/php-cron-scheduler.
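The daemon variant of the polling component could be sketched roughly as follows. Everything here is illustrative: the task shape (a 'due' timestamp plus a 'run' callable), the function names, and the $maxTicks bound are assumptions made for the example, and tasks are kept in memory rather than in a database or file.

```php
<?php
// Hypothetical task shape: ['due' => unix timestamp, 'run' => callable].

// Split tasks into those due at $now and those still pending.
function partitionDue(array $tasks, int $now): array
{
    $due = [];
    $pending = [];
    foreach ($tasks as $task) {
        if ($task['due'] <= $now) {
            $due[] = $task;
        } else {
            $pending[] = $task;
        }
    }
    return [$due, $pending];
}

// Poll loop: run whatever is due, then sleep so the CPU is not kept busy.
// $maxTicks bounds the loop for demonstration; a real daemon would keep going.
// $clock returns the current unix time (injected so it can be faked).
function pollLoop(array $tasks, int $intervalSeconds, int $maxTicks, callable $clock): array
{
    for ($tick = 0; $tick < $maxTicks && $tasks !== []; $tick++) {
        [$due, $tasks] = partitionDue($tasks, $clock());
        foreach ($due as $task) {
            ($task['run'])();
        }
        if ($tasks !== [] && $intervalSeconds > 0) {
            sleep($intervalSeconds);
        }
    }
    return $tasks; // tasks that did not become due within $maxTicks
}
```

A real deployment would pass 'time' as the clock and an interval such as 60 seconds. The cron variant is the same idea with the loop removed: cron starts the script, one pass runs, and the script exits.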
Services
However, instead of implementing all this yourself, you may consider making use of an existing service. There seem to be several options, including at least one free (within limits) service: https://atrigger.com/.
I have a setup where there are several application servers running php-fpm service and they all share a GlusterFS mount for the application code and other assets. In the current deploy process, the files get updated directly on the file server and many times to reflect changes the application service must be reloaded. To achieve that, the deployment script needs to get into every server and issue a reload command but with autoscaling, the number of servers is not the same at every moment.
Overall, I am sketching a couple of alternatives to solve this problem:
The first one, more hand-rolled and not perfect, as a proof of concept, would be a cron job that runs every X minutes on the application machines and looks in a shared file for unique info such as the machine's hostname or IP address. If the machine finds itself listed, it takes no action; if not, it reloads and adds itself to the file. During deployment, the script would clear the file, and all servers would reload on the next cron run.
The second uses a more sophisticated approach such as a message queue or notification service that the running application machines subscribe to at boot time, waiting for an order to reload. The deploy script would then publish a notification to make all servers aware that it is time. A cron job similar to the one in the previous method would then notice that and reload the app server.
Would any of that make sense? Is there a simpler or more standard way to trigger a broadcast to the application servers running at a given moment in the deploy procedure, without having to ssh into each one and issue the reload command? Any other advice or suggestions?
Thanks!
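For illustration, the first alternative (the cron job plus a shared marker file) might look roughly like this. The function name and file layout (one hostname per line) are hypothetical, and the sketch assumes that flock() behaves correctly on the shared mount, which should be verified for GlusterFS before relying on it.

```php
<?php
// Hypothetical helper: returns true when this host still needs to reload.
// The marker file lists one hostname per line; the deploy script empties it.
function claimReload(string $markerFile, string $hostname): bool
{
    $fh = fopen($markerFile, 'c+'); // create if missing, do not truncate
    if ($fh === false) {
        throw new RuntimeException("cannot open $markerFile");
    }
    try {
        flock($fh, LOCK_EX); // several servers share the file, so serialize access
        $lines = explode("\n", (string) stream_get_contents($fh));
        $hosts = array_filter(array_map('trim', $lines));
        if (in_array($hostname, $hosts, true)) {
            return false; // already reloaded since the last deploy
        }
        fseek($fh, 0, SEEK_END);
        fwrite($fh, $hostname . "\n");
        fflush($fh);
        return true; // caller should now reload the application service
    } finally {
        flock($fh, LOCK_UN);
        fclose($fh);
    }
}
```

The cron script on each server would call claimReload() with the shared file path and gethostname(), and issue the usual reload command only when it returns true. The deploy script just truncates the file.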
I'm putting together my first commercial PHP application, it's nothing really huge as I'm still eagerly learning PHP :)
Right now I'm still in the conceptual stage of planning my application but I run into one problem all the time, the application is supposed to be self-hosted by my customers, on their own servers and will include some very long running scripts, depending on how much data every customer enters in his application.
Now I think I have two options, either use cronjobs, like for example let one or multiple cronjobs run at a time that every customer can set himself, OR make the whole processing of data as daemons that run in the background...
My question is, since it's a self-hosted application (and every server is different)... is it even recommended to try to write php that starts background processes on a customers server, or is this more something that you can do reliably only on your own server...?
Or should I use cronjobs for these long running processes?
(depending on the amount of data my customers will enter in the application, a process could run 3+ hours)
Is that even a problem that can be solved, reliably, with PHP...? Excuse me if this should be a weird question, I'm really not experienced with PHP daemons and/or long running cronjobs created by php.
So to recap everything:
Commercial self-hosted application, including long running processes: cronjobs or daemons? And is either, or maybe both, a reliable solution for a paid application that you can give to your customers with a clear conscience because you know it will work reliably on all kinds of different servers...?
EDIT*
PS: Sorry, I forgot to mention that the application targets only Linux servers, so everything like Debian, Ubuntu etc etc.
Short answer, no, don't go for background process if this will be a client hosted solution. If you go towards the ASP concept (Application Service Provider... not Active Server Pages ;)) then you can do some wacky stuff with background processes and external apps connecting to your sql servers and processing stuff for you.
What I suggest is to create a strong task management backbone and link that to a solid task processing infrastructure. I recommend you read an old post I wrote quite some time ago about background processes and a strategy I adopted to handle long running processes:
Start & Stop PHP Script from Backend Administrative Webpage
Happy reading...
UPDATE
I realize that my old post is far from easy to understand so here goes:
You need 2 models, Job and JobQueue, and 2 controllers, JobProcessor and XYZController.
JobProcessor is called either by a user when a page triggers or by a cronjob, as you wish. JobProcessor::process() is the key that starts the whole processing or continues it. It loads the JobQueues and asks them if there is work to do. If there is, it asks the JobQueue to start/continue its job.
JobQueue Model: Used to queue several jobs one behind the other; it tracks which job is currently active by keeping some kind of ID and STATE for the running job.
Job Model: Represents exactly what needs to be done. It contains, for example, the name of the controller that will process the data, the function to call to process the data, and a serialized configuration property that describes what must be done.
XYZController: The one that contains the processing method. When the processing method is called, the controller must load everything it needs into memory and then process each individual unit of work as fast as possible.
Example:
1 - index.php is called.
2 - index.php creates a JobProcessor controller.
3 - index.php calls the JobProcessor's Process().
4 - JobProcessor::Process() loads all the queues and processes them.
5 - For each JobQueue::Process(), the job queue loads its possible Jobs and detects whether one is currently running. If none is running, it starts the next one by calling Job::Process().
6 - Job::Process() creates the XYZController that will work the task at hand. For example, my old system had an InvoicingController and a MassmailingController that worked hand in hand.
7 - Job::Process() calls XYZController::Prepare() so that it loads the information it has to process (for example, load a batch of emails to process, load a batch of invoices to create).
8 - Job::Process() calls XYZController::RunWorkUnit() so that it processes a single unit of work (for example, create one invoice, send one email).
9 - Job::Process() asks JobProcessor::DoIStillHaveTimeToProcess() and, if so, continues with the next element.
10 - Job::Process() runs out of time and calls XYZController::Cleanup() so that all resources are released.
11 - JobQueue::Process() ends and returns to the JobProcessor.
12 - JobProcessor::Process() is about to end? Open a socket and call myself back so I can start another round of processing, until I don't have anything left to do.
13 - Handle the request from the user that started at step 1.
Ultimately, you can instead open a socket each time and ask the processor to do something, or you can queue a CronJob to call your processor. This way your users won't get stuck waiting for the 3/4 work units to complete each time.
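The Prepare / RunWorkUnit / "do I still have time" / Cleanup cycle from the steps above can be sketched as follows. This is not the code from the old post: the WorkController interface, the hasWork() method, processJob() and the toy MassMailingController are hypothetical names chosen for the example, and the time budget is a simple wall-clock deadline.

```php
<?php
// Hypothetical interface mirroring the Prepare / RunWorkUnit / Cleanup steps.
interface WorkController
{
    public function prepare(): void;
    public function hasWork(): bool;
    public function runWorkUnit(): void;
    public function cleanup(): void;
}

// Process work units until the work or the time budget runs out.
// Returns true when everything is finished, false when another round is needed.
function processJob(WorkController $controller, float $budgetSeconds): bool
{
    $deadline = microtime(true) + $budgetSeconds;
    $controller->prepare();
    try {
        while ($controller->hasWork()) {
            if (microtime(true) >= $deadline) {
                return false; // out of time, continue in the next round
            }
            $controller->runWorkUnit();
        }
        return true;
    } finally {
        $controller->cleanup(); // always release resources
    }
}

// Toy controller: each work unit "sends" one email from a batch.
final class MassMailingController implements WorkController
{
    /** @var string[] addresses handled so far */
    public $sent = [];
    /** @var string[] addresses still waiting */
    private $queue;

    public function __construct(array $queue)
    {
        $this->queue = $queue;
    }

    public function prepare(): void {}

    public function hasWork(): bool
    {
        return $this->queue !== [];
    }

    public function runWorkUnit(): void
    {
        $this->sent[] = array_shift($this->queue);
    }

    public function cleanup(): void {}
}
```

When processJob() returns false, the caller would trigger the next round (the socket call-back or a cron run), which is what keeps each individual request short.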
It's worth noting that, in addition to running daemons or cron jobs, you can kick off long running processes from a web request (but note that they must run outside of the webserver process group), and of course there is asynchronous message processing (which is essentially a variant of the batch approach).
All four of these approaches are very different in terms of how they behave, how concurrency and timing are managed. The factors which make them all different are the same ones you omitted from your question - so it's not really possible to answer.
Unfortunately all rely on facilities which are very different between MSWindows and POSIX systems - so although PHP will run on both, if you want to sell your app on both platforms it's going to need 2 versions.
Maybe you should talk to your potential customer base and ask them what they want?
I'm looking for a better solution for handling our cron tasks in a load balanced environment.
Currently have:
PHP application running on 3 CentOS servers behind a load balancer.
Tasks that need to be run periodically but only on a single machine at a time.
Good old cron set up to run those tasks on the first server.
Problems if the first server is out of play for whatever reason.
Looking for:
Something more robust and de-centralized.
Load balancing the tasks so multiple tasks would run only once but on random/different servers to spread the load.
Making sure the tasks still run when the first server goes down.
Being able to manage tasks and see aggregate reports ideally using a web interface.
Notifications if anything goes wrong.
The solution doesn't need to be implemented in PHP but it would be nice as it would allow us to easily tweak it if needed.
I have found two projects that look promising: GNUBatch and Job Scheduler. I will most likely test both further, but I wonder if someone has a better solution for the above.
Thanks.
You can use this small library that uses redis to create a temporary timed lock:
https://github.com/AlexDisler/MutexLock
The servers should be identical and have the same cron configuration. The server that will be first to create the lock will also execute the task. The other servers will see the lock and exit without executing anything.
For example, in the php file that executes the scheduled task:
MutexLock\Lock::init([
    'host' => $redisHost,
    'port' => $redisPort
]);

// check if a lock was already created,
// if it was, it means that another server is already executing this task
if (!MutexLock\Lock::set($lockKeyName, $lockTimeInSeconds)) {
    return;
}

// if no lock was created, execute the scheduled task
scheduledTaskThatRunsOnlyOnce();
To run the tasks in a de-centralized way and spread the load, take a look at: https://github.com/chrisboulton/php-resque
It's a php port of the ruby version of resque and it stores the data in the same exact format so you can use https://github.com/resque/resque-web or http://resqueboard.kamisama.me/ to monitor the workers and see reports
Assuming you have a database available that is not hosted on one of those 3 servers:
Write a "wrapper" script that goes in cron and takes the program you're running as its argument. The very first thing it does is connect to the remote database and check when an entry was last inserted into a table (created for this wrapper). If the last insertion is older than the time the job was supposed to run, it inserts a new record into the table with the current time and executes the wrapper's argument (your cron job).
Cron up the wrapper on each server, each set X minutes behind the other (server A runs at the top of the hour, server B at 5 minutes past, C at 10 minutes past, etc).
The first server will always execute the cron first, so the other two servers never will. If the first server goes down, the second server will see the job hasn't run, and will run it.
If you also record in the table which server it was that executed the job, you'll have a log of when/where the script was executed.
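A rough sketch of such a wrapper, with one deliberate variation: instead of "check the last insertion, then insert", it inserts a row keyed by job and time slot and lets the primary key reject every server but the first, which closes the gap between the check and the insert. The cron_runs table, its columns, and runOnce() are hypothetical, and in-memory SQLite (pdo_sqlite) stands in for the remote database.

```php
<?php
// Hypothetical table: one row per (job, slot). The primary key makes the
// insert fail for every server except the first one to arrive.
function createRunLog(PDO $db): void
{
    $db->exec("CREATE TABLE cron_runs (
        job TEXT NOT NULL,
        slot TEXT NOT NULL,
        server TEXT NOT NULL,
        PRIMARY KEY (job, slot)
    )");
}

// Returns true if this server won the slot and executed $task.
// $slot identifies the scheduled run, e.g. date('Y-m-d H') for an hourly job.
function runOnce(PDO $db, string $job, string $slot, string $server, callable $task): bool
{
    $stmt = $db->prepare(
        "INSERT INTO cron_runs (job, slot, server) VALUES (?, ?, ?)"
    );
    try {
        $ok = $stmt->execute([$job, $slot, $server]);
    } catch (PDOException $e) {
        // Simplification: any failure is treated as "slot already taken".
        // Real code should distinguish duplicate-key errors from outages.
        $ok = false;
    }
    if (!$ok) {
        return false;
    }
    $task();
    return true;
}
```

Because the server name is stored with each row, the table doubles as the when/where log mentioned above.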
Wouldn't this be an ideal situation for using a message / task queue?
I ran into the same problem and came up with this little repository:
https://github.com/incapption/LoadBalancedCronTask
I am using MySQL as my database and PHP as my programming language. I wanted to run a cron job that keeps running until the current system date matches the "deadline" (date) column in my database table called "PROJECT". Once the dates are the same, an update query has to run that changes the status (a field of the project table) from "open" to "close".
I am not really sure if cron jobs are the best way, or whether I could use triggers or maybe something else. Also, I am using Apache as my web server and my OS is Windows Vista.
Also, which is the best way to do it? PHP scheduler, cron jobs or some other method? Can anybody enlighten me?
I think your concept needs to change.
PHP cannot schedule a job, and neither can MySQL. Triggers in MySQL execute when a MySQL query occurs, not at a specific time.
This limitation usually isn't a problem in web development. The reason is because your PHP application should control all data going in and out. Usually, this means just the HTML that displays that data, or other formats to users, or other programs.
In your case you can think about it this way. The deadline is a set date. You can treat it as data and save it to your database. When the deadline occurs is not important; what matters is that the data stored in your database is displayed correctly.
When a request is made to your application, check if the date of the deadline is in the past, if it is, then display that the project is closed - or update that the project is closed, just before display.
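As a small sketch of that request-time check: the function name is hypothetical, the status values "open" and "close" come from the question, and the deadline is assumed to arrive as a 'Y-m-d' string (as a MySQL DATE column would), so plain string comparison orders the dates correctly.

```php
<?php
// Hypothetical helper: the status shown to users is derived from the
// deadline at request time, so no scheduled job has to flip it.
// $deadline is assumed to be a 'Y-m-d' string, e.g. from a MySQL DATE column.
function effectiveStatus(string $storedStatus, string $deadline, DateTimeImmutable $now): string
{
    if ($storedStatus === 'close') {
        return 'close'; // already closed, nothing to derive
    }
    // ISO dates compare correctly as strings; close on the deadline day itself.
    return $now->format('Y-m-d') >= $deadline ? 'close' : 'open';
}

// If the stored value should be brought up to date as well, one statement
// run just before display would do it (MySQL syntax):
//   UPDATE PROJECT SET status = 'close'
//   WHERE status = 'open' AND deadline <= CURDATE();
```

Calling effectiveStatus($row['status'], $row['deadline'], new DateTimeImmutable()) when rendering the project is enough for the page to show the right state.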
There really is no reason to update data independently of your PHP application.
Usually, the only things you want to schedule are jobs that would affect your application in terms of load, or that need to be done only once, or where concurrency or time is an issue.
In your case none of those apply.
PS: I haven't tried PHPscheduler but I can guess it isn't a true scheduler. Cron is a daemon that sleeps until a given task is due in its queue, executes the task, then sleeps till the next one is due (at least that's what it does in the current algorithm). PHP cannot do that without the sockets and fork extensions, and special setup. So PHPscheduler is most likely just checking whether the date for a task has expired on each load of a webpage (whenever PHP executes a page). This is no different than you just checking whether the date on the project has expired, without the overhead of PHPScheduler.
I would always go for a cron job for anything scheduling related.
The big bonus point is that you can echo info out as well and it gets emailed to you.
You'll find once you start using cronjobs, it's hard to stop.
Cron does not exist, per se, in Vista, but what does exist is the standard Windows scheduling manager, which you can run with a command line like "php -q -f myfile.php" to execute the PHP file at the given time.
You can also use a port of the cron program; there are many out there.
If it is not critical to the second, any Windows scheduling application will do. Just be sure to have your PHP bin path in your PATH variable for simplicity.
For Windows CRON jobs I cannot recommend PyCron enough.
While CRON and Windows Scheduled Tasks are the tried and true ways of scheduling jobs/tasks to run on a regular basis, there are use cases where having a different scheduled task in CRON/Windows can become tedious. Namely when you want to let users schedule things to run, or for instances where you prefer simplicity/maintainability/portability/etc or all of the above.
In cases where I prefer to not use CRON/Windows for scheduled tasks, I build into the application a task scheduling system. This still requires 1 CRON job or Windows Task to be scheduled. The idea is to store Job details in the database (job name, job properties, last run time, run interval, anything else that is important for your implementation). You then schedule a "Master" job in CRON or Windows which handles running all of your other jobs for you. You'll need this master job to run at least as often as your shortest interval; if you want to be able to schedule jobs that run every minute the master job needs to run every minute.
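The core of such a "Master" job might be sketched like this. The job record shape (name, interval in seconds, last_run as a unix timestamp or null) and the function names are assumptions for the example; in practice the records would be loaded from and written back to the database table described above.

```php
<?php
// Hypothetical job record:
//   ['name' => string, 'interval' => seconds, 'last_run' => unix time or null]

// A job is due if it has never run or its interval has elapsed.
function isDue(array $job, int $now): bool
{
    return $job['last_run'] === null
        || $now - $job['last_run'] >= $job['interval'];
}

// One pass of the master job: launch every due job and record the run time.
// $launch receives the job name; returns the updated job records.
function runMaster(array $jobs, int $now, callable $launch): array
{
    foreach ($jobs as $i => $job) {
        if (isDue($job, $now)) {
            $launch($job['name']);
            $jobs[$i]['last_run'] = $now;
        }
    }
    return $jobs;
}
```

The single cron entry or Windows task then only has to run this pass at least as often as the shortest interval.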
You can then launch each scheduled job in the background from PHP with minimal effort (if you want). In memory constrained systems you can monitor memory usage or keep track of the PIDs (various methods) and limit to N jobs running at a given time.
I've had a great deal of success with this method, YMMV however based on your needs and your implementation.
How about PHPscheduler? Is it not better than cron jobs? I think crons would be independent of the application and hence would be difficult to move if one has to change the host. I am not really sure, though. It would be great if anyone could comment on this! Thanks!