Does anybody know how WordPress stores its CronJob events? I'm developing a plugin with multiple concurrent CronJobs, which behave really strangely. When configuring the plugin, the first event generates some page data over a period of roughly 10-15 minutes and is split into multiple packages. These packages reschedule themselves to get the maximum running time without hitting the script execution limit. However, while the first CronJob is executing, the user can start a second one (not the same one, it's from another section). This always results in the second one being scheduled, staying in standby, and getting removed after the first one has finished an execution.
We had problems with long-running CronJobs and the database cache before: some of our data is bundled into an option, and inserting data into this package overwrites changes made outside of the CronJob. Maybe something similar is happening here. For reference: the reschedule of the first CronJob happens inside said CronJob. Could that be a problem too?
This is how the error is behaving:
Init
Cron 1 is scheduled to a past timestamp.
Cron 1 is starting.
Cron 2 is scheduled to a past timestamp.
Cron 1 is working.
Cron 1 is finished.
Cron 1 is rescheduled to a new timestamp.
Cron 2 gets removed from the event queue.
Cron 1 is starting...
I have checked everything that relates to the scripts themselves: the events are properly registered, have a unique argument (just in case), and even pull a fresh version of the database options they change before changing them. Limits are set beforehand and every related function is wrapped in a try-catch block.
My questions so far: Does anybody know what can cause a CronJob to get deleted (besides "wp_clear_scheduled_hook")? Does WordPress store the events as an option? Can a CronJob overwrite these settings when it is running for a long time?
Thanks for your help and greetings
SOLUTION: Thanks #kyon147 for pointing out that WordPress uses the wp_options table to store information about the scheduled events. In case anyone has similar problems: WordPress will load ALL options into its cache when it is called. This means that when Cron1 starts, the "cron" array with your events might look like this:
array('cron1' => 'time')
When something changes this option while the script is still running, the change is not visible to the script. The array will still look as above, even when an event is added from another script/session. So when rescheduling the event INSIDE Cron1, WordPress took the array above, not the new one. This reset the option to the state it had when Cron1 was started, and thus the other event appeared to be missing.
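The overwrite described above is a classic lost update. Here is a minimal sketch of it in plain PHP, assuming a hypothetical $database array as a stand-in for the wp_options row. None of these names are WordPress API; they only illustrate why writing back a stale copy drops the second event, and why re-reading the stored value right before rescheduling avoids it.

```php
<?php
// Hypothetical stand-in for the wp_options row that holds the "cron" array.
$database = ['cron' => ['cron1' => 100]];

// Request A (Cron 1) starts and caches the option at load time.
$cacheA = $database['cron'];

// Request B schedules Cron 2 while A is still running.
$database['cron']['cron2'] = 105;

// Hypothetical helper: set a new timestamp for one hook in a given copy of the array.
function rescheduleIn(array $cronArray, string $hook, int $time): array
{
    $cronArray[$hook] = $time;
    return $cronArray;
}

// Naive reschedule: A changes its stale copy. Writing this back would drop cron2.
$stale = rescheduleIn($cacheA, 'cron1', 200);

// Safer reschedule: read the current stored value again right before changing it.
$fresh = rescheduleIn($database['cron'], 'cron1', 200);

var_dump(array_key_exists('cron2', $stale)); // stale copy has lost cron2
var_dump(array_key_exists('cron2', $fresh)); // fresh copy still has it
```

In a real plugin the equivalent step would be to make sure the cached options are refreshed before the reschedule call; how exactly to do that depends on the caching setup and is not shown here.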
Related
I'm registering an event like:
wp_schedule_single_event( time(), 'do_custom_hook', array( $body ) );
Above that, I have added the following action:
add_action('do_custom_hook', 'process_custom_hook', 10);
function process_custom_hook($body){
// custom code here
}
However, the process_custom_hook function sometimes fires and sometimes simply doesn't. It fires most of the time, but is missed about 10%-20% of the time.
The scheduling call always returns true, meaning that the event must have been registered.
Also, while testing, I made sure that the arguments (body) are always different.
Any reason why this may occur?
From the codex:
Note that scheduling an event to occur before 10 minutes after an existing event of the same name will be ignored, unless you pass unique values for $args to each scheduled event.
If your custom hook is only working some of the time, then this might be an avenue to look at. If you require the hook to be handled immediately, then it might be prudent to look at giving a hook a unique name, or passing unique values to the hook.
If you do not need your job to execute immediately, then you could look at utilising wp_next_scheduled() to determine when the job will next run, and set a job to run after the next scheduled job.
It's also worth noting that if this task is something which seems to have consistent logic behind it (as seems to be the case) - why not store the job information in to the database and run a cron job every 5-10 minutes to pick up any new jobs from the database and handle them as such? This would avoid needing to deal with the behaviour of wp_schedule_single_event().
According to the official documentation on this:
Scheduling an event to occur within 10 minutes of an existing event with the same action hook will be ignored unless you pass unique $args values for each scheduled event.
You have stated that you did this, but a double check may help.
WP-Cron also depends on site visits: the action will only trigger when someone visits your WordPress site after the scheduled time has passed.
The documentation also says you can use wp_next_scheduled() to prevent duplicate events, and wp_schedule_event() to schedule a recurring event.
The scheduling call might return true in certain cases where the event was then ignored, so it would look registered without ever being handled.
I would suggest keeping a detailed log of everything that is sent and received, so you can see for yourself whether what is occurring matches what you expect.
Here are a few links with similar issues and documentation you could look at.
I hope this helps. If not, let's figure it out together.
https://developer.wordpress.org/reference/functions/wp_schedule_single_event/
https://wordpress.stackexchange.com/questions/15475/using-wp-schedule-single-event-with-arguments-to-send-email
https://rudrastyh.com/wordpress/wp_schedule_single_event.html
http://hookr.io/functions/wp_schedule_single_event/
From the WordPress documentation:
WP-Cron works by checking, on every page load, a list of scheduled
tasks to see what needs to be run. Any tasks due to run will be called
during that page load.
WP-Cron does not run constantly as the system cron does; it is only
triggered on page load.
Scheduling errors could occur if you schedule
a task for 2:00PM and no page loads occur until 5:00PM.
I think your cron event may be missed because no page loads occur around the scheduled time.
Here is a solution for your problem:
Hooking WP-Cron Into the System Task Scheduler
As previously mentioned, WP-Cron does not run continuously, which can
be an issue if there are critical tasks that must run on time. There
is an easy solution for this. Simply set up your system’s task
scheduler to run on the intervals you desire (or at the specific time
needed). The easiest solution is to use a tool to make a web request
to the wp-cron.php file...
In my case this exact issue occurred when WooCommerce's Action Scheduler was also running. Action Scheduler is a cron task manager that ships with WooCommerce, but also with other plugins such as wp-mail-smtp.
I had exactly the same issue and couldn't figure out what was wrong. I tried to debug the WordPress code and came to the conclusion that when a task was scheduled (meaning, the moment it was added to the scheduled tasks) within 10 seconds of each whole minute, it got removed straight away. It seemed to be some sort of race condition where Action Scheduler popped it off the stack, so the normal WP-Cron could not execute it because the task was already gone.
I should also mention that I had set up crontab to call wp-cron.php every minute on the minute (instead of the 'fake cron' of WordPress).
When I replaced wp_schedule_single_event with the as_enqueue_async_action function of Action Scheduler, no more tasks were dropped.
I think an alternative is uninstalling anything that uses Action Scheduler, but I haven't tried that.
You are using the cron from WordPress, but some plugins disable it or prevent it from working. My recommendation in this case is either to create this schedule through the server cron, or to install a plugin to check and manage your schedule.
I had the same problem, and it was even a little difficult to find. As you will see, the plugin doesn't have a recent update, but it works for me. https://wordpress.org/plugins/wp-crontrol
&
https://br.wordpress.org/plugins/advanced-cron-manager/
With these you can identify which cron event you want to edit and so make more precise changes. There are other plugins that do similar things. You mentioned the cron schedule, which is why I suggested these: they let you see the cron configuration on a schedule, and with WP Crontrol you can edit your cron schedule.
I currently have a scheduled console command that runs every 5 minutes without overlap like this:
$schedule->command('crawler')
->everyFiveMinutes()
->withoutOverlapping()
->sendOutputTo('../_laravel/storage/logs/scheduler-log.txt');
So it works great, but I currently have about 220 pages, which take about 3 hours to finish in increments of 5 minutes. I force it to crawl only 10 pages at each interval, since each page takes 20-30 seconds to crawl due to various factors. Each page is a record in the database. If I end up having 10,000 pages to crawl, this method will not work, because it would take more than 24 hours and each page is supposed to be re-crawled once a day.
My vendor allows up to 10 concurrent requests (or more with higher plans), so what's the best way to run it concurrently? If I just duplicate the scheduler code, does it run the same command twice, or 10 times if I duplicate it 10 times? Would that cause any issues?
And then I need to pass parameters to the console command, such as 1, 2, 3, etc., which I could use to determine which pages to crawl. That is, 1 would be records 1-10, 2 would be records 11-20, and so on.
Using this Stack Overflow answer, I think I know how to pass it along, like this:
$schedule->command('crawler --sequence=1')
But how do I read that parameter within my Command class? Does it just become a regular PHP variable, i.e. $sequence?
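The batching described above (sequence 1 covers records 1-10, sequence 2 covers records 11-20, and so on) can be sketched as a small pure function. The name recordRangeForSequence and the default batch size of 10 are assumptions for illustration; the returned offset is what a LIMIT/OFFSET style query would skip.

```php
<?php
// Hypothetical helper: map a 1-based sequence number to the record range it should crawl.
function recordRangeForSequence(int $sequence, int $batchSize = 10): array
{
    if ($sequence < 1 || $batchSize < 1) {
        throw new InvalidArgumentException('sequence and batchSize must be >= 1');
    }

    $offset = ($sequence - 1) * $batchSize; // number of records to skip

    return [
        'offset' => $offset,
        'first'  => $offset + 1,          // first record number in this batch
        'last'   => $offset + $batchSize, // last record number in this batch
    ];
}

print_r(recordRangeForSequence(2));
```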
Better to use a queue for job processing:
On cron, add all jobs to the queue.
Run multiple queue workers, which will process the jobs in parallel.
Tip: this happened to us.
It might happen that a previously added job is not complete yet, and cron adds the same task to the queue again, because a single queue works sequentially. To protect yourself from this situation, you should record in the database when a task was last completed, so you know when to execute the job (if it was seriously delayed).
I found this on the documentation, I hope this is what you're looking for:
Retrieving Input
While your command is executing, you will obviously need to access the
values for the arguments and options accepted by your application. To
do so, you may use the argument and option methods:
Retrieving The Value Of A Command Argument
$value = $this->argument('name');
Retrieving All Arguments
$arguments = $this->argument();
Retrieving The Value Of A Command Option
$value = $this->option('name');
Retrieving All Options
$options = $this->option();
source
I have to populate and update one of my MySQL database tables using a complex and expensive query, based on a selection from other tables' data. This table doesn't need to be fully up to date every time I query it, but I'd like to have a cyclic update every 5 minutes.
This automatic update should run indefinitely, and I need to be sure that it never stops.
After some research I've found some solutions, but I don't know which is better for security and performance.
One of these could be my goal:
Don't create the table, and run the complex query from PHP every time to get the desired result.
Create a PHP script that repeats cyclically and updates the table, maybe using a cron job.
Update the table using an SQL event.
I think the first solution could be too expensive, since the query is complex and there could be many requests every second, although the result would always be up to date. I don't have experience with cron jobs, so I can't tell whether that would be a good idea. For the third solution, I don't yet have the database privileges to run events, but I'd like to know if it could be a valid solution.
All other solutions are welcome, thanks.
Do not use cron. Think about what will happen if one instance goes beyond 5 minutes and the next starts up. Eventually you will have hundreds of copies bogged down stumbling over each other.
Instead have a single job in a loop doing the update. (OK, you could have a cron job to perform a "keep-alive" task of restarting the query if it dies.)
The job would
CREATE TABLE new ...
INSERT INTO new SELECT complex-stuff...
RENAME TABLE real TO old, new TO real;
DROP TABLE old;
loop.
I would opt for Cron Job.
It doesn't clog any request, since it's executed from the operating system.
You can define which user executes the script (crontab -u apache -e).
Easy to define interval. (i.e. every 5 minutes */5 * * * * php /path/to/script.php).
It's loggable.
Additional Notes
I had a cron job running under root and it worked just fine. My problem was that the project had a private logging mechanism in which each log file was normally created by the apache user. When running the job as root, the file would sometimes be created by root, and after that the scripts executed by apache would not be able to APPEND to the log.
I also had an emailing script, run once every 2 minutes, that got stuck for 1 hour. It turned out that, because of a bug in the application, an invalid email address (somethingwithoutatsign.com) had been inserted into the database, which made the PHPMailer library throw errors. After that, I added a catch block that sends an email to me whenever an exception is thrown. Now, if the script stops running because of bad execution, I get to know right away.
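The overlap concern raised in the other answer (one run going beyond 5 minutes while the next starts up) can be mitigated with a lock file. This is a minimal sketch using PHP's flock(); the helper names and the lock file path are assumptions for illustration, and the behaviour of flock() can differ between operating systems and file systems.

```php
<?php
// Try to take an exclusive, non-blocking lock.
// Returns the open handle on success, or null if the lock could not be taken.
function acquireRunLock(string $lockFile)
{
    $handle = fopen($lockFile, 'c'); // create if missing, never truncate
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle); // another run holds the lock
        return null;
    }
    return $handle;
}

// Release the lock and close the handle.
function releaseRunLock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}

$lockFile = sys_get_temp_dir() . '/update-table.lock'; // hypothetical path
$lock = acquireRunLock($lockFile);

if ($lock === null) {
    echo "Previous run still active, skipping.\n";
} else {
    // ... run the expensive update here ...
    releaseRunLock($lock);
}
```

With this guard at the top of the script, a cron entry that fires every 5 minutes simply skips a run while the previous one is still working, instead of piling up copies.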
Here's what I'm trying to accomplish in high-level pseudocode:
query db for a list of names (~100)
for each name (using php) {
query a 3rd party site for xml based on the name
parse/trim the data received
update my db with this data
Wait 15 seconds (the 3rd party site has restrictions and I can only make 4 queries / minute)
}
So this was running fine. The whole script took ~25 minutes (99% of the time was spent waiting 15 seconds after every iteration). My web host then made a change so that scripts will timeout after 70 seconds (understandable). This completely breaks my script.
I assume I need to use cronjobs or the command line to accomplish this. I only understand the basic use of cronjobs. Any high-level advice on how to split up this work in a cronjob? I am not sure how a cronjob could work through a dynamic list.
cron itself has no idea of your list and what is done already, but you can use two kinds of cron-jobs.
The first cron-job - that runs for example once a day - could add your 100 items to a job queue.
The second cron-job - that runs for example once every minute in a certain period - can check if there are items in the queue, execute one (or a few) and remove it from the queue.
Note that both cron-jobs are just triggers to start a php script in this case and you have two different scripts, one to set the queue and one to process part of a queue so almost everything is still done in php.
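The two scripts described above can be sketched like this. It is a minimal illustration that assumes a JSON file as the queue; the function names and the file format are hypothetical, and a database table would work the same way. Note that this sketch removes a batch from the queue even if the handler fails partway, and it does not guard against two processing runs overlapping.

```php
<?php
// Script 1 (for example once a day): put all names into the queue.
function fillQueue(string $queueFile, array $names): void
{
    file_put_contents($queueFile, json_encode(array_values($names)), LOCK_EX);
}

// Script 2 (for example once a minute): handle up to $maxItems from the front of the queue.
// Returns the number of items handled in this run.
function processQueue(string $queueFile, int $maxItems, callable $handler): int
{
    if (!is_file($queueFile)) {
        return 0;
    }

    $queue = json_decode((string) file_get_contents($queueFile), true) ?: [];
    $batch = array_splice($queue, 0, $maxItems); // takes the batch out of $queue

    foreach ($batch as $name) {
        $handler($name); // e.g. query the third-party site, parse, update the database
    }

    file_put_contents($queueFile, json_encode($queue), LOCK_EX);

    return count($batch);
}
```

How many items fit into one run depends on the host's time limit and on the third-party rate limit, so $maxItems is left as a parameter.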
In short, there is not much that is different. Instead of executing the script via mod_php or FastCGI, you are going to execute it via the command line: php /path/to/script.php.
Because this is a different environment than HTTP, some things obviously don't work: sessions, cookies, GET and POST variables. Output gets sent to stdout instead of the browser.
You can pass arguments to your script by using $argv.
I have a big script written in PHP, which should import a lot of information into a PrestaShop installation using web services. The script is written in "sections": there is a function that imports the categories, another one that imports products, then manufacturers, and so on. There are about 7-10 functions called in the main script. I assume that this script must run for about an hour, passing from one function to the next until it reaches the last function, then return some values and stop until the next night.
I would like to understand whether it would be better to:
1) impose a time limit of 30 minutes every time I enter a new function (this will prevent the timeout)
2) make a chain of pages, each one with a single function call (and of course the time limit)
or any other idea. I would like to:
know if a function has been called (maybe using a global variable?)
be sure that the server will execute the functions in order (hence the chain of pages)
I hope I have been clear; otherwise I'll update the question.
edits:
The script is executed by another server that calls a page. The other server is unknown to me, so I only know that this page is called (they could also call the function by going to the page), but I have no control over it.
For any long running scripts, I would run it through the commandline, probably with a cronjob to kick it off. If it's triggered from the outside, I would create a job queue (for example in the database) where you insert a new row to signify that it should run, along with any variable input params. Then the background job would run - say - every 5 minutes, check if there's a new job in the queue. If there's not, just exit. If there is, mark that it has begun work and start processing. When done, mark that it's done.
1 hour of work is a looooooooong time though. Nothing you can do to optimise that?
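The "mark that it has begun, mark that it's done" idea also covers the question of running the import functions in order and knowing which ones have already been called. Below is a minimal sketch that assumes a JSON state file and a per-run time budget; the function name, the file format, and the step names are hypothetical.

```php
<?php
// Run the steps in order, skipping those already recorded as done in the state file.
// Stops starting new steps once the time budget is used up, so the next run can resume.
// Returns the list of step names completed so far.
function runPendingSteps(array $steps, string $stateFile, int $budgetSeconds): array
{
    $done = is_file($stateFile)
        ? (json_decode((string) file_get_contents($stateFile), true) ?: [])
        : [];
    $start = time();

    foreach ($steps as $name => $step) {
        if (in_array($name, $done, true)) {
            continue; // completed in an earlier run
        }
        if (time() - $start >= $budgetSeconds) {
            break; // out of time; the next run picks up here
        }

        $step();
        $done[] = $name;
        file_put_contents($stateFile, json_encode($done), LOCK_EX); // checkpoint after each step
    }

    return $done;
}
```

A background job triggered every few minutes could call this with the import functions as steps, for example 'categories', 'products', 'manufacturers', and delete the state file once all of them are done so the next night starts fresh.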
You can increase the execution time limit of a script as much as you want using:
set_time_limit(seconds);
Long-running scripts may also need more memory. You can increase the memory limit using:
ini_set('memory_limit','20M');
The other thing you have to make sure of is that you are running your script on a dedicated server, because on a shared server the host will automatically kill long-running scripts.