I have a number of scripts that run many times every day. Some run every 5 minutes, others every 10, 15, 60 minutes etc. Most of the scripts are dealing with large amounts of data, where they loop through and process each row.
My host's cron does not allow me to decide when each script starts; I can only choose the number of minutes apart. So the scripts that run every 5 minutes run at 00,05,10,15...
and the scripts that run every 10 minutes run at 00,10,20,30...
etc.
That means a lot of scripts start running at 00.
That wouldn't really be a problem except for the fact that the server is also used by regular users that often only need to do something that shouldn't take a lot of time. But if they happen to be doing it when a lot of those scripts are running, everything slows down and they are not able to do even those simple things without waiting for a long time.
I read somewhere that the sleep function would stop the CPU from processing data and allow smaller tasks to run. Do you know if that is the case? If I were to insert a sleep(1) after each loop in a lot of these scripts, would that improve the performance that is seen by individual users doing small tasks, i.e. allow them to skip the line?
Example
Script A that runs every 5 minutes
foreach ($a_hundred_things as $one_thing) {
    do_this($one_thing);
    // would adding sleep(1); here improve performance?
}
Script B that runs when a user clicks a button
do_this_thing_that_only_takes_a_minute();
At the moment, if a script like Script A (and many others like it) is running, the user will sometimes have to wait a long time (few minutes) for Script B to run. I'd like B to be able to somehow take priority as that's something that cannot wait, while the automatic scripts can afford to be delayed.
Would adding the sleep(1) in Script A (and all other similar scripts running at the same time) improve the performance?
I know I could try to test it, but there are a lot of scripts involved and this doesn't seem to happen every time.
First and foremost, usleep() lets you specify how many microseconds you want to sleep for, and is the superior function.
The following are kludges, but can help soften the blow while you implement a better solution:
Insert a random wait of a few seconds at the beginning of a cron so that they don't all start up at once. This prevents your machine getting completely slammed every 15 minutes while everything is initializing at once.
usleep(mt_rand(0,10000000));
Ghetto throttling. Measure the time it takes to perform one loop iteration, then sleep for that long. You've effectively cut performance in half.
while ($condition) {
    $start = microtime(true);
    // do work
    usleep((microtime(true) - $start) * 1000000);
}
Actual solutions that you should work on (note that these are not mutually exclusive):
Farm off periodic/async tasks to one or more other machines. There's no reason to negatively impact the performance of user-facing infrastructure with work that has no direct bearing on user requests.
Use a queue/worker model for async tasks, and only use cron to periodically submit jobs for processing. This would roughly equate to:
Create a generalized worker process.
Set up a task queue. [I recommend AMQP-compatible queue systems, and personally prefer RabbitMQ]
Spin up one or more worker processes; something like Supervisord makes it simple to keep a small herd of workers running. A minimal worker sketch follows this list.
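For illustration, here is a minimal long-running worker loop of the kind Supervisord would keep alive. fetch_next_job(), run_job(), acknowledge_job() and requeue_job() are hypothetical stand-ins for your queue client (e.g. an AMQP consume/ack cycle) and your own task logic:

while (true) {
    $job = fetch_next_job();      // hypothetical: returns null when the queue is empty

    if ($job === null) {
        sleep(1);                 // idle briefly instead of hammering the queue
        continue;
    }

    try {
        run_job($job);            // do the actual work for this job
        acknowledge_job($job);    // tell the queue the job is done
    } catch (Throwable $e) {
        requeue_job($job);        // or dead-letter it, depending on your setup
        error_log($e->getMessage());
    }
}

The point is that cron only has to enqueue work; how many workers drain the queue, and on which machines, becomes an independent knob.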
Oops! You've implemented both of those? That's called "Horizontal Scaling" and it's how we build modern apps in 2020. Now you can spin up machines and processes on-demand and as-needed, maybe even automatically. Sorry for the convenience. :P
Related
I have a daily cron job which takes about 5 minutes to run (it does some data gathering and then various database updates). It works fine, but the problem is that, during those 5 minutes, the site is completely unresponsive to any requests, HTTP or otherwise.
It would appear that the cron job script takes up all the resources while it runs. I couldn't find anything in the PHP docs to help me out here - how can I make the script know to only use up, say, 50% of available resources? I'd much rather have it run for 10 minutes and have the site available to users during that time, than have it run for 5 minutes and have user complaints about downtime every single day.
I'm sure I could come up with a way to configure the server itself to make this happen, but I would much prefer if there was a built-in approach in PHP to resolving this issue. Is there?
Alternatively, as plan B, we could redirect all user requests to a static downtime page while the script is running (as opposed to what's happening now, which is the page loading indefinitely or eventually timing out).
A normal script can't hog 100% of the resources; resources get split over the processes. It can slow everything down intensely, but not lock up all resources (without doing some funky stuff). You can get a hint by running top on the command line and seeing which process takes up a lot.
That leads to the conclusion that something locks all further processes. As Arkascha comments, there is a fair chance that your database gets locked. This answer explains which table type you should use; if you do not have it set to InnoDB, you probably want that, at least for the tables that get locked.
It could also be disk I/O if you write huge files. Try to split the work into smaller reads/writes, or move some of the info (e.g. if the files contain lists) into your database (assuming it has room to spare).
It could also be CPU. To fix that, you need to make your code more efficient. Recheck your code, see where you do heavy operations, and try to make those smaller. Normally you want your code as fast as possible; now you want it as lightweight as possible, which changes the way you write code.
If it still locks up, it's time to debug. Turn off a large part of your code and check if the locking still happens. Keep turning code back on until you notice the locking again, then fix that part. Try to figure out what is costing you so much. Only a few scripts require intense resources; now is the time to optimize. One option might be splitting the work into two (or more) steps: run one cron that prepares/sanitizes the data and another that processes it. They don't have to run synchronously; there can be a few minutes between them.
If that is not an option, benchmark your code and improve as much as you can. If you have a heavy query, it might improve by selecting only the IDs in the heavy query and using a second query just to fetch the data. If you can, use your database to filter, sort and manage data; don't do that in PHP.
What I have also implemented once is a sleep every N actions.
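As a rough sketch of that idea ($rows and process_row() are placeholders for your own data and work):

$n = 500;   // pause after every $n rows; tune to taste
$i = 0;

foreach ($rows as $row) {
    process_row($row);

    if ((++$i % $n) === 0) {
        sleep(1); // yield the CPU for a moment so other requests can get through
    }
}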
If your script really is that extreme, another solution could be moving it to a time when little/no visitors are on your site. Even if you remove the bottleneck, nobody likes a slow website.
And there is always the option of increasing your hardware.
You don't mention which resources are your bottleneck; CPU, memory or disk I/O.
However, if it is CPU or memory, you can do something like this in your script:
http://php.net/manual/en/function.sys-getloadavg.php
http://php.net/manual/en/function.memory-get-usage.php
$yourlimit = 100000000;
$load = sys_getloadavg();
if ($load[0] > 0.80 || memory_get_usage() > $yourlimit) {
    sleep(5);
}
Another thing to try would be to set your process priority in your script.
Raising the priority requires superuser rights, but lowering it with a positive increment (as here) does not, so this should be fine for a cron job.
http://php.net/manual/en/function.proc-nice.php
proc_nice(50);
I did a quick test for both and it works like a charm. Thanks for asking; I have a cron job like that as well and will implement it. It looks like proc_nice alone will do fine.
My test code:
proc_nice(50);
$yourlimit = 100000000;
$x = 0;
while (1) {
    $x = $x + 1;
    $load = sys_getloadavg();
    if ($load[0] > 0.80 || memory_get_usage() > $yourlimit) {
        sleep(5);
    }
    echo $x."\n";
}
It really depends on your environment.
If you are on a Unix-based system, there are built-in tools to limit the CPU usage and priority of a given process.
You can limit the whole server or PHP alone, which is probably not what you are looking for.
What you can do first is separate your task into its own process.
There is popen() for that, but I found it much easier to make the process a standalone command-line script. Let's name it hugetask for the example.
#!/usr/bin/php
<?php
// Huge task here
Then to call from the command line (or cron):
nice -n 15 ./hugetask
This will limit the scheduling. It means it will lower the priority of the task relative to others; the system will do the job.
You can as well call it from your php directly:
exec("nice -n 15 ./hugetask &");
Usage: nice [OPTION] [COMMAND [ARG]...]
Run COMMAND with an adjusted niceness, which affects process scheduling. With no COMMAND, print the current niceness. Niceness values range from -20 (most favorable to the process) to 19 (least favorable to the process).
To create a cpu limit, see the tool cpulimit which has more options.
That said, usually I just put some usleep() calls in my scripts to slow them down and avoid creating a funnel of data. This is fine if you are using loops in your script. If you slow your task down so it runs in, say, 30 minutes, there won't be many issues.
See also proc_nice http://php.net/manual/en/function.proc-nice.php
proc_nice() changes the priority of the current process by the amount
specified in increment. A positive increment will lower the priority
of the current process, whereas a negative increment will raise the
priority.
And sys_getloadavg() can also help. It returns an array of the system load over the last 1, 5, and 15 minutes.
It can be used as a test condition before launching the huge task.
Or you can log the average to find the best time of day to launch the huge task. The results can be surprising!
print_r(sys_getloadavg());
http://php.net/manual/en/function.sys-getloadavg.php
You could try to delay execution using sleep. Just cause your script to pause between several updates of your database.
sleep(60); // stop execution for 60 seconds
Although this depends a lot on the kind of processing your script does, so it may or may not be helpful in your case. It's worth a try, so you could:
Split your queries
do the updates in steps with a sleep in between (see the sketch after this list)
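A minimal sketch of that batched-updates idea, assuming a PDO connection in $pdo and a hypothetical items table with a processed flag (adapt the query to your own schema):

$batchSize = 1000;

do {
    // update a limited slice, then pause so user queries can run in the gap
    $updated = $pdo->exec(
        "UPDATE items SET processed = 1 WHERE processed = 0 LIMIT $batchSize"
    );
    sleep(2);
} while ($updated === $batchSize);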
References
Using sleep for cron process
I could not describe it better than the quote in the above answer:
Maybe you're walking the database of 9,000,000 book titles and updating about 10% of them. That process has to run in the middle of the day, but there are so many updates to be done that running your batch program drags the database server down to a crawl for other users.
So modify the batch process to submit, say, 1000 updates, then sleep for 5 seconds to give the database server a chance to finish processing any requests from other users that have backed up.
Sleep and server resources
sleep resources depend on OS
adding sleep to alleviate server resources
To minimize your memory usage, you should probably process heavy and lengthy operations in batches. If you query the database using an ORM like Doctrine, you can easily use its existing batch-processing functions:
http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/batch-processing.html
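The pattern documented on that page boils down to flushing and clearing the EntityManager every few rows so memory stays flat. A sketch, assuming a reasonably recent Doctrine ORM where toIterable() exists (older versions use iterate()); the User entity and increaseCredit() method are made-up examples:

$batchSize = 20;
$i = 1;

$query = $entityManager->createQuery('SELECT u FROM App\Entity\User u');

foreach ($query->toIterable() as $user) {
    $user->increaseCredit(5);        // hypothetical domain logic

    if (($i % $batchSize) === 0) {
        $entityManager->flush();     // write pending changes
        $entityManager->clear();     // detach managed objects to free memory
    }
    ++$i;
}

$entityManager->flush();             // flush the final partial batch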
It's hard to tell what exactly the issue may be without having a look at your code (the cron script). But to confirm that the issue is caused by the cron job, you can run the script manually and check the website's responsiveness. If you notice the site being down while the cron job runs, we would have to look at your script in order to come up with a solution.
Many loops in your cron script might consume a lot of CPU resources.
To prevent that and reduce CPU usage simply put some delays in your script, for example:
while ($long_time_condition) {
    // Do something here
    usleep(100000);
}
Basically, you are giving the processor some time to do something else.
Also you can use the proc_nice() function to change the process priority. For example proc_nice(20);//very low priority. Look at this question.
If you want to find the bottlenecks in your code you can try to use Xdebug profiler.
Just set it up in your dev environment, start the cron manually and then profile any page. You can also profile the cron script itself with php -d xdebug.profiler_enable=On script.php; look at this question.
If you suspect that the database is your bottleneck, then import a pretty large dataset (or the entire database) into your local database, repeat the steps, and log and inspect all the queries.
Alternatively, if it is possible, set up Xdebug on a staging server that is as close as possible to production and profile the page during cron execution.
Reason
I've been building a system that pulls data from multiple JSON sources. The data being pulled is constantly changing and I'm recording what the changes are to a SQL database via a PHP script. 9 times out of 10 the data is different and therefore needs recording.
The JSON needs to be checked every single second. I've been successfully using a cron task every minute with a PHP function that loops 60 times over.
The problem I'm now having is that the more JSON sources I want to check, the slower the PHP file runs, meaning the next cron gets triggered before the previous has finished. It's all starting to feel way too unstable and hacky.
Question
Assuming the PHP script is already the most efficient it can be, what else can be done?
Should I be using multiple cron tasks?
Should something else other then PHP be used?
Are cron tasks even suitable for this sort of problem?
Any experience, best practices or just plain old help will be very much appreciated.
Overview
I'm monitoring for active race sessions and recording each driver and then each lap a driver completes. Laps are recorded only once a driver crosses the start/finish line and I do not know when race sessions may or may not be active or when a driver crosses the line. Therefore I have been checking every second for new data to record.
Each venue where a race session may be active has a separate URL to receive JSON data from. The more venues I add to my system to monitor, the slower the script runs.
I currently have 19 venues and the script takes circa 12 seconds to complete. Since I'm running a cron job every minute and looping the script every second, I'm assuming I have at the very least 12 scripts running every second. It just doesn't seem like the most efficient way to do it. Of course, it worked a charm back when I was only checking a single venue.
There's a cycle to your operations. It is (a code sketch follows the steps):
start your process by reading the time with $starttime = time();.
compute the next scheduled time by taking the time plus 60 seconds. $nexttime = $starttime + 60;
do the operations you must do (read a mess of json feeds)
compute how long is left in the minute $timeleft = $nexttime - time();.
sleep until the next scheduled time if ($timeleft > 0) sleep ($timeleft);
set $starttime = $nexttime.
jump back to step 2.
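Put together as a sketch; read_all_feeds() is a hypothetical stand-in for the JSON-fetching work:

$starttime = time();

while (true) {
    $nexttime = $starttime + 60;        // step 2: next scheduled time

    read_all_feeds();                   // step 3: do the work for this minute

    $timeleft = $nexttime - time();     // step 4: how much of the minute is left
    if ($timeleft > 0) {
        sleep($timeleft);               // step 5: wait out the rest of the minute
    } else {
        error_log('behind schedule by ' . (-$timeleft) . ' seconds');
    }

    $starttime = $nexttime;             // step 6, then loop back to step 2
}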
Obviously, if $timeleft is ever negative, you're not keeping up with your measurements. If $timeleft is always negative, you will get further and further behind.
The use of cron every minute is probably wasteful, because it takes resources to fire up a new process and get it going. You probably want to make your process run forever, and use a shell script that monitors it and restarts it if it crashes.
This is all pretty obvious. What's not so obvious is that you should keep track of your individual $timeleft values for each minute over your cycle of measurements. If they vary daily, you should track for a whole day. If they vary weekly you should track for a week.
Then you should look at the worst (smallest) values of $timeleft. If your 95th percentile is less than about 15 seconds, you're running out of resources and you need to take action. You need a margin like 15 seconds so your system doesn't move into overload.
If your system has zero tolerance for late sampling of data, you should look at the single worst value of $timeleft, not the 95th percentile. You should give yourself a more generous margin than 15 seconds.
So-called hard real time systems allocate a time slot to each operation, and crash if the operation exceeds the time slot. In your case the time slot is 60 seconds and the operation is reading a certain number of feeds. Crashing is pretty drastic, but measuring is mandatory.
The simplest action to take is to start running multiple worker processes. Give some of your feeds to each process. PHP runs single-threaded, so multiple processes probably will help, at least until you get to three or four of them.
Then you will need to add another computer, and divide your feeds among worker processes on those multiple computers.
A language environment that parses JSON faster than php does might help, but only if the time it takes to parse the JSON is more important than the time it takes to wait for it to arrive.
I'm using the DigitalOcean API to create droplets when my web application needs the extra resources. Because of the way DigitalOcean charges (min. one hour increments), I'd like to keep a created server on for an hour so it's available for new tasks after the initial task is completed.
I'm thinking about formatting my script this way:
<?php
createDroplet($dropletid);
$time = time();
// run resource heavy task
sleep($time + 3599);
deleteDroplet($dropletid);
Is this the best way to achieve this?
It doesn't look like a good idea, but the code is so simple, nothing can compete with that. You would need to make sure your script can run, at least, for that long.
Note that sleep() should not have $time as an argument. It sleeps for the given number of seconds. $time contains many, many seconds.
I am worried that the script might get interrupted, and you will never delete the droplet. Not much you can do about that, given this script.
Also, the sleep() itself might get interrupted, causing it to sleep much shorter than you want. A better sleep would be:
$remainingTime = 3590;
do {
    $remainingTime = sleep($remainingTime);
} while ($remainingTime > 0);
This will catch an interrupt of sleep(). This is under the assumption that FALSE is -1. See the manual on sleep(): http://php.net/manual/en/function.sleep.php
Then there's the problem that you want to sleep for exactly 3599 seconds, so that you're only charged one hour. I wouldn't make it so close to one hour. You have to leave some time for DigitalOcean to execute stuff and log the time. I would start with 3590 seconds and make sure that always works.
Finally: What are the alternatives? Clearly this could be a cron job. How would that work? Suppose you execute a PHP script every minute, and you have a database entry that tells you which resource to allocate at a certain start time and deallocate at a certain expire time. Then that script could do this for you with an accuracy of about a minute, which should be enough. Even if the server crashes and restarts, as long as the database is intact and the script runs again, everything should go as planned. I know, this is far more work to implement, but it is the better way to do it.
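A rough sketch of the deallocation half of that cron script, assuming a PDO connection in $pdo, a hypothetical droplets table with droplet_id/expires_at/deleted columns, and the deleteDroplet() wrapper from the question:

$stmt = $pdo->query(
    "SELECT droplet_id FROM droplets WHERE expires_at <= NOW() AND deleted = 0"
);

foreach ($stmt->fetchAll(PDO::FETCH_COLUMN) as $dropletId) {
    deleteDroplet($dropletId);   // DigitalOcean API call from the question
    $pdo->prepare("UPDATE droplets SET deleted = 1 WHERE droplet_id = ?")
        ->execute([$dropletId]);
}

If the deletion fails, the row stays undeleted and the next run retries it, which is exactly the crash-resilience described above.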
I'm currently working on a browser game with a PHP backend that needs to perform certain checks at specific, changing points in the future. Cron jobs don't really cut it for me as I need precision at the level of seconds. Here's some background information:
The game is multiplayer and turn-based
On creation of a game room the game creator can specify the maximum amount of time taken per action (30 seconds - 24 hours)
Once a player performs an action, they should only have the specified amount of time to perform the next, or the turn goes to the player next in line.
For obvious reasons I can't just keep track of time through Javascript, as this would be far too easy to manipulate. I also can't schedule a cron job every minute as it may be up to 30 seconds late.
What would be the most efficient way to tackle this problem? I can't imagine querying a database every second would be very server-friendly, but it is the direction I am currently leaning towards[1].
Any help or feedback would be much appreciated!
[1]:
A user makes a move
A PHP function is called that sets 'switchTurnTime' in the MySQL table's game row to 'TIMESTAMP'
A PHP script that is always running in the background queries the table for any games where the 'switchTurnTime' has passed, switches the turn and resets the time.
You can always use a queue or daemon. This only works if you have shell access to the server.
https://stackoverflow.com/a/858924/890975
Every time you need an action to occur at a specific time, add it to a queue with a delay. I've used beanstalkd with varying levels of success.
You have lots of options this way. Here's two examples with 6 second intervals:
Use a cron job every minute to add 10 jobs, each with a delay of 6 seconds
Write a simple PHP script that runs in the background (daemon) and adds a new job to the queue every 6 seconds (a sketch of the first option follows)
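As a sketch of the first option: the once-a-minute cron entry point enqueues ten delayed jobs, one per 6-second slot of the coming minute. queue_put($payload, $delaySeconds) is a hypothetical wrapper around your queue client; with beanstalkd/Pheanstalk it would map to a put() call that accepts a delay in seconds.

for ($slot = 0; $slot < 10; $slot++) {
    // one job per 6-second slot of the coming minute
    queue_put(json_encode(['task' => 'check_turn_timeouts']), $slot * 6);
}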
I'm going with the following approach for now, since it seems to be the easiest to implement and test, as well as deploy on different kinds of servers/ hosting, while still acting reliably.
Set up a cron job to run a PHP script every minute.
Within that script, first do a query to find candidates that will have their endtime within this minute.
Start a while-loop that runs until 59 seconds have passed.
Inside this loop, check the remaining time for each candidate.
If the time limit has passed, do another query on that specific candidate to ensure the end time hasn't changed.
If it has, re-add it to the candidates queue as necessary. If not, act accordingly (in my case: switch the turn to the next player). A rough sketch of these steps follows.
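A rough sketch of those steps; get_candidates_ending_this_minute(), still_expired() and switch_turn() are hypothetical helpers around your own queries and game logic:

$deadline   = time() + 59;                      // run for at most 59 seconds
$candidates = get_candidates_ending_this_minute();

while (time() < $deadline && !empty($candidates)) {
    foreach ($candidates as $key => $game) {
        if ($game['switchTurnTime'] <= time()) {
            // re-check the row in case the end time changed since the first query
            if (still_expired($game['id'])) {
                switch_turn($game['id']);
            }
            unset($candidates[$key]);
        }
    }
    usleep(250000); // check four times a second instead of spinning
}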
Hope this will help somebody in the future, cheers!
I need some implementation advice. I have a MySQL DB that will be written to remotely with tasks to process locally, and I need my application, which is written in PHP, to execute these tasks immediately as they come in.
But of course my PHP app needs to be told when to run. I thought about using cron jobs, but my app is on a Windows machine. Secondly, I need to be checking constantly, every few seconds, and cron can only do every minute.
I thought of writing a PHP daemon, but I am getting confused about how it's going to work and whether it's even a good idea!
I would appreciate any advice on the best way to do this.
pyCron is a good CRON alternative for Windows:
Since this task is quite simple I would just set up pyCron to run the following script every minute:
set_time_limit(60); // one minute, same as CRON ;)
ignore_user_abort(false); // you might wanna set this to true
while (true)
{
$jobs = getPendingJobs();
if ((is_array($jobs) === true) && (count($jobs) > 0))
{
foreach ($jobs as $job)
{
if (executeJob($job) === true)
{
markCompleted($job);
}
}
}
sleep(1); // avoid eating unnecessary CPU cycles
}
This way, if the computer goes down, you'll have a worst case delay of 60 seconds.
You might also want to look into semaphores or some kind of locking strategy like using an APC variable or checking for the existence of a locking file to avoid race conditions, using APC for example:
set_time_limit(60); // one minute, same as CRON ;)
ignore_user_abort(false); // you might wanna set this to true
if (apc_exists('lock') === false) // not locked
{
apc_add('lock', true, 60); // lock with a ttl of 60 secs, same as set_time_limit
while (true)
{
$jobs = getPendingJobs();
if ((is_array($jobs) === true) && (count($jobs) > 0))
{
foreach ($jobs as $job)
{
if (executeJob($job) === true)
{
markCompleted($job);
}
}
}
sleep(1); // avoid eating unnecessary CPU cycles
}
}
If you're sticking with the PHP daemon do yourself a favor and drop that idea, use Gearman instead.
EDIT: I asked a related question once that might interest you: Anatomy of a Distributed System in PHP.
I'll suggest something out of the ordinary: you said you need to run the task at the point the data is written to MySQL. That implies MySQL "knows" something should be executed.
It sounds like perfect scenario for MySQL's UDF sys_exec.
Basically, it would be nice if MySQL could invoke an external program once something happened to it.
If you use the mentioned UDF, you can execute a php script from within - let's say, INSERT or UPDATE trigger.
On the other hand, you can make it more resource-friendly and create MySQL Event (assuming you're using appropriate version) that would use sys_exec to invoke a PHP script that does certain updates at predefined intervals - that reduces the need for Cron or any similar program that can execute something at predefined intervals.
I would definitely not advise using cron jobs for this.
Cron jobs are a good thing and very useful and easy for many purposes, but as you describe your needs, I think they can produce more complications than good. Here are some things to consider:
What happens if jobs overlap, or one takes longer than a minute to execute? Are there any shared resources, deadlocks or temp files? The most common method is to use a lock file and stop execution if it is occupied right at the start of the program; but the program also has to look for further jobs right before it completes. This can also get complicated on Windows machines because, AFAIK, they don't support write locks out of the box.
Cron jobs are a pain in the ass to maintain. If you want to monitor them, you have to implement additional logic, like a check for when the program last ran. This can get difficult if your program should run only on demand. The best way would be some sort of "job completed" field in the database, or deleting rows that have been processed.
On most Unix-based systems cron jobs are pretty stable now, but there are a lot of situations where you can break your cron setup. Most of them are based on human error. For example, a sysadmin not exiting the crontab editor properly in edit mode can cause all cron jobs to be deleted. A lot of companies also have no proper monitoring system, for the reasons stated above, and only notice once their services experience problems. At that point often nobody has written down or put under version control which cron jobs should run, and wild guessing and reconstruction work begins.
Cron job maintenance can be further complicated when external tools are used and the environment is not a native Unix system. Sysadmins have to gain knowledge of more programs, and those bring their own potential errors.
I honestly think just a small script that you start from the console and leave open is just fine.
<?php
while (true) {
    $job = fetch_from_db();
    if (!$job) {
        sleep(10);
    } else {
        $job->process();
    }
}
You can also touch a file (update its modification timestamp) in every loop iteration, and write a Nagios script that checks whether that timestamp is getting out of date, so you know that your job is still running... (a sketch of such a check follows)
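For illustration, a monitoring-side sketch. The worker simply calls touch('/var/run/myworker.heartbeat') at the end of each loop iteration; the path, threshold and exit codes below are assumptions in the usual Nagios style:

$heartbeat = '/var/run/myworker.heartbeat';
$maxAge    = 60; // seconds before the heartbeat counts as stale

if (!file_exists($heartbeat) || (time() - filemtime($heartbeat)) > $maxAge) {
    echo "CRITICAL: worker heartbeat is stale\n";
    exit(2); // Nagios treats exit code 2 as CRITICAL
}

echo "OK: worker heartbeat is fresh\n";
exit(0);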
If you want it to start up with the system, I recommend a daemon.
PS: in the company I work for there is a lot of background activity for our website (crawling, update processes, calculations, etc.), and the cron jobs were a real mess when I started there. They were spread over different servers responsible for different tasks. Databases were accessed wildly across the internet. A ton of NFS filesystems, Samba shares, etc. were in place to share resources. The place was full of single points of failure, bottlenecks, and something constantly broke. There were so many technologies involved that it was very difficult to maintain, and when something didn't work it took hours to track down the problem and another hour to figure out what that part was even supposed to do.
Now we have one unified update program that is responsible for literally everything. It runs on several servers, and they have a config file that defines the jobs to run. Everything gets dispatched from one parent process running an infinite loop. It's easy to monitor, customize and synchronize, and everything runs smoothly. It is redundant, it is synchronized, and the granularity is fine, so it runs in parallel and we can scale up to as many servers as we like.
I really suggest sitting down for enough time to think about everything as a whole and get a picture of the complete system, then investing the time and effort to implement a solution that will serve you well in the future and doesn't spread tons of different programs throughout your system.
PPS:
I read a lot about the minimum interval of 1/5 minutes for cron jobs/tasks. You can easily work around that with a small wrapper script that subdivides that interval:
// run every 5 minutes = 300 secs
// desired interval: 30 secs
$runs = 300 / 30; // be aware that the parent interval needs to be a multiple of the desired interval
for ($i = 0; $i < $runs; $i++) {
    $start = time();
    system('myscript.php');
    sleep(300 / $runs - (time() - $start)); // compensate for the time the script needed to run; be aware that you have to implement some logic for cases where the script takes longer than your interval - technique and problem described above
}
This looks like a job for a job server ;) Have a look at Gearman. The additional benefit of this approach is that it is triggered by the remote side when, and only when, there is something to do, instead of polling. Especially at intervals smaller than (let's say) 5 minutes, polling is not very effective any more, depending on the tasks the job performs.
The quick and dirty way is to create a loop that continuously checks if there is new work.
Pseudo-code
ini_set("max_execution_time", "3600000000");
$keeplooping = true;
while ($keeplooping) {
    if (check_for_work()) {
        process_work();
    } else {
        sleep(5);
    }
    // some way to change $keeplooping to false
    // you don't want to just kill the process, because it might still be doing something
}
Have you tried the Windows Task Scheduler (it comes with Windows by default)? For this you will need to provide the path to php and the path to your PHP file. It works well.
Can't you just write a Java/C++ program that will do the querying for you at a set time interval? You can include it in the list of startup programs so it's always running as well. Once a task is found, it can handle it on a separate thread, and even process more requests and mark others complete.
The simplest way is to use the built-in Windows scheduler.
Run your script with php-cli.exe, with a php.ini filled in with the extensions your script needs.
But I should say that in practice you don't need such a short time interval to run your scheduled jobs. Just run some tests to get the best interval value for yours. It is not recommended to set a time interval of less than one minute.
And another little piece of advice: create a lock file at the beginning of your script (see PHP's flock function) and check whether you can lock it, to prevent two or more copies from running at the same time; at the end of your script, unlink it after unlocking. A sketch of this follows.
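A minimal sketch of that lock-file advice using flock(); the lock path is an arbitrary example:

$lockFile = __DIR__ . '/myjob.lock';
$fp = fopen($lockFile, 'c');              // create if missing, don't truncate

if ($fp === false || !flock($fp, LOCK_EX | LOCK_NB)) {
    exit("Another copy is already running\n");
}

// ... do the scheduled work here ...

flock($fp, LOCK_UN);                      // unlock
fclose($fp);
unlink($lockFile);                        // clean up, as suggested above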
If you have to write the output/result to the DB, try to use MySQL TRIGGERS instead of PHP, or use events in MySQL.