I have one cronjob that runs every 60 minutes, but recently it has been running slowly.
Env: CentOS 5 + Apache 2 + MySQL 5.5 + PHP 5.3.3 / RAID 10 with 10k RPM HDDs / 16 GB RAM / 4 Xeon processors
Here's what the cronjob does:
1. Parse the last 60 minutes of data:
a) one process parses user agents and saves the data to the database
b) one process parses impressions/clicks on the website and saves them to the database
2. From the data in step 1:
a) build a small report and send emails to the administrators/business
b) save the report into a daily table (available in the admin section)
I now see 8 processes (of the same file) when I run ps auxf | grep process_stats_hourly.php (a command I found on Stack Overflow).
Technically I should only have 1, not 8.
Is there any tool in CentOS, or anything I can do, to make sure my cronjob runs every hour without overlapping the next run?
Thanks
Your hardware seems to be good enough to process this.
1) Check if you already have hanging processes. Using ps auxf (see tcurvelo's answer), check whether one or more processes are taking up too many resources. Maybe you don't have enough resources left to run your cronjob.
2) Check your network connections:
If your database and your cronjob are on different servers, check the response time between the two machines. Maybe you have network issues that make the cronjob wait for packets to come back.
You can use: Netcat, Iperf, mtr or ttcp
3) Server configuration
Is your server configured correctly? Are your OS and MySQL set up correctly? I would recommend reading these articles:
http://www3.wiredgorilla.com/content/view/220/53/
http://www.vr.org/knowledgebase/1002/Optimize-and-disable-default-CentOS-services.html
http://dev.mysql.com/doc/refman/5.1/en/starting-server.html
http://www.linux-mag.com/id/7473/
4) Check your database:
Make sure your database has the correct indexes and that your queries are optimized. Read up on the EXPLAIN command (a quick sketch follows the links below).
If a query over a few hundred thousand records takes a long time to execute, it will slow down the rest of your cronjob; if that query runs inside a loop, it's even worse.
Read these articles:
http://dev.mysql.com/doc/refman/5.0/en/optimization.html
http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
http://blog.fedecarg.com/2008/06/12/10-great-articles-for-optimizing-mysql-queries/
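As a quick illustration of using EXPLAIN from the cronjob's own code, here is a minimal sketch. The connection details, the stats_hourly table and the query are assumptions for the example, not your actual schema:

<?php
// Hedged sketch: credentials and table/column names are placeholders.
$db = new mysqli('localhost', 'user', 'password', 'statsdb');

$sql = "SELECT user_agent, COUNT(*) AS hits
        FROM stats_hourly
        WHERE created_at >= NOW() - INTERVAL 1 HOUR
        GROUP BY user_agent";

// EXPLAIN shows which index (if any) MySQL will use and how many rows it scans.
if ($result = $db->query('EXPLAIN ' . $sql)) {
    while ($row = $result->fetch_assoc()) {
        print_r($row);   // look at the "key" and "rows" columns
    }
}

If "key" is NULL or "rows" is a large number, the query is a good candidate for a new index or a rewrite.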
5) Trace and optimize your PHP code
Make sure your PHP code runs as fast as possible.
Read these articles:
http://phplens.com/lens/php-book/optimizing-debugging-php.php
http://code.google.com/speed/articles/optimizing-php.html
http://ilia.ws/archives/12-PHP-Optimization-Tricks.html
A good technique to validate your cronjob is to trace your cronjob script:
Based on your cronjob process, add some debug traces, including how much memory and how much time each step took to execute. For example:
<?php
echo "\n-------------- DEBUG --------------\n";
echo "memory (start): " . memory_get_usage(TRUE) . "\n";

$start = microtime(TRUE);
// ... some process ...
$end = microtime(TRUE);

echo "\n-------------- DEBUG --------------\n";
echo "memory after some process: " . memory_get_usage(TRUE) . "\n";
echo "executed time: " . ($end - $start) . "\n";
By doing that you can easily find out how much memory each step uses and how long it takes to execute.
6) External servers/web service calls
Does your cronjob call external servers or web services? If so, make sure those calls return as fast as possible. If you request data from a third-party server and that server takes a few seconds to answer, it will affect the speed of your cronjob, especially if those calls happen inside loops.
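For instance, a minimal sketch of timing such a call and capping how long it can block, assuming the cronjob uses cURL (the URL and timeout values are placeholders):

<?php
// Hedged sketch: measure the external call and bound its duration.
$start = microtime(true);

$ch = curl_init('http://api.example.com/data');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);   // give up connecting after 5 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 10);         // give up on the whole request after 10 seconds
$response = curl_exec($ch);
curl_close($ch);

echo "external call took: " . (microtime(true) - $start) . "s\n";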
Try that and let me know what you find.
The output of ps also shows when each process started (see the STARTED column).
$ ps auxf
USER  PID %CPU %MEM  VSZ  RSS TTY STAT STARTED TIME COMMAND
root    2  0.0  0.0    0    0 ?   S    18:55   0:00 [kthreadd]
                                       ^^^^^^^
(...)
Or you can customize the output:
$ ps axfo start,command
STARTED COMMAND
18:55 [kthreadd]
(...)
Thus, you can be sure if they are overlapping.
You should use a lockfile mechanism within your process_stats_hourly.php script. It doesn't have to be anything overly complex: you could have PHP write the PID that started the process to a file like /var/mydir/process_stats_hourly.txt. Then, if it takes longer than an hour to process the stats and cron kicks off another instance of process_stats_hourly.php, the new instance can check whether the lockfile already exists; if it does, it will not run.
However, you are then left with the problem of how to "re-queue" the hourly script if it did find the lock file and couldn't start.
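A minimal sketch of that PID-lockfile idea (the path is the example one from above; the rest is illustrative, not the asker's actual code):

<?php
// Hedged sketch of a PID-based lockfile guarding the hourly run.
$lockFile = '/var/mydir/process_stats_hourly.txt';

if (file_exists($lockFile)) {
    // A previous run is (or was) in progress; bail out instead of overlapping.
    echo "Lockfile present (PID " . trim(file_get_contents($lockFile)) . "), exiting.\n";
    exit(1);
}

file_put_contents($lockFile, getmypid());

// ... parse the last 60 minutes of data, build the reports, etc. ...

unlink($lockFile);   // release the lock when the run finishes normally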
You might use strace -p 1234, where 1234 is the relevant process ID, on one of the processes that is running too long. Perhaps you'll understand why it is so slow, or even blocked.
Is there any tool in Cent OS or something I can do to make sure my cronjob will run every hour and not overlapping the next one?
Yes. CentOS' standard util-linux package provides a command-line convenience for filesystem locking. As Digital Precision suggested, a lockfile is an easy way to synchronize processes.
Try invoking your cronjob as follows:
flock -n /var/tmp/stats.lock process_stats_hourly.php || logger -p cron.err 'Unable to lock stats.lock'
You'll need to edit paths and adjust for $PATH as appropriate. That invocation will attempt to lock stats.lock, spawning your stats script if successful, otherwise giving up and logging the failure.
Alternatively your script could call PHP's flock() itself to achieve the same effect, but the flock(1) utility is already there for you.
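For completeness, a rough sketch of that flock()-in-PHP alternative (the lock file name is just an example):

<?php
// Hedged sketch: hold an exclusive, non-blocking lock for the whole run.
$fp = fopen('/var/tmp/stats.lock', 'c');   // 'c' creates the file if it doesn't exist

if (!$fp || !flock($fp, LOCK_EX | LOCK_NB)) {
    echo "Another instance holds the lock, exiting.\n";
    exit(1);
}

// ... do the hourly stats work here ...

flock($fp, LOCK_UN);
fclose($fp);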
How often is that logfile rotated?
A log-parsing job suddenly taking longer than usual sounds like the log isn't being rotated and is now too big for the parser to handle efficiently.
Try resetting the logfile and see if the job runs faster. If that solves the problem, I recommend logrotate as a means of preventing the problem in the future.
You could add a step to the cronjob that checks the output of the command above:
ps auxf | grep process_stats_hourly.php
Keep looping until the command returns nothing, indicating that the process isn't running, then allow the remaining code to execute.
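A rough sketch of that waiting step as a small PHP wrapper (the [p] bracket trick keeps grep from matching its own command line, otherwise the loop would never end):

<?php
// Hedged sketch: wait until no copy of the stats script is still running.
while (trim((string) shell_exec("ps auxf | grep '[p]rocess_stats_hourly.php'")) !== '') {
    sleep(10);   // check again in 10 seconds
}

// ... now it is safe to start the real work ...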
I have read the other questions on SO with a similar title, but that's not what this question is about. I know HOW to execute a PHP script from another PHP script. The problem is, when I do so, it uses far too much CPU. I would like to know how to reduce this.
I have a simple front-controller-like script called index.php. It processes GET requests from a client and depending on the "action" parameter passed, it sends the request to the appropriate file to handle it. For example, this is a client request:
xhttp.open("GET", serverURL + "?action=doSomething" + "&userID=" + user.ID + "&time=" + lastServerTime, true);
index.php has an array that maps the "action" parameter to the appropriate file:
exec('php ' . $url_map[$action] . ' "' . $parameter1 . '"' . ' "' . $parameter2 . '" 2>&1', $output, $return_value);
For testing purposes, I have created a PHP script that does nothing except measure CPU utilisation and dump it to a log file:
<?php
function varDumpToFile($parameter1) {
    $file = 'log.txt';
    $dump = $parameter1;
    $output = print_r($dump, true);
    file_put_contents($file, $output, FILE_APPEND | LOCK_EX);
}
varDumpToFile(`ps -eo pcpu,pid,user,args --no-headers | sort -t. -nk1,2 -k4,4 -r | head -n 5`);
?>
This produces a log file that looks like this:
9.0 3123052 user /opt/cpanel/ea-php56/root/usr/bin/php cputest.php 10 147424 1537625595
Clearly, a PHP script shouldn't take 9% of CPU to execute. For comparison, I've run the same script by accessing it directly via a GET request:
0.1 3186198 user lsphp:ic_html/dev/php/cputest.php
0.1% is more like it. But why does calling this PHP script from another PHP script use so much CPU? Is it because I have to execute a "new instance" of PHP when I exec PHP, which has a lot of overhead? If so, is there a way to exec a PHP script using an "already running" instance of PHP? Or is there another way of doing this?
I always say "when in doubt, look at the PHP source code". In here, for instance. While doing exec, you have to fork the process, create a new stream, read from the input buffer, etc.
Also, even though PHP code is compiled to opcodes (instructions similar to Java bytecode) rather than to machine code, the newly forked process must run the opcode compiler again to generate those opcodes before executing them. You can read all about it here. In the end you run the compiler twice, once for each process.
Is it worth 9% of your CPU? I have no idea. Maybe. Maybe not. Who knows.
"Better solution"? Upgrade to latest version of PHP. PHP 5.6 is not supported anymore and security updates will cease in 3 months. Even better solution - keep a normal object-oriented and maintainable code without using exec. IMO, it's okay to play around with exec like you are. But if it's your production code, I pray for the souls of those, who would maintain your code after you.
Whichever way you run your application, be it mod_php or FPM, it relies on having worker processes ready to handle your requests. Process management is built in: they do their best to keep as many workers idle as you specify and to reuse them, precisely to avoid this problem of having to fork processes at the least desirable moment.
There isn't only overhead in executing new processes; the execution environment will be completely different too. If you look into your PHP configuration there will be several php.ini files, one for each specific environment. This means one environment could have different modules enabled or a different configuration outright. It's not uncommon for CLI scripts to have max_execution_time or memory_limit set to unlimited. This can affect resource usage on your server, and it's also a pain to maintain.
Also, since your scripts will be running in a brand-new process in a different execution environment, they won't have access to some variables (like $_SERVER or $_POST) or capabilities like sending headers.
And there's this thing called shared memory. As @Alex mentions, scripts have to be compiled. If you have an opcode cache enabled (which you should), the bytecode gets cached when compiled, and this compilation step can be skipped if the resulting bytecode is already there. For this to work you need a persistent running process that keeps this memory around. A newly created process can't access that shared area and has to do the compilation all by itself.
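As a rough illustration of the alternative both answers point toward, the front controller could include the handler in the same process instead of exec'ing a fresh PHP binary. This is only a sketch; $action_map and the handler paths are assumptions based on the question, not the asker's real code:

<?php
// Hedged sketch: dispatch inside the current PHP worker instead of exec().
$action_map = array(
    'doSomething' => 'handlers/do_something.php',   // hypothetical path
);

$action = isset($_GET['action']) ? $_GET['action'] : '';

if (isset($action_map[$action])) {
    // The included file runs in this process, so it reuses the already
    // cached opcodes and still sees $_GET, $_SERVER, headers, etc.
    include $action_map[$action];
} else {
    header('HTTP/1.0 400 Bad Request');
    echo 'Unknown action';
}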
I'm running a PHP script via CLI from a cron job. The script reads about 10000 records from the database and launches 10000 new scripts (via exec) without waiting for the previous one to finish. I do this because I want all those tasks to run fast (each one takes about 10 seconds).
When the number of tasks running gets large, CPU usage reaches 100% and I can't work with the server (CentOS). How can I handle this?
You need to limit the number of scripts running in parallel at any given time because running 10,000 concurrent scripts is clearly saturating your system. Instead, you should queue up each task and process 25 or 50 (whatever causes a reasonable amount of load) tasks at the same time.
Without much knowledge of how these scripts actually work, I can't really give you much advice code-wise, but you definitely need to have a queue in place to limit the number of concurrent instances of your script running at the same time.
Also check out semaphores; they might be useful for this producer/consumer model.
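A minimal sketch of such a bounded pool using proc_open (worker.php, the task list and the limit of 25 are placeholders; tune the limit to whatever load your server tolerates):

<?php
// Hedged sketch: run at most $maxConcurrent workers at any one time.
$taskIds       = range(1, 10000);   // stand-in for the 10000 database records
$maxConcurrent = 25;
$running       = array();

foreach ($taskIds as $id) {
    // Wait for a free slot before launching the next worker.
    while (count($running) >= $maxConcurrent) {
        foreach ($running as $key => $proc) {
            $status = proc_get_status($proc);
            if (!$status['running']) {
                proc_close($proc);
                unset($running[$key]);
            }
        }
        usleep(100000);   // 0.1 s between checks
    }
    $running[] = proc_open('php worker.php ' . escapeshellarg($id), array(), $pipes);
}

// Let the remaining workers finish.
foreach ($running as $proc) {
    do {
        usleep(100000);
        $status = proc_get_status($proc);
    } while ($status['running']);
    proc_close($proc);
}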
I recently wrote a script that handles parallel execution of commands. It basically allows the operator to tune the number of concurrent processes at runtime.
If you feel that CPU usage is too high, just decrease the value in /etc/maxprunning.
#!/bin/bash
CMD="path-of-your-php-script"

# Read the allowed number of concurrent processes (tunable at runtime).
function getMaxPRunning {
    cat /etc/maxprunning
}

function limitProcs {
    # Count how many copies of $CMD are currently running.
    PRUNNING=`ps auxw | grep -v grep | grep "$CMD" | wc -l`
    #echo "Now running $PRUNNING processes, MAX:$MAXPRUNNING"
    [ "$PRUNNING" -ge "$MAXPRUNNING" ] && {
        sleep 1
        return
    }
    #echo "Launching new process"
    sleep 0.2
    $CMD &
}

while [ 1 ]; do
    # Re-read the limit on every iteration so it can be changed on the fly.
    MAXPRUNNING=`getMaxPRunning`
    limitProcs
done
If you want your PHP scripts to ignore the accidental death of their parent, put this line at the top of the PHP script:
pcntl_signal(SIGINT, SIG_IGN);
I have a PHP CLI script that is run by cron every 5 minutes. Because this interval is short, multiple processes end up running at the same time. That's not what I want, since the script writes a numeric id to a text file and increments it each time. When several writers write to this file at the same time, the value written is incorrect.
I have tried to use PHP's flock function to block writing to the file while another process is writing to it, but it doesn't work.
$fw = fopen($path, 'r+');
if (flock($fw, LOCK_EX)) {
    ftruncate($fw, 0);
    fwrite($fw, $latestid);
    fflush($fw);
    flock($fw, LOCK_UN);
}
fclose($fw);
So I suppose the solution is to create a bash script that checks whether an instance of this PHP script is already running; if so, it should wait until it finishes. But I don't know how to do it. Any ideas?
The solution I'm using with a bash script is this:
exec 9>/path/to/lock/file
if ! flock -n 9 ; then
echo "another instance is running";
exit 1
fi
# this now runs under the lock until 9 is closed (it will be closed automatically when the script ends)
A file descriptor (9) is opened on the lock file, and flock makes any new instance of the script exit immediately unless no other instance is currently running.
How can I ensure that only one instance of a script is running at a time (mutual exclusion)?
I don't really understand how incrementing a counter every 5 minutes will result in multiple processes trying to write the counter file at the same time, but...
A much simpler approach is to use a simple locking mechanism similar to the below:
<?php
$lock_filename = 'nobodyshouldincrementthecounterwhenthisfileishere';
if (file_exists($lock_filename)) {
    return;
}
touch($lock_filename);
// your stuff...
unlink($lock_filename);
As a simple approach, this will not deal with the situation where the script dies before it can remove the lock file; in that case the script would never run again until the file is removed manually.
More sophisticated approaches are also possible, as you suggest: e.g. fork the job into its own process, write the PID into a file, and then, before running the job, check whether that PID is still running (see the sketch below).
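A sketch of that PID check (assuming a Linux /proc filesystem; the file path is only an example):

<?php
// Hedged sketch: treat the lock as stale if the recorded PID is no longer alive.
$pidFile = '/tmp/myjob.pid';

if (file_exists($pidFile)) {
    $pid = (int) trim(file_get_contents($pidFile));
    if ($pid > 0 && is_dir('/proc/' . $pid)) {
        exit(0);   // the previous run is still alive, skip this run
    }
    // Otherwise the previous run died without cleaning up: fall through and take over.
}

file_put_contents($pidFile, getmypid());

// ... increment the counter / do the actual work ...

unlink($pidFile);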
To prevent the next run of a program (such as the next cron job) from starting while the previous run is still going, I recommend checking for a running process of that program, either built into your program or as an external check. Just execute, before starting your program:
ps -ef | grep <process_name> | grep -v grep | wc -l
and check whether the result is 0. Only in that case should your program be started.
I suppose you must guarantee the absence of third-party processes with a similar name (for this purpose, give your program a longer, unique name), and the name of your program must not contain the pattern "grep".
This works well in combination with the normal, regular starting of your program configured in a crontab and handled by the cron daemon.
If your check is written as an external script, the crontab entry might look like:
<time_specification> <your_starter_script> <your_program> ...
Two important remarks: the exit code of your_starter_script must be 0 when it decides not to start your program, and it is better to completely suppress any writing to stdout or stderr by this script.
Such a starter is very short and a simple programming exercise, so I don't feel the need to provide its complete code.
Instead of using cron to run your script every 5 minutes, how about using at to schedule the script to run again 5 minutes after it finishes? Near the end of your script, you can use shell_exec() to run an at command that schedules the script to run again in 5 minutes, like so:
echo "/path/to/script" | at now + 5 minutes
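In PHP that final step could look roughly like this (paths are placeholders; at(1) reads the command from stdin, hence the echo):

<?php
// Hedged sketch: re-schedule this script to run again 5 minutes from now.
shell_exec('echo "/usr/bin/php /path/to/script.php" | at now + 5 minutes');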
Or, perhaps even simpler than my previous answer (using at to schedule the script to run again in 5 minutes): make your script a daemon by using a non-terminating loop, like so:
while (1) {
    // whatever your script does here....
    sleep(300); // wait 5 minutes
}
Then, you can do away with scheduling by way of cron or at altogether. Just simply run your script in the background from the command line, like so:
/path/to/your/script &
Or, add /path/to/your/script in /etc/rc.local to make your script start automatically when the machine boots.
So here is a little background info on my setup. I'm running CentOS with Apache and PHP 5.2.17. I have a website that lists products from many different retailers' websites. I have crawler scripts that run to grab products from each website. Since every website is different, each crawler script had to be customized to crawl that particular retailer's website, so basically I have one crawler per retailer. At this time I have 21 crawlers constantly running to gather and refresh the products from these websites. Each crawler is a PHP file; once the script is done running it checks that it is the only instance of itself running, and at the very end it uses exec to start itself all over again while the original instance closes. This helps protect against memory leaks, since each crawler restarts itself before it closes. However, recently I'll check the crawler scripts and notice that one of them isn't running anymore, and in the error log I find the following:
PHP Warning: exec() [<a href='function.exec'>function.exec</a>]: Unable to fork [nice -n 20 php -q /home/blahblah/crawler_script.php >/dev/null &]
This is what is supposed to start this particular crawler over again; however, since it was "unable to fork" it never restarted, and the original instance of the crawler ended as it normally does.
Obviously it's not a permission issue, because each of these 21 crawler scripts runs this exec command every 5 or 10 minutes at the end of its run and most of the time it works as it should. This seems to happen maybe once or twice a day. It seems as though it's a limit of some sort, as I have only recently started to see this happen, ever since I added my 21st crawler. And it's not always the same crawler that gets this error; it will be any one of them, at a random time, that is unable to fork its restart exec command.
Does anyone have an idea what could be causing PHP to be unable to fork, or maybe a better way to handle these processes so as to get around the error altogether? Is there a process limit I should look into, or something of that nature? Thanks in advance for the help!
Process limit
"Is there a process limit I should look into"
I suspect somebody (the system admin?) set a limit on max user processes. Could you try this?
$ ulimit -a
....
....
max user processes (-u) 16384
....
Run the preceding command from PHP. Something like:
echo system("ulimit -a");
I searched whether php.ini or httpd.conf has this limit, but I couldn't find it.
Error Handling
"even a better way to handle these processes as to get around the error all together?"
The third parameter of exec() returns the exit code of $cmd: 0 for success, non-zero for an error code. Refer to http://php.net/function.exec .
exec($cmd, $output, $ret_val);
if ($ret_val != 0)
{
    // handle the failure here (log it, retry later, etc.)
}
else
{
    echo "success\n";
}
In my case (a large PHPUnit test suite) it would say "unable to fork" once the process hit 57% memory usage. So, one more thing to watch for: it may not be a process limit but rather memory.
I ran into the same problem, tried this, and it worked for me:
ulimit -n 4096
The problem is often caused by the system or the process running out of available memory. Be sure that you have enough by running free -m. You will get a result like the following:
             total       used       free     shared    buffers     cached
Mem:          7985       7722        262         19        189        803
-/+ buffers/cache:       6729       1255
Swap:            0          0          0
The buffers/cache line is what you want to look at. Notice that free memory is 1255 MB on this machine. When running your program, keep running free -m and check whether the free memory drops into the low hundreds. If it does, you will need to find a way to run your program while consuming less memory.
For anyone else who comes across this issue, it could be one of several problems, as outlined in this question's answers.
However, my problem was that my nginx user did not have a proper shell to execute the commands I wanted. Adding .bashrc to the nginx user's home directory fixed this.
I created a script that runs in the background using the ignore_user_abort() function. However, I was foolish enough not to insert any sort of code to make the script stop and now it is sending e-mails every 30 seconds...
Is there any way to stop the script? I am in a shared hosting, so I don't have access to the command prompt, and I don't know the PID.
Is there any way to stop the script? I am in a shared hosting, so I don't have access to the command prompt, and I don't know the PID.
Then no.
But are you sure you don't have any shell access? Even via PHP? If you do, you could try....
<?php
print `ps -ef | grep php`;
...and if you can identify the process from that then....
<?php
$pid=12345; // for example.
print `kill -9 $pid`;
And even if you don't have access to run shell commands, you may be able to find the pid in /proc (on a linux system) and terminate it using the POSIX extension....
<?php
$ps = glob('/proc/[0-9]*');
foreach ($ps as $p) {
    if (is_dir($p) && is_writeable($p)) {
        print "proc= " . basename($p);
        $cmd = file_get_contents($p . '/cmdline');
        print " / " . $cmd;
        if (preg_match('/(php).*(myscript.php)/', $cmd)) {
            posix_kill((int) basename($p), SIGKILL);
            print " xxxxx....";
            break;
        }
        print "\n";
    }
}
I came to this thread yesterday! By mistake I had an infinite loop in a page that was not supposed to be visited, and it pushed my I/O to 100 and CPU usage to 100. The I/O was because of some PHP errors being logged, and the log file was growing beyond anything one could imagine.
None of the above tricks worked on my shared hosting.
MY SOLUTION
In cPanel, go to PHP Version.
Select any PHP version other than the current one, for the time being.
Then Apply Changes.
REASON WHY IT WORKED
The script with the infinite loop (and some PHP errors) was just a process, so I needed to kill it. Changing the PHP version forces a restart of services like PHP and Apache; since a restart was involved, the earlier processes were killed, and I was relieved as I/O and CPU usage stabilized. Also, I had fixed that bug beforehand, before changing the PHP version :)
How did you deploy the script? Surely you can just remove it (if that's an acceptable option). Otherwise, modify it and insert some logic so it only sends a mail once every n minutes/hours/days based on the server time.
Regarding stopping the script from executing (or rather, the system trying to execute it): how did you schedule it for execution? Is it some type of GUI to a crontab, or something similar? Can you not just undo what you did there (seeing as you have no access to the command line/terminal)?
Rob Ganly
Simply: call support and get it cancelled.
Next time, don't execute something you can't control.