Limit CPU usage on multiple PHP scripts running asynchronously - php

I'm running a PHP script via CLI from a cron job. The script reads about 10,000 records from the database and launches 10,000 new scripts (via the exec command) without waiting for the previous one to finish. I do this because I want all those tasks to run fast (each one takes about 10 seconds).
When the number of running tasks gets large, CPU usage reaches 100% and I can't work with the server (CentOS). How can I handle this?

You need to limit the number of scripts running in parallel at any given time because running 10,000 concurrent scripts is clearly saturating your system. Instead, you should queue up each task and process 25 or 50 (whatever causes a reasonable amount of load) tasks at the same time.
Without much knowledge of how these scripts actually work, I can't really give you much advice code-wise, but you definitely need to have a queue in place to limit the number of concurrent instances of your script running at the same time.
Also check out semaphores; they might be useful for this producer/consumer model.
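As a rough sketch of that queue idea (worker.php and the record IDs are placeholders, since the answer doesn't know what your per-record scripts do), you could keep at most N children running at once with proc_open and only start a new one when a slot frees up:
<?php
// Minimal sketch of a bounded worker pool; worker.php and the IDs are
// stand-ins for your real per-record script and the 10,000 rows.
$recordIds   = range(1, 10000);
$maxParallel = 25;              // tune to a load your server can handle
$running     = [];

foreach ($recordIds as $id) {
    // Wait until fewer than $maxParallel children are still running.
    while (count($running) >= $maxParallel) {
        foreach ($running as $key => $proc) {
            if (!proc_get_status($proc)['running']) {
                proc_close($proc);
                unset($running[$key]);
            }
        }
        usleep(100000); // 0.1 s between checks
    }
    $running[] = proc_open('php worker.php ' . escapeshellarg((string) $id), [], $pipes);
}

// Wait for the remaining children to finish.
foreach ($running as $proc) {
    proc_close($proc);
}
PHP's sysvsem extension (sem_get/sem_acquire/sem_release) could enforce the same limit from inside the workers themselves, along the lines of the semaphore suggestion above.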

I recently wrote a script that handles parallel execution of commands. It basically allows the operator to tune the number of concurrent processes at runtime.
If you feel that CPU usage is too high, just decrease the value in /etc/maxprunning.
CMD="path-of-your-php-script"
function getMaxPRunning {
cat /etc/maxprunning
}
function limitProcs {
PRUNNING=`ps auxw | grep -v grep | grep "$CMD" | wc -l`
#echo "Now running $PRUNNING processes, MAX:$MAXPRUNNING"
[ $PRUNNING -ge $MAXPRUNNING ] && {
sleep 1
return
}
#echo "Launching new process"
sleep 0.2
$CMD &
}
MAXPRUNNING=`getMaxPRunning`
C=1
while [ 1 ];do
MAXPRUNNING=`getMaxPRunning`
limitProcs
done
If you want your PHP scripts to ignore an accidental death of the parent process, put this line at the top of the PHP script:
pcntl_signal(SIGINT, SIG_IGN);
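In context, the top of the worker could look like the following sketch; ignoring SIGHUP as well is my addition, and both calls require the pcntl extension:
<?php
// Ignore signals that would otherwise kill this worker if the parent
// launcher dies or the terminal goes away (requires ext-pcntl).
pcntl_signal(SIGINT, SIG_IGN);
pcntl_signal(SIGHUP, SIG_IGN); // addition: also survive hangups

// ... the actual long-running task goes here ...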

Related

Top shows a lot of php processes sleeping

We have a utility server for running scheduled PHP cron jobs, and after a while, when I check top, it shows around ~100 sleeping PHP processes like this:
6310 user 20 0 223m 16m 9748 S 0.0 0.1 0:00.04 /usr/local/bin/php72-cgi -d open_basedir="/home/user/:/tmp:/usr/share/pear"
As the process list doesn't show the parent script or the actual script name, it's hard for me to diagnose which script is generating so many sleeping processes. As you can see, the memory usage is minimal, but I still feel like this shouldn't happen. What's the easiest way to find the culprit script, and should I even worry about sleeping processes?
To understand why, list the process together with its parents using this command (adapt the PID to your current execution):
pstree -H 6310 -ps
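If you also want to script this, a small helper can walk the parent chain via Linux's /proc filesystem (the file name whoisparent.php is hypothetical):
<?php
// whoisparent.php (hypothetical name): print the command line of a PID
// and each of its ancestors using the Linux /proc filesystem.
$pid = (int) ($argv[1] ?? 0);
while ($pid > 1) {
    $cmdline = str_replace("\0", ' ', (string) @file_get_contents("/proc/$pid/cmdline"));
    echo $pid . '  ' . trim($cmdline) . "\n";
    $status = (string) @file_get_contents("/proc/$pid/status");
    if (!preg_match('/^PPid:\s*(\d+)/m', $status, $m)) {
        break;
    }
    $pid = (int) $m[1];
}
Run it as php whoisparent.php 6310 to see the whole ancestry, including the cron script that spawned the sleeping process.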

GNU Parallel as job queue processor

I have a worker.php file as below
<?php
$data = $argv[1];
//then some time consuming $data processing
and I run this as a poor man's job queue using gnu parallel
while read LINE; do echo $LINE; done < very_big_file_10GB.txt | parallel -u php worker.php
which kind of works by forking 4 PHP processes when I am on a 4-CPU machine.
But it still feels pretty synchronous to me because read LINE is still reading one line at a time.
Since it is a 10 GB file, I am wondering if I can somehow use parallel to read the same file in parallel by splitting it into n parts (where n = number of my CPUs), which would make my import n times faster (ideally).
No need to do the while business:
parallel -u php worker.php :::: very_big_file_10GB.txt
-u Ungroup output. Only use this if you are not going to use the output, as output from different jobs may mix.
:::: File input source. Equivalent to -a.
I think you will benefit from reading at least chapter 2 (Learn GNU Parallel in 15 minutes) of "GNU Parallel 2018". You can buy it at
http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
or download it at: https://doi.org/10.5281/zenodo.1146014
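If you also want to address the second part of the question, splitting the big file so each job gets a whole block of lines rather than one line per job, GNU Parallel's --pipe and --pipepart options do that by feeding blocks of the input to each job's stdin; worker.php would then have to read stdin instead of $argv[1]. A rough sketch of such a worker (my assumption, not part of the original answer):
<?php
// Variant of worker.php for block-wise input on stdin (e.g. under
// parallel --pipe or --pipepart): handle one line at a time.
while (($line = fgets(STDIN)) !== false) {
    $data = rtrim($line, "\r\n");
    // ... the time-consuming processing of $data goes here ...
}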

Execute Lua on shell and get memory size and execution time

For a project, I have to execute Lua scripts and I want some information about the execution: memory size, CPU time, and runtime. I didn't find anything about some of the parameters for the Lua compiler at lua.org. Does anyone know the parameters, or a compiler with these features? Later on, I want to analyze my Lua scripts with PHP.
Shell exec via PHP and get information back (mem size, CPU time, runtime).
Thanks!
To analyze your Lua script, just wrap it in the following lines of code:
local time_start = os.time()
--------------------------------------
-- Your script is here
-- For example, just spend 5 seconds in CPU busy loop
repeat until os.clock() > 5
-- or call some external file containing your original script:
-- dofile("path/to/your_original_script.lua")
--------------------------------------
local mem_KBytes = collectgarbage("count") -- memory currently occupied by Lua
local CPU_seconds = os.clock() -- CPU time consumed
local runtime_seconds = os.time() - time_start -- "wall clock" time elapsed
print(mem_KBytes, CPU_seconds, runtime_seconds)
-- Output to stdout: 24.0205078125 5.000009 5
Now you can execute this script with shell command lua path/to/this_script.lua and analyze the last line printed to stdout to get the information.
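Since the question asks about driving this from PHP, here is a minimal sketch of collecting those three numbers via shell_exec, assuming the wrapper above is saved as path/to/this_script.lua and that the tab-separated triple is its last output line:
<?php
// Run the instrumented Lua script and parse its last output line:
// mem_KBytes <tab> CPU_seconds <tab> runtime_seconds
$output = (string) shell_exec('lua ' . escapeshellarg('path/to/this_script.lua'));
$lines  = array_values(array_filter(explode("\n", trim($output)), 'strlen'));
list($memKBytes, $cpuSeconds, $runtimeSeconds) = preg_split('/\s+/', (string) end($lines));
printf("memory: %s KB, CPU: %s s, runtime: %s s\n", $memKBytes, $cpuSeconds, $runtimeSeconds);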

PHP script is killed without explanation

I'm starting my php script in the following way:
bash
cd 'path'
php -f 'scriptname'.php
There is no output while the php script is running.
After a time, the php script responds with:
Killed
My idea is that it reached the memory_limit: ini_set('memory_limit', '40960M');
Increasing the memory limit seemed to solve the problem, but it only pushed the point of failure further out.
What exactly does that Killed phrase mean?
Your process is killed. There could be a multitude of reasons, but it's easy to rule out some of the more obvious ones.
PHP limits: if you run into a PHP limit, you'll get an error in the logfile, and probably on the command line as well. This normally does not print 'Killed'.
The session-is-ended issues: if you still have your session, then your session is obviously not ended, so disregard all the nohup and & stuff.
If your server is starved for resources (no memory, no swap), the kernel might kill your process. This is probably what's happening.
In any case: your process is being sent a signal telling it to stop. Normally only a couple of 'things' can do this:
your account (e.g. you kill the process)
an admin user (e.g. root)
the kernel, when it really needs your memory for itself
maybe some automated process, for instance if you live on a shared server and you take up more than your share of resources
references: Who "Killed" my process and why?
You could be running out of memory in the PHP script. Here is how to reproduce that error:
I'm doing this example on Ubuntu 12.10 with PHP 5.3.10:
Create this PHP script called m.php and save it:
<?php
// Infinite recursion: each call adds another stack frame, so memory use
// grows until the kernel's OOM killer steps in.
function repeat(){
    repeat();
}
repeat();
?>
Run it:
el#apollo:~/foo$ php m.php
Killed
The program takes 100% CPU for about 15 seconds then stops. Look at dmesg | grep php and there are clues:
el#apollo:~/foo$ dmesg | grep php
[2387779.707894] Out of memory: Kill process 2114 (php) score 868 or sacrifice child
So in my case, the PHP program printed "Killed" and halted because it ran out of memory due to an infinite loop.
Solutions:
Increase the amount of RAM available.
Break down the problem set into smaller chunks that operate sequentially (see the sketch after this list).
Rewrite the program so it has much smaller memory requirements.
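To illustrate the second point, here is a rough sketch of batch-wise processing; fetch_batch() and process_row() are hypothetical placeholders for your real data source and work:
<?php
// Hypothetical batch loop: fetch and handle a limited number of rows at
// a time so memory use stays bounded instead of growing with the input.
$batchSize = 1000;
$offset    = 0;
while (true) {
    $rows = fetch_batch($offset, $batchSize); // placeholder for your query
    if (empty($rows)) {
        break;
    }
    foreach ($rows as $row) {
        process_row($row); // placeholder for the real work
    }
    unset($rows);          // release the batch before fetching the next one
    $offset += $batchSize;
}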
Killed is what bash says when a process exits after a SIGKILL; it's not related to PuTTY.
Terminated is what bash says when a process exits after a SIGTERM.
You are not running into PHP limits; you may be running into a different problem. See:
Return code when OOM killer kills a process
http://en.wikipedia.org/wiki/Nohup
Try using nohup before your command.
nohup catches the hangup signal, while the ampersand doesn't (unless the shell is configured that way or doesn't send SIGHUP at all).
Normally, when running a command using & and exiting the shell afterwards, the shell will terminate the sub-command with the hangup signal (kill -SIGHUP). This can be prevented using nohup, as it catches the signal and ignores it so that it never reaches the actual application.
In case you're using bash, you can use the command shopt | grep hupon to find out whether your shell sends SIGHUP to its child processes or not. If it is off, processes won't be terminated, as it seems to be the case for you.
There are cases where nohup does not work, for example when the process you start re-registers its own SIGHUP handler.
nohup php -f 'yourscript'.php
If you are already taking care of the php.ini settings related to script memory and timeout, then it may be the Linux SSH connection terminating your active session, or something like that.
You can use the 'nohup' Linux command to run a command immune to hangups:
shell> nohup php -f 'scriptname'.php
Edit: You can then close your session by adding '&' at the end of the command:
shell> nohup php -f 'scriptname'.php &> /dev/null &
The '&' operator at the end of any command in Linux moves that command to the background.

Slow cronjobs on Cent OS 5

I have one cronjob that runs every 60 minutes, but recently, for some reason, it is running slowly.
Env: centos5 + apache2 + mysql5.5 + php 5.3.3 / raid 10/10k HDD / 16gig ram / 4 xeon processor
Here's what the cronjob does:
1) Parse the last 60 minutes of data:
a) one process parses user agents and saves the data to the database
b) one process parses impressions/clicks on the website and saves them to the database
2) From the data in step 1:
a) build a small report and send emails to the administrator/business
b) save the report into a daily table (available in the admin section)
I now see 8 processes (of the same file) when I run the command ps auxf | grep process_stats_hourly.php (found this command on Stack Overflow).
Technically I should only have 1 not 8.
Is there any tool in Cent OS or something I can do to make sure my cronjob will run every hour and not overlapping the next one?
Thanks
Your hardware seems to be good enough to process this.
1) Check if you already have hanging processes. Using ps auxf (see tcurvelo's answer), check whether you have one or more processes that take up too many resources. Maybe you don't have enough resources left to run your cronjob.
2) Check your network connections:
If your database and your cronjob are on different servers, you should check the response time between these two machines. Maybe you have network issues that make the cronjob wait for packets to come back.
You can use: Netcat, Iperf, mtr or ttcp
3) Server configuration
Is your server configured correctly? Are your OS and MySQL set up correctly? I would recommend reading these articles:
http://www3.wiredgorilla.com/content/view/220/53/
http://www.vr.org/knowledgebase/1002/Optimize-and-disable-default-CentOS-services.html
http://dev.mysql.com/doc/refman/5.1/en/starting-server.html
http://www.linux-mag.com/id/7473/
4) Check your database:
Make sure your database has the correct indexes and make sure your queries are optimized. Read this article about the explain command
If a query over a few hundred thousand records takes time to execute, that will affect the rest of your cronjob; if you have a query inside a loop, even worse.
Read these articles:
http://dev.mysql.com/doc/refman/5.0/en/optimization.html
http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
http://blog.fedecarg.com/2008/06/12/10-great-articles-for-optimizing-mysql-queries/
5) Trace and optimize PHP code
Make sure your PHP code runs as fast as possible.
Read these articles:
http://phplens.com/lens/php-book/optimizing-debugging-php.php
http://code.google.com/speed/articles/optimizing-php.html
http://ilia.ws/archives/12-PHP-Optimization-Tricks.html
A good technique to validate your cronjob is to trace your cronjob script:
Based on your cronjob process, put some debug tracing in place, including how much memory and how much time it took to execute each step, e.g.:
<?php
echo "\n-------------- DEBUG --------------\n";
echo "memory (start): " . memory_get_usage(TRUE) . "\n";
$start = microtime(TRUE); // start timer, used in the "executed time" line below
// some process
$end = microtime(TRUE);
echo "\n-------------- DEBUG --------------\n";
echo "memory after some process: " . memory_get_usage(TRUE) . "\n";
echo "executed time: " . ($end-$start) . "\n";
By doing that, you can easily find out which step uses how much memory and how long it takes to execute.
6) External servers/web service calls
Does your cronjob call external servers or web services? If so, make sure these respond as fast as possible. If you request data from a third-party server and that server takes a few seconds to return an answer, it will affect the speed of your cronjob, especially if these calls are inside loops.
Try that and let me know what you find.
ps's output also shows when each process started (see the STARTED column).
$ ps auxf
USER PID %CPU %MEM VSZ RSS TTY STAT STARTED TIME COMMAND
root 2 0.0 0.0 0 0 ? S 18:55 0:00 [ktrheadd]
^^^^^^^
(...)
Or you can customize the output:
$ ps axfo start,command
STARTED COMMAND
18:55 [ktrheadd]
(...)
Thus, you can be sure if they are overlapping.
You should use a lockfile mechanism within your process_stats_hourly.php script. It doesn't have to be anything overly complex; you could have PHP write the PID which started the process to a file like /var/mydir/process_stats_hourly.txt. Then, if it takes longer than an hour to process the stats and cron kicks off another instance of the process_stats_hourly.php script, that instance can check whether the lockfile already exists; if it does, it will not run.
However you are left with the problem of how to "re-queue" the hourly script if it did find the lock file and couldn't start.
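A minimal sketch of that PID-file check; the stale-lock handling via posix_kill() is my addition and requires the posix extension:
<?php
// Refuse to run if a previous instance is still alive; otherwise record
// our own PID in the lockfile described above.
$lockFile = '/var/mydir/process_stats_hourly.txt';

if (file_exists($lockFile)) {
    $previousPid = (int) trim((string) file_get_contents($lockFile));
    if ($previousPid > 0 && posix_kill($previousPid, 0)) {
        exit(0); // previous hourly run has not finished yet
    }
    // Stale lockfile: the recorded process is gone, so take over.
}

file_put_contents($lockFile, getmypid() . PHP_EOL);
// ... run the hourly stats processing here ...
unlink($lockFile);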
You might use strace -p 1234, where 1234 is the relevant process ID, on one of the processes which is running too long. Perhaps you'll understand why it is so slow, or even blocked.
Is there any tool in Cent OS or something I can do to make sure my cronjob will run every hour and not overlapping the next one?
Yes. CentOS' standard util-linux package provides a command-line convenience for filesystem locking. As Digital Precision suggested, a lockfile is an easy way to synchronize processes.
Try invoking your cronjob as follows:
flock -n /var/tmp/stats.lock process_stats_hourly.php || logger -p cron.err 'Unable to lock stats.lock'
You'll need to edit paths and adjust for $PATH as appropriate. That invocation will attempt to lock stats.lock, spawning your stats script if successful, otherwise giving up and logging the failure.
Alternatively your script could call PHP's flock() itself to achieve the same effect, but the flock(1) utility is already there for you.
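For reference, the in-script equivalent with PHP's flock() is only a few lines (a sketch that reuses the lock path from the flock(1) example):
<?php
// Take a non-blocking exclusive lock; if another instance holds it,
// bail out instead of overlapping with the previous run.
$fp = fopen('/var/tmp/stats.lock', 'c');
if ($fp === false || !flock($fp, LOCK_EX | LOCK_NB)) {
    fwrite(STDERR, "another instance is already running\n");
    exit(1);
}
// ... run the hourly stats processing here ...
flock($fp, LOCK_UN);
fclose($fp);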
How often is that logfile rotated?
A log-parsing job suddenly taking longer than usual sounds like the log isn't being rotated and is now too big for the parser to handle efficiently.
Try resetting the logfile and see if the job runs faster. If that solves the problem, I recommend logrotate as a means of preventing the problem in the future.
You could add a step to the cronjob to check the output of your above command:
ps auxf | grep process_stats_hourly.php | grep -v grep
Keep looping until the command returns nothing, indicating that the process isn't running, then allow the remaining code to execute.
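A sketch of that wait loop, intended to run from a wrapper rather than from process_stats_hourly.php itself (otherwise the check would match the current process); the grep -v grep keeps it from matching its own pipeline:
<?php
// Poll until no process_stats_hourly.php instance shows up in ps, then
// let the rest of the wrapper continue.
do {
    $out = trim((string) shell_exec('ps auxf | grep process_stats_hourly.php | grep -v grep'));
    if ($out !== '') {
        sleep(5); // previous run still active, check again shortly
    }
} while ($out !== '');
// ... remaining code executes once no other instance is running ...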
