C++ application collapsing after some hours - php

I have an application written in C++ that uses OpenCV 2.0, curl and an OpenSURF library. First a PHP script (cron.php) calls proc_open to launch the C++ application (called icomparer). When it finishes processing N images it returns groups saying which images are the same. After that the script uses:
shell_exec('php cron.php > /dev/null 2>&1 &');
die;
and starts again. Well, after 800 or 900 iterations my icomparer starts breaking. The system won't let me create more files, neither in icomparer nor in the PHP script:
proc_open(): unable to create pipe Too many open files (2)
shell_exec(): Unable to execute 'php cron.php > /dev/null 2>&1 &'
And curl fails too:
couldn't resolve host name (6)
Everything crashes. I think I'm doing something wrong; for example, I don't know whether starting another PHP process from a PHP process releases resources.
In icomparer I'm closing all opened files. Maybe I'm not releasing every mutex with mutex_destroy... but on each iteration the C++ application exits, so I'd think everything is released, right?
What do I have to watch for? I have tried monitoring open files with lsof.
PHP 5.2
CentOS 5.x
1 GB RAM
120 GB hard disk (4% used)
4 x Intel Xeon
It is a VPS (the host machine has 16 GB RAM)
The process opens 10 threads and joins them.

Sounds like you're leaking file descriptors.

On Unix-like systems, child processes inherit the parent's open file descriptors. However, when the child process exits, it closes its own copies of those descriptors but not the parent's copies.
So you are opening file descriptors in the parent and not closing them. My bet is that you are not closing the pipes returned by the proc_open() call.
And you'll also need to call proc_close() too.
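As a minimal sketch of what the calling side should look like (the binary name and descriptor layout here are assumptions, not the asker's actual code), every pipe returned by proc_open() gets closed and then proc_close() reaps the child:
<?php
// Sketch only: './icomparer' and its descriptor layout are placeholders.
$descriptors = array(
    0 => array('pipe', 'r'),  // child's stdin
    1 => array('pipe', 'w'),  // child's stdout
    2 => array('pipe', 'w'),  // child's stderr
);
$process = proc_open('./icomparer', $descriptors, $pipes);
if (is_resource($process)) {
    fclose($pipes[0]);                        // nothing to send, close stdin right away
    $groups = stream_get_contents($pipes[1]); // the "which images are the same" output
    $errors = stream_get_contents($pipes[2]);
    // Close the parent's copies of the pipes; without this each iteration
    // leaks descriptors until "Too many open files" appears.
    fclose($pipes[1]);
    fclose($pipes[2]);
    $exitCode = proc_close($process);         // waits for the child and frees the resource
}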

Yeah, it looks like you're opening processes but not closing them after use, and, as it seems, they are not closed automatically (which may work in some circumstances).
Make sure you close/terminate your process with proc_close($res) when you no longer use the resource.

If your application doesn't close its files/sockets, you can also look at the ulimit command; with it you can inspect and raise the limit on the number of open files allowed per process. Have a look: man ulimit


How can I avoid errors when PHP, C++ and shell script try to access the same file?

Are there methods in PHP, C++ and bash scripts that can make the respective program wait its turn when accessing a file?
I have a web-page written in PHP that gives the user the ability to input 6 values:
URL
URL Refresh Interval
Brightness
Color1 in hex
Color2 in hex
Color3 in hex
These values will be written in configuration.txt.
Each time the web-page is accessed configuration.txt gets opened, the PHP gets some values from there and then closes it.
configuration.txt is also opened when one or more of the above values are submitted and then it gets closed.
Next, I have a bash that regularly wgets the URL from configuration.txt and writes the output to a different file, called url_response.txt.
while true
do
    line=$(head -n 1 data/configuration.txt)
    wget -q -i "$line" -O url_response.txt
    sleep 2
done
This script will be put inside a C++ program.
Finally, the same C++ program will have to access url_response.txt to get and parse some strings from it and it will also have to access configuration.txt to get the three colors from it.
I am pretty sure that these 3 programs will intersect at one point and I don't want to find out what happens then.
A common way to avoid race conditions is to use a lock file. When a program tries to read or write to configuration.txt it checks the lock file first.
There are two kinds of locks:
shared lock
exclusive lock
A program can get a shared lock (read lock) as long as no other program holds an exclusive lock. This is used to read a file. Multiple programs can read a file at the same time as long as no other program writes to it.
A program can get an exclusive lock (write lock) only if no other program holds a lock (neither exclusive nor shared). This is used to write to a file. As long as any program is reading or writing a file, no other program is allowed to write to it.
On a linux system you can use flock to manage file locks.
Read:
flock --shared lockfile -c read.sh
Write
flock --exclusive lockfile -c write.sh
Usually this command will wait until the lock is available. With
flock --nonblock lockfile
the command will fail immediately instead of waiting.
From manpage
SYNOPSIS
flock [options] <file|directory> <command> [command args]
flock [options] <file|directory> -c <command>
flock [options] <file descriptor number>
DESCRIPTION
This utility manages flock(2) locks from within shell scripts or the command line.
The first and second forms wrap the lock around the execution of a command, in a manner similar to su(1) or newgrp(1). They lock a specified file or directory, which is created (assuming appropriate permissions) if it does not already exist. By default, if the lock cannot be immediately acquired, flock waits until the lock is available.
The third form uses an open file by its file descriptor number. See the examples for how that can be used.
Here is the manpage for the C/C++ interface (flock(2)) and here is the manpage for the shell command (flock(1)).
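Since the web page itself is PHP, here is a minimal sketch of taking the same shared/exclusive locks from PHP with flock(); it assumes all three programs agree to lock configuration.txt itself (PHP's flock() and the flock command both end up calling flock(2), so they cooperate on the same file):
<?php
// Reader: shared lock, several readers may hold it at the same time.
$fp = fopen('configuration.txt', 'r');
if ($fp) {
    if (flock($fp, LOCK_SH)) {            // blocks while a writer holds LOCK_EX
        $config = stream_get_contents($fp);
        flock($fp, LOCK_UN);
    }
    fclose($fp);
}

// Writer: exclusive lock, waits until nobody else holds any lock.
$fp = fopen('configuration.txt', 'c+');   // open for writing without truncating yet
if ($fp) {
    if (flock($fp, LOCK_EX)) {
        ftruncate($fp, 0);
        fwrite($fp, $newContents);        // $newContents: the six submitted values (assumption)
        fflush($fp);
        flock($fp, LOCK_UN);
    }
    fclose($fp);
}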
So you want to avoid one of your readers getting a partial copy of the file as it is being written?
So the usual way to avoid this issue is: when you write the file, write it to a different name in the same folder, then rename (move) it over the original file.
Unix will ensure that any existing readers of the original file keep seeing the old content (the old file stays on disk, unlinked, until they all close it). Any new readers will see the newly moved file. No reader should see a broken file.
To do the same thing by modifying the file in place, the best you can do is keep some sort of serial number in the file. The writer updates the serial before it writes and again after; a reader reads the serial before and after the read, and if the serial changed the read is invalid and must be repeated. You also need to make sure the data is not cached. This is fine for occasionally updated files, but it will clearly hurt reader performance if the content is frequently updated.
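A short PHP sketch of the write-then-rename approach described above (the temporary file name is arbitrary); rename() within the same filesystem is atomic, so a reader sees either the complete old file or the complete new one:
<?php
// Sketch: build the new configuration in a temp file in the same directory,
// then atomically swap it into place.
$tmp = 'configuration.txt.tmp';            // arbitrary temp name, same folder
file_put_contents($tmp, $newContents);     // $newContents: the six values (assumption)
rename($tmp, 'configuration.txt');         // atomic replace on the same filesystem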

GNU Parallel as job queue processor

I have a worker.php file as below
<?php
$data = $argv[1];
//then some time consuming $data processing
and I run this as a poor man's job queue using gnu parallel
while read LINE; do echo $LINE; done < very_big_file_10GB.txt | parallel -u php worker.php
which kind of works by forking 4 php processes when I am on a 4 CPU machine.
But it still feels pretty synchronous to me because read LINE is still reading one line at a time.
Since it is a 10 GB file, I am wondering if I can somehow use parallel to read the same file in parallel by splitting it into n parts (where n = number of my CPUs), which would (ideally) make my import n times faster.
No need to do the while business:
parallel -u php worker.php :::: very_big_file_10GB.txt
-u Ungroup output. Only use this if you are not going to use the output, as output from different jobs may mix.
:::: File input source. Equivalent to -a.
I think you will benefit from reading at least chapter 2 (Learn GNU Parallel in 15 minutes) of "GNU Parallel 2018". You can buy it at
http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
or download it at: https://doi.org/10.5281/zenodo.1146014
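If you would rather hand each job a whole block of lines instead of one line per process start, GNU Parallel's --pipe mode chunks standard input and pipes each chunk to a job's stdin; the worker below (worker_stdin.php is a hypothetical variant, not the original worker.php) just reads its chunk line by line:
<?php
// Hypothetical worker_stdin.php, intended to be run as:
//   cat very_big_file_10GB.txt | parallel --pipe php worker_stdin.php
// Each job receives a block of lines on STDIN rather than a single $argv[1].
while (($line = fgets(STDIN)) !== false) {
    $data = rtrim($line, "\n");
    // ... time consuming $data processing ...
}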

PHP Warning: exec() unable to fork

So here is a little background info on my setup. I'm running CentOS with Apache and PHP 5.2.17. I have a website that lists products from many different retailers' websites, and crawler scripts that run to grab products from each of them. Since every website is different, each crawler script had to be customized for the particular retailer, so basically I have 1 crawler per retailer. At this time I have 21 crawlers constantly running to gather and refresh the products from these websites.

Each crawler is a PHP file. Once the script is done running it checks to ensure it's the only instance of itself running, and at the very end it uses exec to start itself all over again while the original instance closes. This helps protect against memory leaks since each crawler restarts itself before it closes. However, recently I will check the crawler scripts and notice that one of them isn't running anymore, and in the error log I find the following:
PHP Warning: exec() [function.exec]: Unable to fork [nice -n 20 php -q /home/blahblah/crawler_script.php >/dev/null &]
This is what is supposed to start this particular crawler over again; however, since it was "unable to fork" it never restarted, and the original instance of the crawler ended like it normally does.
Obviously it's not a permission issue, because each of these 21 crawler scripts runs this exec command every 5 or 10 minutes at the end of its run and most of the time it works as it should. This seems to happen maybe once or twice a day. It seems to be a limit of some sort, as I have only just recently started to see this happen, ever since I added my 21st crawler. And it's not always the same crawler that gets this error; any one of them, at a random time, can be unable to fork its restart exec command.
Does anyone have an idea what could be causing PHP to be unable to fork, or maybe even a better way to handle these processes so as to get around the error altogether? Is there a process limit I should look into or something of that nature? Thanks in advance for the help!
Process limit
"Is there a process limit I should look into"
I suspect somebody (the system admin?) set a limit on the maximum number of user processes. Could you try this?
$ ulimit -a
....
....
max user processes (-u) 16384
....
Run the preceding command from PHP. Something like:
echo system("ulimit -a");
I searched whether php.ini or httpd.conf has this limit, but I couldn't find it.
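On kernels new enough to expose /proc/<pid>/limits (this is an assumption about your system; it is not available on the oldest CentOS kernels), you can also read the limits that actually apply to the running PHP process:
<?php
// Prints the soft/hard limits in force for this PHP process,
// including "Max processes" and "Max open files".
echo file_get_contents('/proc/self/limits');
On CentOS the per-user values behind these numbers are typically configured in /etc/security/limits.conf rather than in php.ini or httpd.conf.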
Error Handling
"even a better way to handle these processes as to get around the error all together?"
The third parameter of exec() returns the exit code of $cmd: 0 for success, non-zero for an error code. Refer to http://php.net/function.exec .
exec($cmd, $output, $ret_val);
if ($ret_val != 0)
{
    // do stuff here
}
else
{
    echo "success\n";
}
In my case (a large PHPUnit test suite) it would say "unable to fork" once the process hit 57% memory usage. So, one more thing to watch for: it may not be a process limit but rather memory.
I ran into the same problem; I tried this and it worked for me:
ulimit -n 4096
The problem is often caused by the system or the process running out of available memory. Be sure that you have enough by running free -m. You will get a result like the following:
             total       used       free     shared    buffers     cached
Mem:          7985       7722        262         19        189        803
-/+ buffers/cache:        6729       1255
Swap:            0          0          0
The buffers/cache line is what you want to look at. Notice free memory is 1255 MB on this machine. When running your program, keep trying free -m and checking the free memory to see whether it falls into the low hundreds. If it does, you will need to find a way to run your program while consuming less memory.
For anyone else who comes across this issue, it could be several problems as outlined in this question's answer.
However, my problem was my nginx user did not have a proper shell to execute the commands I wanted. Adding .bashrc to the nginx user's home directory fixed this.

PHP script is killed without explanation

I'm starting my php script in the following way:
bash
cd 'path'
php -f 'scriptname'.php
There is no output while the php script is running.
After a time, the php script responds with:
Killed
My idea is that it reached the memory_limit: ini_set('memory_limit', '40960M');
Increasing the memory limit seemed to solve the problem, but it only pushed the failure point further out.
What exactly does that Killed phrase mean?
Your process got killed. There could be a multitude of reasons, but it's easy to rule out some of the more obvious ones.
PHP limits: if you run into a PHP limit, you'll get an error in the logfile, and probably on the command line as well. This normally does not print 'Killed'.
The session-is-ended issue: if you still have your session, then your session has obviously not ended, so disregard all the nohup and & stuff.
If your server is starved for resources (no memory, no swap), the kernel might kill your process. This is probably what's happening.
In any case, your process is being sent a signal telling it to stop. Normally only a couple of 'things' can do this:
your account (e.g. you kill the process)
an admin user (e.g. root)
the kernel when it is really needing your memory for itself.
maybe some automated process, for instance if you live on a shared server and you take up more than your share of resources.
references: Who "Killed" my process and why?
You could be running out of memory in the PHP script. Here is how to reproduce that error:
I'm doing this example on Ubuntu 12.10 with PHP 5.3.10:
Create this PHP script called m.php and save it:
<?php
function repeat(){
repeat();
}
repeat();
?>
Run it:
el#apollo:~/foo$ php m.php
Killed
The program takes 100% CPU for about 15 seconds then stops. Look at dmesg | grep php and there are clues:
el#apollo:~/foo$ dmesg | grep php
[2387779.707894] Out of memory: Kill process 2114 (php) score 868 or sacrifice child
So in my case, the PHP program printed "Killed" and halted because it ran out of memory due to the unbounded recursion.
Solutions:
Increase the amount of RAM available.
Break down the problem set into smaller chunks that operate sequentially (see the sketch below).
Rewrite the program so it has much smaller memory requirements.
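For the second point, a rough sketch of the "smaller chunks" idea (the file name, the 5000-line batch size and process_batch() are all made-up placeholders): stream the input instead of loading it all, and release each chunk before reading the next:
<?php
// Sketch: process a large file in bounded-memory batches.
$fh = fopen('input.txt', 'r');              // placeholder input file
$batch = array();
while (($line = fgets($fh)) !== false) {
    $batch[] = $line;
    if (count($batch) >= 5000) {            // arbitrary batch size
        process_batch($batch);              // hypothetical processing function
        $batch = array();                   // drop the chunk so memory stays flat
        // echo memory_get_peak_usage(true), "\n";  // optional: watch memory while tuning
    }
}
if ($batch) {
    process_batch($batch);                  // handle the final partial batch
}
fclose($fh);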
Killed is what bash prints when a process exits after a SIGKILL; it's not related to PuTTY.
Terminated is what bash prints when a process exits after a SIGTERM.
You are not running into PHP limits; you may be running into a different problem, see:
Return code when OOM killer kills a process
http://en.wikipedia.org/wiki/Nohup
Try using nohup before your command.
nohup catches the hangup signal while the ampersand doesn't (unless the shell is configured that way or doesn't send SIGHUP at all).
Normally, when running a command using & and exiting the shell afterwards, the shell will terminate the sub-command with the hangup signal (kill -SIGHUP <pid>). This can be prevented using nohup, as it catches the signal and ignores it so that it never reaches the actual application.
In case you're using bash, you can run shopt | grep hupon to find out whether your shell sends SIGHUP to its child processes or not. If it is off, processes won't be terminated, as seems to be the case for you.
There are cases where nohup does not work, for example when the process you start installs its own SIGHUP handler.
nohup php -f 'yourscript'.php
If you are already taking care of the php.ini settings related to script memory and timeout, then maybe it's the Linux SSH connection terminating its active session, or something like that.
You can use the 'nohup' Linux command to run a command immune to hangups:
shell> nohup php -f 'scriptname'.php
Edit: You can close your session by adding '&' at the end of the command:
shell> nohup php -f 'scriptname'.php &> /dev/null &
The '&' operator at the end of any command in Linux moves that command to the background.

Identify which PHP script is running?

I have a large PHP application and I'm looking for a way to know which PHP script is running at a given moment. Something like when you run "top" on a Linux command line but for PHP.
Are you trying to do so from within the PHP application, or outside of it? If you're inside the PHP code, calling debug_print_backtrace(); at that point will show you the 'tree' of PHP files that were included to get you to that point.
If you're outside the PHP script, you can only see the one process that called the original PHP script (index.php or whatnot), unless the application spawns parallel threads as part of its execution.
If you're looking for this information at the system level, e.g. all php files running under any Apache child process, or even any PHP files in use by other apps, there is the lsof program (list open files), which will spit out by default ALL open files on the system (executables, sockets, fifos, .so's, etc...). You can grep the output for '.php' and get a pretty complete picture of what's in use at that moment.
This old post shows a way you can wrap your calls to php scripts and get a PID for each process.
Does PHP have threading?
$cmd = 'nohup nice -n 10 /usr/bin/php -c /path/to/php.ini -f /path/to/php/file.php action=generate var1_id=23 var2_id=35 gen_id=535 > /path/to/log/file.log & echo $!';
$pid = shell_exec($cmd);
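As a follow-up sketch (it assumes the POSIX extension is loaded), once you have that $pid you can poll whether the script is still alive; signal 0 sends nothing, it only tests for the process's existence:
<?php
// $pid comes from the shell_exec() call above.
$pid = (int) trim($pid);
if ($pid > 0 && posix_kill($pid, 0)) {
    echo "PHP script $pid is still running\n";
} else {
    echo "PHP script $pid has finished (or belongs to another user)\n";
}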
