Apache + php 7 + FPM = sudden system breaks

I have an Apache 2 + PHP 7 server running a simple WordPress blog.
The blog had always run on Apache + mod_php + PHP 7, but recently, during access peaks, the system became too slow and even crashed.
So I googled how to optimize the configuration, and many tutorials said mod_php is slow and that I should replace it with php-fpm.
I did, and after the change the site was noticeably faster, but now it randomly crashes and starts returning HTTP 500 errors...
There is no obvious reason for the new crashes: no peak in users or any other situation I could notice.
The Apache error log is full of:
[fastcgi:error] [pid 37179] [client 162.158.167.177:26270] FastCGI: incomplete headers (0 bytes) received from server "/usr/lib/cgi-bin/php-fcgi"
[fastcgi:error] [pid 37176] (104)Connection reset by peer: [client 103.22.200.111:25406] FastCGI: comm with server "/usr/lib/cgi-bin/php-fcgi" aborted: read failed, referer: http://www.fqn.com.br/wordpress/wp-content/plugins/jetpack/css/jetpack.css
There are literally thousands of errors like this, one every two seconds, and I don't understand them.
First, why is Apache asking FPM for a CSS file?
Second, what is "/usr/lib/cgi-bin/php-fcgi" supposed to be? There is no such file in that folder! What is supposed to be there?
The php-fpm log is totally useless. I enabled the DEBUG logging level and all I get is:
DEBUG: pid 1664, fpm_pctl_perform_idle_server_maintenance(), line 379: [pool www] currently 1 active children, 2 spare children, 3 running children. Spawning rate 2
one status message like this every second and, at random intervals, some:
WARNING: pid 1664, fpm_children_bury(), line 252: [pool www] child 38554 exited on signal 11 (SIGSEGV) after 58.797353 seconds from start
but no stack trace or detailed error message to help me understand. I really liked the performance of Apache + FPM and don't want to roll back to mod_php, but it's impossible to run the system for 12 hours without crashes in the current configuration.
The link below shows the phpinfo() page of the server:
https://jpst.it/11FIP
Does anyone have an idea?
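
A quick tally of the segfault warnings can show how often children die and in which pool, which helps when comparing the situation before and after a configuration change. The following is a minimal sketch, assuming a POSIX shell with grep, sed, sort, uniq and awk. The function name count_segv is hypothetical, and the log path in the usage line is an assumption that varies by distribution.

```shell
# count_segv: read php-fpm log lines on stdin and print "<pool> <count>"
# for every pool that logged children exiting on signal 11 (SIGSEGV).
# The name count_segv is hypothetical, not part of php-fpm.
count_segv() {
    grep 'exited on signal 11' \
        | sed -n 's/.*\[pool \([^]]*\)].*/\1/p' \
        | sort | uniq -c | awk '{print $2, $1}'
}

# Usage (the log path is an assumption; adjust it to your system):
# count_segv < /var/log/php7.0-fpm.log
```

Lines that do not mention signal 11, such as the DEBUG status messages, are ignored.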

It is a programming fault somewhere in the PHP engine, and it seems to be triggered by a program setting its input to non-blocking or stopping too soon after starting (exec(), shell_exec(), or proc_open() calls that fail).
It seems the PHP developers are not very eager, to put it mildly, to solve the issue. It has been known for YEARS (the oldest ticket I have seen is from 2012):
https://bugs.php.net/bug.php?id=73056

Related

libpcre.so segfault/Segmentation fault - ERR_EMPTY_RESPONSE in Browser

I have a Magento 1.9 shop on an Ubuntu server. It has run without any problems for about a year. When I now open /cron.php in my Chrome browser, I get the following error: ERR_EMPTY_RESPONSE
I have looked into my Apache error log. It says:
[core:notice] [pid 1992] AH00051: child pid 2083 exit signal Segmentation fault (11), possible coredump in /etc/apache2
... every time I try to load the cron.php script.
My syslog shows the following error:
kernel: [ 774.131560] php[2406]: segfault at 7ffc036eceb8 ip 00007f7c1fd7a55a sp 00007ffc036eceb0 error 6 in libpcre.so.3.13.1[7f7c1fd67000+3d000]
I've found out that the error occurs in a recursive function in my PHP script. The function looks quite OK to me.

It turned out that the PHP script had a recursive function that recursed infinitely under certain conditions. I would have thought this would just lead to a script timeout. Maybe someone can explain in detail...
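
A likely explanation, offered with some caution: as far as I know, PHP versions of that era do not cap recursion depth themselves, so unbounded recursion tends to exhaust the process stack long before max_execution_time is reached, and the kernel then ends the process with SIGSEGV. The numbers in the syslog line above fit that picture. The sketch below decodes the "error 6" field using the x86 page-fault bit layout and computes how far the faulting address is from the stack pointer. It assumes a POSIX shell with 64-bit arithmetic; decode_segv_error is a hypothetical helper name.

```shell
# Decode the "error N" field of a kernel segfault line on x86.
# Bit 0: 0 = page not present, 1 = protection violation
# Bit 1: 0 = read,             1 = write
# Bit 2: 0 = kernel mode,      1 = user mode
# decode_segv_error is a hypothetical helper name.
decode_segv_error() {
    code=$1
    if [ $(( code & 1 )) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $(( code & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $(( code & 4 )) -ne 0 ]; then mode="user-mode"; else mode="kernel-mode"; fi
    echo "$mode $access, $cause"
}

# Values taken from the syslog line in the question
decode_segv_error 6
# Distance in bytes between the faulting address and the stack pointer
echo $(( 0x7ffc036eceb8 - 0x7ffc036eceb0 ))
```

For the line in the question this prints "user-mode write, page not present" and 8: a write to an unmapped page within a few bytes of the stack pointer, which is what one would expect once the stack has grown past its limit.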

php-fpm child process exited on signal 11

Our application runs in a Docker container on AWS:
Operating system: Ubuntu 14.04.2 LTS (Trusty Tahr)
Nginx version: nginx/1.4.6 (Ubuntu)
Memcached version: memcached 1.4.14
PHP version: PHP 5.5.9-1ubuntu4.11 (cli) (built: Jul 2 2015 15:23:08)
System Memory: 7.5 GB
We get blank pages and, less frequently, a 404 error. While checking the logs, I found that the php-fpm child process is being killed. Memory seems to be used mostly by the memcached and php-fpm processes, and very little memory is free.
Memcached is configured to use 2 GB of memory.
Here is the php-fpm www.conf:
pm = dynamic
pm.max_children = 30
pm.start_servers = 9
pm.min_spare_servers = 4
pm.max_spare_servers = 14
rlimit_files = 131072
rlimit_core = unlimited
Error logs
/var/log/nginx/php5-fpm.log
[29-Jul-2015 14:37:09] WARNING: [pool www] child 259 exited on signal 11 (SIGSEGV - core dumped) after 1339.412219 seconds from start
/var/log/nginx/error.log
2015/07/29 14:37:09 [error] 141#0: *2810 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: x.x.x.x, server: _, request: "GET /suggestions/business?q=Selectfrom HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "example.com", referrer: "http://example.com/"
/var/log/nginx/php5-fpm.log
[29-Jul-2015 14:37:09] NOTICE: [pool www] child 375 started
/var/log/nginx/php5-fpm.log:[29-Jul-2015 14:37:56] WARNING: [pool www] child 290 exited on signal 11 (SIGSEGV - core dumped) after 1078.606356 seconds from start
Coredump
Core was generated by php-fpm: pool www.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f41ccaea13a in memcached_io_readline(memcached_server_st*, char*, unsigned long, unsigned long&) () from /usr/lib/x86_64-linux-gnu/libmemcached.so.10
dmesg
[Wed Jul 29 14:26:15 2015] php5-fpm[12193]: segfault at 7f41c9e8e2da ip 00007f41ccaea13a sp 00007ffcc5730ce0 error 4 in libmemcached.so.10.0.0[7f41ccad2000+32000]
[Wed Jul 29 14:28:26 2015] php5-fpm[12211]: segfault at 7f41c966b2da ip 00007f41ccaea13a sp 00007ffcc5730ce0 error 4 in libmemcached.so.10.0.0[7f41ccad2000+32000]
[Wed Jul 29 14:29:16 2015] php5-fpm[12371]: segfault at 7f41c9e972da ip 00007f41ccaea13a sp 00007ffcc5730b70 error 4 in libmemcached.so.10.0.0[7f41ccad2000+32000]
[Wed Jul 29 14:35:36 2015] php5-fpm[12469]: segfault at 7f41c96961e9 ip 00007f41ccaea13a sp 00007ffcc5730ce0 error 4 in libmemcached.so.10.0.0[7f41ccad2000+32000]
[Wed Jul 29 14:35:43 2015] php5-fpm[12142]: segfault at 7f41c9e6c2bd ip 00007f41ccaea13a sp 00007ffcc5730b70 error 4 in libmemcached.so.10.0.0[7f41ccad2000+32000]
[Wed Jul 29 14:37:07 2015] php5-fpm[11917]: segfault at 7f41c9dd22bd ip 00007f41ccaea13a sp 00007ffcc5730ce0 error 4 in libmemcached.so.10.0.0[7f41ccad2000+32000]
[Wed Jul 29 14:37:54 2015] php5-fpm[12083]: segfault at 7f41c9db72bd ip 00007f41ccaea13a sp 00007ffcc5730ce0 error 4 in libmemcached.so.10.0.0[7f41ccad2000+32000]
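
All seven dmesg lines report the same instruction pointer (ip 00007f41ccaea13a), so every crash happens at the same instruction in libmemcached. Subtracting the mapping base shown in brackets from the ip gives the offset inside the library. A minimal sketch of that arithmetic, assuming a POSIX shell with 64-bit arithmetic; segv_offset is a hypothetical helper name.

```shell
# segv_offset: compute the offset of a faulting instruction inside a
# shared library from a kernel segfault line: offset = ip - mapping base.
# $1 = ip value, $2 = mapping base, both hex without the 0x prefix.
# The function name is hypothetical.
segv_offset() {
    printf '0x%x\n' "$(( 0x$1 - 0x$2 ))"
}

# Values taken from the dmesg lines above
segv_offset 00007f41ccaea13a 7f41ccad2000
```

This prints 0x1813a. An offset like this can be handed to a tool such as addr2line together with the library file, provided debug symbols are installed; here the core dump already names the function, memcached_io_readline.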

While googling for this same issue, and trying hard to find a solution that was related neither to sessions (because I have ruled that out) nor to bad PHP code (because I have several websites running precisely the same version of WordPress, and none have issues... except for one), I came upon an answer saying that a possible solution involved removing some buggy extension (usually memcache/d, but it could be something else).
Since I had this same site working flawlessly on one Ubuntu server, when switching to a newer server, I immediately suspected that it was the migration from PHP 5.5 to 7 that caused the problem. It was just strange because no other website was affected. Then I remembered that another thing was different on this new server: I had also installed New Relic. This is both an extension and a small server that runs in the background and sends a lot of analytics data to New Relic for processing. Allegedly, it's a PHP 5 extension, but, surprisingly, it loads well on PHP 7, too.
Now here comes the tricky bit. At some point, I had installed W3 Total Cache for the WordPress installation of that particular website. Subsequently, I saw that the performance of that server was so stellar that W3TC was unnecessary, and simply stuck to a much simpler configuration, so I could uninstall W3TC. That's all very nice, but... I forgot that I had also turned on New Relic in W3TC (allegedly, it adds some extra analytics data to be sent to New Relic). When I uninstalled W3TC, there was probably 'something' left in the New Relic configuration on my server which was still attempting to send data through the W3TC interface (assuming that W3TC has an interface... I really have no idea how it works at that level), and, because that specific bit of code was missing, the php-fpm handler for that website would fail... some of the time. Not all the time, because I'm assuming that, in most cases, nginx was sending static pages back. Or maybe php-fpm, set to 'recycle' after 100 calls or so, would crash on stop. Whatever exactly was happening, it was definitely related to New Relic: as soon as I removed the New Relic extension from PHP, that website went back to working normally.
Because this is such a specific scenario, I'm just writing this as an answer, in the remote chance that someone in the future googles for the exact problem.

In my case it was related to Zend Debugger/Xdebug. It forwards some TCP packets to the IDE (PhpStorm), which was not listening on that port (debugging was off). The solution is either to disable these extensions or to enable debug listening on the debugging port.

I had this problem after installing xdebug, adding some properties to /etc/php/7.1/fpm/php.ini and restarting nginx. This is running on a Homestead Laravel box.
Simply restarting the php7.1-fpm service solved it for me.

It can happen if PHP is unable to write the session information to a file. On RHEL-style systems the default location is /var/lib/php/session. You can change it with the session.save_path configuration directive.
phpMyAdmin having problems on nginx and php-fpm on RHEL 6

In my case it was Xdebug. After uninstalling it, it got back to normal.

In my case, it was caused by the New Relic PHP Agent. Therefore, for a specific function that caused a crash, I added this code to disable New Relic:
if (function_exists('newrelic_ignore_transaction')) {
    newrelic_ignore_transaction();
}
Refer to: https://discuss.newrelic.com/t/how-to-disable-a-specific-transaction-in-php-agent/42384/2

In our case it was caused by Guzzle + New Relic. The New Relic Agent changelog mentions a Guzzle fix in version 7.3, but even version 8.0 didn't work for us, so something is still wrong. This was happening only in the two of our scripts that use Guzzle. We found two solutions:
Set newrelic.guzzle.enabled = false in newrelic.ini. You will lose the data in the External Services tab this way, but you might not need it anyway.
Downgrade the New Relic Agent to version 6.x, which somehow also works.
If you are reading this after something newer than version 8.0 has been released, you could also try updating the New Relic Agent to the latest version; maybe it has been fixed.

In my case I had deactivated the buffering function ob_start("buffer"); in my code ;)

A possible problem is PHP 7.3 + Xdebug. Please change Xdebug 2.7.0beta1 to Xdebug 2.7.0rc1 or the latest version of Xdebug.

For some reason, when I remove profile from my xdebug.ini modes, it fixes it for me.
i.e. change
xdebug.mode=debug,develop,profile
to
xdebug.mode=debug,develop

Apache error log with mmap cache errors

I found many errors like this one
[Wed Nov 06 14:34:01 2013] [warn-phpd] mmap cache can't open C:\www\somefile.php (pid 4484 th 1668)
in my Apache error.log file. I tried to pinpoint the source of the error for some time but with no luck so far.
I found out that PHP OPcache is not the culprit.
The PHP error_log did not help, and I don't think my PHP source code affects the error.
My stack: Apache 2.4.6, Windows, PHP 5.4.20
Did anyone encounter the same error?
Note: The error message I get is not the same as, for example, the error:
[Mon Dec 1 21:08:20 2008] [warn-phpd] mmap cache can't open /var/www/vhosts/domain.com/httpdocs/file.php - Permission denied (pid 7831)
where there is a reason why mmap can't open the file.

This is caused by the total number of files that are opened by the server. If this is at a hosting company, they should be able to resolve it for you; if you are on your own system, try these steps:
Edit the Apache startup script, \Program Files\Apache Software Foundation\Apache2.2\etc\init.d\httpd (it may be different on your system), and add this before anything else:
ulimit -n 20480 #Raise the ulimit to a higher value than you have
Then restart Apache using httpd.exe -k restart
Hope this points you in a general direction.

Disable MMAP. It's not supported on Windows.
It's an efficient method to map files to memory, to work on their content. Similar story with sendfile, an efficient method to send the content of a file as a response.
# https://httpd.apache.org/docs/2.4/en/mod/core.html#enablemmap
EnableMMAP Off
EnableSendfile Off

[Mon Dec 1 21:08:20 2008] [warn-phpd] mmap cache can't open /var/www/vhosts/domain.com/httpdocs/file.php - Permission denied (pid 7831)
It seems that mmap doesn't have the rights to open the file. Check the permissions of the file and of its folder.

Connection reset by peer: mod_fcgid: error reading data from FastCGI server

I am having an issue with PHP: my app tries to run a PHP backup script and suddenly returns an HTTP 500 error. I have checked the logs and this is what they say:
[Tue Aug 28 14:17:28 2012] [warn] [client x.x.x.x] (104)Connection reset by peer: mod_fcgid: error reading data from FastCGI server, referer: http://example.com/backup/backup.php
[Tue Aug 28 14:17:28 2012] [error] [client x.x.x.x] Premature end of script headers: backup.php, referer: http://example.com/backup/backup.php
Does anyone know how to fix this? I'm really stuck here and can't find a solution on the internet.
I hope someone can share their knowledge.
Thanks.
James

I managed to solve this by adding FcgidBusyTimeout. Posting it in case anyone has a similar issue.
Here are the settings in my apache.conf:
<VirtualHost *:80>
    .......
    <IfModule mod_fcgid.c>
        FcgidBusyTimeout 3600
    </IfModule>
</VirtualHost>

I had very similar errors in the Apache2 log files:
(104)Connection reset by peer: mod_fcgid: error reading data from FastCGI server
Premature end of script headers: phpinfo.php
After checking the wrapper scripts and Apache2 settings, I realized that /var/www/ did not have the appropriate permissions, so the FCGId wrapper scripts could not be read at all.
ls -la /var/www
drwxrws--- 5 www-data www-data 4096 Oct 7 11:17 .
For my scenario chmod o+rx /var/www was required, since the SuExec users in use are not members of the www-data group, and for security reasons they should not be.

If you want to install a PHP version < 5.3.0, you must replace
--enable-cgi
with:
--enable-fastcgi
in your ./configure statement, excerpt from the php.net doc:
--enable-fastcgi
If this is enabled, the CGI module will be built with support for FastCGI also. Available since PHP 4.3.0
As of PHP 5.3.0 this argument no longer exists and is enabled by --enable-cgi instead.
After compilation, the output of ./php-cgi -v should look like this:
PHP 5.2.17 (cgi-fcgi) (built: Jul 9 2013 18:28:12)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2010 Zend Technologies
Note the (cgi-fcgi).

I had this issue and realized that the file cgi-bin/php-fcgi had no execute permission.
It had mode 644 while it should have mode 755.
Setting the correct mode was impossible (probably because the file was open or something), so I copied the file from another domain's directory where it already had the proper permissions, and that fixed everything.

I had the same problem with a different and simple solution.
Problem
I installed PHP 5.6 following the accepted answer to this question on Ask Ubuntu.
After using Virtualmin to switch a particular virtual server from PHP 5.5 to PHP 5.6, I received a 500 Internal Server Error and had the same entries in the apache error log:
[Tue Jul 03 16:15:22.131051 2018] [fcgid:warn] [pid 24262] (104)Connection reset by peer: [client 10.20.30.40:23700] mod_fcgid: error reading data from FastCGI server
[Tue Jul 03 16:15:22.131101 2018] [core:error] [pid 24262] [client 10.20.30.40:23700] End of script output before headers: index.php
Cause
Simple: I hadn't installed the php5.6-cgi package.
Fix
Installing the package and reloading Apache solved the problem:
sudo apt-get install php5.6-cgi if you are using PHP 5.6
sudo apt-get install php5-cgi if you are using a different PHP 5 version
sudo apt-get install php7.0-cgi if you are using PHP 7
Then use service apache2 reload to apply the configuration.

The famous Moodle "replace.php" script can generate this situation too.
For me it was taking ages to run and then failed with a 500 message in the browser and also with the above error message in my apache error log file.
I followed up on james-wise's answer:
FcgidBusyTimeout is clearly described in the Apache documentation. I tried this: I doubled the amount of time Apache gives my script to run by inserting the following line in /etc/apache2/mods-available/fcgid.conf:
FcgidBusyTimeout 600
Then I restarted Apache and tried to run my replace.php script again.
Fortunately this time the script instance ran to completion, so for my purposes this served as a solution.

I came across this one while debugging a virtualmin/apache related error.
In my case, I am running virtualmin and had in my virtual machine's php.ini
safe_mode=On.
In my Virtual Machine's error log, I was getting the fcgi Connection reset by peer: mod_fcgid: error reading data from FastCGI server
In my main apache error log I was getting:
PHP Fatal error: Directive 'safe_mode' is no longer available in PHP in Unknown on line 0
In my case, I simply set safe_mode = Off in my php.ini and restarted apache.
stackoverflow.com/questions/18683177/where-to-start-with-deprecated-directive-safe-mode-on-line-0-in-apache-error

Not in this asker's case, but often:
What does the "premature end of script headers" error mean?
That error means that the FCGI call was exited unexpectedly.
In some cases it means that the script (here "backup.php") crashed.
How to fix this?
If the crash of a script was the cause, fix the script so that it does not crash. Then this error is fixed, too. To find out if and why a script crashes, you need to debug it. For example you can check the PHP error log. Errors logged to STDERR normally go into the error handler of the FCGI.

I had the same problem with long-running scripts with the error messages
"Premature end of script headers: index.php" and "Connection reset by peer: mod_fcgid: error reading data from FastCGI server" in error_log.
After hours of testing, this is what helped me (CentOS 6, PHP-FPM 7, Plesk 12.5.30):
Edit the config file:
/etc/httpd/conf.d/fcgid.conf
Set a higher running time, in my case 600 seconds.
Create the new entry:
FcgidBusyTimeout 600
Adjust the following entries:
FcgidIOTimeout 600
FcgidConnectTimeout 600
restart httpd:
service httpd restart
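
After editing, it can be worth confirming which timeout directives are actually active, since a commented-out line silently leaves the default in place. A minimal sketch, assuming a POSIX shell and a sed that supports -E (GNU and BSD sed both do); the function name fcgid_timeouts is hypothetical.

```shell
# fcgid_timeouts: read an Apache config on stdin and print the active
# (not commented-out) FcgidBusyTimeout, FcgidIOTimeout and
# FcgidConnectTimeout directives as "<name> <seconds>".
# The function name is hypothetical.
fcgid_timeouts() {
    sed -E -n 's/^[[:space:]]*(Fcgid(Busy|IO|Connect)Timeout)[[:space:]]+([0-9]+).*$/\1 \3/p'
}

# Usage (path taken from the steps above):
# fcgid_timeouts < /etc/httpd/conf.d/fcgid.conf
```

The pattern matches the directive names exactly as written in the steps above; differently capitalized spellings would need a case-insensitive match.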

In CentOS releases suexec is compiled to run only in /var/www. If you try to set a DocumentRoot somewhere else you have to recompile it. The errors in the Apache log are:
(104)Connection reset by peer: mod_fcgid: error reading data from FastCGI server
Premature end of script headers: php5.fcgi

Just install php5-cgi.
On Debian:
sudo apt-get install php5-cgi
On CentOS:
sudo yum install php5-cgi

Check /var/lib/php/session and its permissions. This directory should be writable by the user PHP runs as, so that sessions can be stored.

As already mentioned this could be happening due to fcgi handler permission issues. If you're using suexec - don't forget to check if apache has this module enabled.

I increased the max execution time to 600 seconds. Job done!

If you're on a shared server like me: the host said it was a result of hitting memory limits, so they kill scripts, which results in the "Premature end of script headers" seen in this error. They referred me to this:
https://help.dreamhost.com/hc/en-us/articles/216540488-Why-did-procwatch-kill-processes-on-my-Shared-serv
After an increase in memory, the issues went away. I think a backup plugin (Updraft on WordPress) was perhaps overzealous in its duty/settings.

In my case I was using a custom extension for my PHP files and I had to edit /etc/apache2/conf-available/php7.2-fpm.conf and add the following code:
<FilesMatch ".+\.YOUR_CUSTOM_EXTENSION$">
    SetHandler "proxy:unix:/run/php/php7.2-fpm.sock|fcgi://localhost"
</FilesMatch>

I've tried the majority of answers that I've found on this issue. My issue was with wp-cron.php executing a particular function.
I'm working on Plesk CentOS7, Apache server.
I used this related question and suggested answer to help me find out how to adjust the fcgid.conf memory limit utilizing the command line.
Upon trying to troubleshoot the limits on the fcgid.conf file (/etc/httpd/conf.d/fcgid.conf) and restarting apache gracefully, I found that there was no change.
After seeing that other cron jobs were running properly, I decided to refer back to my function declared in functions.php and found that some of the arguments declared were not being specified properly, so there was a loop occurring that would eventually lead to the timeout.
Upon fixing this and running the cron again, it ran as it should.
Hope this helps someone else in a similar position!

I got the same problem (with Plesk 12 installed).
However, when I switched PHP execution from FastCGI to Apache module, the website worked.
Checked my suexec log:
$ cd /var/log/apache2/
$ less suexec.log
When you find something like this:
[2015-03-22 10:49:00]: directory is writable by others: (/var/www/cgi-bin/cgi_wrapper)
[2015-03-22 10:49:05]: uid: (10004/gb) gid: (1005/1005) cmd: cgi_wrapper
try these commands
$ chown root:root /var/www/cgi-bin/cgi_wrapper
$ chmod 755 /var/www/cgi-bin/cgi_wrapper
$ shutdown -r now
as root.
I hope it can help you.
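
To check a path before and after the chmod, the permission bits can be tested directly. A minimal sketch, assuming a POSIX shell and a find that supports -maxdepth (GNU and BSD find both do). The function name group_or_other_writable is hypothetical. As far as I know, suexec's "writable by others" check also rejects group-writable paths, so the sketch tests both bits; treat that as an assumption.

```shell
# group_or_other_writable: succeed if the given path has the group or
# the "other" write bit set. The function name is hypothetical.
group_or_other_writable() {
    [ -n "$(find "$1" -maxdepth 0 \( -perm -0002 -o -perm -0020 \) 2>/dev/null)" ]
}

# Usage (path taken from the log above):
# group_or_other_writable /var/www/cgi-bin/cgi_wrapper && echo "permissions too open"
```

A path with mode 755, as set by the chmod above, makes the function return failure, meaning nothing is left for suexec to object to on that count.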

Nginx + PHP-FPM 502 Bad Gateway

I am getting a 502 Bad Gateway from Nginx on a line of PHP code that works fine in other places of my program ($this->provider = new OAuthProvider();), and that has worked fine before. This is the message I get in the Nginx error log for each 502:
recv() failed (104: Connection reset by peer) while reading response header from upstream
In the PHP-FPM log there is a warning for each 502:
[WARNING] [pool www] child 17427 exited on signal 11 SIGSEGV after 142070.657176 seconds from start
After trying a number of changes to the nginx.conf I am stuck and would very much appreciate any pointers of what to do next.
I'm running Nginx 0.7.67 and PHP 5.3.2 on Ubuntu 10.04.

maybe http://pecl.php.net/bugs/bug.php?id=17689 or bug id #18138

Your PHP process crashed with a segfault ("signal 11 SIGSEGV"), which caused Nginx to see "connection reset by peer" (PHP is the "peer" in this case, and Nginx is telling you "Look, he hung up on me before I could get an answer from him").
Check out the PHP bug database page "How to report a bug someone will want to fix" to find out how to get a backtrace of the segfault so you can report it.

I had the same problem with APC, so I removed it and installed eAccelerator instead. No problems so far.

I had similar problems with nginx/lighttpd + php-fcgi (using spawn-fcgi). Do you use any opcode cache for PHP?
What I found quite some time ago is that XCache was causing strange behaviour in php-fcgi: some php-fcgi processes died randomly, and I was unable to find any pattern. I would recommend taking a look at the settings of APC (or whatever other opcode cache you use), if you are using any.
Right now I'm using nginx + php-fpm on FreeBSD and have no problems.

ZendOptimizer + APC + php-fpm 5.2.14 gives constantly reproducible SIGSEGV even on phpinfo();.

Try to switch suhosin off. Sometimes it crashes Apache.
