PHP-FPM performance tuning - bursts of traffic

I have a web application written in Laravel / PHP that is in the early stages and generally serves about 500 - 600 reqs/min. We use MariaDB and Redis for caching, and everything is on AWS.
For events we want to promote on our platform, we send a push notification to all mobile users, which results in a roughly 2-minute traffic burst that takes us to 3.5k reqs/min.
At our current scale, this completely bogs down the application servers' CPUs, which normally sit at around 10% utilization. The database and Redis clusters seem fine during the burst.
Looking at the logs, it seems all PHP-FPM worker processes in the pool get occupied and requests from the Nginx upstream start to queue.
We currently have:
three m4.large servers (2 cores, 8 GB RAM each)
dynamic PHP-FPM process management, with a max of 120 child processes (servers) on each box
My questions:
1) Should we increase the FPM pool? Memory-wise, it seems we're probably nearing our limit.
2) Should we decrease the FPM pool? It seems possible that we're spinning up so many processes that the CPU gets bogged down and can't actually complete any of them, so we might get better results with fewer.
3) Should we simply use larger boxes with more RAM and CPU, which would allow us to add more FPM workers?
4) Is there any other FPM performance tuning we should consider? We already use OPcache. Should we switch to static process management for FPM to cut the overhead of processes spinning up and down?

There are too many child processes relative to the number of cores.
First, you need to know the server's status both at normal load and during a burst.
1) Check the number of php-fpm processes.
ps -ef | grep 'php-fpm: pool' | wc -l
2) Check the load average. With 2 cores, a load average of 2 or more means work is starting to get delayed.
top
htop
glances
3) Depending on the service, start adjusting from about twice the number of cores.
; Example
;pm.max_children = 120  ; normal: pool 5, load 0.1 / burst: pool 120, load 5  (bad)
;pm.max_children = 4    ; normal: pool 4, load 0.1 / burst: pool 4, load 1
pm.max_children = 8     ; normal: pool 6, load 0.1 / burst: pool 8, load 2   (good)
; a load of 2 means both cores are fully utilized
It is more accurate to test the web server with a load similar to real traffic using Apache Bench (ab):
ab -c100 -n10000 http://example.com/
Time taken for tests: 60.344 seconds
Requests per second: 165.72 [#/sec] (mean)
100% 880 (longest request)
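Applied to the m4.large boxes in the original question, a starting point might look like the sketch below; the numbers are illustrative (roughly 4x the core count as a burst ceiling, as in the example above) and should be validated under burst-level load with ab.

; starting point for a 2-core box, to be adjusted against the load average
pm = dynamic
pm.max_children = 8        ; hard ceiling, far below the original 120
pm.start_servers = 4
pm.min_spare_servers = 2
pm.max_spare_servers = 6
pm.max_requests = 500      ; recycle workers periodically to contain leaks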

Related

Huge CPU load - php-fpm + nginx

I use php-fpm with STATIC process management, and the problem is that 2-3 of the 20 worker processes run at 80-100% CPU while the other php workers stay unused.
My question is: why do the other 17 processes stay unused?
We use an AWS c4.large instance.
Our Docker container is allocated 1024 CPU units and 2560 MB of RAM.
(Screenshots in the original post showed the Docker containers on the instance, all processes in the container, and top output.)
The PHP-FPM pm = static setting depends heavily on how much free memory your server has. If you are short on server memory, then pm = ondemand or pm = dynamic may be better options. On the other hand, if you have the memory available, you can avoid much of the PHP process manager (PM) overhead by setting pm = static at the maximum capacity of your server. In other words, when you do the math, pm.max_children should be set to the maximum number of PHP-FPM processes that can run without creating memory availability or cache pressure issues, yet not so high as to overwhelm the CPU(s) and leave a pile of pending PHP-FPM operations.
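To put numbers on "do the math": measure the average resident size of one worker and divide the memory you can spare by that figure. A rough sketch, assuming the FPM processes are named php-fpm (the figures are illustrative):

# average resident memory per php-fpm process, in MB
ps -C php-fpm -o rss= | awk '{ s += $1; n++ } END { if (n) printf "%.0f MB avg over %d processes\n", s/n/1024, n }'

# e.g. 8192 MB total - 2048 MB for OS and other services = 6144 MB for PHP
# 6144 MB / 60 MB per worker ~= 100 -> pm.max_children = 100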

How to compute the php-fpm child processes on a Kubernetes cluster

I recently migrated my application from a single server with Docker to Google Kubernetes Engine for scaling reasons. I am new to the Kubernetes platform and may not yet fully understand all of its concepts, but I do get the basics.
I have successfully migrated my application to a cluster of 3 nodes, each with 1 vCPU and 3.75 GB RAM.
Now I'm trying to work out the best configuration for the php-fpm processes running in a Kubernetes cluster. I have read a few articles on how to set up php-fpm processes, such as:
https://serversforhackers.com/c/php-fpm-process-management
https://www.kinamo.be/en/support/faq/determining-the-correct-number-of-child-processes-for-php-fpm-on-nginx
My cluster runs Elasticsearch, Redis, a frontend, and a REST API; as I understand Kubernetes, each has its own pods running on the cluster. I accessed the pod for the REST API and saw 1 vCPU and 3.75 GB RAM, matching my cluster specs, but only 1.75 GB of RAM is left, so I think other services or pods are using the memory.
So now I want to increase the following values, based on the articles I shared above:
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 4
pm.max_spare_servers = 8
But my problem is that since the pod runs on a shared worker node, if I change the configuration based on the available memory left (following the articles above on calculating pm.max_children), I might end up with a pod consuming all the remaining memory, leaving nothing to allocate for the other services. Does my problem make sense, or is there a concept I am missing?
Based on the articles, since my worker node has 3.75 GB RAM and the other services already consume about 1.5 GB, my best target is around 1 GB of RAM for php-fpm.
pm.max_children then comes to 1024 MB / 60 MB per child ≈ 17:
pm.max_children = 17
pm.start_servers = 8
pm.min_spare_servers = 7
pm.max_spare_servers = 10
pm.max_requests = 500
Which leads me to the question: how do you compute the php-fpm child processes on a Kubernetes cluster when other services or pods share the same resources?
Thank you for reading to the end, and thanks in advance for your input.
GKE comes with multiple system pods (such as kube-dns and fluentd). Some of these pods do not scale up much, which means that if you add additional nodes, they will have more available resources.
The nodes also run an OS, so some of the memory is assigned to that.
You can also view the resources available per node by using kubectl describe no | grep Allocatable -A 5
This will show you the amount of resources left after the node's consumption.
Using kubectl describe no | grep Allocated -A 5 you can view the amount of memory and CPU that is already requested by current pods.
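If the metrics-server addon is available (it typically is on GKE), kubectl top shows live usage per node and per pod, which is handy for sanity-checking the per-worker memory estimate:

kubectl top node
kubectl top pod --all-namespaces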
All this being said, you should choose the number of child processes based on your needs. Once you know the amount of memory the pod will need, set resource requests and limits in your pod config so that the Kubernetes scheduler can place the php-fpm pod on a node with sufficient resources.
Kubernetes' strength is that you tell it what you want and it will try to make that happen. Instead of worrying too much about how much you can fit, choose an amount for your pod based on your expected/required performance and tell Kubernetes that's how much memory you need. This way, you can also increase the number of pods with a Horizontal Pod Autoscaler (HPA) instead of managing and scaling the number of child processes.
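As a concrete illustration of those requests and limits, here is a minimal sketch of the relevant fragment of a Deployment manifest; the container name, image, and numbers are placeholders, sized for roughly 17 children at ~60 MB each:

# fragment of a Deployment spec (illustrative values)
containers:
- name: rest-api
  image: registry.example.com/rest-api:latest   # placeholder image
  resources:
    requests:
      memory: "1Gi"      # ~17 workers x 60 MB, plus headroom
      cpu: "500m"
    limits:
      memory: "1536Mi"   # hard cap; the pod is OOM-killed above this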

php-cgi.exe processes cause high CPU usage in IIS 7.5

I have a Windows Server with random spikes of high CPU usage, and looking at Process Explorer and Windows Task Manager, there is a high number of php-cgi.exe processes running concurrently, sometimes up to 6-8 instances, each taking around 10-15% of CPU. Sometimes it gets so bad that the server becomes unresponsive.
In the FastCGI settings I've set MaxInstances to 4, so in theory there shouldn't be more than 4 php-cgi.exe processes running simultaneously. Hence I would like some advice or directions on how to actually limit the number of instances to 4.
Additional notes: I've also set instanceMaxRequests to 10000, and PHP_FCGI_MAX_REQUESTS to 10000 as well.
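For reference, those settings live in the <fastCgi> section of applicationHost.config; below is a sketch with the values from the question (the PHP path is a placeholder). One thing worth checking: maxInstances applies per <application> entry, so if several entries exist (differing in path, arguments, or environment variables), each gets its own pool of up to 4 processes, which could explain seeing 6-8 in total.

<!-- applicationHost.config, system.webServer section (illustrative path) -->
<fastCgi>
  <application fullPath="C:\PHP\php-cgi.exe"
               maxInstances="4"
               instanceMaxRequests="10000">
    <environmentVariables>
      <environmentVariable name="PHP_FCGI_MAX_REQUESTS" value="10000" />
    </environmentVariables>
  </application>
</fastCgi>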

How to run 50k jobs per second with gearman

According to Gearman website
"A 16 core Intel machine is able to process upwards of 50k jobs per second."
I have a load balancer that distributes traffic across 4 machines, each with 8 cores. I want the ability to run 13k jobs per machine per second (which adds up to just over 50k jobs across the cluster).
Each job takes between 0.02 and 0.8 ms.
How many workers do I need to open for this kind of performance?
What steps do I need to take to spawn that many workers?
Depending on what kind of processing you're doing, this will require a little experimentation and load testing. Before you start, make sure you have a way to reboot the server without SSH, as you can easily peg the CPU. Follow these steps to find the optimum number of workers:
1. Begin by adding a number of workers equal to the number of cores minus one. If you have 8 cores, start with 7 workers (hopefully leaving a core free for things like SSH).
2. Run top and observe the load average. The load average should not be higher than the number of cores: for 8 cores, a load average of 7 or above would indicate you have too many workers, while a lower load average means you can try adding another worker.
3. If you added another worker in step 2, observe the load average again, and also watch the increase in RAM usage.
4. Repeat the above steps until you run out of either CPU or RAM.
When doing parallel processing, keep in mind that you could run into a point of diminishing returns. Read about Amdahl's law for more information.
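As for actually launching and supervising that many workers, a process manager is the usual approach. A minimal supervisord sketch, assuming a Gearman worker script at /path/to/worker.php (a hypothetical path):

[program:gearman-worker]
; run cores-minus-one worker processes, per the steps above
command=php /path/to/worker.php
process_name=%(program_name)s_%(process_num)02d
numprocs=7
autostart=true
autorestart=true

Raise numprocs one step at a time while watching the load average, as described in the steps above.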

Nginx scaling and bottleneck identification on an EC2 cluster

I am developing a big application and I have to load test it. It is an EC2-based cluster with one High-CPU Extra Large instance for the application, which runs PHP / Nginx.
The application is responsible for reading data from a Redis server which holds some 5k - 10k key-value pairs; it then builds the response, logs the data to a MongoDB server, and replies to the client.
Whenever I send a request to the app server, it does all its computation in about 20 - 25 ms, which is awesome.
I am now trying to do some load testing: I run a PHP-based app on my laptop that sends many thousands of requests to the server over 20 - 30 seconds. During this load period, whenever I open the app URL in the browser, it replies with an execution time of around 25 - 35 ms, which is again cool, so I am sure Redis and Mongo are not the bottleneck. But the load-test requests take about 25 seconds to get a response back.
The High-CPU Extra Large instance has 8 GB RAM and 8 cores.
Also, during the load test, the top command shows about 4 - 6 php-cgi processes consuming some 15 - 20% of CPU each.
I have 50 worker processes in Nginx and 1024 worker connections.
What could be causing the bottleneck?
If this doesn't work out, I am seriously considering moving to a whole Java application with an embedded web server and an embedded cache.
UPDATE: increasing PHP_FCGI_CHILDREN to 8 halved the response time during load.
50 worker processes is too many; you need only one worker process per CPU core. Using more worker processes causes context switching between processes, which wastes time.
What you can do now:
1. Set worker_processes to the minimum (one worker per CPU, e.g. 4 worker processes if you have 4 CPU cores), but raise worker_connections to the maximum (10240, for example); see the config sketch after this list.
2. Tune the TCP stack via sysctl; you can hit stack limits when you have many connections.
3. Get statistics from the nginx stub_status module (you can use munin + nginx; it's easy to set up and gives you enough information about system status).
4. Check the nginx error.log and the system message log for errors.
5. Tune nginx itself (decrease connection timeouts and the maximum request size).
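A minimal nginx.conf sketch of points 1 and 3, with illustrative values for an 8-core box:

# nginx.conf fragment (illustrative values)
worker_processes 8;            # one worker per CPU core

events {
    worker_connections 10240;  # raise the per-worker connection cap
}

http {
    server {
        location /nginx_status {
            stub_status on;    # counters for munin or other monitoring
            allow 127.0.0.1;   # restrict to local monitoring agents
            deny all;
        }
    }
}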
I hope that helps you.
