I have a virtual machine with 64 vCPUs and 256GB of memory. Recently, I decided to perform some stress tests on the website that is running on this virtual machine. The entire VM is only for this website.
The first test I ran was with 20,000 users per second, and the average response time was around 1400ms. During the test, the site was not usable.
After that, I decided to inspect the top processes to identify the source of the problem. These are the processes and their CPU utilization during the test:
top - 10:30:19 up 1 day, 34 min, 0 users, load average: 8.39, 3.04, 1.46
Tasks: 711 total, 2 running, 709 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.0 us, 9.8 sy, 3.8 ni, 79.2 id, 0.2 wa, 0.0 hi, 0.9 si, 0.0 st
MiB Mem : 257925.6 total, 219425.1 free, 3658.2 used, 34842.3 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 252346.8 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
218159 mysql 20 0 6911232 96204 19792 S 491.4 0.0 4:24.99 mysqld
139405 nobody 20 0 54948 34196 6128 D 44.9 0.0 0:52.17 litespeed
218251 obl74+ 21 1 347708 29228 19328 S 40.9 0.0 0:20.83 lsphp
218402 obl74+ 21 1 347708 29152 19264 S 40.9 0.0 0:22.35 lsphp
218955 obl74+ 21 1 273004 21336 12472 D 40.9 0.0 0:22.39 lsphp
218957 obl74+ 21 1 273004 21336 12472 D 40.9 0.0 0:22.22 lsphp
218961 obl74+ 21 1 273004 21336 12472 S 40.9 0.0 0:22.37 lsphp
218963 obl74+ 21 1 273004 21328 12468 S 40.9 0.0 0:22.31 lsphp
218252 obl74+ 21 1 347708 29228 19328 D 40.5 0.0 0:22.42 lsphp
218407 obl74+ 21 1 347708 29152 19264 D 40.5 0.0 0:22.30 lsphp
218956 obl74+ 21 1 273004 21332 12472 S 40.5 0.0 0:20.73 lsphp
218959 obl74+ 21 1 273004 21336 12472 S 40.5 0.0 0:22.13 lsphp
Interestingly, despite the website's poor performance during the test, neither the CPU nor memory usage was particularly high. Also, during the test, CyberPanel indicated a CPU usage of 19% and a memory usage of 2%. Therefore, I conclude that the server is not experiencing any resource constraints, as it is not utilizing all its CPU and memory. However, it is still lagging for some reason.
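For anyone debugging something similar, a quick way to see whether MySQL is saturating its connections or threads (rather than CPU or RAM) is to poll its status counters while the test runs. A minimal sketch, assuming shell access to the box; the credentials are placeholders:

# Poll MySQL concurrency counters every couple of seconds during the load test
while true; do
  mysql -u root -p'secret' -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('Threads_running','Threads_connected','Max_used_connections','Aborted_connects');"
  sleep 2
done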
Then, I decided to remove the components related to MySQL from the page on which I performed the stress test. The outcome was much more stable.
top - 10:43:54 up 1 day, 47 min, 0 users, load average: 0.87, 1.23, 1.41
Tasks: 705 total, 5 running, 699 sleeping, 0 stopped, 1 zombie
%Cpu(s): 2.8 us, 1.0 sy, 0.4 ni, 95.2 id, 0.0 wa, 0.0 hi, 0.5 si, 0.0 st
MiB Mem : 257925.6 total, 218249.7 free, 3910.0 used, 35765.9 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 252098.9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
139416 nobody 20 0 53200 32480 6128 S 18.3 0.0 0:47.00 litespeed
139402 nobody 20 0 52928 33308 7204 S 16.6 0.0 0:44.40 litespeed
139409 nobody 20 0 54900 34136 6188 S 16.6 0.0 0:46.38 litespeed
139410 nobody 20 0 49904 29156 6128 S 16.6 0.0 0:35.43 litespeed
139414 nobody 20 0 51688 30936 6128 R 16.6 0.0 0:45.46 litespeed
139415 nobody 20 0 55492 35280 6680 R 15.9 0.0 0:46.24 litespeed
139412 nobody 20 0 52112 31420 6188 S 15.6 0.0 0:45.05 litespeed
139404 nobody 20 0 50396 29644 6128 S 15.3 0.0 0:44.83 litespeed
139413 nobody 20 0 44700 23816 6128 S 15.3 0.0 0:21.83 litespeed
139406 nobody 20 0 50752 30004 6128 S 15.0 0.0 1:05.25 litespeed
According to CyberPanel, during the new test, the CPU usage was 4% and the memory usage was 2%.
Therefore, it seems clear that the issue lies with MySQL. I am currently using the default my.cnf configuration provided by CyberPanel, and I have tried various other configurations found on the internet, but nothing has improved performance even slightly. I've also tried tools like MySQLTuner, but that didn't change anything either.
The MySQL part that I removed for the second test was a basic query of a table that contained 7 rows. It verified the user's IP address to determine if they were on the IP whitelist. This operation should not have posed a significant problem.
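To separate the cost of the query itself from the per-request connection overhead, the lookup can be hammered directly from the shell. A rough sketch -- the table, column, database name and credentials below are made-up placeholders:

# Open 200 short-lived connections in parallel, each running the whitelist lookup once
for i in $(seq 1 200); do
  mysql -u app_user -p'secret' app_db \
    -e "SELECT 1 FROM ip_whitelist WHERE ip = '203.0.113.5';" &
done
wait   # if the query is cheap but this still crawls, the overhead is in the connections, not the query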
In both tests, there seems to be a threshold or bottleneck beyond which the site's lag increases sharply. Despite ample free memory and CPU, something is acting as a limiting factor.
Some might argue that a rate of 20,000 users per second is excessive and unrealistic. However, even when I conducted the test with only 250 users per second, the outcome was the same: the website was extremely slow and not usable.
At this point I am totally lost. I am uncertain as to where to focus my efforts and what steps to take next to decrease the average response time. I would greatly appreciate any insightful comments or suggestions you may have and I thank you in advance for your time and consideration.
UPDATE
I have reinstalled the operating system and CyberPanel, and it appears that the problem has been resolved. Although I am uncertain about what went wrong previously, I suspect that an incorrect setting was responsible.
Suggestions to consider for your MySQL (my.cnf) configuration
innodb_buffer_pool_size=8G # from ~ 192G because current data is less than 1G
innodb_io_capacity=500 # from 200 to utilize more of your SSD IOPS
innodb_lru_scan_depth=100 # from 1024 to conserve ~90% of the CPU cycles this function uses every second
key_buffer_size=20M # from ~128M; only needed for tmp table management since there are NO MyISAM tables
sql_log_bin=0 # from ON unless you have a need for this specific log
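After a restart, you can confirm the server is actually running with the new values. A small sketch; adjust the credentials:

# Verify the running values after restarting MySQL
mysql -u root -p'secret' -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('innodb_buffer_pool_size','innodb_io_capacity','innodb_lru_scan_depth','key_buffer_size');"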
Please view profile for contact info. Other performance enhancements available.
For 20K users per second, you need multiple servers and switches in front of them. Period. End of discussion.
Well, OK, I'll discuss it further.
When MySQL is presented with lots of 'simultaneous' users, it plays fair with them -- each is given equal access to all resources. This is fine until it falls off the cliff, at which point most of the processing effort goes into sharing the resources. All the threads will eventually finish, but each takes so long that you (the DBA) will think the server has crashed and pull the plug.
A simple cure is to lower (YES, lower) the value of max_connections. It turns out that the "cliff" is at a few dozen connections.
In benchmarking, one throws more and more at the server until it croaks. That's usually at a few dozen connections.
In real life, web pages are not doing 100% database operations; they are letting the user react, building pages, etc. So a max_connections of a few hundred is realistic.
Once it reaches the cliff, latency goes through the roof. You might expect throughput to keep increasing too, but it actually decreases slightly. I believe this is because the threads are stumbling over each other too much. Think about any "cache" (buffer_pool, open_tables, table_definitions, etc.): if "too many" threads are running, the caches may become ineffective.
Think of a market with so many shoppers that they spend most of their time maneuvering around other people. More shoppers per hour can get through the market if new shoppers are kept from entering when it is "full". max_connections is that limiter.
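As a rough sketch of what lowering the limiter looks like in practice (100 is only an illustration; pick a value based on your own testing, and the credentials are placeholders):

# Try a lower limit at runtime first (reverts at the next restart)
mysql -u root -p'secret' -e "SET GLOBAL max_connections = 100;"
# Then persist it under the [mysqld] section of my.cnf:
#   max_connections = 100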
INDEXes needed
ALTER TABLE table_name ADD INDEX(zone);
ALTER TABLE table_name ADD INDEX(IPPool);
(Then take a crash course in the benefits of Indexes (aka "KEYs").)
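To confirm the new indexes are actually being used, EXPLAIN on the real query should list them in the key column. A sketch with placeholder database, table, and value:

mysql -u root -p'secret' your_db -e "EXPLAIN SELECT * FROM table_name WHERE zone = 'some_zone'\G"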
https://www.php.net/manual/en/features.commandline.webserver.php
From PHP 7.4 onwards, I assume the built-in PHP server is capable of handling multiple incoming requests, up to the number set in the environment variable PHP_CLI_SERVER_WORKERS.
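For reference, this is roughly how the variable is set when launching the built-in server directly (the port and docroot are just examples):

# PHP 7.4+: start the built-in server with 8 worker processes
PHP_CLI_SERVER_WORKERS=8 php -S 0.0.0.0:8000 -t public/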
I have a web app composed of a couple dozen AJAX-powered lists. On the first page load, using the built-in server, it slows to a crawl and usually fails with timeouts in the PHP scripts.
I read about the feature above, added the environment variable (PHP runs in a Docker container), shelled into my container, and ran top/ps; I can now see a number of PHP processes:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 722864 21932 14448 S 0.3 1.1 0:09.91 symfony
20 root 20 0 210892 48188 36544 S 0.0 2.4 0:00.79 php7.4
21 root 20 0 205676 33460 24500 S 0.0 1.6 0:00.22 php7.4
22 root 20 0 208212 40908 29640 S 0.0 2.0 0:00.42 php7.4
23 root 20 0 210644 42236 30836 S 0.0 2.1 0:00.61 php7.4
24 root 20 0 208764 40784 31176 S 0.0 2.0 0:01.14 php7.4
25 root 20 0 205804 33588 24508 S 0.0 1.6 0:00.22 php7.4
...
I am using Symfony to start a dev server, but no matter what I do, none of the processes seem to be carrying any of the load. What am I missing?
I switched to the built-in PHP server about 2 years ago, from NGINX. It made my Vagrant setup (now Docker) easier, but performance took a hit, which I've dealt with. I'd like to improve the responsiveness of the app with this approach if possible.
Any ideas?
I'm managing a server on AWS, a t2.micro instance (1 GiB of memory) with Debian 9.
Main services installed are:
Nginx (active)
MySQL (active)
Supervisor (stopped)
Redis (active)
These services support 10 enabled Laravel (PHP) projects.
The problem is that free memory is always between 60MB and 75MB, and I can't even start the supervisor service or install new project dependencies via composer without crashing everything (including the SSH session):
$ free -m
total used free shared buff/cache available
Mem: 994 477 71 140 444 233
Swap: 0 0 0
The processes consuming memory are:
$ ps aux | awk '{print $6/1024 " MB\t\t" $11}' | sort -n
...
10.9492 MB php-fpm:
104.473 MB php-fpm:
120.109 MB php-fpm:
144.262 MB php-fpm:
380.344 MB /usr/sbin/mysqld
Actually, I have only 2 (not large) MySQL databases. Why is MySQL consuming 380MB? Is there a way to optimise it?
And what about PHP-FPM: is there really a need to run 4 different processes of ~100MB each? How can I reduce this?
The default MySQL settings are optimized for general situations. If MySQL consumes 380 MB (these days a small amount of memory), that is probably normal. Still, there are a few things you could do with MySQL:
use MyISAM instead of InnoDB (you can turn off the InnoDB engine entirely -- refer to the MySQL docs)
change some memory cache parameters (please refer to http://www.tocker.ca/2014/03/10/configuring-mysql-to-use-minimal-memory.html and the MySQL documentation), but in that case you might see some performance degradation of your MySQL server; a rough sketch of such settings follows below
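A rough sketch of the kind of low-memory overrides that article describes; the values and the config path are illustrative only, so tune them for your own data and distribution:

# Drop-in override with low-memory settings; the path may differ on your distro
cat >> /etc/mysql/conf.d/lowmem.cnf <<'EOF'
[mysqld]
performance_schema = OFF
innodb_buffer_pool_size = 64M
key_buffer_size = 8M
max_connections = 50
EOF
systemctl restart mysql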
Best of all would be to use cheaper hosting, because AWS is overpriced; you can buy a more powerful server for the same money.
I am getting warnings from my Debian 8 server through Nagios about the average CPU load; top shows a load average of:
4.01, 3.66, 3.37
It sometimes rises to 5, at which point Nagios shows critical. I am using php5-fpm and nginx with Magento on this server. It has 20 cores and 12 GB of RAM.
top - 10:46:12 up 46 min, 1 user, load average: 3.47, 3.57, 3.38
Tasks: 257 total, 8 running, 249 sleeping, 0 stopped, 0 zombie
%Cpu(s): 23.5 us, 0.6 sy, 0.0 ni, 75.9 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 12327560 total, 9178948 used, 3148612 free, 451868 buffers
KiB Swap: 5241852 total, 0 used, 5241852 free. 4740264 cached Mem
Could anybody help me with this issue?
What is the normal CPU load average for this machine?
I've got a very weird problem. My WordPress site was previously working fine, but recently and suddenly it became very slow, and nginx sometimes returns 502 Bad Gateway. I did some investigation and noticed that the PHP-FPM processes consume all the CPU even when there are no requests. Every time I restarted WordPress, the idle CPU percentage instantly dropped to 0%, several PHP-FPM processes were working with high CPU consumption, and the MySQL process consumed a lot of CPU as well, whether or not there was a request.
I tried stopping all the plugins: didn't work, same symptom.
I tried updating WordPress to the latest version but WITHOUT connecting to the database: CPU usage was normal.
I tried updating WordPress to the latest version while keeping only the file wp-config.php (database username + password): didn't work, same symptom.
This is so weird, but it seems to be related to the MySQL database? But why?
Thanks in advance.
top - 02:08:12 up 56 min, 1 user, load average: 10.18, 9.41, 8.68
Tasks: 115 total, 11 running, 104 sleeping, 0 stopped, 0 zombie
Cpu(s): 36.6%us, 10.4%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, 53.0%st
Mem: 766112k total, 682116k used, 83996k free, 239696k buffers
Swap: 1572860k total, 2664k used, 1570196k free, 125412k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23854 www 20 0 59952 30m 4688 R 44.5 4.1 3:56.99 php-fpm
24337 www 20 0 60204 32m 4520 R 44.2 4.3 3:53.83 php-fpm
24300 www 20 0 52004 23m 4448 R 42.9 3.2 3:48.47 php-fpm
24287 www 20 0 54324 27m 5140 R 37.6 3.7 3:54.34 php-fpm
23855 www 20 0 54824 26m 4504 R 35.6 3.5 3:57.25 php-fpm
24323 www 20 0 46108 19m 4856 R 35.6 2.6 3:57.73 php-fpm
24274 www 20 0 56356 28m 4548 R 35.2 3.9 3:56.55 php-fpm
24374 www 20 0 55080 26m 4524 R 33.9 3.5 3:52.03 php-fpm
24385 www 20 0 63820 33m 4428 R 33.2 4.5 3:51.53 php-fpm
24394 www 20 0 57900 29m 4444 R 30.6 3.9 3:50.09 php-fpm
24250 mysql 20 0 214m 29m 5860 S 23.9 3.9 1:35.21 mysqld
6 root RT 0 0 0 0 S 1.7 0.0 0:01.31 watchdog/0
216 root 20 0 0 0 0 S 1.0 0.0 0:02.96 kjournald
23850 www 20 0 18624 11m 868 S 0.3 1.6 0:01.89 nginx
23851 www 20 0 18812 12m 876 S 0.3 1.6 0:03.61 nginx
27889 root 20 0 2712 1136 880 R 0.3 0.1 0:00.81 top
It turned out it was caused by an XML-RPC attack. It's resolved now.
For more information:
https://medium.com/@tturnbull/throttle-xmlrpc-php-attacks-on-wordpress-with-nginx-3cc4a12b7f76
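For anyone hitting the same thing, the usual mitigation in the spirit of that article is to block or rate-limit xmlrpc.php at the nginx level. A minimal sketch; the snippet path is illustrative, and keep xmlrpc.php reachable if you rely on Jetpack or pingbacks:

# Deny xmlrpc.php for the WordPress site, then reload nginx
cat > /etc/nginx/snippets/block-xmlrpc.conf <<'EOF'
location = /xmlrpc.php {
    deny all;
}
EOF
# include this snippet inside the site's server { } block, then:
nginx -t && systemctl reload nginx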