I have a virtual machine with 64 vCPUs and 256GB of memory. Recently, I decided to perform some stress tests on the website that is running on this virtual machine. The entire VM is only for this website.
The first test I ran was with 20,000 users per second, and the average response time was around 1400ms. During the test, the site was not usable.
After that, I decided to inspect the top processes to identify the source of the problem. These are the processes and their CPU utilization during the test:
top - 10:30:19 up 1 day, 34 min, 0 users, load average: 8.39, 3.04, 1.46
Tasks: 711 total, 2 running, 709 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.0 us, 9.8 sy, 3.8 ni, 79.2 id, 0.2 wa, 0.0 hi, 0.9 si, 0.0 st
MiB Mem : 257925.6 total, 219425.1 free, 3658.2 used, 34842.3 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 252346.8 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
218159 mysql 20 0 6911232 96204 19792 S 491.4 0.0 4:24.99 mysqld
139405 nobody 20 0 54948 34196 6128 D 44.9 0.0 0:52.17 litespeed
218251 obl74+ 21 1 347708 29228 19328 S 40.9 0.0 0:20.83 lsphp
218402 obl74+ 21 1 347708 29152 19264 S 40.9 0.0 0:22.35 lsphp
218955 obl74+ 21 1 273004 21336 12472 D 40.9 0.0 0:22.39 lsphp
218957 obl74+ 21 1 273004 21336 12472 D 40.9 0.0 0:22.22 lsphp
218961 obl74+ 21 1 273004 21336 12472 S 40.9 0.0 0:22.37 lsphp
218963 obl74+ 21 1 273004 21328 12468 S 40.9 0.0 0:22.31 lsphp
218252 obl74+ 21 1 347708 29228 19328 D 40.5 0.0 0:22.42 lsphp
218407 obl74+ 21 1 347708 29152 19264 D 40.5 0.0 0:22.30 lsphp
218956 obl74+ 21 1 273004 21332 12472 S 40.5 0.0 0:20.73 lsphp
218959 obl74+ 21 1 273004 21336 12472 S 40.5 0.0 0:22.13 lsphp
Interestingly, despite the website's poor performance during the test, neither the CPU nor memory usage was particularly high. Also, during the test, CyberPanel indicated a CPU usage of 19% and a memory usage of 2%. Therefore, I conclude that the server is not experiencing any resource constraints, as it is not utilizing all its CPU and memory. However, it is still lagging for some reason.
Then, I decided to remove the components related to MySQL from the page on which I performed the stress test. The outcome was much more stable.
top - 10:43:54 up 1 day, 47 min, 0 users, load average: 0.87, 1.23, 1.41
Tasks: 705 total, 5 running, 699 sleeping, 0 stopped, 1 zombie
%Cpu(s): 2.8 us, 1.0 sy, 0.4 ni, 95.2 id, 0.0 wa, 0.0 hi, 0.5 si, 0.0 st
MiB Mem : 257925.6 total, 218249.7 free, 3910.0 used, 35765.9 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 252098.9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
139416 nobody 20 0 53200 32480 6128 S 18.3 0.0 0:47.00 litespeed
139402 nobody 20 0 52928 33308 7204 S 16.6 0.0 0:44.40 litespeed
139409 nobody 20 0 54900 34136 6188 S 16.6 0.0 0:46.38 litespeed
139410 nobody 20 0 49904 29156 6128 S 16.6 0.0 0:35.43 litespeed
139414 nobody 20 0 51688 30936 6128 R 16.6 0.0 0:45.46 litespeed
139415 nobody 20 0 55492 35280 6680 R 15.9 0.0 0:46.24 litespeed
139412 nobody 20 0 52112 31420 6188 S 15.6 0.0 0:45.05 litespeed
139404 nobody 20 0 50396 29644 6128 S 15.3 0.0 0:44.83 litespeed
139413 nobody 20 0 44700 23816 6128 S 15.3 0.0 0:21.83 litespeed
139406 nobody 20 0 50752 30004 6128 S 15.0 0.0 1:05.25 litespeed
According to CyberPanel, during the new test, the CPU usage was 4% and the memory usage was 2%.
Therefore, it is obvious that there is an issue with MySQL. I am currently using the default my.cnf configuration provided by CyberPanel, but I have attempted various other configurations found on the internet, yet nothing has improved the performance even a little bit. I've also tried stuff like MySQL Tuner but it didn't change the performance.
The MySQL part that I removed for the second test was a basic query of a table that contained 7 rows. It verified the user's IP address to determine if they were on the IP whitelist. This operation should not have posed a significant problem.
As observed in both tests, I detect a threshold or a bottleneck at the start, beyond which the site experiences a sharp increase in lag. Despite having ample free memory and CPU, there seems to be some limiting factor.
Some might argue that a rate of 20,000 users per second is excessive and unrealistic. However, even when I conducted the test with only 250 users per second, the outcome was the same: the website was extremely slow and not usable.
At this point I am totally lost. I am uncertain as to where to focus my efforts and what steps to take next to decrease the average response time. I would greatly appreciate any insightful comments or suggestions you may have and I thank you in advance for your time and consideration.
UPDATE
I have reinstalled the operating system and CyberPanel, and it appears that the problem has been resolved. Although I am uncertain about what went wrong previously, I suspect that an incorrect setting was responsible.
Suggestions to consider for your CloudSQL configuration
innodb_buffer_pool_size=8G # from ~ 192G because current data is less than 1G
innodb_io_capacity=500 # from 200 to utilize more of your SSD IOPS
innodb_lru_scan_depth=100 # from 1024 to conserve 90% CPU cycles used every second for function
key_buffer_size=20M # from ~ 128M needed for tmp tbl management, NO MyISAM tbls
sql_log_bin=0 # from ON unless you have a need for this specific log
Please view profile for contact info. Other performance enhancements available.
For 20K users per second, you need multiple servers and switches in front of them. Period. End of discussion.
Well, OK, I'll discuss it further.
When MySQL is presented with lots of 'simultaneous' users, it plays fair with them -- each is given equal access to all resources. This is fine until it falls off the cliff. This is when most of the processing is dealing with sharing of the resources. All the threads will eventually finish, but each will take a long time and you (the DBA) will think it crashed and pull the plug.
A simple cure is to lower (YES, lower) the value of max_connections. It turns out that the "cliff" is at a few dozen connections.
Is benchmarking, one throws as much stuff at the server until it croaks. That's usually a few dozen.
In real life web pages are not doing 100% database operations, they are letting the user react, building pages, etc. So, a max_connections of a few hundred is realistic.
Once it reaches the cliff, latency goes through the roof. You would expect throughput to increase, too, but it decreases slightly. I believe that this is because the threads are stumbling over each other too much. Think about any "cache" (buffer_pool, open_tables, table_definitions, etc) -- if "too many" threads are running, the caches may become ineffective.
Think about a market with so many shoppers that they spend most of their time juggling around other people. More shoppers per hour can get through the market if they keep shoppers from entering when it is "full". max_connections is that limiter.
INDEXes needed
ALTER TABLE table_name ADD INDEX(zone);
ALTER TABLE table_name ADD INDEX(IPPool);
(Then take a crash course in the benefits of Indexes (aka "KEYs").)
multiple php-cgi processes growing and crashing my cpu. Is there any configuration i need to do control this.
I am using apache(httpd) as a server.
top command result:
28278 random_user 20 0 418824 47092 13932 R 3.3 1.6 0:01.10 php-cgi
28335 random_user 20 0 410888 38856 13928 R 3.3 1.3 0:00.75 php-cgi
28340 random_user 20 0 410376 38156 13928 R 3.3 1.3 0:00.68 php-cgi
28342 random_user 20 0 410632 38132 13904 R 3.3 1.3 0:00.71 php-cgi
28395 random_user 20 0 410344 64892 13748 R 3.3 2.2 0:00.49 php-cgi
28456 random_user 20 0 401408 29420 13880 R 3.3 1.0 0:00.28 php-cgi
28474 random_user 20 0 400132 28176 13904 R 3.3 1.0 0:00.22 php-cgi
28494 random_user 20 0 399104 27056 13884 R 3.3 0.9 0:00.17 php-cgi
28513 random_user 20 0 398080 26100 13884 R 3.3 0.9 0:00.10 php-cgi
28221 random_user 20 0 421000 49744 13996 R 3.0 1.7 0:01.17 php-cgi
28239 random_user 20 0 419592 47604 13932 R 3.0 1.6 0:01.13 php-cgi
28249 random_user 20 0 419592 47616 13936 R 3.0 1.6 0:01.11 php-cgi
28277 random_user 20 0 419592 47340 13908 R 3.0 1.6 0:01.12 php-cgi
28282 random_user 20 0 417288 45504 13928 R 3.0 1.6 0:01.05 php-cgi
28304 random_user 20 0 415240 43420 13928 R 3.0 1.5 0:00.88 php-cgi
28309 random_user 20 0 413448 41584 13928 R 3.0 1.4 0:00.90 php-cgi
28311 random_user 20 0 413448 41480 13928 R 3.0 1.4 0:00.82 php-cgi
28312 random_user 20 0 412936 41052 13928 R 3.0 1.4 0:00.82 php-cgi
https://www.php.net/manual/en/features.commandline.webserver.php
From 7.4+ onwards, I assume PHP built in server is capable of handling multiple incoming requests, up to and equal to the environment variable: PHP_CLI_SERVER_WORKERS
I have a web app which is composed of a couple dozen AJAX powered lists, on the first page load, using the built-in server it slows to a crawl, usually fails due to timeout in PHP scripts.
I read the above feature, added environment variable (PHP is a docker container), and shell'ed into my container, did a top/ps I can now see X number of PHP processes:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 722864 21932 14448 S 0.3 1.1 0:09.91 symfony
20 root 20 0 210892 48188 36544 S 0.0 2.4 0:00.79 php7.4
21 root 20 0 205676 33460 24500 S 0.0 1.6 0:00.22 php7.4
22 root 20 0 208212 40908 29640 S 0.0 2.0 0:00.42 php7.4
23 root 20 0 210644 42236 30836 S 0.0 2.1 0:00.61 php7.4
24 root 20 0 208764 40784 31176 S 0.0 2.0 0:01.14 php7.4
25 root 20 0 205804 33588 24508 S 0.0 1.6 0:00.22 php7.4
...
I am using Symfony to start a dev server, but no matter what I do none of the processes seem to be carrying any of the load? What am I missing?
I switched to the built-in PHP server about 2 years ago, from NGINX. It made my vagrant setup easier (now docker) but my performance took a hit, which I've dealt with. I'd like to improve the responsiveness of the app using this approach if possible.
Any ideas?
I got a very weird problem. My wordpress previously was working fine but recently and suddenly it turned very slow and the nginx returns 502 bad gateway sometimes. So I did some investigation, then I noticed the PHP-FPM processes consume all the CPU even there's no request. Everytime I restarted the wordpress the idle CPU usage just jumped to 0% instantly, and I can see several PHP-FPM processes were working with high CPU consumption, and the MySQL process consumed a lot CPU resource too, no matter if there is a request.
I tried to stop all the plugins - didn't work, same symptom.
I tried to update the wordpress to the latest version but DIDN't
connect to database - CPU usage is normal.
I tried to update the wordpress to the latest version and only keep the file wp-config.php (database username + pwd) - didn't work, same symptom.
This is so weird but seems it's related to the mysql database? But why?
Thanks in advance.
top - 02:08:12 up 56 min, 1 user, load average: 10.18, 9.41, 8.68
Tasks: 115 total, 11 running, 104 sleeping, 0 stopped, 0 zombie
Cpu(s): 36.6%us, 10.4%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, 53.0%st
Mem: 766112k total, 682116k used, 83996k free, 239696k buffers
Swap: 1572860k total, 2664k used, 1570196k free, 125412k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23854 www 20 0 59952 30m 4688 R 44.5 4.1 3:56.99 php-fpm
24337 www 20 0 60204 32m 4520 R 44.2 4.3 3:53.83 php-fpm
24300 www 20 0 52004 23m 4448 R 42.9 3.2 3:48.47 php-fpm
24287 www 20 0 54324 27m 5140 R 37.6 3.7 3:54.34 php-fpm
23855 www 20 0 54824 26m 4504 R 35.6 3.5 3:57.25 php-fpm
24323 www 20 0 46108 19m 4856 R 35.6 2.6 3:57.73 php-fpm
24274 www 20 0 56356 28m 4548 R 35.2 3.9 3:56.55 php-fpm
24374 www 20 0 55080 26m 4524 R 33.9 3.5 3:52.03 php-fpm
24385 www 20 0 63820 33m 4428 R 33.2 4.5 3:51.53 php-fpm
24394 www 20 0 57900 29m 4444 R 30.6 3.9 3:50.09 php-fpm
24250 mysql 20 0 214m 29m 5860 S 23.9 3.9 1:35.21 mysqld
6 root RT 0 0 0 0 S 1.7 0.0 0:01.31 watchdog/0
216 root 20 0 0 0 0 S 1.0 0.0 0:02.96 kjournald
23850 www 20 0 18624 11m 868 S 0.3 1.6 0:01.89 nginx
23851 www 20 0 18812 12m 876 S 0.3 1.6 0:03.61 nginx
27889 root 20 0 2712 1136 880 R 0.3 0.1 0:00.81 top
It turned out it's caused by an XML RPC attack. It's resolved now.
For more information:
https://medium.com/#tturnbull/throttle-xmlrpc-php-attacks-on-wordpress-with-nginx-3cc4a12b7f76
My MediaWiki version is 1.22, and have Parsoid service and visual editor extension already installed and enabled in my Localsettings.php file.
I think Parsoid service has no problem because url _wikitext is available.
But in my wiki page, the edit button have no response, just submitted parameter '&veaction=edit' and nothing happens.
And yes, I do have javascript errors when click on edit button.
Error 1: "Uncaught Error: Unknown dependency: mediawiki.cookie"
Error 2: "Uncaught ReferenceError: importScript is not defined"
But I'm not quite familiar with front end and javascript, also don't know how to develop extention for mediawiki, so just don't know what to do next.
What can I do to solve this?
My system version is Centos6.5 , Parsoid and visual editor version are all 1.22
Any help will be appreciated. Thanks!
This recalls me of a similar case in the past where someone copied some JavaScript code which required importScript to be defined, but forgot to copy said function, and broke all the JavaScript in the wiki.
I don't see anything in your MediaWiki:Common.js, so it seems you literally hacked the MediaWiki files. This is highly discouraged: please move all your custom code from the files to the on-wiki JavaScript definitions and self-contained gadgets. By doing one piece at a time, you'll also see what is broken.
Meanwhile, I'm unable to do any testing on your wiki because its network is horrible from Europe: 400+ ms latency, 50 % packet loss. Packets get lost somewhere in China. Is networking better for you? If not, I doubt that you can manage to load JavaScript properly with such a connection.
mtr -w -c 250 wiki.uhaan.com
11.|-- 62-101-124-98.fastres.net 0.8% 250 46.1 54.2 3.3 151.1 37.2
12.|-- 89.96.200.114 0.0% 250 47.2 69.2 2.7 151.6 33.3
13.|-- mno-b2-link.telia.net 0.0% 250 54.0 64.1 3.3 120.3 30.0
14.|-- ffm-bb2-link.telia.net 0.8% 250 50.3 86.2 18.5 221.7 39.8
15.|-- ffm-b10-link.telia.net 2.4% 250 44.5 69.2 18.3 136.5 32.4
16.|-- chinatelecom-ic-306387-ffm-b10.c.telia.net 0.0% 250 36.4 61.5 18.9 133.3 30.2
17.|-- 202.97.58.61 94.8% 250 372.3 351.9 312.9 440.4 38.9
18.|-- 202.97.33.133 40.4% 250 309.7 329.1 300.3 397.6 24.7
19.|-- 202.97.50.233 44.0% 250 325.9 333.9 307.0 410.6 23.4
20.|-- ??? 100.0 250 0.0 0.0 0.0 0.0 0.0
21.|-- 61.164.31.182 80.4% 250 317.6 345.2 314.7 425.4 30.4
22.|-- ??? 100.0 249 0.0 0.0 0.0 0.0 0.0
23.|-- 42.120.244.198 47.8% 249 405.8 422.7 375.4 502.2 35.2
24.|-- 42.120.244.210 47.4% 247 443.1 480.4 402.6 561.9 35.7
25.|-- ??? 100.0 246 0.0 0.0 0.0 0.0 0.0
26.|-- 115.29.170.20 53.3% 246 418.8 413.0 343.3 498.9 34.3