Apache performance log before it dies

Is there a way to log Apache performance before it hits an out-of-memory error? I have mod_status enabled and the tool is great, but I want it to run maybe every 5 minutes, so that when the server dies I know which processes were running at that time and their CPU/memory usage.

You should consider using a tool like Zabbix or Nagios to collect those metrics continuously.
Also take a look at Datadog, which offers a "very easy to set up" (but paid) solution to collect, visualize, and correlate these metrics.
The point is to continuously collect any related metrics, so that when something bad happens you can pinpoint the root of the problem by correlating the data (in this case, for example, server load and traffic served by Apache).
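If you want something home-grown in the meantime, here is a minimal sketch of the "every 5 minutes" idea from the question. It assumes mod_status is reachable at http://localhost/server-status (the URL, the log file name and the choice of fields are my assumptions, not something from the question), and it only records the summary counters from the machine-readable "?auto" page, not per-process memory.

```python
import time
import urllib.request
from datetime import datetime, timezone

# Assumption: mod_status is exposed locally at this URL; "?auto" asks for
# the machine-readable "Key: value" variant of the status page.
STATUS_URL = "http://localhost/server-status?auto"

# Fields to keep in each log line; some only appear with ExtendedStatus On.
WANTED = ("BusyWorkers", "IdleWorkers", "ReqPerSec", "CPULoad")


def parse_status(text):
    """Turn 'Key: value' lines from server-status?auto into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def format_snapshot(fields, now=None, keys=WANTED):
    """Build one log line: a UTC timestamp followed by the selected fields."""
    now = now or datetime.now(timezone.utc)
    parts = [now.strftime("%Y-%m-%dT%H:%M:%SZ")]
    parts += [f"{k}={fields[k]}" for k in keys if k in fields]
    return " ".join(parts)


def log_forever(log_path="apache-status.log", interval_s=300):
    """Append one snapshot every interval_s seconds (300 s = 5 minutes)."""
    while True:
        try:
            with urllib.request.urlopen(STATUS_URL, timeout=10) as resp:
                text = resp.read().decode("utf-8", "replace")
            line = format_snapshot(parse_status(text))
        except OSError as exc:
            # Record the failure too: a gap in the log is itself a clue.
            line = format_snapshot({"error": str(exc)}, keys=("error",))
        with open(log_path, "a") as fh:
            fh.write(line + "\n")
        time.sleep(interval_s)
```

Calling log_forever() under a process supervisor, or running a single iteration from cron, would leave a trail to read after the next crash.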

Related

Apache server slow under a high rate of HTTP API calls

I am running an HTTP API which should be called more than 30,000 times per minute simultaneously.
Currently I can call it 1,200 times per minute. At 1,200 calls per minute, all the requests complete and get a response immediately.
But if I call it 12,000 times per minute simultaneously, it takes 10 minutes to complete all the requests. During those 10 minutes I cannot browse any webpage on the server; it is very slow.
I am running CentOS 7.
Server specification:
Intel® Xeon® E5-1650 v3 Hexa-Core Haswell,
RAM 256 GB DDR4 ECC RAM,
Hard Drive 2 x 480 GB SSD (Software-RAID 1),
Connection 1 Gbit/s
API: a simple PHP script that echoes the timestamp:
echo time();
I checked with the top command; there is no load on the server.
Please help me with it.
Thanks

Sounds like a congestion problem.
It doesn't matter how quick your script/page handling is: if the next request arrives within the execution time of the previous one, it is going to use resources (CPU, RAM, disk, network traffic and connections) and make everything running in parallel slower.
There are multiple things you could do, but you need to figure out what exactly the problem is for your setup and decide whether a given measure produces the desired result.
If the core problem is that resources get hogged by parallel processes, you could lower connection limits so more connections go into wait mode, which keeps more resources available for actually handing out a page instead of congesting everything even more.
Take a look at this:
http://oxpedia.org/wiki/index.php?title=Tune_apache2_for_more_concurrent_connections
If the server accepts connections quicker than it can handle them, you are going to have a problem whichever setting you change. It should start dropping connections at some point. If you cram French baguettes down its throat quicker than it can open its mouth, it is going to suffocate either way.
If the system gets overwhelmed on the network side of things (transfer speed limit, maximum number of concurrent connections for the OS, etc.), then you should consider using a load balancer. Only after the load balancer confirms the server has the capacity to actually take care of the page request will it send the user further.
This usually works well when you do any kind of processing which slows down page loading (server-side code execution, large volumes of data, etc.).
Optimise performance
There are many ways to execute PHP code on a web server, and I assume you use Apache. I am no expert, but there are modes like CGI and FastCGI, for example, which can greatly enhance execution speed. Tweaking the settings connected to these can also show you what is happening. It could be, for example, that you use too few PHP threads to handle that number of concurrent connections.
Have a look at something like this, for example:
http://blog.layershift.com/which-php-mode-apache-vs-cgi-vs-fastcgi/
There is no 'best fit for all' solution here. To fix it, you need to figure out what the bottleneck for the server is, and act accordingly.
12,000 calls per minute == 200 calls a second.
You could limit your test case to a multiple of those 200 and increase/decrease it while changing settings. Your goal is to dish that number of requests out in as short a time as possible, thus ensuring the congestion never occurs.
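To put rough numbers on that, here is a small sketch of the arithmetic. The per-second conversion comes straight from the figures in the question; the concurrency estimate uses Little's law (requests in flight = arrival rate x time each request spends in the server), and the 0.05 s response time in the usage note is purely an assumed value for illustration.

```python
def per_second(calls_per_minute):
    """Convert a per-minute call rate into calls per second."""
    return calls_per_minute / 60


def requests_in_flight(calls_per_minute, avg_response_s):
    """Little's law: average number of requests being handled at once."""
    return per_second(calls_per_minute) * avg_response_s


print(per_second(12_000))  # 200.0 calls/s, the rate that currently congests
print(per_second(30_000))  # 500.0 calls/s, the stated target
```

With an assumed average response time of 0.05 s, requests_in_flight(30_000, 0.05) comes out at about 25 requests in progress at any moment. If the measured number of busy workers is far above what this estimate predicts, requests are queueing somewhere rather than being served.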
That said: consequences.
When you implement changes to optimise the maximum number of page loads you want to achieve, you are inadvertently going to introduce other conditions. For example, if the maximum RAM usage by Apache is the problem, then upping that limit will ensure better performance, but it heightens the chance that the OS runs out of memory when other processes also want to claim more memory.
Adding a load balancer adds another possible layer of failure and possible slowdowns. Yes, you prevent congestion, but is it worth the slowdown caused by the rerouting?
Upping performance will increase the load on the system, making it possible to accept more concurrent connections. So somewhere along the line a different bottleneck will pop up. High traffic on different processes could always end in said process crashing. Apache is a very well built web server, so it should in theory protect you against said problem; however, tweaking settings wrongly could still cause crashes.
So experiment with care and test before you use it live.

What can be causing an "exceeded process limit" error?

I launched a website about a week ago and I sent out an email blast to a mailing list telling everyone the website was live. Right after that the website went down and the general error log was flooded with "exceeded process limit" errors. Since then, I've tried to really clean up a lot of the code and minimize database connections. I will still see that error about once a day in the error log. What could be causing this error? I tried to call the web host and they said it had something to do with my code but couldn't point me in any direction as to what was wrong with the code or which page was causing the error. Can anyone give me any more information? Like for instance, what is a process and how many processes should I have?

Wow. Big question.
Obviously, you're maxing out your Apache child worker processes. To get a rough idea of how many you can create, use top to get the rough memory footprint of one httpd process. If you are using WordPress or another CMS, it could easily be 50-100 MB each (if you're using the PHP module for Apache). Then, assuming the machine is only used for web serving, take your total memory, subtract a chunk for OS use, then divide that by 100 MB (in this example). That's the maximum number of worker processes you can have. Set it in your httpd.conf. Once you do this and restart Apache, monitor top and make sure you don't start swapping memory. If you do, you have set the number of workers too high.
If there is any other stuff running, like MySQL servers, make space for that before you compute the number of workers you can have. If this number is small, to roughly quote a great man, 'you are gonna need a bigger boat'. Just kidding. You might see really high memory usage for an httpd process, like over 100 MB. You can lower the max requests per child to shorten the life of an httpd process. This could help clean up bloated httpd workers.
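The back-of-the-envelope calculation above can be sketched like this. The function name and the example figures (4 GB of RAM, 1 GB set aside for the OS and MySQL) are made up for illustration; plug in what top actually shows on your machine.

```python
def max_workers(total_mb, reserved_mb, per_process_mb):
    """Rough ceiling on Apache workers that fit in RAM without swapping.

    total_mb       -- physical memory on the machine
    reserved_mb    -- memory set aside for the OS, MySQL and anything else
    per_process_mb -- footprint of one httpd process as seen in top
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    return max(0, (total_mb - reserved_mb) // per_process_mb)


# Hypothetical box: 4096 MB RAM, 1024 MB reserved, 100 MB per httpd process.
print(max_workers(4096, 1024, 100))  # 30
```

Treat the result as a starting point, then watch top for swapping as described above and lower it if needed.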
Another area to look at is response time for a request... how long does each request take? For a quick check, use the Firebug plugin for Firefox and look at the 'Net' tab to see how long it takes for your initial request to respond (not images and such). If for some reason requests are taking more than 1 or 2 seconds to respond, that's a big problem, as you get sort of a log jam. The cause of this could be PHP code or MySQL queries taking too long to respond. To address this, if you're using WordPress, make sure to use a good caching plugin to lower the stress on MySQL.
Honestly, though, unless you're just not utilizing memory by having too few workers, optimizing your Apache isn't something easily addressed in a short post without detail on your server (memory, CPU count, etc.) and your httpd.conf settings.
Note: if you don't have server access you'll have a hard time figuring out memory usage.

The process limit is typically something enforced by shared web hosting providers, and generally has to do with the number of processes executing under your account. This will typically equate to the number of connections made to your server at once (assuming one PHP process per connection).
There are many factors that come into play. You should figure out what that limit is from your hosting provider, and then find a new one that can handle your load.

How to get a rough estimate of LAMP application capacity?

I have a LAMP application running fine; however, the number of users is increasing each day. I don't want to be hit with a surprise one morning and find that everything broke because of overload. Is there a way to get a rough estimate of how much of its capacity the LAMP stack is using?
I know that a full detailed report is many books' worth of study, but can I get some quick litmus test to see if things are running fine?
So, say, for the MySQL component, how can I tell how much more load it can take? Is it at 30% capacity, 50%, etc.?
Same for my Apache. Although I have a feeling the DB will die before Apache.
Perhaps my original question was not too good, as English is not my native language. What I am really asking for is a way to measure the current load, and then a way to estimate, based on that load, how much further I can go before it fails. (And this should be done separately for each component: mysqld, httpd.)

ab is a bit annoying if your site needs cookies, etc.; ab is too simple for that.
Basically, from my experience in fixing several imploding PHP websites, it usually goes like this :
1) People use MySQL
You can totally use MySQL, facebook and flickr do it (mysql fanboys love those) IF YOU KNOW THE GOTCHAS which are :
If you have a non-read-only MyISAM table and any query longer than 100 us (even selects) you are dead
On one site I fixed, the guy had rented a double-quad-core server because "his site needs the power". I look at his site, I look at my previous site with > 100K members and a torrent tracker which ran on a Via C7 micro-half-pizzabox server, and I tell him, your site runs fine on the Celeron 300 that's in my basement, and that's even overkill, I can rent it to you for half the price of your Xeon, lol.
It turned out that the guy was a good developer and a real nice guy but he sucked at MySQL, so his site had the typical Search Query From Hell that can kill any website :
10 search queries from hell per second (he had like 300K members on his illegal warez site)
search query from hell takes about 0.1 - 0.2 seconds
a little stream of concurrent updates to the same MyISAM table to spice things up
=> total serialization (MyISAM write locks) of all queries. 1 core 100%, 7 cores idle, loadavg > 1000 (yes he was using apache), page times > 30 seconds, the works.
Fix was easy : optimize the search query from hell, fix point 2) below, switch to InnoDB, switch to lighttpd. loadavg dropped to 0.02
2) UPDATEs
No one is interested in page counters.
Issue 1 UPDATE for every page view and you are dead.
Add some MyISAM for more effects. Also a killer on InnoDB, not about locking, rather about sync disk IO waits.
3) FULLTEXT
MyISAM not usable for read-write tables because of locking.
MyISAM is as reliable as a ramdisk (in fact, less : you need an OS crash to corrupt a ramdisk, corrupting MyISAM tables just needs a MySQL crash or just hitting it too much concurrently, you'll get "unknown table engine error", I saw this many times)
FULLTEXT not available on InnoDB (before MySQL 5.6)
Any insertion in a FULLTEXT index triggers almost a full index rebuild (when I inserted a forum post it was rebuilding 400 MB of index)
==> If you need full text indexing, performance, and reliability, use Sphinx or Xapian.
I've not tried Sphinx (people say good things about it), but Xapian happily searches through 4GB of text in a snap.
4) People use apache.
This nicely combines with the points above.
Unlike a proper server like lighttpd whose CPU usage is undetectable (the crummy Via C7 was serving 100 HTTP hits/s and lighttpd used less than 1% CPU), apache will kill your box.
When the MySQL starts to die (it dies easily), clients start to hit F5 hard, and soon you have about 1000 apache processes, each holding a PHP interpreter, and each PHP interpreter holds an idle MySQL connection, waiting on a MyISAM lock, except one, which is doing some trivial UPDATE of your page view counter, but that takes some time, because the server is gone to lunch swapping, because of the 1000 apache and 1000 php and 1000 mysql processes.
Lighttpd uses no cpu for static pages. The only way for lighttpd to saturate your CPU is if you hit it hard with apachebench at like 20K requests/s. Then Lighttpd talks to a few, like 10 php-fcgi backends (2-4 per core is good) which talk to a few MySQL connections. Everything is a lot faster as a result, and when overloaded, it degrades gracefully, not explosively.
To get to the original question, you definitely want to profile your SQL queries. Add a query log to your PHP application which displays (only to you), the list of queries and the time they take, and also the time from the start of the PHP script to its end (header/footer includes are a good place for this).
For a complex page (excluding search) you'd expect about 3 ms MySQL and 3 ms PHP, that's a good target. You need a PHP compiled code cache of course.
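The query log idea is easy to prototype. The answer is about PHP, so take this Python version only as a sketch of the shape: a hypothetical QueryLog object that times each query you wrap with it and reports the total script time at the end. The class and method names are made up, and you would wrap your own database calls inside the with block.

```python
import time
from contextlib import contextmanager


class QueryLog:
    """Collects (sql, milliseconds) pairs plus the total script time."""

    def __init__(self):
        self.started = time.perf_counter()
        self.entries = []

    @contextmanager
    def timed(self, sql):
        """Time whatever runs inside the with block and record it."""
        t0 = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - t0) * 1000.0
            self.entries.append((sql, elapsed_ms))

    def report(self):
        """One line per query, then the time since the log was created."""
        total_ms = (time.perf_counter() - self.started) * 1000.0
        lines = [f"{ms:8.2f} ms  {sql}" for sql, ms in self.entries]
        lines.append(f"{total_ms:8.2f} ms  total script time")
        return "\n".join(lines)


# Usage: create the log at the top of the request (header include) ...
log = QueryLog()
with log.timed("SELECT ... FROM posts"):
    pass  # run the real query here
# ... and show the report at the bottom (footer include), only to yourself.
print(log.report())
```

The queries whose time dwarfs the rest of the page are the ones worth optimizing first.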

For the current load, there are a couple of things you can do. The most expensive, yet most detailed, answers will be provided by an enterprise application such as "Gomez".
However, if you're looking to do this yourself, see my previous answers below or use shell utilities such as htop, top, and w, and make use of Apache server-status.
Previous answers before question revision:
What you are asking for is sometimes called application profiling.
You need to create a rough memory formula like:
httpd ram + php memory usage + mysql process usage = total request memory footprint
You will also need a CPU formula, but you can also eyeball top during a load test.
Apache has the command 'ab'.
"ab is a tool for benchmarking your Apache Hypertext Transfer Protocol (HTTP) server. It is designed to give you an impression of how your current Apache installation performs. This especially shows you how many requests per second your Apache installation is capable of serving." http://httpd.apache.org/docs/2.0/programs/ab.html
Here is a generic 'ab' benchmark command line:
ab -n 10 -c 1 http://www.yoursite.com/
# qty 10 total requests, 1 request at a time
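If you want to repeat that benchmark at several concurrency levels and compare the results, a small wrapper helps. This is only a sketch: it assumes ab is installed and on the PATH, the function names are made up, and it reads nothing but the "Requests per second" line from ab's report.

```python
import re
import subprocess


def run_ab(url, total=10, concurrency=1):
    """Run ab with -n (total requests) and -c (requests at a time)."""
    result = subprocess.run(
        ["ab", "-n", str(total), "-c", str(concurrency), url],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def requests_per_second(ab_output):
    """Extract the mean requests/second figure from ab's report, if present."""
    match = re.search(
        r"^Requests per second:\s+([0-9.]+)", ab_output, re.MULTILINE
    )
    return float(match.group(1)) if match else None
```

For example, requests_per_second(run_ab("http://www.yoursite.com/", total=100, concurrency=5)) gives one number per run that you can note down while watching top or vmstat on the server.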
The strategy is to test the per-process (user) load on your application from the web page request through completion. If you can identify how much RAM Apache, PHP, and MySQL use for each request, then you can quickly identify your system capacity.
You'll probably have to use a mix of diagnostic tools like vmstat or top or iostat or ps, etc. to take a snapshot of what a number of requests will demand from your system.
Finally, you are going to want to install Xdebug. This tool will help you profile the php side of the application.
http://xdebug.org/
Here is IBM's tutorial on installing Xdebug:
http://www.ibm.com/developerworks/opensource/library/os-php-fastapps2/

jmeter multiple users problem

We are using JMeter to test our PHP application running on the Apache 2 web server. I can load up JMeter to use 25 or 50 threads and the load on the server does not increase; however, the response time from the server does. The more threads, the slower the response time. It seems like JMeter or Apache is queuing the requests. I have changed the MaxClients value in the Apache web server configuration file, but this does not change the problem. While JMeter is running I can use the application and get respectable response times. What gives? I would expect to be able to tax my server down to 0% idle by increasing the number of threads. Can anyone help point me in the right direction?
Update: I found that if I remove sessions from my application I am able to simulate a full load on the server. I have tried to re-enable sessions and use an HTTP Cookie Manager for each thread, but it does not seem to make an impact.

You need to identify where the bottleneck is occurring, and then attempt to remediate the problem.
The JMeter client should be running on a well-equipped machine. I prefer a Solaris/Unix server running the JVM, but for <200 threads, a modern Windows machine will do just fine. JMeter can become a bottleneck, and you won't get any meaningful results once it does. Additionally, it should run on a separate machine from the one you're testing, and preferably on the same network. WAN latency can become a problem if your test rig and server are far apart.
The second thing to check is your Apache workers. Apache has a module, mod_status, which will show you the state of every worker. It's possible to have your pool size set too low. From mod_status, you'll be able to see how many workers are in use. Too few, and Apache won't have any workers to process requests, and the requests will queue up. Too many, and Apache may exhaust the memory on the box it's running on.
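As a rough illustration of reading those worker states: mod_status prints a "scoreboard" string with one character per worker slot, and the key below follows the legend shown on the status page. The function names and the sample string are made up; this is a sketch for eyeballing a scoreboard you have copied from the page, not a replacement for the module's own counters.

```python
from collections import Counter

# Scoreboard key as printed on the mod_status page.
STATE_NAMES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot",
}


def summarize_scoreboard(scoreboard):
    """Count how many worker slots are in each state."""
    counts = Counter(scoreboard)
    return {STATE_NAMES.get(ch, f"unknown({ch})"): n for ch, n in counts.items()}


def slot_usage(scoreboard):
    """Return (in_use, waiting, open_slots) for a scoreboard string."""
    waiting = scoreboard.count("_")
    open_slots = scoreboard.count(".")
    in_use = len(scoreboard) - waiting - open_slots
    return in_use, waiting, open_slots


print(slot_usage("WW_K.."))  # (3, 1, 2)
```

If the waiting and open counts sit at zero during the JMeter run, the pool is exhausted and requests are queueing in front of Apache.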
Next, you should check your database. If it's on a separate machine, the database could have an IO or CPU shortage.
If you're hitting a bottleneck, and the server and DB are on the same machine, you'll generally hit a CPU, RAM, or IO limit. I listed those in the order in which they are easiest to identify. If you have a CPU-bound app, you can easily see your CPU usage go to 100%. If you run out of RAM, your machine will start swapping. On both Windows and Unix it's fairly easy to see your available free RAM. Lastly, you may be IO bound. This too can be monitored using various tools or stats, but it's not as obvious as CPU.
Lastly, specifically to your question, the one thing that stands out is that it's possible to have a huge number of session files stored in a single directory. Often PHP stores session information in files. If this directory gets large, it will take an increasingly long time for PHP to find the session. If you ran your test with cookies turned off, the PHP app may have created a new session file for each request, leaving thousands of files behind. On a Windows server, it will slow down faster than on a Unix server, due to differences in the way directories are stored on the two operating systems.
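A quick way to check the session-file theory is to count the files before and after a test run. This sketch assumes PHP's default file-based session handler, which names its files with a sess_ prefix; the directory is whatever session.save_path points to on your server, so the path you pass in is an assumption you need to verify.

```python
import os


def count_session_files(directory, prefix="sess_"):
    """Count regular files in directory whose names start with prefix."""
    with os.scandir(directory) as entries:
        return sum(
            1 for entry in entries
            if entry.is_file() and entry.name.startswith(prefix)
        )
```

If the count grows by roughly one per request during the JMeter run, the test is not sending the session cookie back.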

Are you using a Constant Throughput Timer? If JMeter can't service the throughput with the threads allocated to it, you'll see this queueing and blowouts in the response time. To figure out if this is the problem, try adding more threads.
I also found a report of this happening when there are JavaScript calls inside the script. In this instance, try to move JavaScript calls to the test plan element at the top of the script, or look for ways to pre-calculate the value.

Try checking a static file served by apache and not by PHP to see if the problem is in the Apache config or the PHP config.
Also check your network connections and configuration. Our JMeter testing was progressing nicely until it hit a wall. Eventually we realized we only had a 100 Mb connection and it was saturated; going to gigabit fixed it. Your network cards or switch may be running at a lower speed than you think, especially if their speed setting is "auto".
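To see how quickly a link like that saturates, here is the arithmetic as a sketch. It ignores protocol overhead, and the 50 KB average response size is an assumed figure for illustration, not something we measured.

```python
def max_responses_per_second(link_mbit_s, response_kb):
    """Upper bound on responses/s a link can carry, ignoring overhead."""
    bytes_per_second = link_mbit_s * 1_000_000 / 8
    return bytes_per_second / (response_kb * 1000)


# Assumed 50 KB average response:
print(max_responses_per_second(100, 50))   # 250.0 on a 100 Mbit/s link
print(max_responses_per_second(1000, 50))  # 2500.0 on gigabit
```

If the throughput JMeter reports flattens out near the first number, the network rather than Apache or PHP is the wall.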

How to fix Apache instability?

I have configured a simple LAMP stack on Debian and I am experiencing some problems with the Apache web server.
Every 3-4 hours the web server enters a deadlock and all the requests that hit the database block. The server creates a new child for each request, so the number of processes increases very quickly. After a few seconds Monit notices something is wrong and restarts the Apache server.
I suspect this problem is generated by the way PHP handles database connection pooling because the server is still able to answer static content requests. Have you experienced this kind of behavior? What should I try to do?
Update: Problem solved. It seems it's a bad idea to use APC for opcode caching and user data. I am now using Memcache for storing user data and APC only for code. I still get some segmentation faults from time to time but the server is most of the time stable.

I would suspect that the problems are:
A difficult long-running database query which blocks further requests. This happens fairly easily if you're using the MySQL MyISAM engine, which has only table-level locking, so readers can easily block writers and vice versa; a single tricky query on, say, a user table can pretty much block the entire server while the database waits for I/O. You can usually diagnose this by using "SHOW PROCESSLIST" or a tool which does this for you.
Having set MaxClients much too high for the RAM available on a prefork server; almost everyone does this. If you are using a "fat" prefork Apache (e.g. with in-process PHP), then don't set MaxClients higher than you have RAM for. This is probably a lot less than the typical values of 100 or 150.
These two things conspire to cause the issue you're seeing. Both need to be fixed, as either can cause problems on its own.
This is based entirely on guesswork and experience.
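For the SHOW PROCESSLIST check, something like the following sketch can pick out the suspects. It assumes you have already fetched the rows as dictionaries keyed by the statement's column names (Id, Command, Time, State, Info and so on), for example through a dict-style cursor in whatever MySQL driver you use; the function name and the 5-second threshold are arbitrary.

```python
def long_running(rows, min_seconds=5):
    """Non-idle connections stuck in their current state, longest first.

    rows -- SHOW PROCESSLIST result as a list of dicts; "Time" is the
            number of seconds the connection has been in its current state.
    """
    busy = [
        row for row in rows
        if row["Command"] != "Sleep" and (row["Time"] or 0) >= min_seconds
    ]
    return sorted(busy, key=lambda row: row["Time"], reverse=True)
```

Running this every few seconds while the server is misbehaving should show whether one query sits at the top of the list with everything else queued behind it in a lock-wait state.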

Why don't you have a look at the logs? /var/log/apache2/* is a good place to start. What is requested just before the server dies? From there on, you can probably deduce what's going wrong. As PHP scripts are terminated after 30 seconds by default, the mistake needs to be quite massive to cause something like that.

Check your timeout settings in /etc/apache2/apache2.conf. I have seen similar problems when Timeout is set high and the system gets hit with a bunch of dropped connections.

The mysql-slow log is also useful for finding slow problem-causing queries.
