How to fix Apache instability? - php

I have configured a simple LAMP stack on Debian and I am experiencing some problems with the Apache web server.
Every 3-4 hours the web server enters a deadlock and all requests that hit the database block. The server creates a new child for each request, so the number of processes climbs very quickly. After a few seconds Monit notices something is wrong and restarts Apache.
I suspect this problem is generated by the way PHP handles database connection pooling because the server is still able to answer static content requests. Have you experienced this kind of behavior? What should I try to do?
Update: Problem solved. It turns out it's a bad idea to use APC for both opcode caching and user data. I am now using Memcache for storing user data and APC only for opcode caching. I still get the occasional segmentation fault, but the server is stable most of the time.
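For anyone hitting the same wall, here is a minimal sketch of that split, assuming a local memcached instance on the default port; the key name and the load_profile_from_db() helper are hypothetical stand-ins for whatever your application actually does (APC keeps doing opcode caching on its own, so nothing changes on that side):

    <?php
    // User data goes to memcached instead of apc_store()/apc_fetch().
    $memcache = new Memcache();
    $memcache->connect('127.0.0.1', 11211);   // assumed local memcached on the default port

    $key  = 'user_profile_42';                // hypothetical cache key
    $data = $memcache->get($key);
    if ($data === false) {
        $data = load_profile_from_db(42);     // hypothetical DB helper
        $memcache->set($key, $data, 0, 300);  // cache for 5 minutes
    }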

I would suspect that the problems are:
A difficult long-running database query that blocks further requests. This is easy to hit with the MySQL MyISAM engine, which has only table-level locking: readers can easily block writers and vice versa, so a single tricky query on, say, a user table can pretty much stall the entire server while the database waits for I/O. You can usually diagnose this with "SHOW PROCESSLIST" or a tool that runs it for you.
MaxClients set much too high for the RAM available on a prefork server - almost everyone does this. If you are using a "fat" prefork Apache (e.g. with in-process PHP), don't set MaxClients higher than you have RAM for; that is probably a lot less than the typical values of 100 or 150 (see the sizing sketch below).
These two things conspire to cause the issue you're seeing, and both need to be fixed, since either one can cause problems on its own.
This is based entirely on guesswork and experience.
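To make the second point concrete, here is a rough prefork sizing sketch for Apache 2.x. The numbers are placeholder assumptions (about 2 GB of RAM, roughly 512 MB reserved for the OS and MySQL, PHP workers around 100 MB each); substitute your own measurements:

    <IfModule mpm_prefork_module>
        StartServers          5
        MinSpareServers       5
        MaxSpareServers      10
        # (2048 MB - 512 MB) / ~100 MB per worker = roughly 15 workers
        ServerLimit          15
        MaxClients           15
        # recycle workers so a leaking PHP process does not grow forever
        MaxRequestsPerChild 1000
    </IfModule>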

Why don't you have a look at the logs? /var/log/apache2/* is a good place to start. What was requested just before the server died? From there you can probably deduce what's going wrong. Since PHP scripts are terminated after 30 seconds by default, the mistake has to be fairly massive to cause something like this.
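As a quick sketch of that, with a hypothetical timestamp (adjust the path and the date format to match your own access log):

    # what was requested in the minutes before Monit restarted Apache?
    grep "12/Jan/2015:14:0" /var/log/apache2/access.log | tail -n 50
    # and what did Apache itself complain about?
    tail -n 100 /var/log/apache2/error.log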

Check your timeout settings in /etc/apache2/apache2.conf; I have seen similar problems when Timeout is set high and the system gets hit with a bunch of dropped connections.
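For reference, something along these lines; the values are only a starting point and should reflect your longest legitimate request:

    # /etc/apache2/apache2.conf (sketch)
    Timeout 45            # default is 300; slow or dropped clients hold a worker for this long
    KeepAlive On
    KeepAliveTimeout 5    # don't let idle keep-alive connections tie up workers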

The mysql-slow log is also useful for finding slow problem-causing queries.
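If it isn't already enabled, a sketch of the my.cnf settings (these directive names are for MySQL 5.1 and later; older versions used log_slow_queries):

    # /etc/mysql/my.cnf, in the [mysqld] section
    slow_query_log      = 1
    slow_query_log_file = /var/log/mysql/mysql-slow.log
    long_query_time     = 2    # log anything slower than 2 seconds
    # optionally also log queries that do full table scans:
    # log_queries_not_using_indexes = 1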

Related

Apache performance log before it dies

Is there a way to log Apache performance before it runs into an out-of-memory error? I have mod_status enabled and the tool is great, but I want it to run maybe every 5 minutes, so that when the server dies I would know which processes were running at the time and their CPU/memory usage.
You should consider using a tool like Zabbix or Nagios to keep collecting those metrics.
Also take a look at Datadog, which offers a "very easy to set up" (but paid) solution to collect, visualize, and correlate these metrics.
The point is to continuously collect all related metrics, so that when something bad happens you can pinpoint the root of the problem by correlating the data (in this case, for example, server load and the traffic served by Apache).
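If a full monitoring stack is more than you want to set up right now, a crude sketch that covers the original question is to poll mod_status from cron (this assumes mod_status is enabled and reachable on localhost) and keep the snapshots around for later correlation:

    # crontab entry: snapshot Apache's scoreboard every 5 minutes
    */5 * * * * (date; curl -s "http://localhost/server-status?auto") >> /var/log/apache-status.log

When the server does fall over, the last few snapshots tell you how many busy workers it had and roughly how many requests per second it was serving.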

Apache VERY high page load time

My Drupal 6 site has been running smoothly for years but recently has experienced intermittent periods of extreme slowness (10-60 sec page loads). Several hours of slowness followed by hours of normal (4-6 sec) page loads. The page always loads with no error, just sometimes takes forever.
My setup:
Windows Server 2003
Apache/2.2.15 (Win32) Jrun/4.0
PHP 5
MySql 5.1
Drupal 6
ColdFusion 9
Vmware virtual environment
DMZ behind a corporate firewall
Traffic: 1-3 hits/sec peak
Troubleshooting
No applicable errors in the Apache error log
No errors in the Drupal event log
Drupal devel module shows 242 queries in 366.23 milliseconds, page execution time 2069.62 ms (so it looks like queries and PHP scripts are not the problem)
No unusually high CPU, memory, or disk IO
ColdFusion apps, and other static pages outside of Drupal, also load slowly
webpagetest.org test shows very high time-to-first-byte
The problem seems to be with Apache responding to requests, but previously I've only seen this behavior under 100% cpu load. Judging solely by resource monitoring, it looks as though very little is going on.
Here is the kicker - roughly half of the site's access comes from our LAN, but if I disable the firewall rule and block access from outside of our network, internal (LAN) access (1000+ devices) is speedy. But as soon as outside access is restored the site is crippled.
Apache config? Crawlers/bots? Attackers? I'm at the end of my rope, where should I be looking to determine where the problem lies?
------Edit:-----
Attached is a waterfall chart from webpagetest.org showing a 15 second load time. I've seen times as high as several minutes, and again, the server runs fine much of the time. The green areas indicate that the browser has sent a request and is waiting to receive the first byte of data back from the server. This is certainly a back-end delay, but it is puzzling that the CPU is barely used during this slowness.
(Not enough rep to post an image; see https://webmasters.stackexchange.com/questions/54658/apache-very-high-page-load-time)
------Edit------
On the Apache side of things - Is this possibly a ThreadsPerChild issue?
After much research, I may have found the solution. If I'm correct, it was an apache config problem. Specifically, the "ThreadsPerChild" directive. See... http://httpd.apache.org/docs/2.2/platform/windows.html
Because Apache for Windows is multithreaded, it does not use a separate process for each request, as Apache can on Unix. Instead there are usually only two Apache processes running: a parent process, and a child which handles the requests. Within the child process each request is handled by a separate thread.
ThreadsPerChild: This directive is new. It tells the server how many threads it should use. This is the maximum number of connections the server can handle at once, so be sure to set this number high enough for your site if you get a lot of hits. The recommended default is ThreadsPerChild 150, but this must be adjusted to reflect the greatest anticipated number of simultaneous connections to accept.
It turns out this directive was not set at all in my config and thus defaulted to 64. I confirmed this by viewing the number of threads for the second httpd.exe process in Task Manager. When the server was handling more than 64 connections, the excess requests simply had to wait for a thread to open up. I added ThreadsPerChild 150 to my httpd.conf.
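For reference, the relevant bit of httpd.conf ends up looking roughly like this; the IfModule wrapper is how the stock 2.2 Windows config groups the winnt MPM settings, and 150 is simply the value suggested by the docs quoted above:

    <IfModule mpm_winnt_module>
        ThreadsPerChild      150
        MaxRequestsPerChild    0
    </IfModule>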
Additionally, I enabled the apache status module
http://httpd.apache.org/docs/2.2/mod/mod_status.html
...which, among other things, lets you see the total number of active requests on the server at any given moment. Right away I could see spikes of up to 80 active requests. Time will tell, but I'm confident this will resolve my issue. So far, 30 hours without a hiccup.
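In case anyone wants the same visibility, a minimal mod_status setup for Apache 2.2 looks roughly like this (restrict the Allow line to whatever host you actually monitor from):

    ExtendedStatus On
    <Location /server-status>
        SetHandler server-status
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1
    </Location>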
Apache is too bulky and clumsy for "1-3 hits/sec avg".
I once had a similar problem with a much lighter (almost static HTML, no DB) site and a similar number of hits per second.
No errors, no high network/CPU/memory/disk load. Apache on WinXP.
I put nginx in front of Apache for static files and it started working like a charm.
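The setup was roughly the following sketch; the paths and the backend port are placeholders, and Apache is assumed to have been moved to port 8080 so nginx can take port 80:

    # nginx.conf (sketch)
    server {
        listen 80;

        # serve static assets straight off disk
        location ~* \.(css|js|png|jpg|gif|ico)$ {
            root    /var/www/site;
            expires 7d;
        }

        # hand everything else to Apache on its new port
        location / {
            proxy_pass       http://127.0.0.1:8080;
            proxy_set_header Host $host;
        }
    }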
Caching. The solution is caching.
Drupal (in common with most other large CMS platforms) has a tendency toward this kind of thing due to its nature -- every page is built on the fly, constructed from a whole stack of database tables and code modules. The more you've got in there, the slower it will be, but even fairly simple pages can become horribly slow if your site gets a bit of traffic.
Drupal has a page cache mechanism built-in which will cut your load dramatically. As long as your pages are static (ie no dynamic content) then you can simply switch on caching and watch the performance go right back up.
If you have dynamic content, you can still enable caching for the static parts of the page. It is a bit more complex (and beyond the scope of this answer), but it is worth the effort.
If that's still not enough, a server-based caching solution such as Varnish will definitely help.
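For Drupal 6 specifically, the page cache is normally switched on at Administer > Site configuration > Performance; if you would rather do it in code (say, from a deployment script), the equivalent is roughly the following sketch, using Drupal 6's own constants and variables:

    <?php
    // Sketch: enable Drupal 6's built-in caches programmatically.
    variable_set('cache', CACHE_NORMAL);   // page cache for anonymous visitors
    variable_set('block_cache', 1);        // cache block output as well
    variable_set('cache_lifetime', 300);   // minimum cache lifetime, in seconds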

What can be causing an "exceeded process limit" error?

I launched a website about a week ago and I sent out an email blast to a mailing list telling everyone the website was live. Right after that the website went down and the general error log was flooded with "exceeded process limit" errors. Since then, I've tried to really clean up a lot of the code and minimize database connections. I will still see that error about once a day in the error log. What could be causing this error? I tried to call the web host and they said it had something to do with my code but couldn't point me in any direction as to what was wrong with the code or which page was causing the error. Can anyone give me any more information? Like for instance, what is a process and how many processes should I have?
Wow. Big question.
Obviously, you're maxing out your Apache child worker processes. To get a rough idea of how many you can create, use top to get the rough memory footprint of one httpd process. If you are using WordPress or another CMS, it could easily be 50-100 MB each (if you're using the PHP module for Apache). Then, assuming the machine is only used for web serving, take your total memory, subtract a chunk for OS use, and divide that by 100 MB (in this example). That's the maximum number of worker processes you can have; set it in your httpd.conf. Once you do this and restart Apache, monitor top and make sure you don't start swapping memory. If you do, you have set the number of workers too high.
If there is other stuff running, like a MySQL server, make space for that before you compute the number of workers you can have. If this number is small then, to roughly quote a great man, 'you are gonna need a bigger boat'. Just kidding. If you see really high memory usage for an httpd process, say over 100 MB, you can tweak MaxRequestsPerChild lower to shorten the life of each httpd process, which helps clean up bloated workers.
Another area to look at is the response time for each request: how long does each request take? For a quick check, use the Firebug plugin for Firefox and look at the 'Net' tab to see how long your initial request takes to respond (not images and such). If for some reason requests take more than 1 or 2 seconds to respond, that's a big problem, as you get a sort of log jam. The cause could be PHP code or MySQL queries taking too long to respond. To address this, if you're using WordPress, make sure you use a good caching plugin to lower the stress on MySQL.
Honestly, though, unless you're just under-utilizing memory by having too few workers, optimizing your Apache isn't something easily addressed in a short post without details on your server (memory, CPU count, etc.) and your httpd.conf settings.
Note: if you don't have server access you'll have a hard time figuring out memory usage.
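If you do have shell access, a sketch of the measurement (this assumes a Linux box with GNU ps and Apache processes named apache2; on Red Hat style systems the name is httpd):

    # resident memory (RSS, in KB) of each Apache worker, largest last
    ps -C apache2 -o pid,rss,cmd --sort=rss

    # back-of-the-envelope: 2048 MB total RAM, ~512 MB for the OS and MySQL,
    # ~100 MB per worker  ->  (2048 - 512) / 100 = roughly 15 worker processes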
The process limit is typically something enforced by shared web host providers, and generally has to do with the number of processes executing under your account. This will typically equate to the number of connections made to your server at once (assuming one PHP process per connection).
There are many factors that come into play. You should find out from your hosting provider what that limit is, and then find a provider or plan that can handle your load.

Scaling a PHP website

Is there a standard solution for scaling up a website that runs on PHP + an Apache web server?
As of now I get about 100,000 requests/day, and 6 months down the line I expect that to grow to 200,000 requests/day. The first-cut solution that comes to mind is deploying more Apache web servers with mod_php, but something seems wrong about that.
Any ideas?
Try these two options first before adding new servers. They may allow you to stick with one server, but your results may vary.
For speeding the site up when you are hit with many concurrent users, look into installing the APC PECL extension (http://us2.php.net/manual/en/book.apc.php). APC caches the compiled (opcode) version of your scripts, saving the parse and compile step every time a script is executed.
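Once the extension is installed (for example via pecl install apc), enabling it is a couple of php.ini lines; the shared-memory size below is just a placeholder, and note that older APC releases expect a bare number of megabytes rather than a value like "64M":

    ; php.ini (sketch)
    extension    = apc.so
    apc.enabled  = 1
    apc.shm_size = 64    ; MB of shared memory for the opcode cache
    apc.stat     = 1     ; set to 0 in production for extra speed, but then restart PHP on every code deploy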
Also, if you are experiencing heavy load on the database server, look into installing memcached and caching database results for a certain time period, if possible (http://us2.php.net/manual/en/book.memcache.php).
Finally, if you do decide to get a separate server, look into possibly getting a dedicated SQL box. This, of course, assumes that your application is a database heavy application, as web apps are these days. Segregating SQL into a separate box allows it to take advantage of all of the resources on that box, with more cache and processing power. It could be the way to go.
I don't have any experience with scaling really large websites, but I don't think you'll need to scale out to separate servers in this case. I run a browser game with 40,000-60,000 requests per day, some cron jobs doing a lot of work every 5 minutes, and a TeamSpeak server, all on a small $40/month box, and I haven't had any performance problems so far.
200,000 requests/day works out to only two or three per second on average; one box should be able to deal with that just fine. If not, I'd first have a look at bottlenecks in your code. Redundant database calls? Double-looping database calls rather than simple joins? Are you caching anything?
How to scale after this is totally dependent on your application, how/where do you keep session state and so forth, general advice has limited applicability.
if you like it then you should have put a cache on it

jmeter multiple users problem

We are using JMeter to test our PHP application running on the Apache 2 web server. I can load up JMeter with 25 or 50 threads and the load on the server does not increase, but the response time from the server does: the more threads, the slower the response. It seems like JMeter or Apache is queuing the requests. I have changed the MaxClients value in the Apache configuration file, but this does not change the problem. While JMeter is running I can use the application and get respectable response times. What gives? I would expect to be able to tax my server down to 0% idle by increasing the number of threads. Can anyone help point me in the right direction?
Update: I found that if I remove sessions from my application I am able to simulate a full load on the server. I have tried to re-enable sessions and use an HTTP Cookie Manager for each thread, but it does not seem to make an impact.
You need to identify where the bottleneck is occurring, and then attempt to remediate the problem.
The JMeter client should be running on a well-equipped machine. I prefer a Solaris/Unix server running the JVM, but for <200 threads a modern Windows machine will do just fine. JMeter can itself become a bottleneck, and you won't get any meaningful results once it does. Additionally, it should run on a separate machine from the one you're testing, and preferably on the same network; WAN latency can become a problem if your test rig and server are far apart.
The second thing to check is your Apache workers. Apache has a module, mod_status, which will show you the state of every worker. It's possible to have your pool size set too low; from mod_status you'll be able to see how many workers are in use. Too few, and Apache won't have workers free to process requests, and requests will queue up. Too many, and Apache may exhaust the memory on the box it's running on.
Next, you should check your database. If it's on a separate machine, the database could have an IO or CPU shortage.
If you're hitting a bottleneck and the server and DB are on the same machine, you'll generally hit a CPU, RAM, or IO limit. I listed those in the order in which they are easiest to identify. If you have a CPU-bound app, you can easily see your CPU usage go to 100%. If you run out of RAM, your machine will start swapping, and on both Windows and Unix it's fairly easy to see your available free RAM. Lastly, you may be IO bound; this too can be monitored using various tools or stats, but it's not as obvious as CPU.
Finally, and specific to your question, one thing stands out: it's possible to end up with a huge number of session files stored in a single directory. PHP often stores session information in files, and if that directory gets large it takes PHP an increasingly long time to find a session. If you ran your test with cookies turned off, the PHP app may have created thousands of session files, one for each request. A Windows server will slow down faster than a Unix server here, due to differences in the way the two operating systems store directories.
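If that turns out to be the culprit, PHP can spread session files over a tree of subdirectories using the N; prefix on session.save_path. A sketch (note that PHP does not create the subdirectories for you; the mod_files.sh script shipped in the PHP source tree is the usual way to pre-create them):

    ; php.ini (sketch)
    ; store sessions two directory levels deep instead of in one flat directory
    session.save_path = "2;/var/lib/php/sessions"
    ; shorten how long stale session files hang around (seconds)
    session.gc_maxlifetime = 1440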
Are you using a constant throughput timer? If Jmeter can't service the throughput with the threads allocated to it, you'll see this queueing and blowouts in the response time. To figure out if this is the problem, try adding more threads.
I also found a report of this happening when there are javascript calls inside the script. In this instance, try to move javascript calls to the test plan element at the top of the script, or look for ways to pre-calculate the value.
Try requesting a static file served by Apache rather than by PHP to see whether the problem is in the Apache config or the PHP config.
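A quick way to compare the two from the load-generating box, using curl's timing variables (the URLs are placeholders):

    # time-to-first-byte for a static file vs. a PHP page
    curl -o /dev/null -s -w "static: %{time_starttransfer}s\n" http://yourserver/static.html
    curl -o /dev/null -s -w "php:    %{time_starttransfer}s\n" http://yourserver/index.php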
Also check your network connections and configuration. Our JMeter testing was progressing nicely until it hit a wall; eventually we realized we only had a 100Mb connection and it was saturated, and going to gigabit fixed it. Your network cards or switch may be running at a lower speed than you think, especially if their speed setting is "auto".
