Apache VERY high page load time - php

My Drupal 6 site has been running smoothly for years but recently has experienced intermittent periods of extreme slowness (10-60 sec page loads). Several hours of slowness followed by hours of normal (4-6 sec) page loads. The page always loads with no error, just sometimes takes forever.
My setup:
Windows Server 2003
Apache/2.2.15 (Win32) Jrun/4.0
PHP 5
MySql 5.1
Drupal 6
ColdFusion 9
Vmware virtual environment
DMZ behind a corporate firewall
Traffic: 1-3 hits/sec peak
Troubleshooting
No applicable errors in apache error log
No errors in drupal event log
Drupal devel module shows 242 queries in 366.23 milliseconds,page execution time 2069.62 ms. (So it looks like queries and php scripts are not the problem)
NO unusually high CPU, memory, or disk IO
Cold fusion apps, and other static pages outside of drupal also load slow
webpagetest.org test shows very high time-to-first-byte
The problem seems to be with Apache responding to requests, but previously I've only seen this behavior under 100% cpu load. Judging solely by resource monitoring, it looks as though very little is going on.
Here is the kicker - roughly half of the site's access comes from our LAN, but if I disable the firewall rule and block access from outside of our network, internal (LAN) access (1000+ devices) is speedy. But as soon as outside access is restored the site is crippled.
Apache config? Crawlers/bots? Attackers? I'm at the end of my rope, where should I be looking to determine where the problem lies?
------Edit:-----
Attached is a waterfall chart from webpagetest.org showing a 15 second load time. I've seen times as high as several minutes. And again, the server runs fine much of the time. The green areas indicate that the browser has sent a request and is waiting to recieve the first byte of data back from the server. This is certainly a back-end delay, but it is puzzling that the CPU is barely used during this slowness.
(Not enough rep to post an image, see https://webmasters.stackexchange.com/questions/54658/apache-very-high-page-load-time
------Edit------
On the Apache side of things - Is this possibly a ThreadsPerChild issue?

After much research, I may have found the solution. If I'm correct, it was an apache config problem. Specifically, the "ThreadsPerChild" directive. See... http://httpd.apache.org/docs/2.2/platform/windows.html
Because Apache for Windows is multithreaded, it does not use a
separate process for each request, as Apache can on Unix. Instead
there are usually only two Apache processes running: a parent process,
and a child which handles the requests. Within the child process each
request is handled by a separate thread.
ThreadsPerChild: This directive is new. It tells the server how many
threads it should use. This is the maximum number of connections the
server can handle at once, so be sure to set this number high enough
for your site if you get a lot of hits. The recommended default is
ThreadsPerChild 150, but this must be adjusted to reflect the greatest
anticipated number of simultaneous connections to accept.
Turns out, this directive was not set at all in my config and thus defaulted to 64. I confirmed this by viewing the number of threads for the second httpd.exe process in task manager. When the server was hitting more than 64 connections, the excess requests were simply having to wait for a thread to open up. I added ThreadsPerChild 150 in my httpd.conf.
Additionally, I enabled the apache status module
http://httpd.apache.org/docs/2.2/mod/mod_status.html
...which, among other things, allows one to see the total number of active request on the server at any given moment. Right away, I could see spikes of up to 80 active request. Time will tell, but I'm confident that this will resolve my issue. So far, 30 hours without a hiccup.

Apache is too bulk and clumsy for "1-3 hits/sec avg".
Once I have similar problem with much lighter (almost static-html, no DB) site, and similar hits/second.
No errors, no high network/CPU/memory/disk loads. Apache on WinXP.
I inserted nginx before Apache for static files and it started working like a charm.

Caching. The solution it caching.
Drupal (in common with most other large CMS platforms) has a tendency toward this kind of thing due to its nature -- every page is built on the fly, constructed from a whole stack of database tables and code modules. The more you've got in there, the slower it will be, but even fairly simple pages can become horribly slow if your site gets a bit of traffic.
Drupal has a page cache mechanism built-in which will cut your load dramatically. As long as your pages are static (ie no dynamic content) then you can simply switch on caching and watch the performance go right back up.
If you have dynamic content, you can still enable caching for the static parts of the page. It is a bit more complex (and beyond the scope of this answer), but it is worth the effort.
If that's still not enough, a server-based caching solution such as Varnish will definitely help.

Related

Long Polling with Ajax and PHP - Apache freezes

We try to implement long-polling based notification service in our company's ERP. Similar to Facebook notifications.
Technologies used:
PHP with timeout set to 60 seconds and 1 second sleep in each iteration of loop.
jQuery for AJAX handling.
Apache as web server.
After nearly month of coding, we went to production. Few minutes after deployment we had to rollback everything. It turned out that our server (8 cores) couldn't handle long requests from 20 employees, using ~5 browser tabs each.
For example: User opened 3 tabs with our ERP, with one long-polling AJAX on each tab. Opening 4th tab is impossible - it hangs until one of previous 3 is killed (and therefore AJAX is stopped).
'Apache limitations', we thought. So we went googling. I found some info about Apache's MPM modules and configs, so I gave it a try. Our server use prefork MPM, as apachectl -l shown us. So I changed few lines in config to look something like this:
<IfModule mpm_prefork_module>
StartServers 1
MinSpareServers 16
MaxSpareServers 32
ServerLimit 50%
MaxClients 150
MaxClients 50%
MaxRequestsPerChild 0
</IfModule>
Funny thing is, it works on my local machine with similar config. On server, it looks like Apache ignores config, because with MinSpareServers set to 16, it lauches 8 after restart. Whe have no idea what to do.
Passerby in first comment of previous post gave me good direction to check out if we hit max browser connections to one server.
As it turns out, each browser has those limit and you can't change them (as far as I know).
We made a workaround to make it work.
Let's assume that I was getting AJAX data from
http://domain.com/ajax
To avoid hitting max browser connections, each long-polling AJAX connects to random subdomain, like:
http://31289.domain.com/ajax
http://43289.domain.com/ajax
and so on. There's a wildcard on a DNS server pointing from *.domain.com to domain.com, and subdomain is unique random number, generated by JS on each tab.
For more information, check out this thread.
There's been also some problems with AJAX Same Origin Security, but we managed to work it out, using appropriate headers on both JS and PHP sides.
If you want to know more about headers, check it out here on StackOverflow, and here on Mozilla Developer's page. Thanks!
I have successfully implemented a LAMP setup with long polling. Two things to keep in mind, the php internal execution clock for linux is not altered or incremented by the 'usleep()' function. Therefore, setting the maximum execution time would only be needed for rare edge cases where obtaining the data takes longer than normal, or possibly for a windows setup. In addition, with long polling bare in mind that once you go over 20+ seconds, you are vulnerable to having browser timeouts occur.
Secondly, you will need to verify that your sessions aren't locking up (if sessions are being used).
Apache really shouldn't have any issue with what you are looking to do. Though, I will admit that webservers like nginx or an ajax-specific webserver may handle the concurrent connections better. If you could post your code for the ajax handler, we might be able to figure out where the problem is.
Utilizing subdomains, or as other threads have suggested -- multiple webservers on separate ports, remember that you may encounter JavaScript domain security issues.
I say, don't change apache config until you encounter an issue and have exhausted all other options; be careful with the PHP sessions, and make sure AJAX is waiting for a response, before sending another request ;)

Reduce Initial Connection & TTFB Time

Similar Question: Here
Website: Cleanfiles PPD Network
Raw Server Link (Skip DNS): http://173.247.246.58/
Waterfall view (Webpage Test):
I recently moved to a new server. All PHP scripts and resources stayed exactly the same. The new server is an Inmotion Elite Dedicated Server.
Average server load:
Server load 1.25 (8 CPUs)
Memory Used 14.14% (1,137,916 of 8,048,804)
Swap Used 0% (0 of 4,095,992)
As a network owner, having a quick and nifty site is a top priority. I can't afford to have 2-4 seconds of random waiting time for my members when navigating through-out pages. The old server never did this, it loaded fine.
Since the server load appears to be fine and the PHP scripts are the same, I want to assume it is something with some Apache settings or something like that. I really cannot tell. I tried running the two scripts listed in the Top Answer of the question posted above, but both had long wait times...
I talked to the hosting company but they didn't really know what was going on. Any help with this issue or tests that I can do would be greatly appreciated :)
Probably the most effective solution is to use a CDN with native HTML caching capabilities (static and dynamic). TTFB relies on your ability to quickly process the HTML on the origin server, you can skip processing time altogether by serving a fresh cached copy from CDN.
I wrote a post about it recently, which looks into TTFB delay factors and average load time of different resources (based on data gathered across 1B sessions). You may find it useful: http://www.incapsula.com/the-incapsula-blog/item/809-using-cdn-to-improve-seo-and-ttfb

What can be causing an "exceeded process limit" error?

I launched a website about a week ago and I sent out an email blast to a mailing list telling everyone the website was live. Right after that the website went down and the general error log was flooded with "exceeded process limit" errors. Since then, I've tried to really clean up a lot of the code and minimize database connections. I will still see that error about once a day in the error log. What could be causing this error? I tried to call the web host and they said it had something to do with my code but couldn't point me in any direction as to what was wrong with the code or which page was causing the error. Can anyone give me any more information? Like for instance, what is a process and how many processes should I have?
Wow. Big question.
Obviously, your maxing out your apache child worker processes. To get a rough idea of how many you can create, use top to get the rough memory footprint of one http process. If you are using wordpress or another cms, it could easily be 50-100m each (if you're using the php module for apache). Then, assuming the machine is only used for web serving, take your total memory, subtract a chunk for OS use, then divide that by 100m (in this example). Thats the max worker processes you can have. Set it in your httpd.conf. Once you do this and restart apache, monitor top and make sure you don't start swapping memory. If you do, you have set too high a number of workers.
If there is any other stuff running like mysql servers, make space for that before you compute number of workers you can have. If this number is small, to roughly quote a great man 'you are gonna need a bigger boat'. Just kidding. You might see really high memory usage for a http process like over 100m. You can tweak your the max requests per child lower to shorten the life of a http process. This could help clean up bloated http workers.
Another area to look at is time response time for a request... how long does each request take? For a quick check, use firebug plugin for firefox and look at the 'net' tab to see how long it takes for your initial request to respond back (not images and such). If for some reason request are taking more than 1 or 2 seconds to respond, that's a big problem as you get sort of a log jam. The cause of this could be php code, or mysql queries taking too long to respond. To address this, make sure if you're using wordpress to use some good caching plugin to lower the stress on mysql.
Honestly, though, unless your just not utilizing memory by having too few workers, optimizing your apache isn't something easily addressed in a short post without detail on your server (memory, cpu count, etc..) and your httpd.conf settings.
Note: if you don't have server access you'll have a hard time figuring out memory usage.
The process limit is typically something enforced by shared webhost providers, and generally has to do with the number of processes executing under your account. This will typically equate to the number of connections made to your server at once (assuming one PHP process per each connection).
There are many factors that come into play. You should figure out what that limit is from your hosting provider, and then find a new one that can handle your load.

php5-fpm children and requests

I have a question.
I own a 128mb vps with a simple blog that gets just a hundred hits per day.
I have nginx + php5-fpm installed. Considering the low visits and the ram I decided to set fpm to static with 1 server running. While I was doing my random tests like running php scripts through http that last over 30 minutes I tried to open the blog in the same machine and noticed that the site was basically unreachable. So I went to the configuration and read this:
The number of child processes to be created when pm is set to 'static' and the
; maximum number of child processes to be created when pm is set to 'dynamic'.
; **This value sets the limit on the number of simultaneous requests that will be
; served**
What shocked me the most was that I didn't know because I always assumed that a php children would handle hundreds of requests at the same time like a http server would do!
Did it get it right?
If for example I launch 2 php-fpm children and launch 2 "long scripts" at the same time all the sites using the same php backend will be unreachable?? How is this usable?
You may think: -duh! a php script (web page) is usually processed in 100ms- ... no doubt about that but what happens if you have pages that could run for about 10 secs each and I have 10 visitors with php-fpm with 5 servers so accepting only 5 requests per time at the same time? They'll all be queued or will experience timeouts?
I'm honestly used to run sites in Windows with Apache and mod_php I never experienced these issues because apparently those limits don't apply being a different way of using PHP.
This also raises another question. If I have file_1.php with sleep(20) and file_2.php with just an echo, if I run file_1 and then file_2 with the fastcgi machine the second file will request the creation of another server to handle the php request using 4MB RAM more. If I do the same with apache/mod_php the second file will only use 30KB more of RAM (in the apache server). Considering this why is mod_php is considering the "bad guy" if the ram used is actually less...I know I'm missing the big picture here.
You've basically got it right. You configured a static number of workers (and that number was "one") -- so that's exactly what you got.
But you don't understand quite how things typically work, since you say:
I always assumed that a php children would handle hundreds of requests
at the same time like a http server would do!
I'm not really familiar with nginx, but consider the typical mod_php setup in apache. If you're using mod_php, then you're using the prefork mpm for apache. So every concurrent http requests is handled by a distinct httpd process (no threads). If you're tuning your apache/mod_php server for low-memory, you're going to have to tweak apache settings to limit the number of processes it will spawn (in particular, MaxClients).
Failing to tune this stuff means that when you get a large traffic spike, apache starts spawning a huge number of heavy processes (remember, it's mod_php, so you have the whole PHP interpreter embedded in each httpd process), and you run out of memory, and then everything starts swapping, and your server starts emitting smoke.
Tuned properly (meaning: tuned so that you ignore requests instead of allocating memory you don't have for more processes), clients will time out, but when traffic subsides, things go back to normal.
Compare that with fpm, and a smarter web server architecture like apache-worker, or nginx. Now you have some, much larger, pool of threads (still configurable!) to handle http requests, and a separate pool of php-fpm processes to handle just the requests that require PHP. It's basically the same thing, if you don't set limits on how many processes/threads can be created, you are asking for trouble. But if you do tune, you come out ahead, since only a fraction of your requests use PHP. So essentially, the average amount of memory needed per http requests is lower -- thus you can handle more requests with the same amount of memory.
But setting the number to "1" is too extreme. At "1", it doesn't even matter if you choose static or dynamic, since either way you'll just have one php-fpm process.
So, to try to give explicit answers to particular questions:
You may think: -duh! a php script (web page) is usually processed in 100ms- ... no doubt about that but what happens if you have pages that could run for about 10 secs each and I have 10 visitors with php-fpm with 5 servers so accepting only 5 requests per time at the same time? They'll all be queued or will experience timeouts?
Yes, they'll all queue, and eventually timeout. The fact that you regularly have scripts that take 10 seconds to run is the real culprit here, though. There are lots of ways to architect around that (caching, work queues, etc), but the right solution depends entirely on what you're trying to do.
I'm honestly used to run sites in Windows with Apache and mod_php I never experienced these issues because apparently those limits don't apply being a different way of using PHP.
They do apply. You can set up an apache/mod_php server the same way as you have with nginx/php-fpm -- just set apache's MaxClients to 1!
This also raises another question. If I have file_1.php with sleep(20) and file_2.php with just an echo, if I run file_1 and then file_2 with the fastcgi machine the second file will request the creation of another server to handle the php request using 4MB RAM more. If I do the same with apache/mod_php the second file will only use 30KB more of RAM (in the apache server). Considering this why is mod_php is considering the "bad guy" if the ram used is actually less...I know I'm missing the big picture here.
Especially on linux, lots of things that report memory usage can be very misleading. But think about it this way: that 30kb is negligible. That's because most of PHP's memory was already allocated when some httpd process got started.
128MB VPS is pretty tight, but should be able to handle more than one php-process.
If you want to optimize, do something like this:
For PHP:
pm = static
pm.max_children=4
for nginx, figure out how to control processes and thread count (whatever the equivalent to apache's MaxClients, StartServers, MinSpareServers, MaxSpareServers)
Then figure out how to generate some realistic load (apachebench, siege, jmeter, etc). use vmstat, free, and top to watch your memory usage. Adjust pm.max_children and the nginx stuff to be as high as possible without causing any significant swap (according to vmstat)

jmeter multiple users problem

We are using Jmeter to test our Php application running on the Apache 2 web server. I can load up Jmeter to use 25 or 50 threads and the load on the server does not increase, however the response time from the server does. The more threads the slower the response time. It seems like Jmeter or Apache is queuing the requests. I have changed the maxclients value in apache web server configuration file, but this does not change the problem. While Jmeter is running I can use the application and get respectable response times. What gives? I would expect to be able to tax my server down to 0% idle by increase the number of threads. Can anyone help point me in the right direction?
Update: I found that if I remove sessions from my application I am able to simulate a full load on the server. I have tried to re-enable sessions and use an HTTP Cookie Manager for each thread, but it does not seem to make an impact.
You need to identify where the bottleneck is occurring, and then attempt to remediate the problem.
The JMeter client should be running on a well equipted machine. I prefer a Solaris/Unix server running the JVM, but for <200 threads, a modern windows machine will do just fine. JMeter can become a bottleneck, and you won't get any meaningful results once it does. Additionally, it should run on a separate machine to what your testing, and preferable on the same network. The WAN latency can become a problem if your test rig and server are far apart.
The second thing to check is your Apache workers. Apache has a module - mod_status - which will show you the state of every worker. It's possible to have your pool size set too low. From the mod_status, you'll be able to see how many workers are in use. To few, and Apache won't have any workers to process requests, and the requests will queue up. Too many, and Apache may exhaust the memory on the box it's running on.
Next, you should check your database. If it's on a separate machine, the database could have an IO or CPU shortage.
If your hitting a bottleneck, and the server and db are on the same machine, you'll generally hit a CPU, RAM, or IO limit. I listed those in the order in which they are easiest to identify. If you get a CPU bound app, you can easily see you CPU usage go to 100%. If you run out of RAM, your machine will start swapping. On both Windows and unix it's fairly easy to see your available free RAM. Lastly, you may be IO bound. This too can be monitored using various tools or stats, but it's not as obvious as CPU.
Lastly, specifically to your question, the one thing that stands out is it's possible to have a huge number of session files stored in a single directory. Often PHP stores session information in files. If this directory gets large, it will take increasingly long amount of time for PHP to find the session. If you ran your test will cookies turned off, the PHP app may have created thousands of session files for each user request. On a Windows server, it will slow down faster than on a unix server, do to differences in the way directories are stored on the two operating systems.
Are you using a constant throughput timer? If Jmeter can't service the throughput with the threads allocated to it, you'll see this queueing and blowouts in the response time. To figure out if this is the problem, try adding more threads.
I also found a report of this happening when there are javascript calls inside the script. In this instance, try to move javascript calls to the test plan element at the top of the script, or look for ways to pre-calculate the value.
Try checking a static file served by apache and not by PHP to see if the problem is in the Apache config or the PHP config.
Also check your network connections and configuration. Our JMeter testing was progressing nicely until it hit a wall. Eventually realized we only had a 100Mb connection and it was saturated, going to gigabit fixed it. Your network cards or switch may be running at a lower speed than you think, especially if their speed setting is "auto".

Categories