I am running PHPUnit to test a CodeIgniter application using CIUnit (a third-party interface between the two). A number of the tests select data from an empty MySQL database that is populated with 5-10 records in setUp(). On Windows and on the web server (Ubuntu 10.04/Apache 2.2/MySQL 5.1/PHP 5.3), the 105 tests run in 2-3 seconds with memory usage of around 30 MB. On my local machine (Ubuntu 12.04/Apache 2.2/MySQL 5.5/PHP 5.3), the 105 tests use the same amount of memory but take approximately 45 seconds.
I have narrowed down the slowness to tests which utilise the database; are there any configuration settings that I am possibly missing that are making the tests run 15 times slower? If not, would my best bet be to try downgrading MySQL, or maybe even Ubuntu (I have already tried downgrading from 12.10 to 12.04)?
Any answers much appreciated.
You are most likely running into the performance hit created by write barriers, which are on by default in the ext4 filesystem. Read more about them here:
Here's what they do from the docs:
barrier=<0|1(*)>
This enables/disables the use of write barriers in the jbd code. barrier=0 disables, barrier=1 enables. This also requires an IO stack which can support barriers, and if jbd gets an error on a barrier write, it will disable again with a warning. Write barriers enforce proper on-disk ordering of journal commits, making volatile disk write caches safe to use, at some performance penalty. If your disks are battery-backed in one way or another, disabling barriers may safely improve performance.
You can try to remount your filesystem without them like this (use the mount point where the MySQL data files live):
mount -o remount,nobarrier /
In my environment this reduced the runtime of a suite with 83 tests and 194 assertions from 48 seconds to 6.
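If that helps and you accept the durability trade-off on a development machine, you can make the option persistent by adding it to the relevant entry in /etc/fstab. A minimal sketch, assuming an ext4 filesystem; the UUID is a placeholder for your own device:
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx / ext4 defaults,nobarrier 0 1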
I was able to significantly speed up my PHPUnit tests by following this suggestion [1]:
If you are using the InnoDB engine (the default on Fedora), try putting this in your my.cnf database configuration:
[mysqld]
...
innodb_flush_log_at_trx_commit=2
...
then restart your server.
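If you want to verify the setting (or try it out without editing my.cnf first), the variable is dynamic, so something like the following should work; a sketch, and note that SET GLOBAL requires the SUPER privilege:
mysql -u root -p -e "SET GLOBAL innodb_flush_log_at_trx_commit = 2;"
mysql -u root -p -e "SELECT @@innodb_flush_log_at_trx_commit;"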
It does reduce the reliability of your database if you lose power while writing, etc. It was definitely an acceptable risk for my development machine, but I wouldn't recommend it for production. See more details about the reliability at [2].
[1] http://aventinesolutions.nl/mediawiki2/index.php/PHPUnit:_a_Quick_Way_to_Speed_Up_Test_Suites?goback=.gde_1685627_member_107087295
[2] http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
Related
I am in the process of upgrading our MongoDBs from 3.4 (using MMAPv1 storage engine) to 4.2 (using WiredTiger). One thing I have encountered that is pretty much a blocker at this point is a serious slowdown of our tests.
Long story short (more details below): MongoDB 4.2 with WiredTiger is taking much longer to process the repeated database setup/teardown in our tests. The slowdown is in the ballpark of a factor of 10: the tests used to run in about 10 minutes, and with 4.2 they take almost 90 minutes. The slowdown reproduces even with just a fraction of the tests and seems to come from the setup/teardown stage of the testing.
Environment
A few words about our environment -- we are using PHP with Doctrine ODM to talk to MongoDB. We have about 3000 tests, some pure unit tests, some (many) functional, actually using the database. The tests are running in a Dockerized environment - we spin up a fresh MongoDB Docker container for each pipeline, but I have confirmed that the same slowdown occurs even in a production-like bare-metal setting. The experiments below were done on bare metal, to limit problems coming from somewhere else.
Each functional test first drops the database, then loads fixtures into it (+ creates indices) and then the actual test scenario is executed.
Profiling PHP
Running a small subset of the tests and measuring the timing, I get these results:
3.4:
real 0m12.478s
user 0m7.054s
sys 0m2.247s
4.2:
real 0m56.669s
user 0m7.488s
sys 0m2.334s
As you can see, the actual CPU time taken by the tests is about the same, no significant difference there. The real time is very different though, which suggests a lot of waiting (for I/O in this case?).
I have further profiled the PHP code and I can see from the results that there is a 9-10x increase in the time spent in this function:
MongoDB\Driver\Manager::executeWriteCommand()
The documentation for that function says:
This method will apply logic that is specific to commands that write (e.g. drop)
That makes me think that the amount of setup/teardown (i.e. dropping collections, creating indexes) is at play here.
Profiling MongoDB
Profiling PHP pointed at a slowdown in MongoDB so I profiled that as well. The subset of tests that I ran resulted in
1366 profiling documents for 3.4 MMAPv1
2092 profiling documents for 4.2 WiredTiger
Most of the disparity between those numbers can be attributed to the fact that in 4.2 there are no documents for createIndexes (maybe they were added to profiling post-3.4? I don't know).
I filtered the profiling documents to only show those which took at least 1 millisecond (>0). There were:
2 such documents for MongoDB 3.4 (two drop commands)
950+ such documents for MongoDB 4.2 (209x drop, 715x createIndexes, 4x insert, 23x query)
As I mentioned earlier, Mongo 3.4 does not seem to report createIndexes in the profiling. But let's assume all of those commands would take as long as they do in 4.2 (though they would probably take less time, based on the rest of the profiling results).
Then there are all those drop commands that take up to 15 milliseconds per operation in 4.2. In 3.4 there are also 209 drop commands, but almost all of them are reported to have lasted 0 milliseconds.
There is only a minimal amount of inserting and querying and the size of the collections when those are happening is only a handful of documents (less than 10 per collection, less than 5 collections actually queried and inserted into). This slowdown is not a result of missing caches or indices. Even full scans would be fast in this setting.
Memory and hardware
Most of the discussion I have found regarding this has been around setting an appropriate cache size for the working set. I ran the tests on a small server with a single core and 4 GB RAM with the default cache size (50% of RAM minus 1 GB, i.e. roughly 1.5 GB here). That is definitely large enough for all the data the tests could have created. The tests were truly trivial and most of the time spent on them was on setup/teardown of the database state.
Conclusion
This is the first time I have profiled our tests and their interaction with the database. The ratio of drop-and-index-creation to actual work can definitely be improved, but it has worked so far with MMAPv1 and MongoDB 3.4. Is this type of a slowdown something that is expected with WiredTiger? Is there something I can do to mitigate this?
I am now afraid of upgrading the production MongoDB instances because I don't know how those will behave. If this is related mostly to index creation and database dropping, then I suppose production workload should be fine, but I don't want to take chances. Sadly we are a fairly small company and do not have any performance/stress tests of the production environment.
Edits
Using tmpfs
Since I'm running the tests in Docker and Docker supports tmpfs volumes out-of-the-box, I gave that a try. When using RAM-backed tmpfs as the mount for the MongoDB data, I managed to cut down the test time to about half:
4.2:
real 0m56.669s
user 0m7.488s
sys 0m2.334s
4.2 - tmpfs:
real 0m30.951s
user 0m7.697s
sys 0m2.279s
This is better, but still a far cry from the 12 seconds it takes to run on MMAPv1. Interestingly, using tmpfs with MMAPv1 did not yield a significantly different result.
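For reference, the tmpfs data volume is a stock Docker feature; the container was started with roughly something like the line below, where the image tag and size are placeholders for whatever fits your setup:
docker run -d --name mongo-test --tmpfs /data/db:rw,size=1g mongo:4.2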
The real cause of test slowdown - indices
It turns out that our testing framework and fixture loader created indices for all managed collections with each database purge. This resulted in about 100 index creations per test case, and this was what caused the slowdown. I did not find concrete proof directly from MongoDB, but it seems that index creation with WiredTiger is significantly slower than with MMAPv1. Removing the index creation from the test setup code sped up the tests significantly, getting us back to the pre-upgrade times.
The vast majority of our tests do not need the indexes and their creation takes much longer than the speedup in queries they provide. I implemented an option to enforce index creation for test cases where the developer knows they will need them. That is an acceptable solution for us.
Put the database's data into memory. On Linux, I recommend zram.
In my experience zram is 2x as fast as a top-of-the-line NVMe SSD (Samsung 860 Pro, I think) in RAID 0, and I think almost 10x as fast as a single consumer-grade laptop SSD. The difference should be even bigger for spinning disks or storage accessed over the network.
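For anyone who wants to try it, setting up a zram-backed data directory looks roughly like this on a recent kernel; a sketch, with the size and mount point as placeholders, and you should chown the mount point to whatever user mongod runs as:
modprobe zram
echo 2G > /sys/block/zram0/disksize
mkfs.ext4 /dev/zram0
mount /dev/zram0 /data/db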
MongoDB has various other storage engines (there's one called "ephemeral for test" I believe) but they don't support transactions, so you need to use WT if your application makes use of 4.2 (or even 4.0 I think) functionality.
In production you are most likely not dropping collections every request so actual performance difference between 3.x and 4.2 should be smaller.
Use ephemeralForTest engine!
Even though #d-sm mentioned this in their answer, I missed it, so let me emphasize it for future readers.
If you simply need a quick storage engine to run your unit tests against, and you need to update MongoDB to v4.2+ (so mmapv1 engine is not an option anymore), you can use the ephemeralForTest engine instead.
This is NOT TO BE CONFUSED with the Enterprise-only In-Memory engine, and it was silently added in v3.2 (see changelog).
This engine is not officially supported in production and has some limitations (e.g. the lack of transactions support), but it's pretty close to mmapv1 in terms of performance for unit tests (which also lacked these features).
So maybe it won't fit all use-cases, but I'm sure it will be just enough for most, so give it a try before trying tmpfs or other solutions, since those will still not give the same performance.
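For anyone wanting to try it, the engine is just a mongod startup option (it can also go in mongod.conf under storage.engine); a minimal sketch:
mongod --storageEngine ephemeralForTest --dbpath /data/db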
I am trying to figure out a situation where PHP is not consuming a lot of memory but instead causes a very high Committed_AS result.
Take this munin memory report for example:
As soon as I kick off our Laravel queue (10 ~ 30 workers), committed memory goes through the roof. We have 2G mem + 2G swap on this vps instance and so far there are about 600M unused memory (that's about 30% free).
If I understand Committed_AS correctly, it's meant to be a 99.9% guarantee of no out-of-memory issue given the current workload, and it seems to suggest we need to triple our VPS memory just to be safe.
I tried to reduce the number of queues from 30 to around 10, but as you can see the green line is quite high.
As for the setup: Laravel 4.1 with PHP 5.5 and OPcache enabled. The upstart script we use spawns instances like the following:
instance $N
exec start-stop-daemon --start --make-pidfile --pidfile /var/run/laravel_queue.$N.pid --chuid $USER --chdir $HOME --exec /usr/bin/php artisan queue:listen -- --queue=$N --timeout=60 --delay=120 --sleep=30 --memory=32 --tries=3 >> /var/log/laravel_queue.$N.log 2>&1
I have seen a lot of cases where high swap use implies insufficient memory, but our swap usage is low, so I am not sure what troubleshooting step is appropriate here.
PS: we didn't have this problem prior to Laravel 4.1 and our VPS upgrade; here is an image to prove that.
Maybe I should rephrase my question as: how is Committed_AS calculated exactly, and how does PHP factor into it?
Updated 2014.1.29:
I had a theory on this problem: since the Laravel queue worker actually uses PHP sleep() while waiting for a new job from the queue (in my case beanstalkd), this would suggest the high Committed_AS estimate is due to the relatively low workload combined with relatively high memory consumption.
This makes sense, as I see Committed_AS ~= avg. memory usage / avg. workload. While PHP sleep()s, little to no CPU is used, yet whatever memory it consumes is still reserved. This results in the server thinking: hey, you use so much memory (on average) even when load is minimal (on average), so you should be better prepared for higher load (but in this case, higher load doesn't result in a higher memory footprint).
If anyone would like to test this theory, I will be happy to award the bounty to them.
Two things you need to understand about Committed_AS:
It is an estimate.
It indicates how much memory you would need in a worst-case scenario (plus the swap). It depends on your server's workload at the time: if you have a lower workload then Committed_AS will be lower, and vice versa.
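You can also check the raw numbers your Munin graph is built from: Committed_AS and the commit limit are both reported in /proc/meminfo, and the kernel's overcommit policy is in /proc/sys/vm/overcommit_memory:
grep -E 'Committed_AS|CommitLimit' /proc/meminfo
cat /proc/sys/vm/overcommit_memory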
If this wasn't an issue with the prior iteration of the framework queue and provided you haven't pushed any new code changes to the production environment, then you will want to compare the two iterations. Maybe spin up another box and run some tests. You can also profile the application with xdebug or zend_debugger to discover possible causal factors with the code itself. Another useful tool is strace.
All the best, you're going to need it!
I have recently found the root cause to this high committed memory problem: PHP 5.5 OPcache settings.
It turns out that setting opcache.memory_consumption = 256 causes each PHP process to reserve much more virtual memory (visible in the VIRT column of top), which results in Munin estimating the potential committed memory to be much higher.
The number of Laravel queue workers we have running in the background only exaggerates the problem.
By setting opcache.memory_consumption to the recommended 128 MB (we really weren't using all those 256 MB effectively), we cut the estimated value in half. Coupled with a recent RAM upgrade on our server, the estimate is now around 3 GB, which is much more reasonable and within our total RAM limit.
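For anyone wanting to reproduce this, the change is a single ini setting, and the per-process virtual size can be checked with ps; the process name here is an assumption, so adjust it to whatever your workers run as:
opcache.memory_consumption=128
ps -o pid,vsz,rss,cmd -C php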
Committed_AS is the amount of memory the kernel has actually promised to processes. Queues run independently and have nothing to do with PHP or Laravel. In addition to what Rijndael said, I recommend installing New Relic, which can be used to track down the problem.
Tip: I've noticed a huge reduction in server load with the Nginx-HHVM combination. Give it a try.
We are trying to deploy the APC user cache in a high-load environment as a local second-tier cache on each server in front of our central caching service (Redis), for caching database queries with rarely changing results, and for configuration. We basically looked at what Facebook did (years ago):
http://www.slideshare.net/guoqing75/4069180-caching-performance-lessons-from-facebook
http://www.slideshare.net/shire/php-tek-2007-apc-facebook
It works pretty well for some time, but after some hours under high load APC runs into problems, and the whole mod_php stops executing any PHP at all.
Even a simple PHP script with essentially nothing in it does not respond anymore, while static resources are still delivered by Apache. It does not really crash and there is no segfault. We tried the latest stable and the latest beta of APC, we tried pthread locks and spin locks, and every time we get the same problem. We provided APC with far more memory than it can ever consume; one minute before a crash we have 2% fragmentation and about 90% of the memory is free. When it "crashes" we find nothing in the error logs, and only restarting Apache helps. Only with spin locks do we get a PHP error, which is:
PHP Fatal error: Unknown: Stuck spinlock (0x7fcbae9fe068) detected in
Unknown on line 0
This seems to be a kind of timeout, which does not occur with pthreads, because those don’t use timeouts.
What’s happening is probably something like that:
http://notmysock.org/blog/php/user-cache-timebomb.html
Some numbers: a server has about 400 APC user-cache hits per second and about 30 inserts per second (which is a lot, I think); one request makes about 20-100 user-cache requests. There are about 300,000 variables in the user cache, all with a TTL (we only store without a TTL in our central Redis).
Our APC-settings are:
apc.shm_segments=1
apc.shm_size=4096M
apc.num_files_hint=1000
apc.user_entries_hint=500000
apc.max_file_size=2M
apc.stat=0
Currently we are using version 3.1.13-beta compiled with spin locks, used with an old PHP 5.2.6 (it’s a legacy app, I’ve heard that this PHP version could be a problem too?), Linux 64bit.
It's really hard to debug. We have written monitoring scripts that collect as much data as we can get every minute from APC, the system, etc., but we cannot see anything unusual, even one minute before a crash.
I've seen a lot of similar problems described here, but so far we haven't found a solution that solves our problem. And when I read something like this:
http://webadvent.org/2010/share-and-enjoy-by-gopal-vijayaraghavan
I’m not sure if going with APC for a local user-cache is the best idea in high load environments. We already worked with memcached here, but APC is a lot faster. But how to get it stable?
best regards,
Andreas
Lesson 1: https://www.kernel.org/doc/Documentation/spinlocks.txt
The single spin-lock primitives above are by no means the only ones. They
are the most safe ones, and the ones that work under all circumstances,
but partly because they are safe they are also fairly slow. They are slower
than they'd need to be, because they do have to disable interrupts
(which is just a single instruction on a x86, but it's an expensive one -
and on other architectures it can be worse).
That's written by Linus ...
Spin locks are slow; that assertion is not based on some article I read online by facebook, but upon the actual facts of the matter.
It's also an incidental fact that spinlocks become problematic when deployed at levels higher than the kernel, for the very reasons you speak of: untraceable deadlocks because of a bad implementation.
They are used by the kernel efficiently, because that's where they were designed to be used: locking tiny, tiny sections, not sitting around and waiting for you to copy your Amazon SOAP responses into APC and back out a billion times a second.
The most suitable kind of locking (for the web, not the kernel) available in APC is definitely rwlocks. You have to enable rwlocks with a configure option in legacy APC; they are the default in APCu.
The best advice that can be given, and I already gave it, is: don't use spinlocks. If mutexes are causing your stack to deadlock, then try rwlocks.
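For legacy APC that means rebuilding the extension. If I remember the switch name correctly it is something like the following, but check ./configure --help for your version before relying on it:
phpize
./configure --enable-apc-pthreadrwlocks
make && make install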
Before I continue: your main problem is that you are using a version of PHP from antiquity, which nobody even remembers how to support. In general you should look to upgrade. I'm aware of the constraints on the OP, but it would be irresponsible to neglect to mention that this is a real problem; you do not want to deploy on unsupported software. Additionally, APC is all but unmaintained and destined to die; O+ and APCu are its replacements in modern versions of PHP.
Anyway, I digress ...
Synchronization is a headache when you are programming at the level of the kernel, with spinlocks or whatever. When you are several layers removed from the kernel, and rely on 6 or 7 pieces of complicated software underneath you synchronizing properly so that your code can synchronize properly, synchronization becomes a headache not only for the programmer but for the executor too; it can easily become the bottleneck of your shiny web application even if there are no bugs in your implementation.
Happily, this is the year 2013, and Yahoo aren't the only people able to implement user caches in PHP :)
http://pecl.php.net/package/yac
This is an extremely clever lockless cache for userland PHP. It's marked as experimental, but once you are finished here, have a play with it; maybe in another 7 years we won't be thinking about synchronization issues :)
I hope you get to the bottom of it :)
Unless you are on a FreeBSD-derived operating system, it is not a good idea to use spinlocks; they are the worst kind of synchronization on the face of the earth. The only reason you must use them on FreeBSD is that the implementer neglected to include PTHREAD_PROCESS_SHARED support for mutexes and rwlocks, so you have little choice but to use the pgsql-inspired spin lock in that case.
I'm writing some testing code on a Drupal 6 project, and I can't believe how slow these tests seem to be running, after working with other languages and frameworks like Ruby on Rails or Django.
Drupal.org thinks this question is spam, and won't give me a way to prove I'm human, so I figured SO is the next best place to ask a question like this and get a sanity check on my approach to testing.
The following test code in this gist is relatively trivial.
http://gist.github.com/498656
In short I am:
creating a couple of content types,
create some roles,
creating users,
creating content as the users,
checking if the content can be edited by them
checking if it's visible to anonymous users
And here's the output when I run these tests from the command line:
Drupal test run
---------------
Tests to be run:
- (ClientProjectTestCase)
Test run started: Thu, 29/07/2010 - 19:29
Test summary:
-------------
ClientProject feature 52 passes, 0 fails, and 0 exceptions
Test run duration: 2 min 9 sec
I'm trying to run tests like this every time before I push code to a central repo, but if it's taking this long this early in the project, I dread to think how long it will take further down the line when we have ever more test cases.
What can I do to speed this up?
I'm using a MacBook Pro with:
4 GB of RAM,
a 2.2 GHz Core 2 Duo processor,
PHP 5.2,
Apache 2.2.14, without any opcode caching,
MySQL 5.1.42 (InnoDB tables are my default),
a 5400 RPM laptop hard drive
I understand that in the examples above I'm bootstrapping Drupal each time, and this is a very expensive operation, but this isn't unheard of with other frameworks like Ruby on Rails or Django, and I don't understand why it's averaging out at a little over a minute per test case on this project.
There's a decent list of tricks here for speeding up Drupal 7, many of which look like they'd apply to Drupal 6 as well, but I haven't had a chance to try them yet, and it would be great to hear how these have worked out for others before I blunder down further blind alleys.
What has worked for you when you've been working with Drupal 6 in this situation, and where are the quick wins for this?
One minute per test case, when I'm expecting easily more than a hundred test cases, feels insane.
It looks like the biggest increase in speed will come from running the test database in a ram disk, based on this post on Performance tuning tips for Drupal 7 testing on qa.drupal.org:
DamZ wrote a modified mysql init.d script for /etc/init.d/mysql on Debian 5 that runs MySQL databases entirely out of tmpfs. It's at http://drupal.org/files/mysql-tmpfs.txt, attached to http://drupal.org/node/466972.
It allowed the dual quad core machine donated to move from a 50 minute test and huge disk I/O with InnoDB to somewhere under 3 minutes per test. It's live as #32 on PIFR v1 for testing.d.o right now. It is certainly the only way to go.
I have not and won't be trying it on InnoDB anytime soon if anyone wants to omit the step on skip-innodb below and try it on tmpfs.
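On Linux the same idea is just a tmpfs mount over the MySQL data directory; a rough sketch (stop mysqld first, and note that the data disappears on every reboot, so you have to re-initialise the test databases):
/etc/init.d/mysql stop
mount -t tmpfs -o size=512M tmpfs /var/lib/mysql
chown -R mysql:mysql /var/lib/mysql
mysql_install_db --user=mysql
/etc/init.d/mysql start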
Also, there are some instructions here for creating a ram disk on OS X, although these are for moving your entire stock of MySQL databases into a ram disk rather than just a single database:
Update: I've tried this approach now on OS X and documented what I've found.
I've been able to cut 30-50% from the test times by switching to a ram disk. Here are the steps I've taken:
Create a ram disk
I've chosen a gigabyte mainly because I've got 4gb of RAM, and I'm not sure how much space I might need, so I'm playing it safe:
diskutil erasevolume HFS+ "ramdisk" `hdiutil attach -nomount ram://2048000`
Set up MySQL
Next, I ran the MySQL install script to get MySQL installed on the new ram disk:
/usr/local/mysql/scripts/mysql_install_db \
--basedir=/usr/local/mysql \
--datadir=/Volumes/ramdisk
Then I made sure the previous mysqld was no longer running, and started the MySQL daemon, telling it to use the ram disk as its data directory rather than the default location:
/usr/local/mysql/bin/mysqld \
--basedir=/usr/local/mysql \
--datadir=/Volumes/ramdisk \
--log-error=/Volumes/ramdisk/mysql.ramdisk.err \
--pid-file=/Volumes/ramdisk/mysql.ramdisk.pid \
--port=3306 \
--socket=/tmp/mysql_ram.sock
Add the database for testing
I then pulled down the latest database dump on our staging site with drush, before updating where settings.php points to it:
drush sql-dump > staging.project.database.dump.sql
Next was to get this data into the local testing setup on the ram disk. This involved creating a symlink to the ram disk database socket, creating the database, granting rights to the MySQL user specified in the Drupal installation, then loading the database to start running tests. Step by step:
Creating the symlink: this is because the mysql command by default looks for /tmp/mysql.sock, and symlinking that to our short-term ram disk socket was simpler than constantly changing php.ini files.
ln -s /tmp/mysql_ram.sock /tmp/mysql.sock
Creating the database (at the mysql prompt):
CREATE DATABASE project_name;
GRANT ALL PRIVILEGES ON project_name.* TO 'db_user'@'localhost' IDENTIFIED BY 'db_password';
Loading the content into the new database...
mysql project_name < staging.project.database.dump.sql
Run the tests on the command line
...and finally running the test from the command line, and using growlnotify to tell me when tests have finished
php ./scripts/run-tests.sh --verbose --class ClientFeatureTestCase testFeaturesCreateNewsItem ; growlnotify -w -m "Tests have finished."
Two test cases still take around a minute and a half, which is still unusably slow; orders of magnitude slower than other frameworks I have used before.
What am I doing wrong here?
This can't be the standard way of running tests with Drupal, but I haven't been able to find any stats on how long I should expect a test suite to take with Drupal to tell me otherwise.
The biggest issue with Drupal SimpleTests is it takes a long time to install Drupal, and that's done for every test case.
So use simpletest_clone -- basically, dump your database fresh after installation and it lets you use that dump as the starting point for each test case rather than running the entire installer.
I feel your pain, and your observations are spot on. A suite that takes minutes to run is a suite that inhibits TDD. I've resorted to plain PHPUnit tests run on the command line which run as fast as you'd expect coming from a Rails environment. The real key is to get away from hitting the database at all; use mocks and stubs.
We run a medium-size site that gets a few hundred thousand pageviews a day. Up until last weekend we ran with a load usually below 0.2 on a virtual machine. The OS is Ubuntu.
When deploying the latest version of our application, we also did an apt-get dist-upgrade before deploying. After we had deployed, we noticed that the load on the CPU had spiked dramatically (sometimes reaching 10, at which point the server stopped responding to page requests).
We tried dumping a full minute of Xdebug profiling data from PHP, but looking through it revealed only a few somewhat slow parts, but nothing to explain the huge jump.
We are now pretty sure that nothing in the new version of our website is triggering the problem, but we have no way to be sure. We have rolled back a lot of the changes, but the problem still persists.
When looking at processes, we see that individual Apache processes use quite a bit of CPU over a longer period of time than strictly necessary. However, when using strace on an affected process, we never see anything but
accept(3,
and it hangs for a while before receiving a new connection, so we can't actually see what is causing the problem.
The stack is PHP 5, Apache 2 (prefork), MySQL 5.1. Most things run through Memcached. We've tried APC and eAccelerator.
So, what should be our next step? Are there any profiling methods we overlooked/don't know about?
The answer ended up being not-Apache related. As mentioned, we were on a virtual machine. Our user sessions are pretty big (think 500kB per active user), so we had a lot of disk IO. The disk was nearly full, meaning that Ubuntu spent a lot of time moving things around (or so we think). There was no easy way to extend the disk (because it was not set up properly for VMWare). This completely killed performance, and Apache and MySQL would occasionally use 100% CPU (for a very short time), and the system would be so slow to update the CPU usage meters that it seemed to be stuck there.
We ended up setting up a new VM (which also gave us the opportunity to thoroughly document everything on the server). On the new VM we allocated plenty of disk space, and moved sessions into memory (using memcached). Our load dropped to 0.2 on off-peak use and around 1 near peak use (on a 2-CPU VM). Moving the sessions into memcached took a lot of disk IO away (we were constantly using about 2MB/s of disk IO, which is very bad).
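For reference, moving PHP sessions into memcached is mostly a php.ini change along these lines; this is a sketch for the older memcache extension (the memcached extension uses a slightly different save_path format), and the host and port are placeholders:
session.save_handler = memcache
session.save_path = "tcp://127.0.0.1:11211"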
Conclusion: sometimes you just have to start over... :)
Seeing an accept() call from your Apache process isn't at all unusual - that's the webserver waiting for a new request.
First of all, you want to establish what the parameters of the load are. Something like
vmstat 1
will show you what your system is up to. Look in the 'swap' and 'io' columns. If you see anything other than '0' in the 'si' and 'so' columns, your system is swapping because of a low memory condition. Consider reducing the number of running Apache children, or throwing more RAM in your server.
If RAM isn't an issue, look at the 'cpu' columns. You're interested in the 'us' and 'sy' columns. These show you the percentage of CPU time spent in either user processes or system. A high 'us' number points the finger at Apache or your scripts - or potentially something else on the server.
Running
top
will show you which processes are the most active.
Have you ruled out your database? The most common cause of unexpectedly high load I've seen on production LAMP stacks comes down to database queries. You may have deployed new code with an expensive query in it, or got to the point where there are enough rows in your dataset to cause previously cheap queries to become expensive.
During periods of high load, do
echo "show full processlist" | mysql | grep -v Sleep
to see if there are either long-running queries, or huge numbers of the same query operating at once. Other mysql tools will help you optimise these.
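Enabling the slow query log will also catch expensive queries you miss between spot checks; in my.cnf for MySQL 5.1, something like this, where the one-second threshold is just a starting point:
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 1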
You may find it useful to configure and use mod_status for Apache, which will allow you to see what request each Apache child is serving and for how long it has been doing so.
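A minimal mod_status setup for Apache 2.2 looks roughly like this; restrict access to localhost or your own address, then browse to /server-status (or /server-status?refresh=5):
<IfModule mod_status.c>
    ExtendedStatus On
    <Location /server-status>
        SetHandler server-status
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1
    </Location>
</IfModule>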
Finally, get some long-term statistical monitoring set up. Something like Zabbix is straightforward to configure, and will let you monitor resource usage over time, so that if things get slow, you have historical baselines to compare against and a better idea of when problems started.
Perhaps you were using the worker MPM before and now you are not?
I know PHP5 does not work with the worker MPM. On my Ubuntu server, PHP5 can only be installed with the prefork MPM. It seems that the PHP5 module is not compatible with the multithreaded version of Apache.
I found a link here that will show you how to get better performance with mod_fcgid
To see what the worker MPM is, see here.
I'd use DTrace to solve this mystery... if it were running on Solaris or a Mac... but since Linux doesn't have it, you might want to try SystemTap, though I can't say anything about its usability since I haven't used it.
With DTrace you could easily sniff out the culprits within a day, and I would hope it would be similar with SystemTap.
Another option, which I can't assure you will do any good but is more than worth the effort, is to read the detailed changelog for the new version and review what might have changed that could remotely affect you.
Going through the changelogs has saved me more than once, especially when some config options changed or something got deprecated. Worst case, it'll give you some clues as to where to look next.