Speeding up PHP continuous integration build server on Hudson CI

I'm trying to speed up my builds some and was looking for some thoughts on how to do so. I currently use Hudson as a continuous integration server for a PHP project.
I use an Ant build.xml file to do the build, using a file similar to Sebastian Bergmann's php-hudson-template. At the moment, though (due to some weird problems with Hudson crashing otherwise), I'm only running phpDocumentor, phpcpd, and PHPUnit. PHPUnit also generates Clover code-coverage reports.
Here are some possible bottlenecks:
phpDocumentor: Takes 180 seconds. There are some large included libraries in my project, such as awsninja, DirectedEdge, oauthsimple, and phpMailer. I'm not sure that I really need to be developing documentation for these. I'm also not sure how to ignore whole subdirectories using my build.xml file.
PHPUnit: Takes 120 seconds. This is the only portion of the build that's not run as a parallelTask. The more tests that get written, the longer this will take. I'm really not sure what to do about this, aside from maybe running multiple Hudson build slaves and doling out separate test suites to each slave. But I have no idea how to go about that, either.
phpcpd: Takes 97 seconds. I'm sure that I can eliminate some parsing and conversion time by ignoring those included libraries. Not sure how to do this in my build.xml file.
My server: Right now I'm using a single Linode server. It seems to get pretty taxed by the whole process.
If you can think of any other possible bottlenecks, I'll add them to the list.
What are some solutions for reducing my build time?

I'm not a PHP expert at all, but you ought to be able to split your PHPUnit tests onto multiple Hudson slaves if you need to. I would just split your test suite up and run each subset as a separate, parallel Hudson job. If you have a machine with multiple CPUs / cores you can run multiple slaves on it.
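As a rough illustration, one way to split the suite is to keep one PHPUnit XML configuration per subset and point each Hudson job at its own file (the file names and directory layout here are assumptions, not part of your current setup):

<!-- phpunit-suite-a.xml: run by the first job/slave -->
<phpunit bootstrap="tests/bootstrap.php">
  <testsuites>
    <testsuite name="suite-a">
      <directory>tests/unit</directory>
    </testsuite>
  </testsuites>
</phpunit>

Each job then runs phpunit -c phpunit-suite-a.xml (or -suite-b, and so on) against its own checkout.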
One obvious thing you didn't mention: how about just upgrading your hardware, or taking a look at what else is running on the Hudson host and possibly taking up resources?

phpDocumentor: phpdoc -h reveals the -i option, which lets you specify a comma-separated list of files/directories to ignore. This can be added to the arguments tag of your phpdoc task in build.xml.
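For example, a sketch of how that could look in an exec-style phpdoc target like the one in Sebastian Bergmann's template (the property names and the exact directory list are illustrative):

<target name="phpdoc">
  <exec executable="phpdoc">
    <arg line="--directory ${basedir}
               --target ${basedir}/build/api
               -i awsninja/,DirectedEdge/,oauthsimple/,phpMailer/" />
  </exec>
</target>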
PHPUnit: I noticed it can be laggy if I am running tests against a database, but I am not aware of any way to improve this.
One thing that might help would be to not run phpDocumentor on every build, and instead run it only as part of a build that happens once a day (or something similar).
I just recently started using these tools, and these are a few things I discovered.

When we had a similar problem, we resorted to running the documentation in a separate overnight build (along with our functional test scripts in Selenium, as this is also pretty slow). This way, our main CI build wasn't slowed down by generating our API documentation.
However, I note that phpDocumentor has now been updated to version 2, which has significant speed improvements over the slow old version 1. It looks like it's in the region of two to three times faster than v1. This will make a big difference to your CI process. See http://phpdoc.org/ for more info.
Alternatively, you could take a look at apiGen and phpDox, both of which are alternatives to PHPDoc. They are both definitely faster than PHPDoc v1; I haven't compared them with v2 yet.

Related

How to build a complete, cross-platform, fully automated test suite for PHP/Ajax applications?

I'm trying to build a full test suite for Joomla and others. There are some docs around, but they're quite limited, and I'm wondering whether somebody has already written some scripts. By 'full' I really do mean complete testing:
1. installing an extension on Windows, Mac, Linux (Vagrant, VM)
2. configuring the extensions with the CMS option panels
3. doing things in a full Ajax application
I guess I'll end up with lots of bash scripts triggering other scripts within a virtual setup, right?
I must admit I am not really familiar with all these testing frameworks and products, and I'd already be very happy to get pointed to anything. Doing pure unit tests doesn't seem enough given the nature of such systems (namespace collisions, interfering plugins, ...).
Thank you for any hints
I am not familiar with WordPress and Joomla, but those are just PHP code, so using PHPUnit can be suitable. With PHPUnit you are not only doing unit testing; you can also do other kinds of tests (it depends on how much time you are willing to spend on testing, but I would say it can cover pretty much any aspect).
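As a minimal sketch of what a non-unit test could look like under PHPUnit (ExtensionInstaller is a hypothetical wrapper around the CMS's install step, not a real API):

<?php
class ExtensionInstallTest extends PHPUnit_Framework_TestCase
{
    public function testExtensionInstallsCleanly()
    {
        // Hypothetical helper that drives the CMS extension installer.
        $installer = new ExtensionInstaller('/path/to/extension.zip');
        $result    = $installer->install();

        $this->assertTrue($result->succeeded());
        $this->assertSame(array(), $result->errors());
    }
}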
As for front-end testing, there are several choices:
Using Selenium to write tests
Using one of the many headless testing frameworks out there (for instance, CasperJS in JavaScript, Watir for Ruby, etc.)
For setting up a virtual environment, apart from Vagrant you can check Docker.
There are many programs you can use to do the normal interaction tests; if you want to handle logins, some scripting will be necessary to grab the token.
But for a full integration test including Ajax, I guess you'd be better off with a headless browser; take a look at Real headless browser, where the same requirement is discussed in detail.

Parallel PHPUnit testing in integration tests

As the time needed to run the complete PHPUnit suite grows, our team has started wondering whether it is possible to run unit tests in parallel. I recently read an article about Paraunit, and Sebastian Bergmann has also written that he'll add parallelism to PHPUnit 3.7.
But there remains the problem of integration tests or, more generally, tests that interact with the DB. For the sake of consistency, the test DB has to be reset and fixtures loaded after each test. But in parallel tests there is a problem with race conditions, because all processes use the same DB.
So, to be able to run integration tests in parallel, we have to assign its own database to each process. I would like to ask whether someone has thoughts on how this problem can be solved. Maybe there are already implemented solutions to it in another xUnit implementation.
In my team we are using MongoDB, so one solution would be to programmatically create a config file for each PHPUnit process, with a generated DB name (for that process); in the setUp() method we could then clone the main test DB into this temporary one. But before we start to implement this approach, I would like to ask for your ideas on the topic.
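For what it's worth, a minimal sketch of that idea with the legacy Mongo PHP driver; the database names, the environment variable, and the use of MongoDB's copydb command are illustrative assumptions:

<?php
$dbName = 'testdb_' . getmypid();

$mongo = new MongoClient();

// Clone the prepared main test database into a process-private copy
// (copydb is run against the admin database).
$mongo->selectDB('admin')->command(array(
    'copydb' => 1,
    'fromdb' => 'testdb_main',
    'todb'   => $dbName,
));

// Hand the generated name to the tests, e.g. via an environment
// variable read in the PHPUnit bootstrap or setUp().
putenv('TEST_DB_NAME=' . $dbName);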
This is a good question: preparing for parallel unit tests is going to require learning some new Best Practices, and I suspect some of them are going to slow our tests down.
At the highest level, the advice is: avoid testing with a database wherever possible. Abstract all interactions with your database, and then mock that class. But you've already noted your question is about integration tests, where this is not possible.
When using PDO, I generally use sqlite::memory:. Each test gets its own database; it is anonymous and automatically cleaned up when the test ends. (But I noted some problems with this when your real application is not using SQLite: Suggestions to avoid DB deps when using an in-memory sqlite DB to speed up unit tests.)
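A minimal sketch of that pattern (the table and assertions are illustrative):

<?php
class UserRepositoryTest extends PHPUnit_Framework_TestCase
{
    private $pdo;

    protected function setUp()
    {
        // A fresh, private in-memory database for every single test.
        $this->pdo = new PDO('sqlite::memory:');
        $this->pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
    }

    public function testInsertAndFetch()
    {
        $this->pdo->exec("INSERT INTO users (name) VALUES ('alice')");
        $name = $this->pdo->query('SELECT name FROM users')->fetchColumn();
        $this->assertEquals('alice', $name);
    }
}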
When using a database that does not have an in-memory option, create the database with a random name. If the parallelization is at the PHPUnit process level, which is quite coarse, you could use the process PID, but that has no real advantage over a random name. (I know PHP is single-threaded, but perhaps in the future we'll have a custom PHPUnit module that uses threads to run tests in parallel; we might as well be ready for that.)
If you have the xUnit Test Patterns book, chapter 13 is about testing databases (relatively short). Chapters 8 and 9 on transient vs. persistent fixtures are useful too. And, of course, most of the book is on abstraction layers to make mocking easier :-)
There is also this awesome library (fastest) for executing tests in parallel. It is optimized for functional/integration tests, giving an easy way to work with N databases in parallel.
Our old codebase ran in 30 minutes; it now runs in 7 minutes with 4 processors.
Features
Functional tests can use a database per processor via an environment variable (see the sketch after the usage example).
Tests are randomized by default.
It is not coupled to PHPUnit; you can run any command.
It is developed in PHP with no dependencies.
As input you can use a phpunit.xml.dist file or a pipe.
It includes a Behat extension to easily pipe scenarios into fastest.
Increase verbosity with the -v option.
Usage
find tests/ -name "*Test.php" | ./bin/fastest "bin/phpunit -c app {};"
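To give each process its own database, you can branch on the channel number that fastest exposes through an environment variable (ENV_TEST_CHANNEL_READABLE is the name used by liuggio/fastest; treat it as an assumption and check the version you install). A sketch for a test bootstrap:

<?php
// tests/bootstrap.php (sketch; the database naming scheme is an assumption)
$channel = getenv('ENV_TEST_CHANNEL_READABLE');
$dbName  = $channel ? 'myapp_test_' . $channel : 'myapp_test';

// Point the test configuration at the per-process database.
define('TEST_DB_NAME', $dbName);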
But there remains the problem of integration tests or, more generally, tests that interact with the DB. For the sake of consistency, the test DB has to be reset and fixtures loaded after each test. But in parallel tests there is a problem with race conditions, because all processes use the same DB.
So, to be able to run integration tests in parallel, we have to assign its own database to each process. I would like to ask whether someone has thoughts on how this problem can be solved. Maybe there are already implemented solutions to it in another xUnit implementation.
You can avoid integration test conflicts in two ways:
running only those tests in parallel which use very different tables of your database, so they don't conflict
creating a new database for conflicting tests (see the sketch after this list)
Of course, you can combine these two solutions. I don't know of any PHPUnit test runner which supports either of these approaches, so I think you'd have to write your own test runner to speed up the process... By the way, you can still group your integration tests and run only a few of them at once while developing...
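A minimal sketch of the second option with MySQL via PDO (credentials and the naming scheme are illustrative assumptions):

<?php
// Give a conflicting test its own throwaway database.
$dbName = 'test_' . uniqid();

$pdo = new PDO('mysql:host=localhost', 'ci_user', 'secret');
$pdo->exec("CREATE DATABASE `$dbName`");
$pdo->exec("USE `$dbName`");

// ... load schema and fixtures, run the test ...

$pdo->exec("DROP DATABASE `$dbName`");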
Be aware that the same conflicts can cause concurrency issues under heavy load in PHP. For example, if you lock 2 files in reverse order under 2 separate controller actions, then your application can end up in a deadlock... I am seeking a way to test concurrency issues in PHP, but no luck so far. I don't currently have time to write my own solution, and I am not sure I could manage it; it's pretty hard stuff... :S
In case your application is coupled to a specific vendor, e.g. PostgreSQL, you can create separate stacks with Docker and docker-compose, then group tests together by purpose, e.g. model tests, controller tests, etc.
For each group, deploy a specific stack in your pipeline using docker-compose and run the tests via Docker. The idea is to have a separate environment with separate databases, hence you avoid the conflict.
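A sketch of such a per-group stack (the image, names, and port are assumptions):

# docker-compose.model-tests.yml
version: '2'
services:
  db:
    image: postgres:9.6
    environment:
      POSTGRES_DB: app_model_tests
      POSTGRES_USER: ci
      POSTGRES_PASSWORD: secret
    ports:
      - "5433:5432"

Each group gets its own file and is brought up with docker-compose -f docker-compose.model-tests.yml up -d before its tests run.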

What does a good Phing workflow look like?

I'm trying to get into the mindset of CI and have been playing with Phing this weekend. It all seems straightforward enough to use, and I have many examples going already.
However, something that still puzzles me is how people actually use it. That is, I'm not looking for what tests you run, but rather for a suggested workflow using Phing: at what stage do you activate it, at what stage in the development cycle is it actioned?
For example, we have several websites. Currently we edit the source locally, and on save we upload to the live site (I know how bad this is...). We do some quick testing and make sure the code works as planned. If so, we commit to the repos and carry on. If not, we can roll back, or edit, undo, and re-save. Whilst this now seems crazy, the simplicity has worked well for us.
We now have a small team, however, so I'm trying to push Phing into this process to get all the added benefits of the linting/sniffing/mess detecting etc. However, I can't figure out the best order of events.
Would you suggest:
Edit the code locally.
On save, upload the file to a test site.
Test the site on the staging server.
All being well, commit the changes to the repos. Then run Phing.
Assess the output of Phing, updating code as necessary; re-save, re-commit, re-run Phing.
Assuming Phing passes, as I'm running Phing on another server, do an svn export and start the deployment process.
The above seems a bit long-winded to me. Is it because it looks like I'm trying to merge a test deployment with a live deployment that's confusing me?
Also it would seem a bit backwards to commit, then run Phing, then have to edit and possibly re-commit before trying again.
Hence, would it make more sense to:
Edit the code locally.
Save, and action a test deployment build with Phing.
Make sure the code has passed all the code checks etc.
Using Phing, make sure the code is copied to the staging server.
Test the site on the staging server.
All being well, commit the changes to the repos.
Then do a live deployment build with Phing.
The problem with the above is: let's say I just want to correct the spelling of a word hard-coded into an HTML page; this seems like overkill?
Finally, how do people set up their servers? Do you have one server for the live site, one for staging, and one to host Phing (and any CI software)?
The point of tools like Phing is automation. So, to answer "at what stage do you activate it, at what stage in the development cycle is it actioned": I would say you should bring it in as soon as you feel you will gain a benefit from using it.
For example, if you have a process which takes multiple commands to complete, there would be a benefit of using Phing (or even just a shell script) to automate the steps, especially if there's more than one person who needs to do it or it's especially error-prone.
So, going with that, you should use Phing to make your life easier, not harder. Any task which involves more than one shell command, or always involves typing the same command with lots of hard-to-remember parameters, is usually something you could/should use Phing to automate.
Considering the first list of steps you mention, it is indeed a bit long-winded. Instead, you should use Phing earlier to automate the steps:
If you have automated tests, use Phing to run them
Commit to source control
Use Phing to deploy the code
Use Phing to do the svn export or any other manual steps
So basically I would do pretty much what you suggested in your second list.
You should probably make Phing commands for the separate steps, and Phing commands that run commonly-combined steps in one shot (e.g. tests and then deploy), as sketched below.
This would allow you to skip phases if you feel it's necessary, such as in the example you gave about changing just some text.
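A minimal sketch of such a Phing build file, with separate targets plus a combined one (target names, paths, and commands are illustrative assumptions, not your actual setup):

<?xml version="1.0"?>
<project name="site" default="build">

  <!-- Run the test suite; fail the build on a non-zero exit code. -->
  <target name="test">
    <exec command="phpunit -c phpunit.xml" checkreturn="true" passthru="true" />
  </target>

  <!-- Copy the code to the staging server. -->
  <target name="deploy-staging">
    <exec command="rsync -az --delete ./ deploy@staging:/var/www/site" checkreturn="true" />
  </target>

  <!-- Commonly run-together steps in one shot. -->
  <target name="build" depends="test,deploy-staging" />

</project>

With this split, phing test runs just the checks, while plain phing (the default target) runs the combined flow, so you can skip phases when a change is trivial.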
A typical approach to servers is having the live site on its own server, and then having the staging/testing server mirror it as closely as possible. For CI or other utilities, you can usually host them on the staging server, provided they do not interfere with the main application you are developing.

Performance logging for a live test?

We are completing our application, written in PHP/MySQL (+ memcached), and next weekend we are going to start a live test for one night (it's a sort of "social" application).
We will, of course, monitor error log files to make sure everything goes fine. But we would also like to keep logs of the application's performance: for example, to determine whether a script ran too slowly, and in more detail how long functions/methods and MySQL queries took to run, and to compare that with the data obtained (and "un-jsoned") from memcached.
This is the first time I'm doing something like this; however, I believe it's fundamental, because we need to make sure the application will scale correctly when our customers start using it in 10-15 days. (Up)scaling will not be a big issue, since we are using cloud servers (we will start with a single instance with 256 MB of RAM, provided by a well-known company), but we would like to be sure resources are used efficiently.
Do you have any suggestions for this monitoring? Good practices? Good articles to read?
Also, when the test is over, should we continue to monitor performance on the live environment? Even if not on all requests, but just on a sample?
PS: Is it a good idea to log all MySQL queries, and the time they took to run, to a file?
I usually unit test my work so I can make sure it's running at a satisfactory speed, giving the right results, etc.
Once I've finished unit testing, I will stress test it: I run the most commonly used functions through an insane loop, both from my local machine and from an instance I have set up for brutally testing my applications. Since my tests are written in PHP, I can log anything I like, such as performance.
I would never run this against the live site, though; I'd just make sure I'm confident with what I have written.
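As a rough sketch of that kind of timing loop (fetchHomepageData() is a hypothetical stand-in for a commonly used function):

<?php
$iterations = 10000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    fetchHomepageData(); // the function being brutally exercised
}
$elapsed = microtime(true) - $start;

// Log the total and per-call time.
printf("%d calls in %.2fs (%.3f ms per call)\n",
    $iterations, $elapsed, ($elapsed / $iterations) * 1000);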
[EDIT]
Do you use Zend Studio? (That's the PHP IDE I use.) It has built-in unit testing. You can get a free trial here, which is still very functional when it ends. I have the paid version, so I'm not sure whether unit testing remains available after the trial expires, but it's well worth its buck!
Here's a link that introduces unit testing, which is really great, and you can grab the trial of Zend Studio here.
Here are a few more links for unit testing, just pulled from Google:
lastcraft.com
List of unit testing
Another Stack Overflow post on unit testing PHP
Google results page
You could look at installing something like this on the machine(s):
http://en.wikipedia.org/wiki/Cacti_%28software%29
It's always handy to have current and historical information about the performance of your system (CPU/Mem/Bandwidth).

What's the difference between Phing and PHPUnderControl?

We currently use a hand-rolled setup and configuration script and a hand-rolled continuous integration script to build and deploy our application. I am looking at formalizing this somewhat with a third party system designed for these purposes.
I have looked into Phing before, and I get that it's basically like Ant. But my Ant experience is somewhat limited, so that doesn't help me much. (Most of the Java work I have done was just deployed as a jar file.)
I have looked into CruiseControl before, and I understand that phpUnderControl is a plug-in for CC. But Phing says it also works with CC, so I am not clear on the overlap here. Do I need both Phing and phpUnderControl to work with CruiseControl, or are they mutually exclusive?
What I need exactly is something that can:
Check out source from SVN
Install the database from SQL file
Generate some local configuration files from a series of templates and an ini file
Run all of our unit tests (currently ST, but easy to convert to PHPUnit) and send an email to the dev team if any tests break (with a stack trace of course)
Generate API documentation for the application and put it somewhere
Run a test coverage report
Now, we have just about all of this in one form or another. But, it'd be nice to have it all automated and bundled together in one process.
Phing is pretty much Ant written in PHP, whereas phpUnderControl adds support for PHP projects to CruiseControl and uses Phing or Ant on the back end to parse the build.xml file and run commands.
I just set up CruiseControl and phpUnderControl, and it's been working great. It checks out my SVN and runs it through phpDocumentor, PHP_CodeSniffer, and PHPUnit whenever we do a check-in. Since it's all based off the build.xml file, you can run just about any software you want through it.
I'm sure lots of people will say this by the time I've typed this but...
I know it's not PHP but we're finding Capistrano just the job for this kind of thing. It really is an excellent piece of software.
We've been using Phing, and the cost to set it up has been very low; it's really easy to learn even if you don't know Ant. I've had very bad experiences with CruiseControl (instability: going down randomly), so I like the simplicity of Phing. Plus, it's easily extensible using PHP (in case you have a custom task that they don't support out of the box).
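To illustrate that extensibility, a sketch of a custom Phing task in PHP (the class name and attribute are illustrative):

<?php
require_once 'phing/Task.php';

class HelloTask extends Task
{
    private $name = 'world';

    // Maps to the "name" attribute in build.xml.
    public function setName($name)
    {
        $this->name = $name;
    }

    // Called when the task is executed.
    public function main()
    {
        $this->log('Hello, ' . $this->name . '!');
    }
}

It can then be registered in build.xml with <taskdef name="hello" classname="HelloTask" classpath="." /> (assuming HelloTask.php sits next to build.xml) and invoked as <hello name="CI" />.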
