What are the best methods for tracking and/or automating DB schema changes? Our team uses Subversion for version control and we've been able to automate some of our tasks this way (pushing builds up to a staging server, deploying tested code to a production server) but we're still doing database updates manually. I would like to find or create a solution that allows us to work efficiently across servers with different environments while continuing to use Subversion as a backend through which code and DB updates are pushed around to various servers.
Many popular software packages include auto-update scripts which detect DB version and apply the necessary changes. Is this the best way to do this even on a larger scale (across multiple projects and sometimes multiple environments and languages)? If so, is there any existing code out there that simplifies the process or is it best just to roll our own solution? Has anyone implemented something similar before and integrated it into Subversion post-commit hooks, or is this a bad idea?
While a solution that supports multiple platforms would be preferable, we definitely need to support the Linux/Apache/MySQL/PHP stack as the majority of our work is on that platform.
In the Rails world, there's the concept of migrations, scripts in which changes to the database are made in Ruby rather than a database-specific flavour of SQL. Your Ruby migration code ends up being converted into the DDL specific to your current database; this makes switching database platforms very easy.
For every change you make to the database, you write a new migration. Migrations typically have two methods: an "up" method in which the changes are applied and a "down" method in which the changes are undone. A single command brings the database up to date, and can also be used to bring the database to a specific version of the schema. In Rails, migrations are kept in their own directory in the project directory and get checked into version control just like any other project code.
This Oracle guide to Rails migrations covers migrations quite well.
Developers using other languages have looked at migrations and have implemented their own language-specific versions. I know of Ruckusing, a PHP migrations system that is modelled after Rails' migrations; it might be what you're looking for.
We use something similar to bcwoord to keep our database schemata synchronized across 5 different installations (production, staging and a few development installations), and backed up in version control, and it works pretty well. I'll elaborate a bit:
To synchronize the database structure, we have a single script, update.php, and a number of files numbered 1.sql, 2.sql, 3.sql, etc. The script uses one extra table to store the current version number of the database. The N.sql files are crafted by hand, to go from version (N-1) to version N of the database.
They can be used to add tables, add columns, migrate data from an old to a new column format then drop the column, insert "master" data rows such as user types, etc. Basically, it can do anything, and with proper data migration scripts you'll never lose data.
The update script works like this:
1) Connect to the database.
2) Make a backup of the current database (because stuff will go wrong) [mysqldump].
3) Create the bookkeeping table (called _meta) if it doesn't exist.
4) Read the current VERSION from the _meta table. Assume 0 if not found.
5) For all .sql files numbered higher than VERSION, execute them in order.
6) If one of the files produced an error, roll back to the backup.
7) Otherwise, update the version in the bookkeeping table to the highest .sql file executed.
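For illustration only, here is a rough Python sketch of the bookkeeping part of those steps. It is not the update.php described above: it uses SQLite from the standard library instead of MySQL so it can run anywhere, the function names are made up, and the backup/restore step (mysqldump in the answer) is left out. Unlike the description, it records the version after each file rather than once at the end, so a failure leaves the last good version recorded.

```python
import re
import sqlite3
from pathlib import Path


def current_version(conn):
    """Return the version stored in _meta, or 0 if nothing is recorded yet."""
    conn.execute("CREATE TABLE IF NOT EXISTS _meta (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM _meta").fetchone()
    return row[0] or 0


def pending_scripts(sql_dir, version):
    """Return (number, path) pairs for N.sql files with N > version, in numeric order."""
    found = []
    for path in Path(sql_dir).glob("*.sql"):
        match = re.fullmatch(r"(\d+)\.sql", path.name)
        if match and int(match.group(1)) > version:
            found.append((int(match.group(1)), path))
    return sorted(found)


def update(conn, sql_dir):
    """Apply every pending N.sql file in order and return the resulting version."""
    version = current_version(conn)
    for number, path in pending_scripts(sql_dir, version):
        # executescript raises sqlite3.Error on the first failing statement.
        conn.executescript(path.read_text())
        # Record progress after each file (a deviation from the steps above).
        conn.execute("DELETE FROM _meta")
        conn.execute("INSERT INTO _meta (version) VALUES (?)", (number,))
        conn.commit()
        version = number
    return version
```

Calling update(conn, "sql") twice in a row is harmless: the second call finds no files numbered above the stored version and does nothing.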
Everything goes into source control, and every installation has a script to update to the latest version with a single script execution (calling update.php with the proper database password etc.). We SVN update staging and production environments via a script that automatically calls the database update script, so a code update comes with the necessary database updates.
We can also use the same script to recreate the entire database from scratch; we just drop and recreate the database, then run the script which will completely repopulate the database. We can also use the script to populate an empty database for automated testing.
It took only a few hours to set up this system. It's conceptually simple, everyone gets the version numbering scheme, and it has been invaluable in letting us move forward and evolve the database design without having to communicate or manually execute the modifications on all databases.
Beware when pasting queries from phpMyAdmin though! Those generated queries usually include the database name, which you definitely don't want since it will break your scripts! Something like CREATE TABLE mydb.newtable(...) will fail if the database on the system is not called mydb. We created a pre-commit SVN hook that will disallow .sql files containing the mydb string, which is a sure sign that someone copy/pasted from phpMyAdmin without proper checking.
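The check inside such a hook can be tiny. The sketch below is only a guess at what it might look like, not the hook we actually run: the function name is invented, it looks for the database name used as a qualifier (mydb. or `mydb`.) rather than for the bare string, and wiring it into SVN is left out.

```python
import re


def qualified_name_lines(sql_text, db_name="mydb"):
    """Return the 1-based line numbers that use db_name as a qualifier."""
    # Matches mydb.table and `mydb`.`table`, but not a column such as mydb_id.
    pattern = re.compile(
        r"`?\b" + re.escape(db_name) + r"\b`?\s*\.", re.IGNORECASE
    )
    return [
        number
        for number, line in enumerate(sql_text.splitlines(), start=1)
        if pattern.search(line)
    ]
```

A hook would reject the commit whenever this returns a non-empty list for any changed .sql file.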
My team scripts out all database changes, and commits those scripts to SVN, along with each release of the application. This allows for incremental changes of the database, without losing any data.
To go from one release to the next, you just need to run the set of change scripts, and your database is up-to-date, and you've still got all your data. It may not be the easiest method, but it definitely is effective.
The issue here is really making it easy for developers to script their own local changes into source control to share with the team. I've faced this problem for many years, and was inspired by the functionality of Visual Studio for Database professionals. If you want an open-source tool with the same features, try this: http://dbsourcetools.codeplex.com/
Have fun,
- Nathan.
If you are still looking for solutions: we are proposing a tool called neXtep designer. It is a database development environment with which you can put your whole database under version control. You work on a version-controlled repository where every change can be tracked.
When you need to release an update, you can commit your components and the product will automatically generate the SQL upgrade script from the previous version. Of course, you can generate this SQL from any 2 versions.
Then you have several options: you can take those scripts and put them in your SVN with your app code so that they are deployed by your existing mechanism. Another option is to use the delivery mechanism of neXtep: scripts are exported in something called a "delivery package" (SQL scripts + XML descriptor), and an installer can understand this package and deploy it to a target server while ensuring structural consistency, checking dependencies, registering the installed version, etc.
The product is GPL and is based on Eclipse, so it runs on Linux, Mac and Windows. It also supports Oracle, MySQL and PostgreSQL at the moment (DB2 support is on the way). Have a look at the wiki, where you will find more detailed information:
http://www.nextep-softwares.com/wiki
Scott Ambler produces a great series of articles (and co-authored a book) on database refactoring, with the idea that you should essentially apply TDD principles and practices to maintaining your schema. You set up a series of structure and seed data unit tests for the database. Then, before you change anything, you modify/write tests to reflect that change.
We have been doing this for a while now and it seems to work. We wrote code to generate basic column name and datatype checks in a unit testing suite. We can rerun those tests anytime to verify that the database in the SVN checkout matches the live db the application is actually running.
As it turns out, developers also sometimes tweak their sandbox database and neglect to update the schema file in SVN. The code then depends on a db change that hasn't been checked in. That sort of bug can be maddeningly hard to pin down, but the test suite will pick it up right away. This is particularly nice if you have it built into a larger Continuous Integration plan.
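To give an idea of what a generated column name and datatype check boils down to, here is a small Python sketch. It is an illustration, not our actual suite: it reads the schema through SQLite's PRAGMA table_info (on MySQL you would query information_schema instead), and the function names and the shape of the expected dictionary are assumptions.

```python
import sqlite3


def table_columns(conn, table):
    """Return {column_name: declared_type} for one table (trusted table names only)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2].upper() for row in rows}


def schema_mismatches(conn, expected):
    """Compare the live schema against expected {table: {column: type}}."""
    problems = []
    for table, columns in expected.items():
        actual = table_columns(conn, table)
        for name, col_type in columns.items():
            if name not in actual:
                problems.append(f"{table}.{name} is missing")
            elif actual[name] != col_type.upper():
                problems.append(
                    f"{table}.{name} is {actual[name]}, expected {col_type}"
                )
    return problems
```

A unit test then simply asserts that schema_mismatches(...) returns an empty list for the schema checked into SVN.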
Dump your schema into a file and add it to source control. Then a simple diff will show you what changed.
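As a rough illustration of the idea, the Python sketch below dumps the CREATE statements of two databases and diffs them with the standard difflib module. It uses SQLite so it is self-contained; with MySQL you would diff the output of a schema-only dump instead. The function names are made up.

```python
import difflib
import sqlite3


def dump_schema(conn):
    """Return the CREATE statements of a database, sorted by object name."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return [row[0] + ";" for row in rows]


def schema_diff(old_lines, new_lines):
    """Return a unified diff between two schema dumps as a list of lines."""
    return list(
        difflib.unified_diff(old_lines, new_lines, "old.sql", "new.sql", lineterm="")
    )
```

If the dump is committed next to the code, the same diff shows up in the normal commit history for free.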
K. Scott Allen has a decent article or two on schema versioning, which uses the incremental update scripts/migrations concept referenced in other answers here; see http://odetocode.com/Blogs/scott/archive/2008/01/31/11710.aspx.
If you are using C#, have a look at Subsonic, a very useful ORM tool that also generates SQL scripts to recreate your schema and/or data. These scripts can then be put into source control.
http://subsonicproject.com/
I've used the following database project structure in Visual Studio for several projects and it's worked pretty well:
Database
    Change Scripts
        0.PreDeploy.sql
        1.SchemaChanges.sql
        2.DataChanges.sql
        3.Permissions.sql
    Create Scripts
        Sprocs
        Functions
        Views
Our build system then updates the database from one version to the next by executing the scripts in the following order:
0.PreDeploy.sql
1.SchemaChanges.sql
Contents of Create Scripts folder
2.DataChanges.sql
3.Permissions.sql
Each developer checks in their changes for a particular bug/feature by appending their code onto the end of each file. Once a major version is complete and branched in source control, the contents of the .sql files in the Change Scripts folder are deleted.
We use a very simple yet effective solution.
For new installs, we have a metadata.sql file in the repository which holds the whole DB schema; the build process uses this file to generate the database.
For updates, we hardcode the upgrade statements in the software. We keep them hardcoded because we don't like solving a problem before it really IS a problem, and this hasn't proved to be a problem so far.
So in our software we have something like this:
RegisterUpgrade(1, 'ALTER TABLE XX ADD XY CHAR(1) NOT NULL;');
This code checks whether the database is at version 1 (the version is stored in a table that is created automatically); if the database is outdated, the command is executed.
To update the metadata.sql in the repository, we run these upgrades locally and then extract the full database metadata.
The only thing that goes wrong every so often is forgetting to commit metadata.sql, but this isn't a major problem: it's easy to test for in the build process, and the worst that can happen is a new install with an outdated database that gets upgraded on first use.
Also, we don't support downgrades, but that is by design: if something breaks during an update, we restore the previous version and fix the update before trying again.
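For readers who want to see the mechanism spelled out, here is a minimal Python sketch of a RegisterUpgrade-style helper. It is not our real code: the db_version table name, the function names and the use of SQLite are all assumptions made for the example.

```python
import sqlite3

UPGRADES = []  # (version, sql) pairs, filled in by register_upgrade()


def register_upgrade(version, sql):
    """Declare that `sql` brings the database up to `version`."""
    UPGRADES.append((version, sql))


def apply_upgrades(conn):
    """Run every registered upgrade newer than the stored version; return the new version."""
    conn.execute("CREATE TABLE IF NOT EXISTS db_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM db_version").fetchone()
    current = row[0] or 0
    for version, sql in sorted(UPGRADES):
        if version > current:
            conn.execute(sql)
            conn.execute("INSERT INTO db_version (version) VALUES (?)", (version,))
            current = version
    conn.commit()
    return current
```

Running apply_upgrades() on every start-up is cheap, because an up-to-date database skips every registered statement.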
It's kind of low tech, and there might be a better solution out there, but you could just store your schema in an SQL script which can be run to create the database. I think you can execute a command to generate this script, but I don't know the command unfortunately.
Then, commit the script into source control along with the code that works on it. When you need to change the schema along with the code, the script can be checked in along with the code that requires the changed schema. Then, diffs on the script will indicate diffs on schema changes.
With this script, you could integrate it with DBUnit or some kind of build script, so it seems it could fit in with your already automated processes.
I create folders named after the build versions and put upgrade and downgrade scripts in there. For example, you could have the following folders: 1.0.0, 1.0.1 and 1.0.2. Each one contains the script that allows you to upgrade or downgrade your database between versions.
Should a client or customer call you with a problem with version 1.0.1 and you are using 1.0.2, bringing the database back to his version will not be a problem.
In your database, create a table called "schema" where you put in the current version of the database. Then writing a program that can upgrade or downgrade your database for you is easy.
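A sketch of the "easy program" part, in Python. The conventions here are assumptions for the example, not something fixed by the approach: each version folder is taken to hold an upgrade.sql (from the previous version to this one) and a downgrade.sql (from this version back to the previous one).

```python
def parse_version(name):
    """Turn '1.0.10' into (1, 0, 10) so versions sort numerically."""
    return tuple(int(part) for part in name.split("."))


def migration_plan(versions, current, target):
    """Return the (folder, script) steps needed to go from current to target."""
    ordered = sorted(versions, key=parse_version)
    start, end = ordered.index(current), ordered.index(target)
    if end >= start:
        # Upgrade: run each later folder's upgrade script, oldest first.
        return [(v, "upgrade.sql") for v in ordered[start + 1:end + 1]]
    # Downgrade: undo folders newest first, stopping before the target.
    return [(v, "downgrade.sql") for v in reversed(ordered[end + 1:start + 1])]
```

The current version would come from the "schema" table, and the list of versions from the folder names.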
Just like Joey said, if you are in a Rails world, use Migrations. :)
For my current PHP project we use the idea of Rails migrations: we have a migrations directory in which we keep files titled "migration_XX.sql", where XX is the number of the migration. Currently these files are created by hand as updates are made, but their creation could easily be automated.
Then we have a script called "Migration_watcher" which, as we are in pre-alpha, currently runs on every page load and checks whether there is a new migration_XX.sql file where XX is larger than the current migration version. If so it runs all migration_XX.sql files up to the largest number against the database and voila! schema changes are automated.
If you require the ability to revert, the system would need a lot of tweaking, but it's simple and has been working very well for our fairly small team thus far.
Toad for MySQL has a function called schema compare that allows you to synchronise 2 databases. It is the best tool I have used so far.
I like the way Yii handles database migrations. A migration is basically a PHP script implementing CDbMigration. CDbMigration defines an up method that contains the migration logic. It is also possible to implement a down method to support reversal of the migration. Alternatively, safeUp or safeDown can be used to make sure that the migration is done in the context of a transaction.
Yii's command-line tool yiic contains support to create and execute migrations. Migrations can be applied or reversed, either one by one or in a batch. Creating a migration results in code for a PHP class implementing CDbMigration, uniquely named based on a timestamp and a migration name specified by the user. All migrations that have been previously applied to the database are stored in a migration table.
For more information see the Database Migration article from the manual.
Try db-deploy - mainly a Java tool but works with php as well.
http://dbdeploy.com/
http://davedevelopment.co.uk/2008/04/14/how-to-simple-database-migrations-with-phing-and-dbdeploy.html
I would recommend using Ant (cross platform) for the "scripting" side (since it can practically talk to any db out there via jdbc) and Subversion for the source repository.
Ant will allow you to "back up" your db to local files, before making changes.
backup existing db schema to file via Ant
version control to Subversion repository via Ant
send new sql statements to db via Ant
IMHO migrations do have a huge problem:
Upgrading from one version to another works fine, but doing a fresh install of a given version might take forever if you have hundreds of tables and a long history of changes (like we do).
Running the whole history of deltas since the baseline up to the current version (for hundreds of customers databases) might take a very long time.
There is a command-line mysql-diff tool that compares database schemas, where a schema can be a live database or an SQL script on disk. It is good for most schema migration tasks.
Related
I am curious if there is a standard or open-source application that allows a small team of developers to share MySQL database update/modification scripts?
Right now all the developers have a VM with their own instance of a database, so there are no conflicts and each can have separate development environment. When one makes a DB change we add the SQL scripts to a SQL text file in SVN, which is then run by each dev in their own environment when necessary.
The issue that we are having is that when someone updates the file, the others run the script, and then we add additional changes. It gets very confusing and we get errors if there are ALTER table statements, etc.
We don't want to use DB replication because if one dev destroys their DB we don't want the others to be affected.
We use ExpressionEngine and I've noticed they use PHP to check/validate SQL updates, is that the direction we will need to go?
Anyone else deal with this issue? If so, what did you end up using?
A fairly simple solution is to have a directory, instead of a single file. Then each time a Dev makes a change, they add a "patch file" to the directory. Other developers can get their databases up to date by running any patches they haven't yet run.
This can even be automated by having a metadata table in the database to track which patches have been run and writing a script to run any that haven't.
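As an example of how small such a script can be, here is a Python sketch. It is not the script I use: the applied_patches table name and the function name are invented, and it runs against SQLite so that it is self-contained. Patches are tracked by file name, so developers can add files in any order without renumbering.

```python
import sqlite3
from pathlib import Path


def run_new_patches(conn, patch_dir):
    """Run every .sql file in patch_dir not yet recorded as applied; return their names."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_patches (name TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT name FROM applied_patches")}
    ran = []
    for path in sorted(Path(patch_dir).glob("*.sql")):
        if path.name in applied:
            continue
        conn.executescript(path.read_text())  # raises sqlite3.Error on failure
        conn.execute("INSERT INTO applied_patches (name) VALUES (?)", (path.name,))
        conn.commit()
        ran.append(path.name)
    return ran
```

Files run in sorted name order, so a date or sequence prefix in the file name keeps them in a sensible sequence.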
Lorna Mitchell has blogged about some strategies for doing this:
http://www.lornajane.net/posts/2010/simple-database-patching-strategy
http://www.lornajane.net/posts/2012/taking-on-a-database-change-process
The comments are full of people recommending various tools to help with the process. Personally, I just have a fairly simple script and have no need for larger libraries, but your mileage may vary.
Perhaps what you want is migration support.
Then, you put the migration code in whatever version control system you use and each team member migrates (i.e. runs the migration script) on their box, and this syncs all databases.
The framework I use (Yii) supports it, but I'm pretty sure there are some standalone solutions if you don't want to have to bring the whole framework over.
I've been reading this site here and there and appears as though you guys have a wonderful community.
As for my background, I am a sophomore at university familiar with SQL, C++, Visual Basic, and some PHP. One of my school projects for the summer term involves building a web application that allows users to log in and schedule specific timeslots over the internet. Typically, I have been the only person working on a project, but in this case I will be part of a group. Since we're all relatively new to working as a team, I would like to set up source control for my group so we're not all working off a shared drive somewhere. Additionally, I would like to make sure that all of us are able to test our changes in some sort of development server that hosts an instance of our website.
My actual question is in regards to the toolset that we should use to achieve this. As a group, we are most familiar with PHP and MySQL so we'll end up using that for the code and database. I have used SVN in the past for my own personal use, but my group members aren't very familiar with source control. We'll probably stick with something simple like Excel for the project management and bug tracking side of things. Ideally, we would like the tools to be free and open source.
How as a group should we manage the construction of the actual application? Are there methods out there that I can use that will allow any one of us to move the files to our development machine and keep track of who did it so we don't end up overwriting each other's changes? If this is not possible, one of us will write some scripts to handle it - but I would like to avoid building basically a separate software application that will only be used to manage our project. Another issue I foresee will be updating the database running on the development machine. Are there any standardised methods that we can use to manage our SQL scripts among the four of us?
I do not expect a really long winded answer here (after all, this is our project!), but any helpful tips would be greatly appreciated. Once I return from holiday I am looking forward to getting started! Thanks!
I recommend your group use source control to synchronize your code. You can either set up your own server or just use a free provider such as GitHub, Google Code, or Bitbucket.
If you do decide to use one of these sites, a nice feature is that they provide free issue tracking as well, so you can use that instead of Excel.
The best way to manage the SQL scripts is to break them out into separate files and place them under source control as well. You can either create .sql files, or use a tool to manage these changes - for example, have a look at Ruby on Rails' Migrations. This may take some effort to set up, but you'll thank yourself later if you are working on a project of any size...
1) Draw up a plan for how you would do it if it were just you.
2) Split the plan up into tasks that take around 3-4 hours to complete. Make sure each task has a measurable objective.
3) Divvy up the tasks. Try to sort them if possible to maximize developer efficiency.
4) Teach them to use source control. Explain to them that they will use this (maybe not svn, but SOMETHING) in a few years, so they might as well learn how now. Additionally, this will help in every group project they do down the road.
5) Make a script for building and running your tests. Also script your deployment. This will ensure you have the same mechanism going to live as you do going to test, which increases the number of defects found in testing (as opposed to defects that exist but go unnoticed in testing).
6) You mentioned updating the development database. It would be entirely reasonable to dump the development database often and refresh it from live. You may want to make 3 environments: development, staging, and production. The development database would contain fabricated test data. The staging database would have a copy of live (recent to within a few days, maybe). And of course live is live.
7) Excel works fine as a "bug database." Consider putting it in source control and committing each change to it. This will give you a good idea of what happened over time, and you can correct mistakes quicker.
As far as source/version control goes, I would recommend Subversion. There are some GUI tools they might use, or even WebDAV to access the SVN. This will allow users to edit files collaboratively and also give you details as to who edited what, when, and why... SVN will also do a pretty good job at merging files that happen to be saved at the same time.
It's not the easiest concept to wrap your head around, but it's not very complicated once you get running.
I suggest having everyone read the first chapter from: http://svnbook.red-bean.com/en/1.5/
and they should have a good idea of what's happening.
I am also curious to see what people have to say about the database
How as a group should we manage the construction of the actual application? Are there methods out there that I can use that will allow any one of us to move the files to our development machine and keep track of who did it so we don't end up overwriting each other's changes?
It sounds like you're looking for build management. In the case of PHP, a true "build" is as simple as a collection of source files because the language is interpreted; there is no compilation.
It just so happens that I am one of the developers for BuildMaster, a tool which basically solves every problem you have listed in your question... and it also sounds like it would be free in your case under the Community Edition license. I'll try to address some of your individual pain points and how BuildMaster could be used as a solution.
Source Control
As suggested by others, you must use it. The trick when it comes to deployment is to set up some form of continuous integration so that every time someone checks in, a new "build" is created. In BuildMaster, you can set this up for any source control provider you want.
Issue/Bug Tracking
Excel will work, but it's not an optimal solution. There are plenty of free issue tracking tools you can use to manage your bugs and features. With BuildMaster, you can link your bugs and features list with the application by their release number so you could view them within the tool at any time. It can also modify issue statuses and add descriptions automatically if you want.
Deployments
Using BuildMaster, you can create automated deployment plans for your development environment, e.g.:
Get Latest Source Code
Create Artifact
Copy Files To Development Machine
Deploy Configuration Files
Update Database
The best part is, once you set these up for other environments (glowcoder's point #6), pushing all of your code and database updates is as simple as clicking a button.
Another issue I foresee will be updating the database running on the development machine. Are there any standardised methods that we can use to manage our SQL scripts among the four of us?
Database Updates
Not surprisingly, BuildMaster handles these as well by using the change scripts module. When a member of your team creates a script (e.g. ALTER TABLE ADD [Blah] INT NOT NULL) he can upload it into BuildMaster, then run it on any environment you have created.
The best part is that you can add a step in your automated deployment and never worry about it again. As Justin mentions, you can use .sql files for your object code (stored procedures, views, triggers, etc.) and have those executed on every build since they are essentially code anyway. You can keep those in source control.
Configuration Files
One aspect of all this you may have neglected (but will inevitably run into) is dealing with configuration files. With PHP, you may have an .htaccess file, a php.ini file, a prepend.php, or roll your own custom config file. Since by definition configuration files need to change between your personal machine and the development machine, grabbing them from source control wouldn't necessarily work without some bit of hacking a la:
if (DEV) {
    // do one thing
}
else if (PROD) {
    // do another
}
With BuildMaster, you can templatize your configuration files and associate them with an environment so they can be deployed automatically. It will also maintain a history of changes for you.
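To make the general idea of a templatized configuration file concrete, here is a tool-independent Python sketch. It does not show how BuildMaster does it; the setting names, host names and function name are all made up for the example.

```python
from string import Template

# Hypothetical per-environment values; in practice these would live outside the code.
SETTINGS = {
    "development": {"db_host": "localhost", "debug": "true"},
    "production": {"db_host": "db.example.internal", "debug": "false"},
}


def render_config(template_text, environment):
    """Fill the $placeholders of a config template for one environment."""
    # substitute() raises KeyError if the template uses an undefined placeholder.
    return Template(template_text).substitute(SETTINGS[environment])
```

The template itself is what goes into source control; the rendered file is produced at deployment time, one per environment.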
Automated Testing
If you want the full ALM effect, you can automatically unit test your code during an automated build and be notified if anything fails, so you know as soon as possible that something is broken.
Apologies for the "long winded" response, but I feel like you're already ahead of the game by observing the problems you might run into in the future and really believe BuildMaster will make all of this deployment stuff simple for your team so you can focus on the fun part, coding!
So, I have a development and production environment that are accessing the same BitBucket repository, and changes I push to the repo, I pull down on the production server by using hg pull and hg update.
This keeps my PHP code all up to date and works fine.
But I could use some advice keeping my MySQL schemas in sync between the two environments. For example I quite often make changes on the development machine that I need to reflect on the production server.
Any advice on how to do this would be very gratefully received.
What you are trying to do, in a nutshell, is version your database schema so that it stays in line with the code as things change. The critical parts of doing that are being able to track the changes to the DB schema and being able to track its current state (i.e., what version it is at).
One way to track the changes to the schema would be to manually script all changes to the schema. These change scripts are essentially your "diffs" between versions of the schema. Another way to generate these change files would be to use a program that can generate a diff between two databases, or between a database and a create script. In theory, you should be able to develop a pre-commit hook script that can generate the alter script from the current database for that working copy and the previous database for that working copy, but this isn't a trivial task.
Once you have your DB being versioned, you now have to solve the problem of applying those changes on Update. To do this, you will need to develop a post-update hook that can look at the database (probably at some sort of Version table within it that links to the Mercurial changeset Id) and determine what scripts need to be run in order to get the DB up to date.
Since Mercurial allows you to update to a previous version, you will either have to only have non-breaking changes to your database, or simply not allow (in the social sense or the technical sense) the production working copy to be updated to previous versions. Regardless of how you handle it, the post-update hook script that is doing the actual DB updates probably needs to be smart enough not to try to apply DB alter scripts that it has already applied.
There are obviously a number of issues to resolve and lots of testing to do to get this all to work, and it isn't a pre-built solution for you by any means, but it should get you well on your way to automating your DB updates to keep them in line with your code. Good luck!
Take a look at the Rails framework. They use database migrations to manipulate (even create) the database. It integrates great with testing and across development machines too (for teams). It's Ruby (which many find preferable to PHP) so it won't work for you unless you switch, but it might give you some ideas on how to implement this for your application.
http://guides.rubyonrails.org/migrations.html
Migrations are a convenient way for you to alter your database in a structured and organized manner. You could edit fragments of SQL by hand but you would then be responsible for telling other developers that they need to go and run them. You’d also have to keep track of which changes need to be run against the production machines next time you deploy.
Active Record tracks which migrations have already been run so all you have to do is update your source and run rake db:migrate. Active Record will work out which migrations should be run. It will also update your db/schema.rb file to match the structure of your database.
It is pretty standard practice now for desktop applications to be self-updating. On the Mac, every non-Apple program that uses Sparkle in my book is an instant win. For Windows developers, this has already been discussed at length. I have not yet found information on self-updating web applications, and I hope you can help.
I am building a web application that is meant to be installed like Wordpress or Drupal - unzip it in a directory, hit some install page, and it's ready to go. In order to have broad server compatibility, I've been asked to use PHP and MySQL -- is that **MP? In any event, it has to be broadly cross-platform. For context, this is basically a unified web messaging application for small businesses. It's not another CMS platform, think webmail.
I want to know about self-updating web applications. First of all, (1) is this a bad idea? As of Wordpress 2.7 the automatic update is a single button, which seems easy, and yet I can imagine so many ways this could go terribly, terribly wrong. Also, isn't the idea that the web files are writable by the web process a security hole?
(2) Is it worth the development time? There are probably millions of WP installs in the world, so it's probably worth the time it took the WP team to make it easy, saving millions of man hours worldwide. I can only imagine a few thousand installs of my software -- is building self-upgrade worth the time investment, or can I assume that users sophisticated enough to download and install web software in the first place could go through an upgrade checklist?
If it's not a security disaster or waste of time, then (3) I'm looking for suggestions from anyone who has done it before. Do you keep a version table in your database? How do you manage DB upgrades? What method do you use for rolling back a partial upgrade in the context of a self-updating web application? Did using an ORM layer make it easier or harder? Do you keep a delta of version changes or do you just blow out the whole thing every time?
I appreciate your thoughts on this.
Frankly, it really does depend on your userbase. There are tons of PHP applications that don't automatically upgrade themselves. Their users are either technical enough to handle the upgrade process, or just don't upgrade.
I propose two steps:
1) Seriously ask yourself what your users are likely to really need. Will self-updating provide enough of a boost to adoption to justify the additional work? If you're confident the answer is yes, just do it.
Since you're asking here, I'd guess that you don't know yet. In that case, I propose step 2:
2) Release version 1.0 without the feature. Wait for user feedback. Your users may immediately cry for a simpler upgrade process, in which case you should prioritize it. Alternately, you may find that your users are much more concerned with some other feature.
Guessing at what your users want without asking them is a good way to waste a lot of development time on things people don't actually need.
I've been thinking about this lately in regards to database schema changes. At the moment I'm digging into WordPress to see how they've handled database changes between revisions. Here's what I've found so far:
$wp_db_version is loaded from wp-includes/version.php. This variable corresponds to a Subversion revision number, and is updated when wp-admin/includes/schema.php is changed. (Possibly through a hook? I'm not sure.) When wp-admin/admin.php is loaded, the WordPress option named db_version is read from the database. If this number is not equal to $wp_db_version, wp-admin/upgrade.php is loaded.
wp-admin/includes/upgrade.php includes a function called dbDelta(). dbDelta() scans $wp_queries (a string of SQL queries that will create the most recent database schema from scratch) and compares it to the schema in the database, altering the tables as necessary so that the schema is brought up-to-date.
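To make the compare-and-alter idea concrete, here is a minimal sketch in Python with SQLite (chosen only so it runs standalone; it is not WordPress's dbDelta() code). The DESIRED_SCHEMA layout and the helper names are assumptions made up for illustration, and the sketch only creates missing tables and adds missing columns. It never drops columns or changes types.

```python
import sqlite3

# Hypothetical desired schema: table name -> {column name: column definition}.
DESIRED_SCHEMA = {
    "messages": {
        "id": "INTEGER PRIMARY KEY",
        "subject": "TEXT",
        "body": "TEXT",
        "sender": "TEXT",
    },
}


def existing_columns(conn, table):
    """Return the set of column names the table has now (empty if it is missing)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1] for row in rows}  # row[1] is the column name


def schema_delta(conn, desired):
    """Create missing tables and add missing columns; return the SQL that was run."""
    executed = []
    for table, columns in desired.items():
        current = existing_columns(conn, table)
        if not current:
            cols = ", ".join(f"{name} {definition}" for name, definition in columns.items())
            statements = [f"CREATE TABLE {table} ({cols})"]
        else:
            statements = [
                f"ALTER TABLE {table} ADD COLUMN {name} {definition}"
                for name, definition in columns.items()
                if name not in current
            ]
        for sql in statements:
            conn.execute(sql)
            executed.append(sql)
    return executed


conn = sqlite3.connect(":memory:")
# Simulate an old install that only has two of the four columns.
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT)")
applied = schema_delta(conn, DESIRED_SCHEMA)
```

Running schema_delta() a second time against the same database does nothing, which is what makes this style safe to call on every admin page load.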
upgrade.php then runs a function called upgrade_all(), which runs specific upgrade_NNN() functions if the database version stored in db_version is less than their target values. (E.g. upgrade_250(), the WordPress 2.5.0 upgrade, will be run if the database version is less than 7499.) Each of these functions runs its own data migration and population procedures, some of which are also called during the initial database setup script. This nicely cuts down on duplicate code.
So, that's one way to do it.
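The overall flow (compare the stored version with the version the code expects, then run only the upgrade routines that are newer) can be sketched roughly like this. It is Python with SQLite so it runs standalone, not WordPress code. The options table, the version numbers, and the upgrade_2()/upgrade_3() routines are all made up for illustration.

```python
import sqlite3

CODE_DB_VERSION = 3  # hypothetical schema version shipped with the code


def upgrade_2(conn):
    # Hypothetical schema change introduced in version 2.
    conn.execute("ALTER TABLE users ADD COLUMN email TEXT")


def upgrade_3(conn):
    # Hypothetical data migration introduced in version 3.
    conn.execute("UPDATE users SET email = '' WHERE email IS NULL")


# Target version -> routine that brings the database up to that version.
UPGRADES = {2: upgrade_2, 3: upgrade_3}


def stored_version(conn):
    row = conn.execute("SELECT value FROM options WHERE name = 'db_version'").fetchone()
    return int(row[0])


def upgrade_all(conn):
    """Run every routine whose target is newer than the stored version, in order."""
    current = stored_version(conn)
    if current == CODE_DB_VERSION:
        return []  # nothing to do, the database already matches the code
    ran = []
    for target in sorted(UPGRADES):
        if current < target <= CODE_DB_VERSION:
            UPGRADES[target](conn)
            ran.append(target)
    conn.execute(
        "UPDATE options SET value = ? WHERE name = 'db_version'",
        (str(CODE_DB_VERSION),),
    )
    conn.commit()
    return ran


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE options (name TEXT PRIMARY KEY, value TEXT)")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO options VALUES ('db_version', '1')")
conn.execute("INSERT INTO users (name) VALUES ('alice')")
first_run = upgrade_all(conn)   # runs upgrade_2 and upgrade_3
second_run = upgrade_all(conn)  # database is current, nothing runs
```

Keeping each routine keyed by its target version means an install that skipped several releases still gets every intermediate step, in order.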
Yes, it would be a security risk if PHP went and overwrote its files from some place on the internet with no warning. There's no guarantee that the server is connecting to your real update server (it might download code crafted by someone else if DNS poisoning occurred), which would give someone else access to your client's data. Therefore digital signing would be important.
The user could control updates by setting permissions on the web directory so that PHP only has read access to the files - this procedure could simply be documented with your program.
One question remains that I really don't know the answer to: can PHP overwrite files while it's currently using them (e.g. if the update.php file itself needed to be updated)? Worth testing.
I suppose you've already ruled this out, but you could host it as a service. (Think wordpress.com)
I'd suggest that you package your application with pear and set up a channel. Your users can then upgrade the application through a standard interface (pear). It's not entirely automatic (unless the users have some kind of automation running on top of pear), but it's standard, so any sysadmin can maintain it.
I think your best option is an update checking mechanism that will alert the administrator when there are update(s).
As you mention, there are a number of potential security problems. Due to those alone, I would suggest not doing this. Instead, try creating a fairly smart upgrading script.
Just my 2 cents: I'd consider an automatically self-updating application within my CMS a security hole, so if you decide to code this feature, you should consider implementing different levels of this behavior:
Automatically update
Check for updates and notify
Disable
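A rough sketch of how those three levels might be wired into an update check follows, in Python only so that it runs standalone. The UpdateMode enum, the dotted version format, and the decide_action() helper are assumptions made up for illustration, not part of any existing CMS.

```python
from enum import Enum


class UpdateMode(Enum):
    AUTO = "auto"          # download and apply updates automatically
    NOTIFY = "notify"      # check for updates and tell the administrator
    DISABLED = "disabled"  # never act on available updates


def parse_version(text):
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def decide_action(mode, installed, latest):
    """Return 'none', 'notify', or 'install' for the given mode and versions."""
    if mode is UpdateMode.DISABLED:
        return "none"
    if parse_version(latest) <= parse_version(installed):
        return "none"  # already up to date
    return "install" if mode is UpdateMode.AUTO else "notify"
```

Comparing parsed tuples rather than raw strings matters here: as plain strings, "1.9.0" would sort after "1.10.0" and the check would miss the update.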
The setup is as follows:
A Drupal project, one svn repo with trunk/qa/production-ready branches, vhosts for every branch, and a post-commit hook that copies files from the repository to the docroots.
The problem is as follows: a Drupal website often relies not only on source code but on DB data too (node types, their settings, etc.).
I'm looking for a solution to make these changes versionable, not by 'diffing' all the data in the database, but with something like fixtures in unit tests.
That is, fixture-like scripts with SQL data and files for content that should be versionable, applied after the main post-commit hook.
Is there anything written for that purpose, or would it be easy to adapt some kind of build tool (like Apache Ant) or unit testing framework? It would be great if this tool knew about Drupal, so that in scripts I could do things like variable_set() and drupal_execute().
Any ideas? Or should I start coding right now instead of asking this? :)
It sounds like you've already got some infrastructure there that you've written.
So I'd start coding! There's nothing that I'm aware of that's especially good for this at the moment. And if there is, I imagine that it would take some effort to get it going with your existing infrastructure. So starting to code seems the way to go.
My approach to this is to use sql patch files (files containing the sql statements to upgrade the db schema/data) with a version number at the start of the filename. The database then contains a table with config info in (you may already have this) that includes info on which version the database is at.
You can then take a number of approaches to automatically apply the patches. One would be a script, called from the post-commit hook, that checks which version the database is at, checks whether you have any patches newer than that version, and applies them in order if so.
The db patch should always finish by updating the aforementioned version number in the config table.
This approach can be extended to include the ability to set up a new database based on a full dump file and then applying any necessary patches to it to upgrade it as well.
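A minimal sketch of such a patch runner is below, in Python with SQLite so that it runs standalone; the same idea carries over to a PHP or shell script against MySQL. The file naming scheme (0002_add_email.sql), the config table layout, and the schema_version key are all assumptions made up for illustration.

```python
import re
import sqlite3
import tempfile
from pathlib import Path

PATCH_NAME = re.compile(r"^(\d+)_.*\.sql$")  # hypothetical naming: 0002_add_email.sql


def pending_patches(patch_dir, current):
    """Return (version, path) pairs newer than `current`, in ascending version order."""
    found = []
    for path in Path(patch_dir).iterdir():
        match = PATCH_NAME.match(path.name)
        if match and int(match.group(1)) > current:
            found.append((int(match.group(1)), path))
    return sorted(found)


def db_version(conn):
    row = conn.execute("SELECT value FROM config WHERE name = 'schema_version'").fetchone()
    return int(row[0])


def apply_patches(conn, patch_dir):
    """Apply each pending patch in order; each patch updates the version itself."""
    applied = []
    for version, path in pending_patches(patch_dir, db_version(conn)):
        conn.executescript(path.read_text())
        applied.append(version)
    return applied


with tempfile.TemporaryDirectory() as patch_dir:
    # Two example patches; each one ends by bumping the stored version.
    Path(patch_dir, "0002_add_email.sql").write_text(
        "ALTER TABLE users ADD COLUMN email TEXT;\n"
        "UPDATE config SET value = '2' WHERE name = 'schema_version';\n"
    )
    Path(patch_dir, "0003_add_index.sql").write_text(
        "CREATE INDEX idx_users_email ON users (email);\n"
        "UPDATE config SET value = '3' WHERE name = 'schema_version';\n"
    )
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE config (name TEXT PRIMARY KEY, value TEXT);"
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);"
        "INSERT INTO config VALUES ('schema_version', '1');"
    )
    applied = apply_patches(conn, patch_dir)
    final_version = db_version(conn)
    rerun = apply_patches(conn, patch_dir)  # nothing left to apply
```

Because every patch records its own version as its last statement, a run that fails halfway leaves the database at the last fully applied patch, and the next run resumes from there.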
Did a presentation on this at a recent conference (slideshare link) -- I would STRONGLY suggest that you use a site-specific custom module whose .install file contains versioned 'update' functions that do the heavy lifting for database schema changes and settings/configuration changes.
It's definitely superior to keeping .sql files around, because Drupal will keep track of which ones have run and gives you a batch-processing mechanism for anything that requires long-running bulk operations on lots of data.
My approach to this is to use sql patch files (files containing the sql statements to upgrade the db schema/data) with a version number at the start of the filename.
I was thinking of a file (XML or something) describing the needed DB structure, and a tool that applies the necessary changes.
And yes, after more research I agree: it will be easier to code it than to adapt some other solution. Though some routines from the simpletest Drupal module will be helpful, I think.
You might want to check out the book Refactoring Databases.
The advice I heard from one of the authors is to have a script that will upgrade the database from version to version rather than building up from scratch each time.
Previously: Drupal Source Control Strategy?