I've arrived at the point where I realise that I must start versioning my database schemata and changes. I consequently read the existing posts on SO about that topic but I'm not sure how to proceed.
I'm basically a one-man company, and not long ago I didn't even use version control for my code. I'm on a Windows environment, using Aptana (IDE) and SVN (with Tortoise). I work on PHP/MySQL projects.
What's an efficient and sufficient (no overkill) way to version my database schemata?
I do have a freelancer or two on some projects, but I don't expect a lot of branching and merging going on. So basically I would like to keep the schema versions in step with my code revisions.
[edit] Interim solution: for the moment I've decided I will just make a schema dump plus one with the necessary initial data whenever I'm about to commit a tag (stable version). That seems to be just enough for me at the current stage.[/edit]
[edit2] Plus I'm now also using a third file called increments.sql where I put all the changes with dates, etc. to make it easy to trace the change history in one file. From time to time I integrate the changes into the two other files and empty increments.sql.[/edit]
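For what it's worth, this workflow is easy to script so the two dump files never go stale. Below is a minimal PHP sketch, assuming a local MySQL server; the database name, credentials and the list of initial-data tables are placeholders, and increments.sql stays hand-written:

<?php
// dump.php - regenerate schema.sql and data.sql before committing a tag.
// Database name, credentials and the initial-data table list are placeholders.
$db   = 'myapp';
$user = 'root';
$pass = 'secret';
$seedTables = ['settings', 'countries']; // tables whose rows ship with a fresh install

// Structure only -> schema.sql
passthru(sprintf('mysqldump --no-data -u%s -p%s %s > schema.sql',
    escapeshellarg($user), escapeshellarg($pass), escapeshellarg($db)));

// Initial data only -> data.sql
passthru(sprintf('mysqldump --no-create-info -u%s -p%s %s %s > data.sql',
    escapeshellarg($user), escapeshellarg($pass), escapeshellarg($db),
    implode(' ', array_map('escapeshellarg', $seedTables))));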
Simple way for a small company: dump your database to SQL and add it to your repository. Then every time you change something, add the changes in the dump file.
You can then use diff to see changes between versions, not to mention have comments explaining your changes. This will also make you virtually immune to MySQL upgrades.
The one downside I've seen to this is that you have to remember to manually add the SQL to your dumpfile. You can train yourself to always remember, but be careful if you work with others. Missing an update could be a pain later on.
This could be mitigated by creating some elaborate script to do it for you when submitting to subversion but it's a bit much for a one man show.
Edit: In the year that's gone by since this answer, I've had to implement a versioning scheme for MySQL for a small team. Manually adding each change was seen as a cumbersome solution, much like it was mentioned in the comments, so we went with dumping the database and adding that file to version control.
What we found was that test data was ending up in the dump and was making it quite difficult to figure out what had changed. This could be solved by dumping the schema only, but this was impossible for our projects since our applications depended on certain data in the database to function. Eventually we returned to manually adding changes to the database dump.
Not only was this the simplest solution, but it also solved certain issues that some versions of MySQL have with exporting/importing. Normally we would have to dump the development database, remove any test data, log entries, etc, remove/change certain names where applicable and only then be able to create the production database. By manually adding changes we could control exactly what would end up in production, a little at a time, so that in the end everything was ready and moving to the production environment was as painless as possible.
How about versioning the file generated by doing this:
mysqldump --no-data database > database.sql
Where I work we have an install script for each new version of the app which has the SQL we need to run for the upgrade. This works well enough for 6 devs with some branching for maintenance releases. We're considering moving to AutoPatch (http://autopatch.sourceforge.net/), which handles working out what patches to apply to any database you are upgrading. It looks like there may be some small complication handling branching with AutoPatch, but it doesn't sound like that'll be an issue for you.
I'd guess a batch file like this should do the job (didn't try it, though)...
mysqldump --no-data -ufoo -pbar dbname > path/to/app/schema.sql
svn commit path/to/app/schema.sql
Just run the batch file after changing the schema, or let cron/a scheduler do it (but I don't know... I think commits go through if just the timestamps changed, even if the contents are the same; I don't know if that would be a problem).
The main idea is to have a folder with this structure in your project base path:
/__DB
    /changesets
        /1123
    /data
    /tables
Now, how the whole thing works is that you have 3 folders:
Tables
Holds the table CREATE queries. I recommend using the naming “table_name.sql”.
Data
Holds the initial INSERT data queries for a table. I recommend using the same naming, “table_name.sql”.
Note: Not all tables need a data file, you would only add the ones that need this initial data on project install.
Changesets
This is the main folder you will work with.
This holds the change sets made to the initial structure; it actually contains one folder per changeset.
For example, I added a folder 1123 which will contain the modifications made in revision 1123 (the number comes from your source control) and may contain one or more SQL files.
I like to group them by table with the naming xx_tablename.sql, where xx is a number that tells the order they need to be run in, since sometimes you need the modifications run in a certain order (see the sketch below).
Note:
When you modify a table, you also add those modifications to the table and data files, since those are the files that will be used to do a fresh install.
This is the main idea.
For more details you could check this blog post.
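To make the ordering concrete, here is a rough PHP sketch of a runner that applies one changeset folder in order (my own illustration, not from the blog post; credentials and paths are placeholders, and the statement splitting is deliberately naive):

<?php
// apply_changeset.php <revision> - run every xx_tablename.sql in __DB/changesets/<revision>/ in order.
$revision = $argv[1] ?? exit("usage: php apply_changeset.php <revision>\n");
$files = glob(__DIR__ . "/__DB/changesets/$revision/*.sql");
sort($files, SORT_NATURAL);   // 01_..., 02_..., 10_... run in the intended order

$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'root', 'secret'); // placeholder credentials
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

foreach ($files as $file) {
    echo 'Applying ' . basename($file) . "\n";
    // Naive split on ";" at end of line - fine for plain DDL, not for stored routines.
    foreach (array_filter(array_map('trim', preg_split('/;\s*\n/', file_get_contents($file)))) as $statement) {
        $pdo->exec($statement);
    }
}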
Take a look at SchemaSync. It will generate the patch and revert scripts (.sql files) needed to migrate and version your database schema over time. It's a command line utility for MySQL that is language and framework independent.
Some months ago I searched for a tool to version MySQL schemas. I found many useful tools, like Doctrine migrations, RoR migrations, and some tools written in Java and Python.
But none of them satisfied my requirements.
My requirements:
No requirements other than PHP and MySQL
No schema configuration files, like schema.yml in Doctrine
Able to read the current schema from a connection and create a new migration script that reproduces an identical schema in other installations of the application
I started writing my own migration tool, and today I have a beta version.
Please try it if you have an interest in this topic.
Please send me feature requests and bug reports.
Source code: bitbucket.org/idler/mmp/src
Overview in English: bitbucket.org/idler/mmp/wiki/Home
Overview in Russian: antonoff.info/development/mysql-migration-with-php-project
Our solution is MySQL Workbench. We regularly reverse-engineer the existing database into a model with the appropriate version number. It is then possible to easily perform diffs between versions as needed. Plus, we get nice EER diagrams, etc.
At our company we did it this way:
We put each table / DB object in its own file, like tbl_Foo.sql. The files contain several "parts" that are delimited with
-- part: create
where create is just a descriptive identification for a given part, the file looks like:
-- part: create
IF not exists ...
CREATE TABLE tbl_Foo ...
-- part: addtimestamp
IF not exists ...
BEGIN
ALTER TABLE ...
END
Then we have an XML file that references every single part that we want executed when we update the database to a new schema.
It looks pretty much like this:
<playlist>
    <classes>
        <class name="table" desc="Table creation" />
        <class name="schema" desc="Table optimization" />
    </classes>
    <dbschema>
        <steps db="a_database">
            <step file="tbl_Foo.sql" part="create" class="table" />
            <step file="tbl_Bar.sql" part="create" class="table" />
        </steps>
        <steps db="a_database">
            <step file="tbl_Foo.sql" part="addtimestamp" class="schema" />
        </steps>
    </dbschema>
</playlist>
The <classes/> part is for the GUI, and <dbschema/> with <steps/> partitions the changes. The <step/>s are executed sequentially. We have some other entities, like sqlclr, to do different things such as deploying binary files, but that's pretty much it.
Of course we have a component that takes that playlist file and a resource/filesystem object, cross-references the playlist, pulls out the wanted parts and then runs them as admin on the database.
Since the "parts" in the .sql files are written so they can be executed against any version of the DB, we can run all parts against any previous/older version of the DB and bring it up to date.
Of course there are some cases where SQL Server parses column names "early" and we later have to turn parts into exec_sql statements, but it doesn't happen often.
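The playlist-runner component itself isn't shown here, but its core is just slicing each .sql file on those part markers. A rough PHP sketch of that idea (my own illustration, not the original component, which presumably isn't PHP):

<?php
// Return the SQL between "-- part: $name" and the next "-- part:" marker (or end of file).
function getPart(string $file, string $name): string
{
    $collected = [];
    $inside    = false;
    foreach (file($file) as $line) {
        if (preg_match('/^--\s*part:\s*(\S+)/', $line, $m)) {
            $inside = ($m[1] === $name);   // start at our part, stop at the next marker
            continue;
        }
        if ($inside) {
            $collected[] = $line;
        }
    }
    return implode('', $collected);
}

// e.g. run the "addtimestamp" part of tbl_Foo.sql against the target database:
// $pdo->exec(getPart('tbl_Foo.sql', 'addtimestamp'));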
I think this question deserves a modern answer, so I'm going to give it myself. When I wrote the question in 2009, I don't think Phinx existed yet, and Laravel most definitely didn't.
Today, the answer to this question is very clear: write incremental DB migration scripts, each with an up and a down method, and run all these scripts (or a delta of them) when installing or updating your app. And obviously add the migration scripts to your VCS.
As mentioned in the beginning, there are excellent tools today in the PHP world which help you manage your migrations easily. Laravel has DB migrations built in, including the respective shell commands. Everyone else has a similarly powerful, framework-agnostic solution in Phinx.
Both Artisan migrations (Laravel) and Phinx work the same way. For every change in the DB, create a new migration, use plain SQL or the built-in query builder to write the up and down methods, and run artisan migrate or phinx migrate, respectively, in the console.
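For illustration, a Phinx migration for a hypothetical change (adding a nullable bio column to a users table) looks roughly like this; phinx migrate applies pending up() methods in order, and phinx rollback runs the matching down():

<?php
// Created with something like: vendor/bin/phinx create AddBioToUsers (table/column are hypothetical)
use Phinx\Migration\AbstractMigration;

class AddBioToUsers extends AbstractMigration
{
    public function up()
    {
        $this->table('users')
             ->addColumn('bio', 'text', ['null' => true])
             ->update();
    }

    public function down()
    {
        $this->table('users')
             ->removeColumn('bio')
             ->update();
    }
}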
I do something similar to Manos, except that I have a 'master' file (master.sql) that I update with some regularity (once every 2 months). Then, for each change, I build a version-named .sql file with the changes. This way I can start off with master.sql and apply each version-named .sql file until I get up to the current version, and I can update clients using the version-named .sql files to keep things simple.
Related
I came across a situation with hundreds of database tables, and rewriting them all as Laravel migrations does not seem... a very nice task.
I know Laravel migrations are a really cool feature for keeping track of database changes together with a VCS such as Git,
BUT... not being able to update the database with php artisan migrate on the production server effectively rules out the use of migrations, making it a real pain to change table by table manually, adding columns, indexes or foreign keys.
QUESTION: Is there any way for Laravel migrations to write the changes (SQL statements) to a file instead of applying them directly to the database?
I've come across the same problem many times, and I did the following to solve it.
First, when you finally finish the MySQL database structure and you are about to publish the application, you must set a "mark" on the database: export it correctly and declare it version 1 of the database. After that you can start writing migrations, and you will be more confident; plus you will avoid many problems such as invalid data types, bad structure and so on.
Another way is to make use of toSql and have it output to a folder, let's say rawSqlDatabaseMigrations, with timestamps and such.
Also, you could just keep writing SQL manually and use migrations only with DB::raw.
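If I read that last suggestion correctly, the idea is to keep the SQL hand-written and let the migration do nothing but execute it. A rough sketch of that approach (my interpretation; the table and column are hypothetical, DB::statement runs the hand-written SQL, and the anonymous-class style is the recent Laravel convention):

<?php
use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\DB;

return new class extends Migration
{
    public function up(): void
    {
        // Hand-written SQL, exactly as you would run it on the server.
        DB::statement('ALTER TABLE products ADD COLUMN sku VARCHAR(64) NULL');
    }

    public function down(): void
    {
        DB::statement('ALTER TABLE products DROP COLUMN sku');
    }
};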
I'm investigating using pt-online-schema-change to assist in certain migrations where an ALTER TABLE command would cause a long maintenance window. I understand that pt-online-schema-change will create an empty copy of the table to perform the ALTER TABLE command on, then migrate rows over in batches from the old table to the new, and create triggers to manage data changes in the interim.
But at the moment when the new table is swapped with the old, is it possible to pause at that point so that we can time it with a new codebase deploy? I don't see this addressed in the documentation. Obviously our PHP ORM (doctrine) of our new release (using Symfony) will be expecting a certain schema to be in place and will cause problems if the swap happens either before or after the codebase deploy.
A related question: I understand foreign key constraints have to be updated on all child tables, because otherwise they will still reference the old table. Does that mean this stage has to be done behind a maintenance window? I don't see how you could do it ahead of time if we are timing the data migration to coincide with the release of a particular codebase.
You can write code for the pause logic into a plugin that runs before_swap_tables.
For example (not tested, but this should give you the basic idea):
package pt_online_schema_change_plugin;

sub new {
   my ($class, %args) = @_;
   # Create the sentinel file whose presence keeps the tool paused.
   open my $fh, '>>', 'pause_please.txt' or die "$!";
   close $fh;
   return bless { %args }, $class;
}

sub before_swap_tables {
   my ($self) = @_;
   print "Pausing until you rm pause_please.txt...\n";
   # Wait until the sentinel file is removed, then let the swap proceed.
   while (-f 'pause_please.txt') {
      sleep(1);
   }
}

1;
Then when you're ready, open another shell window to rm the file, and then pt-online-schema-change will continue.
See the "PLUGIN" section in the documentation for details.
Re your comment:
It might seem like everyone must have the problem you are facing at the current moment. But you are the first person I've heard ask for this feature.
Most people write code that can read and write the tables both before and after the schema change. For example, if you're adding a column, avoid SELECT * and avoid INSERT statements that don't name their columns, so the app isn't surprised when a new column appears. Then it's not necessary to time code pushes with schema changes.
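In PHP terms that advice boils down to always naming your columns, so the same statements keep working before and after the new column exists. A tiny sketch (hypothetical users table, placeholder credentials):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'root', 'secret');

// Explicit column list on INSERT: a new nullable/defaulted column added later doesn't break this.
$insert = $pdo->prepare('INSERT INTO users (email, name) VALUES (?, ?)');
$insert->execute(['alice@example.com', 'Alice']);

// Explicit column list on SELECT: no surprise columns in the result set.
$rows = $pdo->query('SELECT id, email, name FROM users')->fetchAll(PDO::FETCH_ASSOC);

// By contrast, "INSERT INTO users VALUES (...)" and "SELECT *" both change meaning
// the moment the schema does.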
But sometimes the nature of the schema change requires a code change, for example renaming a column. In those cases, a few minutes of downtime is acceptable.
As of pt-online-schema-change 2.2.20 there is the --pause-file option.
documentation. However, this will only update the schema and then wait, before copying the data, until the file is deleted.
I have created this plugin, which will copy the data as well.
I currently have the following setup:
*.mysite.com --> /home/public_html/app/index.php
I want to write some code in index.php that changes the whole document root to /home/public_html/app_prev/index.php based on a condition. The reason for this is that I am doing a migration: if a user hasn't been migrated yet, I want to serve the old version of the code; once they are migrated, the new one. Each user has their own database and I will migrate them one by one. Normally it would take seconds to migrate all of them, but this release will take a while to do.
Is this possible?
Is this recommended when making large database schema changes? Will it cause performance problems/errors?
You could just use a PHP redirect based on the condition that you're looking for. It's no different than serving a different page based on what's coming in.
It's a reasonable implementation if you have many large databases and you're worried about performance. I'd test it by:
Keep the old code path and old database.
Migrate a test database over to the new codebase. I don't know how you're doing your logic, but you could have a single-column, single-row table in each database that says whether it's on the old or the new codebase (see the sketch after this list).
Test that the new codebase works.
Start migrating your databases over, changing that single entry in each database (or however your logic is determined).
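A minimal sketch of what the dispatch in index.php could look like (my own illustration: the tenant-database naming, the migration_state flag table and the entry-file paths are all assumptions):

<?php
// index.php - serve the old or the new codebase per tenant while the long migration runs.

// Derive the tenant's database name from the subdomain (assumed convention).
$subdomain = explode('.', $_SERVER['HTTP_HOST'])[0];
$database  = 'site_' . preg_replace('/[^a-z0-9_]/i', '', $subdomain);

$pdo = new PDO("mysql:host=localhost;dbname=$database", 'root', 'secret'); // placeholder credentials

// Single-column, single-row flag table, as described in the answer above.
$migrated = (bool) $pdo->query('SELECT migrated FROM migration_state LIMIT 1')->fetchColumn();

if ($migrated) {
    require '/home/public_html/app/app.php';       // new codebase
} else {
    require '/home/public_html/app_prev/app.php';  // old, pre-migration codebase
}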
My company has developed a web application using PHP + MySQL. The system can display a product's original price and discounted price to the user. If you aren't logged in, you get the original price; if you are logged in, you get the discounted price. It is pretty easy to understand.
But my company wants more features in the system: it wants to display different prices based on the user. For example, user A is a gold partner and gets 50% off, while user B is a silver partner and only gets 30% off. This logic wasn't provided for in the original system, so I need to add some attributes to the database, at least a user type in this example. Is there any recommendation on how to migrate the current database to my new version of the database? Also, all the data should be preserved, and the server should keep working 24/7 (without stopping the database).
Is it possible to do so? Also, any recommendations for future maintenance? Thank you.
I would recommend writing a tool to run SQL queries against your databases incrementally, much like Rails migrations.
In the system I am currently working on, we have such a tool written in Python. We name our scripts something like 000000_somename.sql, where the zeros are the revision number in our SCM (Subversion), and the tool is run as part of development/testing and finally when deploying to production.
This has the benefit of being able to go back in time in terms of database changes, much as you can with code if you use a source code version control tool.
http://dev.mysql.com/doc/refman/5.1/en/alter-table.html
Here are more concrete examples of ALTER TABLE.
http://php.about.com/od/learnmysql/p/alter_table.htm
You can add the necessary columns to your table with ALTER TABLE, then set the user type for each user with UPDATE. Then deploy the new version of your app that uses the new column.
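Concretely, that could be a one-off PHP upgrade script run just before deploying the new code (the column name, default and partner lookup tables are hypothetical):

<?php
// One-off upgrade: add the new attribute, backfill it, then deploy the code that uses it.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'root', 'secret'); // placeholder credentials
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// 1. Add the column with a default so the currently deployed code keeps working untouched.
$pdo->exec("ALTER TABLE users ADD COLUMN partner_level VARCHAR(10) NOT NULL DEFAULT 'none'");

// 2. Backfill existing users (gold_partners / silver_partners are hypothetical sources).
$pdo->exec("UPDATE users SET partner_level = 'gold'
            WHERE id IN (SELECT user_id FROM gold_partners)");
$pdo->exec("UPDATE users SET partner_level = 'silver'
            WHERE id IN (SELECT user_id FROM silver_partners)");

// 3. Only now deploy the app version that reads partner_level to pick the discount.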
Did you use an ORM for the data access layer? I know Doctrine comes with a migration API which allows switching versions up and down (in case something goes wrong with the new version).
Outside any framework or ORM consideration, a fast script will minimize the slowdown (or the downtime if the process is too long).
In my opinion, I'd rather have a 30-second website interruption with an information page than a shorter interruption with visible bugs or no display at all. If interruption time matters, it's best to do this at night or when there is less traffic.
This can all be done in one script (or at least launched by one command line). When we had to do such upgrades, our shell script included:
Putting the application in standby (temporary static page): you can use an .htaccess redirect or whatever is applicable to your app/server environment.
Running svn update (or switch) to upgrade the source code and assets.
Emptying caches, cleaning up temp files, etc.
Rebuilding generated classes (symfony-specific).
Upgrading the DB structure with ALTER / CREATE TABLE queries.
If needed, migrating data from the old structure to the new: depending on what you changed in the structure, it may require fetching data before altering the DB structure, or using temporary tables.
If all went well, removing the temporary page. Upgrade done.
If something went wrong, displaying a red message to the operator so they can see what happened, try to fix it and then remove the waiting page by hand.
The script should do checks at each step and stop at the first error, and it should be verbose (but concise) about what it does at every step, so you can fix the app faster if something does go wrong.
The best would be a recoverable script (error at step 2, stop the process, fix manually, resume at step 3); I never took the time to implement it that way.
It works pretty well, but this kind of script has to be tested intensively, on an environment as close as possible to the production one.
In general we develop such scripts locally and test them on the same platform as the production environment (just with different paths and DB).
If the waiting page is not an option, you can go without, but you need to ensure data and user-session integrity. For example, use LOCK on tables during the upgrade/data transfer and use exclusive locks on modified files (SVN does this, I think).
There could be other, better solutions, but this is basically what I use and it does the job for us. The major drawback is that this kind of script has to be rewritten for each major release, which keeps me searching for other options, but which one? I would be glad if someone here had a better and simpler alternative.
Perhaps the biggest risk in pushing new functionality live lies with the database modifications required by the new code. In Rails, I believe they have 'migrations', in which you can programmatically make changes to your development host and then make the same changes live along with the code that uses the revised schema. And roll both back if need be, in a synchronized fashion.
Has anyone come across a similar toolset for PHP/MySQL? Would love to hear about it, or any programmatic or process solutions to help make this less risky...
I don't trust programmatic migrations. If it's a simple change, such as adding a NULLable column, I'll just add it directly to the live server. If it's more complex or requires data changes, I'll write a pair of SQL migration files and test them against a replica database.
When using migrations, always test the rollback migration. It is your emergency "oh shit" button.
I've never come across a tool that would do the job. Instead I've used individual files, numbered so that I know which order to run them: essentially, a manual version of Rails migrations, but without the rollback.
Here's the sort of thing I'm talking about:
000-clean.sql # wipe out everything in the DB
001-schema.sql # create the initial DB objects
002-fk.sql # apply referential integrity (simple if kept separate)
003-reference-pop.sql # populate reference data
004-release-pop.sql # populate release data
005-add-new-table.sql # modification
006-rename-table.sql # another modification...
I've never actually run into any problems doing this, but it's not very elegant. It's up to you to track which scripts need to run for a given update (a smarter numbering scheme could help). It also works fine with source control.
Dealing with surrogate key values (from autonumber columns) can be a pain, since the production database will likely have different values than the development DB. So, I try never to reference a literal surrogate key value in any of my modification scripts if at all possible.
I've used this tool before and it worked perfectly.
http://www.mysqldiff.org/
It takes as an input either a DB connection or a SQL file, and compares it to the same (either another DB connection or another SQL file). It can spit out the SQL to make the changes or make the changes for you.
@yukondude
I'm using Perl myself, and I've gone down the route of Rails-style migrations semi-manually in the same way.
What I did was have a single table "version" with a single column "version", containing a single row of one number which is the current schema version. Then it was (quite) trivial to write a script to read that number, look in a certain directory and apply all the numbered migrations to get from there to here (and then updating the number).
In my dev/stage environment I frequently (via another script) pull the production data into the staging database, and run the migration script. If you do this before you go live you'll be pretty sure the migrations will work. Obviously you test extensively in your staging environment.
I tag the new code and the required migrations under one version control tag. To deploy to staging or live, you just update everything to this tag and run the migration script, which is fairly quick. (You might want to arrange a short downtime if the schema changes are really wacky.)
The solution I use (originally developed by a friend of mine) is another addendum to yukondude.
Create a schema directory under version control and then for each db change you make keep a .sql file with the SQL you want executed along with the sql query to update the db_schema table.
Create a database table called "db_schema" with an integer column named version.
In the schema directory create two shell scripts, "current" and "update". Executing current tells you which version of the db schema the database you're connected to is currently at. Running update executes each .sql file numbered greater than the version in the db_schema table sequentially until you're up to the greatest numbered file in your schema dir.
Files in the schema dir:
0-init.sql
1-add-name-to-user.sql
2-add-bio.sql
What a typical file looks like, note the db_schema update at the end of every .sql file:
BEGIN;
-- comment about what this is doing
ALTER TABLE user ADD COLUMN bio text NULL;
UPDATE db_schema SET version = 2;
COMMIT;
The "current" script (for psql):
#!/bin/sh
VERSION=`psql -q -t <<EOF
\set ON_ERROR_STOP on
SELECT version FROM db_schema;
EOF
`
[ $? -eq 0 ] && {
echo $VERSION
exit 0
}
echo 0
The update script (also psql):
#!/bin/sh
CURRENT=`./current`
LATEST=`ls -vr *.sql |egrep -o "^[0-9]+" |head -n1`
echo current is $CURRENT
echo latest is $LATEST
[[ $CURRENT -gt $LATEST ]] && {
echo That seems to be a problem.
exit 1
}
[[ $CURRENT -eq $LATEST ]] && exit 0
#SCRIPT_SET="-q"
SCRIPT_SET=""
for (( I = $CURRENT + 1 ; I <= $LATEST ; I++ )); do
SCRIPT=`ls $I-*.sql |head -n1`
echo "Adding '$SCRIPT'"
SCRIPT_SET="$SCRIPT_SET $SCRIPT"
done
echo "Applying updates..."
echo $SCRIPT_SET
for S in $SCRIPT_SET ; do
psql -v ON_ERROR_STOP=TRUE -f $S || {
echo FAIL
exit 1
}
done
echo OK
My 0-init.sql has the full initial schema structure along with the initial "UPDATE db_schema SET version = 0;". Shouldn't be too hard to modify these scripts for MySQL. In my case I also have
export PGDATABASE="dbname"
export PGUSER="mike"
in my .bashrc. And it prompts for password with each file that's being executed.
Symfony has a plugin called sfMigrationsLight that handles basic migrations. CakePHP also has migrations.
For whatever reason, migration support has never really been a high priority for most of the PHP frameworks and ORMs out there.
Pretty much what Lot105 described.
Each migration needs an apply and rollback script, and you have some kind of control script which checks which migration(s) need to be applied and applies them in the appropriate order.
Each developer then keeps their DB in sync using this scheme, and when deploying to production the relevant changes are applied. The rollback scripts can be kept in order to back out a change if that becomes necessary.
Some changes can't be done with a simple ALTER script such as a tool like sqldiff would produce; some changes don't require a schema change but a programmatic change to existing data. So you can't really generalise, which is why you need a human-edited script.
I use SQLyog to copy the structure, and I ALWAYS, let me repeat ALWAYS make a backup first.
I've always preferred to keep my development site pointing to the same DB as the live site. This may sound risky at first but in reality it solves many problems. If you have two sites on the same server pointing to the same DB, you get a real time and accurate view of what your users will see when it goes live.
You will only ever have 1 database and so long as you make it a policy to never delete a column from a table, you know your new code will match up with the database you are using.
There is also significantly less havoc when migrating. You only need to move over the PHP scripts and they are already tested using the same DB.
I also tend to create a symlink to any folder that is a target for user uploads. This means there is no confusion on which user files have been updated.
Another side effect is the option of porting over a small group of 'beta testers' to use the site day to day. This can lead to a lot of feedback that you can implement before the public launch.
This may not work in all cases but I've started moving all my updates to this model. It's caused much smoother development and launches.
In the past I have used LiquiBase, a Java-based tool where you configure your migrations as XML files. You can generate the necessary SQL with it.
Today I'd use the Doctrine 2 library which has migration facilities similar to Ruby.
The Symfony 2 framework also has a nice way to deal with schema changes - its command line tool can analyze the existing schema and generate SQL to match the database to the changed schema definitions.