Archive for the ‘Uncategorized’ Category.

New Blog Hosting

We have moved our blog to a new hosting provider and to WordPress from Blogger.

Please let us know if you have any issues or if any RSS feeds don’t transfer correctly.

Why Aren’t Databases Version Controlled?

There was a great post the other day on Coding Horror titled “Get Your Database Under Version Control“. It references a good series of posts by K. Scott Allen on database version control, but also makes the point that “When it comes to version control, the database is often a second or even third-class citizen.”

I have noticed this in the past as well, and have wondered why?

Lack of tools?
Subversion, CVS, RCS and others have been around for years, but when I came the the realization that the database must be version controlled I could find no tools that fit my need and so created LiquiBase. I think lack of tools is a symptom of the problem rather than a cause, however. Developers have never been a group to sit around waiting for a tool to solve a pressing need they see.

DBA Overlords?
In some organization changes to the database must go through a database change process managed by DBAs. The fact that the database changes are managed by an external group could create a “not our problem” situation where the developers depend on the DBAs to track changes. Again, I don’t see this as a reason because the majority of projects do not have such a process in place and so wouldn’t be depending on it.

Only Now A Big Problem?
The answer that makes most sense to me is that database versioning is a relatively new problem. Code changes need to be propagated to every developer on a team quickly and reliably and therefore automated tools like Subversion and CVS have been around for a long time and no one would ever consider a group project without them. Most developers would not even consider a solo project without them.

Databases, on the other hand, do not change as often and so manual and error-ridden processes have worked well enough in the past. As databases have become more and more central to projects of all types the old manual database update scripts are showing their limitations.

I think the final straw in the ad-hoc database management schemes has come from the growth of agile processes. Pre-agile, even smaller projects would often design the database up-front and changes to it could be managed in a piecemeal fashion throughout the project. As agile does away with as much of the up-front design as possible, the number of database changes introduced throughout a project increases dramatically and a way to quickly, reliably, and automatically apply changes becomes a necessity.

I see the existing database versioning tools like LiquiBase, DBDeploy, and ActiveRecord:Migration as the RCS of database versioning: they are a great start, but there is a lot of ground left to cover. Recent version of LiquiBase have added features such as database change rollback, database comparisons, DBDoc, and change contexts, but there is still a lot to do. That doesn’t mean, however, that you shouldn’t Get your databse under version control!

Japanese Documentation Complete

I would like to thank Yasuo Honda for all his hard work on the Japanese translation of the LiquiBase documentation. He was also a great help in determining the best wiki tool to use to support translations.

If you are interested in assisting with translating the LiquiBase documentation, feel free to edit the site wiki directly, or contact the documentation mailing list for more information.

You can view the Japanese translation by starting at http://www.liquibase.org/ja/home or by clicking the “ja” box on any page.

LiquiBase at JavaOne? Cross your Fingers!

I submitted a JavaOne 2008 session into the call for papers. I went for a combination of LiquiBase usage and general database change management. Let’s hope I’m selected!

Revenge of the Mock Tests

If you didn’t see, version 1.1.1 of LiquiBase was released the day after 1.1 because of problems reported by users with the new diff tool.

Now, we could simply say “it’s a new feature, you should expect bugs”, but they were pretty obvious ones that should have been caught by simply executing the diff tool against all our supported databases. A simple unit test or two that actually ran the diff tool exposed the bugs and they were easy to catch.

Why weren’t those tests wrote before the 1.1 release? It was because I was testing with a mock database connection which didn’t capture all the idiosyncrasies of different databases. The tests were there, and the coverage was there, but there were still bugs because of limitations in the (mock) abstraction layer.

The lesson to be learned is that when writing database access level code, never use a mock database connection.

Start With Simple Evolutionary Database Design

Reg Developer in the UK recently posted an interview with Scott Ambler about database refactoring at http://www.regdeveloper.co.uk/2007/07/04/evolutionary_database_design/.

In the interview, he discusses questions developers and DBAs often have when first approaching evolutionary database design (EDD). The two main points he covers are:

  1. Having both developers and DBAs involved in the database design process is an improvement over “traditional” waterfall-style databases designed only by the DBA.
  2. If you make your schema changes with triggers and views correctly, you can have a “transition window” that will allow you to make your changes without breaking existing systems that use the database.

While I agree with both these points, I worry about his focus on “transition window” database refactoring methods. There is certainly value in being able to provide these windows for databases that have many independent systems depending on them, but I think that for the vast majority of projects it makes database refactoring seem too difficult to even attempt.

I would propose that more focus needs to be placed on the simpler aspect of EDD: databases that are specific to a single system or set of related systems that are all managed and updated in a frequent, agile manner. These are the teams most likely to adopt an EDD process and there is still a lot of work to get them the tools they need to work efficiently.

In the interview, Scott says that “right now we’re at the beginning of the adoption curve” in regards to EDD and I agree with that. So–for now–let’s focus on the needs of the majority of those early adopters.

I see the tools and practices required to do EDD efficiently as a spectrum that range from a single-project database with no DBA, through a set of related projects with a part time DBA involved, to a database that supports a large set of independent systems that is under the constant care of a DBA. “Transition Window” methods of database refactoring work great for the high end of the spectrum and I don’t think it should be ignored because it is important to know that a process can scale.

For now, however, we need to start a the simple end and build a strong foundation of tools and techniques before working our way up the stack to the top. There is the start of the required tools in the form of DBUnit for unit testing and LiquiBase for managing refactorings, but there is still large holes including IDE support for refactorings, best practices and pattern catalogs for database testing, and more.

SOX Compliance and Database Refactoring

Managing, tracking, and applying database changes is difficult, especially in an agile database environment where there are many changes made throughout a project’s life cycle. Even with a tool like LiquiBase, It takes a lot of discipline to apply your database changes in a consistent and traceable manner.

For projects that need to deal with SOX-compliant releases the process is even more difficult because your release documentation needs to include not only how to update your database, but also how to roll back in case of a problem with the release.

It was to address this problem that we added automatic rollback support to LiquiBase. For each changeSet in your change log file, LiquiBase can (usually) generate rollback SQL. For changes that cannot be automatically undone (drop table, insert data, etc.), or if you want to override the default rollback method, you can specify a tag containing the correct SQL. This method of generating rollback commands works well because for most cases you don’t have to do anything, and when you do have to specify rollback SQL, it is stored along side the original change.

To control the generation of generating SQL for updating a database and for rolling it back, see the command line migrator documentation.

The Problem With Rails Active Migrations

Rails Active Migration is nice for its simplicity, but in non-trivial projects, it quickly falls apart due to limitations it has regarding multiple developers and/or branches. It’s a well known problem

The fundamental problem is that Rails tracks the “database version” as a single incrementing integer. That works fine when only one developer is adding migrations and when there is only one branch. When you add developers and branches, however, you quickly run into problems with duplicated version numbers, and missed migrations because the production “database version” is higher than a newly merged in migration.

The reason this is a problem is because there the reliance on a the “database version” concept. While there may be a logical database version that you can think about, really the state of the database is simply the set of all the applied migrations. Think of a file in source control. The source control system may generate “version” numbers for reference, but it doesn’t use them to decide what should be merged in. When you say “merge the “1_1″ branch into trunk, it simply takes all the changes in the “1_1″ branch and applies the same changes to trunk, regardless of whether the file in trunk has a higher “version number” or not. The same logic needs to be applied to database updates.

There are attempts to solve the situation in the Ruby community. The work-around solution works by naming each change by a timestamp and storing all the executed changes in the schema_migrations table. While it is a step in the right direction, if you need to re-order the execution of changes after a commit by multiple developers or after a merge it can be difficult. Hopefully a good solution to the problem is created soon, or we will have to develop a port of LiquiBase for Ruby :)

Managing Change Logs

While you can include all your change sets in one giant change log, there are many good reasons to split them up. While there are many ways to divide your change logs, the best strategy I have found is to create a change log per major java package that contains data access code. This approach has several advantages:

Easy to Know Where to Add (and Look For) Database Changes

If you are making a database change due to code in a package, you know exactly where to put the change.

Makes Code-Reuse and Code-Repackaging Easier

Since packages are often used as a logical code unit, the code to manage database changes can be moved and re-used along with the java code.

Keeps Database Changes Closer to the Code

You don’t need to go hunt through your file navigator as you switch between adding required databases changes and making the corresponding code changes.

Fewer File Conflicts

On multi-developer teams, the database change logs are a shared resource with many individuals editing them. Like any version controlled file, the more developers and the more branches that touch a file, the more chances for problems. Breaking up a change log into multiple files limits the changes and extent of merge issues.

There are, of course, other ways to break up your change log files including one change log per project and one change log per table. Depending on your requirements, these or other strategies may work better for you. The important thing is to find what works best for you.

The Problem With Database Diffs

When talking about how LiquiBase works, I’m often asked “Why not just do a database diff to know what changes need to be applied to a new database?”

There are advantages to the database diff technique compared to LiquiBase’s method of tracking changes, mainly related to not relying on developers to remember to document the changes they made. You simply have everyone make whatever changes they want to the dev database and let the diff tool sort it out.

There are two fundamental flaw with diff tools, however. The first is is that while they do a good job of showing what the differences are syntactically, they don’t understand the semantics of the changes. For example, if you rename a column from “fname” to “firstname”, a diff will show you that there is a “fname” column that needs to be dropped and a “firstname” column that needs to be be added. If you follow these suggestions, you will end up with a database that is structured correctly, but you will loose your valuable data when the changes is applied to an existing production system. You need to not just understand what the structural differences are, you also need to understand why they are.

The second flaw is that it is impossible to diff the data so you find inserts, updates, and deletes that must be applied for the system to work correctly. These data changes can range from additional lookup table values to copying data in a de-normalizing process and can be just as important as the structural changes. Theoretically, a diff tool could also check for new, updated, and missing data between database, but in practice this cannot work for two reasons:

  1. Performance. As your data set grows, the amount of information to compare grows until it is unmanageable.
  2. Changing Data. During development, test data is often added to the development database that shouldn’t be copied into other databases. Also, new data may be added to testing and production databases that should not be deleted just because it doesn’t exist in the development database.

If you track your data changes along with your structural changes during development, however, you can be certain that they will be applied correctly down the road.

In the end, I feel that the problems associated with a diff tool outweighs the convenience and simplicity they offer. While they can have great value as a “did all the changes get stored in the change log correctly” safeguard, it should be relied apon as your primary means of tracking database changes.