ActiveRecord migrations are a killer feature of Ruby on Rails. The feature is very well-implemented, it's easy to use, and countless teams have benefited from it. Before Rails, most teams I encountered were in the habit of making incredibly error-prone ad hoc changes to each of their application's databases. It's one part of the framework I wish were emulated much more broadly than it has been.

While the Rails Guides do a great job teaching developers how to take advantage of migrations, there is little guidance on the habits needed to keep an application's migrations healthy over the long term. This post will outline a few things I look for in well-maintained Rails projects.

Keep in mind that for most "application code", it's enough that we strive to write "maintainable" code. We don't have to get everything right today, because we can always improve it tomorrow and redeploy. But once migrations have been initially pushed and deployed, they aren't supposed to ever change again. That means "maintainability" per se isn't achievable, which places the onus on the developer to get migrations right up front. (That conclusion can be hard for teams to swallow, since it runs counter to the prevailing anti-future-proof winds of Agile Best Practice™.)

A few habits of healthy db/migrate directories follow below.

Habit 1: Keep them working

At any point in the life of a Rails application, it ought to be possible for a developer to run all of its migrations from a clean database. To be clear:

$ rake db:drop db:create db:migrate

Ought to always succeed.

Even though rake db:schema:load should have the same effect as running all your migrations, its success will hinge on the accuracy of your project's schema.rb, which in turn is generated by running all of the project's migrations. Developers can only have confidence in their schema.rb file if they're able to regenerate it exactly by re-running the project's migrations from scratch. If a project reaches a point where old migrations can no longer be run, the team is left to trust the veracity of a generated file that they can no longer validate.

In fact, if the schema.rb isn't generated from scratch now and again, tiny errors and divergences will tend to accumulate as new migrations are iterated upon. To illustrate: a developer may very well write a migration, run rake db:migrate, then change some aspect of the migration, before erroneously committing changes to schema.rb that were generated before the migration was itself finalized.

When this occurs, such a team's development and test environments will reflect whatever the schema.rb indicates, whereas their production environment's schema—itself the product of only the sum of all deployed migrations—may differ in non-trivial and surprising ways.

While as a general rule it's advisable never to modify a deployed migration, if the alternative is "all our migrations are broken forever", it's worth finding a minimally invasive fix.

Habit 2: Run them often

Regularly running all of your application's migrations from an empty database provides two immediate benefits. First, any broken migrations will be detected sooner, so they can be fixed more easily. Second, the authoritativeness of your project's schema.rb will be regularly validated—if git detects a change in the file after running all migrations, then it's as easy-to-fix as committing those changes.
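
One way to make this check routine is a small script that rebuilds the database from nothing and then asks git whether db/schema.rb changed. What follows is only a minimal sketch, not something Rails provides: the schema_status method and its injectable run argument are hypothetical names, and it assumes the project uses git and the default db/schema.rb path.

```ruby
# Hypothetical helper: rebuilds the database from scratch, then checks
# whether the regenerated db/schema.rb differs from the committed copy.
# `run` is anything that takes a command string and returns a truthy
# value on success; it defaults to Kernel#system and can be swapped out.
REBUILD_COMMAND = "rake db:drop db:create db:migrate"
DRIFT_COMMAND   = "git diff --exit-code db/schema.rb"

def schema_status(run = ->(command) { system(command) })
  # A failure here means at least one migration no longer runs cleanly.
  return :migrations_broken unless run.call(REBUILD_COMMAND)

  # `git diff --exit-code` exits non-zero when the file has changed.
  run.call(DRIFT_COMMAND) ? :schema_current : :schema_drifted
end
```

Called with no arguments, this drops and rebuilds the local database, so it belongs in a development or CI environment only. A result of :schema_drifted means the regenerated schema.rb should be reviewed and committed.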

This is why I tend to run db:migrate instead of db:schema:load in any bootstrap scripts that might be distributed with the project. It might be marginally slower to reinitialize a development database, but that's pretty easily outweighed by the aforementioned benefits.

Habit 3: Habitually "redo" them

Most migrations should be reversible. If a column is added in a migration, rolling back that migration should remove it. Every time I add a new migration and see it succeed, I make sure its down migration works too. To do this right after successfully applying a migration, just run:

$ rake db:migrate:redo

Which will revert the most recent migration and then reapply it. This will reveal any problems with the migration's down directive. And, if no problems appear, then the database is up-to-date and you can go about your business.

This has become even more important since Rails 3.1 introduced the otherwise nifty change method, because some operations will succeed while migrating forward and only fail when reversed. (Rails 4 corrected a number of these cases, but errors can still crop up.)
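
To make that failure mode concrete, here is a toy model of how a list of recorded operations gets reversed. It is not Rails' actual implementation, and the invert and revert methods and the IrreversibleOperation error are hypothetical names. The rule it encodes does reflect Rails 4 behavior: remove_column can only be undone automatically when the column type is supplied.

```ruby
# Toy model only; Rails does this work in its own command recorder.
class IrreversibleOperation < StandardError; end

# Takes one recorded operation, e.g. [:add_column, :users, :name, :string],
# and returns the operation that would undo it.
def invert(operation)
  name, *args = operation
  case name
  when :add_column
    # Keep every argument so the inverse can itself be inverted later.
    [:remove_column, *args]
  when :remove_column
    # Without a type there is not enough information to re-add the column.
    if args.length < 3
      raise IrreversibleOperation, "remove_column needs a type to be reversed"
    end
    [:add_column, *args]
  when :rename_column
    table, old_name, new_name = args
    [:rename_column, table, new_name, old_name]
  else
    raise IrreversibleOperation, "no known inverse for #{name}"
  end
end

# Reverting a whole migration inverts each step, last step first.
def revert(operations)
  operations.reverse.map { |operation| invert(operation) }
end
```

In a real migration, the practical fix is to pass the type, as in remove_column :users, :name, :string, or to fall back to explicit up and down methods.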

Habit 4: Don't reference models

Suppose your application has a User model and several related migrations. Your:

  • User ActiveRecord model assumes the "users" table is up-to-date
  • 20120204...change_users migration assumes the "users" table is exactly as it was on February 4th, 2012

There's a glaring impedance mismatch here, if you think about it. When a migration is run, it's always in a context where the database schema is out-of-date. When an ActiveRecord model is loaded, however, it's always in a context where the database schema is up-to-date.

ActiveRecord models inherit lots of their behavior by interrogating the state of the database schema when they're first loaded. If, at model-load-time, the schema is out of alignment with what your model's internal code expects (validations, callbacks, etc.), it's very likely your model will raise errors in the context of a running migration. Therefore, never reference your application's ActiveRecord models from your migrations.

Why might anyone think to do this? Because frequently, complex changes require migrations to existing data as well as to the schema, and interacting with data is much easier with ActiveRecord's APIs than it is by way of hand-written SQL updates. (Not to mention that writing complex data migrations in raw SQL is terrifically difficult in comparison to accomplishing most other SQL tasks, yet we adopted an ORM to avoid even those cases.)

If you've written migrations that depend on loading your application's actual ActiveRecord models, take comfort in knowing you're not alone, because Rails developers seem to inadvertently do this all the time. At least some of the blame lies with Rails itself for using the same load strategy when running migrations as it uses when loading the entire application. (If it were up to me, I'd make everything under app/ off-limits to migrations.)

And while I'm perfectly content just dictating that one shouldn't do this, it may help to have a longform illustration at hand as to how referencing models from migrations can come back to bite you.

An illustration

Suppose you start a User class:

class User < ActiveRecord::Base
end

And with it, a simple migration:

class CreateUsers < ActiveRecord::Migration
  def change
    create_table :users do |t|
      t.string :name
    end
  end
end

Later, you might decide to split the user's name up into a first and last name. You could accomplish this with another migration, this one including a data migration:

class SplitUserNameFields < ActiveRecord::Migration
  def up
    add_column :users, :first_name, :string
    add_column :users, :last_name, :string
    User.find_each do |u|  
      u.update!(
        :first_name => u.name.split(" ").first,
        :last_name => u.name.split(" ").last,
      )
    end
    remove_column :users, :name
  end

  def down
    add_column :users, :name, :string
    User.find_each do |u|  
      u.update!(
        :name => "#{u.first_name} #{u.last_name}"
      )
    end
    remove_column :users, :first_name
    remove_column :users, :last_name
  end
end

Later on, you might decide to change your model in some way that makes loading the class, querying for instances, or saving changes impossible from the perspective of an out-of-date schema. An easy example is to add the acts_as_paranoid gem, which adds logical deletion to models, like so:

class User < ActiveRecord::Base
  acts_as_paranoid
end

This requires the addition of a deleted_at column:

class AddDeletedAtToUsers < ActiveRecord::Migration
  def change
    add_column :users, :deleted_at, :datetime
  end
end

Running rake db:migrate and rake db:migrate:redo will work fine at this point.

However, we've inadvertently broken our old migration! If we were to run rake db:drop db:create db:migrate, our data migration would fail because the acts_as_paranoid gem will preclude User from being loaded when a deleted_at column doesn't exist. Whoops!

Luckily, there are safer ways of leveraging ActiveRecord's APIs without loading our models under app/!

Using ActiveRecord models safely

We could fix the now-broken data migration in the previous illustration by defining a new ActiveRecord::Base subclass that's designed to only be used for the purpose of the migration. By updating it to:

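class SplitUserNameFields < ActiveRecord::Migration
  # Migration-local stand-in for User: it maps to the same table but
  # loads none of the code under app/.
  class MigrationUser < ActiveRecord::Base
    self.table_name = :users
  end

  def up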
    add_column :users, :first_name, :string
    add_column :users, :last_name, :string
    MigrationUser.find_each do |u|  
      u.update!(
        :first_name => u.name.split(" ").first,
        :last_name => u.name.split(" ").last,
      )
    end
    remove_column :users, :name
  end
  #...
end

A clean run of all our migrations will once again succeed! In fact, if our data migration required logic involving associations, there's nothing preventing us from defining them as well with configuration like has_many :pets, :class_name => "SplitUserNameFields::Pet".

Once you've established these idioms in your project, it's easy to carry them forward. Now, you can define all the data migrations you like without any risk that future changes to your application code could someday prevent your old migrations from working.
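
Data migrations are also easier to get right when the transformation itself is a plain function that can be exercised without a database. As a minimal sketch, the name-splitting expressions used in SplitUserNameFields could be pulled into a pair of hypothetical helpers, split_name and join_name. How they treat blank, single-word, and multi-word names is my assumption and differs from the expressions shown earlier, which would raise on a nil name and drop any middle words.

```ruby
# Hypothetical helper: splits a full name into [first_name, last_name].
# Assumptions not in the original migration: nil or blank names yield
# [nil, nil], a single word becomes the first name only, and everything
# after the first word is kept as the last name so no data is dropped.
def split_name(full_name)
  first_name, last_name = full_name.to_s.strip.split(" ", 2)
  [first_name, last_name]
end

# Inverse used when rolling back: joins whichever parts are present.
def join_name(first_name, last_name)
  [first_name, last_name].compact.join(" ")
end
```

With helpers like these defined inside the migration class, the up loop could assign first_name, last_name = split_name(u.name), and the down loop could rebuild the name with join_name. For example, split_name("Mary Ann Smith") returns ["Mary", "Ann Smith"], which join_name turns back into the original string.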
