Live with the pain or change it?

Chances are you’ve worked on an application with a less-than-ideal database schema. Maybe the users table had over 50 columns and became costly to query, or there was an absence of foreign keys. Whatever your case may be, you have the decision to either live with it or change it.

Modifying the database schema can be daunting if you’ve never done it before. There are a lot of unknowns, and you might worry: how can the database change without breaking the application? But living with it causes another sort of pain. Sure, changing the database schema is risky, but with the right know-how it becomes quite trivial.

tl;dr


To help you prep for applying this in your own environment, first I’ll run through an example as if nothing ever goes wrong. We all know that’s never what actually happens. So, you’ll also find a “Field notes” section for each step of the process at the end of this post.

I have a web application with a user interface that allows users to create and view golf courses and its architects. Right now, there are only golf courses that have 1 or 2 architects, so all I need is this simple golf_courses table:

Database structure

As more people use my application, the catalog of golf courses grows and I find some limitations/issues with my current schema:

  • Users complain that they want to add more than two architects to a course.
  • Courses are renovated by newer architects and I’m unsure how to classify and store that data.

We’ll only solve the first issue in this blog post, but you’ll see how the changes we make would prepare us to solve the second issue too. We’ll also only be focusing on the database and back end code, which are the hard pieces of work. To completely solve the problem we need to update the UI, but I won’t cover that in this post.

1. Make non-destructive schema changes

Making non-destructive changes means that removing columns or tables is off-limits in this step. The schema can only be added to. This is important because the application is still depending on the old schema. The schema must remain intact while the application is weaned off of it or we threaten the application’s stability.

Back to my golf course application, I’ve created two new tables: golf_course_architects and architects. While it might be tempting to remove the architect_1 and architect_2 columns from the golf_courses table, remember we can only add to the schema. These changes should be deployed to production right now.

Database structure

2. Write to the new schema

Now that the new schema is in production, it’s time to start writing to it. The goal is to keep the old and new schema in sync going forward. So, you need to identify all areas of the code base that are writing to your old schema. Now, update those instances so that they also write to the new schema.

Here’s how my application is handling write operations:

class GolfAppController
  def create_golf_course(name, architects)
    GolfCourseModel.create(name: name,
                           architect_1: architects[0],
                           architect_2: architects[1])
  end
end

class AdminGolfCourseController
  def create_golf_course(name, architect_1, architect_2)
    GolfCourseModel.create(name: name,
                           architect_1: architect_1,
                           architect_2: architect_2)
  end
end

I recommend the following approach for writing to your new schema alongside the old schema:

  • Do not stop writing to the old schema.
  • Create an abstraction layer to hide implementation details of database operations.
  • Introduce a feature flag to conditionally write to the new schema. This is helpful for testing with internal users.

Here’s a slimmed down version of how I implemented my approach:

class GolfAppController
  def create_golf_course(name, architects)
    GolfCourseCreator.create_golf_course(name, architects)
  end
end

class AdminGolfCourseController
  def create_golf_course(name, architect_1, architect_2)
    GolfCourseCreator.create_golf_course(name, [architect_1, architect_2])
  end
end

class GolfCourseCreator
  def create_golf_course(name, architects)
    # write to the old schema here

    if FeatureFlags.new_golf_course_schema_write.enabled?
      # write to the new schema here
    end
  end
end

3. Populate and verify the new schema

The old schema has data that the new schema doesn’t since it has been around longer. For example, the golf course application has been around for two years. In that time it has accumulated 10,000 golf courses. We only introduced writing to the new schema yesterday, so the new schema has only about 10 courses.

To correct this, create a script that takes data from the old schema and copies it to the new schema. The script needs to be idempotent to leave data unchanged on repeated runs. Finally, the script verifies that the old and new schema are in sync. Verification can look different depending on the use case and schema modifications, but it usually compares the counts of the old and new rows.

How you write the script is up to you. It can be in plain SQL—which is my preference—or it can be in a programming language you’re most comfortable with. The script should be easy to run against production data and idempotent, those are the only constraints. The script can also be broken out into two scripts: one for population and the other for verification. There are pros and cons to each approach. For this post I’ve kept population and verification in the same script and will only refer to it as simply “the population script” going forward.

Here’s how I populate and then verify the architects table. This is a good way to bolster your SQL skills as there are loads of ways to accomplish this.

-- populate architects
insert into
  architects (name)
select golf_course.architect
from golf_courses course
cross join lateral (values
  (course.name, course.architect_1),
  (course.name, course.architect_2)
)
as golf_course(course, architect)
where architect is not null
on conflict (name) do nothing;

-- verify number of architects matches distinct architect records in
-- the golf_courses table
select
  count(distinct(golf_course.architect)) as old,
  (select count(id) from architects ) as new
from golf_courses course
cross join lateral (values
  (course.name, course.architect_1),
  (course.name, course.architect_2)
)
as golf_course(course, architect)
where architect is not null;

-- OMITTED for the sake of the blog post:
-- populate golf_course_architects
-- verify number of golf_course_architects equals the number of distinct
-- combinations of course and architect

Deploy the code changes to production. Once deployed, run the population script as well, verifying the output is as expected. This is a good time to enable the feature flag for yourself or your team, to test that writing to the new schema is working.

4. Read from the new schema

This step follows the same strategy as the code updating part of the write step, just with reads, so I’m omitting the details more detail can be found below. For the golf course application, whenever a client makes a request for a golf course, we want to return data from the new schema instead of the old. The main thing to avoid is altering the data structure that clients are expecting.

Here is how data is currently being returned in my golf course app:

{
  "Manakiki Golf Course": {
    "architect_1": "Donald Ross",
    "architect_2": null
  },
  "Sheep Ranch": {
    "architect_1": "Ben Crenshaw",
    "architect_2": "Bill Coore"
  }
}

Here’s how I’d like to return it (eventually):

{
  "Manakiki Golf course": {
    "architects": [
      { "name": "Donald Ross" }
    ]
  },
  "Sheep Ranch": {
    "architects": [
      { "name": "Ben Crenshaw" },
      { "name": "Bill Coore" }
    ]
  }
}

You can imagine that if I changed all read operations in my code base to return this new format then I’d have to update every single client to conform to this.

5. Cleanup

The finished product ticks the following boxes:

  • The application still works!
  • The application no longer writes to/reads from the old schema.
  • All feature flags are enabled.
  • The verification script returns zero discrepancies.

It’s time to deprecate the old schema, delete feature flags, and all references to the old schema. If you’ve managed to execute the previous steps perfectly then this step is stress-free. But, continue to be cautious. Automated tests could still all pass even after you drop the old schema. But this doesn’t necessarily mean nothing is broken—there could still be dangling references to the old schema that aren’t covered by automated tests.

This is also a good time to knock out any refactor cards that came out of the write/read steps, such as changing the data structure on reads.

Field Notes

These Field Notes cover more of the thought process behind each step: when to do “y”, why do “x”, etc., so you feel well equipped to tackle tough scenarios.

While my example was simplistic, this safe and incremental approach works for real applications. I’ve completed a project that involved replacing the way consent is stored. This was for an application in the pharmaceutical industry so risk was extremely high as consent impacted things like patient and doctor communications. Our team migrated the application to use a new schema and not a single customer, or even product owner, knew anything had changed.

Field Notes: Make non-destructive schema changes

There are many ways to structure your database schema. Work with your team to come up with the design that solves your problems today but is open to modifications later. My favorite way to do this brainstorming is on a whiteboard with your team, but that’s getting more and more rare unfortunately. My second favorite way is to use this tool.

Field Notes: Write to/read from the new schema

I’m going to use the phrases and variations of “write/write to” in this section for brevity, but know that you can swap in variations of “read/read from” as they are interchangeable.

In my experience this is the most time-consuming part of the process. I sometimes (often!) discover that the application is writing to the old schema from a lot of different parts of the code base and in different ways.

In the past I’ve created a spreadsheet that lists where and how an application writes to the old schema. This is helpful because there’s a big difference between 50 calls to an abstraction layer and 50 direct calls to the database. 50 calls to an abstraction layer means I need to modify only the abstraction (this is the jackpot but is rare). Without the abstraction layer I have to update those 50 instances that directly call the database. Plus, each of those calls could look slightly different if they were added at different times and by different developers. The direct calls to the database are dangerous and time-consuming to wrangle, so create the abstraction and get rid of them.

To reduce risk, I would introduce calls to the abstraction layer incrementally. I’ll replace a small handful of the 50 instances to begin with. Then I can check my work, make adjustments, and continue replacing the remaining instances.

This part of the process has the most ambiguity. It’s up to you and your team to determine how much to replace at a time. Two big factors in this decision are the complexity of the application and your risk comfort level. On a past project, my team considered it a significant win to write to our new schema from only one place out of dozens. It depends on your situation, so take your time with this step.

Field Notes: Populate and verify the new schema

Run the verification script to check your work. Expect discrepancies if:

  • There are areas of the code base that do not write to the new schema yet.
  • A feature flag is enabling writes to the new schema for a subset of users.

If your application does not fall into one of those two categories, then an area of the code base needs updated or something is broken. This is fine because at this stage the application is still reading from the old schema. The new write operations can be broken and our millions of users will be none the wiser. Running the verification script gives us direction for our next step. We can either move forward knowing everything works, or we need to find what went wrong with the writes.

Field Notes: Read specific

I called this out in the main body of the post, but you should maintain the data structure that clients are currently expecting. Problems will arise if the data structure a client relies on changes, and if you start down the path of updating all clients then you’ll soon have an inflated change set.

Tests are a great way to check yourself on this. I already have tests around the data structure returned to the controllers in my golf application. When I update the GolfCourseCreator to return data from the new schema my tests will tell me if the structure is correct.

Field Notes: Verifying reads are working

Verifying that you’ve updated all the read references can be a bit trickier than verifying the writes. When starting to make my updates, one trick I’ve done is I’ll completely drop the old schema or modify it in such a way that the app would break if it relied on the old schema at all (I only do this in my local environment!). This can help you identify a good chunk of the updates you’ll need to make. Also, sound test coverage can’t be stressed enough.

Another way to verify is to purposely get the old and new schema out of sync by blatantly changing a row in the old schema. Using my app as an example I would change the architect_1 value for “Manakiki Golf Course” from “Donald Ross” to “Donald Duck”. Then, I’d navigate to a page or call an API that I’ve already updated and I should still be getting back “Donald Ross”, meaning I’m reading from the new schema.

Closing thoughts

The best advice I can give is to do these steps in order and as incrementally as possible. To keep risk low and the application stable, it’s necessary to tackle small pieces of each step. An incremental approach also yields quicker feedback. This helps negate going too far down the wrong path and also gives you a progress indicator.

That’s all there is to it (sarcasm!). But seriously, don’t let your database schema hinder your application’s potential. Good luck!

Justin Schoen

Person An icon of a human figure Status
Double Agent
Hash An icon of a hash sign Code Name
Agent 00103
Location An icon of a map marker Location
Cleveland, OH