Downloading large Heroku Postgres backups

Publish Date: November 12, 2019
Authors: Justin Searls

Easily one of my favorite things about developing apps with Heroku is its bevy of console commands that make it easy to manage Postgres databases. Recently, however, as one of my apps crossed the threshold from a small “hobby” tier database to a larger “standard” database, the heroku CLI started warning me that several of my favorite commands could risk degrading the database’s performance and its availability to my production application.

[Related: I ignored these warnings and ran the commands anyway, only for Bugsnag to send me dozens of emails as my users began to experience database network timeouts and transaction failures. Would not recommend.]

Unfortunately, the documentation explaining why it’s more complicated to fetch backups for standard and premium databases doesn’t make it terribly clear how to do it anyway, so I wrote this post as a recap of Heroku’s backup facilities for fetching hobby tier databases before providing the quickest, most direct path to safely using those same tools once you’ve moved on to a larger database.

Backup tools available to small databases

To begin, let’s recap some of the fun and convenient commands that make development such a breeze for small databases, because the heroku CLI lets you do several really cool operations much more easily than the built-in Postgres commands do.

Pull it down

My favorite command for fetching a production database to reproduce something locally is easily pg:pull. With it (and its sibling, pg:push), you can fetch and restore the entire database with an easy-to-remember one-line command.

In most simple cases (and assuming you have your heroku git remote pointing to the right app), the command will look like this:

$ heroku pg:pull DATABASE_URL some_new_local_database

The above will download the full database and restore it to a new local postgres database named some_new_local_database. This is super handy, because if you’re using a framework like Rails, you can just update your local config/database.yml to point to the new database:

default: &default
  adapter: postgresql
  encoding: unicode
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>

development:
  <<: *default
  database: some_new_local_database

Then restart your server or console and start playing around with a pristine copy of whatever had just been in your production database.

I’ve only needed it once or twice when deploying major data migrations, but pg:push can be used much the same way in reverse. You’ll just want to make sure to turn on maintenance mode first and understand that pushing requires you to first delete any data in the remote database.

What this enables are simple workflows like the following (assuming your users can stomach the downtime):

# Disable access to your application
$ heroku maintenance:on

# Pull down the data
$ heroku pg:pull DATABASE_URL data_to_migrate

# Do whatever you need to run and validate your migration tasks locally
$ script/run_your_migration_or_otherwise_do_stuff

# Delete the remote data (which requires confirmation input)
$ heroku pg:reset DATABASE_URL

# Push up the migrated data
$ heroku pg:push data_to_migrate DATABASE_URL

# Re-enable access to the application
$ heroku maintenance:off

I don’t know about you, but for something as fraught as “delete and replace all of my production data”, it’s really nice to only need five easy-to-remember commands to pull it off.

Save a backup dump

If you want to save your production data to a file as opposed to a local postgres database, that’s a breeze, too. It requires two commands, pg:backups:capture and pg:backups:url (docs):

$ heroku pg:backups:capture

Which will output:

Starting backup of postgresql-something-1234... done

Use Ctrl-C at any time to stop monitoring progress; the backup will continue running.
Use heroku pg:backups:info to check progress.
Stop a running backup with heroku pg:backups:cancel.

Backing up DATABASE to b1799... done

Then with that b1799 identifier, you can fetch a URL and download the dump file like this:

$ heroku pg:backups:url b1799

This will output a very long and very secure-looking URL, which you can paste into a browser to download or otherwise pass to another command like curl or wget.

Schedule regular backups

Even though a hobby-basic Postgres database only costs $9 per month, it includes an automated backup scheduler, which you can enable with something like:

$ heroku pg:backups:schedule --at 12:00

You can verify your hobby database’s schedule with:

$ heroku pg:backups:schedules

=== Backup Schedules
DATABASE_URL: daily at 12:00 UTC

And see a list of backups (including with the identifiers you’d need to invoke heroku pg:backups:url, as shown above) with the top-level pg:backups command:

$ heroku pg:backups
=== Backups
ID     Created at                 Status                               Size      Database
─────  ─────────────────────────  ───────────────────────────────────  ────────  ────────
b1799  2019-11-11 02:51:55 +0000  Completed 2019-11-11 02:51:57 +0000  4.42MB    DATABASE
# ……

Not bad for nine bucks!

Downloading a backup of larger databases

Even though an entry-level standard-0 database will run you $50 per month, the heroku pg:backups:schedule command will warn you not to use it, and pg:backups:capture and pg:pull will very likely cause connection timeouts between your production application and the database, leading to availability issues for your users.

All this might seem like a downgrade, but it’s important to keep in mind that for the primary purpose of actually backing up and recovering critical data, the standard & premium tier databases provide physical backups that are far more reliable than the limited logical backup tools made available to the smaller hobby tier.

But we’re not here to use backups to protect our data, we’re here for quick and convenient ways to download our production data for our own nefarious purposes!

If you were to attempt to schedule a backup of a larger database using the method shown above, it’ll succeed, but the CLI will spit out this warning:

$ heroku pg:backups:schedule --at 00:00
 ▸    Continuous protection is already enabled for this database. Logical backups of
 ▸    large databases are likely to fail.
 ▸    See
 ▸    https://devcenter.heroku.com/articles/heroku-postgres-data-safety-and-continuous-protection#physical-backups-on-heroku-postgres.
Scheduling automatic daily backups of postgresql-something-1234... done

If you try to find an alternate method by reading the article at the URL above, you’ll likely be disappointed by the lack of one-off commands that are as simple and memorable as what you’re used to for smaller databases. But fear not, there’s an easy workaround!

Forking your primary database

The simplest path to “just give me a copy of production without crippling my app” is to create a fork of your primary database and then fetch it as you normally would, comforted by the knowledge that it’s completely separated from your production application and won’t break anything.

First, you’ll need to know what tier addon your production database is, because you’ll probably want your fork to be the same size. You can find this by running pg:info:

$ heroku pg:info DATABASE_URL
=== DATABASE_URL, HEROKU_POSTGRESQL_AQUA_URL
Plan:                  Standard 0
# ……

Because my primary database’s plan is “Standard 0”, that means we need to provision a heroku-postgresql:standard-0 addon for our fork, like this:

$ heroku addons:create heroku-postgresql:standard-0 --fork DATABASE_URL

This will do two things, first it will provision a new database from the most recent physical production backup (which might be up to 30 hours old), and then it will replay all of the intervening write transactions (stored as WAL files) to catch it up to the current state of production. This can take anywhere from a few minutes to a long time.

And if you don’t want to wait

But what if you don’t care about the most recent 30 hours of data and you just want your backup now? Well, lucky for us, Heroku has implemented a --fast flag for just this impulse.

$ heroku addons:create heroku-postgresql:standard-0 --fast --fork DATABASE_URL

Creating heroku-postgresql:standard-0 on ⬢ your-app-name-here... $50/month
 ▸    Release command executing: config vars set by this add-on will not be available
 ▸    until the command succeeds. Use `heroku releases:output` to view the log.
Fork will contain data from November 10, 2019  6:22 AM UTC (about 21 hours ago).
To create a fork with up-to-date data, exclude the `--fast` flag.
Database will become available after it completes forking.
Use `heroku pg:wait` to track status.
postgresql-something-4321 is being created in the background. The app will restart when complete...
Use heroku addons:info postgresql-something-4321 to check creation progress
Use heroku addons:docs heroku-postgresql to view documentation

Invoking the addons:info command output above will tell you the creation status of the fork as well as get its database URL alias, like so:

$ heroku addons:info postgresql-something-4321

=== postgresql-something-4321
Attachments:  your-app-name-here::HEROKU_POSTGRESQL_NEONGREEN
Installed at: Mon Nov 11 2019 13:07:56 GMT+0500 (Eastern Standard Time)
Owning app:   your-app-name-here
Plan:         heroku-postgresql:standard-0
Price:        $50/month
State:        creating

I ran the command a few times, and it took about 5 minutes for the state to switch from “creating” to “created”.

Downloading your forked data

Once you have the fork and its database URL, you can use all the same tricks outlined above, like pg:backups:capture:

$ heroku pg:backups:capture HEROKU_POSTGRESQL_NEONGREEN

 ▸    Continuous protection is already enabled for this database. Logical backups of
 ▸    large databases are likely to fail.
 ▸    See
 ▸    https://devcenter.heroku.com/articles/heroku-postgres-data-safety-and-continuous-protection#physical-backups-on-heroku-postgres.
Starting backup of postgresql-something-4321 done

Use Ctrl-C at any time to stop monitoring progress; the backup will continue running.
Use heroku pg:backups:info to check progress.
Stop a running backup with heroku pg:backups:cancel.

Backing up NEONGREEN to b279... done

Which, in spite of the same “standard” tier warning, should be fine since nothing is connected to and writing to the database. It can then be downloaded from the link generated by heroku pg:backups:url b279.

And you’ll find that running pg:pull also works, even though it’s likely to take longer now if you have a lot of data:

$ heroku pg:pull HEROKU_POSTGRESQL_NEONGREEN yay_production_data

Don’t forget to delete all your data!

In your excitement of fetching a backup of your production database, don’t forget to delete the newly-provisioned addon when you’re done! It’d be a shame to forget until you get next month’s bill and discover it to be $50 to $8,500 pricier than usual.

To make sure you delete the right database (a worthwhile precaution), run heroku addons to verify the name of the database that isn’t assigned DATABASE_URL alias, and then delete it with the addons:remove command (which, since it’s a destructive operation, will require confirmation input):

$ heroku addons:remove postgresql-something-4321

We did it!

And that’s it! I hope this guide showed you a thing or two about leveraging Heroku’s robust backup tools to more easily fetch and work with your data. But be warned: once you grow accustomed to this level of convenience, it can be tough to go back to other services that lack the same features.

By the way, a big thanks to Heroku engineer and friend of Test Double Mike Busch for patiently answering all my questions about how to work with backups for standard-tier database plans.

Justin Searls

Status

Double Agent

Code Name

Agent 002

Location

Orlando, FL

Twitter Mastodon Github LinkedIn Website

Tips for hand-rolling your own deployment pipeline

Cloud Providers like AWS provide tools to allow you to build your own deployment pipeline. In the first post of a series on NoOps philosophies, we investigate some of the unknowns in rolling your own deployment services with AWS.

Publish Date: August 20, 2019
Authors: Micah Adams
Categories: DevOps

Are you really ready for serverless?

With all the hype of serverless, it can be difficult to differentiate between real value and marketing fluff. This post seeks to examine the purported benefits of serverless presented in the "Serverless Manifesto" to help you understand if serverless is the right tool for you needs.

Publish Date: August 6, 2019
Authors: Micah Adams
Categories: DevOps

The Selfish Programmer

Publish Date: May 8, 2019
Authors: Justin Searls
Categories: Development

Is your job okay? It could be better: autonomy to solve problems the right way, dedicated time to grow your skills, and empathetic colleagues to partner on your best work.

We're 100% remote and employee-owned.

Join us