Tuesday, August 16, 2016

Encrypting application-level data at rest

In a previous post I described how I keep sensitive configuration data, such as API keys, secret in an app whose source code is public.

What about data that is kept per user, such as a password or API key stored for each user (or for some other kind of record) in your database?

For that I use the attr_encrypted gem. There is a great post on how to do this. If you're using Figaro as recommended in my earlier post, Figaro gives you an easy place to store the symmetric-encryption key(s) used by attr_encrypted, so instead of

attr_encrypted :user, :key => ENV["USERKEY"]

you would instead say

attr_encrypted :user, :key => Figaro.env.USERKEY
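To make "encrypted at rest" concrete, here is a minimal sketch, using only Ruby's standard OpenSSL library, of the kind of symmetric encryption that happens to each value before it reaches the database. It is not attr_encrypted's code or storage format: the module name AtRestSketch, the choice of AES-256-GCM, and the "iv--tag--ciphertext" layout are all my own assumptions for illustration. In an app, the 32-byte key would come from somewhere like Figaro.env rather than being generated on the spot.

```ruby
require "openssl"

# Hypothetical illustration of encrypting one attribute value at rest.
# NOT attr_encrypted's implementation or on-disk format.
module AtRestSketch
  CIPHER = "aes-256-gcm" # assumed cipher; requires a 32-byte key

  # Returns "iv--tag--ciphertext", each part strict-Base64 encoded.
  def self.encrypt(plaintext, key)
    cipher = OpenSSL::Cipher.new(CIPHER).encrypt
    cipher.key = key
    iv = cipher.random_iv # random IV, also set on the cipher
    # Cipher#update rejects empty input, so skip it for an empty string.
    body = plaintext.empty? ? "" : cipher.update(plaintext)
    body += cipher.final
    [iv, cipher.auth_tag, body].map { |part| [part].pack("m0") }.join("--")
  end

  # Raises OpenSSL::Cipher::CipherError if the key is wrong or the stored
  # value was tampered with, because GCM authenticates the ciphertext.
  def self.decrypt(stored, key)
    iv, tag, body = stored.split("--", -1).map { |part| part.unpack1("m0") }
    decipher = OpenSSL::Cipher.new(CIPHER).decrypt
    decipher.key = key
    decipher.iv = iv
    decipher.auth_tag = tag
    out = body.empty? ? "" : decipher.update(body)
    # Assumes the plaintext was UTF-8 text.
    (out + decipher.final).force_encoding(Encoding::UTF_8)
  end
end

key = OpenSSL::Random.random_bytes(32) # stand-in for a key from Figaro.env
stored = AtRestSketch.encrypt("sk_test_123", key) # dummy secret
AtRestSketch.decrypt(stored, key) # => "sk_test_123"
```

Only the stored string (what would sit in the database column) is ever written out; without the key it is useless, and GCM also detects tampering.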

Beware of attr_encrypted and serialize

If you use serialize to store (say) a hash or array in an ActiveRecord text field, be aware that combining serialize with attr_encrypted on that same field appears to interact badly. What I have found to work is to drop serialize and let attr_encrypted marshal the value instead:

Before:

class User < ActiveRecord::Base
  serialize :api_keys, Hash
end

User.new.api_keys  # =>  {}

After:

class User < ActiveRecord::Base
  attr_encrypted :api_keys, :key => Figaro.env.SECRET,
     :marshal => true
end

User.new.api_keys  # => nil


The :marshal => true option instructs attr_encrypted to marshal the data structure before encrypting it, and to unmarshal it after reading it from the database and decrypting it. There is one difference, though: serialize gives a newly created record's attribute a default value that is a new instance of the serialized type (here, an empty hash), whereas attr_encrypted leaves it nil. So if your app relies on the serialized value never being nil, I'd advise using an after_initialize callback to enforce the invariant that the attribute's default value is always an instance of the serialized class:

class User < ActiveRecord::Base
  attr_encrypted :api_keys, :key => Figaro.env.SECRET,
    :marshal => true
  after_initialize :ensure_is_hash
  # ...

  private

  # Give new records an empty hash, as serialize would have done.
  def ensure_is_hash
    self.api_keys ||= {}
  end
end

User.new.api_keys  # =>  {}


Et voilà: apart from the extra after_initialize callback, your encrypted-at-rest attributes now behave the same way as regular unencrypted attributes.
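For anyone who wants to see the moving parts outside Rails, here is a plain-Ruby sketch (no ActiveRecord, no attr_encrypted) of what :marshal => true plus the after_initialize default accomplish together: the setter marshals the Hash and encrypts the bytes into a separate field, the getter reverses both steps, and the constructor enforces the empty-hash default. UserSketch, its encrypted_api_keys field, and the AES-256-GCM layout are hypothetical stand-ins, not attr_encrypted internals.

```ruby
require "openssl"

# Hypothetical plain-Ruby stand-in for a model whose api_keys attribute is
# marshaled and then encrypted. NOT attr_encrypted's implementation.
class UserSketch
  CIPHER = "aes-256-gcm" # assumed cipher; the key must be 32 bytes

  # The ciphertext string, i.e. what would actually be stored in the database.
  attr_reader :encrypted_api_keys

  def initialize(key)
    @key = key
    @encrypted_api_keys = nil
    self.api_keys ||= {} # same invariant the after_initialize callback enforces
  end

  # Marshal the value to bytes, then encrypt those bytes (nil stays nil).
  def api_keys=(value)
    @encrypted_api_keys = nil
    return if value.nil?

    cipher = OpenSSL::Cipher.new(CIPHER).encrypt
    cipher.key = @key
    iv = cipher.random_iv
    body = cipher.update(Marshal.dump(value)) + cipher.final
    @encrypted_api_keys = [iv, cipher.auth_tag, body].map { |part| [part].pack("m0") }.join("--")
  end

  # Decrypt, then unmarshal. GCM verifies the tag in #final, so tampered
  # ciphertext raises OpenSSL::Cipher::CipherError before Marshal.load runs.
  def api_keys
    return nil if @encrypted_api_keys.nil?

    iv, tag, body = @encrypted_api_keys.split("--").map { |part| part.unpack1("m0") }
    decipher = OpenSSL::Cipher.new(CIPHER).decrypt
    decipher.key = @key
    decipher.iv = iv
    decipher.auth_tag = tag
    Marshal.load(decipher.update(body) + decipher.final)
  end
end

user = UserSketch.new(OpenSSL::Random.random_bytes(32))
user.api_keys                                  # => {}
user.api_keys = { "stripe" => "sk_test_123" }  # dummy value
user.api_keys["stripe"]                        # => "sk_test_123"
```

In this sketch, mutating the returned Hash in place only changes a freshly decrypted copy; assigning through api_keys= is what re-encrypts the stored value.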


Keeping secrets

Lately I've been involved with a bunch of apps that have to manage sensitive or semi-sensitive data, such as passwords, API keys, and so on.  I've converged on a couple of ways of managing this data that I thought I'd share. Most of my apps are hosted on Heroku, but the advice here applies to other deployment platforms too.

I'll distinguish two kinds of data: configuration-level data that is specified once for the whole app (for example, an API key for the app to access another microservice), and sensitive app-level data (for example, a password or other secret that needs to be stored for each user).  In this post I describe how to store configuration-level secret data; a separate post builds on this one to store app-level secret data.

You'll need the Figaro gem and a command-line-friendly installation of your favorite encryption package; I use GPG.

Use Figaro

For managing sensitive config data such as API keys, here's a methodology that observes two important guidelines:
  1. DRY: the secret data should be kept all in one place and nowhere else.
  2. Sensitive data should never be checked into version control (e.g., GitHub), especially if the app is otherwise open source.
First, set up your app to use the Heroku-friendly Figaro gem to manage the app's secrets. In brief, Figaro uses the well-known technique of exposing your app's sensitive config data as environment variables, but:
  • It centralizes all secrets in a single file, config/application.yml.
  • It lets you access them through a proxy object, so that environment variable FooBar can be accessed as Figaro.env.FooBar. This lets you stub or override certain config variables for testing if you want, and (more importantly) specify different values of those variables for the production environment vs. development. For example, many services such as Stripe let you set up two different API keys: a regular key, and a "testing" key that behaves the same as the regular key except that no actual financial transactions are performed. Using Figaro, your app doesn't have to know which key it is using, because the correct key for each environment can be supplied in application.yml.
When you set up Figaro, it adds config/application.yml to your .gitignore, since this file contains secrets and should not be versioned (at least not in cleartext).
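To make the per-environment point concrete, here is a rough sketch of how an application.yml with a shared key and a production override resolves. The helper resolve_config is hypothetical and only mimics my understanding of Figaro's lookup rule: top-level keys apply to every environment, and keys nested under an environment name override them. The real gem also copies the results into ENV, which this sketch does not do. The Stripe-style key values are dummies.

```ruby
require "yaml"

# Hypothetical helper mimicking (not reproducing) Figaro's resolution of
# config/application.yml for one environment.
def resolve_config(yaml_text, environment)
  raw = YAML.safe_load(yaml_text) || {}
  # Top-level scalar entries are shared by every environment...
  global = raw.reject { |_name, value| value.is_a?(Hash) }
  # ...and entries nested under the environment's name override them.
  scoped = raw[environment] || {}
  global.merge(scoped).transform_values(&:to_s)
end

yaml = <<~YAML
  STRIPE_API_KEY: "sk_test_123"
  production:
    STRIPE_API_KEY: "sk_live_456"
YAML

resolve_config(yaml, "development")["STRIPE_API_KEY"] # => "sk_test_123"
resolve_config(yaml, "production")["STRIPE_API_KEY"]  # => "sk_live_456"
```

The app code just asks for the key; which value it gets depends only on the environment it runs in.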

Encrypt & version your secrets file

Next, agree with the rest of your team on a symmetric key for encrypting the secrets file. You can then encrypt the file like so (this example uses GPG and the bash shell):

export KEY=your-secret-key-value
gpg --batch --yes --passphrase "$KEY" --symmetric --armor \
   --output config/application.yml.asc  config/application.yml

This will create the ASCII-armored encrypted file config/application.yml.asc, which I then check into version control. Note that the security of this file rests entirely on having chosen a strong symmetric-encryption key.
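If you don't already have a strong key, one easy way to get one is to let Ruby's SecureRandom, which draws from the operating system's secure random source, generate it. The helper name and the choice of 32 random bytes encoded as URL-safe Base64 are my own assumptions, not requirements of GPG.

```ruby
require "securerandom"

# Hypothetical helper: 32 random bytes (256 bits), URL-safe Base64 encoded
# without padding, so the value is easy to paste into `export KEY=...`
# and into a CI service's settings.
def generate_secrets_key(bytes = 32)
  SecureRandom.urlsafe_base64(bytes)
end

puts generate_secrets_key # a different 43-character key on every run
```

Share the result with your team over a channel you trust; it should never end up in the repo.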


Make sure developers can generate the secrets file

Of course, your app now needs config/application.yml to run, but only config/application.yml.asc exists in the repo. So any developer who clones the repo needs to know the value of $KEY and must manually create the secrets file from its encrypted version by decrypting it:

export KEY=your-secret-key-value
gpg --batch --yes --passphrase "$KEY" --decrypt \
   --output config/application.yml  config/application.yml.asc
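A developer who skips this step after cloning will otherwise hit confusing errors from missing config values. One option is a small guard, run early during boot, that fails fast and points at the decrypt step. The name require_secrets_file! and where you call it from are hypothetical; this is a sketch, not something Figaro provides.

```ruby
# Hypothetical boot-time guard. Pass the app root (e.g. Rails.root).
def require_secrets_file!(root)
  plain = File.join(root.to_s, "config", "application.yml")
  encrypted = "#{plain}.asc"
  return plain if File.exist?(plain)

  hint =
    if File.exist?(encrypted)
      "decrypt #{encrypted} with gpg (you will need the team's KEY)"
    else
      "neither it nor #{encrypted} exists"
    end
  raise "Missing #{plain}: #{hint}"
end
```

The message distinguishes "you forgot to decrypt" from "this checkout has no secrets file at all", which are different problems.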

Make sure Heroku knows the secrets

Figaro can make all the values in config/application.yml available as Heroku config vars, which Heroku exposes to your app as environment variables. Whoever has deploy access to the Heroku app can do this step:

figaro heroku:set -e production -a name-of-heroku-app
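As far as I can tell, this boils down to a single heroku config:set NAME=value ... call covering every setting Figaro resolves for that environment. The sketch below only builds such a command as an argument array; heroku_config_set_argv is a hypothetical helper, not part of Figaro, and nothing is executed.

```ruby
# Hypothetical helper approximating what `figaro heroku:set` amounts to:
# one `heroku config:set` invocation with every resolved setting. An argv
# array (rather than a shell string) sidesteps quoting problems in values.
def heroku_config_set_argv(settings, app)
  pairs = settings.map { |name, value| "#{name}=#{value}" }
  ["heroku", "config:set", *pairs, "--app", app]
end

argv = heroku_config_set_argv({ "STRIPE_API_KEY" => "sk_live_456" }, "name-of-heroku-app")
# system(*argv) would run it without going through a shell.
```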


Make sure CI knows the secrets

Finally, if you're using continuous integration (you are, right?), it probably also needs to generate config/application.yml in order to run the tests. On Travis, I add a step to the before_script section of .travis.yml to do this:

before_script:
  - gpg --batch --yes --passphrase "$KEY" --output config/application.yml --decrypt config/application.yml.asc

Of course, this requires Travis to know the value of $KEY, so you also need to set KEY manually as an environment variable in your repository's settings on Travis, with its value hidden from the build log. These steps are easily adapted for Semaphore or other CI environments: in general, you manually supply the CI environment with the symmetric key value, and you add a before-step to the build to generate the unencrypted secrets file from the encrypted one.
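If you'd rather have the CI step fail with an obvious message when KEY hasn't been configured, the decrypt can be wrapped in a small Ruby script invoked from the before-step. This is a sketch under assumptions: decrypt_secrets_for_ci! is a hypothetical name, the gpg flags assume a GnuPG version that honors --passphrase in --batch mode, and the runner is injectable so the logic can be checked without gpg installed.

```ruby
# Hypothetical CI helper: refuse to run gpg when KEY is missing, and raise
# if gpg fails. `runner` defaults to running the command without a shell.
def decrypt_secrets_for_ci!(env: ENV, runner: ->(*argv) { system(*argv) })
  key = env["KEY"].to_s
  raise "KEY is not set; add it in the CI service's environment settings" if key.empty?

  argv = ["gpg", "--batch", "--yes", "--passphrase", key,
          "--output", "config/application.yml",
          "--decrypt", "config/application.yml.asc"]
  runner.call(*argv) || raise("gpg could not decrypt config/application.yml.asc")
end
```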

When you change the contents of config/application.yml

If you add or modify secrets within application.yml, you'll need to re-create and commit config/application.yml.asc, and notify your developers that they must pull in the new file and re-create config/application.yml from it. You'll also need to re-run figaro heroku:set -e production (with the -a option naming your Heroku app) to push the new values to Heroku.
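Because these follow-up steps are easy to half-forget, you might script the mechanical parts. Below is a hedged sketch: refresh_secrets! is hypothetical, it assumes gpg and the figaro CLI are on your PATH and that KEY holds the team passphrase, and committing the new .asc file and notifying the team remain manual.

```ruby
# Hypothetical list, in order, of the commands to run after editing
# config/application.yml.
def refresh_secrets_commands(key, heroku_app)
  [
    ["gpg", "--batch", "--yes", "--passphrase", key, "--symmetric", "--armor",
     "--output", "config/application.yml.asc", "config/application.yml"],
    ["figaro", "heroku:set", "-e", "production", "-a", heroku_app]
  ]
end

# Runs each command without a shell and stops at the first failure.
# The error names only the program, so the passphrase is not echoed.
def refresh_secrets!(key, heroku_app, runner: ->(*argv) { system(*argv) })
  refresh_secrets_commands(key, heroku_app).each do |argv|
    runner.call(*argv) || raise("command failed: #{argv.first}")
  end
end

# Example: refresh_secrets!(ENV.fetch("KEY"), "name-of-heroku-app")
```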

That's it. This seems complex, but after the one-time setup, it's basically maintenance-free. In the next post I'll talk about encrypting data at rest per-user.