My name is Philipp C. Heckel and I write about nerdy things.

How-To: Using ZFS Encryption at Rest in OpenZFS (ZFS on Linux, ZFS on FreeBSD, …)




An upcoming feature of OpenZFS (and ZFS on Linux, ZFS on FreeBSD, …) is At-Rest Encryption, a feature that allows you to securely encrypt your ZFS file systems and volumes without having to provide an extra layer of devmappers and such. To give you a brief overview of what the feature can do, I thought I’d write a short post about it.

The current ZFS encryption implementation is not (yet) merged into the upstream repository (as of January 2017). There is a pretty big pull request which is still being reviewed, but because the feature is so incredibly cool (and because my colleague Tom Caputi at Datto developed it), I thought a sneak preview is absolutely necessary.




0. This post

This post demonstrates a feature that has not yet been released. For the demos I will focus on ZFS on Linux on an Ubuntu 16.04 based machine. The code I run is available in the official repos (links below) or in my forks (SPL & ZFS, use branch “blogpost” for both).

1. Introduction

At-rest encryption is a new feature in ZFS (zpool set feature@encryption=enabled <pool>) that will automatically encrypt almost all data written to disk using modern authenticated ciphers (AEAD) such as AES-CCM and AES-GCM.

The CLI makes it incredibly easy to enable encryption on a per dataset/volume basis (zfs create -o encryption=on <dataset>). The keys used for encryption can be inherited or set manually for a dataset. Keys can be loaded from different sources (prompt or file) and various input formats are available (raw, hex or passphrase). Keys and key sources can be changed after the dataset/volume creation, and without re-encrypting the data (because the user-supplied keys are never used to encrypt data directly).

The encryption parameters and key status of a dataset/volume are represented in various properties (encryption=<on|aes-128-gcm|...>, keysource=<raw|hex|passphrase>,<prompt|file>, keystatus=<none|available|unavailable>, pbkdf2iters=<n>).

Many normal ZFS commands are available even if the key of a dataset is not loaded, meaning that administrators can manage the pool without having to know the keys. For instance: a pool can be scrubbed (zpool scrub <pool>) without the keys, and datasets and snapshots can be listed (zfs list -r -t all). In future releases, zfs send and zfs recv will also work even if the key is not available.

Having built-in support for encryption at the file system level is huge. It means that you no longer have to use dm-crypt if you want to encrypt your data on disk, and you can still manage your pools even if keys are not loaded. Many thanks to Datto and Tom Caputi for bringing us this incredible feature.

1.1. What’s encrypted

All important pieces are encrypted (actual data and metadata, ACLs, permissions, directory listings, …), while some things are unencrypted to allow managing pools more easily.

Here’s a listing of what’s encrypted and what’s not:

Encrypted:
  • File data and metadata
  • ACLs, names, permissions, attrs
  • Directory listings
  • All Zvol data
  • FUID Mappings
  • Master encryption keys
  • All of the above in the L2ARC
  • All of the above in the ZIL

Not encrypted:
  • Dataset / snapshot names
  • Dataset properties
  • Pool layout
  • ZFS Structure
  • Dedup tables
  • Everything in RAM

1.2. Crypto Details

Note: This section describes the nitty gritty crypto details. You can safely skip it if you just want to use the feature.

Crypto concepts are always a bit hard to explain without confusing everyone. Tom has done an excellent job explaining the ZFS encryption crypto concept in his talk and it is visualized very nicely in his slides (PDF; or on Google Drive: original, mirror).

If you don’t have the time to watch the entire talk, let me try to summarize the concepts from one of his slides:

Normal / non-dedup case: Before the plaintext block data (or metadata) is written, it is encrypted using AES (in CCM or GCM mode, depending on the -o encryption=.. property) with a 128/192/256-bit encryption key (default is AES-CCM-256). The 96-bit initialization vector (IV) used for CCM/GCM is randomly generated using the standard Linux PRNG, and it is never reused. The encryption key itself is derived from the encrypted master key (see below) using the key derivation function HKDF. The 64-bit salt used for HKDF is randomly generated (using the above-mentioned PRNG) and stored with the encryption key in a volatile salt cache. The encryption key is reused (for performance reasons) until it goes stale.

Dedup case: If deduplication is enabled, the algorithm behaves slightly differently, because it has to produce the same ciphertext for the same plaintext (given the same master key). To achieve that, the salt and the IV are not randomly generated, but instead a 160-bit HMAC of the plaintext is used: the first 64 bits are used as the salt, the remaining 96 bits are used as the IV. The 256-bit HMAC key is randomly generated (using the above-mentioned PRNG), and stored alongside the master key.

Master key: The master key is randomly generated (using the above-mentioned PRNG) and it is never exposed to the user directly. Instead, the master key is encrypted (with the same cipher and mode with a 256-bit key) using a user-provided wrapping key. This wrapping key is provided via a file (as hex or raw, see -o keysource=.. property) or via a password prompt. If a passphrase is supplied by the user, the wrapping key is derived using the password-based key derivation function PBKDF2 (using 100k iterations by default, or whatever you specify in the property -o pbkdf2iters=..).

If you want to know more, I highly suggest watching Tom Caputi’s ZFS encryption talk, or reviewing the slides (PDF; or on Google Drive: original, mirror).

2. Using ZFS encryption

Using the encryption feature is pretty simple. All the relevant commands and properties are described in great detail in the ZFS man page (man zfs), but here’s an excerpt of what you need to know.

2.1. Enabling the feature on the pool

Assuming you have installed a version of ZFS with encryption support (if not, follow the steps in the “Compile and install” section, at your own risk), you need to turn the feature on for your pool. I’ll create a test pool called testpool for this post:
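Here is a minimal sketch of that step, assuming a throw-away pool backed by a sparse file rather than real disks. The image path and size are placeholders of mine; the feature flag is the one named in the introduction. Everything is wrapped in a function so that nothing touches your system until you call it as root.

```shell
# Sketch only: create a file-backed test pool and enable the encryption feature.
# The image path and size are hypothetical; adjust them for your test machine.
create_testpool() {
    pool="${1:-testpool}"
    image="${2:-/var/tmp/${pool}.img}"

    truncate -s 512M "$image" &&                      # sparse backing file
    zpool create "$pool" "$image" &&                  # file vdevs need an absolute path
    zpool set feature@encryption=enabled "$pool" &&   # turn the feature on
    zpool get feature@encryption "$pool"              # show the feature state
}

# Usage (as root, on a test machine or VM):
#   create_testpool testpool /var/tmp/testpool.img
```

A file-backed pool is only meant for experiments like this one; for anything real you would pass actual devices to zpool create.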

2.2. Creating an encrypted dataset

Once you’ve enabled the feature on the pool, you can create encrypted datasets and volumes. To do that, you need to pass the two properties -o encryption=.. -o keysource=.. to the zfs create command. Depending on your preferences, you may also pass -o pbkdf2iters=..:

The -o encryption=.. property controls the ciphersuite (cipher, key length and mode). The default is aes-256-ccm, which is used if you specify -o encryption=on.

The -o keysource=.. property controls what format the encryption key will be provided as and where it should be loaded from. The key can be formatted as raw bytes, as hex representation or as a user password. It can be read from a file, or provided via a user prompt that appears when you first create the dataset, when you mount it (zfs mount), or when you load the key manually (zfs key -l). Unless you want to automate things, -o keysource=passphrase,prompt seems like a good option.
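For the file-based variants you need a key file first. The sketch below generates one hex-encoded and one raw key from /dev/urandom. The file names are placeholders, and I am assuming that the key length has to match the chosen cipher (16 bytes for a 128-bit key, 32 bytes for a 256-bit key), so check man zfs for what your build expects.

```shell
# Sketch only: generate key files for keysource=hex,... and keysource=raw,...
# File names are hypothetical placeholders.
umask 077                                   # key files should not be world-readable
KEYDIR="$(mktemp -d)"

# 128-bit key, hex-encoded (32 hex characters, no trailing newline)
head -c 16 /dev/urandom | od -An -v -tx1 | tr -d ' \n' > "$KEYDIR/enc2.hexkey"

# 256-bit key, raw bytes
head -c 32 /dev/urandom > "$KEYDIR/enc3.rawkey"

ls -l "$KEYDIR"
```

For real use, move the files somewhere permanent and back them up: if a key file is lost, so is the data encrypted with it.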

The -o pbkdf2iters=.. property is only used if a passphrase is used (-o keysource=passphrase,..). It controls the iterations of PBKDF2. Higher is better as it slows down potential dictionary attacks on the password. The default is -o pbkdf2iters=100000.

Here are a few examples of how to create encrypted datasets and volumes (ZVOLs):

Creating an encrypted dataset, using the defaults:

Creating an encrypted child dataset, which inherits all parameters and keys from its parent:

Creating an encrypted dataset, using AES/GCM with 128-bit key loaded from a file (encoded as hex):

Creating an encrypted dataset, using AES/GCM with a 256-bit key loaded from a file (not encoded):

Creating an encrypted ZFS volume (ZVOL), using the defaults with one million PBKDF2 rounds:
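The five cases above could look roughly like this. It is a sketch against the preview syntax described in this post: the dataset names, key file paths and volume size are placeholders, and the file:// form for key files is my assumption about how a file location is spelled in the keysource property, so verify it against man zfs. As one of the comments below points out, later builds replace keysource with separate keyformat and keylocation properties.

```shell
# Sketch only: the five examples above, using the preview-era keysource syntax.
# Dataset names, key paths and the volume size are hypothetical placeholders.
create_encrypted_examples() {
    pool="${1:-testpool}"
    keydir="${2:-/root/zfskeys}"   # absolute path, used to build the file:// location

    # 1. Defaults (aes-256-ccm), passphrase asked for on the terminal
    zfs create -o encryption=on -o keysource=passphrase,prompt "$pool/enc1"

    # 2. Child dataset: no options, inherits encryption and key from enc1
    zfs create "$pool/enc1/child"

    # 3. AES-GCM, 128-bit key read from a hex-encoded file
    zfs create -o encryption=aes-128-gcm \
               -o keysource="hex,file://$keydir/enc2.hexkey" "$pool/enc2"

    # 4. AES-GCM, 256-bit key read from a raw (binary) file
    zfs create -o encryption=aes-256-gcm \
               -o keysource="raw,file://$keydir/enc3.rawkey" "$pool/enc3"

    # 5. 1 GiB ZVOL with defaults and one million PBKDF2 iterations
    zfs create -V 1G -o encryption=on -o keysource=passphrase,prompt \
               -o pbkdf2iters=1000000 "$pool/vol1"
}

# Usage (as root): create_encrypted_examples testpool /root/zfskeys
```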

2.3. Reading the encryption properties

Once you’ve created a dataset or volume, you can query its encryption properties like you normally would:
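A small sketch of such a query, using the property names listed in the introduction (the helper names are mine). The second helper prints only the keystatus value, which is handy in scripts.

```shell
# Sketch only: print the encryption-related properties of a dataset or volume.
show_encryption_props() {
    zfs get encryption,keysource,keystatus,pbkdf2iters "$1"
}

# Print just the key status value, without headers, e.g. for use in scripts.
keystatus_of() {
    zfs get -H -o value keystatus "$1"
}

# Usage:
#   show_encryption_props testpool/enc1
#   keystatus_of testpool/enc1
```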

2.4. Importing a pool, mounting datasets and loading keys

If a dataset is encrypted, the read-only property keystatus represents the status of the key, and thereby also whether the dataset can be used (mounted, written to, read from …). It can be either off (unencrypted dataset), available (the key is loaded) or unavailable (the key is not loaded).

When a pool is imported using zpool import, encrypted datasets are left unmounted, because their keys are not automatically loaded. Only if the -l option is passed will the keys of encrypted datasets be loaded (where possible):

Instead of using zpool import -l ..., you can manually load the keys for individual datasets and volumes using zfs key -l:

If zfs mount is called on an encrypted dataset whose key is unavailable, it will prompt you for it:

Unloading a key of a mounted dataset won’t work, because it’s still in use. The dataset has to be unmounted first:
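The whole life cycle from this section, sketched as three small helpers (the helper names are mine). zpool import -l and zfs key -l are the commands named above; zfs key -u as the unload counterpart is my assumption for this preview build, so double-check it in man zfs.

```shell
# Sketch only: key handling around import, mount and unmount (preview syntax).
# "zfs key -u" as the unload counterpart of "zfs key -l" is an assumption.

import_with_keys() {          # import the pool and load keys in one go
    zpool import -l "$1"
}

load_and_mount() {            # load the key explicitly, then mount
    zfs key -l "$1" && zfs mount "$1"
}

unmount_and_unload() {        # a key cannot be unloaded while the dataset is mounted
    zfs unmount "$1" && zfs key -u "$1"
}

# Usage (as root):
#   import_with_keys testpool
#   load_and_mount testpool/enc1
#   unmount_and_unload testpool/enc1
```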

That’s essentially all the magic. If you want to know more, I suggest reading the ZFS man page (man zfs).

3. Compile and install

If you want to try the current implementation (before it is released), here are a few steps to compile and install it yourself on an Ubuntu 16.04-based system. Other systems will be very similar, but not identical. You can consult the Building ZFS wiki page for details.

Warning: Please be sure to only perform these steps on a test machine or a throw-away VM, because this will replace the ZFS kernel modules.

First, install all the build dependencies:
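As a rough sketch for Ubuntu 16.04: the package list below is my assumption of what a typical SPL/ZFS build needs and may be incomplete for the code you check out. If something is missing, ./configure will complain about it.

```shell
# Sketch only: build dependencies for SPL/ZFS on Ubuntu 16.04 (run as root).
# The package list is an assumption and may be incomplete for your checkout.
install_build_deps() {
    apt-get update &&
    apt-get install -y \
        build-essential autoconf libtool gawk alien fakeroot git \
        "linux-headers-$(uname -r)" \
        zlib1g-dev uuid-dev libattr1-dev libblkid-dev libselinux-dev libudev-dev
}

# Usage (as root): install_build_deps
```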

Once that’s done, compile and install SPL using the steps below. All relevant ZFS encryption pulls have been merged (as of January 2017), so this should “just work”. If it doesn’t (e.g. because the code has changed; you may be reading this in the future …), you may want to use my forked version instead (see SPL and ZFS, use branch “blogpost” for both):

Next, compile and install ZFS using Tom’s fork. If the ZFS encryption pull request has been merged, you may just want to use the upstream master branch:
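Both builds follow the same autotools routine, so here is one hedged helper for SPL and ZFS alike. The repository URLs and branch names are deliberately left as arguments, because which ones are right depends on when you read this. Keeping the two checkouts side by side as spl and zfs is my assumption; it should help the ZFS configure script find the SPL sources.

```shell
# Sketch only: clone a repository, then build and install it.
# Arguments: <git-url> <branch> <target-dir>; none are filled in here because
# the right repositories and branches depend on when you read this.
build_and_install() {
    url="$1"; branch="$2"; dir="$3"

    git clone --branch "$branch" "$url" "$dir" &&
    (
        cd "$dir" &&
        ./autogen.sh &&
        ./configure &&
        make -s -j"$(nproc)" &&      # the build step; do not skip it
        make install                 # needs root
    )
}

# Usage: build SPL first, then ZFS.
#   build_and_install <spl-repo-url> <branch> /usr/local/src/spl
#   build_and_install <zfs-repo-url> <branch> /usr/local/src/zfs
```

Note the explicit make before make install; one of the comments below reports tripping over exactly that step.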

Now the ZFS modules should be built in /lib/modules/$(uname -r)/extra, so all you have to do is load them. Be sure to remove the old modules first:
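A cautious sketch of the module swap, assuming all pools are exported and that modprobe picks up the freshly installed modules. On some distributions the stock module may win the lookup instead, which is why the last command prints the loaded version so you can check.

```shell
# Sketch only: swap the running ZFS modules for the freshly built ones (as root).
# Export all pools first; module removal fails while a pool is imported.
reload_zfs_modules() {
    modprobe -r zfs       # unloads zfs and any now-unused modules it depended on
    depmod -a &&          # re-index modules, including those under .../extra
    modprobe zfs &&
    cat /sys/module/zfs/version   # verify: should be the preview build's version
}

# Usage (as root): reload_zfs_modules
```

If the printed version is still the distribution's, the old module was loaded again and you may have to load the files from the extra directory explicitly.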

If that succeeded, be sure to yell “hooray” before you move on, because these steps took me a while to get right when I did it for the first time.

You can now use the feature (as described above).

4. FAQ

4.1. Is deduplication supported?

Yes, it is supported. However, there is some information that is leaked due to the nature of deduplication. See more in Tom’s talk.

4.2. Can I change the password? Will data be re-encrypted?

Yes, the password can be changed. The data does not get re-encrypted, because the password is merely used to decrypt a master key.

4.3. Does it work with TPM?

No, not yet.

12 Comments

  1. pp

    1. Thanks for the guide – however you forgot a ‘make’ in the installation instructions for ZFS, right before ‘make install’
    2. Have you already tried out booting from an encryption-enabled pool with grub? Not sure if I’m doing anything wrong, but grub refuses to detect ZFS as my root filesystem :(


  2. Eddy

    Woaaa, thanks for this write-up.

    Can you tell something about PAM integration? It would be so nice to automatically decrypt/encrypt /home/username at login/logout.



  3. Eric

    Great post. Will the encryption features be available on existing pools after the next zfs release? Being able to do a zpool upgrade and then turn this feature on for an existing dataset or zvol would obviously be great.



  4. Svetlin Tonchev

    Hello,

    Great post! Thank you a lot. I am really curious to have my fingers on that, however i completed the compilation guide and the installation without problems (errors etc). All modules were loaded successfully but when i try to create new volume i get “invalid property ‘keysource'” error. Its like i am running zfs without encryption support. Any ideas?


  5. Felix

    Hi Philipp,

    Great post, do you know how far the upstream process is? – I find it hard to figure out myself

    thx


  6. Philipp C. Heckel

    Tom tells me that it is approved and scheduled to be merged in after the next release of ZFS, so it’ll be released with the release after next.


  7. Felix

    Svetlin: try this

    zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase testpool/enc1

    (you must have compiled Tom’s latest version and not Phillip’s)


  8. Injo

    Above you mentioned encryption should be merged in the next release or the one after, which I believe we are at. I’m running Arch Linux with the latest kernel and ZFS package, zfs-linux 0.7.0_4.12.3_1-2 but I don’t think encryption is included yet?

    Do you have any idea when we might see this implemented?

    I also wanted to ask your opinion on something if you don’t mind. I am looking into syncing backups with a friend over the internet with zfs send/receive. We don’t want to be able to see each others’ data. In your opinion, what would be the best way with zfs send/receive? I was thinking dm-crypt or native with zfs encryption? I’m leaning towards native because it would be the least hassle. Or maybe you have a better idea? Thanks!!


  9. Philipp C. Heckel

    A little birdy told me that it will be merged within the next few days. Officially it’ll be released for 0.7.1 but once it’s merged I’d say you can start using it.

    For your use case, I’d say definitely go with native zfs encryption. Everything else is a hassle. dmcrypt does work, but you’ll have to nest filesystems and loop devices, which is annoying…


  10. Felix

    I see 0.7.1 has been released, but no encryption yet? – any ETA. for final release

    Thanks

