Secure external backup with ZFS native encryption

In this article, I will demonstrate how I streamline the process of ZFS backups. This setup allows me to simply connect my external drive, run backupToExternalDrive, and then disconnect the drive. This walkthrough covers everything from target pool creation to the restoration of data.

This guide primarily focuses on utilizing NixOS; however, its principles can be adapted to other Linux distributions. In such cases, setting up services will require manual intervention, involving the creation of files and direct use of systemctl to enable them.

0. Prerequisites

Before delving into the guide, an encrypted rpool/nixos dataset and a sanoid service should be set up.

When installing NixOS on ZFS, you can choose whether or not to encrypt the root pool (step "Create root system container" of the official OpenZFS guide). I personally recommend setting up an encrypted container (namely rpool/nixos), which makes it harder for a malicious party with physical access to your workstation to read your data. When an encrypted dataset is sent with zfs send --raw, it arrives on the target device still encrypted, so the backup is protected as well.


Caution: at the moment, there are known problems with sending/receiving raw encrypted datasets: occasionally, data gets corrupted; therefore, do not use this strategy for production, but only on your local workstation at your own discretion.

I recommend periodically verifying the integrity of data to mitigate corruption risks. Additionally, establishing a secondary backup strategy is advisable. In my case it is restic. An alternative strategy might be ZFS on LUKS on another drive, and then using zfs send without the --raw flag.


0.2. Configuring Sanoid

Set up automatic Sanoid snapshots. Make sure your config applies only to the right devices (i.e. the workstations to back up, and not the receiving server). For example, my /hosts/the-world/configuration.nix looks as follows:

services.sanoid = {
  enable = true;
  extraArgs = [
    "--verbose"
    # "--readonly" # enable only for testing the configuration
  ];
  interval = "hourly";
  datasets."rpool/nixos/home" = {
    recursive = false; # does not matter much in my case, as there are no dataset children
    autosnap = true;
    autoprune = true;
    daily = 7;
    weekly = 4;
    monthly = 2;
    yearly = 0;
  };
};

Do not enable the "--readonly" extra argument (commented out above): it only simulates snapshot creation and deletion, without actually creating or deleting anything. This argument has nothing to do with the source or destination dataset being mounted read-only (which is something we are going to enable later in this guide)! "--simulate" or "--dry-run" would arguably have been better names.

If multiple devices share the same backup config, it is enough to move it to /modules/workstation/default.nix.

Because the target is an external drive, we are not going to set up a syncoid service; instead, we will use the syncoid executable directly. Declaratively defining a sanoid service (such as the config above) does not automatically add the sanoid and syncoid binaries to the PATH, so let us globally install the package providing the syncoid binary:

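# environment.systemPackages expects a list, so convert the attribute set with attrValues
environment.systemPackages = builtins.attrValues {
    inherit (pkgs)
      # other packages ...
      sanoid
    ;
};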

Now, let us set up the external drive we want to use as backup device.

0.3. Excluding files and folders

Before taking the first backup, you may want to make sure that large folders and files (such as movies, downloads, programming projects, databases, …) are excluded. Since it is not possible to exclude a folder within a dataset, there are a couple of options:

  • Turning these folders into ZFS datasets
    • e.g. rpool/nixos/home/videos mounted at /home/manuel/Videos
    • This can get complex pretty quickly.
  • Creating a folder /unsynced (or a dataset mounted at /unsynced in case rpool/nixos is backed up too and not only rpool/nixos/home)
    • Create folders such as /unsynced/manuel/Workspace, /unsynced/manuel/Videos and /unsynced/alice/Videos
    • Point environment variables to locations under /unsynced, so that huge folders such as ~/.cache or ~/.gradle are excluded from the backup. This can be achieved as follows:
      environment.sessionVariables = let unsyncedPath = "/unsynced/$USER"; in {
        UNSYNCED = unsyncedPath;
        XDG_DOWNLOAD_DIR = "${unsyncedPath}/Downloads";
        XDG_CACHE_HOME = "${unsyncedPath}/.cache";
        GRADLE_USER_HOME = "${unsyncedPath}/.gradle";
        PUB_CACHE = "${unsyncedPath}/.pub-cache";
        ANALYZER_STATE_LOCATION_OVERRIDE = "${unsyncedPath}/.dartServer";
      };
      
    • tmpfiles.d can be used to create these folders (one per user, matching the paths above):
      systemd.tmpfiles.rules = [
        "d /unsynced 0755 root root -"
        "d /unsynced/manuel 0700 manuel users -"
        "d /unsynced/alice 0700 alice users -"
      ];
      
    • The solution I employ.
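
To decide which folders are worth moving out of the backed-up dataset, it helps to see where the space actually goes. The following is a minimal sketch, not part of my own setup: largest_dirs is a hypothetical helper name, and it assumes GNU du, sort and head are available.

```shell
# List the largest immediate subdirectories of a directory (sizes in KiB).
# Hypothetical helper; assumes GNU coreutils.
largest_dirs () {
  local dir="${1:-$HOME}"
  local count="${2:-10}"
  # -x stays on one filesystem, -k reports KiB,
  # --max-depth=1 limits the report to the directory and its direct children
  du -x -k --max-depth=1 -- "${dir}" 2>/dev/null \
    | sort -rn \
    | head -n "${count}"
}

# Example: largest_dirs /home/manuel 5
```

The first line of the output is the total for the directory itself; the following lines are the candidates for /unsynced.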

1. Becoming root

If doas is enabled:

doas -s

Else with sudo:

sudo -s

2. Setting up environment

nix shell nixpkgs#parted

List all disks

find /dev/disk/by-id/

Declare the one that represents the external drive

DISK='/dev/disk/by-id/EXT_DISK_ID'
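
Since the next step wipes the disk, it may be worth double-checking that DISK really points to the external drive. The sketch below is an optional safety net under my own assumptions: check_disk is a hypothetical helper name, and lsblk (from util-linux) is assumed to be installed.

```shell
# Refuse to continue unless the given path is an existing block device,
# then print details that help to recognize the drive.
# Hypothetical helper; lsblk is part of util-linux.
check_disk () {
  local disk="${1}"
  if [ ! -b "${disk}" ]; then
    echo "error: ${disk} is not a block device" >&2
    return 1
  fi
  # Size, model and transport (e.g. usb) make it easy to spot the wrong disk
  lsblk -o NAME,SIZE,MODEL,TRAN,MOUNTPOINT "${disk}"
}

# Example: check_disk "${DISK}"
```

If the transport column does not show usb (or whatever your enclosure uses), stop and check the ID again.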

3. Formatting drive

The following function formats the disk and creates a single partition named backup. NB: Feel free to replace backup with bakpool or whatever name you prefer throughout the guide.

partition_disk () {
  local disk="${1}"
  blkdiscard -f "${disk}" || true

  parted --script --align=optimal  "${disk}" -- \
  mklabel gpt \
  mkpart backup 0% 100%

  partprobe "${disk}"
  udevadm settle
}

partition_disk "${DISK}"

NB: blkdiscard discards every block on the device, which makes sure previously created ZFS pools and datasets are erased.

If the external device never held a ZFS pool, you can ignore warnings such as BLKDISCARD ioctl failed: Operation not supported. If it did hold ZFS pools, the warning is shown (i.e. blkdiscard failed to remove those pools and datasets), and the following steps fail (namely zpool creation and dataset replication), you will have to investigate and find a fix (for example, try running wipefs -a /dev/disk/by-id/EXT_DISK_ID).

4. Zpool creation

Let us create a pool named backup:

# shellcheck disable=SC2046
zpool create \
    -o ashift=12 \
    -o autotrim=on \
    -O acltype=posixacl \
    -O compression=zstd \
    -O dnodesize=auto \
    -O normalization=formD \
    -O relatime=on \
    -O xattr=sa \
    -O readonly=on \
    -O canmount=on \
    -O mountpoint=/mnt/backup_pool \
    backup \
  "${DISK}-part1";

At this point, the backup zpool should be imported automatically (check with zfs list). If it is not, run: doas zpool import backup.
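
To confirm that the pool was created with the intended options, the relevant properties can be printed in one go. This is only a convenience sketch: show_pool_settings is a hypothetical helper name, and the pool name defaults to backup as in this guide.

```shell
# Print the pool-level and dataset-level properties set during creation.
# Hypothetical helper; -H drops headers, -o selects the columns to print.
show_pool_settings () {
  local pool="${1:-backup}"
  zpool get -H -o property,value ashift,autotrim "${pool}" || return 1
  zfs get -H -o property,value compression,readonly,mountpoint,canmount "${pool}"
}

# Example: show_pool_settings backup
```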

Before proceeding with the next step, we can exit the nix environment and root:

exit # nix shell

then again

exit # root

Now, your terminal session should be logged out of root and back in to your primary account.

5. Dataset replication

Do not create an encrypted system container like the following one:

# DO NOT RUN THIS! Syncoid will replicate the dataset instead!
doas zfs create \
    -o canmount=on \
    -o mountpoint=/mnt/backup_pool/eds \
    -o encryption=on \
    -o keylocation=prompt \
    -o keyformat=passphrase \
    backup/eds

NB: If you have already created the dataset manually by mistake, destroy it first (doas zfs destroy backup/eds) before running the syncoid command below.


The essence of Sanoid/Syncoid is replication. Therefore, make sure there is no backup/eds dataset and run:

doas syncoid --sendoptions="w" --compress=none --delete-target-snapshots --no-sync-snap --no-rollback rpool/nixos/home backup/eds

NB: In the official ZFS installation guide, rpool/nixos/home is a child of rpool/nixos, so when rpool/nixos is encrypted, the encryption is inherited by rpool/nixos/home. You can verify this manually with zfs get encryption rpool/nixos and zfs get encryption rpool/nixos/home.
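
The inheritance can also be checked for a whole subtree at once through the encryptionroot property. The following is a sketch under my own assumptions (same_encryption_root is a hypothetical helper name): it succeeds only if the dataset and all of its child filesystems report the same encryption root.

```shell
# Succeeds if the dataset and all child filesystems share one encryption root.
# Hypothetical helper; unencrypted datasets report "-" as encryptionroot.
same_encryption_root () {
  local dataset="${1}"
  local roots
  roots="$(zfs get -r -H -t filesystem -o value encryptionroot "${dataset}" | sort -u)" || return 1
  # Exactly one distinct, non-empty value that is not "-"
  [ -n "${roots}" ] \
    && [ "$(printf '%s\n' "${roots}" | wc -l)" -eq 1 ] \
    && [ "${roots}" != "-" ]
}

# Example: same_encryption_root rpool/nixos && echo "all children are encrypted"
```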

Let’s understand the arguments of the command above:

  • If our dataset is indeed encrypted, it makes most sense to run syncoid with --sendoptions="w", which passes -w (the short form of --raw) to zfs send: this ensures data is sent exactly as it exists on disk (i.e. encrypted and compressed), which allows sending backups to untrusted machines. Since compression is already handled at the dataset level, --compress=none should be used.
  • With --delete-target-snapshots, the snapshots that are pruned on the source dataset will be pruned on the target dataset as well.
  • --no-sync-snap prevents syncoid from automatically generating a snapshot (which, by default, will get replaced by another one the next time syncoid is run). By using --no-sync-snap, only sanoid snapshots are employed.
    For example, in the following list of snapshots, the one in the middle would not have been present if --no-sync-snap had been passed:
    rpool/nixos/home@autosnap_2023-11-02_15:00:03_hourly             Thu Nov  2 16:00 2023
    rpool/nixos/home@autosnap_2023-11-02_16:00:04_hourly             Thu Nov  2 17:00 2023
    rpool/nixos/home@syncoid_the-world_2023-11-02:17:31:28-GMT01:00  Thu Nov  2 17:31 2023
    rpool/nixos/home@autosnap_2023-11-02_17:00:04_hourly             Thu Nov  2 18:00 2023
    rpool/nixos/home@autosnap_2023-11-02_18:00:04_hourly             Thu Nov  2 19:00 2023
    
  • --no-rollback tells syncoid not to roll back the target dataset in order to make replication possible. Therefore, if the destination dataset has diverged from the source (i.e. data has been written to it), replication fails, which is the desired outcome here. When this option is enabled, you need to set the destination dataset to readonly=on, or at least atime=off. Otherwise, a simple ls or cat on the target dataset will update access times and thus modify the destination filesystem. The next subsection shows how to make the dataset read-only.

The process might take multiple minutes the first time. In the end, you will have a replicated dataset.

Let’s test if the replicated dataset is encrypted with zfs get encryption backup/eds.

  • Unencrypted example (run without --sendoptions="w")

    ❯ zfs get encryption backup/eds
    NAME        PROPERTY    VALUE        SOURCE
    backup/eds  encryption  off          default
    
  • Encrypted example (run with --sendoptions="w")

    ❯ zfs get encryption backup/eds
    NAME        PROPERTY    VALUE        SOURCE
    backup/eds  encryption  aes-256-gcm  -
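
The same check can be turned into a guard that fails loudly when the target turns out to be unencrypted. This is a minimal sketch; is_encrypted is a hypothetical helper name.

```shell
# Succeeds only if the dataset reports an encryption cipher other than "off".
# Hypothetical helper built on `zfs get -H -o value`, which prints the bare value.
is_encrypted () {
  local dataset="${1}"
  local value
  value="$(zfs get -H -o value encryption "${dataset}")" || return 1
  [ -n "${value}" ] && [ "${value}" != "off" ]
}

# Example: is_encrypted backup/eds || echo "backup/eds is NOT encrypted" >&2
```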
    

5.1. Read-only target dataset

Look at the output of

doas zfs get readonly backup/eds

If off is displayed, make the dataset read-only with:

doas zfs set readonly=on backup/eds

readonly=on ensures the dataset cannot be modified when mounted. With readonly=off it would be possible to change data of the target dataset when mounted, which would result in an error when trying to run syncoid.

5.2. Mounting attributes for target dataset

syncoid does not create the dataset with the best properties for our use case. Let’s add:

doas zfs set snapdir=visible backup/eds
doas zfs set mountpoint=/mnt/backup_pool/eds backup/eds
doas zfs set canmount=on backup/eds

Here is what snapdir=visible does: it makes /path/to/mounted/dataset/.zfs visible. By default (snapdir=hidden), the directory exists but is invisible (not even shell autocompletion can detect it).

6. Exporting pool

At this point the pool needs to be exported. Run:

doas zpool export backup

It is important that the pool gets exported correctly. If there is an issue (e.g. pool busy), make sure no process is using the pool and try again (you need to close any file manager window or terminal session whose working directory is under /mnt/backup_pool).
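
If it is not obvious which process keeps the pool busy, something like the following may help. It is a sketch under my own assumptions: list_pool_users is a hypothetical helper name, and fuser (from psmisc) is assumed to be installed. Run it as root to see processes of all users.

```shell
# For every mounted dataset of a pool, list the processes using its filesystem.
# Hypothetical helper; requires fuser from psmisc.
list_pool_users () {
  local pool="${1:-backup}"
  local mp mounted
  zfs list -H -r -o mountpoint,mounted "${pool}" \
    | while IFS=$'\t' read -r mp mounted; do
        [ "${mounted}" = "yes" ] || continue
        echo "== ${mp}"
        # -m: every process using that filesystem, -v: show user, PID and command
        fuser -vm "${mp}" 2>&1 || true
      done
}

# Example: list_pool_users backup
```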

The disk can be safely removed with:

udisksctl power-off -b /dev/disk/by-id/EXT_DISK_ID

7. Shell alias

My external drive is neither always connected to my workstation, nor part of a remote backup solution. Instead, it is a removable hard drive that I plug in and out. Therefore, it makes sense to simply use an alias to automate this process:

programs.fish = {
  shellAliases = {
    backupToExternalDrive = "doas zpool import backup && doas syncoid --sendoptions=\"w\" --compress=none --delete-target-snapshots --no-sync-snap --no-rollback rpool/nixos/home backup/eds && doas zpool export backup && udisksctl power-off -b /dev/disk/by-id/usb-Samsung_PSSD_T7_S6XNNS0TB03264T-0:0";
  };
};

Now, all I need to do is:

  • connect the external drive
  • run backupToExternalDrive in a terminal
  • when backupToExternalDrive has finished, the device will already be safely powered off. Thus, the external drive can be disconnected immediately.

If you want to achieve something similar, make sure you replace my disk ID with yours.
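
One limitation of chaining everything with && is that a failed syncoid run leaves the pool imported. As an alternative, here is a bash sketch of the same steps as a function that always attempts the export. It is not what I use: the function name and the PRIV variable are assumptions, and the pool and dataset names are the ones from this guide.

```shell
# Same steps as the alias, but the pool is exported even if replication fails.
# Hypothetical bash function; pass the disk ID as the first argument.
backup_to_external_drive () {
  local disk="${1:-}"
  local pool="backup" src="rpool/nixos/home" dst="backup/eds"
  local priv="${PRIV:-doas}"   # assumption: set PRIV=sudo if doas is not installed
  local status=0

  if [ -z "${disk}" ]; then
    echo "usage: backup_to_external_drive /dev/disk/by-id/EXT_DISK_ID" >&2
    return 2
  fi

  "${priv}" zpool import "${pool}" || return 1

  # Remember a replication failure instead of returning right away
  "${priv}" syncoid --sendoptions="w" --compress=none \
    --delete-target-snapshots --no-sync-snap --no-rollback \
    "${src}" "${dst}" || status=$?

  # Export even if replication failed, so the pool is never left imported
  "${priv}" zpool export "${pool}" || return 1

  # Power the drive off only after a fully successful run
  [ "${status}" -eq 0 ] || return "${status}"
  udisksctl power-off -b "${disk}"
}
```

When replication fails, the drive stays powered on and the function returns syncoid's exit status, so the failure does not go unnoticed.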

Theoretically, it should also be possible to create a systemd service that is triggered when the external device is connected.

8. Periodically ensure data integrity

Make sure the pool is imported:

doas zpool import backup

Regularly scrub the pool:

doas zpool scrub backup

The scrub runs in the background. Once it has finished, zpool status displays the amount of repaired data and the number of errors.

❯ zpool status -v backup
  pool: backup
 state: ONLINE
  scan: scrub repaired 0B in 00:04:42 with 0 errors on Sat Nov  4 11:07:07 2023
config:

	NAME                                             STATE     READ WRITE CKSUM
	backup                                           ONLINE       0     0     0
	  usb-Samsung_PSSD_T7_S6XNNS0TB03264T-0:0-part1  ONLINE       0     0     0

errors: No known data errors
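
For a removable drive it is convenient to wait for the scrub to finish before exporting the pool. The sketch below assumes OpenZFS 2.0 or newer (where zpool scrub accepts -w to wait for completion) and root privileges; scrub_and_check is a hypothetical helper name.

```shell
# Scrub a pool, wait until the scrub is done, then report whether it is healthy.
# Hypothetical helper; assumes `zpool scrub -w` is available (OpenZFS 2.0+).
scrub_and_check () {
  local pool="${1:-backup}"
  zpool scrub -w "${pool}" || return 1
  # `zpool status -x` prints "pool '<name>' is healthy" when there is nothing to report
  zpool status -x "${pool}" | grep -q "is healthy"
}

# Example (as root): scrub_and_check backup || zpool status -v backup
```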

Finally, export the pool when finished.

doas zpool export backup

9. Restoring

Snapshots are read-only copies of a filesystem taken at a moment in time. Their contents can be browsed and restored through the .zfs folder of the mounted dataset.

doas zpool import backup
doas zfs load-key backup/eds
doas zfs mount backup
doas zfs mount backup/eds

Then navigate to /mnt/backup_pool/eds/.zfs. In this folder you will find a folder called snapshot: choose here the right snapshot to browse and restore what is needed.
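
Copying a folder out of a snapshot can be sketched as follows. This is an illustration, not a tool I ship: restore_from_snapshot is a hypothetical helper name, the mountpoint defaults to the one used in this guide, and the snapshot name in the example is made up.

```shell
# Copy one path out of a snapshot into a destination directory.
# Hypothetical helper; arguments: snapshot name, path relative to the dataset
# root, destination directory, and (optionally) the dataset mountpoint.
restore_from_snapshot () {
  local snapshot="${1}" relative_path="${2}" destination="${3}"
  local mountpoint="${4:-/mnt/backup_pool/eds}"
  local src="${mountpoint}/.zfs/snapshot/${snapshot}/${relative_path}"
  if [ ! -e "${src}" ]; then
    echo "error: ${src} not found" >&2
    return 1
  fi
  mkdir -p -- "${destination}" || return 1
  # -a preserves permissions, timestamps and (when run as root) ownership
  cp -a -- "${src}" "${destination}/"
}

# Example: restore_from_snapshot autosnap_2023-11-02_16:00:04_hourly manuel/Documents /tmp/restored
```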

When you are finished, close all terminal sessions and file manager windows with the working directory being a child folder of /mnt/backup_pool, then run:

doas zfs unmount backup/eds
doas zfs unmount backup
doas zpool export backup

After successfully restoring, you will have to destroy the backup/eds dataset and create a new one with syncoid.

10. Other commands

It is good to know a few more basic commands.

10.1. Snapshot differences

ZFS snapshot differences can be listed using the zfs diff command. The dataset needs to be mounted. Example:

zfs diff rpool/nixos/home@autosnap_2023-11-01_12:00:08_hourly rpool/nixos/home@autosnap_2023-11-01_13:00:07_hourly
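
Each line of the zfs diff output starts with a change type: M (modified), + (created), - (removed) or R (renamed). For a quick overview of a large diff, the lines can be counted per type. A minimal sketch, with summarize_diff being a hypothetical helper name:

```shell
# Count the lines of `zfs diff` output per change type (M, +, -, R).
# Hypothetical helper; reads the diff from standard input.
summarize_diff () {
  awk '{ count[$1]++ } END { for (t in count) print t, count[t] }' \
    | LC_ALL=C sort
}

# Example: zfs diff pool/dataset@older pool/dataset@newer | summarize_diff
```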

10.2. List snapshots

zfs list -r -t snapshot -o name,creation rpool/nixos/home
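
Building on the listing above, the newest snapshot of the source can be compared with the newest snapshot of the backup, to see whether the external drive is up to date. This is a sketch under my own assumptions: both helper names are hypothetical, and the backup pool must be imported.

```shell
# Print the name (the part after "@") of the newest snapshot of a dataset.
latest_snapshot_name () {
  # -d 1 restricts the listing to the dataset's own snapshots;
  # -s creation sorts ascending, so the newest snapshot is the last line
  zfs list -H -d 1 -t snapshot -o name -s creation "${1}" | tail -n 1 | cut -d@ -f2
}

# Succeeds if source and target end with the same snapshot.
backup_is_current () {
  local src dst
  src="$(latest_snapshot_name "${1}")"
  dst="$(latest_snapshot_name "${2}")"
  [ -n "${src}" ] && [ "${src}" = "${dst}" ]
}

# Example: backup_is_current rpool/nixos/home backup/eds || echo "backup is behind"
```

With hourly snapshots on the source, the backup will report as behind soon after each run; the check is mostly useful right after a backup.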

10.3. Create and delete a snapshot manually

zfs snapshot rpool/nixos/home@snapshotName
zfs destroy rpool/nixos/home@snapshotName

10.4. Permissions delegation

doas zfs allow -u manuel create,destroy,mount,snapshot rpool/nixos/home