Encrypted offsite backup system: syncing
Or how to encrypt a rsync backup
April 26, 2024
In the previous post I decided to go with a Hetzner Storage Box for my backups.
It supports a number of file transfer protocols as well as first-class support for backup protocols like BorgBackup and Restic, and of course, the venerated rsync.
I ended up settling on rsync because it's a lower-level option than BorgBackup and Restic, which gives me a ton of freedom to design my backup system the way I want.
rsync is also incredibly simple to use and understand, and at the end of the day it just syncs files from one place to another. There's nothing specific to rsync in the layout of my backups, so I don't actually need rsync for the backups to be usable. That's a massive advantage.
It comes at the cost of having to take care of everything else myself, in particular encryption, as well as incremental backups (which I chose not to implement, although it's possible).
I also tried BorgBackup, Restic, Kopia, Duplicacy and Duplicity.
Having chosen Hetzner as a backend, Kopia, Duplicacy and Duplicity didn't have native support, so they were reduced to syncing over SFTP, which put them at a disadvantage for speed compared to the other options that had native support on Hetzner.
On top of that, here are a few notes on what turned me off each of those:
The actual syncing part is super easy. I'm just going with a basic:
rsync "$SOURCE" "$DESTINATION" --archive --delete
I'm also adding --no-specials and --no-devices if I'm backing up a directory that could contain some of those special files.
I add --exclude-from exclude-file to ignore a bunch of patterns that don't need to be backed up.
And finally, I'm customizing the output with --itemize-changes and --info=progress2.
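To make the combination of options concrete, here is a minimal sketch of a wrapper that assembles them. The function name backup_sync and the RSYNC override variable are hypothetical, made up for this illustration (the override only exists so the command can be swapped out for a dry run); the flags themselves are the ones listed above.

```shell
# Hypothetical helper, not part of rsync: runs rsync with the options
# discussed above. RSYNC is a made-up override variable for this sketch.
backup_sync() {
  source_dir=$1         # directory to back up
  destination=$2        # local path or remote target
  exclude_file=${3:-}   # optional file with one exclude pattern per line

  # --no-specials and --no-devices must come after --archive,
  # because --archive turns both of them on.
  set -- --archive --delete --no-specials --no-devices \
    --itemize-changes --info=progress2

  if [ -n "$exclude_file" ]; then
    set -- "$@" --exclude-from "$exclude_file"
  fi

  "${RSYNC:-rsync}" "$@" "$source_dir" "$destination"
}
```

Setting RSYNC=echo before calling backup_sync "$SOURCE" "$DESTINATION" exclude-file prints the arguments instead of transferring anything, which is a cheap way to check the command line first.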
That's where things get spicy, because rsync doesn't do encryption itself.
I found a blog post about encrypted offsite backups with rsync which is exactly what I was trying to do. It uses EncFS as the encryption layer.
I ended up using gocryptfs on my side, mainly because it's still actively maintained.
gocryptfs allows you to have an encrypted directory on disk, and mount the decrypted version to use it. But it also has a "reverse" mode, where you can mount a directory as its encrypted representation. That's what I need. (I just want the encryption for syncing to my remote storage; the data is already encrypted on disk at a lower level otherwise.)
With gocryptfs, that looks like:
gocryptfs -reverse -init /path/to/directory
gocryptfs -reverse /path/to/directory /path/to/mount
From there, I can apply my rsync command to sync the encrypted /path/to/mount with my Hetzner server!
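As a rough sketch of the whole lifecycle (mount the encrypted view, run the sync, always unmount afterwards), something like the following could work. The function name with_encrypted_view is hypothetical, and the unmount step assumes fusermount -u on Linux and plain umount elsewhere (e.g. macOS). It also assumes the directory was already initialized with -init.

```shell
# Hypothetical helper: mount the encrypted (reverse) view of a directory,
# run the given command, then unmount whatever the command's outcome was.
with_encrypted_view() {
  plain_dir=$1     # the plaintext directory
  mount_point=$2   # where the encrypted view gets mounted
  shift 2          # the remaining arguments are the command to run

  gocryptfs -reverse "$plain_dir" "$mount_point" || return 1

  "$@"
  status=$?

  # Assumption: fusermount is present on Linux; fall back to umount otherwise.
  if command -v fusermount >/dev/null 2>&1; then
    fusermount -u "$mount_point"
  else
    umount "$mount_point"
  fi

  return "$status"
}
```

Usage would look like with_encrypted_view /path/to/directory /path/to/mount rsync --archive --delete /path/to/mount/ "$DESTINATION". The function returns the exit status of the command it ran, so a failed sync is still visible to the caller.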
Not that complicated after all.
Well... except if you're running macOS. This rabbit hole is deep enough that it deserves its own blog post.
Now that we're syncing an encrypted directory, the output of rsync only shows the encrypted paths. That's OK, but I don't like it. I wish I saw the actual files it was transferring, so that if one of them takes a long time, I can instantly tell whether it's a file that should be included in the backup at all. Maybe I'd just add it to my ignore list.
Luckily, gocryptfs provides an API to translate encrypted paths to their plaintext version!
This comes through a separate util, gocryptfs-xray, that's not included in the Homebrew version, so we need to compile gocryptfs from source:
git clone https://github.com/rfjakob/gocryptfs
cd gocryptfs
# Check out the version you actually want, or YOLO and build from `main`
# git checkout v2.4.0
./build-without-openssl.bash
Then make sure to add the gocryptfs and gocryptfs-xray binaries somewhere that's in your PATH (or just run them from there if you prefer).
gocryptfs-xray needs access to the gocryptfs ctlsock, a socket to communicate with the gocryptfs process. You get one by adding -ctlsock /path/to/ctlsock to your gocryptfs invocation.
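For illustration, translating paths through the control socket can be sketched like this. The function name decrypt_paths and the XRAY override variable are hypothetical. The sketch relies on gocryptfs-xray's -decrypt-paths mode, which reads paths from stdin, and assumes the paths are relative to the root of the mount.

```shell
# Hypothetical helper: reads encrypted paths (one per line) on stdin and
# prints their plaintext versions, asking the running gocryptfs process
# through its control socket. XRAY is a made-up override variable.
decrypt_paths() {
  ctlsock=$1   # the socket passed to gocryptfs via -ctlsock
  "${XRAY:-gocryptfs-xray}" -decrypt-paths "$ctlsock"
}
```

It would be used as printf '%s\n' "$encrypted_path" | decrypt_paths /path/to/ctlsock while the reverse mount is active.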
Then, we can parse the rsync output and translate any encrypted path into its decrypted version. I made a script for that: gocryptfs-rsync-pretty.
Just pipe the rsync output to it:
rsync ... 2>&1 | gocryptfs-rsync-pretty /path/to/ctlsock /path/to/mount
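To give an idea of what such a filter involves, here is a simplified sketch. It is not the actual gocryptfs-rsync-pretty script. It assumes rsync was run with --itemize-changes, so that each transferred item shows up as an 11-character change string, a space, then the path. It also assumes the paths are relative to the root of the mount. The function name prettify_rsync_output and the XRAY override variable are hypothetical.

```shell
# Hypothetical, simplified filter: rewrites itemized rsync lines so they
# show plaintext paths, and passes every other line through untouched.
prettify_rsync_output() {
  ctlsock=$1
  while IFS= read -r line; do
    case $line in
      # Itemized line: update type, file type, 9 attribute flags, a space.
      [.ch\<\>][fdLDS]?????????' '*)
        path=${line#????????????}   # drop the 11 flag characters + space
        flags=${line%" $path"}      # keep only the flag characters
        plain=$(printf '%s\n' "$path" |
          "${XRAY:-gocryptfs-xray}" -decrypt-paths "$ctlsock")
        # Fall back to the encrypted path if the lookup returned nothing.
        printf '%s %s\n' "$flags" "${plain:-$path}"
        ;;
      *)
        printf '%s\n' "$line"
        ;;
    esac
  done
}
```

This spawns one gocryptfs-xray process per transferred file, which is fine for a sketch but slow for large syncs. It also ignores deletion lines and the carriage-return progress updates from --info=progress2, which a real script would need to handle.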
We now have a functional encrypted offsite backup system! It's a combination of:
In this repo you can find the code I use to combine those 3 elements.
It's not much more than:
gocryptfs -reverse -ctlsock /path/to/ctlsock /path/to/directory /path/to/mount
rsync "$@" /path/to/mount "$DESTINATION" 2>&1 \
| gocryptfs-rsync-pretty /path/to/ctlsock /path/to/mount
In my solution above, the backups are not incremental. I'm just syncing the current state to the remote host, but I keep no history of the previous "snapshots". This could be an issue: for example, if I end up running a backup after my system gets compromised or after I lose some data, then my backup is useless.
This is fine with me because I also do incremental backups that just don't happen to be offsite. I guess I'm not hedging against my house burning down or my computers and drives getting stolen, while at the same time having experienced some kind of data loss that I've accidentally propagated to my offsite server.
Anyway, in order to add incremental backups to the equation, we could use Linux Time Machine (which also works very well on Mac despite the name).
It works very much like macOS Time Machine, pretty much down to the underlying way the incremental backups are implemented on the filesystem: each "snapshot" gets its own directory, but files that didn't change since the previous snapshot are just hardlinked to avoid duplication! So essentially, only the files that changed get stored, but you still have a full picture of the snapshot because the other files are hardlinked in the right place!
This is genius, and it turns out this is provided by rsync through the --link-dest option. Linux Time Machine adds a nice, easy-to-use frontend to it, which is very much appreciated.
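For reference, the bare --link-dest mechanism can be sketched without any frontend. This is not how Linux Time Machine is implemented, just a minimal illustration of the rsync option. The function name snapshot_sync, the snapshot naming, and the RSYNC override variable are all hypothetical.

```shell
# Hypothetical helper: creates a new snapshot directory, hardlinking files
# that are unchanged since the previous snapshot instead of copying them.
snapshot_sync() {
  source_dir=$1      # what to back up, e.g. /path/to/mount/
  snapshot_root=$2   # directory holding one subdirectory per snapshot
  name=$3            # name of the new snapshot, e.g. a date
  previous=${4:-}    # name of the previous snapshot, if there is one

  set -- --archive
  if [ -n "$previous" ]; then
    # A relative --link-dest is resolved from the destination directory,
    # so ../NAME points at a sibling snapshot.
    set -- "$@" "--link-dest=../$previous"
  fi

  "${RSYNC:-rsync}" "$@" "$source_dir" "$snapshot_root/$name/"
}
```

For example, snapshot_sync /path/to/mount/ "$DESTINATION" 2024-04-26 2024-04-25 would create a 2024-04-26 snapshot in which unchanged files are hardlinks to their 2024-04-25 counterparts.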
Building off our work from above, we can simply replace the rsync command with timemachine:
gocryptfs -reverse -ctlsock /path/to/ctlsock /path/to/directory /path/to/mount
timemachine "$@" /path/to/mount "$DESTINATION" 2>&1 \
| gocryptfs-rsync-pretty /path/to/ctlsock /path/to/mount
This is possible because hard links are supported by Hetzner, and thanks to native rsync support, they can be preserved along the way!
Note: I haven't tested gocryptfs-rsync-pretty with the output of timemachine, but because timemachine wraps rsync, it should work out of the box, or require only basic tuning of the underlying rsync output. Let me know if you try it!
Despite only writing this today, I've been using this system for two years already! (Time flies omg.)
The commits I've added over time were mostly to refine the rsync output parsing, so it looks like the core of the script was pretty solid from the get-go.
That setup survived at least two macOS upgrades, and I've been using it on my Linux machines as well.
So feel free to use gocryptfs-rsync for your own backups, or use it as inspiration to build your own backup system! Cheers.