Terraforming NixOS hosts
I’ve made a provider to deploy NixOS hosts with Terraform.
Here is a list of features it supports at the moment:
- configuration deployment
- secrets deployment
- SSH bastions
- per-host overrides of provider, Nix, and SSH settings
- host address prioritization
I’ll update this article as new features become available.
Before we begin…
Requirements:
- NixOS
- Nix version >= 2.10.3 (lower may also work, but that’s what I use)
- QEMU (to run example deployment)
This article is based on terraform-provider-nixos >= 0.0.14.
Why
Why make a new Terraform provider? There are already:
- https://github.com/tweag/terraform-nixos (an HCL module)
- https://github.com/andrewchambers/terraform-provider-nix (a native provider)
Because:
- there was nothing I was comfortable with
- I love to make things
I’ve used the Tweag HCL module for some time in production. Some cons I noticed during use:
- bad support for SSH bastions
- poor secrets support
- it is an HCL module, with all the limitations of HCL
I haven’t used the provider from Andrew Chambers (I discovered it in the middle of making my own). From what I found:
- no support for SSH bastions
- no support for secrets
I want more control over the provider’s configuration: how it connects to the servers, which known-hosts file it uses (preferably on a per-host basis). Almost every time I deploy something into the cloud I need support for bastion SSH servers, but not many tools implement this in a controllable and hackable way.
Secret management schemes are also a real concern in NixOS because every user on a system can read /nix/store. I am not a big fan of solving this problem with encryption (I’ve done some crunching around this problem in the past). The providers mentioned above have no secrets support, or it is very poor.
Why not just use NixOps?
NixOps is OK. I am not a fan of Terraform (actually I hate Terraform and Hashicorp with a passion!), but it has so many integrations. So… it is worth having a way to deploy NixOS configurations with it.
Installation
There are two ways to install it:
- with Nix
- with Terraform
This article is about NixOS, so I will show how to install the provider with Nix first.
Installing with Nix is optional; you could skip ahead to initialization.
It is also good to know that Hashicorp has blocked access from some countries because of government sanctions; installing with Nix is a way to bypass such nonsense.
There is a release.nix in the repo, updated with every new version, which shows how to build Terraform with the NixOS provider.
Let’s write a shell.nix which will provide Terraform with a predefined set of providers:
let
  inherit (pkgs) stdenv;
  nixpkgs = <nixpkgs>;
  config = {};
  pkgs = import nixpkgs { inherit config; };
  mkProvider = pkgs.terraform_1.plugins.mkProvider;
  terraform = pkgs.terraform_1.withPlugins (p: [
    p.vultr
    p.linode
    (mkProvider rec {
      owner = "corpix";
      repo = "terraform-provider-nixos";
      rev = "0.0.14";
      version = rev;
      sha256 = "sha256-4QATev3WtwpEwc4/+JjOBfvUVzUre15VZT7tXLkSrXM=";
      vendorSha256 = null;
      provider-source-address = "registry.terraform.io/corpix/nixos";
    })
  ]);
in stdenv.mkDerivation {
  name = "nix-shell";
  buildInputs = with pkgs; [ terraform ];
}
By the way, other providers can be added to the list, just as I did with vultr and linode in the example. Available providers can be listed using the REPL:
λ nix repl '<nixpkgs>'
nix-repl> pkgs.terraform-providers.<TAB>
...
Then issue nix-shell in the directory where shell.nix is stored and boom! Terraform with preinstalled providers is available in the shell.
This works by using a directory as a Terraform registry, so providers are installed from source code, not downloaded from the Terraform registry.
Initialization
But the providers still need initialization. And here we are, moving on to the second item on the list: installing providers with Terraform itself.
At this step an HCL file is required. The name is not important, but let’s call it main.tf:
Versions are optional; the latest will be installed by default.
terraform {
  required_providers {
    nixos = {
      source = "corpix/nixos"
      # version = "0.0.14"
    }
    vultr = {
      source = "vultr/vultr"
    }
    linode = {
      source = "linode/linode"
    }
  }
}
Issue a terraform init in the shell:
I’ve cut the output to make it shorter.
λ terraform init

Initializing provider plugins...
- Finding latest version of corpix/nixos...
- Finding latest version of vultr/vultr...
- Finding latest version of linode/linode...
- Installing corpix/nixos v0.0.14...
- Installed corpix/nixos v0.0.14
- Installing vultr/vultr v2.11.3...
- Installed vultr/vultr v2.11.3
- Installing linode/linode v1.28.1...
- Installed linode/linode v1.28.1

Terraform has been successfully initialized!
It will create:
- .terraform.lock.hcl: a lock file with provider hashes
- .terraform: a directory where provider executables are stored
Updating providers that were installed using Nix will require deleting these artifacts and re-running terraform init.
Configuration deployment
We have the provider installed. It is time to write some configuration and deploy it.
Save this example configuration into configuration.nix, replacing my SSH public key with yours. It runs an SSH server with the root user password disabled (no login via TTY):
{ pkgs, lib, ... }: {
  imports = [
    <nixpkgs/nixos/modules/profiles/qemu-guest.nix>
  ];
  config = {
    users = rec {
      mutableUsers = false;
      extraUsers.root = {
        isNormalUser = false;
        hashedPassword = users.root.hashedPassword;
        openssh.authorizedKeys.keys = [
          "ecdsa-sha2-nistp521 AAAAE2VjZHNhLXNoYTItbmlzdHA1MjEAAAAIbmlzdHA1MjEAAACFBACa4D4ycVdMtyIt1WUeoG3S/cdCARlyffhn6LsogFLHURvKtoMVV4cgZBrexju4SjpO/nAlHio8y8T1U0nV5WKDJAAIH0PhPt79HWQOi6HB4d/7UUncMndktyVYar0Mneir/Ci2yQEVmq6vYKKPTuwVynCB2r6yG1IzD1rhFEAG5OUeSg=="
        ];
      };
      users.root.hashedPassword = "!";
    };

    services = {
      openssh.enable = true;
      openssh.passwordAuthentication = false;
      # haveged.enable = true;
    };

    i18n.defaultLocale = "en_US.UTF-8";
    time.timeZone = "UTC";

    # NOTE: just to build faster & use less space
    documentation.nixos.enable = false;
    documentation.man.man-db.enable = false;

    fileSystems."/" = {
      device = "/dev/disk/by-label/nixos";
      autoResize = true;
      fsType = "ext4";
    };

    # NOTE: these are QEMU-specific settings
    boot.growPartition = true;
    boot.kernelParams = ["console=ttyS0"];
    boot.loader.grub.device = "/dev/sda";
    boot.loader.timeout = 0;
  };
}
QEMU will be used to run a virtual machine; I assume the reader has it on their system. Before running the VM, a disk image should be built, and this is where the nixos-generate tool helps. Run a shell to get it and build an image:
λ nix-shell -p nixos-generators
λ nixos-generate -f qcow -c configuration.nix
...
/nix/store/f0wwhg3vh1h6n06913hd4h763w3nzz5m-nixos-disk-image/nixos.qcow2
It prints the path to the disk image before exiting. Copy the image to the working directory and chmod it, because it will be read-only after copying from the Nix store:
λ cp /nix/store/f0wwhg3vh1h6n06913hd4h763w3nzz5m-nixos-disk-image/nixos.qcow2 ./
λ chmod 644 nixos.qcow2
On BTRFS it may be a good idea to disable CoW using chattr +C nixos.qcow2.
Dispatch, we are ready to launch! Run QEMU using this command, which will forward the SSH port from the VM, making it available at 2222/tcp on the host machine:
λ qemu-kvm -boot d -m 2048 -net nic -net user,hostfwd=tcp::2222-:22 -hda ./nixos.qcow2
If everything went smoothly, the QEMU window will show the NixOS console booting to a login prompt.
To enter the VM shell, issue:
λ ssh -p 2222 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@127.0.0.1

[root@nixos:~]#
The next step is to describe an instance so Terraform knows how to connect to the host and which configuration to apply. Time to append some stuff to the main.tf which was created previously.
Worth mentioning: disabling userKnownHostsFile & strictHostKeyChecking is just for testing purposes; it is insecure to use these settings for real deployments.
The ssh block could be defined globally or per-instance.
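For example, defining the same SSH settings globally for the whole provider would look like this (the same keys, just inside the provider block):

provider "nixos" {
  ssh {
    port = 2222
    config = {
      userKnownHostsFile    = "/dev/null"
      strictHostKeyChecking = "no"
    }
  }
}

For this walkthrough, though, it is configured on the instance itself: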
resource "nixos_instance" "vm" { address = ["127.0.0.1"] configuration = "./configuration.nix" ssh { port = 2222 config = { userKnownHostsFile = "/dev/null" strictHostKeyChecking = "no" } } }
Ready to apply! Run Terraform, which will ask whether the planned changes are expected.
Other things may be asked as well. For example, with userKnownHostsFile & strictHostKeyChecking enabled, the SSH client would ask to approve the host key fingerprint interactively.
Terraform may be started with the environment variable TF_LOG set to INFO; this will produce a lot of noise with details about the Nix derivation build progress and upload (there is no clean way to print something on the screen from a Terraform provider).
λ terraform apply -target nixos_instance.vm

Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # nixos_instance.vm will be created
  + resource "nixos_instance" "vm" {
      + address            = [
          + "127.0.0.1",
        ]
      + configuration      = "./configuration.nix"
      + id                 = (known after apply)
      + secret_fingerprint = (known after apply)
      + settings           = jsonencode({})
      + system             = "x86_64-linux"

      + derivations {
          + outputs = (known after apply)
          + path    = (known after apply)
        }

      + ssh {
          + config = {
              + "strictHostKeyChecking" = "no"
              + "userKnownHostsFile"    = "/dev/null"
            }
          + port   = 2222
          + user   = "root"
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.
After typing yes and pressing Enter, it should say something like:
nixos_instance.vm: Creating...
nixos_instance.vm: Creation complete after 7s [id=e64be02d-a4b6-7a0a-1cb2-23cc3cfab449]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
Prove that the VM has a new generation deployed and switched to:
[root@nixos:~]# nix-env --list-generations --profile /nix/var/nix/profiles/system
   1   2022-08-10 12:07:01
   2   2022-08-10 12:13:36   (current)
We deployed our first configuration with Terraform.
Secrets handling
The provider has support for provisioning secrets onto hosts. It supports reading secrets from various sources, but by default it uses the local filesystem.
Here is how to transfer a secret from the filesystem to the vm instance (more settings are available):
The secret block could be defined globally or per-instance.
resource "nixos_instance" "vm" { # ... secret { source = "./secrets/key" destination = "/root/secrets/key" } }
When using a different secret provider, source is used as an identifier that the concrete provider can retrieve the secret by.
For example, there may be a secret example.com/key in GoPass, which would have source = "example.com/key" if the NixOS provider is configured to use GoPass as its secret provider.
This will upload the ./secrets/key file to the host at the path /root/secrets/key. If /root/secrets does not exist, it will be created.
The NixOS provider uses tar to deliver secrets via SSH; parent directory creation is handled transparently.
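Conceptually the delivery is similar to this shell sketch (illustrative; not the provider's exact invocation):

# pack the secret locally, unpack on the remote host;
# tar creates the missing parent directories on extraction
λ tar -c -f - secrets/key | ssh -p 2222 root@127.0.0.1 "tar -x -f - -C /root"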
To specify multiple secrets, just repeat the secret block multiple times:
resource "nixos_instance" "vm" { # ... secret { source = "./secrets/key" destination = "/root/secrets/key" } secret { source = "./secrets/another" destination = "/root/secrets/anotherkey" } }
In addition to source and destination, access information could be specified:
- group: name of the group the file should belong to (default root)
- owner: name of the user the file should belong to (default root)
- permissions: octal representation of the file mode (default 600 = rw- --- ---)
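For example, a secret block with explicit access information (these attributes mirror what the plan output further below shows):

resource "nixos_instance" "vm" {
  # ...
  secret {
    source      = "./secrets/key"
    destination = "/root/secrets/key"
    owner       = "root"
    group       = "root"
    permissions = 600
  }
}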
Known problem: if the group/owner does not exist at the moment of secret provisioning (which happens before derivation deployment), then ownership will fall back to root/root.
I have some thoughts, but no solution at the moment.
Let’s try to deploy a sample secret. This is how the NixOS instance definition inside main.tf should look now:
resource "nixos_instance" "vm" { address = ["127.0.0.1"] configuration = "./configuration.nix" ssh { port = 2222 config = { userKnownHostsFile = "/dev/null" strictHostKeyChecking = "no" } } secret { source = "./secrets/key" destination = "/root/secrets/key" } }
Create a sample secret with these commands:
λ mkdir -p secrets
λ echo "hello world" > secrets/key
Apply configuration:
λ terraform apply -target nixos_instance.vm
nixos_instance.vm: Refreshing state... [id=e64be02d-a4b6-7a0a-1cb2-23cc3cfab449]

Terraform will perform the following actions:

  # nixos_instance.vm will be updated in-place
  ~ resource "nixos_instance" "vm" {
        id                 = "e64be02d-a4b6-7a0a-1cb2-23cc3cfab449"
      ~ secret_fingerprint = {
          - "kdf_iterations" = "35"
          - "salt"           = "acd73fe9592089ec514626720afe7f29949ad38394fd934c149c6b2b8f3faa53"
          - "sum"            = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
        } -> (known after apply)
        # (4 unchanged attributes hidden)

      + secret {
          + destination = "/root/secrets/key"
          + group       = "root"
          + owner       = "root"
          + permissions = 600
          + source      = "./secrets/key"
        }

        # (2 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

...

nixos_instance.vm: Modifying... [id=e64be02d-a4b6-7a0a-1cb2-23cc3cfab449]
nixos_instance.vm: Modifications complete after 9s [id=e64be02d-a4b6-7a0a-1cb2-23cc3cfab449]

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
The NixOS provider maintains a salted fingerprint of all secret contents. This is done to speed up deployment and save some traffic: there is no need to transfer secrets if they haven’t changed.
Prove that the secret really reached its destination:
λ ssh -p 2222 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@127.0.0.1

[root@nixos:~]# ls -la /root/secrets/
total 12
drwxr-xr-x 2 root root 4096 Aug 10 20:28 .
drwx------ 4 root root 4096 Aug 10 20:28 ..
-rw------- 1 root root   12 Aug 10 20:28 key

[root@nixos:~]# cat /root/secrets/key
hello world
Other secret stores could be used (for example GoPass); see the available settings. Here is an example which transfers a secret identified by the path secrets/key from GoPass to the vm instance:
The secrets setting may be defined globally or per-instance.
resource "nixos_instance" "vm" { # ... secret { source = "secrets/key" destination = "/root/secrets/key" } secrets { provider = "gopass" gopass = { store = "./secrets" } } }
Bastion
Sometimes called a "jump host". Bastion servers are edge servers responsible for access control to network resources, in this case SSH: the hosts inside the network do not expose SSH to the wild internet, only the bastion does. People connect to the bastion, and from there connect to hosts inside the network.
Many companies use bastions to control and journal access, so support for them is a "must have" for a good tool.
Bastion settings (the bastion schema) could be defined:
- globally, for the whole provider
- locally, for an instance
- mixed, overriding global bastion settings for a single instance
These semantics are shared between most NixOS provider settings; it should become clearer as you read, and there is a separate paragraph just about this.
The bastion section supports the same keys as ssh, extending them with a host key, which contains the remote address of the bastion server:
The bastion block could be defined globally or per-instance.
bastion {
  host = "127.0.0.1"
  port = 2222
}
For demonstration purposes the NixOS VM (which was deployed previously) will be used as a bastion to deploy the configuration onto itself. These settings tell the NixOS provider to connect to 127.0.0.1:2222, which will forward the SSH connection to 127.0.0.1:22, the SSH port inside the VM.
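Conceptually this is the same trick as OpenSSH's jump host option; what the provider does is roughly equivalent to:

# connect to port 22 on the VM, jumping through the bastion at 127.0.0.1:2222
λ ssh -J root@127.0.0.1:2222 -p 22 root@127.0.0.1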
But first, let's prove the vm instance works as an SSH bastion (forwarding connections to itself). Turn on the DEBUG1 logging level for sshd:
# ...
services = {
  # ...
  openssh.logLevel = "DEBUG1";
};
# ...
This will make sshd write connection forwarding information to the log. Run:
λ terraform apply -target nixos_instance.vm
...
Now change the instance settings in main.tf to look like this:
The bastion will inherit settings defined for ssh under the config key.
resource "nixos_instance" "vm" { address = ["127.0.0.1"] configuration = "./configuration.nix" ssh { port = 22 config = { userKnownHostsFile = "/dev/null" strictHostKeyChecking = "no" } } bastion { host = "127.0.0.1" port = 2222 } secret { source = "./secrets/key" destination = "/root/secrets/key" } }
Then do another configuration apply:
λ terraform apply -target nixos_instance.vm
...
│ Error: subcommand "/run/current-system/sw/bin/ssh -F /run/user/1000/ssh_config.356167380 127.0.0.1 tar -x -C /" exited with:
│ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
│ @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
│ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
│ IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
│ Someone could be eavesdropping on you right now (man-in-the-middle attack)!
│ It is also possible that a host key has just been changed.
...
I should say it loud: this apply will fail because of the fragile nature of Terraform and the poor engineering of its SDK.
I don’t know how to solve this; if you do, please let me know by filing an issue. Thanks!
Here is why it is failing.
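The problematic part of the plan diff looks roughly like this (a reconstruction; the exact attribute values depend on the change being applied):

  ~ resource "nixos_instance" "vm" {
        # ...
      - ssh {
          - port = 2222
          # ...
        }
      + ssh {
          + port = 22
          # ...
        }
    }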
Terraform deletes the ssh block, then re-adds it. This makes the provider act as if the ssh block does not exist at all during terraform apply, so it uses the defaults. And the default is StrictHostKeyChecking=yes. This is why it fails.
This bug is not new. Nested data structures are a huge pain in the Terraform SDK, and one of the many reasons why I hate Terraform. I’ve wasted an unacceptable amount of time debugging this, and would need quadruple (?) that to find out how to fix it (the whole SDK is one big chunk of what we call «over-engineering»).
A second apply should make everything consistent. If Terraform says "everything is up to date", then change configuration.nix, for instance by uncommenting haveged. After the apply, the command journalctl -u sshd.service --follow should print the following lines when the bastion is used:
Pay attention to the debug1: server_request_direct_tcpip lines.
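Such a line looks roughly like this (reconstructed from a typical OpenSSH DEBUG1 log; the timestamp, PID, and originator port are illustrative):

Aug 10 20:45:12 nixos sshd[1337]: debug1: server_request_direct_tcpip: originator 127.0.0.1 port 53044, target 127.0.0.1 port 22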
Address matching and filtering
Sometimes hosts have multiple addresses, for instance IPv4 & IPv6. In this case, priorities for address families or subnets could be defined.
Here is an example where IPv6 addresses take precedence over IPv4 addresses; with this configuration ::1 will be used to connect to the vm instance:
address_priority may be defined only globally.
provider "nixos" { # ... address_priority = { "0.0.0.0/0" = 0, "::/0" = 1, } } resource "nixos_instance" "vm" { address = [ "127.0.0.1", "::1", ] # ... }
Larger numbers raise the priority of a subnet, moving matched addresses closer to the start of the instance address list, so those addresses will be tried first when connecting to the host with SSH.
This gives a reordering capability on top of the set of addresses. Despite nearly 40% IPv6 adoption, people sometimes struggle with IPv6 misconfiguration at the ISP level, etc. What if somebody wants to use just IPv4 or IPv6 to connect to the hosts and does not care about priorities?
There is an address_filter setting which filters the addresses used by instances with CIDRs; in this example the known vm addresses will contain just 127.0.0.1:
address_filter may be defined only globally.
provider "nixos" { # ... address_filter = ["0.0.0.0/0"] } resource "nixos_instance" "vm" { address = [ "127.0.0.1", "::1", ] # ... }
Multiple filters could be defined; in this case they use or semantics. This basically means: an address must match at least one CIDR to be present in the instance address list.
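For example, to keep loopback IPv4 addresses and all IPv6 addresses (an address matching either CIDR is kept; the values here are illustrative):

provider "nixos" {
  # ...
  address_filter = [
    "127.0.0.0/8",
    "::/0",
  ]
}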
Filters defined in address_filter are applied before sorting with the address_priority data.
Retries
The network may be unreliable, so it is good to have a way to retry a broken connection.
Here is an example which tells the provider to retry the SSH connection 3 times with a delay of 1 second between attempts:
retry and retry_wait could be defined only globally.
provider "nixos" { # ... retry = 3 retry_wait = 1 }
If the retry limit is exceeded, the apply fails with an error.
Nix settings
There are some settings which could be altered for the Nix package manager:
- activation_action: activation script action, one of switch|boot|test|dry-activate
- activation_script: path to the system profile activation script
- build_wrapper: path to the configuration wrapper in the Nix language (a function which returns drv_path & out_path)
- cores: number of CPU cores Nix should use to perform builds
- mode: Nix mode (0 - compat, 1 - default)
- output: system derivation output name
- profile: path to the current system profile
- show_trace: show the Nix package manager trace on error
- use_substitutes: whether or not Nix should use substitutes
I will focus on 3 of them:
- activation_action
- build_wrapper
- mode
The activation_action setting is the action passed as the first argument to the activation script of the system profile:
λ /nix/var/nix/profiles/system/bin/switch-to-configuration
Usage: /nix/var/nix/profiles/system/bin/switch-to-configuration [switch|boot|test]

switch:       make the configuration the boot default and activate now
boot:         make the configuration the boot default
test:         activate the configuration, but don't make it the boot default
dry-activate: show what would be done if this configuration were activated
It could be empty; for the provider this basically means: do not run activation. This was done for tests, but it may probably be useful for other things as well.
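For instance, assuming the setting lives under the nix block like the other Nix settings listed above (an illustrative sketch, not taken from the provider docs):

provider "nixos" {
  nix {
    # build & upload the system closure, but skip activation
    activation_action = ""
  }
}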
The build_wrapper setting may contain a path to the system derivation builder. This wrapper is written in the Nix language and should return an attribute set with at least two keys:
- drv_path
- out_path
Here is how the wrapper built into the provider looks.
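As an illustration of the general shape such a wrapper has, here is a minimal sketch (the argument names and defaults are my assumptions, not the provider's actual interface):

{ configuration, system ? "x86_64-linux" }:
let
  # evaluate the NixOS configuration
  os = import <nixpkgs/nixos> {
    inherit configuration system;
  };
  toplevel = os.config.system.build.toplevel;
in {
  # the two keys the provider expects
  drv_path = toplevel.drvPath;
  out_path = toplevel.outPath;
}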
The mode setting controls whether to use the experimental Nix CLI or not. By default it is 0, which means "try not to use experimental CLI flags", and 1 means the opposite: "use experimental CLI flags". I introduced this setting to make the provider work with Nix versions older than 2.10.x.
Even older versions may still work with mode=1, but I haven't tested this.
Configuration override
The provider gives the user a way to define settings for:
- nix: the Nix package manager
- ssh: the SSH client
- bastion: SSH tunneling
- secrets: secret providers
Each of them could appear both globally and at the instance level.
There are more settings than these, but some of them may be defined only globally.
Settings defined at the instance level override the global settings for that instance:
provider "nixos" { nix { cores = 2 } ssh { port = 2222 } # secrets { ... } } resource "nixos_instance" "vm" { nix { cores = 4 } ssh { port = 22 } bastion { host = "bastion.example.com" port = 2222 } secrets { provider = "gopass" gopass = { store = "./secrets" } } }
This will result in the following settings for vm:
- the global number of CPU cores Nix will use for builds is 2, but for the vm instance it is 4
- a global SSH port 2222 is defined, but for the vm instance 22 will be used
- globally no bastion is defined, but for the vm instance bastion.example.com will be used
- globally no secrets provider is defined (so the filesystem will be used), but for the vm instance the gopass provider is defined (with the store located in the ./secrets directory)
Each section (a set, like ssh or bastion, …) will not be merged deeply. This is a limitation of the Terraform SDK: it has no way to distinguish user-provided values from default values, so a correct merge is not possible. Because of this, the following will not work as expected:
The bastion port for the vm instance will be 22 instead of the expected 2222.
provider "nixos" { bastion { host = "bastion.example.com" port = 2222 } } resource "nixos_instance" "vm" { # ... bastion { host = "other-bastion.example.com" } # ... }
Licensing
I don’t like the concept of intellectual property applied to source code and other intangible things.
That’s why this project and this article are in the public domain. Feel free to do anything with this code; this is the Internet, I don’t care much. But I would be glad to be mentioned.