Proxnix

What is Proxnix?

Proxnix is a GitOps state controller and reconciliation engine (think ArgoCD) written in Rust for the Proxmox platform. It uses Nix as the build engine to allow idempotent, correct and reproducible builds. It deals with the VM management layer rather than the container management layer. I would like to one day support LXC as a target as well.

You can find the code here: https://github.com/whereiendandyoubegin/proxnix

Why make it?

Proxnix was created because I was frustrated with current tooling for deployment on Proxmox. The frustrations were:

  • Ansible takes substantial work to be idempotent and managing the files can be frustrating (thank you YAML...)
  • Ansible can leave dirty state easily
  • Packer is imperative and as such can fail in weird ways. Writing the manifests can be frustrating.
  • Terraform is a great piece of software and largely manages things in a good way, but it's a tool for the cloud, not for Proxmox
  • Orchestration and deployment across all of these tools can be messy

So I started by replacing Ansible with Nix, which I began using on my desktop at the same time. I was pretty surprised by how good Nix was, and by how much it fixed. It was a pain to get used to the expressions, but I'm getting better at it. I feel like I've only used a very small subset of Nix, so I will keep digging into it.

While running NixOS on my desktop, I realised both Ansible and Packer could be replaced with Nix. Using this Nix module you can create qcow2 images:

{ config, lib, pkgs, modulesPath, ... }: {
  # Guest profile for machines running under QEMU/KVM (virtio drivers etc.)
  imports = [
    "${toString modulesPath}/profiles/qemu-guest.nix"
  ];

  # Root filesystem, grown to fill its partition at boot
  fileSystems."/" = {
    device = "/dev/disk/by-label/nixos";
    autoResize = true;
    fsType = "ext4";
  };

  # Log to both the VGA console and the serial console
  boot.kernelParams = [ "console=tty0" "console=ttyS0,115200n8" ];
  boot.loader.grub.device = lib.mkDefault "/dev/vda";
  boot.loader.grub.enable = true;

  # Build target: a qcow2 disk image (diskSize is in megabytes)
  system.build.qcow2 = import "${modulesPath}/../lib/make-disk-image.nix" {
    inherit lib config pkgs;
    diskSize = 4000;
    format = "qcow2";
    partitionTableType = "hybrid";
  };
}

This is incredibly powerful. With only a few Nix files you can declaratively configure the system and create an image with it.

That left deployment, which turned out to be the hard part. I tried a few CI/CD jobs, but they became very complicated very fast. What I wanted was a webhook listener that would deploy the flakes for me. I figured I could get something working pretty quickly in Python, but I wanted to learn Rust. I also realised that Terraform was probably not the best tool for this. There is a Proxmox provider, but why go through the Proxmox API when direct qm calls on the host reduce the error surface area? Terraform is platform agnostic thanks to providers, which is great for the cloud but adds little when the target is always Proxmox and its bounds are known.
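To make the "direct qm calls" idea a bit more concrete, here is a minimal sketch of how a controller could assemble qm invocations from Rust using only the standard library. It is not taken from the Proxnix source: the VmSpec struct, the function names and the choice of flags are assumptions for illustration. The builders only construct the command, so nothing runs until you call .status() or .output() on a Proxmox host.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical description of a VM; the fields are assumptions for this sketch.
pub struct VmSpec {
    pub vmid: u32,
    pub name: String,
    pub memory_mib: u32,
    pub cores: u32,
}

/// Build (but do not run) `qm create <vmid> --name .. --memory .. --cores ..`.
pub fn qm_create(spec: &VmSpec) -> Command {
    let mut cmd = Command::new("qm");
    cmd.arg("create")
        .arg(spec.vmid.to_string())
        .args(["--name", spec.name.as_str()])
        .args(["--memory", spec.memory_mib.to_string().as_str()])
        .args(["--cores", spec.cores.to_string().as_str()]);
    cmd
}

/// Build (but do not run) `qm importdisk <vmid> <image> <storage>`,
/// which attaches a built qcow2 image to an existing VM as an unused disk.
pub fn qm_import_disk(vmid: u32, image: &Path, storage: &str) -> Command {
    let mut cmd = Command::new("qm");
    cmd.arg("importdisk")
        .arg(vmid.to_string())
        .arg(image)
        .arg(storage);
    cmd
}

/// Render a command as a single line, useful for logging or a dry run.
pub fn render(cmd: &Command) -> String {
    let mut parts = vec![cmd.get_program().to_string_lossy().into_owned()];
    parts.extend(cmd.get_args().map(|a| a.to_string_lossy().into_owned()));
    parts.join(" ")
}
```

Keeping construction separate from execution means the controller can log exactly what it is about to do, and the command-building logic can be tested on a machine that has no qm binary at all. The storage name you pass (for example local-lvm) is whatever your Proxmox host actually uses.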

How does it work?

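Proxnix uses a model rather similar to ArgoCD. You commit changes to a repo (these have to be highly structured at the moment; I am looking to change this), and the push triggers a webhook. To allow any git server to be used, the parser does not rely on a particular payload schema. Instead it walks the JSON recursively and takes the first 40-character hex string as the commit hash and the first string containing both ssh:// and .git as the repo URL. Here is the code. It was my first foray into recursion, so it was fun to write: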

use serde_json::Value;

// `Result`, `AppError` and `ParsedWebhook` are defined elsewhere in the crate.

/// Extracts the commit hash and repo URL from a webhook payload of any shape.
pub fn webhook_parse(webhook: Value) -> Result<ParsedWebhook> {
    // A full SHA-1 commit hash: exactly 40 hex characters.
    let hash = find_string(&webhook, &|s| {
        s.len() == 40 && s.chars().all(|c| c.is_ascii_hexdigit())
    })
    .ok_or_else(|| AppError::ParsingModuleError("could not find commit hash".to_string()))?;

    // Only SSH clone URLs are recognised.
    let repo = find_string(&webhook, &|s| s.contains("ssh://") && s.contains(".git"))
        .ok_or_else(|| AppError::ParsingModuleError("could not find repo url".to_string()))?;

    Ok(ParsedWebhook {
        repository: repo,
        hash,
    })
}

/// Depth-first search for the first string in `json` that satisfies `predicate`.
pub fn find_string(json: &Value, predicate: &impl Fn(&str) -> bool) -> Option<String> {
    match json {
        Value::String(s) if predicate(s) => Some(s.clone()),
        Value::Array(array) => array.iter().find_map(|v| find_string(v, predicate)),
        Value::Object(map) => map.values().find_map(|v| find_string(v, predicate)),
        // Numbers, booleans, null and strings that do not match.
        _ => None,
    }
}
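Because the parser leans entirely on two string heuristics, it helps to see exactly what they accept and reject. The sketch below pulls the two predicates out into named functions (the names are mine, not from the Proxnix source) so they can be exercised on their own, without serde_json.

```rust
/// Same check as the hash closure above: exactly 40 ASCII hex characters,
/// i.e. a full SHA-1 commit hash. Abbreviated hashes do not match.
pub fn looks_like_commit_hash(s: &str) -> bool {
    s.len() == 40 && s.chars().all(|c| c.is_ascii_hexdigit())
}

/// Same check as the repo closure above: an SSH URL that mentions `.git`.
/// HTTPS clone URLs and scp-style `git@host:path` URLs do not match.
pub fn looks_like_ssh_repo_url(s: &str) -> bool {
    s.contains("ssh://") && s.contains(".git")
}
```

For example, a 40-character hex string passes the first check while a 7-character short hash does not, and ssh://git@git.example.com/dan/infra.git passes the second check while https://git.example.com/dan/infra.git does not (the host name here is a placeholder). One thing to keep in mind with this approach is that the first match wins, so the result depends on the order in which the JSON is walked. With serde_json's default map type that is sorted key order rather than the order in the payload.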

Once the webhook has been received, the controller reads the current state into a struct and writes it to disk. It then diffs the current state against the desired state (known from a config file in the repo) and creates, updates or deletes VMs to match. Something important to note is that VMs without the Protected tag are treated as ephemeral. This goes hand-in-hand with NixOS, where the machine is rebuilt from its configuration. The reconciliation loop is tight and low-cost, which makes state easier to manage than in Terraform.
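The diff step can be sketched roughly as follows. This is a simplified illustration rather than the actual Proxnix implementation: the Vm and Action types, the image_hash field and the plan function are all assumptions, and I am reading "ephemeral" to mean that only VMs without the Protected tag may be replaced or deleted.

```rust
use std::collections::BTreeMap;

/// Hypothetical view of a VM, keyed by name in the maps below.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Vm {
    /// Assumed: identifies the image the VM was (or should be) built from.
    pub image_hash: String,
    pub tags: Vec<String>,
}

impl Vm {
    /// A VM is protected if any of its tags is "Protected" (case-insensitive).
    fn is_protected(&self) -> bool {
        self.tags.iter().any(|t| t.eq_ignore_ascii_case("protected"))
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Create(String),
    Update(String),
    Delete(String),
}

/// Compare current and desired state and return the actions needed to converge.
pub fn plan(current: &BTreeMap<String, Vm>, desired: &BTreeMap<String, Vm>) -> Vec<Action> {
    let mut actions = Vec::new();

    // Anything desired but missing is created; anything that drifted is updated,
    // unless the existing VM is protected.
    for (name, want) in desired {
        match current.get(name) {
            None => actions.push(Action::Create(name.clone())),
            Some(have) if have.image_hash != want.image_hash && !have.is_protected() => {
                actions.push(Action::Update(name.clone()))
            }
            Some(_) => {}
        }
    }

    // Anything running but no longer desired is deleted, unless protected.
    for (name, have) in current {
        if !desired.contains_key(name) && !have.is_protected() {
            actions.push(Action::Delete(name.clone()));
        }
    }

    actions
}
```

Since plan is a pure function of two maps, running it twice against an already converged state yields an empty list, which is the property that keeps the loop cheap to run on every webhook.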

This is the first in a series on my experiences over the last few months. I will type up some more development experiences next, namely Rust as a beginner, Nix as a beginner, and why correctness matters in infrastructure.