Using Proxnix
I posted recently about a project I have been working on called Proxnix. When I wrote that I had just about got it to compile, but hadn't properly tested it. I've now tested it properly and merged the branch I was using for testing.
Managing state
I had initially used a state file on the machine for POC development. The daemon would query the live state, build a DeployedState struct from it, and write that struct to a file, which was then treated as the source of truth. This is obviously kind of a waste, but during development I liked having a file I could edit to manipulate the state. The real issue is that the file can go stale.
With the state fully in memory, managing access to it became a little more difficult. We obviously don't want the state to be changed by two operations at the same time. This was super simple with a file because I just put the read operation inside the blocking task. That isn't technically safe, but it was good enough.
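Roughly, the POC approach looked something like this (just a sketch; the path, the struct's contents and the serde/tokio plumbing are my illustration, not the actual project code):
// Sketch of the original POC: the deployed state was serialized to a file
// and read back inside the blocking task. The path and field are made up
// for illustration.
use std::fs;
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct DeployedState {
    vms: Vec<String>, // hypothetical field, the real struct holds much more
}

async fn run_pipeline() -> anyhow::Result<()> {
    // Reading the file inside the blocking task was "good enough" for a POC,
    // but nothing stops the file from going stale between runs.
    let state = tokio::task::spawn_blocking(|| {
        let raw = fs::read_to_string("/var/lib/proxnix/state.json")?;
        Ok::<DeployedState, anyhow::Error>(serde_json::from_str(&raw)?)
    })
    .await??;
    // ... diff `state` against the live cluster and apply changes ...
    let _ = state;
    Ok(())
}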
Changing the daemon to use in-memory state meant approaching Rust's ownership model head on. Ownership had been relatively simple so far because I wasn't asking for too much. To be honest, I generally write functions to assume ownership and only change that if the compiler complains (Rust is very good at telling you when a clone is required). Making global app state that can be accessed safely was a little more work.
This is the original global state:
#[derive(Clone)]
struct AppState {
semaphore: Arc<Semaphore>,
}
Simple right, just a semaphore wrapped in an atomic reference counter so only one pipeline can happen at once.
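In use it looks roughly like this (a sketch rather than the real handler; the function names are mine):
// Only one pipeline at a time: the semaphore is created with a single permit,
// so concurrent webhook deliveries queue up behind one another. This is a
// sketch; the real handler obviously does more.
use std::sync::Arc;
use tokio::sync::Semaphore;

fn make_state() -> AppState {
    AppState {
        semaphore: Arc::new(Semaphore::new(1)), // one permit = one pipeline at once
    }
}

async fn handle_webhook(state: AppState) {
    // Waits here if another pipeline is already holding the permit.
    let _permit = state.semaphore.acquire().await.expect("semaphore closed");
    // ... run the deploy pipeline ...
    // The permit is released when `_permit` drops at the end of this scope.
}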
This is the modified global state:
#[derive(Clone)]
struct AppState {
semaphore: Arc<Semaphore>,
last_repo: Arc<RwLock<Option<(String, String)>>>,
}
What a mouthful of a type... This holds the last known repo, which allows the desired state to be loaded into memory. Rather than reading a state file, the state is always derived from the repo definitions, so it cannot go stale the way a file can.
The type, left to right, is as follows:
- Arc (atomic reference counter) - this wraps the value, and every clone just bumps the reference count; the memory only actually gets freed when the count hits zero. It's atomic, which makes it great for high-throughput web traffic. This isn't really that, but over-engineering is fun. We use it because the underlying RwLock is accessed in two places in the main module, and we want both places to always access the same RwLock, just never at the same time.
- RwLock - This is pretty simple, a lock that allows any number of readers at once but forces callers to wait if they want to read or write the memory while it is being written to.
- Option - This is a type that Rust inherited from the ML languages. It's incredibly useful. It allows a value to be absent and accounts for that with some very simple built-in methods rather than a bunch of if statements and equality checks. None is returned when there is no value, Some when there is a value. How you deal with that is up to you, but there won't be an issue if there is no data.
- Tuple of Strings - in this case the repo url and the commit hash for the webhook repo.
This type really forces you to think about how to deal with the memory. Rust forces you to make implicit things explicit, and to show how you will deal with the potential issue of multiple writes. Once this pattern clicks it makes a lot of sense, but it's a lot to take in to begin with. I'm sure you can see how resilient to high traffic something like this is. Again not really needed here, but fascinating to learn.
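To make that concrete, here is roughly what the two call sites look like (a sketch assuming tokio's RwLock; the function names are illustrative, not the real ones):
// Sketch of the two places that share the same RwLock through Arc clones.
// Assumes tokio::sync::RwLock; the function names are made up.
use std::sync::Arc;
use tokio::sync::RwLock;

type LastRepo = Arc<RwLock<Option<(String, String)>>>;

// Webhook handler: remember the repo url and commit hash we were just told about.
async fn remember_repo(last_repo: &LastRepo, url: String, commit: String) {
    // Write lock: waits until no other reader or writer holds the lock.
    *last_repo.write().await = Some((url, commit));
}

// Reconcile side: derive the desired state from the last known repo.
async fn last_known_repo(last_repo: &LastRepo) -> Option<(String, String)> {
    // Read lock: any number of readers can hold this at once.
    (*last_repo.read().await).clone()
}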
Diffing
Initially the diff would rebuild the machines if the commit hash had changed. Since the hash changes on every commit, this meant every diff rebuilt the machines. I realised this was happening soon after deploying, and it was very annoying. I had to figure out how to diff against the actual image the machine was running. Thankfully, this project uses nix!
If you are not familiar with how nix works, the basics required here are that nix does not place files in the standard unix install locations. Nix has a directory called /nix/store, which contains every single package used in any of your nix files. Each package lives in a dir named after the hash nix derives from the package's inputs, which guarantees that the same inputs always produce the same output. For example, proxnix at the moment is in a dir az9dig5821437cfzxa2fpklfwiscc76h-proxnix-0.1.0. Nix then makes symlinks from the expected output locations into the store so it's not impossible to use the computer.
This is useful for me because the generated images were kept in the nix store in a dir with their hash. This meant I could tag each machine with the image hash and then diff against this, which solved the issue entirely.
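The comparison itself is then trivial, something along these lines (the function names and tag handling are my illustration, not the actual proxnix code):
// Sketch of the diff: compare the image hash tagged on the running VM with
// the hash of the freshly built image in /nix/store. Names are illustrative.

/// Pull the hash prefix out of a store path like
/// "/nix/store/az9dig5821437cfzxa2fpklfwiscc76h-proxnix-0.1.0".
fn store_hash(store_path: &str) -> Option<&str> {
    store_path.strip_prefix("/nix/store/")?.split('-').next()
}

/// Only rebuild when the VM's tagged image hash differs from the image we
/// just built, instead of rebuilding on every commit.
fn needs_rebuild(vm_image_tag: &str, built_image_path: &str) -> bool {
    match store_hash(built_image_path) {
        Some(hash) => vm_image_tag != hash,
        None => true, // can't tell, play it safe and rebuild
    }
}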
Config file changes
Originally I was using a .json file to represent the vms. This was separate from nix. I changed this so that I now use a .nix module called proxnix.nix. The inside of it is still basically json, but it's wrapped in nix, and nix is now aware of both the vms and the image definitions.
I am now hosting this very site on a proxnix vm! I already had a flake for the site so I just added that to an image and spun up a vm with that image.
Workflow
The actual workflow is now pretty simple. I have a proxnix.nix file that I use; here is the file:
{
vms = {
"test-init" = {
name = "test-init";
vm_id = 820;
image_type = "build-qcow2-init";
cores = 2;
sockets = 1;
memory_mb = 2048;
disk_gb = 10;
storage_location = "local-lvm";
cloud_init = "None";
protected = false;
};
"test-cp" = {
name = "test-cp";
vm_id = 821;
image_type = "build-qcow2-cp";
cores = 2;
sockets = 1;
memory_mb = 2048;
disk_gb = 10;
storage_location = "local-lvm";
cloud_init = "None";
protected = false;
};
"test-worker" = {
name = "test-worker";
vm_id = 822;
image_type = "build-qcow2-worker";
cores = 2;
sockets = 1;
memory_mb = 2048;
disk_gb = 10;
storage_location = "local-lvm";
cloud_init = "None";
protected = false;
};
"test-website" = {
name = "test-website";
vm_id = 823;
image_type = "build-qcow2-website";
cores = 2;
sockets = 1;
memory_mb = 2048;
disk_gb = 10;
storage_location = "local-lvm";
cloud_init = "None";
protected = false;
};
};
}
These are all named test for obvious reasons, but the website one is being used (until I rename it).
If you want to add a machine, you just add another object in there.
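On the daemon side each of those objects maps onto a plain struct, roughly this shape (a sketch implied by the fields above, not necessarily the exact struct in the proxnix source):
// Sketch of the shape each vm object deserializes into on the daemon side.
// Field types are my guesses from the values in proxnix.nix.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct VmDefinition {
    name: String,
    vm_id: u32,
    image_type: String,
    cores: u32,
    sockets: u32,
    memory_mb: u64,
    disk_gb: u64,
    storage_location: String,
    cloud_init: String,
    protected: bool,
}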
Then there is the flake, which defines all of the images. It can pull in other flakes for use in images. A flake in nix is essentially a package, so being able to pull in any of them is very useful. As you can see, I pull my website flake from this flake:
{
description = "VM image";
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
website.url = "git+ssh://[email protected]/whereiendandyoubegin/website.git";
website.inputs.nixpkgs.follows = "nixpkgs";
};
outputs = { self, nixpkgs, website }:
let
system = "x86_64-linux";
in {
nixosConfigurations = {
build-qcow2 = nixpkgs.lib.nixosSystem {
inherit system;
modules = [ ./configuration.nix ./qcow.nix ];
};
build-qcow2-worker = nixpkgs.lib.nixosSystem {
inherit system;
modules = [ ./configuration.nix ./qcow.nix ./k3s-common.nix ./k3s-worker.nix ];
};
build-qcow2-cp = nixpkgs.lib.nixosSystem {
inherit system;
modules = [ ./configuration.nix ./qcow.nix ./k3s-common.nix ./k3s-control-plane.nix ];
};
build-qcow2-init = nixpkgs.lib.nixosSystem {
inherit system;
modules = [ ./configuration.nix ./qcow.nix ./k3s-common.nix ./k3s-init.nix ];
};
build-qcow2-website = nixpkgs.lib.nixosSystem {
inherit system;
specialArgs = {
websitePackage = website.packages.${system}.default;
};
modules = [ ./configuration.nix ./qcow.nix ./website.nix ];
};
};
proxnix = import ./proxnix.nix;
};
}
To change the packages in an image you just add them here. Here is my website module (not the flake package, but the actual module for building the OS):
{ pkgs, websitePackage, ... }: {
environment.systemPackages = with pkgs; [
certbot
openssl
ripgrep
jq
];
systemd.services.website = {
description = "Personal website";
wantedBy = [ "multi-user.target" ];
after = [ "network.target" ];
serviceConfig = {
ExecStart = "${websitePackage}/bin/website-server";
Restart = "on-failure";
DynamicUser = true;
WorkingDirectory = "${websitePackage}/site";
};
};
services.nginx = {
enable = true;
virtualHosts."_" = {
listen = [{ addr = "0.0.0.0"; port = 80; }];
locations."/" = {
proxyPass = "http://127.0.0.1:3001";
proxyWebsockets = true;
};
};
};
networking.firewall.allowedTCPPorts = [ 80 443 ];
fileSystems."/mnt/storage" = {
device = "192.168.1.69:/ZFS/downloads";
fsType = "nfs";
options = [ "nfsvers=4" "soft" "timeo=30" ];
};
networking.interfaces.ens18.ipv4.addresses = [{
address = "192.168.1.23";
prefixLength = 24;
}];
networking.defaultGateway = "192.168.1.1";
networking.nameservers = [ "8.8.8.8" ];
}
Think of the website module as the ansible file, the proxnix.nix module as the terraform definitions, and the flakes as higher level orchestration. You can see how this can be a really easy system to manage. I also find nixos modules a lot more expressive than ansible projects. It annoys me that ansible projects require you to have roles, tasks and playbooks; it makes even small things into projects.
When you have the vms added, you can see from the proxmox tags which commit they came from, that they're managed by proxnix, and which nix image hash they are using.

I am gonna stop working on it for now, but there are quite a few changes I would like to make in the future, the most interesting being adding support for LXC containers.