Re: Deploy
Should be easy by now...
Look for the code on gram:nue/flash.nu.
Year-End Lab Problems!
I'm spending much of the holiday season in a small crisis, my mobility reduced by crucial repairs for a new camper I'll be using next year. As I camp on my friends' couch, a common theme in my career, I am held back from many of my aims because my domain seems to keep dropping offline. Rather than heading back to Baltimore to cycle the machine once again, I'd like to come to grips with a more reliable deploy method.
Since mid-summer, the apps are running through a combination of two machines:
- pebble (a raspberry pi)
- baseboard (a mini-pc)

Between them, they run:
- NixOS
- a local Caddy reverse-proxy
- miniserve
- MicroVM.nix, managed by gram:pool
baseboard runs a couple other open-source programs,
including immich as an image library,
and traccar as a GPS logger.
These are amazing programs for your lab,
and help you upgrade your phone to "basically normal"
with no dependency on Apple or Google.
These programs are possibly also the reasons that this computer is going offline, because each one is public and may include security errors or imperceptible backdoors. I am so incredibly bad at checking log files, at least until I hook up Grafana. They should probably be running hidden from the global internet, instead of my all-in approach of exposing each app under public subdomains.
Of course, I'm focused today on bringing the main app online again. I'm going to assume all of my machines are compromised, for a fun exercise.
Upgrading "Phase One".
One of the real missions of Operand is to reduce our dependency on so-called "cloud" resources.
For today's changes, I have two additional Framework Mainboard computers, ready to go.
One of them has an SSD onboard already, and this used to be a backup lab machine. I'll begin by backing up anything on that disc so we can re-image the OS and record the process from the beginning.
Normally, for a clean install of NixOS I'd begin by flashing an install disc (a "live image") onto a microSD card. I keep a case of around 30 spare microSDs, and nearly all of them are full of some install disc or other. The labeling is becoming a real problem.
To simplify this "phase one" process, I'd like to skip the SD cards and head directly to the m.2 NVMe SSD that I'll be booting the new computer from.
According to the NixOS "manual installation" guide
I should be able to do this by erasing the disc,
formatting it properly for a UEFI boot sequence,
and then binding the disc to /mnt.
I copied the procedures described under "Partitioning > UEFI (GPT)", "Formatting", and "Installing" in the manual.
I then reorganized the partitions so their numbers are sequential.
I added a couple additional partitions, nix and home,
as an early basis for impermanence.
Because my shell of choice is Nushell,
some of the commands have been broken into segments,
passing the desired disc label into the main function
to make the procedure more generic.
I begin by sourcing three helper modules from gram:nue:
#!/usr/bin/env nu
source ~/.nue/nix.nu
source ~/.nue/disc.nu
source ~/.nue/grammar.nu
let mem = '32GB'
let swap = [ ('-' + $mem) '100%' ]
let boot = [ '1MB' '512MB' ]
let nix = [ ($boot | last) '128GB' ]
let root = [ ($nix | last) '256GB' ]
let home = [ ($root | last) ($swap | first) ]
# Interpolate, so the partition number may be passed as an int or a string.
def part [disc: string, part: any = ''] { $"($disc)($part)" }
def main [disc?: string, --recycle (-r) ] { timeit {
unbind
let disc = $disc | assure { "/dev/" + (disc choose) }
nsh parted sudo parted $disc -- mklabel gpt
nsh parted sudo parted $disc -- mkpart ESP fat32 ...($boot)
nsh parted sudo parted $disc -- set 1 esp on
sudo mkfs.fat -F 32 -n BOOT (part $disc 1)
nsh parted sudo parted $disc -- mkpart nix ext4 ...($nix)
sudo mkfs.ext4 -L nix (part $disc 2)
nsh parted sudo parted $disc -- mkpart root ext4 ...($root)
sudo mkfs.ext4 -L nixos (part $disc 3)
nsh parted sudo parted $disc -- mkpart home ext4 ...($home)
sudo mkfs.ext4 -L home (part $disc 4)
nsh parted sudo parted $disc -- mkpart swap linux-swap ...($swap)
sudo mkswap -L swap (part $disc 5)
nsh parted sudo parted $disc -- p
# mount discs!
sudo mount (part $disc 3) /mnt
sudo mkdir /mnt/boot; sudo mount -o umask=077 (part $disc 1) /mnt/boot
sudo mkdir /mnt/nix; sudo mount (part $disc 2) /mnt/nix
sudo mkdir /mnt/home; sudo mount (part $disc 4) /mnt/home
disc ls | where source =~ $disc | print
tree -afFix /mnt
"Copying in codebases." | print
clone all
unbind
me
} }
def unbind [] {
try { sudo umount /mnt/boot }
try { sudo umount /mnt/nix }
try { sudo umount /mnt/home }
try { sudo umount /mnt }
}
The final lines run parted and then disc ls, as quick checks.
We can run nu flash.nu /dev/sdb directly,
or a simple nu flash.nu displays a choice of plugged-in discs.
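The start and end pairs near the top of flash.nu chain together: each partition begins where the previous one ends, and swap is measured back from the end of the disc. parted reads a negative number as an offset from the end, which is why every call passes -- ahead of the subcommand. Here is a minimal sketch of that chaining as a dry run that touches no disc. The layout helper is hypothetical and is no piece of gram:nue.

```nu
# Hypothetical helper: rebuild the start/end pairs from flash.nu as a table,
# so the plan can be read before any disc is erased.
def layout [mem: string = '32GB'] {
  let boot = { name: 'ESP', start: '1MB', end: '512MB' }
  let nix = { name: 'nix', start: $boot.end, end: '128GB' }
  let root = { name: 'root', start: $nix.end, end: '256GB' }
  # A negative start is counted back from the end of the disc by parted.
  let swap = { name: 'swap', start: $"-($mem)", end: '100%' }
  # home fills whatever is left between root and swap.
  let home = { name: 'home', start: $root.end, end: $swap.start }
  [ $boot $nix $root $home $swap ]
}

layout '32GB' | print
```

Changing the one mem argument moves both the end of home and the beginning of swap, which is the reason the script derives the pairs from each other instead of spelling every number twice.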
Near the end of the command, we see the consequences:
➜ chmod +x flash.nu
➜ ./flash.nu
# ... subcommand logs
(parted)> sudo parted /dev/sdb -- p
Model: WDC WDS5 00G2B0C-00PXH0 (scsi)
Disk /dev/sdb: 500GB
Sector size (logical/physical): 512B/8192B
Partition Table: gpt
Disk Flags:
Number  Start   End    Size    File system     Name  Flags
 1      1051kB  512MB  511MB   fat32           ESP   boot, esp
 2      512MB   128GB  127GB   ext4            nix
 3      128GB   256GB  128GB   ext4            root
 4      256GB   468GB  212GB   ext4            home
 5      468GB   500GB  32.0GB  linux-swap(v1)  swap  swap
╭───┬───────────┬───────────┬────────┬──────╮
│ # │ bind │ source │ scheme │ name │
├───┼───────────┼───────────┼────────┼──────┤
│ 0 │ /mnt │ /dev/sdb3 │ ext4 │ mnt │
│ 1 │ /mnt/boot │ /dev/sdb1 │ vfat │ boot │
│ 2 │ /mnt/nix │ /dev/sdb2 │ ext4 │ nix │
│ 3 │ /mnt/home │ /dev/sdb4 │ ext4 │ home │
╰───┴───────────┴───────────┴────────┴──────╯
/mnt/
/mnt/boot/
/mnt/home/
/mnt/lost+found/ [error opening dir]
/mnt/nix/
5 directories, 0 files
Add Operand's Code.
Day by day, I add code to a few programs (called grams in our lingo),
used to make lab-management procedures much easier to handle.
The core of these programs, for those new here, lies in gram:nux (for NixOS code) and gram:nue (for Nushell code).
There is also gram:mech, which is probably crucial somehow, although less production-ready than the others. Used to be, all three of these were included in a single codebase, called gram:build - the pieces were unique enough to break them up.
def "clone home" [--user (-u): string = nixos] {
let home = $"/mnt/home/($user)"
if not ($home | path exists) { sudo mkdir $home }
$home  # hand the path back, since callers splice it into longer paths
}
def "clone base" [base: string, --user (-u): string = nixos] {
let loc = $"/mnt/home/($user)/($base)"
if ($loc | path exists) { print $"Oh no! ($loc) is already loaded."; return; }
# "~" is only expanded in bare words, so build the source path from $env.HOME
nsh gitoxide sudo gix clone ($env.HOME | path join $base) $loc
}
def "clone loc" [base: path, --user (-u): string = nixos] {
let loc = [ /mnt/home/ $user $base ] | path join
nsh rsync sudo rsync -av ~/share (clone home -u $user)
}
def "clone all" [--user (-u): string = nixos] {
clone home --user $user
clone base .nue --user $user
clone base .nux --user $user
clone base .mech --user $user
clone base page --user $user
clone base diagram --user $user
nsh rsync sudo rsync -av ~/share $"/mnt/home/($user)/"
}
Nix Machine Config?
NixOS has a command to generate config files, nixos-generate-config, and I'm skipping that recommended config because after two years, I know enough to rebuild the good pieces and skip the unnecessary.
The challenging piece here is the file commonly called hardware-configuration.nix;
this specifies a number of boot options,
alongside the disc labels and paths they should bind to during boot.
Lucky us, we're already working with those.
Rather than the fickle /dev/sda and so on,
we'll grab the device UUIDs, which should be reliable
even when the disc is moved to a different machine.
def "label" [disc: string, part: any] {
(ls /dev/disk/by-uuid).name | where {|id| ($id | path expand) == (part $disc $part) } | first
}
➜ source ~/disc/flash.nu; label /dev/sda 2
/dev/disk/by-uuid/bf6fd2e3-b0f2-44fe-99c5-9237ad751a48
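For those curious why label works: each entry under /dev/disk/by-uuid is a symlink aimed at a partition device, and path expand resolves the link, so comparing the expanded path against /dev/sda2 picks the matching UUID. A small sketch of the same filter against scratch directories, assuming a Linux machine with ln on hand. The find-link helper and the file names are made up for the demo.

```nu
# Scratch stand-ins: one directory plays /dev, the other plays /dev/disk/by-uuid.
let dev = (mktemp -d | path expand)
let ids = (mktemp -d | path expand)
let device = ($dev | path join 'sda2')
touch $device
^ln -s $device ($ids | path join 'bf6fd2e3-example')

# The same filter label uses: resolve every link, keep the one landing on the device.
def find-link [ids: string, device: string] {
  (ls $ids).name | where {|id| ($id | path expand) == $device } | first
}

# Prints the name of the link, which plays the role of the UUID.
find-link $ids $device | path basename | print
```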
I can use this helper to build up a new nix config, depending on the architecture of the processor and assuming our 5-partition scheme.
def "nix bind" [disc: string, part: int, bind: string, --form (-f): string = ext4] {
$"fileSystems.''($bind)'' = { device = ''(label $disc $part
)''; fsType = ''($form)''; };" | str replace -a "''" '"'
}
def "nix swap" [disc: string, part: int] {
$"swapDevices = [
{ device = ''(label $disc $part)''; options = [ ''discard=once'' ]; }
];" | str replace -a "''" '"'
}
def "nix machine" [disc: string, arch: string, kernel: string] {
([ (nix preamble)
(nix arch $arch $kernel)
(nix bind $disc 1 "/boot" -f vfat)
(nix bind $disc 2 "/nix")
(nix bind $disc 3 "/")
(nix bind $disc 4 "/home")
(nix swap $disc 5)
] | flatten | each { " " + $in } | str join "\n" | str trim) + "\n}\n"
}
def "nix preamble" [] {
'' + `{ config, lib, modulesPath, ... }:
{
imports = [
(modulesPath + "/installer/scan/not-detected.nix")
];
boot.loader = { systemd-boot.enable = true; efi.canTouchEfiVariables = true; };
boot.initrd.availableKernelModules = [ "xhci_pci" "thunderbolt" "nvme" "usb_storage" "usbhid" "sd_mod" ];
boot.initrd.kernelModules = [ ];
boot.extraModulePackages = [ ];
`
}
def "nix arch" [arch: string, kernel: string] {
[ $"nixpkgs.hostPlatform = lib.mkDefault \"($arch)-($kernel)\";" ] ++ (
if ($arch == 'x86_64') { [
"hardware.cpu.intel.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;"
"boot.kernelModules = [ ''kvm-intel'' ];"
] } else {
[ "boot.kernelModules = [];" ] })
}
This means a simple one-liner prepares the necessary machine.nix:
➜ source ~/disc/flash.nu; nix machine /dev/sda x86_64 linux
{ config, lib, modulesPath, ... }:
{
imports = [
(modulesPath + "/installer/scan/not-detected.nix")
];
boot.loader = { systemd-boot.enable = true; efi.canTouchEfiVariables = true; };
boot.initrd.availableKernelModules = [ "xhci_pci" "thunderbolt" "nvme" "usb_storage" "usbhid" "sd_mod" ];
boot.initrd.kernelModules = [ ];
boot.extraModulePackages = [ ];
nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
hardware.cpu.intel.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;
boot.kernelModules = [ ''kvm-intel'' ];
fileSystems."/boot" = { device = "/dev/disk/by-uuid/D7F2-26FE"; fsType = "vfat"; };
fileSystems."/nix" = { device = "/dev/disk/by-uuid/bf6fd2e3-b0f2-44fe-99c5-9237ad751a48"; fsType = "ext4"; };
fileSystems."/" = { device = "/dev/disk/by-uuid/847b4009-0ae6-4219-9799-635092ea6879"; fsType = "ext4"; };
fileSystems."/home" = { device = "/dev/disk/by-uuid/f425a3b1-82e6-4c3b-89f0-cbd85e952162"; fsType = "ext4"; };
swapDevices = [
{ device = "/dev/disk/by-uuid/8c6c0ee0-9226-43e2-92fb-5adbef5225c9"; options = [ "discard=once" ]; }
];
}
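A word on the quoting in nix bind and nix swap: the code uses '' wherever Nix needs a double quote, then swaps them all in one pass with str replace, which saves escaping every quote inside a double-quoted Nushell string. A single-quoted interpolated string is one possible alternative. Below is a sketch comparing the two. Both helpers are hypothetical stand-ins that take the device path directly, so no disc is needed.

```nu
# Hypothetical stand-in for nix bind, using the '' swap from flash.nu.
def "bind swapped" [device: string, bind: string, form: string = 'ext4'] {
  # '' marks each spot where Nix needs a double quote; swap them all at the end.
  $"fileSystems.''($bind)'' = { device = ''($device)''; fsType = ''($form)''; };" | str replace -a "''" '"'
}

# Same line of Nix, built from a single-quoted interpolated string.
def "bind single" [device: string, bind: string, form: string = 'ext4'] {
  # Inside $'...' a double quote is an ordinary character.
  $'fileSystems."($bind)" = { device = "($device)"; fsType = "($form)"; };'
}

bind swapped '/dev/disk/by-uuid/D7F2-26FE' '/boot' 'vfat' | print
bind single '/dev/disk/by-uuid/D7F2-26FE' '/boot' 'vfat' | print
```

The swap breaks down if a value ever contains '' on purpose; the single-quoted form breaks down if a value needs a single quote typed in the template. For disc paths, neither case comes up.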
Because the home folder on the new disc was made with sudo, and save is a Nushell built-in that sudo cannot call directly, we need to make a new root shell to record this file inside our disc, overwriting the prior config.
sudo nu -c $"(nix machine ($disc) ($arch) ($kernel) | to nuon) | save -f '(clone home -u $user)/.nux/cell/($machine)/machine.nix'"
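The to nuon call is doing the heavy lifting in that line. The generated config is full of double quotes and line breaks, and pasting it raw into the command handed to nu -c would break the pipeline. to nuon renders the string as a quoted Nushell literal, which the inner shell reads back as the same string. A sketch of the round trip follows, using a shortened stand-in for the config and a made-up save path.

```nu
# A shortened stand-in for the generated config: quotes and line breaks included.
let body = "boot.kernelModules = [ \"kvm-intel\" ];\n}\n"

# to nuon wraps the string in quotes and escapes the quotes inside.
let literal = ($body | to nuon)

# Spliced into a command line, the literal is still one valid pipeline.
let command = $"($literal) | save -f /tmp/machine.nix"  # hypothetical save path
print $command

# Reading the literal back gives the original string.
($literal | from nuon) == $body | print
```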
Now, we can add the kicker to the end of our main function:
sudo nixos-install --flake $"(clone home -u $user)/.nux#($machine)"
Build, Error, and Rebuild.
From here on, I began running the code again and again. I added a rough idea of user accounts, and made sure the code to clone our codebases honors the chosen user.
I spent some hours seeing if I could cross-compile,
because I have deployed nux configs on both x86_64 (intel)
and aarch64 (arm) processors before.
The problem seems deep, so I'm going to pull back.
I can choose a pre-made config
for arm or x86 on a case-by-case basis;
the configs are similar enough.
There seems to be an incredible delay because gram:pool
needs to be copied to the nix cache;
this seems to depend on an 18.5 GB source directory,
so the disc seems to take an endless span to build.
I displaced /pool/disc on my local computer,
so nix should be able to proceed quicker - and yep!
Choose arch: x86_64
Choose kernel: linux
Choose machine: baseboard
warning: Git tree '/mnt/root/.nux' is dirty
warning: Git tree '/mnt/root/.nux' is dirty
copying channel...
building the flake in git+file:///mnt/root/.nux...
warning: Git tree '/mnt/root/.nux' is dirty
evaluation warning: You have set specialArgs.pkgs, which means that options like nixpkgs.config
and nixpkgs.overlays will be ignored. If you wish to reuse an already created
pkgs, which you know is configured correctly for this NixOS configuration,
please import the `nixosModules.readOnlyPkgs` module from the nixpkgs flake or
`(modulesPath + "/misc/nixpkgs/read-only.nix"), and set `{ nixpkgs.pkgs = <your pkgs>; }`.
This properly disables the ignored options to prevent future surprises.
evaluation warning: Using 'addressConfig' is deprecated! Move all attributes inside one level up and remove it.
evaluation warning: Using 'addressConfig' is deprecated! Move all attributes inside one level up and remove it.
evaluation warning: Using 'ipv6PrefixConfig' is deprecated! Move all attributes inside one level up and remove it.
installing the boot loader...
setting up /etc...
Created "/boot/EFI".
Created "/boot/EFI/systemd".
Created "/boot/EFI/BOOT".
Created "/boot/loader".
Created "/boot/loader/keys".
Created "/boot/loader/entries".
Created "/boot/EFI/Linux".
Copied "/nix/store/kiplbb6yv7rmjf21hf9ky01b9kmgmnqn-systemd-257.10/lib/systemd/boot/efi/systemd-bootx64.efi" to "/boot/EFI/systemd/systemd-bootx64.efi".
Copied "/nix/store/kiplbb6yv7rmjf21hf9ky01b9kmgmnqn-systemd-257.10/lib/systemd/boot/efi/systemd-bootx64.efi" to "/boot/EFI/BOOT/BOOTX64.EFI".
Random seed file /boot/loader/random-seed successfully written (32 bytes).
Created EFI boot entry "Linux Boot Manager".
setting up /etc...
setting up /etc...
setting root password...
New password:
Retype new password:
passwd: password updated successfully
installation finished!
Now to reboot and see...
Disc Label Problems.
All my nice code for binding disc labels to nix code seems to be... meh.
I guess I should say - and perhaps already did - that disc labels can be super fickle.
I spent around six build cycles trying to map the disc based on ID / UUID / label, by the disc and by the partition, and during the boot cycle none of these picked up. I have no idea how these labels are applied during phase one of nixos boot.
The piece I did know of is that my specific computer has an internal disc bay,
and that shows up as /dev/nvme0n1. This had been my final chance at a boot sequence.
I re-coded the nix machine command to take an aim,
and placed each partition according to the aim of the disc.
This means that the boot is going to be much less generic than I'd hoped,
and I do plan to come back to this problem,
and look deeper at the phase one boot sequence.
For now, I chose nvme0n1p from the menu and imaged the disc.
I ran shutdown now, opened the disc enclosure and my laptop,
and exchanged the memory cards.
I pressed the power cycle, and held my breath.
The disc booted.
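A side note on why nvme0n1p is the menu choice that works: when a Linux disc name ends in a number (nvme0n1, mmcblk0), the kernel places a p between the disc name and the partition number, while names like sdb take the number directly. The part helper in flash.nu glues the two pieces as given, so choosing nvme0n1p supplies the p by hand. A hedged sketch of a helper that adds the p on its own; part-path is a hypothetical name and is absent from gram:nue.

```nu
# Hypothetical replacement for part: add the "p" when the disc name ends in a number.
def part-path [disc: string, part: int] {
  let sep = if ($disc =~ '[0-9]$') { 'p' } else { '' }
  $"($disc)($sep)($part)"
}

part-path '/dev/nvme0n1' 3 | print
part-path '/dev/sdb' 3 | print
```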
Epilogue.
For the remainder of the day I'm going to be in manual mode, because I need to be on the road as soon as possible (as usual!). My main issues are to hook up networking, and then drop nebula certificates onto the machine so I can log in from anyplace.
From there, I'll be able to redeploy the app and share this saga with you all. Look for the code on gram:nue/flash.nu.