Austin Rose

Home k3s lab

Live · Updated 2026-05-05

The on-prem path. A small k3s cluster (one Apple Silicon Mac Mini control plane, two Raspberry Pi 4 workers, a UNAS Pro NAS) that serves preview.home.austinrose.xyz and acts as the staging gate before AWS and GCP. Same GitOps discipline as the cloud paths, smaller everything.

Mission objective

Run the same GitOps discipline at home that the cloud paths are patterned on, against ARM hardware and a NAS, on my own dollars.

Threats
  • Single-node control plane (the Mini); no HA control-plane today
  • NFS v3 limitation on UNAS: cross-volume hardlinks fail (atomic-move workaround in the *arr stack)
  • Secret sprawl across ~22 ExternalSecrets without per-app audit
  • One bad image promotion silently rolling to preview
Go / No-go criteria
  • Push-to-live under 10 minutes from `git push origin preview` to preview.home.austinrose.xyz
  • Zero secrets in git (SOPS for bootstrap, 1Password Connect via External Secrets Operator for runtime)
  • CNPG WAL backed up to R2 within 15 minutes of write
  • Full cluster rebuild from `ansible/site.yaml` in under an hour
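
The criteria above are thresholds, so they can be checked mechanically. A minimal sketch, assuming the measurements are collected elsewhere; the `Measurements` type, its field names, and the `evaluate` and `is_go` helpers are hypothetical and not part of home-ops. Only the thresholds come from the list.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict

# Thresholds taken from the Go / No-go list.
PUSH_TO_LIVE_MAX = timedelta(minutes=10)
WAL_BACKUP_MAX = timedelta(minutes=15)
REBUILD_MAX = timedelta(hours=1)


@dataclass(frozen=True)
class Measurements:
    """Hypothetical container for observed values."""
    push_to_live: timedelta    # git push -> new pod serving on preview
    wal_backup_lag: timedelta  # WAL write -> object present in R2
    rebuild: timedelta         # full run of ansible/site.yaml
    secrets_in_git: int        # plaintext secrets found by a repo scan


def evaluate(m: Measurements) -> Dict[str, bool]:
    """Return one pass/fail flag per criterion."""
    return {
        "push_to_live": m.push_to_live < PUSH_TO_LIVE_MAX,
        "zero_secrets_in_git": m.secrets_in_git == 0,
        "wal_backup": m.wal_backup_lag <= WAL_BACKUP_MAX,
        "rebuild": m.rebuild < REBUILD_MAX,
    }


def is_go(m: Measurements) -> bool:
    """Go only if every criterion passes."""
    return all(evaluate(m).values())
```

Returning a flag per criterion rather than a single boolean keeps the failing gate visible when the answer is no-go.
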
Lessons learned
  • MariaDB service `<cluster>-primary` only exists when replicas > 1; with a single replica the name returns NXDOMAIN and the app silently falls back to SQLite. birdnet-go burned an evening on this.
  • Per-app restic repos (one S3 subpath each) prevent one app's `forget` from pruning another's snapshots.
  • Flux native image automation beats Renovate for this workload; the contract is `highest run-N` and the controller writes its own commit.
  • UNAS Pro exports NFS v3 only; StorageClass `nfs-unas` must use `nfsvers=3` (no `nconnect`).
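
The MariaDB lesson comes down to failing loudly when the database host does not resolve, instead of letting the app pick a fallback. A minimal sketch of a startup check, assuming the app can run one before opening its database; `require_db_host` and the injectable `resolve` parameter are hypothetical names, not something birdnet-go ships.

```python
import socket
from typing import Callable

Resolver = Callable[[str], list]


def system_resolver(host: str) -> list:
    """Resolve via the OS resolver; raises socket.gaierror on NXDOMAIN."""
    return socket.getaddrinfo(host, None)


def require_db_host(host: str, resolve: Resolver = system_resolver) -> str:
    """Return host if it resolves; raise instead of allowing a silent fallback."""
    try:
        resolve(host)
    except socket.gaierror as exc:
        raise RuntimeError(
            f"database host {host!r} does not resolve: {exc}"
        ) from exc
    return host
```

The resolver is a parameter so the check can be exercised without a cluster. In a pod, the default goes through cluster DNS, which is where a missing `<cluster>-primary` Service would surface.
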
Built for this
  • Three-layer architecture in ../home-ops (tofu / ansible / kubernetes-flux), each with its own lifecycle and blast radius
  • Image-promotion contract: this repo emits `:run-N`; home-ops Flux watches and rolls
  • Hybrid SOPS + 1Password secret discipline (6 bootstrap secrets in SOPS, ~22 runtime secrets via External Secrets Operator)
  • Two-tier R2 + UNAS backup strategy with per-app restic isolation
Built on
  • k3s 1.33, Flux 2.8 (flux-operator + FluxInstance)
  • bjw-s/app-template v4 for ~38 HelmReleases
  • CloudNativePG (two clusters), MariaDB Operator, Dragonfly
  • Volsync + restic + Cloudflare R2; UNAS Pro (NFS v3)
  • Traefik, kube-vip, Tailscale Operator, cert-manager, External Secrets Operator, 1Password Connect
Topology
tofu/

External APIs that live outside the cluster lifecycle.

  • Cloudflare (zone settings, DNS, R2 state bucket, Zero Trust tunnel)
  • GitHub (repo settings, branch protection, Actions secrets)
  • Tailscale (ACL HuJSON, split-DNS, OAuth clients)
  • UniFi (static-DNS records for kube-vip ingress)
  • 1Password (item lookups; outputs piped back post-apply)
  • State backend: Cloudflare R2 with native locking
ansible/

Cluster bootstrap from bare hardware to a green k3s cluster.

  • 25 roles; ordered playbooks 00 prerequisites to 99 validate
  • Lima VM provisioning on the Mac Mini control plane
  • Pi prep: cgroups in cmdline.txt, zram, log2ram, swap off
  • k3s server + agent install via the k3s-io/k3s_ansible role
  • Platform stack in dependency order: cert-manager, External Secrets Operator, 1Password Connect, kube-vip, Tailscale Operator, flux-operator
  • Smoke-test pass before handing off to Flux
kubernetes/ + Flux 2.8

GitOps runtime that owns everything from operators to apps.

  • FluxInstance CRD via flux-operator (not bare `flux bootstrap`)
  • GitRepository points at auzroz/home-ops main; Kustomization syncs kubernetes/apps
  • Reconcile interval 10 minutes; cluster-settings ConfigMap supplies substitutions
  • ~38 HelmReleases across 12 namespaces
  • ~22 ExternalSecrets via 1Password Connect through ESO `external-secrets.io/v1`
  • Image Automation watches GHCR for the austinrose-me preview image

Three layers, each with its own lifecycle and blast radius. Source-of-truth lives in the sibling ../home-ops repo; this page is a curated view.

By the numbers
  • 38 HelmReleases
  • 12 namespaces
  • 22 ExternalSecrets
  • 25 Ansible roles
  • 5 tofu providers
  • 2 storage classes
Hardware
  • Mac Mini (Apple Silicon, 16GB) ×1 · control plane

    Debian 13 Trixie arm64 in a Lima VM

    9GB allocated to the VM; Tailscale subnet router runs on the Lima VM, not the Pis.

  • Raspberry Pi 4 (8GB) ×2 · workers

    Pi OS Lite 64-bit Trixie

    USB3 SSD/stick boot; cgroups added to cmdline.txt; log2ram + zram for SD-card protection.

  • UniFi UNAS Pro (40TB usable) ×1 · NAS

    vendor firmware

    NFS v3 only (vendor limitation); offsite replication of select shares to Google Drive.

Storage
  • local-path-mac-mini · Local SSD on the Mac Mini, pinned by nodePath

    Databases, SQLite, app config, small caches

    Default class for stateful workloads with strict consistency requirements.

  • nfs-unas · NFS v3 to the UNAS Pro

    App data, media libraries (mounted as-is), Volsync caches

    mountOptions vers=3,proto=tcp,hard,noatime,rsize=1048576,wsize=1048576,timeo=600,retrans=2 (no nconnect; UNAS Pro v3 limitation).
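
Because a stray `nconnect` or a newer NFS version would break mounts against the UNAS Pro, the option string is worth linting before it reaches the StorageClass. A minimal sketch; `parse_mount_options` and `check_nfs_unas` are hypothetical helpers, and the option string is copied from the line above.

```python
from typing import Dict, List, Optional

MOUNT_OPTIONS = (
    "vers=3,proto=tcp,hard,noatime,"
    "rsize=1048576,wsize=1048576,timeo=600,retrans=2"
)


def parse_mount_options(raw: str) -> Dict[str, Optional[str]]:
    """Split a comma-separated option string into {name: value or None}."""
    options: Dict[str, Optional[str]] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        options[name] = value if sep else None  # flags like `hard` carry no value
    return options


def check_nfs_unas(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the options look right."""
    opts = parse_mount_options(raw)
    problems: List[str] = []
    # `vers` and `nfsvers` are synonyms in nfs(5), so accept either spelling.
    version = opts.get("vers") or opts.get("nfsvers")
    if version != "3":
        problems.append(f"expected NFS v3, got {version!r}")
    if "nconnect" in opts:
        problems.append("nconnect must not be set for the UNAS Pro")
    return problems
```

`check_nfs_unas(MOUNT_OPTIONS)` returns an empty list for the string above. Accepting both `vers` and `nfsvers` means the check does not care which spelling the manifest uses.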

Ingress tiers
  • <app>.austinrose.xyz · traefik

    Public via Cloudflared Zero Trust tunnel

    Cert: cert-manager (letsencrypt-prod, DNS-01 via Cloudflare)

  • <app>.home.austinrose.xyz · traefik

    Private LAN plus tailnet

    Cert: cert-manager (shared wildcard via DNS-01)

  • <app>.<tailnet>.ts.net · tailscale

    Tailnet only (legacy, marked for cleanup)

    Cert: Tailscale auto-provisioned
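
The three tiers can be told apart by hostname suffix alone, with one catch: the private zone is itself a subdomain of the public one, so it has to be matched first. A minimal sketch; `ingress_tier` and the tier labels `public`, `private`, and `tailnet` are my own names for illustration.

```python
PUBLIC_ZONE = "austinrose.xyz"
PRIVATE_ZONE = "home.austinrose.xyz"
TAILNET_SUFFIX = ".ts.net"


def ingress_tier(hostname: str) -> str:
    """Classify a hostname into one of the ingress tiers by its suffix."""
    host = hostname.lower().rstrip(".")
    # Most specific zone first: *.home.austinrose.xyz also ends in .austinrose.xyz.
    if host.endswith("." + PRIVATE_ZONE):
        return "private"
    if host.endswith("." + PUBLIC_ZONE):
        return "public"
    if host.endswith(TAILNET_SUFFIX):
        return "tailnet"
    return "unknown"
```

With this ordering, `ingress_tier("preview.home.austinrose.xyz")` returns `"private"`. Swapping the first two checks would misfile every private hostname as public.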

Workloads
automation (2)
  • n8n

    Workflow automation; Postgres-backed (shared-postgres); pinned to Mini; public webhooks via Cloudflared tunnel.

  • actions-runner-controller

    Self-hosted GitHub Actions runners; the runner scale set handles custom builds for the homepage repo.

backups (4)
  • volsync

    Per-app restic-backed ReplicationSources to R2; copyMethod Direct, pruneIntervalDays 7.

  • restic-rest-server

    In-cluster restic REST endpoint backed by NFS for bulk Volsync repos.

  • reflector

    Cross-namespace replication of base secrets and configmaps so per-app namespaces stay self-contained.

  • reloader

    Watches ConfigMap and Secret changes; restarts pods on update so config rotation is uneventful.

databases (7)
  • cloudnative-pg

    CNPG operator and Cluster CRDs; the relational substrate for the cluster.

  • shared-postgres

    PostgreSQL 16 cluster (single replica, pinned to Mini) for Paperless, Mealie, n8n; ScheduledBackup to R2.

  • immich-postgres

    PostgreSQL 16 cluster with VectorChord extension for Immich face/CLIP search; ScheduledBackup to R2.

  • mariadb-operator

    MariaDB operator + CR; single-instance cluster pinned to Mini for birdnet-go.

  • mariadb-cluster

    Single-instance MariaDB; ScheduledBackup to NFS nightly.

  • dragonfly

    Redis-API drop-in cache for Immich; pinned to Mini; small footprint.

  • phpmyadmin

    Web admin UI for the MariaDB cluster.

documents (3)
  • paperless

    Document scan + OCR + tagging; index on local-path, media on NFS; Volsync daily.

  • mealie

    Recipe manager backed by shared-postgres; Volsync covers app state.

  • homebox

    Household inventory; SQLite on NFS; Volsync every 6 hours.

media (6)
  • plex

    Media server; metadata on local-path, libraries mounted as-is from NFS shares; Pi-ineligible.

  • tautulli

    Plex monitoring and notifications.

  • sonarr

    TV automation; hardlinks disabled (NFS volume isolation), atomic-move used instead.

  • radarr

    Movie automation; same hardlink-free posture as Sonarr.

  • sabnzbd

    Usenet downloader; downloads to NFS, no seeding semantics.

  • prowlarr

    Indexer aggregator for the *arr suite.

monitoring (1)
  • birdnet-go

    ML bird-call classifier from a backyard mic; persists to MariaDB.

network (2)
  • traefik-overrides

    Overrides only; k3s default Traefik kept (not replaced with nginx).

  • cloudflared

    Zero Trust tunnel for public ingress; rules in tofu/cloudflare/tunnel.tf.

observability (6)
  • loki

    Log aggregation; index on local-path, chunks via Volsync.

  • grafana

    Visualization; backed by shared-postgres for state.

  • alloy

    DaemonSet metrics + log forwarder; ships to Loki.

  • gatus

    Endpoint health checker; lightweight status board.

  • headlamp

    Browser-based Kubernetes admin UI for cluster inspection.

  • homepage

    Dashboard linking to the cluster's other surfaces; not the same homepage as this site.

photos (2)
  • immich

    Self-hosted photo backup; immich-postgres for vector search, Dragonfly for caching, NFS RWX library.

  • immich-public-proxy

    Public-share proxy for Immich albums via Cloudflared tunnel.

portfolio (1)
  • austinrose-me

    This site, on the home cluster path. Static nginx; image promoted by Flux Image Automation from GHCR.

registry (1)
  • zot

    OCI-compliant container registry on NFS; reduces external pulls for in-cluster images.

tools (2)
  • rustdesk

    Self-hosted remote desktop server; keypair on local-path, exposed via TailVIP.

  • scrypted

    HomeKit bridge; hostNetwork for Bonjour/mDNS; LevelDB on NFS, single-replica Recreate rollout.

Observability
  • Alloy

    DaemonSet metrics collector and log forwarder; ships to Loki and Prometheus when configured

  • Loki

    Log aggregation; queried via Grafana

  • Grafana

    Visualization; admin UI on the private hostname tier

  • Gatus

    Endpoint health checker (datasource not yet wired)

  • Headlamp

    Kubernetes admin UI for direct API access

  • Homepage

    Dashboard with links to the rest of the cluster's surfaces

Secrets
Tier 0
6 bootstrap secrets in ansible/inventory/group_vars/all.sops.yaml (SOPS + age)
Tier 1
~22 runtime secrets from 1Password Connect, synced by External Secrets Operator (`external-secrets.io/v1`)
Encryption
SOPS + age for tier 0 (public key in .sops.yaml, private key off-cluster); ESO ClusterSecretStore for tier 1

No sealed-secrets, no external KMS. Two layers handle two different problems: SOPS bootstraps the cluster (including the 1P Connect credentials themselves), ESO owns runtime secrets for the apps. Fewer moving parts and a clean separation between bootstrap and steady state.

Backups
Critical
Cloudflare R2 · CNPG WAL plus base (continuous), tofu state, rustdesk keypair, Scrypted HomeKit pairing, n8n credentials. Total ~1GB.
Bulk
UNAS via in-cluster restic-rest-server · Plex config, Loki chunks, *arr configs, app data, Volsync repos. Total ~10GB.
Offsite
UNAS replicates select shares to Google Drive on a manual cadence.
Isolation
Per-app restic repositories (one S3 subpath each) prevent one app's `forget` from pruning another's snapshots.
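
The isolation follows from how the repository URL is built: `restic forget` acts on one repository, so snapshots under a different subpath are out of its reach. A minimal sketch of deriving one repository per app; `restic_repository`, `repositories`, and the endpoint and bucket values used with them are hypothetical, not the real R2 settings.

```python
from typing import Dict, Iterable


def restic_repository(endpoint: str, bucket: str, app: str) -> str:
    """Build a restic S3 repository URL with one subpath per app."""
    if not app or "/" in app:
        raise ValueError(f"invalid app name: {app!r}")
    return f"s3:{endpoint.rstrip('/')}/{bucket}/{app}"


def repositories(endpoint: str, bucket: str, apps: Iterable[str]) -> Dict[str, str]:
    """Map each app to its own repository; reject duplicate app names."""
    result: Dict[str, str] = {}
    for app in apps:
        if app in result:
            raise ValueError(f"duplicate app: {app!r}")
        result[app] = restic_repository(endpoint, bucket, app)
    return result
```

Rejecting a slash in the app name keeps one app from being nested inside another's subpath, which would blur the boundary the layout is meant to enforce.
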
Worked example · how this site reaches the home cluster
  1. `git push origin preview` to auzroz/austinrose-me.
  2. deploy-preview.yml builds the Next.js static export on the runner; Puppeteer renders the dual-mode CV PDFs in the same job.
  3. Multi-arch (amd64 + arm64) container build; push to private ghcr.io/auzroz/austinrose-me with `:latest`, `:run-N`, `:<sha>` tags.
  4. Flux ImageRepository in home-ops polls GHCR every 5 minutes (read PAT from 1Password).
  5. ImagePolicy selects the highest `run-N` (numeric ascending).
  6. ImageUpdateAutomation commits the tag bump to auzroz/home-ops main using a fine-grained push PAT from 1Password.
  7. Flux GitRepository reconciles the new commit; HelmRelease in the portfolio namespace rolls.
  8. Traefik on kube-vip (LAN plus tailnet) serves the new pod; cert-manager wildcard cert handles TLS.
  9. Container entrypoint stamps /origin.json with `{cloud: k3s, region: lab}`; the footer reflects it.
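
The `highest run-N` step is the part of this flow that is easiest to get subtly wrong, because tags sort differently as strings than as numbers. A minimal sketch of the selection rule in Python; it mirrors the idea of a tag filter plus numeric ordering and is not the Flux controller's code. `highest_run_tag` is a hypothetical helper.

```python
import re
from typing import Iterable, Optional, Tuple

# Only tags of the exact form run-<digits> take part in the contract.
RUN_TAG = re.compile(r"^run-(?P<n>[0-9]+)$")


def highest_run_tag(tags: Iterable[str]) -> Optional[str]:
    """Return the run-N tag with the largest N, or None if there is none."""
    best: Optional[Tuple[int, str]] = None
    for tag in tags:
        match = RUN_TAG.match(tag)
        if match is None:
            continue  # skips :latest and :<sha> tags
        n = int(match.group("n"))  # compare as integers, not strings
        if best is None or n > best[0]:
            best = (n, tag)
    return None if best is None else best[1]
```

Given `["latest", "run-9", "run-10"]`, this returns `"run-10"`, whereas a plain string `max` over the two run tags would pick `"run-9"`. That difference is why the policy has to be numeric.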

Cross-link: /lab/platform documents the workflows on the source side of this contract.