Longhorn

Distributed block storage for Kubernetes on all 4 worker nodes.

Overview

| Property | Value |
|---|---|
| CDK8s file | platform/cdk8s/cots/storage/longhorn.go |
| Namespace | longhorn-system |
| HTTPRoute | None |
| UI | No (internal Longhorn UI exists but not exposed) |
| Nodes | All 4 workers (k8s-worker1–4) |

Purpose

Longhorn provides distributed block storage (RWO PVCs) for the cluster. It replicates volumes across worker nodes for resilience and provides the longhorn StorageClass.
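
As an illustration of consuming the longhorn StorageClass, the sketch below prints a minimal RWO PersistentVolumeClaim manifest. The helper name render_pvc and the example name, namespace, and size are hypothetical and not taken from this cluster's configuration.

```shell
# Print a PVC manifest that requests an RWO volume from the longhorn StorageClass.
# Arguments: claim name, namespace, requested size (all illustrative).
render_pvc() {
  local name=$1 namespace=$2 size=$3
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: ${size}
EOF
}

# Hypothetical example values; prints the manifest to stdout.
render_pvc example-data default 10Gi
```

To create the claim, the output could be piped to `kubectl apply -f -`.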

Storage Capacity

| Node | Disk | Usable |
|---|---|---|
| k8s-worker1–4 | 125 GiB | ~100 GiB each |

Total usable capacity: ~400 GiB across 4 nodes (4 × ~100 GiB). With replication factor 3, every volume is stored three times, so effective capacity is ~133 GiB.
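
The capacity arithmetic above can be reproduced with a small helper. This is a rough sketch using the approximate figures from this page; the function name effective_capacity_gib is illustrative only.

```shell
# Effective capacity (GiB) = nodes * usable GiB per node / replica count.
# Shell integer division rounds the result down.
effective_capacity_gib() {
  local nodes=$1 usable_per_node=$2 replicas=$3
  echo $(( nodes * usable_per_node / replicas ))
}

# Figures from this page: 4 workers, ~100 GiB usable each, 3 replicas.
effective_capacity_gib 4 100 3   # prints 133
```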

Overprovisioning

storageOverProvisioningPercentage: 200

This allows 240 GiB to be scheduled per 120 GiB physical disk (200% = 2x overprovisioning). The default of 100% left only ~5 GiB of headroom per node, which is insufficient for large AI PVCs (ComfyUI uses 100 Gi).
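
A minimal sketch of the overprovisioning arithmetic follows. It is a simplification that ignores any space Longhorn reserves on the disk, and the function names are illustrative. The second helper assumes the live value is exposed as a Longhorn Setting object named storage-over-provisioning-percentage; it needs cluster access and is only defined here, not called.

```shell
# Schedulable GiB per disk = disk size * storageOverProvisioningPercentage / 100.
# Simplification: does not subtract reserved storage.
schedulable_gib() {
  local disk_gib=$1 percent=$2
  echo $(( disk_gib * percent / 100 ))
}

schedulable_gib 120 200   # prints 240
schedulable_gib 120 100   # prints 120

# Read the live setting from the cluster (assumed Setting name; not run here).
current_overprovisioning() {
  kubectl get settings.longhorn.io storage-over-provisioning-percentage \
    -n longhorn-system -o jsonpath='{.value}'
}
```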

Talos Configuration

| Setting | Value | Reason |
|---|---|---|
| createDefaultDiskLabeledNodes: false | n/a (creates on all nodes) | All workers provision Longhorn disk |
| preUpgradeChecker.jobEnabled: false | Disabled | Avoids GitOps upgrade check conflicts |

The longhorn-system namespace is labeled pod-security.kubernetes.io/enforce: privileged because the Longhorn DaemonSet requires host mounts.
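
One way to confirm the namespace label is to scan the output of `kubectl get namespace --show-labels`. The helper below is a sketch; the name has_privileged_enforce is hypothetical, and it only inspects text passed on stdin.

```shell
# Succeeds when the text on stdin contains the privileged enforce label
# in the key=value form printed by --show-labels.
has_privileged_enforce() {
  grep -q 'pod-security\.kubernetes\.io/enforce=privileged'
}

# Usage against a cluster (not run here):
#   kubectl get namespace longhorn-system --show-labels | has_privileged_enforce && echo ok
```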

Node Labels

All workers carry the label node.longhorn.io/create-default-disk: config, applied via the Talos worker machine patch in Pulumi. This tells Longhorn to provision its disk on every worker node automatically.
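
To check that every worker actually carries the label, the labeled nodes can be listed and counted. This is a sketch: check_labeled_workers is a hypothetical helper that counts node names on stdin, and the expected count of 4 comes from this page.

```shell
# Count non-empty lines on stdin (one node name per line) and compare
# against the expected number of labeled workers.
check_labeled_workers() {
  local expected=$1 actual
  actual=$(grep -c . || true)
  if [ "$actual" -eq "$expected" ]; then
    echo "ok: $actual labeled nodes"
  else
    echo "mismatch: $actual labeled nodes, expected $expected"
    return 1
  fi
}

# Usage against a cluster (not run here):
#   kubectl get nodes -l node.longhorn.io/create-default-disk=config -o name \
#     | check_labeled_workers 4
```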

Checking Volume Health

# List all Longhorn volumes
kubectl get volumes.longhorn.io -n longhorn-system

# Check volume state and robustness
kubectl get volumes.longhorn.io -n longhorn-system \
  -o custom-columns="NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness"

Healthy volumes show STATE=attached and ROBUSTNESS=healthy.
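
To surface only the volumes that need attention, the custom-columns output can be filtered. The sketch below assumes the three-column layout (NAME, STATE, ROBUSTNESS) produced by the command above; unhealthy_volumes is a hypothetical helper name. Volumes that are detached because no workload is using them would also be listed, so a match is a prompt to look closer rather than proof of a fault.

```shell
# Read the NAME/STATE/ROBUSTNESS table on stdin, skip the header row,
# and print every volume that is not both attached and healthy.
unhealthy_volumes() {
  awk 'NR > 1 && ($2 != "attached" || $3 != "healthy") { print $1, $2, $3 }'
}

# Usage against a cluster (not run here):
#   kubectl get volumes.longhorn.io -n longhorn-system \
#     -o custom-columns="NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness" \
#     | unhealthy_volumes
```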