Longhorn

Distributed block storage for Kubernetes on all 4 worker nodes.

What is Longhorn?

Longhorn is a lightweight, cloud-native distributed block storage system for Kubernetes. It provides persistent volumes that are automatically replicated across multiple nodes, with a built-in UI for volume management, snapshots, and backups.

Why Longhorn?

Option     | Pros                                                               | Cons
Longhorn   | Lightweight, no dedicated disks needed, RWX via NFS share-manager | Single vendor (Rancher/SUSE)
Rook-Ceph  | Production-grade, feature-rich                                     | Requires dedicated disks, complex setup, heavy resource usage
NFS        | Simple                                                             | No HA, single point of failure
Local-path | Zero overhead                                                      | No replication, node-local only

For a 4-worker homelab without dedicated storage disks, Longhorn's ability to use a portion of each worker's OS disk makes it the right choice. Rook-Ceph would require separate disk devices on each node.

How It's Used Here

Longhorn provides the longhorn StorageClass used by virtually every stateful workload:

  • Harbor (registry + jobservice — RWX)
  • Grafana (dashboards — RWX)
  • VictoriaMetrics (metrics storage — RWO)
  • VictoriaLogs (log storage — RWO)
  • Ollama (model storage — RWO, 100 Gi)
  • ComfyUI (models + outputs — RWX, 50 Gi)
  • n8n (workflow data — RWX)
  • OpenBao (secrets data — RWO)
  • NetBird (config — RWO, 100 Mi)

RWX volumes use Longhorn's built-in NFS share-manager — Longhorn automatically provisions an NFS server pod for each RWX volume. This eliminates rolling update deadlocks that occur with RWO volumes when a new pod starts before the old pod releases the volume.
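
In practice the only difference on the consumer side is the accessModes field on the claim. A minimal sketch of an RWX claim against the longhorn StorageClass (the name, namespace, and size here are illustrative, not taken from the repo):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-shared-data     # hypothetical name
  namespace: default
spec:
  accessModes:
    - ReadWriteMany              # served by a Longhorn NFS share-manager pod
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi
EOF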

Source: workloads/storage/longhorn.go

Configuration

Setting                           | Value      | Why
defaultReplicaCount               | 3          | Replicate every volume across 3 nodes for resilience
storageOverProvisioningPercentage | 300        | Allow 3x scheduling vs physical capacity
createDefaultDiskLabeledNodes     | false      | Create a default disk on ALL nodes, not just labeled ones
preUpgradeChecker.jobEnabled      | false      | Disable the pre-upgrade hook for GitOps compatibility
Namespace PSA                     | privileged | Longhorn CSI driver requires host mounts

Why 300% overprovisioning? Workers have ~90–95 GiB actually free. At 200% (the previous setting), the 240 GiB scheduling cap was exhausted — there was no room for new 50 GiB AI model replicas. At 300%, each node schedules up to ~285 GiB, providing ~150 GiB of headroom for AI volumes and future growth. Longhorn does not actually allocate all scheduled space immediately; volumes grow on demand.
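
In Helm terms these settings map to the chart's defaultSettings block. A sketch assuming the upstream longhorn/longhorn chart value names; the actual values are rendered from workloads/storage/longhorn.go:

# assuming the chart repo is added as 'longhorn'
helm upgrade --install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --set defaultSettings.defaultReplicaCount=3 \
  --set defaultSettings.storageOverProvisioningPercentage=300 \
  --set defaultSettings.createDefaultDiskLabeledNodes=false \
  --set preUpgradeChecker.jobEnabled=false

# The namespace PSA level is a label on the namespace, not a chart value
kubectl label namespace longhorn-system pod-security.kubernetes.io/enforce=privileged --overwrite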

Storage Capacity

Node              | Disk    | Usable for Longhorn
k8s-worker1       | 200 GiB | ~90–95 GiB
k8s-worker2       | 200 GiB | ~90–95 GiB
k8s-worker3       | 200 GiB | ~90–95 GiB
k8s-worker4 (GPU) | 250 GiB | ~90–95 GiB

With replication factor 3 and 4 nodes, each volume's data is replicated to 3 of the 4 workers.
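
To compare scheduled space against physical capacity per node, the Longhorn node objects expose per-disk counters (field names assumed from the Longhorn node status: storageScheduled, storageAvailable, storageMaximum):

kubectl get nodes.longhorn.io -n longhorn-system
kubectl get nodes.longhorn.io k8s-worker1 -n longhorn-system -o yaml \
  | grep -E 'storageScheduled|storageAvailable|storageMaximum'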

How It Connects

Pod (any namespace)
  → PVC with storageClassName: longhorn
  → Longhorn CSI Driver (DaemonSet on each node)
  → Longhorn Manager (coordinates replicas)
  → Volume replicated across 3 workers
  → (RWX) Longhorn NFS share-manager pod bridges multiple pod attachments
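
Each link in that chain is visible from kubectl (component names as deployed by the standard Longhorn chart):

kubectl get daemonset longhorn-manager longhorn-csi-plugin -n longhorn-system
kubectl get pods -n longhorn-system | grep share-manager   # one pod per attached RWX volume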

Screenshots

Longhorn volume list showing attached volumes and replica health

Troubleshooting

Volume Stuck Attaching

Symptoms: The PVC stays in Pending or the pod is stuck in ContainerCreating.

Diagnosis:

kubectl get volumes.longhorn.io -n longhorn-system
kubectl describe pvc <pvc-name> -n <namespace>
kubectl get pods -n longhorn-system | grep share-manager  # for RWX

Fix:

# Force detach the volume in Longhorn UI, then delete the stuck pod
kubectl delete pod -n <namespace> <pod-name> --grace-period=0 --force
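
If the volume still reports as attached after the pod is gone, a stale VolumeAttachment object may be holding it; check for one and, as a last resort, delete it:

kubectl get volumeattachments | grep <pv-name>
kubectl delete volumeattachment <attachment-name>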

Degraded Replicas

Symptoms: kubectl get volumes.longhorn.io shows ROBUSTNESS=degraded.

Diagnosis:

kubectl get volumes.longhorn.io -n longhorn-system \
  -o custom-columns="NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness"

Fix: Check whether a worker node is offline. If a node is down, Longhorn automatically rebuilds the degraded replica on another available node once one can be scheduled; this may take several minutes.
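
To watch the rebuild, list the replica objects for the affected volume and watch the volume's robustness recover:

kubectl get replicas.longhorn.io -n longhorn-system | grep <volume-name>
kubectl get volumes.longhorn.io <volume-name> -n longhorn-system -w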

RWO Multi-Attach Deadlock

Symptoms: Pod stuck with Multi-Attach error for volume — volume is already used by pod.

Why this happens: RWO (ReadWriteOnce) volumes can only be attached to one node at a time. During a rolling update, the new pod may start on a different node before the old pod fully terminates and releases the volume.

Fix for Harbor (and similar):

# 1. Find the old ReplicaSet
kubectl get replicasets -n harbor

# 2. Scale down the old RS
kubectl scale replicaset <old-rs-name> -n harbor --replicas=0

# 3. Force delete any stuck pod
kubectl delete pod -n harbor <stuck-pod> --grace-period=0 --force

Long-term fix: Switch the PVC to ReadWriteMany — Harbor's registry and jobservice PVCs are already set to RWX in this homelab for exactly this reason.
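
Note that accessModes on an existing PVC are immutable, so switching to RWX means recreating the claim (and migrating its data). To confirm which PVCs are still RWO:

kubectl get pvc -n harbor \
  -o custom-columns="NAME:.metadata.name,MODES:.spec.accessModes,STORAGECLASS:.spec.storageClassName"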

Checking Volume Health

# List all Longhorn volumes
kubectl get volumes.longhorn.io -n longhorn-system

# Check volume state and robustness
kubectl get volumes.longhorn.io -n longhorn-system \
  -o custom-columns="NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness"

# Healthy volumes show:
# STATE=attached   ROBUSTNESS=healthy