GPU

NVIDIA RTX 5070 Ti setup: PCIe passthrough, Talos extensions, time-slicing.

Hardware

Node: k8s-worker4 (192.168.1.224)
GPU: NVIDIA RTX 5070 Ti (16 GB GDDR7 VRAM)
PCIe ID: 0000:09:00.0 (passthrough to VM)
vCPUs: 8 cores (dedicated AI node)
RAM: 16 GiB (16384 MB)
Disk: 250 GiB (extra space for AI model volumes)
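
A quick sanity check that the passthrough card is visible inside the VM (assuming talosctl access to the node and a Talos release that exposes PCI device resources):

# The RTX 5070 Ti should show up at 09:00.0
talosctl -n 192.168.1.224 get pcidevices | grep -i nvidia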

Talos GPU Extensions

The GPU worker uses a custom Talos image with two additional system extensions:

| Extension | Purpose |
| --- | --- |
| nvidia-open-gpu-kernel-modules-production | Open-source NVIDIA kernel driver, loaded as prebuilt kernel modules (nothing is compiled on the node) |
| nvidia-container-toolkit-production | Container runtime hook that configures containerd CDI automatically |

These extensions are baked into the custom Talos image and load at boot. No machine.files drop-ins are needed; the container toolkit extension configures containerd automatically. (Talos v1.10+ restricts machine.files writes to /var, so /etc/cri/conf.d/ is not writable.)
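
For reference, an Image Factory schematic along these lines produces such an image (a sketch; the exact schematic used for this node may differ):

# Talos Image Factory schematic (sketch) with both NVIDIA production extensions
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/nvidia-open-gpu-kernel-modules-production
      - siderolabs/nvidia-container-toolkit-production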

Time-Slicing

A single physical GPU is shared across multiple workloads (Ollama, ComfyUI, Kubeflow notebooks, training jobs) using NVIDIA GPU time-slicing. This is configured inline in the nvidia-device-plugin Helm values:

sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 5

Result: the node advertises 5 virtual nvidia.com/gpu resources from 1 physical GPU. VRAM is shared (not partitioned), so all workloads compete for the 16 GB pool.
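
To confirm the slices are actually being advertised, check the node's allocatable count, which should read 5:

# Should print 5 once time-slicing is active
kubectl get node k8s-worker4 -o jsonpath="{.status.allocatable['nvidia\.com/gpu']}"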

Resource Requests

| Workload | vCPU Limit | RAM Request | RAM Limit | GPU |
| --- | --- | --- | --- | --- |
| Ollama | 4000m | 2 Gi | 4 Gi | 1 |
| ComfyUI | 4000m | 1 Gi | 8 Gi | 1 |
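
As a sketch only, the Ollama container's resources stanza matching the row above would look roughly like this (the real manifests may set fields the table does not cover):

# Ollama container resources, mirroring the table above
resources:
  requests:
    memory: 2Gi
  limits:
    cpu: 4000m
    memory: 4Gi
    nvidia.com/gpu: "1"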

GPU Workload Configuration

All GPU workloads use nodeSelector: nvidia.com/gpu.present: "true" to pin to k8s-worker4. runtimeClassName: nvidia is required — without it the NVIDIA container hook does not fire and CUDA is inaccessible even with the nvidia.com/gpu resource requested.

runtimeClassName: nvidia
nodeSelector:
  nvidia.com/gpu.present: "true"
resources:
  limits:
    nvidia.com/gpu: "1"
env:
  - name: NVIDIA_VISIBLE_DEVICES
    value: all
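
Putting the pieces together, a minimal smoke-test pod might look like the sketch below (the pod name and CUDA image tag are illustrative, not taken from this cluster):

# Minimal GPU smoke-test pod (illustrative names)
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test
spec:
  runtimeClassName: nvidia
  nodeSelector:
    nvidia.com/gpu.present: "true"
  restartPolicy: Never
  containers:
    - name: nvidia-smi
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: "1"

If the pod's logs show the RTX 5070 Ti, the runtime class, node selector and device plugin are all wired up correctly.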

Kernel Modules

The GPU worker's Talos machine patch loads these modules at boot:

nvidia
nvidia_uvm
nvidia_drm
nvidia_modeset
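
In machine-config terms this corresponds to a patch along the following lines (a sketch of the machine.kernel.modules stanza):

# Talos machine patch sketch: load the NVIDIA modules at boot
machine:
  kernel:
    modules:
      - name: nvidia
      - name: nvidia_uvm
      - name: nvidia_drm
      - name: nvidia_modeset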

DCGM Exporter

A DCGM Exporter DaemonSet runs on the GPU node to export GPU metrics (utilisation, VRAM usage, temperature, power draw) to VictoriaMetrics via VMAgent. It also creates a Grafana dashboard ConfigMap.

# Check GPU metrics in Grafana: look for the DCGM dashboard
# Or query directly:
kubectl get pods -n nvidia-gpu-operator -l app=dcgm-exporter
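
Once VMAgent scrapes the exporter, the standard dcgm-exporter metric names can be queried directly in VictoriaMetrics; the exact node label to filter on depends on the relabeling in use:

# GPU utilisation (%) and framebuffer memory in use (MiB)
DCGM_FI_DEV_GPU_UTIL
DCGM_FI_DEV_FB_USED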