GPU

NVIDIA RTX 5070 Ti setup: PCIe passthrough, Talos extensions, time-slicing.

Hardware

Node: k8s-worker4 (192.168.1.224)
GPU: NVIDIA RTX 5070 Ti (16 GB GDDR7 VRAM)
PCIe ID: 0000:09:00.0 (passthrough to VM)
vCPUs: 8 cores (dedicated AI node)
RAM: 16 GiB (16384 MB)
Disk: 250 GiB (extra space for AI model volumes)

Talos GPU Extensions

The GPU worker uses a custom Talos image with two additional system extensions:

Extension	Purpose
`nvidia-open-gpu-kernel-modules-production`	Open-source NVIDIA kernel driver (shipped as prebuilt kernel modules; nothing is compiled on the node)
`nvidia-container-toolkit-production`	Container runtime hook — configures containerd CDI automatically

These extensions are baked into the Talos image and are active from first boot. No machine.files drop-ins are needed because the container toolkit extension configures containerd automatically. (Talos v1.10+ restricts machine.files writes to /var, so /etc/cri/conf.d/ is not writable.)
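
As an illustration, a custom image with these two extensions could be described by a Talos Image Factory schematic like the one below. This is a sketch that assumes the image is built through Image Factory; the page does not say how the image is actually produced, and the `siderolabs/` prefix is the naming Image Factory uses for official extensions.

```yaml
# Hypothetical Image Factory schematic (assumes Image Factory is the build path)
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/nvidia-open-gpu-kernel-modules-production
      - siderolabs/nvidia-container-toolkit-production
```

Once the node has booted from the resulting image, `talosctl get extensions` against the node should list both extensions.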

Time-Slicing

A single physical GPU is shared across multiple workloads (Ollama, ComfyUI, Kubeflow notebooks, training jobs) using NVIDIA GPU time-slicing. This is configured inline in the nvidia-device-plugin Helm values:

sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 5

Result: the node advertises 5 virtual nvidia.com/gpu resources from 1 physical GPU. VRAM is shared (not partitioned), so all workloads compete for the 16 GB pool.
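
For context, here is a sketch of where the time-slicing fragment might sit in the chart values. It assumes the nvidia-device-plugin chart's `config.map` mechanism; the config name `default` is an assumption, and the actual values file may be laid out differently.

```yaml
# Sketch of nvidia-device-plugin Helm values (config name "default" is assumed)
config:
  map:
    default: |-
      version: v1
      sharing:
        timeSlicing:
          resources:
            - name: nvidia.com/gpu
              replicas: 5   # 1 physical GPU advertised as 5 schedulable GPUs
```

Each replica is only a scheduling slot. Time-slicing does not cap VRAM per pod, so five pods that each hold a large model can still exhaust the 16 GB pool.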

Resource Requests

Workload	CPU Limit	RAM Request	RAM Limit	GPU
Ollama	4000m	2 Gi	4 Gi	1
ComfyUI	4000m	1 Gi	8 Gi	1
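
As a rough sketch, the Ollama row above corresponds to a container `resources` block like the following. The field layout is standard Kubernetes; that the real manifest looks exactly like this is an assumption.

```yaml
# Container-level resources for the Ollama row (illustrative)
resources:
  requests:
    memory: 2Gi
  limits:
    cpu: 4000m
    memory: 4Gi
    nvidia.com/gpu: "1"   # extended resources are set under limits; the request defaults to the same value
```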

GPU Workload Configuration

All GPU workloads set the nodeSelector nvidia.com/gpu.present: "true", which pins them to k8s-worker4. runtimeClassName: nvidia is also required: without it the NVIDIA container hook does not fire and CUDA is inaccessible, even when the nvidia.com/gpu resource is requested.

# Pod-level fields (under spec)
runtimeClassName: nvidia
nodeSelector:
  nvidia.com/gpu.present: "true"
# Container-level fields (under spec.containers[])
resources:
  limits:
    nvidia.com/gpu: "1"
env:
  - name: NVIDIA_VISIBLE_DEVICES
    value: all
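
To show how these fields fit together, here is a minimal smoke-test sketch: a RuntimeClass plus a one-shot pod that runs `nvidia-smi`. The pod name, container name, and image tag are illustrative assumptions, and the cluster may already define the `nvidia` RuntimeClass in some other way.

```yaml
# RuntimeClass that maps runtimeClassName: nvidia to the NVIDIA containerd handler
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
---
# Hypothetical one-shot pod; name and image are assumptions
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  runtimeClassName: nvidia
  nodeSelector:
    nvidia.com/gpu.present: "true"
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.5.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: "1"   # consumes one of the time-sliced replicas
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
```

If the hook fires, `kubectl logs gpu-smoke-test` should print the usual `nvidia-smi` table showing the GPU. If `runtimeClassName` is removed, the container would typically fail to find `nvidia-smi` at all.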

Kernel Modules

The GPU worker's Talos machine patch loads these modules at boot:

nvidia
nvidia_uvm
nvidia_drm
nvidia_modeset
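
In Talos machine config terms, a patch that loads these modules would look roughly like this. It is a sketch of the standard `machine.kernel.modules` form; the actual patch file may contain additional settings.

```yaml
# Talos machine config patch: load the NVIDIA modules at boot
machine:
  kernel:
    modules:
      - name: nvidia
      - name: nvidia_uvm
      - name: nvidia_drm
      - name: nvidia_modeset
```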

DCGM Exporter

A DCGM Exporter DaemonSet runs on the GPU node and exports GPU metrics (utilisation, VRAM usage, temperature, power draw) to VictoriaMetrics via VMAgent. The deployment also includes a Grafana dashboard ConfigMap.

# Check GPU metrics in Grafana: look for the DCGM dashboard
# Or confirm that the exporter pod is running:
kubectl get pods -n nvidia-gpu-operator -l app=dcgm-exporter
