OpenTelemetry

Two-tier OTel collection pipeline: DaemonSet agent per node and Gateway deployment.

What is OpenTelemetry?

OpenTelemetry (OTel) is a vendor-neutral observability framework for collecting metrics, logs, and traces. The OpenTelemetry Collector is a configurable pipeline that receives telemetry data, processes it, and exports it to backends.
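A Collector pipeline is declared in YAML: receivers ingest data, processors transform it, and exporters ship it, wired together in the `service` section. A minimal illustrative sketch (the backend endpoint here is a placeholder, not one of this cluster's services):

```yaml
receivers:
  otlp:                 # accept OTLP from instrumented apps
    protocols:
      grpc:
      http:

processors:
  batch: {}             # group telemetry into batches before export

exporters:
  otlphttp:
    endpoint: http://backend.example:4318   # placeholder backend

service:
  pipelines:
    traces:
      receivers:  [otlp]
      processors: [batch]
      exporters:  [otlphttp]
```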

Why OpenTelemetry?

OTel provides a unified collection pipeline for both metrics and logs, replacing separate tools like Fluent Bit and Prometheus exporters. Using the otel/opentelemetry-collector-contrib image provides access to all Kubernetes-specific receivers (kubeletstats, k8s_cluster, filelog, k8sobjects) in a single binary.

How It's Used Here

Two OTel Collector instances work together:

| Instance | Mode | Purpose |
|---|---|---|
| otel-agent | DaemonSet | Runs on every node; collects container logs, kubelet metrics, host metrics |
| otel-gateway | Deployment | Cluster-scoped; collects k8s resource metrics and events |

Both export:

  • Metrics → VictoriaMetrics via Prometheus remote-write
  • Logs → VictoriaLogs via OTLP/HTTP

Source: workloads/observability/otel_collector.go

Agent DaemonSet Configuration

Runs on every node (including control plane, via tolerations: [{operator: Exists}]).

| Preset | What it configures |
|---|---|
| logsCollection | filelog receiver watching /var/log/pods/; enriches with k8s metadata |
| kubeletMetrics | Node, pod, container, volume metrics from kubelet /stats/summary |
| hostMetrics | CPU, memory, disk, network from the host OS |
| kubernetesAttributes | Enriches all telemetry with k8s.pod.name, k8s.namespace.name, k8s.node.name, etc. |
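Assuming the agent is deployed via the upstream opentelemetry-collector Helm chart (which these preset names come from), the table above maps roughly to values like the following sketch:

```yaml
mode: daemonset
image:
  repository: otel/opentelemetry-collector-contrib
presets:
  logsCollection:
    enabled: true
  kubeletMetrics:
    enabled: true
  hostMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
tolerations:
  - operator: Exists   # schedule onto control-plane nodes too
```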

Pipelines:

```yaml
pipelines:
  logs:
    receivers:  [filelog]
    processors: [memory_limiter, k8sattributes, batch]
    exporters:  [otlphttp/logs]     # → VictoriaLogs
  metrics:
    receivers:  [kubeletstats, hostmetrics]
    processors: [memory_limiter, k8sattributes, batch]
    exporters:  [prometheusremotewrite]   # → VictoriaMetrics
```

Namespace pod security: privileged — required for host metric collection and /var/log/pods access on Talos.
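Pod Security Admission levels are set via namespace labels; a sketch of the Namespace manifest (the namespace name is taken from the kubectl commands in Troubleshooting below):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: opentelemetry
  labels:
    # allow hostPath mounts of /var/log/pods and host-level metric collection
    pod-security.kubernetes.io/enforce: privileged
```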

Gateway Deployment Configuration

Runs as a single-replica Deployment (the receivers are cluster-scoped, so a second replica would collect duplicate data).

| Preset | What it configures |
|---|---|
| clusterMetrics | k8s_cluster receiver: node/pod/deployment/namespace resource metrics |
| kubernetesEvents | k8sobjects receiver: Kubernetes events forwarded as log records |
| kubernetesAttributes | Enriches with k8s metadata |
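Under the same Helm-chart assumption as the agent, the gateway's presets map roughly to values like:

```yaml
mode: deployment
replicaCount: 1          # cluster-scoped receivers must not be duplicated
presets:
  clusterMetrics:
    enabled: true
  kubernetesEvents:
    enabled: true
  kubernetesAttributes:
    enabled: true
```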

Pipelines:

```yaml
pipelines:
  metrics:
    receivers:  [k8s_cluster]
    processors: [memory_limiter, k8sattributes, batch]
    exporters:  [prometheusremotewrite]
  logs:
    receivers:  [k8sobjects]
    processors: [memory_limiter, batch]
    exporters:  [otlphttp/logs]
```

Exporter Endpoints

| Exporter | Endpoint | Data |
|---|---|---|
| prometheusremotewrite | http://victoria-metrics-victoria-metrics-cluster-vminsert.victoria-metrics.svc.cluster.local:8480/insert/0/prometheus/api/v1/write | Metrics |
| otlphttp/logs | http://victoria-logs-victoria-logs-single-server.victoria-logs.svc.cluster.local:9428/insert/opentelemetry | Logs |
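Expressed as Collector configuration, these two exporters look like the following sketch (endpoints copied from the table above):

```yaml
exporters:
  prometheusremotewrite:
    endpoint: http://victoria-metrics-victoria-metrics-cluster-vminsert.victoria-metrics.svc.cluster.local:8480/insert/0/prometheus/api/v1/write
  otlphttp/logs:
    endpoint: http://victoria-logs-victoria-logs-single-server.victoria-logs.svc.cluster.local:9428/insert/opentelemetry
```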

Common Processors

| Processor | Config | Purpose |
|---|---|---|
| batch | timeout: 10s | Batch telemetry to reduce write pressure |
| memory_limiter | limit: 80%, spike: 25% | Prevent OOM |
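A sketch of these processors in Collector configuration; the `check_interval` value is an assumption (the field is required by memory_limiter but not stated in the table):

```yaml
processors:
  memory_limiter:
    check_interval: 1s          # assumed; how often memory usage is sampled
    limit_percentage: 80        # hard limit as % of container memory
    spike_limit_percentage: 25  # headroom reserved for spikes
  batch:
    timeout: 10s                # flush a batch at least every 10s
```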

Falco Integration

Falco outputs JSON alerts to stdout. The OTel Agent's filelog receiver collects these from the Falco DaemonSet pod logs (/var/log/pods/falco_*/**/*.log) and forwards them to VictoriaLogs. This allows querying Falco security events in Grafana alongside application logs.
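The logsCollection preset's filelog receiver already matches Falco's pod logs under /var/log/pods/. A narrowed, illustrative sketch of an equivalent receiver (not the preset's exact generated config):

```yaml
receivers:
  filelog:
    include:
      - /var/log/pods/falco_*/**/*.log   # Falco DaemonSet pods in the "falco" namespace
    operators:
      - type: json_parser                # Falco emits one JSON alert per line
```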

How It Connects

```
Every node (Agent DaemonSet)
  ← Container logs from /var/log/pods/
  ← kubelet metrics from :10250/stats/summary
  ← Host metrics (CPU, memory, disk, network)
  → VictoriaMetrics (metrics)
  → VictoriaLogs (logs)

One node (Gateway Deployment)
  ← Kubernetes API (cluster metrics, events)
  → VictoriaMetrics (k8s resource metrics)
  → VictoriaLogs (k8s events as logs)
```

Troubleshooting

Agent Not Collecting Logs

```sh
# Check agent is running on all nodes
kubectl get pods -n opentelemetry -o wide

# Check for file permission errors
kubectl logs -n opentelemetry -l app.kubernetes.io/name=otel-agent --tail=50
```

Missing Metrics in VictoriaMetrics

```sh
# Check remote-write errors
kubectl logs -n opentelemetry -l app.kubernetes.io/name=otel-agent | grep "remote_write\|error"

# Verify vminsert is reachable
kubectl exec -n opentelemetry <agent-pod> -- \
  curl -s -o /dev/null -w "%{http_code}" \
  http://victoria-metrics-victoria-metrics-cluster-vminsert.victoria-metrics.svc.cluster.local:8480
```