How to mount /dev/kvm in a non-privileged pod?

Introduction

/dev/kvm is the Linux device node through which the kernel exposes KVM, its hardware-accelerated virtualization interface. Making it available inside a Kubernetes pod lets you run VMs or Android emulators in containers. By default, non-privileged pods cannot access host devices. There are three ways around this: a device plugin (such as the KVM plugin from kubevirt/kubernetes-device-plugins), a hostPath volume combined with a targeted security context, or a privileged init container that sets up device access.

Why You Need /dev/kvm

KVM (Kernel-based Virtual Machine) provides hardware virtualization support. Common use cases inside pods include:

  • Running Android emulators in CI/CD pipelines
  • Nested virtualization for testing
  • Running QEMU/KVM virtual machines
  • Hardware-accelerated testing environments

Without /dev/kvm, these workloads fall back to software emulation, which can be roughly 10-50x slower depending on the workload.
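
Before debugging pod-level access, it can help to confirm that the node itself supports KVM. The following is a minimal sketch, assuming an x86 Linux host; the function name kvm_host_status and its two path parameters are illustrative and not part of any existing tool:

```shell
# Sketch: report whether a host looks KVM-capable.
# Both paths are parameters so the check can be pointed at fixtures;
# the defaults are the standard Linux locations.
kvm_host_status() {
  dev="${1:-/dev/kvm}"
  cpuinfo="${2:-/proc/cpuinfo}"

  # vmx = Intel VT-x, svm = AMD-V (x86 only)
  if grep -Eqw 'vmx|svm' "$cpuinfo" 2>/dev/null; then
    cpu="yes"
  else
    cpu="no"
  fi

  # The device node must exist and be a character device
  if [ -c "$dev" ]; then
    node="present"
  else
    node="missing"
  fi

  echo "cpu_virt=$cpu device=$node"
}
```

Run kvm_host_status with no arguments on the node. On a cloud VM, cpu_virt=no usually suggests that nested virtualization is not enabled for the instance.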

Solution 1: KVM Device Plugin

The KVM device plugin exposes /dev/kvm as a schedulable Kubernetes resource:

bash
# Deploy the KVM device plugin DaemonSet
kubectl apply -f https://raw.githubusercontent.com/kubevirt/kubernetes-device-plugins/main/manifests/kvm-ds.yml

# Verify the plugin is running (adjust the name and namespace to match the manifest)
kubectl get daemonset -n kube-system kvm-device-plugin

Request the device in your pod spec:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: vm-runner
spec:
  containers:
    - name: emulator
      image: my-android-emulator:latest
      resources:
        limits:
          devices.kubevirt.io/kvm: "1"
        requests:
          devices.kubevirt.io/kvm: "1"

The device plugin automatically mounts /dev/kvm into the container and sets the correct permissions. No privileged mode or extra capabilities are needed.
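
To confirm that the plugin has registered, one option is to look at each node's allocatable resources. This is a minimal sketch, assuming kubectl is configured for the cluster and that the resource name is devices.kubevirt.io/kvm as in the pod spec above; kvm_allocatable is an illustrative name:

```shell
# Sketch: list each node with its allocatable devices.kubevirt.io/kvm count.
# Dots inside the resource name are escaped so they are not read as path separators.
# Nodes that do not advertise the resource show <none>.
kvm_allocatable() {
  kubectl get nodes \
    -o 'custom-columns=NODE:.metadata.name,KVM:.status.allocatable.devices\.kubevirt\.io/kvm'
}
```

A node that shows <none> in the KVM column is either not running the plugin or has no /dev/kvm to offer, and pods requesting the resource will not be scheduled there.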

Solution 2: hostPath Volume with Security Context

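If you cannot use a device plugin, mount /dev/kvm as a hostPath volume. Be aware that a hostPath mount only makes the device node visible: many container runtimes do not add it to a non-privileged container's device allowlist, so opening it can still fail with "Operation not permitted". Test this on your cluster before relying on it: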

yaml
apiVersion: v1
kind: Pod
metadata:
  name: kvm-pod
spec:
  containers:
    - name: kvm-container
      image: my-vm-image:latest
      securityContext:
        runAsUser: 0        # root; a non-root variant using the kvm group follows
        capabilities:
          add: ["SYS_RAWIO"]  # probably unnecessary for plain KVM use; try dropping it
      volumeMounts:
        - name: kvm
          mountPath: /dev/kvm
      resources:
        limits:
          cpu: "2"
          memory: "4Gi"
  volumes:
    - name: kvm
      hostPath:
        path: /dev/kvm
        type: CharDevice

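To avoid running as root, run the container with the group that owns /dev/kvm on the node. The GID below is an example. Because supplementalGroups is a pod-level field, this block belongs under spec.securityContext:

yaml
securityContext:
  runAsUser: 1000
  runAsGroup: 107    # kvm group GID on the node; check with: getent group kvm
  supplementalGroups: [107]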
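
Since the GID differs between distributions, it may be safer to read it from the node than to copy a number. The helpers below are a sketch with illustrative names (device_gid, gid_from_group_line), assuming a Linux node with GNU or BusyBox stat:

```shell
# Print the numeric GID that owns a device node.
# Reading it from the device itself avoids depending on the group's name.
device_gid() {
  stat -c '%g' "${1:-/dev/kvm}"
}

# Print the GID field from one group(5) line read on stdin,
# for example the output of: getent group kvm
gid_from_group_line() {
  cut -d: -f3
}
```

On the node, either device_gid or getent group kvm | gid_from_group_line should print the value to use for runAsGroup and supplementalGroups.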

Solution 3: Init Container for Device Setup

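Use a privileged init container that only sets up device permissions, keeping the main container non-privileged. Note that the chmod below changes the device node on the host itself, so every user and container on that node that can see /dev/kvm gains read-write access until the permissions are reset: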

yaml
apiVersion: v1
kind: Pod
metadata:
  name: kvm-pod
spec:
  initContainers:
    - name: setup-kvm
      image: busybox
      command: ["sh", "-c", "chmod 666 /dev/kvm"]
      securityContext:
        privileged: true
      volumeMounts:
        - name: kvm
          mountPath: /dev/kvm
  containers:
    - name: emulator
      image: my-android-emulator:latest
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
      volumeMounts:
        - name: kvm
          mountPath: /dev/kvm
  volumes:
    - name: kvm
      hostPath:
        path: /dev/kvm
        type: CharDevice

Node Affinity for KVM Nodes

Not all nodes have /dev/kvm available. Use node affinity to schedule KVM pods on capable nodes:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: kvm-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kvm-enabled
                operator: In
                values: ["true"]
  # ... container spec

Label KVM-capable nodes:

bash
# Check if the node has /dev/kvm
ssh node01 'ls -la /dev/kvm'

# Label the node
kubectl label node node01 kvm-enabled=true
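
With more than a handful of nodes, the check and the label can be combined in a loop. This is a sketch only, assuming that node names are reachable as SSH hosts (as node01 is in the example above); label_kvm_nodes is an illustrative name:

```shell
# Sketch: label every node on which /dev/kvm exists as a character device.
# Usage: label_kvm_nodes node01 node02 ...
label_kvm_nodes() {
  for node in "$@"; do
    # -n keeps ssh from consuming the caller's stdin
    if ssh -n "$node" 'test -c /dev/kvm'; then
      kubectl label node "$node" kvm-enabled=true --overwrite
    else
      echo "skip $node: no /dev/kvm" >&2
    fi
  done
}
```

Nodes that fail the check are reported on stderr and left unlabeled, so the node affinity rule keeps KVM pods away from them.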

Pod Security Standards

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. Use Pod Security Admission instead:

yaml
# Namespace labels that allow hostPath volumes
apiVersion: v1
kind: Namespace
metadata:
  name: kvm-workloads
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/warn: privileged
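
Before changing the enforcement level of a namespace that already has workloads, a server-side dry run can show which existing pods would violate the new level. A minimal sketch, assuming kubectl access to the cluster; psa_dry_run is an illustrative name:

```shell
# Sketch: preview the effect of a Pod Security level on a namespace.
# Nothing is persisted; the API server returns warnings for violating pods.
# Usage: psa_dry_run <namespace> <privileged|baseline|restricted>
psa_dry_run() {
  ns="$1"
  level="$2"
  kubectl label --dry-run=server --overwrite ns "$ns" \
    "pod-security.kubernetes.io/enforce=$level"
}
```

For example, psa_dry_run kvm-workloads baseline should list warnings for any pod in the namespace that mounts /dev/kvm through hostPath.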

For more granular control, use a policy engine like Kyverno or OPA Gatekeeper:

yaml
# Kyverno policy: allow only /dev/kvm as a hostPath
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: allow-kvm-device
spec:
  validationFailureAction: Enforce
  rules:
    - name: restrict-hostpath
      match:
        resources:
          kinds: ["Pod"]
          namespaces: ["kvm-workloads"]
      validate:
        message: "Only /dev/kvm hostPath is allowed"
        pattern:
          spec:
            # =() anchors: only check volumes that actually define hostPath,
            # so pods with other volume types (or none) still pass
            =(volumes):
              - =(hostPath):
                  path: "/dev/kvm"

Verifying KVM Access Inside the Pod

bash
# Open a shell in the pod; the remaining commands run inside it
kubectl exec -it kvm-pod -- /bin/sh

# Check device exists
ls -la /dev/kvm
# crw-rw---- 1 root kvm 10, 232 ...

# Test KVM access (assumes a Debian-based image and a kernel at /boot/vmlinuz)
apt-get update && apt-get install -y qemu-system-x86
qemu-system-x86_64 -enable-kvm -nographic -no-reboot -m 128 \
  -kernel /boot/vmlinuz -append "console=ttyS0" 2>&1 | head -5

# Or check with a simple permission test
[ -w /dev/kvm ] && echo "KVM accessible" || echo "KVM not accessible"
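
A slightly stricter check is to open the device read-write, which is closer to what QEMU or an emulator does and separates a missing node from a failed open. The function below is a sketch for a POSIX shell inside the container; probe_device is an illustrative name:

```shell
# Sketch: classify access to a device node by actually opening it.
# Prints one of: missing, not-a-char-device, ok, open-failed
probe_device() {
  dev="${1:-/dev/kvm}"
  if [ ! -e "$dev" ]; then
    echo "missing"
  elif [ ! -c "$dev" ]; then
    echo "not-a-char-device"
  # Open read-write on fd 3 in a subshell so the descriptor does not leak
  elif ( exec 3<>"$dev" ) 2>/dev/null; then
    echo "ok"
  else
    echo "open-failed"
  fi
}
```

Paste the function into the pod's shell and run probe_device. An open-failed result with correct-looking permissions in ls -la usually points at the container's device restrictions or the group IDs rather than at the file mode.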

Common Pitfalls

  • Using privileged mode for the entire pod: Setting privileged: true on the main container gives it full access to all host devices and capabilities, which is a major security risk. Use the device plugin approach or a privileged init container with a non-privileged main container instead.
  • Not checking node KVM support: Scheduling a KVM pod on a node without /dev/kvm (e.g., a VM-based node without nested virtualization) leaves the pod stuck at startup when the CharDevice hostPath check fails, or makes the workload fail at runtime. Always use node affinity or the device plugin, which automatically schedules pods only on KVM-capable nodes.
  • Wrong group ID for /dev/kvm: The kvm group GID varies across Linux distributions (commonly 107, 109, or 36). Hardcoding the wrong GID in runAsGroup or supplementalGroups causes permission denied errors. Check the actual GID on the node with getent group kvm.
  • Ignoring Pod Security Admission: In Kubernetes 1.25+, namespaces with baseline or restricted pod security standards reject pods with hostPath volumes. Set the namespace to privileged enforcement level or use the device plugin, which does not require hostPath.
  • Not setting resource limits: KVM workloads (VMs, emulators) are resource-intensive. Without CPU and memory limits, a single pod can starve other workloads on the node. Always set resource requests and limits for KVM pods.

Summary

  • Use the KVM device plugin (devices.kubevirt.io/kvm) for the cleanest, most secure approach — no privileged mode needed
  • Alternative: mount /dev/kvm as a hostPath volume with type: CharDevice and appropriate group permissions
  • Use a privileged init container to set device permissions while keeping the main container non-privileged
  • Schedule KVM pods on capable nodes with node affinity labels or device plugin auto-scheduling
  • Use Pod Security Admission (which replaces the removed PodSecurityPolicy) to control hostPath access at the namespace level
