Kubecon 2024 Paris: Enhancing AI/ML Workloads in Kubernetes - Managing GPU Nodes and Workloads

Mar 29, 2024

At Kubecon 2024 in Paris, one of the central themes revolved around supporting AI/ML workloads within Kubernetes. A critical aspect of achieving this lies in effectively managing GPU nodes and workloads at scale.


When dealing with GPU nodes on a large scale, several considerations come into play:

  1. Different types of GPUs: Understanding the variety of GPUs available is essential. Different models have varying capabilities, memory, and performance characteristics.
  2. GPU sharing: Efficiently sharing GPUs among multiple workloads is crucial. This involves allocating GPU resources to different pods while ensuring fair distribution and isolation.
  3. Scheduling workloads: Properly scheduling GPU-specific workloads onto nodes with free GPU resources ensures optimal utilization.
  4. Installing/updating GPU drivers: Keep in mind that the GPU driver can only be updated while no workload is using the GPU.
  5. Updating Runtime for GPU Support: The container runtime (such as Docker or containerd) must be configured to expose GPU capabilities.
  6. Monitoring GPU Usage: Implement monitoring and alerting to track GPU utilization and potential performance issues (an alerting sketch follows this list).
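For the monitoring point above, here is a minimal alerting sketch. It assumes the NVIDIA GPU Operator has deployed the DCGM exporter, that Prometheus scrapes it, and that the Prometheus Operator CRDs are installed; the metric name comes from the DCGM exporter, while the namespace, threshold, and durations are illustrative assumptions.

# Assumes Prometheus Operator CRDs and a scraped DCGM exporter
# (deployed by the NVIDIA GPU Operator). Threshold and duration are illustrative.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gpu-utilization-alerts
  namespace: monitoring
spec:
  groups:
    - name: gpu
      rules:
        - alert: GPUHighUtilization
          # DCGM_FI_DEV_GPU_UTIL reports per-GPU utilization in percent.
          expr: avg by (Hostname, gpu) (DCGM_FI_DEV_GPU_UTIL) > 90
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "GPU {{ $labels.gpu }} on {{ $labels.Hostname }} above 90% utilization for 15 minutes"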

Requirements to support GPUs

Running AI/ML workloads in Kubernetes necessitates specific conditions:

  • GPU-Equipped Nodes
  • Installed GPU Drivers
  • Updated Container Runtime: a container runtime configured to expose GPU features, e.g. via the NVIDIA Container Toolkit (a RuntimeClass sketch follows this list)
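On the Kubernetes side, a dedicated RuntimeClass is a common way to make the GPU-enabled runtime selectable per Pod. A minimal sketch, assuming a containerd runtime handler named nvidia has already been registered (the NVIDIA Container Toolkit or GPU Operator typically sets this up):

# Assumes a runtime handler named "nvidia" is already registered in the
# container runtime (e.g. by the NVIDIA Container Toolkit or GPU Operator).
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia

Pods can then opt into it with spec.runtimeClassName: nvidia.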

But how can we expose such a device to the Kubernetes scheduler and request a GPU from the application's perspective?

Option 1: Device Plugin API

Kubernetes offers a Device Plugin API that allows registering GPU devices and exposing them as allocatable resources (similar to CPU and memory).

Docs: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/
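Once a device plugin (for example NVIDIA's) registers the resource, it appears in the node's capacity and allocatable fields next to CPU and memory. An illustrative excerpt of kubectl get node <node-name> -o yaml (quantities are example values):

# Illustrative excerpt only; resource quantities are example values.
status:
  capacity:
    cpu: "16"
    memory: 65536Mi
    nvidia.com/gpu: "1"
  allocatable:
    cpu: 15800m
    memory: 64000Mi
    nvidia.com/gpu: "1"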

How to enable GPU support?

To enable GPU support in Kubernetes seamlessly, follow these steps:

  1. Install the NVIDIA GPU Operator: it simplifies GPU management within Kubernetes.
  2. Request GPUs for Pods through resource limits (a full Pod manifest is sketched below):

resources:
  limits:
    nvidia.com/gpu: 1
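Expanding the fragment above into a complete Pod manifest, as a minimal sketch (the container image and command are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: gpu-test
      # Illustrative image and command; any CUDA-capable image works here.
      image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          # Request one whole GPU; the device plugin handles the allocation.
          nvidia.com/gpu: 1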

Option 2: DRA (Alpha)

Additionally, consider the Dynamic Resource Allocation (DRA) feature, currently in alpha. DRA provides a more flexible, claim-based model for requesting, scheduling, and sharing devices, conceptually similar to how PersistentVolumeClaims work for storage.

Docs: https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/
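As a rough sketch of what the alpha API looked like around Kubernetes 1.29 (the API group/version and field layout have continued to change, and the resource class name is a placeholder that a vendor DRA driver would have to provide):

# Sketch of DRA alpha objects (resource.k8s.io/v1alpha2, ~Kubernetes 1.29).
# "gpu.example.com" is a placeholder resource class; image/command are illustrative.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    resourceClassName: gpu.example.com
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-example
spec:
  resourceClaims:
    - name: gpu
      source:
        resourceClaimTemplateName: gpu-claim-template
  containers:
    - name: app
      image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        claims:
          - name: gpu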

GPU Sharing

By default, one GPU is reserved for a single application pod, even if the pod doesn't fully utilize the GPU's resources. However, there are several options for sharing a GPU among multiple pods:

  • Time-slicing: workloads take turns on the same GPU, with no memory or fault isolation between them
  • Multi-Instance GPU (MIG): supported GPUs (e.g. A100/H100) are partitioned into isolated instances with dedicated memory and compute
  • Multi-Process Service (MPS): CUDA MPS lets multiple processes use a GPU's compute concurrently, with limited isolation
  • vGPU: GPU virtualization, relevant when the Kubernetes nodes themselves are virtual machines

Remember that all these modes are supported by the NVIDIA GPU Operator, making GPU management in Kubernetes more efficient and flexible. A time-slicing configuration sketch follows below.
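As an example of one of these modes, here is a minimal time-slicing sketch based on the GPU Operator's configuration format. The ConfigMap name, the data key, and the replica count are illustrative assumptions, and the operator's ClusterPolicy has to be pointed at this ConfigMap for it to take effect.

# Sketch of a time-slicing config for the NVIDIA GPU Operator's device plugin.
# The data key ("any") and replica count are illustrative; the ClusterPolicy's
# devicePlugin.config section must reference this ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4

With 4 replicas, each physical GPU is advertised as four allocatable nvidia.com/gpu resources, so up to four pods can share it.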

Scheduling

Requesting GPUs through the resources field can be sufficient at the start. At scale, however, where different GPU models with different driver versions are available, a more robust solution may be required.

For that, the NFD Operator (Node Feature Discovery), a Kubernetes SIGs project, can be used to expose particular devices and their options/features as node labels, which can then be used in node selectors or affinity rules when placing workloads (see the sketch after the project link).

Project: https://github.com/kubernetes-sigs/node-feature-discovery
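For illustration, a Pod that targets nodes where NFD has detected an NVIDIA PCI device. The label key follows NFD's default PCI label format (pci-<class>_<vendor>.present); 10de is NVIDIA's PCI vendor ID and 0302 is the 3D-controller class typical of datacenter GPUs, but the exact key depends on the GPU model and NFD configuration, so treat it as an assumption.

# Label key follows NFD's default PCI label format and may differ per setup;
# image/command are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  nodeSelector:
    feature.node.kubernetes.io/pci-0302_10de.present: "true"
  containers:
    - name: app
      image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1

When the GPU Operator's GPU Feature Discovery is also running, richer labels such as the GPU product name or driver version can be used in the same way.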

Videos from Kubecon

Krzysztof Wiatrzyk

Big love for Kubernetes and the entire Cloud Native Computing Foundation. DevOps, biker, hiker, dog lover, guitar player, and lazy gamer.