AMD GPU: Complete Guide - Progressive Robot

Introduction

As cloud-native applications grow in scale and sophistication, the ability to dynamically allocate computing resources becomes critical—especially for workloads that require GPUs. Kubernetes provides built-in autoscaling, but it often falls short when you need to scale based on external or custom metrics. This is where KEDA (Kubernetes Event-Driven Autoscaling) excels. KEDA allows your Kubernetes workloads to scale based on real-time metrics from sources like Prometheus.

In this tutorial, we'll walk through how to autoscale an AMD GPU-based workload running on Kubernetes (DOKS) using Prometheus and KEDA. This setup allows you to react to live metrics and optimize GPU utilization efficiently and cost-effectively.

Key Takeaways

Event-Driven GPU Autoscaling: Learn how to use KEDA to autoscale AMD GPU workloads on Kubernetes (DOKS) based on real-time metrics from Prometheus, enabling responsive and cost-efficient scaling beyond standard CPU-based autoscaling.
Separation of Workloads: Discover best practices for isolating GPU and non-GPU workloads using multiple node pools, optimizing resource allocation, and reducing operational costs.
Native AMD GPU Support: Take advantage of DOKS’s built-in support for AMD MI300X GPUs, with the AMD GPU device plugin enabled by default and the AMD GPU Device Metrics Exporter available for integration and observability.
Custom Metrics for Scaling: See how to configure Prometheus to collect custom GPU metrics and use them as triggers for scaling, allowing you to match resource allocation to actual workload demand (e.g., machine learning inference, video processing, real-time analytics).
Hands-On, Step-by-Step Guidance: Follow a practical walkthrough covering cluster setup, node pool configuration, enabling GPU metrics, deploying a sample workload, and implementing KEDA-based autoscaling.
Testing and Validation: Learn how to simulate GPU load, observe autoscaling in action, and verify that new GPU nodes are provisioned and workloads are rescheduled automatically as demand increases.
Production-Ready Patterns: Gain insights into building robust, scalable, and cost-effective GPU-powered applications on Kubernetes using open-source tools and cloud-native best practices.

By the end of this tutorial, you’ll be able to deploy and autoscale GPU workloads on DOKS with confidence, leveraging event-driven scaling to optimize performance and cost for demanding AI/ML and compute-intensive applications.

Prerequisites

A cloud account with access to a DOKS cluster
A DO API token with write permissions
Basic knowledge of Kubernetes and Helm

Step 1 - Create a DOKS Cluster with AMD GPU Support

You can create a new DOKS cluster by clicking the Create button at the top-right of the cloud provider dashboard, or by navigating to Resources within your project and selecting:

Kubernetes → Create a Kubernetes Cluster.

Choose Kubernetes Version and Region
Kubernetes Version: Select the latest stable version to ensure security and feature parity.
Region: Choose a region that supports GPU Droplets—TOR1, NYC2, or ATL1 are recommended if you're planning to deploy AMD MI300X or other GPU workloads.

Next, specify the VPC and the service networks.
Once you've named your cluster and chosen a region, the next step is to configure a node pool, which is where your applications will run.

Note: We recommend creating two node pools.

Node Pool #1: For GPU Workloads

GPU Type: Under *Choose cluster capacity*, go to the GPU tab.
Machine Type: Select a GPU Droplet, such as AMD MI300X.
Min/Max Nodes: Set a minimum of 0 (to save cost when idle) and a higher maximum depending on workload needs.

Node Pool #2: For General/Peripheral Workloads

Droplet Type: Choose Standard, Dedicated CPU, or Memory-Optimized, depending on your app needs.
Purpose: This pool is ideal for running Prometheus, dashboards (e.g., Grafana), KEDA, web APIs as well as kube-system components.
Autoscaling: You can optionally enable autoscaling as well.

Why two node pools?

Isolating GPU and non-GPU workloads allows better scaling control, reduces resource waste, and avoids scheduling conflicts. Peripheral services don’t need expensive GPU nodes.

Note: For AMD GPUs, the AMD GPU device plugin is enabled by default.

Step 2 - Configure Node Pools

Review your selections.
Click Create Kubernetes Cluster.

Note: To collect GPU metrics, also enable the AMD GPU Device Metrics Exporter. One way to do this is to update the cluster through the API:

curl -X PUT \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-token>" \
  -d '{"name": "<your_cluster_name>", "amd_gpu_device_metrics_exporter_plugin": {"enabled": true}}' \
  "https://api.digitalocean.com/v2/kubernetes/clusters/<your-cluster-id>"

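If you would rather issue this update from a program than from the command line, the request can be sketched in Go using only the standard library. This is a minimal sketch under assumptions: the struct and function names are hypothetical, the HTTP method (PUT) is an assumption, and the endpoint URL is supplied by the caller rather than hard-coded. The request is only built here, never sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// exporterPlugin mirrors the "amd_gpu_device_metrics_exporter_plugin" object
// of the request body. The type name is hypothetical.
type exporterPlugin struct {
	Enabled bool `json:"enabled"`
}

// clusterUpdate models only the two fields used in this tutorial's request body.
type clusterUpdate struct {
	Name           string         `json:"name"`
	ExporterPlugin exporterPlugin `json:"amd_gpu_device_metrics_exporter_plugin"`
}

// ExporterPayload returns the JSON body that enables the metrics exporter.
func ExporterPayload(clusterName string) ([]byte, error) {
	return json.Marshal(clusterUpdate{
		Name:           clusterName,
		ExporterPlugin: exporterPlugin{Enabled: true},
	})
}

// BuildExporterRequest prepares (but does not send) the update request.
// endpoint and token come from the caller; PUT is assumed as the method.
func BuildExporterRequest(endpoint, token, clusterName string) (*http.Request, error) {
	body, err := ExporterPayload(clusterName)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPut, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	payload, err := ExporterPayload("my-cluster")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(payload))
}
```

To actually send the request, you would pass the result to http.DefaultClient.Do and check the response status.
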
Step 3 - Install the kube-prometheus-stack via Helm

Prometheus will monitor and collect the custom metrics needed by KEDA.

				
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack -n kube-system

Step 3 (continued) - Install KEDA with Helm

KEDA will evaluate Prometheus metrics and scale your workloads accordingly.

				
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda

Step 4 - Deploy a GPU Workload with Custom Metrics

We’ll deploy a sample Go application that exposes a Prometheus gauge metric. This workload requests a GPU and will be scaled based on the metric value.

Create a file named resources.yaml and paste the following manifests:

				
apiVersion: v1
kind: ConfigMap
metadata:
  name: go-program
data:
  main.go: |
    package main

    import (
        "log"
        "net/http"
        "strconv"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // customGauge holds the simulated load value that drives autoscaling.
    var customGauge = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "my_custom_gauge",
        Help: "This is a custom gauge metric",
    })

    func init() {
        prometheus.MustRegister(customGauge)
    }

    // handler sets the gauge from the "value" query parameter, e.g. /?value=123.
    func handler(w http.ResponseWriter, r *http.Request) {
        value := r.URL.Query().Get("value")
        floatVal, err := strconv.ParseFloat(value, 64)
        if err != nil {
            http.Error(w, "Bad Request: invalid input", http.StatusBadRequest)
            return
        }
        customGauge.Set(floatVal)
        w.Write([]byte("Hello, from the handler!"))
    }

    func main() {
        http.HandleFunc("/", handler)
        http.Handle("/metrics", promhttp.Handler())
        log.Fatal(http.ListenAndServe(":8080", nil))
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-from-configmap
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-from-configmap
  template:
    metadata:
      labels:
        app: go-from-configmap
    spec:
      # Schedule only onto nodes in the AMD GPU node pool.
      nodeSelector:
        doks.digitalocean.com/gpu-brand: amd
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        - key: amd.com/gpu
          operator: Exists
      containers:
        - name: go-runner
          image: golang:1.24
          command: ["/bin/sh", "-c"]
          resources:
            requests:
              amd.com/gpu: "1"
            limits:
              amd.com/gpu: "1"
          ports:
            - containerPort: 8080
          args:
            - |
              set -e
              cp /config/*.go /app/
              cd /app
              [ -f go.mod ] || go mod init temp
              go mod tidy
              go run .
          volumeMounts:
            - name: go-code
              mountPath: /config
            - name: app-dir
              mountPath: /app
      volumes:
        - name: go-code
          configMap:
            name: go-program
        - name: app-dir
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: go-from-configmap
  labels:
    app: go-from-configmap
spec:
  selector:
    app: go-from-configmap
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: go-from-configmap
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: go-from-configmap
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
  # Must list the namespace the Service above is deployed to.
  namespaceSelector:
    matchNames:
      - kube-system
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: go-from-configmap
spec:
  scaleTargetRef:
    name: go-from-configmap
  pollingInterval: 30
  cooldownPeriod: 30
  minReplicaCount: 1
  maxReplicaCount: 8
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.kube-system.svc.cluster.local:9090
        metricName: my_custom_gauge
        threshold: '80'
        query: sum(my_custom_gauge{}) / sum(kube_deployment_status_replicas{deployment="go-from-configmap"})

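Apply the resources. The ServiceMonitor above only selects Services in the kube-system namespace (see its namespaceSelector), so either deploy to kube-system or change matchNames to the namespace you use:

kubectl apply -f resources.yaml -n <your-namespace>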

Step 5 - What You Just Deployed

ConfigMap

We created a ConfigMap that stores a simple Go HTTP server as a string. This app:

Runs an HTTP server on port 8080
Defines a Prometheus gauge metric, my_custom_gauge, which represents a simulated load value and is the core driver of our autoscaling logic
Exposes a /metrics endpoint compatible with Prometheus scraping
Offers a root endpoint / where a user can update the gauge by calling /?value=123

Note: This setup is for demo purposes only and not intended for production use.
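
To see how the root endpoint behaves without deploying anything, the handler logic can be exercised in memory. The sketch below is an illustration, not the tutorial's program: it swaps the Prometheus gauge for a small stand-in type (hypothetical, so that only the standard library is needed) and uses net/http/httptest to send requests.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync"
)

// gauge is a stand-in for prometheus.Gauge so this sketch needs only the
// standard library. It is not part of the tutorial's program.
type gauge struct {
	mu  sync.Mutex
	val float64
}

func (g *gauge) Set(v float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.val = v
}

func (g *gauge) Get() float64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.val
}

// newHandler mirrors the tutorial's root handler: parse the "value" query
// parameter, reject input that is not a number, otherwise store it.
func newHandler(g *gauge) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		v, err := strconv.ParseFloat(r.URL.Query().Get("value"), 64)
		if err != nil {
			http.Error(w, "Bad Request: invalid input", http.StatusBadRequest)
			return
		}
		g.Set(v)
		w.Write([]byte("Hello, from the handler!"))
	}
}

// call sends one in-memory GET request to h and returns the status code.
func call(h http.Handler, target string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	g := &gauge{}
	h := newHandler(g)
	fmt.Println(call(h, "/?value=160"), g.Get()) // prints "200 160"
	fmt.Println(call(h, "/?value=abc"), g.Get()) // prints "400 160": gauge unchanged
	fmt.Println(call(h, "/"), g.Get())           // prints "400 160": missing value is rejected
}
```

Note that a request without a value parameter is rejected as well, because an empty string does not parse as a float.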

Deployment

We define a Deployment that uses the official golang:1.24 image and requests one AMD GPU through resource requests and limits (amd.com/gpu: 1). On startup, it:

Copies the Go source from the mounted ConfigMap into a working directory and compiles it dynamically
Executes the program, which exposes the HTTP endpoints

Service

The Service resource gives Prometheus a stable way to reach our /metrics endpoint. It:

Targets pods labeled app: go-from-configmap
Exposes port 8080 under the port name metrics

ServiceMonitor

To make Prometheus aware of our scraping endpoint, we define a ServiceMonitor. This resource:

Tells Prometheus to scrape Services matching the app: go-from-configmap label
Configures the scrape path as /metrics
Sets the scrape interval to 30s

Prometheus will now continuously collect the latest values of my_custom_gauge.
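
What Prometheus scrapes from /metrics is plain text in the Prometheus exposition format, where an unlabeled gauge appears as a line of the form "name value". As a rough illustration of what a scrape sees, the following sketch extracts one unlabeled sample from such text. The function name is hypothetical, and this is a deliberately simplified reader, not a full parser (it ignores labels and timestamps).

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// GaugeFromExposition scans Prometheus text-format output for an unlabeled
// sample of the named metric and returns its value. The boolean is false
// when the metric is absent or its value does not parse.
func GaugeFromExposition(body, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and # HELP / # TYPE comments
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != name {
			continue // a different metric, or a labeled series
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return 0, false
		}
		return v, true
	}
	return 0, false
}

func main() {
	// Sample text shaped like the output of the demo app's /metrics endpoint.
	sample := `# HELP my_custom_gauge This is a custom gauge metric
# TYPE my_custom_gauge gauge
my_custom_gauge 160
`
	v, ok := GaugeFromExposition(sample, "my_custom_gauge")
	fmt.Println(v, ok) // prints "160 true"
}
```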

ScaledObject (KEDA)

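The KEDA ScaledObject ties everything together. It configures KEDA to:

Use Prometheus as the metric source
Watch the my_custom_gauge metric stored in Prometheus
Evaluate the query sum(my_custom_gauge{}) / sum(kube_deployment_status_replicas{deployment="go-from-configmap"}) against a threshold of 80

In short, KEDA will:

Poll Prometheus every 30 seconds
Scale the go-from-configmap Deployment between 1 and 8 replicas
Apply a 30-second cooldownPeriod, which in KEDA only governs the final scale-down to zero replicas. Because minReplicaCount is 1 here, scale-down from higher replica counts follows the Horizontal Pod Autoscaler that KEDA manages.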
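
As a rough mental model of how a query result turns into a replica count, the calculation can be sketched as a ceiling division clamped to the configured bounds. This assumes the trigger uses KEDA's default AverageValue metric type, and it is a simplification: the real Horizontal Pod Autoscaler also applies a tolerance band and stabilization windows, which are ignored here. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// DesiredReplicas approximates the target replica count for a given query
// result: ceil(queryValue / threshold), clamped to [minReplicas, maxReplicas].
// Tolerance and stabilization behavior of the real autoscaler are not modeled.
func DesiredReplicas(queryValue, threshold float64, minReplicas, maxReplicas int) int {
	if threshold <= 0 {
		return minReplicas // guard against a misconfigured threshold
	}
	n := int(math.Ceil(queryValue / threshold))
	if n < minReplicas {
		n = minReplicas
	}
	if n > maxReplicas {
		n = maxReplicas
	}
	return n
}

func main() {
	// Values from the ScaledObject in this tutorial: threshold 80, 1 to 8 replicas.
	fmt.Println(DesiredReplicas(160, 80, 1, 8))  // prints 2
	fmt.Println(DesiredReplicas(1000, 80, 1, 8)) // prints 8 (capped at maxReplicaCount)
	fmt.Println(DesiredReplicas(10, 80, 1, 8))   // prints 1 (floored at minReplicaCount)
}
```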

Step 6 - Test GPU Autoscaling

Get the pod name and exec into it:

				
kubectl get pods -l app=go-from-configmap
kubectl exec -it <go-pod-name> -- bash

Set the gauge to a high value (e.g. 160):

				
curl "http://127.0.0.1:8080/?value=160"
exit

Watch for pod scaling:

				
kubectl get pods -w -l app=go-from-configmap

KEDA will evaluate the query:

160 (gauge value) / 1 (replica) = 160, which exceeds the threshold of 80, so a second replica is triggered.

At this point, the new pod should be in the *Pending* state, because the cluster does not yet have a free GPU to satisfy the Deployment's amd.com/gpu request. In the cloud UI, you should see a new autoscaling event (a new node coming up) under the AMD GPU node pool. Wait until the new GPU node finishes bootstrapping, then check the pod again: it should have moved from *Pending* to *Running*.

The query is continuously evaluated and scaling stops or continues depending on the metric’s behavior.
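
If you want to inspect the value KEDA is reacting to, the same query can be sent to the Prometheus HTTP API, which serves instant queries at /api/v1/query. The sketch below builds that URL and reads the first sample out of a response body. It is a minimal sketch under assumptions: the base URL http://localhost:9090 is hypothetical (for example, a locally forwarded Prometheus port), the function names are made up for illustration, and no network call is made; the response body shown is sample data in the documented response shape.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// QueryURL builds an instant-query URL for the Prometheus HTTP API.
func QueryURL(base, promQL string) string {
	return base + "/api/v1/query?" + url.Values{"query": {promQL}}.Encode()
}

// queryResponse models only the parts of an instant-query vector response
// that this sketch needs.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			// Each sample is [unix timestamp, "value as a string"].
			Value [2]interface{} `json:"value"`
		} `json:"result"`
	} `json:"data"`
}

// FirstValue extracts the first sample value from a response body.
func FirstValue(body []byte) (float64, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	if r.Status != "success" || len(r.Data.Result) == 0 {
		return 0, fmt.Errorf("no samples in response (status %q)", r.Status)
	}
	s, ok := r.Data.Result[0].Value[1].(string)
	if !ok {
		return 0, fmt.Errorf("unexpected sample value type")
	}
	return strconv.ParseFloat(s, 64)
}

func main() {
	query := `sum(my_custom_gauge{}) / sum(kube_deployment_status_replicas{deployment="go-from-configmap"})`
	fmt.Println(QueryURL("http://localhost:9090", query))

	// Sample response body, not real cluster output.
	sample := []byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1700000000,"160"]}]}}`)
	v, err := FirstValue(sample)
	fmt.Println(v, err)
}
```

In a real check you would fetch the URL with http.Get and pass the response body to FirstValue.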

FAQs

1. Can I use KEDA to autoscale any type of GPU workload on DOKS?

Yes. KEDA is agnostic to the type of workload as long as you can expose a metric (such as GPU utilization, queue length, or custom application metrics) that Prometheus can scrape. For AMD GPU workloads, make sure the AMD device plugin (enabled by default on DOKS) and the AMD GPU Device Metrics Exporter (enabled as described in Step 2) are both active.

2. Which cloud provider regions support AMD GPU nodes for DOKS?

As of this writing, AMD MI300X GPU nodes are available in select regions such as TOR1, NYC2, and ATL1. Always check the cloud provider documentation for the latest supported regions and GPU types.

3. How does KEDA differ from Kubernetes’ built-in Horizontal Pod Autoscaler (HPA)?

KEDA extends Kubernetes autoscaling by allowing you to scale workloads based on *external* or *custom* metrics (like Prometheus queries, queue length, or cloud events), not just CPU or memory. This is especially useful for GPU workloads where utilization patterns may not correlate with CPU usage.

4. What happens if there are not enough available GPUs in the node pool?

If your scaling logic triggers more pods than there are available GPUs, new pods will remain in the Pending state until additional GPU nodes are provisioned. DOKS will automatically scale the GPU node pool (if autoscaling is enabled) to accommodate the increased demand, subject to your configured limits.
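
The capacity arithmetic behind this answer can be sketched as follows. It is a back-of-the-envelope model with hypothetical function names: it assumes every pod requests the same number of GPUs and every node in the pool offers the same number, and it ignores the node pool's maximum size and any scheduling constraints beyond GPU count.

```go
package main

import "fmt"

// PendingPods returns how many of the desired pods cannot be scheduled on the
// GPUs that are currently free, assuming each pod requests gpusPerPod GPUs.
func PendingPods(desiredPods, gpusPerPod, freeGPUs int) int {
	if gpusPerPod <= 0 {
		return 0 // pods that request no GPUs are not limited by GPU capacity
	}
	schedulable := freeGPUs / gpusPerPod
	if schedulable >= desiredPods {
		return 0
	}
	return desiredPods - schedulable
}

// NodesNeeded returns how many additional nodes would be required to place
// pendingPods, given gpusPerNode GPUs on each new node. It returns 0 when a
// pod cannot fit on a single node at all, since adding nodes would not help.
func NodesNeeded(pendingPods, gpusPerPod, gpusPerNode int) int {
	if pendingPods <= 0 || gpusPerPod <= 0 {
		return 0
	}
	podsPerNode := gpusPerNode / gpusPerPod
	if podsPerNode == 0 {
		return 0
	}
	return (pendingPods + podsPerNode - 1) / podsPerNode // ceiling division
}

func main() {
	// The scenario from Step 6: two replicas wanted, one GPU each, one GPU free.
	pending := PendingPods(2, 1, 1)
	fmt.Println(pending)                    // prints 1
	fmt.Println(NodesNeeded(pending, 1, 1)) // prints 1 (single-GPU nodes)
}
```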

5. Can I use this approach for other types of accelerators (e.g., NVIDIA GPUs) or on other Kubernetes platforms?

Yes. The KEDA + Prometheus pattern is platform- and vendor-agnostic. You can adapt this approach for NVIDIA GPUs or other accelerators by using the appropriate device plugin and metrics exporter for your hardware and Kubernetes distribution.

Conclusion

This guide demonstrated how to build an autoscaling GPU workload on DOKS using KEDA and Prometheus, all with native Kubernetes constructs and open-source tooling.

This architecture empowers your workloads to:

Automatically scale in response to real-world metrics—not just CPU
Optimize GPU costs by dynamically managing replicas
Easily adapt to use cases like machine learning inference, video processing, or real-time analytics

By combining event-driven scaling with custom observability, teams can efficiently manage GPU resources and deliver high-performance compute workloads with minimal manual intervention.
