Prometheus and Grafana are the de facto standard monitoring stack for Kubernetes clusters. Prometheus is a time-series database that scrapes metrics over HTTP from Kubernetes components (API server, kubelet, etcd) and from applications, using a pull-based model. Grafana is a visualisation platform that queries Prometheus and displays metrics as interactive dashboards. The kube-prometheus-stack Helm chart (formerly published as the prometheus-operator chart) packages both tools along with the Prometheus Operator, Alertmanager (for alert routing and notifications), node-exporter (for host metrics), and kube-state-metrics (for Kubernetes object metrics) into a single, production-ready deployment. This guide covers installing kube-prometheus-stack on a Kubernetes cluster running on RHEL 9, accessing the dashboards, running PromQL queries, and creating custom alerting rules for common Kubernetes failure scenarios.

Prerequisites

  • Kubernetes cluster running on RHEL 9
  • Helm installed
  • At least 4 GB RAM available in the cluster for monitoring components
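
Before installing, it can help to confirm from a shell that the tools listed above are present. The following is a minimal sketch rather than part of the original procedure: the helper name require_cmd is hypothetical, and the check only verifies that the binaries are on the PATH, not that the cluster has 4 GB of memory to spare.

```shell
# Hypothetical helper: succeeds if the named command is on the PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report each tool this guide relies on.
for tool in kubectl helm; do
    if require_cmd "$tool"; then
        echo "ok: $tool found"
    else
        echo "missing: $tool"
    fi
done
```

Once both tools are found, kubectl get nodes confirms that the cluster is reachable, and kubectl describe nodes lists the allocatable memory on each node.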

Step 1 — Install kube-prometheus-stack

# Add the Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the full monitoring stack (replace the example Grafana password)
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
    --namespace monitoring \
    --create-namespace \
    --set prometheus.prometheusSpec.retention=30d \
    --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
    --set grafana.adminPassword='GrafanaPass123!'

kubectl get pods -n monitoring -w
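
The same settings can be kept in a values file instead of a chain of --set flags, which keeps the password off the command line and makes later upgrades repeatable. This is a sketch under stated assumptions: the file path /tmp/kps-values.yaml is an arbitrary choice, the release name and namespace are the ones used in this step, and the password is the guide's placeholder.

```shell
# Write the chart values used in this guide to a file (the path is an assumption).
VALUES_FILE="${VALUES_FILE:-/tmp/kps-values.yaml}"
cat > "$VALUES_FILE" <<'EOF'
prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi
grafana:
  adminPassword: "GrafanaPass123!"
EOF

# Install or upgrade the release from the file; skipped when helm is not on the PATH.
if command -v helm >/dev/null 2>&1; then
    helm upgrade --install kube-prometheus-stack \
        prometheus-community/kube-prometheus-stack \
        --namespace monitoring --create-namespace \
        -f "$VALUES_FILE"
fi
```

Because the volume claim template names no storage class, the 50Gi claim relies on the cluster having a default StorageClass. If the Prometheus pod stays Pending, kubectl get pvc -n monitoring is a reasonable first check.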

Step 2 — Access Grafana Dashboard

# Expose Grafana via NodePort
kubectl patch svc kube-prometheus-stack-grafana -n monitoring \
    -p '{"spec":{"type":"NodePort"}}'

kubectl get svc kube-prometheus-stack-grafana -n monitoring
# Access: http://server-ip:NODEPORT
# Login: admin / GrafanaPass123!

# Pre-installed dashboards include (exact names vary by chart version):
# - Kubernetes / Compute Resources / Cluster
# - Node Exporter / Nodes (host metrics)
# - Kubernetes / Persistent Volumes
# - Kubernetes / API server
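
If changing the Service type is undesirable, a port-forward gives temporary local access instead, and the admin password can be read back from the Secret created by the Grafana subchart. This sketch assumes the Secret shares the service name (kube-prometheus-stack-grafana), that the password is stored under the admin-password key, and that the service listens on port 80. These are believed to be the Grafana chart defaults but should be checked against the installed version. The function names are hypothetical.

```shell
# Hypothetical helpers; release name and namespace follow this guide.
GRAFANA_NS="${GRAFANA_NS:-monitoring}"
GRAFANA_NAME="${GRAFANA_NAME:-kube-prometheus-stack-grafana}"

# Print the Grafana admin password stored in the chart-managed Secret.
grafana_admin_password() {
    kubectl get secret "$GRAFANA_NAME" -n "$GRAFANA_NS" \
        -o jsonpath='{.data.admin-password}' | base64 -d
}

# Forward local port 3000 to the Grafana service (assumed port 80).
# Blocks until interrupted with Ctrl+C.
grafana_port_forward() {
    kubectl port-forward "svc/$GRAFANA_NAME" -n "$GRAFANA_NS" 3000:80
}
```

Running grafana_port_forward in one terminal makes Grafana available at http://localhost:3000, and grafana_admin_password in another terminal prints the password for the admin user.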

Step 3 — View Metrics in Prometheus

# Port-forward to the Prometheus UI (run in a separate terminal), then open http://localhost:9090
kubectl port-forward svc/kube-prometheus-stack-prometheus -n monitoring 9090:9090

# Example PromQL queries:
# CPU usage by namespace:
sum(rate(container_cpu_usage_seconds_total{namespace!=""}[5m])) by (namespace)

# Memory usage by pod (working set summed over each pod's containers):
sum(container_memory_working_set_bytes{container!=""}) by (namespace, pod)

# Pod restart count:
kube_pod_container_status_restarts_total

# Node disk usage:
(node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes
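
The queries above can also be run from a shell through the Prometheus HTTP API, which is convenient for scripting. A minimal sketch, assuming the port-forward from the start of this step is still active on localhost:9090; the helper name promql and the PROM_URL variable are hypothetical.

```shell
# Hypothetical helper: run an instant PromQL query against the HTTP API.
# PROM_URL assumes the port-forward from this step is listening on localhost:9090.
PROM_URL="${PROM_URL:-http://localhost:9090}"

promql() {
    # -G sends the URL-encoded query as a GET parameter.
    curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}
```

For example, promql 'kube_pod_container_status_restarts_total' returns the restart counters as JSON. Piping the output through a JSON formatter such as jq, if installed, makes it easier to read.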

Step 4 — Configure Alerting Rules

# /tmp/custom-alerts.yaml — PrometheusRule for custom alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # Must match Helm release name
spec:
  groups:
  - name: pod-alerts
    rules:
    - alert: PodCrashLooping
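      # The restart metric is a cumulative counter, so alert on recent growth, not the lifetime total
      expr: increase(kube_pod_container_status_restarts_total[1h]) > 5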
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Pod {{ $labels.pod }} is crash-looping"
    - alert: HighMemoryUsage
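      # Skip containers without a memory limit (limit reported as 0) to avoid dividing by zero
      expr: container_memory_working_set_bytes{container!=""} / (container_spec_memory_limit_bytes{container!=""} > 0) > 0.9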
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Container {{ $labels.container }} memory usage over 90%"

# Apply the rule to the cluster
kubectl apply -f /tmp/custom-alerts.yaml
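
After applying the manifest, it is worth confirming that the object exists and that Prometheus has actually loaded the rule, since a mismatched release label causes the rule to be silently ignored. The sketch below assumes the Prometheus port-forward from Step 3 is active on localhost:9090. The function names are hypothetical, and the grep is a crude match on the API's JSON output that would also match a rule group of the same name.

```shell
# Hypothetical helpers for checking that the custom rule was picked up.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Succeeds if the PrometheusRule object exists in the monitoring namespace.
rule_object_exists() {
    kubectl get prometheusrule custom-alerts -n monitoring >/dev/null 2>&1
}

# Succeeds if the Prometheus rules API output contains the given name.
rule_loaded() {
    curl -s "$PROM_URL/api/v1/rules" | grep -q "\"name\":\"$1\""
}
```

A typical check is rule_object_exists && rule_loaded PodCrashLooping && echo "rule active". The operator can take a minute or so to reload the configuration, so a failed check immediately after kubectl apply is not necessarily an error.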

Conclusion

The kube-prometheus-stack on RHEL 9 provides cluster, node, and application-level metrics in a single Helm installation. Two of the most useful bundled dashboards are Node Exporter / Nodes (CPU, memory, disk, and network for each RHEL node) and Kubernetes / Compute Resources / Cluster (resource requests, limits, and actual usage across all namespaces). A PodCrashLooping alert rule is worth treating as mandatory for any production cluster, because repeated container restarts are a common symptom of broken deployments, OOM kills, and application configuration errors.

Next steps: How to Install ArgoCD on RHEL 9, How to Install containerd on RHEL 9, and How to Install Jenkins on RHEL 9.