Prometheus and Grafana are the de facto standard monitoring stack for Kubernetes clusters. Prometheus is a time-series metrics database that uses a pull-based model: it scrapes metrics over HTTP from Kubernetes components (API server, kubelet, etcd) and from application endpoints. Grafana is a visualisation platform that queries Prometheus and displays metrics as interactive dashboards. The kube-prometheus-stack Helm chart (formerly published as the prometheus-operator chart) packages both tools along with the Prometheus Operator, Alertmanager (for routing alerts), node-exporter (for host metrics), and kube-state-metrics (for Kubernetes object metrics) into a single deployment. This guide covers installing kube-prometheus-stack on a Kubernetes cluster running on RHEL 9, accessing the dashboards, querying metrics, and creating custom alerting rules for common Kubernetes failure scenarios.
Prerequisites
- Kubernetes cluster running on RHEL 9
- Helm installed
- At least 4 GB RAM available in the cluster for monitoring components
Step 1 — Install kube-prometheus-stack
# Add the Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install the full monitoring stack
# (example password; replace it, and keep the single quotes so the shell leaves "!" alone)
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
  --set grafana.adminPassword='GrafanaPass123!'
# Watch the pods until they are all Running (Ctrl+C to stop)
kubectl get pods -n monitoring -w
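The same settings can be kept in a values file instead of a chain of --set flags, which is easier to review and version. The sketch below writes such a file; the file name monitoring-values.yaml and the VALUES_FILE variable are illustrative assumptions, and the keys mirror the --set paths used in this step.

```shell
#!/usr/bin/env sh
set -eu

# Hypothetical file name; any path works.
VALUES_FILE="${VALUES_FILE:-./monitoring-values.yaml}"

# Same settings as the --set flags in Step 1, expressed as YAML.
cat > "$VALUES_FILE" <<'EOF'
prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi
grafana:
  adminPassword: "GrafanaPass123!"
EOF

echo "Wrote $VALUES_FILE"
# Then install with:
#   helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
#     --namespace monitoring --create-namespace -f "$VALUES_FILE"
```

Keeping the password in a file means the file itself must be protected; treat the value here as a placeholder.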
Step 2 — Access Grafana Dashboard
# Expose Grafana via NodePort
kubectl patch svc kube-prometheus-stack-grafana -n monitoring \
  -p '{"spec":{"type":"NodePort"}}'
kubectl get svc kube-prometheus-stack-grafana -n monitoring
# Access: http://server-ip:NODEPORT
# Login: admin / GrafanaPass123!
# Pre-installed dashboards include:
# - Kubernetes / Compute Resources / Cluster
# - Node Exporter / Nodes (host metrics)
# - Kubernetes / Persistent Volumes
# - Kubernetes / API server
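If changing the Service type is not desirable, Grafana can also be reached through a port-forward, and the admin password can be read back from the Secret the chart creates. This is a minimal sketch, assuming the release name kube-prometheus-stack from Step 1 (so the Service and Secret are both named kube-prometheus-stack-grafana) and the chart's default Service port of 80; the function names are hypothetical.

```shell
#!/usr/bin/env sh

# Print the Grafana admin password stored in the chart's Secret.
grafana_admin_password() {
  kubectl get secret kube-prometheus-stack-grafana -n monitoring \
    -o jsonpath='{.data.admin-password}' | base64 --decode
}

# Forward Grafana to http://localhost:3000 without changing the Service type.
# Port 80 is assumed to be the Service port; adjust if it was overridden.
grafana_port_forward() {
  kubectl port-forward svc/kube-prometheus-stack-grafana -n monitoring 3000:80
}
```

Run grafana_port_forward in one terminal and log in at http://localhost:3000 with the user admin and the output of grafana_admin_password.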
Step 3 — View Metrics in Prometheus
# Port-forward to Prometheus UI
kubectl port-forward svc/kube-prometheus-stack-prometheus -n monitoring 9090:9090
# Example PromQL queries:
# CPU usage by namespace:
sum(rate(container_cpu_usage_seconds_total{namespace!=""}[5m])) by (namespace)
# Memory usage by pod:
sum(container_memory_working_set_bytes{container!=""}) by (namespace, pod)
# Pod restart count:
kube_pod_container_status_restarts_total
# Node disk usage:
(node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes
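The queries above can also be run from the command line through the Prometheus HTTP API while the port-forward is active. The helper below is a sketch: prom_query and the PROM_URL override are hypothetical names, and the default URL assumes the port-forward on 9090 shown in this step.

```shell
#!/usr/bin/env sh

# Run an instant PromQL query against the Prometheus HTTP API.
# -G sends the url-encoded "query" parameter as part of a GET request.
prom_query() {
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Example (needs the port-forward running in another terminal):
#   prom_query 'sum(rate(container_cpu_usage_seconds_total{namespace!=""}[5m])) by (namespace)'
```

The response is JSON; piping it through a JSON formatter makes the result series easier to read.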
Step 4 — Configure Alerting Rules
# /tmp/custom-alerts.yaml: PrometheusRule for custom alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack # Must match Helm release name
spec:
  groups:
    - name: pod-alerts
      rules:
        - alert: PodCrashLooping
          # Count restarts over the last hour; the raw counter only grows,
          # so comparing it directly to 5 would keep firing forever.
          expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.pod }} is crash-looping"
        - alert: HighMemoryUsage
          # Ignore containers without a memory limit (limit reported as 0).
          expr: container_memory_working_set_bytes{container!=""} / (container_spec_memory_limit_bytes{container!=""} > 0) > 0.9
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.container }} memory usage over 90%"
kubectl apply -f /tmp/custom-alerts.yaml
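After applying the rule, it is worth confirming that Prometheus actually loaded it, since a mismatched release label means the rule is silently ignored. The sketch below does a coarse text check against the /api/v1/rules endpoint. The function names are hypothetical, the check assumes the compact JSON that Prometheus returns, and it matches any group or rule with the given name.

```shell
#!/usr/bin/env sh

# Succeeds if the JSON on stdin contains a group or rule with the given name.
# Coarse text match; assumes compact JSON of the form "name":"...".
has_rule_name() {
  grep -q "\"name\":\"$1\""
}

# Check the live server (needs the Prometheus port-forward from Step 3).
check_rule_loaded() {
  curl -s "${PROM_URL:-http://localhost:9090}/api/v1/rules" | has_rule_name "$1"
}

# Example:
#   check_rule_loaded pod-alerts && echo "rule group loaded"
```

It can take a short while after kubectl apply before the rule shows up, so a first negative result is not conclusive.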
Conclusion
The kube-prometheus-stack on RHEL 9 provides full Kubernetes observability (cluster, node, and application-level metrics) in a single Helm installation. Two of the most useful bundled dashboards are Node Exporter / Nodes (CPU, memory, disk, and network for each RHEL node) and Kubernetes / Compute Resources / Cluster (resource requests, limits, and actual usage across all namespaces). Treat a crash-loop alert such as PodCrashLooping as mandatory for any production cluster: repeated container restarts are a common symptom of broken deployments, OOM kills, and application configuration errors.