How to Monitor Kubernetes with Prometheus and Grafana on RHEL 8

December 15, 2025

Prometheus and Grafana are the de facto standard observability stack for Kubernetes, providing metrics collection, time-series storage, alerting, and rich dashboarding. The kube-prometheus-stack Helm chart bundles Prometheus, Grafana, Alertmanager, and a comprehensive set of pre-built Kubernetes dashboards and alert rules into a single release that takes minutes to install. This tutorial walks you through installing the stack on RHEL 8, exposing Grafana for browser access, exploring the built-in Kubernetes dashboards, defining custom PrometheusRule alerts, and configuring Alertmanager routing in the values file. After completing this guide you will have end-to-end visibility into your cluster’s health and performance.

Prerequisites

A running Kubernetes cluster on RHEL 8
Helm 3 installed (helm version should show v3.x)
kubectl configured with cluster-admin privileges
At least 4 GB free memory across the cluster for the full stack
A persistent volume provisioner if metrics must survive pod restarts (by default the chart leaves prometheus.prometheusSpec.storageSpec empty, so Prometheus uses ephemeral emptyDir storage, which is fine for testing)
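
The checklist above can be verified from a shell before installing anything. The following is a minimal sketch, assuming helm and kubectl are on the PATH; the function names is_helm3 and preflight are illustrative and not part of either tool.

```shell
#!/usr/bin/env bash
# Preflight sketch for the prerequisites above (hypothetical helper names).

# Succeeds if a `helm version --short` string such as "v3.14.2+gc309b6f"
# denotes Helm 3.
is_helm3() {
  case "$1" in
    v3.*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Runs the checks against the live cluster; call it manually.
preflight() {
  local ver
  ver=$(helm version --short 2>/dev/null) || { echo "helm not found" >&2; return 1; }
  is_helm3 "$ver" || { echo "Helm 3 required, found: $ver" >&2; return 1; }

  # cluster-admin is allowed every verb on every resource in all namespaces.
  kubectl auth can-i '*' '*' --all-namespaces >/dev/null \
    || { echo "current kubectl context lacks cluster-admin rights" >&2; return 1; }

  # An empty list means no dynamic provisioner; fine for emptyDir-based testing.
  kubectl get storageclass
}
```

Running preflight after sourcing the file prints the available storage classes on success and a short reason on failure. Memory headroom is not checked here, since how to measure it depends on whether metrics-server is installed.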

Step 1 — Add the Prometheus Community Helm Repository

The kube-prometheus-stack chart is maintained by the Prometheus community and distributed through their Helm repository. Add and update it before installing.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo prometheus-community/kube-prometheus-stack

Step 2 — Install kube-prometheus-stack

Create a dedicated namespace and install the chart. By default the chart deploys Prometheus with 10-day retention (the command below raises it to 15 days), Grafana with an admin login, Alertmanager, and ServiceMonitors for the core Kubernetes components. Replace the sample Grafana password with your own.

kubectl create namespace monitoring

helm install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.adminPassword='ChangeMe123!' \
  --set prometheus.prometheusSpec.retention=15d \
  --wait

Verify all pods come up in the monitoring namespace:

kubectl get pods -n monitoring

You should see pods for prometheus-kube-prometheus-stack-prometheus, alertmanager-kube-prometheus-stack-alertmanager, kube-prometheus-stack-grafana, and several exporters.
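
Scanning the pod list by eye gets tedious on a busy cluster. As a sketch, the hypothetical helper below (not_ready_pods is a name chosen here) filters the output of kubectl get pods --no-headers down to pods whose READY column shows fewer ready containers than expected, skipping pods that have already completed.

```shell
# Reads `kubectl get pods --no-headers` output on stdin (columns: NAME READY
# STATUS RESTARTS AGE) and prints the names of pods that are not fully ready.
not_ready_pods() {
  awk '$3 != "Completed" { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Example use against a live cluster (run manually):
#   kubectl get pods -n monitoring --no-headers | not_ready_pods
```

Empty output means every listed pod reports all of its containers as ready.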

Step 3 — Expose Grafana for Browser Access

You can expose Grafana using either a NodePort Service or kubectl port-forwarding. For persistent access, patch the service to NodePort:

# Option A: port-forward (temporary)
kubectl port-forward svc/kube-prometheus-stack-grafana \
  -n monitoring 3000:80 &

# Option B: change service type to NodePort (persistent)
kubectl patch svc kube-prometheus-stack-grafana \
  -n monitoring \
  -p '{"spec":{"type":"NodePort"}}'

kubectl get svc kube-prometheus-stack-grafana -n monitoring

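With port-forwarding, open http://localhost:3000 in your browser. With NodePort, open http://<node-ip>:<nodePort>, using the port shown in the service output. Log in with username admin and the password set during installation. If you use NodePort, open the port in firewalld on each node that should accept the traffic: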

NODEPORT=$(kubectl get svc kube-prometheus-stack-grafana -n monitoring \
  -o jsonpath='{.spec.ports[0].nodePort}')
sudo firewall-cmd --permanent --add-port="${NODEPORT}/tcp"
sudo firewall-cmd --reload
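
Before opening a browser, it can help to confirm that Grafana answers on the NodePort. The sketch below assumes curl is installed and uses Grafana's /api/health endpoint; grafana_health_url and NODE_IP are names introduced here for illustration, and NODEPORT is the variable set in the commands above.

```shell
# Builds the health-check URL for a node address ($1) and NodePort ($2).
grafana_health_url() {
  printf 'http://%s:%s/api/health\n' "$1" "$2"
}

# Example use against a live cluster (run manually):
#   NODE_IP=$(kubectl get nodes \
#     -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
#   curl -fsS "$(grafana_health_url "$NODE_IP" "$NODEPORT")"
```

A healthy instance answers with a small JSON document that includes a database status field. A timeout usually points at the firewall or at a node address that is not reachable from your workstation.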

Step 4 — Explore Built-in Dashboards

The kube-prometheus-stack ships with over 30 pre-built Grafana dashboards. Navigate to Dashboards → Browse in the Grafana UI. Exact titles and menu labels vary with the chart and Grafana version. Key dashboards to review:

Kubernetes / Cluster (by Namespace) — namespace-level CPU, memory, and network usage
Kubernetes / Nodes — per-node CPU saturation, memory pressure, disk I/O, and network
Kubernetes / Pods — individual pod resource consumption and restart counts
Kubernetes / API server — request latency, error rates, and etcd metrics

# Query Prometheus directly via port-forward
kubectl port-forward svc/kube-prometheus-stack-prometheus \
  -n monitoring 9090:9090 &

# Example PromQL: CPU usage per namespace
# Open http://localhost:9090 and enter:
# sum(rate(container_cpu_usage_seconds_total{namespace!=""}[5m])) by (namespace)
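
The same query can be run from the shell through the Prometheus HTTP API, which is convenient for scripting. This is a minimal sketch, assuming curl is installed and the port-forward above is still running; prom_query is a hypothetical helper name.

```shell
# Sends an instant query to the Prometheus HTTP API (/api/v1/query).
# $1 = PromQL expression, $2 = base URL (defaults to the port-forward address).
prom_query() {
  local query="$1" base="${2:-http://localhost:9090}"
  # -G turns the URL-encoded data into a query string on a GET request.
  curl -sS -G "${base}/api/v1/query" --data-urlencode "query=${query}"
}

# Example use (run manually):
#   prom_query 'sum(rate(container_cpu_usage_seconds_total{namespace!=""}[5m])) by (namespace)'
```

The response is JSON; piping it through a formatter such as jq makes the per-namespace values easier to read, if jq is available on your machine.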

Step 5 — Create a Custom PrometheusRule Alert

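Define a custom alert that fires when a container has been restarting frequently. The release: kube-prometheus-stack label matters: with the chart's default settings, Prometheus only loads PrometheusRule objects labeled with the Helm release name. Save the following as pod-restart-alert.yaml: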

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-restart-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: pod.rules
      rules:
        - alert: PodHighRestartCount
          expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} restarting frequently"
            description: "Container {{ $labels.container }} in pod {{ $labels.pod }} has restarted more than 5 times in the last hour."

kubectl apply -f pod-restart-alert.yaml

# Confirm the PrometheusRule object was created
# (to confirm Prometheus loaded it, check the Rules page in the Prometheus UI)
kubectl get prometheusrule -n monitoring
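
Listing the object only shows that the API server accepted it. To check from a script that Prometheus itself loaded the rule, one option is to ask its /api/v1/rules endpoint. The sketch below assumes curl, a port-forward to Prometheus on localhost:9090 as in Step 4, and the compact JSON formatting that endpoint normally returns; rule_loaded is a hypothetical helper name.

```shell
# Succeeds if a rule (or rule group) with the given name appears in the rules
# Prometheus has loaded.
# $1 = alert name, $2 = base URL (defaults to a local port-forward on 9090).
rule_loaded() {
  local name="$1" base="${2:-http://localhost:9090}"
  # Plain substring match on "name":"<value>"; assumes no whitespace in the JSON.
  curl -sS "${base}/api/v1/rules" | grep -qF "\"name\":\"${name}\""
}

# Example use (run manually):
#   rule_loaded PodHighRestartCount && echo "rule is loaded"
```

The operator may take a minute or so to reload Prometheus after the object is applied, so a first failure is not necessarily a misconfiguration.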

Step 6 — Configure Alertmanager via values.yaml

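Configure Alertmanager to send warning-severity alerts to a Slack channel by creating a custom values file and upgrading the Helm release. Note that the root route in this example also points at slack-warnings, so alerts that match no child route are delivered to the same channel.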

cat > alertmanager-values.yaml << 'EOF'
alertmanager:
  config:
    global:
      slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
    route:
      receiver: 'slack-warnings'
      group_by: ['alertname', 'namespace']
      routes:
        - match:
            severity: warning
          receiver: 'slack-warnings'
    receivers:
      - name: 'slack-warnings'
        slack_configs:
          - channel: '#k8s-alerts'
            title: '{{ .CommonAnnotations.summary }}'
            text: '{{ .CommonAnnotations.description }}'
            send_resolved: true
EOF

helm upgrade kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values alertmanager-values.yaml \
  --reuse-values
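
To confirm the routing without waiting for a real incident, you can post a synthetic alert to Alertmanager's v2 API. This is a sketch under a few assumptions: curl is installed, Alertmanager is reachable on localhost:9093 (for example through a port-forward to a service assumed to be named kube-prometheus-stack-alertmanager), and the helper names and the ManualRoutingTest alert name are invented for illustration.

```shell
# Builds a one-element alert list in the JSON shape accepted by
# Alertmanager's POST /api/v2/alerts endpoint. $1 = severity label.
test_alert_payload() {
  printf '[{"labels":{"alertname":"ManualRoutingTest","severity":"%s","namespace":"monitoring"},"annotations":{"summary":"Manual routing test","description":"Sent by hand to check Slack delivery."}}]\n' "${1:-warning}"
}

# Posts a warning-severity test alert. $1 = Alertmanager base URL.
send_test_alert() {
  local base="${1:-http://localhost:9093}"
  curl -sS -X POST -H 'Content-Type: application/json' \
    -d "$(test_alert_payload warning)" "${base}/api/v2/alerts"
}

# Example use (run manually):
#   kubectl port-forward svc/kube-prometheus-stack-alertmanager -n monitoring 9093:9093 &
#   send_test_alert
```

If the webhook URL and channel are correct, a message with the summary and description above should appear in Slack after the route's group wait interval has passed.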

Conclusion

You have deployed the full kube-prometheus-stack on RHEL 8, exposed Grafana for browser access, explored the built-in Kubernetes cluster and node dashboards, created a custom PrometheusRule alert for pod restart detection, and configured Alertmanager to forward warnings to Slack. This observability stack gives you the metrics visibility needed to catch performance degradation and capacity issues before they become outages.

Next steps:

Set up persistent storage for Prometheus with a PersistentVolumeClaim for long-term metric retention
Add application-level metrics by instrumenting your services with the Prometheus client library
Configure Grafana datasource and dashboard provisioning via ConfigMaps for reproducible setups