How to Monitor Kubernetes with Prometheus and Grafana on RHEL 7

Operating a Kubernetes cluster without monitoring is like running a production database without logs: you only learn that something is wrong after an outage has already started. Prometheus is the de facto standard for Kubernetes metrics collection. It uses a pull-based model, scraping metric endpoints exposed by cluster components, nodes, and your own applications at regular intervals. Grafana provides the visualization layer, turning raw time-series data into interactive dashboards that reveal CPU saturation, memory pressure, pod restart patterns, and network throughput at a glance. On RHEL 7, where production workloads demand high availability and operational visibility, the kube-prometheus-stack Helm chart is one of the quickest ways to get comprehensive cluster monitoring running. This guide covers the full setup, from installing the chart with Helm through custom application monitoring with ServiceMonitors.

Prerequisites

  • RHEL 7 nodes with a running Kubernetes cluster
  • Helm 3 installed and configured
  • kubectl with cluster-admin access
  • At least 4 GB of available cluster memory (Prometheus can be memory-intensive with long retention)
  • A StorageClass configured for persistent volumes (recommended for Prometheus data)
  • Outbound internet access to pull Helm charts and container images

Step 1: Add the Prometheus Helm Repository

The kube-prometheus-stack chart bundles Prometheus, Alertmanager, Grafana, kube-state-metrics, and node-exporter into a single deployable package maintained by the Prometheus community:

helm repo add prometheus-community \
  https://prometheus-community.github.io/helm-charts
helm repo update

helm search repo prometheus-community/kube-prometheus-stack

Create a dedicated monitoring namespace to isolate all monitoring components:

kubectl create namespace monitoring

Step 2: Install kube-prometheus-stack

Install the stack with values appropriate for a RHEL 7 bare-metal cluster. Set a retention period and Grafana admin password:

helm install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.retention=15d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=standard \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=20Gi \
  --set grafana.adminPassword=ChangeMeNow123 \
  --set alertmanager.enabled=true

If you do not have a StorageClass configured, omit the two storageSpec lines and Prometheus will use ephemeral pod storage (an emptyDir volume). With ephemeral storage, all historical metric data is lost whenever the Prometheus pod is deleted or rescheduled, so configure persistent storage before using this in production on RHEL 7.
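As the list of --set flags grows, a values file is often easier to review and keep in version control. The following is a minimal sketch that writes the same Prometheus and Alertmanager settings to a file. The file name monitoring-values.yaml and the STORAGE_CLASS variable are illustrative assumptions, not chart defaults:

```shell
#!/bin/sh
set -eu

# Assumed names for illustration only; adjust to your cluster.
STORAGE_CLASS="${STORAGE_CLASS:-standard}"
VALUES_FILE="${VALUES_FILE:-monitoring-values.yaml}"

# Same settings as the --set flags in the install command above.
cat > "$VALUES_FILE" <<EOF
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: ${STORAGE_CLASS}
          resources:
            requests:
              storage: 20Gi
alertmanager:
  enabled: true
EOF

echo "Wrote ${VALUES_FILE}"
```

The file can then stand in for the matching flags by passing -f monitoring-values.yaml to helm install, while the Grafana admin password is still supplied separately so it stays out of the file.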

Monitor the deployment rollout:

kubectl get pods -n monitoring -w

Wait for the prometheus-kube-prometheus-stack-prometheus-0 StatefulSet pod, the alertmanager-kube-prometheus-stack-alertmanager-0 pod, and all Grafana and exporter pods to reach Running status.
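For a scripted check instead of watching the output by eye, the pod listing can be filtered for anything that is not fully ready. This is a rough sketch that assumes the default kubectl get pods column layout (NAME, READY, STATUS, ...). The function name pods_not_ready is hypothetical:

```shell
#!/bin/sh
# Reads `kubectl get pods` output on stdin and prints every pod that is
# not Running or whose READY count (for example 1/2) is incomplete.
# Pods in Completed status are skipped, since finished Jobs are expected.
pods_not_ready() {
  awk 'NR == 1 || $3 == "Completed" { next }
       {
         split($2, r, "/")
         if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3
       }'
}

# Example usage against the monitoring namespace:
#   kubectl get pods -n monitoring | pods_not_ready
```

An empty result suggests that every pod in the namespace has reached Running with all containers ready.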

Step 3: Access the Grafana Dashboard

Use port-forward to access Grafana from your RHEL 7 workstation without exposing it externally:

kubectl port-forward svc/kube-prometheus-stack-grafana \
  -n monitoring 3000:80 &

Open http://localhost:3000 in a browser. Log in with username admin and the password set during installation.

Expose Grafana via NodePort for Remote Access

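kubectl patch svc kube-prometheus-stack-grafana \
  -n monitoring \
  -p '{"spec": {"type": "NodePort"}}'

# The PORT(S) column shows the assigned NodePort, for example 80:32000/TCP
kubectl get svc kube-prometheus-stack-grafana -n monitoring

# Open the assigned NodePort on each node (32000 is an example; substitute your own)
firewall-cmd --permanent --add-port=32000/tcp
firewall-cmd --reload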
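Kubernetes assigns NodePorts from the 30000-32767 range by default, so the port to open may not be 32000. The sketch below prints the firewall commands for a given port after a basic range check. The function name nodeport_firewall_cmds is hypothetical, and the range check assumes the cluster uses the default NodePort range:

```shell
#!/bin/sh
# Prints (does not run) the firewalld commands for one NodePort.
# Rejects values that are not integers in the default 30000-32767 range.
nodeport_firewall_cmds() {
  port="$1"
  case "$port" in
    ''|*[!0-9]*)
      echo "not a port number: $port" >&2
      return 1
      ;;
  esac
  if [ "$port" -lt 30000 ] || [ "$port" -gt 32767 ]; then
    echo "outside the default NodePort range: $port" >&2
    return 1
  fi
  printf 'firewall-cmd --permanent --add-port=%s/tcp\n' "$port"
  printf 'firewall-cmd --reload\n'
}

# Example usage, reading the assigned port from the Service:
#   port=$(kubectl get svc kube-prometheus-stack-grafana -n monitoring \
#     -o jsonpath='{.spec.ports[0].nodePort}')
#   nodeport_firewall_cmds "$port"
```

Printing the commands rather than running them leaves room to review them before applying the change on each node.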

Step 4: Explore Default Dashboards

kube-prometheus-stack ships with over twenty pre-built Grafana dashboards. Navigate to Dashboards > Browse in the Grafana UI to find them organized by category.

Key dashboards to review immediately

  • Kubernetes / Compute Resources / Cluster: Overall CPU and memory usage across all nodes
  • Kubernetes / Compute Resources / Node (Pods): Per-node pod resource consumption breakdown
  • Node Exporter / Full: OS-level metrics including disk I/O, network, filesystem, and CPU modes like iowait
  • Kubernetes / Networking / Cluster: Network traffic and error rates broken down by namespace
  • Alertmanager / Overview: Active alerts, silences, and inhibition rules

The Node Exporter Full dashboard is particularly valuable on RHEL 7 because it exposes Linux-level metrics like iowait, softirq, and swap utilization that Kubernetes itself does not report through the metrics API.

Step 5: Configure Custom Alerting Rules

The stack includes default PrometheusRule objects covering common failure conditions. List the existing rules:

kubectl get prometheusrule -n monitoring

Create a custom alert rule file named custom-alerts.yaml:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  groups:
  - name: custom.rules
    rules:
    - alert: HighPodRestartRate
      expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} is restarting frequently"
        description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} has restarted more than 3 times in 15 minutes."
    - alert: NodeHighMemoryUsage
      expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Node {{ $labels.instance }} memory above 85%"
        description: "Node {{ $labels.instance }} has used more than 85% of total memory for 5 consecutive minutes."

Apply the rule:

kubectl apply -f custom-alerts.yaml

With the chart's default settings, the label release: kube-prometheus-stack is required for the Prometheus Operator to discover and load this rule. Without it, the PrometheusRule resource will be created in Kubernetes but silently ignored by the running Prometheus instance.
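To get a feel for what the NodeHighMemoryUsage expression computes, the same arithmetic can be run by hand against /proc/meminfo, which is where node-exporter reads these values on Linux. This is an illustrative sketch only. The function name mem_used_percent is hypothetical, and the values are in kB rather than bytes, which does not change the ratio:

```shell
#!/bin/sh
# Reads /proc/meminfo-style input on stdin and prints
# (1 - MemAvailable / MemTotal) * 100, the quantity the alert compares to 85.
mem_used_percent() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%.1f\n", (1 - avail / total) * 100 }'
}

# Example usage on a node:
#   mem_used_percent < /proc/meminfo
```

For example, a node with 16,000,000 kB in total and 2,000,000 kB available yields 87.5, which would trip the alert once it had held for five minutes.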

Step 6: Add a ServiceMonitor for Custom Applications

To scrape metrics from your own application, it must expose a /metrics endpoint in Prometheus text format. Once it does, create a ServiceMonitor resource in a file named myapp-servicemonitor.yaml to tell Prometheus how to find it:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  namespaceSelector:
    matchNames:
    - production
  selector:
    matchLabels:
      app: myapp
  endpoints:
  - port: http-metrics
    path: /metrics
    interval: 30s

Apply the ServiceMonitor:

kubectl apply -f myapp-servicemonitor.yaml

This tells Prometheus to scrape the endpoints behind every Service labeled app: myapp in the production namespace every 30 seconds on the named port http-metrics. After about one minute, verify the target appears in Prometheus by port-forwarding to the Prometheus service and checking its targets page:

kubectl port-forward svc/kube-prometheus-stack-prometheus \
  -n monitoring 9090:9090 &

Navigate to http://localhost:9090/targets and look for your application target listed under the ServiceMonitor.
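If the target does not show up, one common cause is that no Service matches the ServiceMonitor. A ServiceMonitor selects Services rather than pods, so the Service needs the app: myapp label and a port named http-metrics. The sketch below writes one possible Service manifest. The Service name, the file name myapp-service.yaml, and port 8080 are assumptions for illustration:

```shell
#!/bin/sh
set -eu

# Hypothetical Service for the example application. Only the label and the
# port name are dictated by the ServiceMonitor; the rest is illustrative.
cat > myapp-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: production
  labels:
    app: myapp
spec:
  selector:
    app: myapp
  ports:
  - name: http-metrics
    port: 8080
    targetPort: 8080
EOF

echo "Wrote myapp-service.yaml"
```

Applying it with kubectl apply -f myapp-service.yaml should give the ServiceMonitor a matching Service, provided the application pods carry the app: myapp label and actually listen on the chosen port.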

Step 7: Install metrics-server and Use kubectl top

The kubectl top command provides real-time pod and node resource usage but requires the metrics-server, which is separate from Prometheus. Install it:

kubectl apply -f \
  https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

On RHEL 7 clusters with self-signed kubelet certificates, metrics-server may fail TLS verification. Patch it to disable certificate verification:

kubectl patch deployment metrics-server \
  -n kube-system \
  --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

Once the metrics-server pod is running, use kubectl top for quick resource snapshots:

# Node-level resource usage
kubectl top nodes

# Pod-level resource usage across all namespaces
kubectl top pods --all-namespaces

# Pod resource usage in a namespace sorted by CPU consumption
kubectl top pods -n production --sort-by=cpu
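The output of kubectl top can also be filtered in a script, for example to list only pods above a CPU threshold. This sketch assumes the single-namespace column layout (NAME, CPU(cores), MEMORY(bytes)) with CPU reported in millicores such as 250m. The function name pods_over_cpu is hypothetical:

```shell
#!/bin/sh
# Reads `kubectl top pods -n <namespace>` output on stdin and prints pods
# whose CPU usage exceeds the given limit in millicores.
pods_over_cpu() {
  awk -v limit="$1" 'NR > 1 {
    cpu = $2
    sub(/m$/, "", cpu)          # strip the millicore suffix
    if (cpu + 0 > limit + 0) print $1, $2
  }'
}

# Example usage: pods in production using more than 200 millicores
#   kubectl top pods -n production | pods_over_cpu 200
```

With --all-namespaces the output gains a leading NAMESPACE column, so the field numbers in the awk program would need to shift by one.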

Step 8: Run Ad-Hoc PromQL Queries

Access the Prometheus query UI to run PromQL expressions against your cluster metrics:

kubectl port-forward svc/kube-prometheus-stack-prometheus \
  -n monitoring 9090:9090 &

Navigate to http://localhost:9090 and try these useful PromQL queries:

# CPU usage per node as a percentage
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory working set per pod in megabytes
sum(container_memory_working_set_bytes{container!=""}) by (pod, namespace) / 1024 / 1024

# Pods with at least one restart in the last hour
increase(kube_pod_container_status_restarts_total[1h]) > 0

Conclusion

You have deployed a complete Kubernetes monitoring stack on RHEL 7 using the kube-prometheus-stack Helm chart. Along the way you accessed Grafana dashboards covering node exporter OS metrics and cluster-level resource consumption, configured custom Prometheus alerting rules for pod restart rates and memory pressure, added a ServiceMonitor to scrape custom application metrics, and used both kubectl top for real-time snapshots and the Prometheus UI for historical PromQL queries. Prometheus and Grafana together provide both the historical trends needed for capacity planning and the real-time alerting required for incident response. As your cluster and monitoring needs grow, consider enabling Thanos for long-term metric storage beyond Prometheus retention limits, adopting Grafana alerting rules as a unified alternative to Alertmanager, and integrating alert routing with PagerDuty, Slack, or your organization’s on-call tooling.