How to Monitor Kubernetes with Prometheus and Grafana on RHEL 7
Operating a Kubernetes cluster without monitoring is like running a production database without logs — you will only know something is wrong after an outage has already started. Prometheus is the de facto standard for Kubernetes metrics collection, using a pull-based model where it scrapes metric endpoints exposed by cluster components, nodes, and your own applications at regular intervals. Grafana provides the visualization layer, turning raw time-series data into interactive dashboards that reveal CPU saturation, memory pressure, pod restart patterns, and network throughput at a glance. On RHEL 7, where production workloads demand high availability and operational visibility, deploying the kube-prometheus-stack Helm chart is the fastest and most complete way to get comprehensive cluster monitoring running in minutes. This guide covers the full setup from Helm installation through custom application monitoring with ServiceMonitors.
Prerequisites
- RHEL 7 nodes with a running Kubernetes cluster
- Helm 3 installed and configured
- kubectl with cluster-admin access
- At least 4 GB of available cluster memory (Prometheus can be memory-intensive with long retention)
- A StorageClass configured for persistent volumes (recommended for Prometheus data)
- Outbound internet access to pull Helm charts and container images
Step 1: Add the Prometheus Helm Repository
The kube-prometheus-stack chart bundles Prometheus, Alertmanager, Grafana, kube-state-metrics, and node-exporter into a single deployable package maintained by the Prometheus community:
helm repo add prometheus-community \
  https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo prometheus-community/kube-prometheus-stack
Create a dedicated monitoring namespace to isolate all monitoring components:
kubectl create namespace monitoring
Step 2: Install kube-prometheus-stack
Install the stack with values appropriate for a RHEL 7 bare-metal cluster. Set a retention period and Grafana admin password:
helm install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.retention=15d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=standard \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=20Gi \
  --set grafana.adminPassword=ChangeMeNow123 \
  --set alertmanager.enabled=true
If you do not have a StorageClass configured, omit the storageSpec lines and Prometheus will use ephemeral pod storage. Note that with ephemeral storage all historical metric data is lost if the Prometheus pod restarts — configure persistent storage before using this in production on RHEL 7.
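The same settings can also be kept in a values file, which is easier to review and version than a chain of --set flags. The sketch below only writes the file; the name monitoring-values.yaml is an arbitrary choice, and the keys mirror the --set paths used above.

```shell
# Write the Step 2 settings to a values file (file name is an assumption).
cat > monitoring-values.yaml <<'EOF'
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          resources:
            requests:
              storage: 20Gi
grafana:
  adminPassword: ChangeMeNow123
alertmanager:
  enabled: true
EOF

# Then install with the file instead of the individual --set flags:
#   helm install kube-prometheus-stack \
#     prometheus-community/kube-prometheus-stack \
#     --namespace monitoring -f monitoring-values.yaml
```

To fall back to ephemeral storage, delete the storageSpec block from the file before installing.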
Monitor the deployment rollout:
kubectl get pods -n monitoring -w
Wait for the prometheus-kube-prometheus-stack-prometheus-0 StatefulSet pod, the alertmanager-kube-prometheus-stack-alertmanager-0 pod, and all Grafana and exporter pods to reach Running status.
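If you would rather script this check than watch the output, a small filter over the pod table is one option. This is a sketch that assumes the default kubectl get pods table layout, where STATUS is the third column; pods_not_running is a hypothetical helper name.

```shell
# pods_not_running: read `kubectl get pods` table output on stdin and print
# the name and status of every pod whose STATUS column is not "Running".
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}

# Example (requires a cluster):
#   kubectl get pods -n monitoring | pods_not_running
```

Empty output means every listed pod is Running. kubectl also has a built-in alternative: kubectl wait --for=condition=Ready pods --all -n monitoring --timeout=300s.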
Step 3: Access the Grafana Dashboard
Use port-forward to access Grafana from your RHEL 7 workstation without exposing it externally:
kubectl port-forward svc/kube-prometheus-stack-grafana \
  -n monitoring 3000:80 &
Open http://localhost:3000 in a browser. Log in with username admin and the password set during installation.
Expose Grafana via NodePort for Remote Access
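kubectl patch svc kube-prometheus-stack-grafana \
  -n monitoring \
  -p '{"spec": {"type": "NodePort"}}'
# Note the assigned node port in the PORT(S) column, shown as 80:<nodePort>/TCP
kubectl get svc kube-prometheus-stack-grafana -n monitoring
# Open that port on each node; 32000 is a placeholder for the assigned value
firewall-cmd --permanent --add-port=32000/tcp
firewall-cmd --reload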
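Kubernetes assigns the node port from the 30000-32767 range by default unless one is requested explicitly, so the port to open has to be read from the Service. The sketch below validates a port value and prints the matching firewall-cmd commands rather than running them; firewall_cmds_for_nodeport is a hypothetical helper name.

```shell
# firewall_cmds_for_nodeport PORT: check that PORT is a number in the default
# NodePort range (30000-32767), then print the firewall-cmd commands to run
# as root on each node. Nothing is executed.
firewall_cmds_for_nodeport() {
  case "$1" in
    ''|*[!0-9]*) echo "not a port number: $1" >&2; return 1 ;;
  esac
  if [ "$1" -lt 30000 ] || [ "$1" -gt 32767 ]; then
    echo "outside the default NodePort range: $1" >&2
    return 1
  fi
  echo "firewall-cmd --permanent --add-port=$1/tcp"
  echo "firewall-cmd --reload"
}

# Example (requires a cluster):
#   port=$(kubectl get svc kube-prometheus-stack-grafana -n monitoring \
#     -o jsonpath='{.spec.ports[0].nodePort}')
#   firewall_cmds_for_nodeport "$port"
```

Printing the commands first makes it easy to review them before changing the firewall on every node.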
Step 4: Explore Default Dashboards
kube-prometheus-stack ships with over twenty pre-built Grafana dashboards. Navigate to Dashboards > Browse in the Grafana UI to find them organized by category.
Key dashboards to review immediately
- Kubernetes / Compute Resources / Cluster: Overall CPU and memory usage across all nodes
- Kubernetes / Compute Resources / Node (Pods): Per-node pod resource consumption breakdown
- Node Exporter / Full: OS-level metrics including disk I/O, network, filesystem, and CPU modes like iowait
- Kubernetes / Networking / Cluster: Network traffic and error rates broken down by namespace
- Alertmanager / Overview: Active alerts, silences, and inhibition rules
The Node Exporter Full dashboard is particularly valuable on RHEL 7 because it exposes Linux-level metrics like iowait, softirq, and swap utilization that Kubernetes itself does not report through the metrics API.
Step 5: Configure Custom Alert Rules
The stack includes default PrometheusRule objects covering common failure conditions. List the existing rules:
kubectl get prometheusrule -n monitoring
Create a custom alert rule file named custom-alerts.yaml:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: custom.rules
      rules:
        - alert: HighPodRestartRate
          expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is restarting frequently"
            description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} has restarted more than 3 times in 15 minutes."
        - alert: NodeHighMemoryUsage
          expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} memory above 85%"
            description: "Node {{ $labels.instance }} has used more than 85% of available memory for 5 consecutive minutes."
kubectl apply -f custom-alerts.yaml
The label release: kube-prometheus-stack is required for the Prometheus Operator to discover and load this rule, because the chart's default rule selector matches on the Helm release name. Without it, the PrometheusRule object is created in Kubernetes but silently ignored by the running Prometheus instance.
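Because a missing label fails silently, a quick check before applying can save debugging time. The sketch below is a plain text match, not a YAML parse, and has_release_label is a hypothetical helper name; it assumes the release is named kube-prometheus-stack as in Step 2.

```shell
# has_release_label FILE: succeed if FILE contains a line of the form
# "release: kube-prometheus-stack". This is a text match only; it does not
# verify that the line sits under metadata.labels.
has_release_label() {
  grep -Eq '^[[:space:]]*release:[[:space:]]*kube-prometheus-stack[[:space:]]*$' "$1"
}

# Example guard before applying:
#   has_release_label custom-alerts.yaml && kubectl apply -f custom-alerts.yaml
#
# To inspect the selector the running Prometheus actually uses:
#   kubectl get prometheus -n monitoring \
#     -o jsonpath='{.items[0].spec.ruleSelector}'
```

The kubectl query is the authoritative check, since the selector can be changed through chart values.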
Step 6: Add a ServiceMonitor for Custom Applications
To scrape metrics from your own application, the application must expose a /metrics endpoint in Prometheus text format. Once it does, create a ServiceMonitor resource to tell Prometheus how to find it. Save the following as myapp-servicemonitor.yaml:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  namespaceSelector:
    matchNames:
      - production
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: http-metrics
      path: /metrics
      interval: 30s
kubectl apply -f myapp-servicemonitor.yaml
This tells Prometheus to scrape the endpoints behind every Service labeled app: myapp in the production namespace every 30 seconds, using the Service port named http-metrics. After about one minute, verify the target appears in Prometheus by port-forwarding to the Prometheus service and checking its targets page:
kubectl port-forward svc/kube-prometheus-stack-prometheus \
  -n monitoring 9090:9090 &
Navigate to http://localhost:9090/targets and look for your application target listed under the ServiceMonitor.
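If the target does not appear, a common cause is that no Service matches the ServiceMonitor. A ServiceMonitor selects Services by label and refers to a Service port by name, so the application needs a Service shaped roughly like the sketch below. The Service name, the pod selector, and port 8080 are assumptions for illustration; only the app: myapp label, the production namespace, and the http-metrics port name come from the ServiceMonitor above.

```shell
# Write an example Service manifest that the ServiceMonitor would match.
cat > myapp-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: myapp               # assumed name
  namespace: production
  labels:
    app: myapp              # matched by the ServiceMonitor's selector.matchLabels
spec:
  selector:
    app: myapp              # assumed pod label
  ports:
    - name: http-metrics    # matched by the ServiceMonitor's endpoints port
      port: 8080            # assumed; use the port your application listens on
      targetPort: 8080
EOF

# Apply it (requires a cluster):
#   kubectl apply -f myapp-service.yaml
```

Referring to the port by name rather than number lets the application change its listening port without touching the ServiceMonitor.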
Step 7: Install metrics-server and Use kubectl top
The kubectl top command provides real-time pod and node resource usage but requires the metrics-server, which is separate from Prometheus. Install it:
kubectl apply -f \
  https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
On RHEL 7 clusters with self-signed kubelet certificates, metrics-server may fail TLS verification. Patch it to disable certificate verification:
kubectl patch deployment metrics-server \
  -n kube-system \
  --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
Once the metrics-server pod is running, use kubectl top for quick resource snapshots:
# Node-level resource usage
kubectl top nodes
# Pod-level resource usage across all namespaces
kubectl top pods --all-namespaces
# Pod resource usage in a namespace sorted by CPU consumption
kubectl top pods -n production --sort-by=cpu
Step 8: Run Ad-Hoc PromQL Queries
Access the Prometheus query UI to run PromQL expressions against your cluster metrics:
kubectl port-forward svc/kube-prometheus-stack-prometheus \
  -n monitoring 9090:9090 &
Navigate to http://localhost:9090 and try these useful PromQL queries:
# CPU usage per node as a percentage
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory working set per pod in megabytes
sum(container_memory_working_set_bytes{container!=""}) by (pod, namespace) / 1024 / 1024
# Pods with at least one restart in the last hour
increase(kube_pod_container_status_restarts_total[1h]) > 0
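To make the arithmetic in the first two queries concrete, the sketch below applies the same formulas to single sample values in the shell. The input numbers are made up for illustration, and cpu_busy_pct and bytes_to_mib are hypothetical helper names; Prometheus performs these calculations per series.

```shell
# cpu_busy_pct IDLE: same arithmetic as the CPU query, 100 - (idle * 100),
# where IDLE is the averaged per-second idle rate (a fraction from 0 to 1).
cpu_busy_pct() {
  awk -v idle="$1" 'BEGIN { printf "%.1f\n", 100 - (idle * 100) }'
}

# bytes_to_mib BYTES: same arithmetic as the memory query, bytes / 1024 / 1024.
bytes_to_mib() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1024 / 1024 }'
}

cpu_busy_pct 0.82        # cores idle 82% of the time -> prints 18.0
bytes_to_mib 268435456   # 256 * 1024 * 1024 bytes    -> prints 256.0
```

Dividing by 1024 twice yields mebibytes (MiB), which is what the query's "megabytes" refers to.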
Conclusion
You have deployed a complete Kubernetes monitoring stack on RHEL 7 using the kube-prometheus-stack Helm chart. Along the way you accessed Grafana dashboards covering node exporter OS metrics and cluster-level resource consumption, configured custom alerting rules for pod restart rates and memory pressure, added a ServiceMonitor to scrape custom application metrics, and used both kubectl top for real-time snapshots and the Prometheus UI for historical PromQL queries. Prometheus and Grafana together provide both the historical trends needed for capacity planning and the real-time alerting required for incident response. As your cluster and monitoring needs grow, consider enabling Thanos for long-term metric storage beyond Prometheus retention limits, using Grafana alerting rules as a unified alternative to Alertmanager, and integrating alert routing with PagerDuty, Slack, or your organization’s on-call tooling.