Introduction to Monitoring Windows Server 2022 with Prometheus and Grafana
Prometheus and Grafana form a powerful open-source monitoring stack that, while originally built for Linux environments, provides excellent coverage for Windows Server 2022 through the windows_exporter agent. This combination gives you time-series metrics collection, flexible querying via PromQL, and rich dashboards that surface CPU utilisation, memory pressure, disk throughput, and network activity in real time. This guide walks through every component from initial installation to production alerting.
Installing windows_exporter on Windows Server 2022
windows_exporter (formerly wmi_exporter) is a Prometheus exporter for Windows metrics. It exposes a /metrics HTTP endpoint that Prometheus scrapes on a configurable interval. Download the latest MSI from the official GitHub releases page at github.com/prometheus-community/windows_exporter/releases. At the time of writing, the current stable release is 0.27.x.
Open an elevated PowerShell prompt and run the MSI installer with the collectors you need enabled. The following example enables CPU, memory, logical disk, physical disk, network interface, OS, system, service, process, and TCP collectors:
msiexec /i windows_exporter-0.27.0-amd64.msi `
ENABLED_COLLECTORS="cpu,memory,logical_disk,physical_disk,net,os,system,service,process,tcp" `
LISTEN_PORT=9182 `
/qn
The MSI registers windows_exporter as a Windows service named windows_exporter and starts it automatically. Verify it is running and listening:
Get-Service windows_exporter
netstat -ano | findstr :9182
Open a browser on the server and navigate to http://localhost:9182/metrics to confirm the exporter is serving metrics. You should see thousands of lines beginning with go_, process_, windows_cpu_, windows_memory_, and similar prefixes.
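The browser check can also be scripted. The sketch below fetches the endpoint and counts sample lines by metric prefix. It uses only the Python standard library. The URL is the local default from the MSI install above, and the helper names are illustrative rather than part of any tool.

```python
import urllib.request
from collections import Counter

# Prefixes mentioned above; adjust to the collectors you enabled.
EXPECTED_PREFIXES = ("go_", "process_", "windows_cpu_", "windows_memory_")


def count_prefixes(exposition_text, prefixes=EXPECTED_PREFIXES):
    """Count sample lines in Prometheus text format that start with each prefix."""
    counts = Counter()
    for line in exposition_text.splitlines():
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        for prefix in prefixes:
            if line.startswith(prefix):
                counts[prefix] += 1
                break
    return counts


def fetch_metrics(url="http://localhost:9182/metrics", timeout=5):
    """Download the exporter's /metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Run on the server, count_prefixes(fetch_metrics()) should report non-zero counts for every prefix. A zero for windows_cpu_ usually means the cpu collector is not enabled.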
Create a Windows Firewall inbound rule to allow Prometheus to scrape the exporter from your monitoring server:
New-NetFirewallRule `
-DisplayName "Prometheus windows_exporter" `
-Direction Inbound `
-Protocol TCP `
-LocalPort 9182 `
-Action Allow `
-Profile Domain,Private
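With the rule in place, confirm from the Prometheus host that port 9182 is reachable on each server before adding them as scrape targets. This is a minimal standard-library Python sketch. The target list reuses the example addresses from the prometheus.yml used later in this guide and is an assumption about your network.

```python
import socket

# Example targets, matching the static_configs in prometheus.yml.
TARGETS = ["192.168.1.10:9182", "192.168.1.11:9182", "192.168.1.12:9182"]


def port_open(target, timeout=3.0):
    """Return True if a TCP connection to "host:port" succeeds within timeout seconds."""
    host, _, port = target.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_targets(targets=TARGETS, timeout=3.0):
    """Map each target to whether its port accepted a connection."""
    return {t: port_open(t, timeout) for t in targets}
```

Any target that maps to False is still blocked, typically by the firewall profile (the rule above only covers the Domain and Private profiles) or by the exporter not listening.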
Configuring windows_exporter with a Configuration File
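Rather than passing every option as a command-line flag or MSI property, windows_exporter can read a YAML configuration file, which is easier to manage at scale. Create C:\Program Files\windows_exporter\config.yml with the following content. Option names mirror the command-line flags and have changed between releases (for example, later versions replace volume-whitelist with volume-include). Compare them against the example configuration in the README for the version you deploy: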
collectors:
  enabled: cpu,memory,logical_disk,physical_disk,net,os,system,service,process,tcp
collector:
  service:
    services-where: "Name='wuauserv' OR Name='spooler' OR Name='W32Time' OR Name='ADWS'"
  logical_disk:
    volume-whitelist: "C:|D:"
telemetry:
  addr: ":9182"
  path: /metrics
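Update the service's ImagePath (the PathName property of Win32_Service) so that it points at this config file. The CIM cmdlets below work in both Windows PowerShell 5.1 and PowerShell 7. Get-WmiObject does not, because PowerShell 7 no longer includes it:
$svc = Get-CimInstance -ClassName Win32_Service -Filter "Name='windows_exporter'"
$newPath = '"C:\Program Files\windows_exporter\windows_exporter.exe" --config.file="C:\Program Files\windows_exporter\config.yml"'
# A ReturnValue of 0 indicates the change succeeded
Invoke-CimMethod -InputObject $svc -MethodName Change -Arguments @{ PathName = $newPath }
Restart-Service windows_exporter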
Installing Prometheus on Windows Server 2022
Prometheus is most commonly run on Linux, but it also runs well on Windows Server 2022. Download the latest Windows binary from prometheus.io/download. The archive contains a single versioned top-level folder, so extract it to C:\ and rename that folder to prometheus:
Expand-Archive -Path C:\downloads\prometheus-2.52.0.windows-amd64.zip -DestinationPath C:\
Rename-Item -Path C:\prometheus-2.52.0.windows-amd64 -NewName prometheus
Replace the default C:\prometheus\prometheus.yml that ships in the archive with the following configuration, which defines global settings, scrape intervals, and scrape targets:
global:
  scrape_interval: 30s
  evaluation_interval: 30s
  scrape_timeout: 10s
rule_files:
  - "rules/windows_alerts.yml"
  - "rules/windows_recording.yml"
scrape_configs:
  - job_name: "windows_servers"
    static_configs:
      - targets:
          - "192.168.1.10:9182"
          - "192.168.1.11:9182"
          - "192.168.1.12:9182"
        labels:
          environment: "production"
          os: "windows"
    # __address__ only exists during target relabeling, so this belongs in
    # relabel_configs (not metric_relabel_configs). It sets instance to the
    # host address without the :9182 port.
    relabel_configs:
      - source_labels: [__address__]
        regex: '([^:]+):\d+'
        target_label: instance
        replacement: "$1"
  - job_name: "prometheus_self"
    static_configs:
      - targets: ["localhost:9090"]
Install Prometheus as a Windows service using NSSM (the Non-Sucking Service Manager). Download NSSM from nssm.cc, place nssm.exe in C:\tools\nssm, and run the commands below. The --web.enable-lifecycle flag enables the HTTP reload endpoint used later to apply rule changes without a restart:
C:\tools\nssm\nssm.exe install Prometheus "C:\prometheus\prometheus.exe"
C:\tools\nssm\nssm.exe set Prometheus AppParameters "--config.file=C:\prometheus\prometheus.yml --storage.tsdb.path=C:\prometheus\data --storage.tsdb.retention.time=30d --web.listen-address=0.0.0.0:9090 --web.enable-lifecycle"
C:\tools\nssm\nssm.exe set Prometheus AppDirectory "C:\prometheus"
C:\tools\nssm\nssm.exe set Prometheus Start SERVICE_AUTO_START
Start-Service Prometheus
Verify Prometheus is up by visiting http://localhost:9090 and running a test query such as windows_cpu_time_total in the expression browser.
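The expression browser is backed by Prometheus's HTTP API, so the same check can be scripted as a GET request to /api/v1/query. The sketch below uses only the Python standard library and assumes Prometheus is listening on localhost:9090. The helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(expr, base_url="http://localhost:9090"):
    """Build an instant-query URL for Prometheus's /api/v1/query endpoint."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(body):
    """Turn an instant-vector response into a list of (labels, float value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    # Each result carries its label set and a [timestamp, "value"] pair.
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]


def instant_query(expr, base_url="http://localhost:9090", timeout=10):
    """Run an instant query and return its parsed result."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=timeout) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

For example, instant_query("up") returns one entry per scrape target with a value of 1.0 for targets that were scraped successfully.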
Key Windows Metrics Exposed by windows_exporter
Understanding which metrics are available is essential for building useful dashboards. The most important metrics for Windows Server monitoring are listed below:
CPU metrics — windows_cpu_time_total is a counter partitioned by core and mode (idle, user, privileged, interrupt, dpc). To calculate overall CPU utilisation as a percentage use the following PromQL expression:
100 - (avg by (instance) (rate(windows_cpu_time_total{mode="idle"}[5m])) * 100)
Memory metrics — windows_os_physical_memory_free_bytes and windows_os_visible_memory_bytes allow you to calculate memory utilisation:
(1 - (windows_os_physical_memory_free_bytes / windows_os_visible_memory_bytes)) * 100
Disk metrics — windows_logical_disk_free_bytes and windows_logical_disk_size_bytes for space, plus windows_physical_disk_read_bytes_total and windows_physical_disk_write_bytes_total for I/O throughput.
Network metrics — windows_net_bytes_received_total and windows_net_bytes_sent_total provide per-interface bandwidth, useful for spotting unusual outbound data transfers.
Service metrics — windows_service_state with labels for service name and state (running, stopped, paused) allows alerting when critical services stop unexpectedly.
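To make the CPU and memory expressions concrete, the sketch below applies the same arithmetic to raw samples in Python. It uses per-core idle seconds from two scrapes 60 seconds apart and a pair of memory gauge values. The sample numbers are invented for illustration. The rate is a plain two-point difference that assumes no counter reset, whereas PromQL's rate() also handles resets and extrapolates across the window.

```python
def cpu_utilisation(idle_t0, idle_t1, interval_s):
    """Mirror 100 - avg(rate(idle)) * 100 for one instance.

    idle_t0 / idle_t1 map core id -> cumulative idle seconds at two scrapes.
    """
    # Per-core idle rate: seconds of idle time per wall-clock second (0..1).
    rates = [(idle_t1[core] - idle_t0[core]) / interval_s for core in idle_t0]
    avg_idle = sum(rates) / len(rates)
    return 100 - avg_idle * 100


def memory_utilisation(free_bytes, visible_bytes):
    """Mirror (1 - free / visible) * 100."""
    return (1 - free_bytes / visible_bytes) * 100


# Hypothetical samples: 4 cores, scraped 60 s apart.
idle_before = {"0,0": 1000.0, "0,1": 1000.0, "0,2": 1000.0, "0,3": 1000.0}
idle_after = {"0,0": 1045.0, "0,1": 1030.0, "0,2": 1054.0, "0,3": 1051.0}

print(round(cpu_utilisation(idle_before, idle_after, 60), 1))  # → 25.0
print(memory_utilisation(4 * 2**30, 16 * 2**30))  # → 75.0
```

The cores were idle 75% of the time on average, which gives 25% utilisation. The host has 4 GiB free out of 16 GiB visible, which gives 75% memory utilisation.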
Creating Prometheus Recording Rules for Windows
Recording rules pre-compute expensive PromQL expressions and store the results as new time series, so dashboards read one precomputed series per server instead of re-evaluating the full expression on every refresh. This makes a noticeable difference when querying across many servers. Create C:\prometheus\rules\windows_recording.yml:
groups:
  - name: windows_recording_rules
    interval: 1m
    rules:
      - record: instance:windows_cpu_utilisation:rate5m
        expr: 100 - (avg by (instance) (rate(windows_cpu_time_total{mode="idle"}[5m])) * 100)
      - record: instance:windows_memory_utilisation:ratio
        expr: 1 - (windows_os_physical_memory_free_bytes / windows_os_visible_memory_bytes)
      - record: instance:windows_disk_read_bytes:rate5m
        expr: sum by (instance, volume) (rate(windows_logical_disk_read_bytes_total[5m]))
      - record: instance:windows_disk_write_bytes:rate5m
        expr: sum by (instance, volume) (rate(windows_logical_disk_write_bytes_total[5m]))
      - record: instance:windows_network_receive_bytes:rate5m
        expr: sum by (instance, nic) (rate(windows_net_bytes_received_total[5m]))
      - record: instance:windows_network_transmit_bytes:rate5m
        expr: sum by (instance, nic) (rate(windows_net_bytes_sent_total[5m]))
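Every rule above wraps a _total counter in rate(). Such counters only ever grow, except that they fall back to zero when the exporter process or the host restarts, so a naive difference between the last and first samples gives wrong results across a restart. The Python sketch below illustrates the reset handling in simplified form. It divides the reset-corrected increase by the time between the first and last sample, and it omits the window-boundary extrapolation that Prometheus performs. Treat it as an explanation, not a reimplementation.

```python
def simple_rate(samples):
    """Per-second rate over (timestamp_s, counter_value) samples, sorted by time.

    A drop in value is treated as a counter reset: the post-reset value is
    counted as new increase, as Prometheus does, but there is no extrapolation
    to the edges of the range window.
    """
    if len(samples) < 2:
        return None  # rate() also needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarted from zero, so all of `cur` is new.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical bytes-received counter scraped every 30 s, reset after a restart.
samples = [(0, 1_000), (30, 4_000), (60, 7_000), (90, 3_000), (120, 6_000)]
print(simple_rate(samples))  # → 100.0
```

A naive (last - first) / elapsed would report about 41.7 bytes per second here, understating the real traffic because it ignores the reset.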
Creating Alerting Rules for Windows Server Metrics
Create C:\prometheus\rules\windows_alerts.yml to define threshold-based alerts. These integrate with Alertmanager for email, PagerDuty, or Slack notifications:
groups:
  - name: windows_alerts
    rules:
      - alert: WindowsHighCPU
        expr: instance:windows_cpu_utilisation:rate5m > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          # Single quotes outside so the inner "%.1f" does not break the YAML string
          description: 'CPU utilisation is {{ $value | printf "%.1f" }}% and has stayed above 90% for 10 minutes.'
      - alert: WindowsLowMemory
        expr: instance:windows_memory_utilisation:ratio > 0.90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low memory on {{ $labels.instance }}"
          description: "Memory utilisation is {{ $value | humanizePercentage }}."
      - alert: WindowsLowDiskSpace
        expr: (windows_logical_disk_free_bytes / windows_logical_disk_size_bytes) < 0.10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }} volume {{ $labels.volume }}"
          description: "Only {{ $value | humanizePercentage }} free on {{ $labels.volume }}."
      - alert: WindowsServiceDown
        expr: windows_service_state{state="running"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.name }} is down on {{ $labels.instance }}"
      - alert: WindowsExporterDown
        expr: up{job="windows_servers"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "windows_exporter unreachable on {{ $labels.instance }}"
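Reload Prometheus to pick up the new rules without restarting the service. The /-/reload endpoint only accepts requests when Prometheus was started with --web.enable-lifecycle. Without that flag it rejects the request, and you need Restart-Service Prometheus instead:
Invoke-WebRequest -Uri "http://localhost:9090/-/reload" -Method POST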
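The for clause keeps these alerts from flapping. When an expression first returns a result, the alert becomes pending. It fires only once the condition has held at every evaluation for the full duration, and a single evaluation without a result sends it back to inactive. The Python sketch below simulates that state machine at the 30-second evaluation_interval configured earlier. The CPU readings are invented for illustration.

```python
def simulate_alert(values, threshold, for_s, eval_interval_s=30):
    """Return the alert state after each evaluation: inactive, pending or firing.

    values: one metric reading per rule evaluation, e.g. CPU utilisation in %.
    """
    states = []
    active_since = None  # evaluation time at which the condition first held
    for i, value in enumerate(values):
        now = i * eval_interval_s
        if value > threshold:
            if active_since is None:
                active_since = now
            # Fire once the condition has held continuously for `for_s` seconds.
            states.append("firing" if now - active_since >= for_s else "pending")
        else:
            active_since = None  # any break in the condition resets the timer
            states.append("inactive")
    return states


# Hypothetical CPU readings: a brief spike, then a sustained 95% plateau.
readings = [50, 95, 40] + [95] * 21
states = simulate_alert(readings, threshold=90, for_s=600)
print(states[1], states[2], states.index("firing") * 30)  # → pending inactive 690
```

With for: 10m, the one-off spike at t = 30 s never leaves pending. The sustained spike that starts at t = 90 s fires at t = 690 s, exactly ten minutes later.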
Installing Grafana on Windows Server 2022
Download the Grafana Windows installer from grafana.com/grafana/download?platform=windows. Run the MSI as Administrator:
msiexec /i grafana-enterprise-11.1.0.windows-amd64.msi /qn
Grafana installs to C:\Program Files\GrafanaLabs\grafana and registers a Windows service named Grafana. The default HTTP port is 3000. Start the service and open the firewall:
Start-Service Grafana
New-NetFirewallRule -DisplayName "Grafana Web" -Direction Inbound -Protocol TCP -LocalPort 3000 -Action Allow
Navigate to http://localhost:3000 and log in with admin / admin. You will be prompted to change the password on first login.
Connecting Grafana to Prometheus
In the Grafana web interface, go to Connections > Data Sources > Add data source and select Prometheus. Set the URL field to http://localhost:9090, or to the address of your Prometheus server if Grafana runs on a different machine. Scroll down and click Save & Test. A green banner confirms that the data source is working.
Alternatively, provision the data source via a YAML file. Create C:\Program Files\GrafanaLabs\grafana\conf\provisioning\datasources\prometheus.yml:
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
    editable: false
    jsonData:
      timeInterval: "30s"
      httpMethod: POST
Restart the Grafana service to apply the provisioned data source:
Restart-Service Grafana
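After the restart, you can confirm through Grafana's HTTP API that the provisioned data source was loaded. The API exposes data sources by name at /api/datasources/name/<name>. The following standard-library Python sketch assumes Grafana on localhost:3000 and an admin account using basic authentication. The credentials are placeholders.

```python
import base64
import json
import urllib.parse
import urllib.request


def datasource_request(name, user, password, base_url="http://localhost:3000"):
    """Build an authenticated GET request for a Grafana data source by name."""
    url = f"{base_url}/api/datasources/name/{urllib.parse.quote(name)}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def check_datasource(name="Prometheus", user="admin", password="changeme", timeout=10):
    """Fetch the data source and return (type, url) as Grafana reports them."""
    req = datasource_request(name, user, password)  # placeholder credentials
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        ds = json.loads(resp.read().decode("utf-8"))
    return ds["type"], ds["url"]
```

If provisioning worked, check_datasource() should return the type prometheus and the URL http://localhost:9090. An HTTP 404 error means Grafana did not pick up the file.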
Importing the Windows Server Dashboard
Grafana dashboard ID 14694 ("Windows Node", a community contribution) is built for windows_exporter and displays CPU, memory, disk, and network metrics in a clean layout. To import it, click Dashboards > Import, enter 14694 in the Grafana.com Dashboard ID field, and click Load. Then select your Prometheus data source and click Import.
The dashboard includes panels for: CPU utilisation per core, overall memory usage, top processes by CPU and memory, disk read/write throughput per volume, disk IOPS, network bytes sent/received per interface, system uptime, and OS version. Each panel can be customised by editing the underlying PromQL query.
For a more comprehensive view, also import dashboard ID 10171 (Windows Exporter Detailed), which adds panels for TCP connection states, page file usage, and per-service status indicators.
Setting Up Alertmanager for Notifications
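Alertmanager handles routing, grouping, and deduplication of alerts fired by Prometheus. Download it from prometheus.io/download and extract it the same way as Prometheus, so that alertmanager.exe ends up in C:\alertmanager. Then replace the bundled C:\alertmanager\alertmanager.yml with: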
global:
  smtp_smarthost: "smtp.yourdomain.com:587"
  # Placeholder addresses and credentials: replace with your own.
  smtp_from: "alertmanager@yourdomain.com"
  smtp_auth_username: "alertmanager@yourdomain.com"
  smtp_auth_password: "yourpassword"
route:
  group_by: ["alertname", "instance"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: "email-ops"
receivers:
  - name: "email-ops"
    email_configs:
      - to: "ops-team@yourdomain.com"
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: "critical"
    target_match:
      severity: "warning"
    equal: ["instance"]
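The inhibit rule mutes warning-level alerts for an instance while a critical alert is firing for that same instance. For example, WindowsLowMemory on a host suppresses WindowsHighCPU on that host. The Python sketch below models only that matching logic (source and target label matches plus the equal list), using hypothetical alerts and hostnames. It is not Alertmanager's implementation.

```python
def inhibited(target, firing, source_match, target_match, equal):
    """Return True if `target` is muted by any firing alert under one inhibit rule.

    Alerts are label dicts; source_match/target_match are exact-match label dicts.
    """
    def matches(alert, wanted):
        return all(alert.get(k) == v for k, v in wanted.items())

    if not matches(target, target_match):
        return False
    return any(
        matches(src, source_match)
        # The source must agree with the target on every label listed in `equal`.
        and all(src.get(label) == target.get(label) for label in equal)
        for src in firing
    )


rule = dict(source_match={"severity": "critical"},
            target_match={"severity": "warning"},
            equal=["instance"])

# Hypothetical firing alerts on two hosts.
firing = [
    {"alertname": "WindowsLowMemory", "severity": "critical", "instance": "srv-a"},
    {"alertname": "WindowsHighCPU", "severity": "warning", "instance": "srv-a"},
    {"alertname": "WindowsHighCPU", "severity": "warning", "instance": "srv-b"},
]

# Prints True only for the srv-a WindowsHighCPU alert.
for alert in firing:
    print(alert["alertname"], alert["instance"], inhibited(alert, firing, **rule))
```

The srv-b CPU warning still goes out because no critical alert shares its instance label.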
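Install Alertmanager as a service with NSSM, following the same pattern as Prometheus: nssm.exe install Alertmanager "C:\alertmanager\alertmanager.exe", with AppParameters set to --config.file=C:\alertmanager\alertmanager.yml and AppDirectory set to C:\alertmanager. Then add its address to prometheus.yml as a top-level alerting block and reload or restart Prometheus: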
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
Verifying the Complete Stack
With all components running, verify the full pipeline. In Prometheus at http://localhost:9090/targets confirm all Windows targets show state UP with recent scrape timestamps. Navigate to http://localhost:9090/alerts to see the loaded alerting rules in their inactive or pending states. In Grafana, open the Windows Server dashboard and confirm panels are displaying data with appropriate legends per instance.
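The Targets page can also be checked from a script through the /api/v1/targets endpoint. Its activeTargets entries carry each target's labels, health and lastError. This standard-library Python sketch assumes Prometheus on localhost:9090 and lists only the targets that are not up.

```python
import json
import urllib.request


def unhealthy_targets(body):
    """Return (job, instance, lastError) for every active target whose health is not "up"."""
    payload = json.loads(body)
    return [
        (t["labels"].get("job"), t["labels"].get("instance"), t.get("lastError", ""))
        for t in payload["data"]["activeTargets"]
        if t.get("health") != "up"
    ]


def fetch_targets(base_url="http://localhost:9090", timeout=10):
    """Download the raw /api/v1/targets JSON."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

An empty list from unhealthy_targets(fetch_targets()) means every scrape target is up. Otherwise, the lastError text usually points at a firewall or exporter problem.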
Use the following PromQL queries in the Prometheus expression browser to sanity-check key metrics are flowing correctly:
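# Confirm the CPU metric is present for each instance
count by (instance) (windows_cpu_time_total)
# Check memory free bytes
windows_os_physical_memory_free_bytes
# Check disk free percentage per volume
windows_logical_disk_free_bytes / windows_logical_disk_size_bytes * 100
# List auto-start services that are not running: each state series is 1 for the current state and 0 otherwise,
# and start mode is exposed by the separate windows_service_start_mode metric
windows_service_state{state!="running"} == 1 and on (instance, name) windows_service_start_mode{start_mode="auto"} == 1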
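A query for non-running services has to account for how windows_exporter encodes service state. Each service gets one windows_service_state series per possible state, valued 1 for the current state and 0 for the others. Selecting state!="running" by label alone therefore also returns the 0-valued series of every healthy service. The Python sketch below models that one-hot encoding with hypothetical series. It keeps only auto-start services whose non-running state is set to 1, and for brevity it models only three of the exporter's states and start modes.

```python
def stopped_auto_services(state_series, start_mode_series):
    """Return sorted (instance, name, state) for auto-start services that are not running.

    Both inputs are lists of (labels, value) pairs in the one-hot form used by
    windows_service_state and windows_service_start_mode: 1 marks the current value.
    """
    auto = {
        (labels["instance"], labels["name"])
        for labels, value in start_mode_series
        if labels["start_mode"] == "auto" and value == 1
    }
    return sorted(
        (labels["instance"], labels["name"], labels["state"])
        for labels, value in state_series
        # value == 1 picks the current state; the 0-valued series are ignored.
        if labels["state"] != "running" and value == 1
        and (labels["instance"], labels["name"]) in auto
    )


def series(instance, name, label, current, options):
    """Hypothetical helper: expand one service into one-hot series."""
    return [({"instance": instance, "name": name, label: opt}, 1 if opt == current else 0)
            for opt in options]


STATES = ["running", "stopped", "paused"]
MODES = ["auto", "manual", "disabled"]
states = (series("srv-a", "spooler", "state", "stopped", STATES)
          + series("srv-a", "w32time", "state", "running", STATES)
          + series("srv-a", "wuauserv", "state", "stopped", STATES))
modes = (series("srv-a", "spooler", "start_mode", "auto", MODES)
         + series("srv-a", "w32time", "start_mode", "auto", MODES)
         + series("srv-a", "wuauserv", "start_mode", "manual", MODES))

print(stopped_auto_services(states, modes))  # → [('srv-a', 'spooler', 'stopped')]
```

Only the stopped auto-start spooler service is reported. The stopped wuauserv service is ignored because its start mode is manual.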
This stack gives you a production-grade observability platform for Windows Server 2022 that scales from a handful of servers to hundreds with minimal operational overhead. The 30-day retention configured in Prometheus provides ample historical data for capacity planning and post-incident analysis, while Grafana dashboards deliver the visual layer needed for day-to-day operations.