Introduction to Monitoring Windows Server 2022 with Prometheus and Grafana

Prometheus and Grafana form a powerful open-source monitoring stack that, while originally built for Linux environments, provides excellent coverage for Windows Server 2022 through the windows_exporter agent. This combination gives you time-series metrics collection, flexible querying via PromQL, and rich dashboards that surface CPU utilisation, memory pressure, disk throughput, and network activity in real time. This guide walks through every component from initial installation to production alerting.

Installing windows_exporter on Windows Server 2022

windows_exporter (formerly wmi_exporter) is a Prometheus exporter for Windows metrics. It exposes a /metrics HTTP endpoint that Prometheus scrapes on a configurable interval. Download the latest MSI from the official GitHub releases page at github.com/prometheus-community/windows_exporter/releases. At the time of writing the current stable release is 0.27.x.

Open an elevated PowerShell prompt and run the MSI installer with the collectors you need enabled. The following example enables CPU, memory, logical disk, physical disk, network interface, OS, system, service, process, and TCP collectors:

msiexec /i windows_exporter-0.27.0-amd64.msi `
  ENABLED_COLLECTORS="cpu,memory,logical_disk,physical_disk,net,os,system,service,process,tcp" `
  LISTEN_PORT=9182 `
  /qn

The MSI registers windows_exporter as a Windows service named windows_exporter and starts it automatically. Verify it is running and listening:

Get-Service windows_exporter
netstat -ano | findstr :9182

Open a browser on the server and navigate to http://localhost:9182/metrics to confirm the exporter is serving metrics. You should see thousands of lines beginning with go_, process_, windows_cpu_, windows_memory_, and similar prefixes.
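
If you would rather check the output programmatically than by eye, the text format is simple enough to summarise with a few lines of Python. The following is a minimal sketch, not part of windows_exporter: the sample text (including its HELP string) and the function name count_metric_families are illustrative assumptions, and in practice you would feed in the body fetched from http://localhost:9182/metrics.

```python
from collections import Counter

# Hypothetical excerpt of a /metrics response; real output is far longer.
SAMPLE = """\
# HELP windows_cpu_time_total Illustrative help text
# TYPE windows_cpu_time_total counter
windows_cpu_time_total{core="0,0",mode="idle"} 12345.6
windows_cpu_time_total{core="0,0",mode="user"} 234.5
windows_os_physical_memory_free_bytes 4.2e+09
go_goroutines 12
"""


def count_metric_families(text: str) -> Counter:
    """Count samples per metric name, skipping comments and blank lines."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


counts = count_metric_families(SAMPLE)
for name, n in sorted(counts.items()):
    print(f"{name}: {n} sample(s)")
```

A count of zero for a windows_ prefix you expected usually means the matching collector was not enabled at install time.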

Create a Windows Firewall inbound rule to allow Prometheus to scrape the exporter from your monitoring server:

New-NetFirewallRule `
  -DisplayName "Prometheus windows_exporter" `
  -Direction Inbound `
  -Protocol TCP `
  -LocalPort 9182 `
  -Action Allow `
  -Profile Domain,Private

Configuring windows_exporter with a Configuration File

Rather than passing all options via command-line flags or MSI properties, windows_exporter supports a YAML configuration file, which is easier to manage at scale. Create C:\Program Files\windows_exporter\config.yml with the following content:

collectors:
  enabled: cpu,memory,logical_disk,physical_disk,net,os,system,service,process,tcp

collector:
  service:
    services-where: "Name='wuauserv' OR Name='spooler' OR Name='W32Time' OR Name='ADWS'"
  logical_disk:
    volume-include: "C:|D:"

web:
  listen-address: ":9182"

telemetry:
  path: /metrics

Update the service’s ImagePath to reference this config file. In PowerShell:

$svc = Get-WmiObject -Class Win32_Service -Filter "Name='windows_exporter'"
$newPath = '"C:\Program Files\windows_exporter\windows_exporter.exe" --config.file="C:\Program Files\windows_exporter\config.yml"'
$svc.Change($null,$newPath,$null,$null,$null,$null,$null,$null,$null,$null,$null)
Restart-Service windows_exporter

Installing Prometheus on Windows Server 2022

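While Prometheus is commonly run on Linux, it also runs natively on Windows Server 2022. Download the latest Windows binary from prometheus.io/download and extract it so that the files end up in C:\prometheus. The archive contains a single versioned top-level folder, so extract it to C:\ and rename that folder:

Expand-Archive -Path C:\downloads\prometheus-2.52.0.windows-amd64.zip -DestinationPath C:\
Rename-Item -Path C:\prometheus-2.52.0.windows-amd64 -NewName prometheus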

Create the Prometheus configuration file at C:\prometheus\prometheus.yml. This file defines global settings, scrape intervals, and scrape targets:

global:
  scrape_interval: 30s
  evaluation_interval: 30s
  scrape_timeout: 10s

rule_files:
  - "rules/windows_alerts.yml"
  - "rules/windows_recording.yml"

scrape_configs:
  - job_name: "windows_servers"
    static_configs:
      - targets:
          - "192.168.1.10:9182"
          - "192.168.1.11:9182"
          - "192.168.1.12:9182"
        labels:
          environment: "production"
          os: "windows"
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance

  - job_name: "prometheus_self"
    static_configs:
      - targets: ["localhost:9090"]

Install Prometheus as a Windows service using NSSM (the Non-Sucking Service Manager). Download NSSM from nssm.cc, place it in C:\tools\nssm, and run the commands below. The --web.enable-lifecycle flag enables the HTTP reload endpoint used later in this guide:

C:\tools\nssm\nssm.exe install Prometheus "C:\prometheus\prometheus.exe"
C:\tools\nssm\nssm.exe set Prometheus AppParameters "--config.file=C:\prometheus\prometheus.yml --storage.tsdb.path=C:\prometheus\data --storage.tsdb.retention.time=30d --web.listen-address=0.0.0.0:9090 --web.enable-lifecycle"
C:\tools\nssm\nssm.exe set Prometheus AppDirectory "C:\prometheus"
C:\tools\nssm\nssm.exe set Prometheus Start SERVICE_AUTO_START
Start-Service Prometheus

Verify Prometheus is up by visiting http://localhost:9090 and running a test query such as windows_cpu_time_total in the expression browser.

Key Windows Metrics Exposed by windows_exporter

Understanding which metrics are available is essential for building useful dashboards. The most important metrics for Windows Server monitoring are listed below:

CPU metrics — windows_cpu_time_total is a counter partitioned by core and mode (idle, user, privileged, interrupt, dpc). To calculate overall CPU utilisation as a percentage use the following PromQL expression:

100 - (avg by (instance) (rate(windows_cpu_time_total{mode="idle"}[5m])) * 100)
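
To make the arithmetic behind this expression concrete, here is a small worked example in Python. It is a simplified sketch: the counter readings are invented, the function name cpu_utilisation_percent is hypothetical, and unlike the real rate() function it ignores counter resets and extrapolation at the window edges.

```python
def cpu_utilisation_percent(idle_start, idle_end, window_seconds):
    """Approximate 100 - avg(rate(idle[window])) * 100 for one instance.

    idle_start / idle_end hold the per-core idle-seconds counter values
    at the start and end of the window, one entry per core.
    """
    # Per-core rate: idle seconds accumulated per second of wall time (0..1).
    rates = [
        (end - start) / window_seconds
        for start, end in zip(idle_start, idle_end)
    ]
    avg_idle = sum(rates) / len(rates)
    return 100 - avg_idle * 100


# Invented readings for two cores over a 300 s (5m) window:
# core 0 was idle for 240 s (0.8), core 1 for 150 s (0.5).
util = cpu_utilisation_percent([1000.0, 2000.0], [1240.0, 2150.0], 300)
print(f"CPU utilisation: {util:.1f}%")
```

The average idle fraction here is 0.65, so the instance is reported as roughly 35% busy even though one core is noticeably busier than the other; averaging by instance hides per-core imbalance.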

Memory metrics — windows_os_physical_memory_free_bytes and windows_os_visible_memory_bytes allow you to calculate memory utilisation:

(1 - (windows_os_physical_memory_free_bytes / windows_os_visible_memory_bytes)) * 100

Disk metrics — windows_logical_disk_free_bytes and windows_logical_disk_size_bytes for space, plus windows_physical_disk_read_bytes_total and windows_physical_disk_write_bytes_total for I/O throughput.

Network metrics — windows_net_bytes_received_total and windows_net_bytes_sent_total provide per-interface bandwidth, useful for spotting unusual outbound data transfers.

Service metrics — windows_service_state with labels for service name and state (running, stopped, paused) allows alerting when critical services stop unexpectedly.
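
The shape of windows_service_state is easy to misread, so the sketch below models it in Python. It assumes the exporter emits one series per service and state, with value 1 for the current state and 0 for the others; the sample data and the helper name services_not_running are made up for illustration.

```python
# Invented scrape result: (labels, value) pairs for two services.
samples = [
    ({"name": "spooler", "state": "running"}, 1.0),
    ({"name": "spooler", "state": "stopped"}, 0.0),
    ({"name": "w32time", "state": "running"}, 0.0),
    ({"name": "w32time", "state": "stopped"}, 1.0),
]


def services_not_running(samples):
    """Mimic the selector windows_service_state{state="running"} == 0."""
    return sorted(
        labels["name"]
        for labels, value in samples
        if labels["state"] == "running" and value == 0
    )


down = services_not_running(samples)
print(down)
```

Under that assumption, filtering on the state label alone is not enough, because every state is always present as a series. The comparison against 0 or 1 is what selects the services of interest.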

Creating Prometheus Recording Rules for Windows

Recording rules pre-compute expensive PromQL expressions and store the results as new time series. This speeds up dashboards that query across many servers, because Grafana reads the precomputed series instead of re-evaluating the full expression on every refresh. Create C:\prometheus\rules\windows_recording.yml:

groups:
  - name: windows_recording_rules
    interval: 1m
    rules:
      - record: instance:windows_cpu_utilisation:rate5m
        expr: 100 - (avg by (instance) (rate(windows_cpu_time_total{mode="idle"}[5m])) * 100)

      - record: instance:windows_memory_utilisation:ratio
        expr: 1 - (windows_os_physical_memory_free_bytes / windows_os_visible_memory_bytes)

      - record: instance:windows_disk_read_bytes:rate5m
        expr: sum by (instance, volume) (rate(windows_logical_disk_read_bytes_total[5m]))

      - record: instance:windows_disk_write_bytes:rate5m
        expr: sum by (instance, volume) (rate(windows_logical_disk_write_bytes_total[5m]))

      - record: instance:windows_network_receive_bytes:rate5m
        expr: sum by (instance, nic) (rate(windows_net_bytes_received_total[5m]))

      - record: instance:windows_network_transmit_bytes:rate5m
        expr: sum by (instance, nic) (rate(windows_net_bytes_sent_total[5m]))

Creating Alerting Rules for Windows Server Metrics

Create C:\prometheus\rules\windows_alerts.yml to define threshold-based alerts. These integrate with Alertmanager for email, PagerDuty, or Slack notifications:

groups:
  - name: windows_alerts
    rules:
      - alert: WindowsHighCPU
        expr: instance:windows_cpu_utilisation:rate5m > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: 'CPU utilisation is {{ $value | printf "%.1f" }}% for 10 minutes.'

      - alert: WindowsLowMemory
        expr: instance:windows_memory_utilisation:ratio > 0.90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low memory on {{ $labels.instance }}"
          description: "Memory utilisation is {{ $value | humanizePercentage }}."

      - alert: WindowsLowDiskSpace
        expr: (windows_logical_disk_free_bytes / windows_logical_disk_size_bytes) < 0.10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }} volume {{ $labels.volume }}"
          description: "Only {{ $value | humanizePercentage }} free on {{ $labels.volume }}."

      - alert: WindowsServiceDown
        expr: windows_service_state{state="running"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.name }} is down on {{ $labels.instance }}"

      - alert: WindowsExporterDown
        expr: up{job="windows_servers"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "windows_exporter unreachable on {{ $labels.instance }}"
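
The for: clause in each rule controls how long a condition must hold before the alert moves from pending to firing. The Python sketch below models that behaviour in a simplified form; the function name alert_states and the sample values are hypothetical, and the real rule engine works on timestamps rather than evaluation counts.

```python
def alert_states(values, threshold, for_evals):
    """Return the alert state after each rule evaluation.

    values:    expression results at successive evaluations
    for_evals: evaluations the condition must keep holding before firing,
               i.e. the 'for' duration divided by the evaluation interval
    """
    states = []
    active_since = None  # index of the evaluation where the condition first held
    for i, value in enumerate(values):
        if value > threshold:
            if active_since is None:
                active_since = i
            held = i - active_since
            states.append("firing" if held >= for_evals else "pending")
        else:
            # Any evaluation below the threshold resets the timer.
            active_since = None
            states.append("inactive")
    return states


# Hypothetical rule with for: 2m at a 30s evaluation interval (4 evaluations).
cpu = [95, 96, 80, 97, 98, 99, 99, 99]
states = alert_states(cpu, threshold=90, for_evals=4)
print(states)
```

The single dip to 80 resets the timer, which is why a longer for: value suppresses alerts on short spikes at the cost of slower notification.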

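Reload Prometheus to pick up the new rules without restarting the service. This endpoint is only enabled when Prometheus is started with the --web.enable-lifecycle flag (otherwise, run Restart-Service Prometheus instead):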

Invoke-WebRequest -Uri "http://localhost:9090/-/reload" -Method POST

Installing Grafana on Windows Server 2022

Download the Grafana Windows installer from grafana.com/grafana/download?platform=windows. Run the MSI as Administrator:

msiexec /i grafana-enterprise-11.1.0.windows-amd64.msi /qn

Grafana installs to C:\Program Files\GrafanaLabs\grafana and registers a Windows service named Grafana. The default HTTP port is 3000. Start the service and open the firewall:

Start-Service Grafana
New-NetFirewallRule -DisplayName "Grafana Web" -Direction Inbound -Protocol TCP -LocalPort 3000 -Action Allow

Navigate to http://localhost:3000 and log in with admin / admin. You will be prompted to change the password on first login.

Connecting Grafana to Prometheus

In the Grafana web interface, go to Connections > Data Sources > Add data source. Select Prometheus and configure the following fields:

URL: http://localhost:9090 (or the IP of your Prometheus server if Grafana is on a different machine). Scroll down and click Save & Test. You should see a green banner indicating the data source is working.

Alternatively, provision the data source via a YAML file. Create C:\Program Files\GrafanaLabs\grafana\conf\provisioning\datasources\prometheus.yml:

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
    editable: false
    jsonData:
      timeInterval: "30s"
      httpMethod: POST

Restart the Grafana service to apply the provisioned data source:

Restart-Service Grafana

Importing the Windows Server Dashboard

Grafana dashboard ID 14694 (Windows Node, a community-contributed dashboard) is built for windows_exporter and displays CPU, memory, disk, and network metrics. Import it by clicking Dashboards > Import, entering 14694 in the Grafana.com Dashboard ID field, and clicking Load. Select your Prometheus data source and click Import.

The dashboard includes panels for: CPU utilisation per core, overall memory usage, top processes by CPU and memory, disk read/write throughput per volume, disk IOPS, network bytes sent/received per interface, system uptime, and OS version. Each panel can be customised by editing the underlying PromQL query.

For a more comprehensive view, also import dashboard ID 10171 (Windows Exporter Detailed) which adds additional panels for TCP connection states, page file usage, and per-service status indicators.

Setting Up Alertmanager for Notifications

Alertmanager handles routing and deduplication of alerts fired by Prometheus. Download it from prometheus.io/download and extract to C:\alertmanager. Create C:\alertmanager\alertmanager.yml:

global:
  smtp_smarthost: "smtp.yourdomain.com:587"
  smtp_from: "[email protected]"
  smtp_auth_username: "[email protected]"
  smtp_auth_password: "yourpassword"

route:
  group_by: ["alertname", "instance"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: "email-ops"

receivers:
  - name: "email-ops"
    email_configs:
      - to: "[email protected]"
        send_resolved: true

inhibit_rules:
  - source_match:
      severity: "critical"
    target_match:
      severity: "warning"
    equal: ["instance"]
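
The inhibit rule at the end of this file mutes warning alerts for an instance while a critical alert is active for the same instance. The Python sketch below illustrates that matching logic under simplifying assumptions: alerts are plain label dictionaries, matching is exact equality only, and the function name inhibited is hypothetical rather than anything Alertmanager exposes.

```python
def inhibited(alerts, source_match, target_match, equal):
    """Return the alerts that one inhibit rule would mute."""

    def matches(alert, matcher):
        return all(alert.get(key) == value for key, value in matcher.items())

    sources = [a for a in alerts if matches(a, source_match)]
    muted = []
    for alert in alerts:
        if not matches(alert, target_match):
            continue
        # Muted if some source alert shares every label listed in 'equal'.
        if any(all(alert.get(l) == s.get(l) for l in equal) for s in sources):
            muted.append(alert)
    return muted


# Invented set of currently firing alerts.
firing = [
    {"alertname": "WindowsLowMemory", "severity": "critical", "instance": "192.168.1.10:9182"},
    {"alertname": "WindowsHighCPU", "severity": "warning", "instance": "192.168.1.10:9182"},
    {"alertname": "WindowsHighCPU", "severity": "warning", "instance": "192.168.1.11:9182"},
]

muted = inhibited(
    firing,
    source_match={"severity": "critical"},
    target_match={"severity": "warning"},
    equal=["instance"],
)
print([(a["alertname"], a["instance"]) for a in muted])
```

Only the warning on 192.168.1.10 is muted; the warning on 192.168.1.11 still notifies because no critical alert shares its instance label.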

Install Alertmanager as a service with NSSM and add its address to the Prometheus configuration under the alerting block:

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]

Verifying the Complete Stack

With all components running, verify the full pipeline. In Prometheus at http://localhost:9090/targets confirm all Windows targets show state UP with recent scrape timestamps. Navigate to http://localhost:9090/alerts to see the loaded alerting rules in their inactive or pending states. In Grafana, open the Windows Server dashboard and confirm panels are displaying data with appropriate legends per instance.

Use the following PromQL queries in the Prometheus expression browser to sanity-check key metrics are flowing correctly:

# Confirm CPU metric is present
count(windows_cpu_time_total) by (instance)

# Check memory free bytes
windows_os_physical_memory_free_bytes

# Check disk free percentage per volume
windows_logical_disk_free_bytes / windows_logical_disk_size_bytes * 100

# Check services that are not running (one series per service and state; value 1 marks the current state)
windows_service_state{state="running"} == 0
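
The same checks can be scripted against the Prometheus HTTP API, which serves instant queries at /api/v1/query. The following Python sketch uses only the standard library; the helper names and the sample payload are illustrative assumptions, and run_query is defined but not called here because it needs a reachable Prometheus server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, expr):
    """Build an instant-query URL with the PromQL expression URL-encoded."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def parse_vector(payload):
    """Map each series' instance label to its float value."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def run_query(base_url, expr, timeout=10):
    """Run an instant query against a live server (not exercised below)."""
    with urlopen(build_query_url(base_url, expr), timeout=timeout) as resp:
        return parse_vector(json.load(resp))


# Invented response in the documented instant-vector shape.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "192.168.1.10:9182"}, "value": [1700000000, "1"]},
            {"metric": {"instance": "192.168.1.11:9182"}, "value": [1700000000, "0"]},
        ],
    },
}

up_by_instance = parse_vector(sample)
unreachable = sorted(i for i, v in up_by_instance.items() if v == 0)
print(unreachable)
```

Against a running server, a call such as run_query("http://localhost:9090", 'up{job="windows_servers"}') would return the same kind of mapping for the real targets.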

This stack gives you a production-grade observability platform for Windows Server 2022 that scales from a handful of servers to hundreds with minimal operational overhead. The 30-day retention configured in Prometheus provides ample historical data for capacity planning and post-incident analysis, while Grafana dashboards deliver the visual layer needed for day-to-day operations.