Affected versions: Windows Server 2025

Table of contents
  1. Symptom & Impact
  2. Environment & Reproduction
  3. Root Cause Analysis
  4. Quick Triage
  5. Step-by-Step Diagnosis
  6. Solution — Primary Fix
  7. Solution — Alternative Approaches
  8. Verification & Acceptance Criteria
  9. Rollback Plan
  10. Prevention & Hardening
  11. Related Errors & Cross-Refs
  12. References & Further Reading

Symptom & Impact

After installing new RAM modules or following a hardware maintenance window, Windows Server 2025 intermittently crashes with a blue screen displaying STOP code WHEA_UNCORRECTABLE_ERROR (0x00000124). The crash occurs under load, sometimes during boot, and produces a minidump file in C:\Windows\Minidump. This bug check means the Windows Hardware Error Architecture (WHEA) reported an uncorrectable hardware error, most commonly an ECC memory error, a CPU error, or a chipset/PCIe bus error. The impact is severe: unexpected server reboots, potential data corruption if a write was in progress, and crash-dump analysis overhead. On a Hyper-V host, all running VMs are terminated with the host BSOD.

Environment & Reproduction

Confirmed on Windows Server 2025 with DDR4/DDR5 ECC registered memory. Triggers under high memory utilization (>70% of installed RAM), during SQL Server large query processing, or when running memory-intensive workloads like in-memory databases. Also reproducible when a mismatched memory kit is installed (different frequency, CAS latency, or vendor).

# Check for memory errors
Get-WinEvent -LogName 'System' -FilterXPath '*[System[Provider[@Name="Microsoft-Windows-WHEA-Logger"]]]' | Select TimeCreated,Message | Format-List
# Run memory diagnostic
mdsched.exe  # Schedule on next reboot
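# Check event log for hardware errors (Level 1 = Critical, 2 = Error)
Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-WHEA-Logger'; Level=1,2} -MaxEvents 20 -ErrorAction SilentlyContinue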

Root Cause Analysis

WHEA_UNCORRECTABLE_ERROR is caused by an unrecoverable hardware error reported by the platform hardware to the OS. Category 1: Memory — a DRAM cell reports a multi-bit error that ECC cannot correct. Category 2: CPU — a machine check exception (MCE) in the processor pipeline (core, cache, or interconnect). Category 3: PCIe — a fatal error on the PCIe bus connecting a storage or network adapter. Memory errors are most common after new RAM installation because of timing incompatibility (XMP/DOCP profile mismatch) or physical seating issues.
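
To make the three categories above easier to apply to a long event log, here is a minimal sketch that buckets WHEA-Logger message text by category. The function name `Get-WheaCategory` and the keyword patterns are assumptions for illustration, not part of any Microsoft tooling; message wording varies by platform and firmware, so the patterns may need adjusting.

```powershell
# Hypothetical helper: map WHEA-Logger message text to Memory / CPU / PCIe.
# The patterns below are assumptions about typical message wording.
function Get-WheaCategory {
    param([Parameter(Mandatory)][string]$Message)

    # switch -Regex is case-insensitive; the first matching branch returns.
    switch -Regex ($Message) {
        'component:\s*Memory'       { return 'Memory' }
        'component:\s*Processor'    { return 'CPU' }
        'Cache Hierarchy'           { return 'CPU' }
        'component:\s*PCI Express'  { return 'PCIe' }
        default                     { return 'Unknown' }
    }
}

# Usage on a live system (commented out so the sketch has no side effects):
# Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-WHEA-Logger'} -ErrorAction SilentlyContinue |
#     Group-Object { Get-WheaCategory $_.Message } | Select-Object Count, Name
```

Anything that lands in `Unknown` is worth reading by hand, since it likely carries wording the patterns do not cover.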

Quick Triage

Three quick checks to confirm WHEA as root cause before dismantling hardware.

# Quick WHEA triage (run as administrator)
Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-WHEA-Logger'} -MaxEvents 50 -ErrorAction SilentlyContinue
# Check minidump
Get-ChildItem C:\Windows\Minidump
# Check hardware event log
Get-WinEvent -LogName 'Microsoft-Windows-Kernel-WHEA/Errors' -MaxEvents 10 -ErrorAction SilentlyContinue
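
When the triage commands above return many entries, a per-ID summary can show at a glance whether errors are recurring. The following is a sketch under stated assumptions: `Get-WheaSummary` is a hypothetical helper name, and it only expects objects exposing `Id`, `LevelDisplayName`, and `TimeCreated`, which the records returned by `Get-WinEvent` do.

```powershell
# Hypothetical helper: count WHEA events per (Id, Level) and show the latest occurrence.
function Get-WheaSummary {
    param([object[]]$WheaEvent = @())

    $WheaEvent |
        Group-Object Id, LevelDisplayName |
        Sort-Object Count -Descending |
        Select-Object Count, Name, @{
            Name       = 'Latest'
            Expression = { ($_.Group | Sort-Object TimeCreated | Select-Object -Last 1).TimeCreated }
        }
}

# Usage on a live system (commented out so the sketch has no side effects):
# $events = @(Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-WHEA-Logger'} -ErrorAction SilentlyContinue)
# Get-WheaSummary -WheaEvent $events | Format-Table -AutoSize
```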

Step-by-Step Diagnosis

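Identify the WHEA error source in Event Viewer. WHEA-Logger commonly records Event ID 18 for a fatal hardware error, Event ID 19 for a corrected hardware error, and Event ID 47 for a corrected memory error; rely on the event level and message text rather than the ID alone, since the message names the reporting component, the error source, and for memory errors often the physical address. Open the minidump in WinDbg and run `!analyze -v` and `!whea` to determine whether the error is memory, CPU, or bus related. For bug check 0x124, parameter 2 is the address of the WHEA error record, which `!errrec <address>` decodes. Check the BMC System Event Log (SEL) through iDRAC or iLO to identify the affected DIMM slot.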

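# Analyse minidump in WinDbg (install from windbg.ms)
# Open File -> Open Crash Dump -> select latest .dmp file
# Run: !analyze -v
# Run: !whea
# Or use automated analysis on the newest dump:
$dump = Get-ChildItem C:\Windows\Minidump -Filter *.dmp | Sort-Object LastWriteTime | Select-Object -Last 1
windbg -c "!analyze -v; q" -z $dump.FullName
# PowerShell: show the first four header bytes as text (kernel dumps begin with "PAGE")
[System.Text.Encoding]::ASCII.GetString([System.IO.File]::ReadAllBytes($dump.FullName)[0..3])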
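
The header check above loads the whole dump into memory. As a lighter alternative, the sketch below reads only the first eight bytes and interprets the signature. Kernel dump headers begin with `PAGE` followed by `DU64` (64-bit) or `DUMP` (32-bit). The function names `Read-DumpHeader` and `Get-DumpSignature` are hypothetical helpers, not built-in cmdlets.

```powershell
# Hypothetical helper: read the first 8 bytes of a file without loading all of it.
function Read-DumpHeader {
    param([Parameter(Mandatory)][string]$Path)

    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        $buffer = New-Object byte[] 8
        [void]$stream.Read($buffer, 0, $buffer.Length)
        return ,$buffer   # leading comma keeps the byte[] intact on the pipeline
    }
    finally {
        $stream.Dispose()
    }
}

# Hypothetical helper: interpret the 8-byte signature.
function Get-DumpSignature {
    param([Parameter(Mandatory)][byte[]]$Header)

    if ($Header.Length -lt 8) { return 'Too short' }
    $signature = [System.Text.Encoding]::ASCII.GetString($Header, 0, 8)
    switch -CaseSensitive ($signature) {
        'PAGEDU64' { 'Kernel dump (64-bit)' }
        'PAGEDUMP' { 'Kernel dump (32-bit)' }
        default    { 'Unrecognized signature' }
    }
}

# Usage (commented out; the file name is whatever dump exists on your system):
# Get-ChildItem C:\Windows\Minidump -Filter *.dmp |
#     ForEach-Object { '{0}: {1}' -f $_.Name, (Get-DumpSignature -Header (Read-DumpHeader $_.FullName)) }
```

An unrecognized signature usually means the file is truncated or is not a kernel dump, in which case WinDbg analysis is unlikely to succeed either.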

Solution — Primary Fix

Power the server down, then remove and re-seat the memory modules one at a time. If XMP/DOCP memory profiles are enabled in BIOS, disable them and run at JEDEC default speeds. Run Windows Memory Diagnostic (mdsched.exe) in extended mode. If a specific DIMM fails, remove it. To rule out a faulty slot rather than a faulty module, test each DIMM slot independently using a known-good module.

# Schedule extended memory test (runs on next reboot)
mdsched.exe  # Select: Restart now and check for problems; press F1 during the test to choose Extended
# Or via PowerShell:
Start-Process mdsched.exe
# After reboot test: results in Event Viewer -> Windows Logs -> System -> MemoryDiagnostics-Results
Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-MemoryDiagnostics-Results'} -ErrorAction SilentlyContinue | Select-Object TimeCreated,Message | Format-List

Solution — Alternative Approaches

Alternative 1: Run MemTest86 (bootable, more thorough than mdsched). Alternative 2: Run the server vendor's diagnostic tool (Dell SupportAssist, HP Insight Diagnostics, or Lenovo DSA) for comprehensive hardware testing including memory, CPU, and PCIe. Alternative 3: If the error is a CPU machine check exception (MCE), check whether Intel or AMD has released a microcode update and apply it through a firmware update.

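# Alt: Check for microcode updates
(Get-CimInstance Win32_Processor).Caption
# Verify the loaded microcode revision (the registry value is binary)
(Get-ItemProperty 'HKLM:\HARDWARE\DESCRIPTION\System\CentralProcessor\0').'Update Revision'
# Update firmware to latest to include microcode patches
# Alt: disable faulty DIMM slot in BIOS until replacement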

Verification & Acceptance Criteria

No WHEA events in Event Viewer for 48 hours under normal load. Memory diagnostic passes with no errors. Server completes a full stress test (e.g., Prime95 or HCI Memtest) without BSOD. Check minidump directory — no new .dmp files created.

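# Continuous monitoring for WHEA events (the PowerShell session must stay open)
# Placeholder mail settings; replace with values for your environment
$mail = @{ To = 'admin@example.com'; From = 'whea-monitor@example.com'; SmtpServer = 'smtp.example.com'; Subject = 'WHEA Error Detected' }
$timer = New-Object System.Timers.Timer(300000)  # Check every 5 min
Register-ObjectEvent -InputObject $timer -EventName Elapsed -MessageData $mail -Action {
    $m = $Event.MessageData
    $filter = @{ LogName = 'System'; ProviderName = 'Microsoft-Windows-WHEA-Logger'; StartTime = (Get-Date).AddMinutes(-5) }
    $errors = Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue
    if ($errors) { Send-MailMessage @m -Body ($errors | Out-String) }
}
$timer.Start()  # Start the timer so the Elapsed event fires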
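
The acceptance criteria above (no WHEA events and no new dump files within the observation window) can also be checked on demand. This is a minimal sketch: `Test-WheaAcceptance` is a hypothetical helper, the 48-hour default mirrors the criterion stated above, and the function only inspects the objects passed to it, so the memory diagnostic and stress test results still need to be confirmed separately.

```powershell
# Hypothetical helper: pass/fail check for WHEA events and dump files since a cutoff.
function Test-WheaAcceptance {
    param(
        [datetime]$Since = (Get-Date).AddHours(-48),
        [object[]]$WheaEvent = @(),   # objects with a TimeCreated property
        [object[]]$DumpFile = @()     # objects with a LastWriteTime property
    )

    $newEvents = @($WheaEvent | Where-Object { $_.TimeCreated -ge $Since })
    $newDumps  = @($DumpFile  | Where-Object { $_.LastWriteTime -ge $Since })

    [pscustomobject]@{
        Since      = $Since
        WheaEvents = $newEvents.Count
        NewDumps   = $newDumps.Count
        Passed     = ($newEvents.Count -eq 0 -and $newDumps.Count -eq 0)
    }
}

# Usage on a live system (commented out so the sketch has no side effects):
# $events = @(Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-WHEA-Logger'} -ErrorAction SilentlyContinue)
# $dumps  = @(Get-ChildItem C:\Windows\Minidump -Filter *.dmp -ErrorAction SilentlyContinue)
# Test-WheaAcceptance -WheaEvent $events -DumpFile $dumps
```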

Rollback Plan

If removing a DIMM resolves the BSOD, do not return to the original configuration until the replacement arrives. If the server firmware allows it, disable the faulty DIMM slot in BIOS in the meantime. Keep the minidump files for vendor RMA documentation. Do not re-enable XMP memory speed settings until the faulty hardware is confirmed replaced.

# Disable XMP profile in BIOS if not accessible via CLI
# From BIOS Setup -> Advanced Memory -> XMP/DOCP: Disabled
# Revert to JEDEC default speeds for stability during testing

Prevention & Hardening

Preventive practices for memory-related BSOD: (1) Always check server hardware compatibility list (HCL) before purchasing RAM — use vendor-specific lists (Dell Memory Upgrade, HP QuickSpecs). (2) Enable ECC logging in BIOS and configure IPMI/BMC alerts for corrected ECC errors — a rising corrected error count predicts uncorrectable errors. (3) Test new memory before putting server in production using MemTest86 for at least one full pass. (4) Maintain consistent memory kits — do not mix vendors or speeds in a channel.

# Configure ECC error alerting via Windows event forwarding
wecutil qc  # Configure Windows Event Collector
# Subscribe to WHEA events from remote servers
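
Practice (4) above, keeping memory kits consistent, can be spot-checked from the OS. The sketch below compares vendor, speed, and part number across installed modules. `Test-DimmConsistency` is a hypothetical helper; it assumes input objects shaped like `Win32_PhysicalMemory` instances (with `Manufacturer`, `Speed`, and `PartNumber` properties) and compares all modules in the system rather than per channel, so treat a mismatch as a prompt to check the vendor's population guidelines.

```powershell
# Hypothetical helper: report whether all DIMMs share the same vendor, speed, and part number.
function Test-DimmConsistency {
    param([Parameter(Mandatory)][object[]]$Module)

    foreach ($property in 'Manufacturer', 'Speed', 'PartNumber') {
        # Normalize to trimmed strings; SMBIOS fields are often space-padded.
        $values = @($Module | ForEach-Object { "$($_.$property)".Trim() } | Sort-Object -Unique)
        [pscustomobject]@{
            Property   = $property
            Consistent = ($values.Count -le 1)
            Values     = $values -join ', '
        }
    }
}

# Usage on a live system (commented out so the sketch has no side effects):
# Test-DimmConsistency -Module (Get-CimInstance Win32_PhysicalMemory) | Format-Table -AutoSize
```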

Related Errors & Cross-Refs

Related: SYSTEM_SERVICE_EXCEPTION BSOD (usually driver, not hardware), MEMORY_MANAGEMENT BSOD (can be OS-level, not always RAM hardware), PAGE_FAULT_IN_NONPAGED_AREA BSOD (often indicates corrupt kernel memory or a faulty driver). WHEA with error type ‘Cache Hierarchy Error’ points to CPU, not RAM.

References & Further Reading

Microsoft: ‘Bug Check 0x124 WHEA_UNCORRECTABLE_ERROR’ at learn.microsoft.com/windows-hardware/drivers/debugger. Windows Hardware Error Architecture (WHEA) reference at MSDN. MemTest86 documentation at memtest86.com. JEDEC memory standards at jedec.org. Intel MCA Architecture documentation in Intel SDM Volume 3B Chapter 15.
