Symptom & Impact
After new RAM modules are installed or a hardware maintenance window completes, Windows Server 2025 intermittently crashes with a blue screen showing STOP code WHEA_UNCORRECTABLE_ERROR (0x00000124). The crash occurs under load, sometimes during boot, and produces a minidump file in C:\Windows\Minidump. WHEA (Windows Hardware Error Architecture) errors indicate an uncorrectable hardware error, most commonly ECC memory errors, CPU errors, or chipset/PCIe bus errors. The impact is severe: unexpected server reboots, potential data corruption if a write was in progress, and crash-dump analysis overhead. On a Hyper-V host, the BSOD also terminates every running VM.
Environment & Reproduction
Confirmed on Windows Server 2025 with DDR4/DDR5 ECC registered memory. Triggers under high memory utilization (>70% of installed RAM), during SQL Server large query processing, or when running memory-intensive workloads like in-memory databases. Also reproducible when a mismatched memory kit is installed (different frequency, CAS latency, or vendor).
# Check for memory errors
Get-WinEvent -LogName 'System' -FilterXPath '*[System[Provider[@Name="Microsoft-Windows-WHEA-Logger"]]]' | Select TimeCreated,Message | Format-List
# Run memory diagnostic
mdsched.exe # Schedule on next reboot
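# Check event log for hardware errors (Level 1 = Critical, 2 = Error)
Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-WHEA-Logger'; Level=1,2} -MaxEvents 20 -ErrorAction SilentlyContinue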
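The mismatched-kit scenario above can be checked from the OS before opening the chassis. The following is a minimal sketch, not a vendor tool: Get-DimmMismatch is a hypothetical helper name, and it assumes the firmware populates the Manufacturer, PartNumber, and Speed fields of Win32_PhysicalMemory. CAS latency is not exposed by that class, so it is not compared here.

```powershell
# Sketch: flag mixed DIMM kits. Get-DimmMismatch is a hypothetical helper name.
function Get-DimmMismatch {
    param(
        # Objects with Manufacturer, PartNumber and Speed properties,
        # e.g. the output of Get-CimInstance Win32_PhysicalMemory.
        [Parameter(Mandatory)][object[]]$Dimm
    )
    foreach ($property in 'Manufacturer', 'PartNumber', 'Speed') {
        # Collect the distinct values of this property across all modules
        $values = @($Dimm | ForEach-Object { "$($_.$property)".Trim() } | Sort-Object -Unique)
        if ($values.Count -gt 1) {
            # More than one distinct value means the kit is mixed on this property
            [pscustomobject]@{ Property = $property; Values = $values -join ', ' }
        }
    }
}

# On the affected server (assumes the CIM provider fills in these fields):
# Get-DimmMismatch -Dimm (Get-CimInstance Win32_PhysicalMemory)
```

An empty result means the modules agree on all three properties; any returned row names the property that differs and the values found.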
Root Cause Analysis
WHEA_UNCORRECTABLE_ERROR is caused by an unrecoverable hardware error reported by the platform hardware to the OS. Category 1: Memory — a DRAM cell reports a multi-bit error that ECC cannot correct. Category 2: CPU — a machine check exception (MCE) in the processor pipeline (core, cache, or interconnect). Category 3: PCIe — a fatal error on the PCIe bus connecting a storage or network adapter. Memory errors are most common after new RAM installation because of timing incompatibility (XMP/DOCP profile mismatch) or physical seating issues.
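The three categories above can be sketched as a small classifier over WHEA-Logger event messages. This is an illustration under assumptions: Get-WheaCategory is a hypothetical helper, and the patterns assume the English message wording ("Component: Memory", "Reported by component: Processor Core", "Component: PCI Express Root Port"), which may differ on localized systems or with other error sources.

```powershell
# Sketch: map a WHEA-Logger event message to Memory / CPU / PCIe.
# Get-WheaCategory is a hypothetical name; the patterns assume English event text.
function Get-WheaCategory {
    param([Parameter(Mandatory)][AllowEmptyString()][string]$Message)
    # switch -Regex matches case-insensitively by default
    switch -Regex ($Message) {
        'component:\s*Memory'    { return 'Memory' }
        'component:\s*Processor' { return 'CPU' }
        'component:\s*PCI'       { return 'PCIe' }
        default                  { return 'Unknown' }
    }
}

# Example on a live system: count WHEA events per category
# Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-WHEA-Logger'} |
#     Group-Object { Get-WheaCategory -Message $_.Message } | Select-Object Name, Count
```

Anything the patterns do not recognise is reported as Unknown rather than guessed, so unclassified events still need a manual look.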
Quick Triage
Three quick checks to confirm WHEA as root cause before dismantling hardware.
# Quick WHEA triage (run as administrator)
Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-WHEA-Logger'} -MaxEvents 50 -ErrorAction SilentlyContinue
# Check minidump
Get-ChildItem C:\Windows\Minidump
# Check hardware event log
Get-WinEvent -LogName 'Microsoft-Windows-Kernel-WHEA/Errors' -MaxEvents 10 -ErrorAction SilentlyContinue
Step-by-Step Diagnosis
Identify the WHEA error source in Event Viewer. In the WHEA-Logger provider, Event IDs 17 and 19 typically report corrected errors (PCIe and processor respectively), Event ID 47 a corrected memory error, and Event ID 18 a fatal machine check. Because the numbering depends on the error source, rely on the event message, which names the reporting component, the error type, and, for memory errors, the physical address. Use WinDbg to analyse the minidump: run `!analyze -v` and `!whea` to determine whether the error is memory, CPU, or bus-related. Check the BMC System Event Log (SEL) via iDRAC/iLO to identify the DIMM slot.
# Analyse minidump in WinDbg (install from windbg.ms)
# Open File -> Open Crash Dump -> select latest .dmp file
# Run: !analyze -v
# Run: !whea
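# Or use automated analysis (replace latest.dmp with the actual dump file name):
windbg -c "!analyze -v; q" -z C:\Windows\Minidump\latest.dmp
# PowerShell: show the 8-byte dump signature (a 64-bit kernel dump starts with PAGEDU64)
[System.Text.Encoding]::ASCII.GetString([System.IO.File]::ReadAllBytes('C:\Windows\Minidump\latest.dmp'), 0, 8)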
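Before spending time in WinDbg it can help to confirm that a file really is a kernel crash dump. The sketch below reads only the first 8 bytes instead of the whole file and maps the signature to a dump type. Get-DumpSignature and Read-DumpHeader are hypothetical helper names; the signatures assumed are PAGEDU64 (64-bit kernel dump), PAGEDUMP (32-bit kernel dump), and MDMP (user-mode minidump).

```powershell
# Sketch: identify a dump file by its leading signature bytes.
# Both function names are hypothetical helpers, not built-in cmdlets.
function Get-DumpSignature {
    param([Parameter(Mandatory)][byte[]]$Header)
    if ($Header.Length -lt 8) { return 'TooShort' }
    $text = [System.Text.Encoding]::ASCII.GetString($Header, 0, 8)
    if ($text -ceq 'PAGEDU64') { return 'Kernel64' }
    if ($text -ceq 'PAGEDUMP') { return 'Kernel32' }
    if ($text.StartsWith('MDMP')) { return 'UserMode' }
    return 'Unknown'
}

function Read-DumpHeader {
    # Pass a full path: .NET resolves relative paths against the process directory
    param([Parameter(Mandatory)][string]$Path)
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        $buffer = New-Object byte[] 8
        [void]$stream.Read($buffer, 0, 8)
        # The leading comma keeps the byte array from being unrolled on output
        return ,$buffer
    } finally {
        $stream.Dispose()
    }
}

# Example: classify every dump in the minidump folder
# Get-ChildItem C:\Windows\Minidump -Filter *.dmp |
#     ForEach-Object { "{0}: {1}" -f $_.Name, (Get-DumpSignature -Header (Read-DumpHeader -Path $_.FullName)) }
```

A result of Unknown or TooShort suggests a truncated or corrupt dump, which is itself worth noting when the crash happened mid-write.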

Solution — Primary Fix
Remove and re-seat memory modules one at a time. If using XMP/DOCP RAM profiles in BIOS, disable them and run at JEDEC default speeds. Run Windows Memory Diagnostic (mdsched.exe) in extended mode. If a specific DIMM fails, remove it. Test each DIMM slot independently using a known-good module.
# Schedule memory test (runs on next reboot)
mdsched.exe # Select: Restart now and check for problems
# Or via PowerShell:
Start-Process mdsched.exe
# Once the test starts, press F1 and select the Extended test mix
# After reboot test: results in Event Viewer -> Windows Logs -> System -> MemoryDiagnostics-Results
Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-MemoryDiagnostics-Results'} -MaxEvents 5 -ErrorAction SilentlyContinue | Format-List TimeCreated,Id,Message

Solution — Alternative Approaches
Alternative 1: Run MemTest86 (bootable, more thorough than mdsched). Alternative 2: Server vendor diagnostic tool — Dell SupportAssist, HP Insight Diagnostics, or Lenovo DSA — for comprehensive hardware testing including memory, CPU, and PCIe. Alternative 3: If error is CPU MCE type, check Intel/AMD microcode update availability and apply via firmware update.
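# Alt: Check for microcode updates
(Get-CimInstance Win32_Processor).Caption
# Read the loaded microcode revision from the registry (raw bytes; compare with the CPU vendor's release notes)
(Get-ItemProperty 'HKLM:\HARDWARE\DESCRIPTION\System\CentralProcessor\0').'Update Revision'
# Update firmware to latest to include microcode patches
# Alt: disable faulty DIMM slot in BIOS until replacement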
Verification & Acceptance Criteria
No WHEA events in Event Viewer for 48 hours under normal load. Memory diagnostic passes with no errors. Server completes a full stress test (e.g., Prime95 or HCI Memtest) without BSOD. Check minidump directory — no new .dmp files created.
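# Continuous monitoring for WHEA events (keep this PowerShell session open)
$timer = New-Object System.Timers.Timer(300000) # Check every 5 min
Register-ObjectEvent -InputObject $timer -EventName Elapsed -Action {
$filter = @{LogName='System'; ProviderName='Microsoft-Windows-WHEA-Logger'; StartTime=(Get-Date).AddMinutes(-5)}
$wheaEvents = Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue
# Placeholder addresses and SMTP server: replace with your own
if ($wheaEvents) { Send-MailMessage -From 'whea-monitor@example.com' -To 'admin@example.com' -Subject 'WHEA Error Detected' -Body ($wheaEvents | Out-String) -SmtpServer 'smtp.example.com' }
}
$timer.Start() # The timer does nothing until it is started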
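The acceptance criteria above (no WHEA events and no new dump files within 48 hours) can be expressed as a single pass/fail check. This is a minimal sketch: Test-WheaAcceptance is a hypothetical helper that works on timestamps only, so gathering them from the event log and the minidump folder is left to the caller.

```powershell
# Sketch: pass/fail check for the 48-hour acceptance window.
# Test-WheaAcceptance is a hypothetical helper name.
function Test-WheaAcceptance {
    param(
        [datetime[]]$EventTime = @(),   # TimeCreated of WHEA-Logger events
        [datetime[]]$DumpTime = @(),    # LastWriteTime of .dmp files
        [datetime]$Now = (Get-Date),
        [int]$WindowHours = 48
    )
    $cutoff = $Now.AddHours(-$WindowHours)
    # Keep only items newer than the cutoff
    $recentEvents = @($EventTime | Where-Object { $_ -gt $cutoff })
    $recentDumps = @($DumpTime | Where-Object { $_ -gt $cutoff })
    [pscustomobject]@{
        RecentWheaEvents = $recentEvents.Count
        RecentDumps      = $recentDumps.Count
        Passed           = ($recentEvents.Count -eq 0 -and $recentDumps.Count -eq 0)
    }
}

# Example on the repaired server:
# $events = Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-WHEA-Logger'} -ErrorAction SilentlyContinue
# $dumps = Get-ChildItem C:\Windows\Minidump -Filter *.dmp -ErrorAction SilentlyContinue
# Test-WheaAcceptance -EventTime $events.TimeCreated -DumpTime $dumps.LastWriteTime
```

The check covers only the event-log and dump-file criteria; the memory diagnostic and stress-test results still have to be confirmed separately.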
Rollback Plan
If removing a DIMM resolves the BSOD, do not return to the original configuration until the replacement arrives. Disable the faulty DIMM slot in BIOS if the server supports hot-swap memory. Keep the minidump files for vendor RMA documentation. Do not switch memory speed settings back to XMP until the faulty hardware is confirmed replaced.
# Disable XMP profile in BIOS if not accessible via CLI
# From BIOS Setup -> Advanced Memory -> XMP/DOCP: Disabled
# Revert to JEDEC default speeds for stability during testing
Prevention & Hardening
Preventive practices for memory-related BSOD: (1) Always check server hardware compatibility list (HCL) before purchasing RAM — use vendor-specific lists (Dell Memory Upgrade, HP QuickSpecs). (2) Enable ECC logging in BIOS and configure IPMI/BMC alerts for corrected ECC errors — a rising corrected error count predicts uncorrectable errors. (3) Test new memory before putting server in production using MemTest86 for at least one full pass. (4) Maintain consistent memory kits — do not mix vendors or speeds in a channel.
# Configure ECC error alerting via Windows event forwarding
wecutil qc # Configure Windows Event Collector
# Subscribe to WHEA events from remote servers
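Since a rising corrected-error count is the early warning described above, one simple way to watch for it is to compare the most recent window with the one before it. The sketch below does that on event timestamps. Test-CorrectedErrorTrend is a hypothetical helper, and the 7-day window is an arbitrary assumption rather than a vendor threshold.

```powershell
# Sketch: compare corrected-error counts in two adjacent time windows.
# Test-CorrectedErrorTrend is a hypothetical helper; the 7-day default is an assumption.
function Test-CorrectedErrorTrend {
    param(
        [datetime[]]$EventTime = @(),   # TimeCreated of corrected-error events
        [datetime]$Now = (Get-Date),
        [int]$WindowDays = 7
    )
    $recentStart = $Now.AddDays(-$WindowDays)
    $previousStart = $Now.AddDays(-2 * $WindowDays)
    # Count events in (recentStart, Now] and in (previousStart, recentStart]
    $recent = @($EventTime | Where-Object { $_ -gt $recentStart -and $_ -le $Now }).Count
    $previous = @($EventTime | Where-Object { $_ -gt $previousStart -and $_ -le $recentStart }).Count
    [pscustomobject]@{ Previous = $previous; Recent = $recent; Rising = ($recent -gt $previous) }
}

# Example: feed it warning-level WHEA events (Level 3) from the System log
# $corrected = Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-WHEA-Logger'; Level=3} -ErrorAction SilentlyContinue
# Test-CorrectedErrorTrend -EventTime $corrected.TimeCreated
```

A Rising result is only a prompt to look at the BMC's per-DIMM counters; it does not by itself identify the failing module.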
Related Errors & Cross-Refs
Related: SYSTEM_SERVICE_EXCEPTION BSOD (usually driver, not hardware), MEMORY_MANAGEMENT BSOD (can be OS-level, not always RAM hardware), Page Fault in Non-Paged Area BSOD (often indicates corrupt kernel memory or driver). WHEA with error type ‘Cache Hierarchy Error’ points to CPU, not RAM.
References & Further Reading
Microsoft: ‘Bug Check 0x124 WHEA_UNCORRECTABLE_ERROR’ at learn.microsoft.com/windows-hardware/drivers/debugger. Windows Hardware Error Architecture (WHEA) reference at MSDN. MemTest86 documentation at memtest86.com. JEDEC memory standards at jedec.org. Intel MCA Architecture documentation in Intel SDM Volume 3B Chapter 15.