How to Perform Disaster Recovery Testing on Windows Server 2025

A disaster recovery plan that has never been tested is not a plan — it is a hypothesis. The uncomfortable truth that most IT organizations discover only during an actual disaster is that DR documentation written six months ago may not account for new servers, changed IP addresses, updated application dependencies, or service accounts that were rotated since the last manual review. Disaster recovery testing on Windows Server 2025 is the process of deliberately creating the conditions of a disaster in a controlled environment, measuring how long recovery actually takes compared to your documented RTO and RPO targets, and using that data to close gaps before they matter. This guide covers the full spectrum of DR testing methodologies — from tabletop exercises that require no infrastructure disruption to full test failovers of Hyper-V VMs and Azure Site Recovery environments — along with Active Directory restore testing, application validation, and the discipline of maintaining a living DR runbook.

Prerequisites

  • Windows Server 2025 production environment with at least Hyper-V, AD DS, or Azure workloads
  • An isolated Hyper-V host or isolated Azure VNet for recovery testing (must not touch production network)
  • Most recent backup or replication recovery points available and documented
  • Azure Site Recovery configured (if testing ASR failover — see the ASR guide for setup)
  • Windows Server Backup, Veeam, or equivalent backup software with successful recent backups
  • DR runbook document (even a draft) to test against
  • Test window scheduled with stakeholders (minimize change freeze conflicts)
  • Service account credentials for all critical applications documented in a vault (HashiCorp Vault, LAPS, or KeePass in a secure offline copy)

Step 1: Conduct a Tabletop Exercise Before Any Technical Testing

A tabletop exercise is a structured discussion-based walkthrough of your DR plan that identifies gaps without touching any infrastructure. Gather IT staff and key business stakeholders and walk through specific disaster scenarios step by step. Document every “I don’t know” and “we’d have to check” as action items to close before the technical test.
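
One way to make sure those answers actually become action items is to capture the walkthrough as structured notes and filter them afterwards. The following is a minimal sketch, not a prescribed format: the scenarios, questions, owners, and gap phrases are all hypothetical placeholders.

```powershell
# Hypothetical tabletop notes: one entry per question asked during the walkthrough.
# Scenario names, questions, answers, and owners are illustrative only.
$tabletopNotes = @(
    [PSCustomObject]@{ Scenario = "Primary DC lost"; Question = "Where is the DSRM password stored?";       Answer = "I don't know";       Owner = "AD team" }
    [PSCustomObject]@{ Scenario = "Primary DC lost"; Question = "Which DC holds the PDC emulator role?";    Answer = "dc01";               Owner = "AD team" }
    [PSCustomObject]@{ Scenario = "App server lost"; Question = "Which service account runs the app pool?"; Answer = "We'd have to check"; Owner = "App team" }
)

# Phrases that signal a documentation gap rather than a known fact (assumed list)
$gapPatterns = "don't know", "have to check", "not sure"

# Turn every gap answer into an open action item
$actionItems = foreach ($note in $tabletopNotes) {
    $isGap = $false
    foreach ($pattern in $gapPatterns) {
        if ($note.Answer -like "*$pattern*") { $isGap = $true }
    }
    if ($isGap) {
        [PSCustomObject]@{
            Scenario = $note.Scenario
            Action   = "Document answer to: $($note.Question)"
            Owner    = $note.Owner
            Status   = "Open"
        }
    }
}

$actionItems | Format-Table -AutoSize
```

Exporting `$actionItems` with `Export-Csv` alongside the inventory files keeps the open items next to the reference material they relate to.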

# Generate a current-state inventory to use as tabletop reference material
# Run this before the tabletop to ensure documentation is current

# All running services on critical servers
Invoke-Command -ComputerName @("dc01", "dc02", "fileserver01", "appserver01") -ScriptBlock {
    Get-Service | Where-Object { $_.Status -eq "Running" -and $_.StartType -eq "Automatic" } |
        Select-Object Name, DisplayName, Status |
        Sort-Object DisplayName
} | Export-Csv -Path "C:\DRDocs\RunningServices-$(Get-Date -Format 'yyyyMMdd').csv" -NoTypeInformation

# All installed applications
Invoke-Command -ComputerName @("appserver01", "webserver01") -ScriptBlock {
    Get-ItemProperty `
        "HKLM:\Software\Microsoft\Windows\CurrentVersion\Uninstall\*",
        "HKLM:\Software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\*" |
        Where-Object { $_.DisplayName } |
        Select-Object DisplayName, DisplayVersion, Publisher, InstallDate
} | Export-Csv -Path "C:\DRDocs\InstalledApps-$(Get-Date -Format 'yyyyMMdd').csv" -NoTypeInformation

# Network configuration snapshot (IPs, DNS, gateways)
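Invoke-Command -ComputerName @("dc01", "dc02", "fileserver01") -ScriptBlock {
    # Flatten the nested address objects so the CSV holds values, not type names
    Get-NetIPConfiguration | Select-Object InterfaceAlias,
        @{N="IPv4Address";        E={ $_.IPv4Address.IPAddress -join "," }},
        @{N="IPv4DefaultGateway"; E={ $_.IPv4DefaultGateway.NextHop -join "," }},
        @{N="DNSServer";          E={ $_.DNSServer.ServerAddresses -join "," }}
} | Export-Csv -Path "C:\DRDocs\NetworkConfig-$(Get-Date -Format 'yyyyMMdd').csv" -NoTypeInformation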

# Active Directory site topology for DC restore planning
Get-ADReplicationSite -Filter * | Select-Object Name, DistinguishedName |
    Export-Csv -Path "C:\DRDocs\ADSites-$(Get-Date -Format 'yyyyMMdd').csv" -NoTypeInformation

Step 2: Test Hyper-V VM Recovery in an Isolated Environment

The most common DR test for on-premises workloads is exporting a production Hyper-V VM, importing it into an isolated Hyper-V host (one with no connectivity to the production network), and verifying that it boots and that its services are healthy. This tests your backup/export process, your import process, and your application stack.

# Export a production VM snapshot (not a live migration — this is a copy)
$vmName    = "appserver01"
$exportPath = "\\backup-server\VMExports\$(Get-Date -Format 'yyyyMMdd')"

# Create the export directory
New-Item -Path $exportPath -ItemType Directory -Force

# Export the VM (creates a complete exportable copy)
Export-VM -Name $vmName -Path $exportPath
Write-Output "Export completed: $exportPath\$vmName"

# On the isolated RECOVERY Hyper-V host (run these commands there)
# The recovery host should have NO network path to production

$importSourcePath = "\\backup-server\VMExports\20260517\appserver01"
$importVMPath     = "C:\RecoveryVMs"
$importVHDPath    = "C:\RecoveryVHDs"

# Get VMCX file for import
$vmcxFile = Get-ChildItem -Path "$importSourcePath\Virtual Machines" -Filter "*.vmcx" |
    Select-Object -First 1

# Import as a new unique VM (generates new VM ID to avoid conflicts)
$importParams = @{
    Path              = $vmcxFile.FullName
    VirtualMachinePath = $importVMPath
    VhdDestinationPath = $importVHDPath
    Copy              = $true
    GenerateNewId     = $true
}

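$recoveredVM = Import-VM @importParams

# Rename the copy so it cannot be confused with the production VM name
# (later steps look it up by the "- recovered" suffix)
Rename-VM -VM $recoveredVM -NewName "$($recoveredVM.Name) - recovered"
$recoveredVM = Get-VM -Id $recoveredVM.VMId
Write-Output "Imported VM ID: $($recoveredVM.VMId) | Name: $($recoveredVM.Name)"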

# Disconnect from all networks before booting (isolation safety)
Get-VMNetworkAdapter -VM $recoveredVM | Disconnect-VMNetworkAdapter
Write-Output "Network adapters disconnected. VM is isolated."

# Boot the recovered VM
Start-VM -VM $recoveredVM

# Wait for the VM to reach a running state
$timeout  = [datetime]::Now.AddMinutes(10)
do {
    $state = (Get-VM -Id $recoveredVM.VMId).State
    Write-Output "$(Get-Date -Format 'HH:mm:ss') VM state: $state"
    Start-Sleep -Seconds 15
} until ($state -eq "Running" -or [datetime]::Now -gt $timeout)
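
Because the point of the test is to measure how long recovery takes, it can help to wrap the wait loop in a helper that also returns the elapsed time. The sketch below is one possible shape for that; `Wait-ForCondition` is a hypothetical helper name, not a built-in cmdlet, and the counter-based condition stands in for a real VM check so the example runs anywhere.

```powershell
# Hypothetical helper: polls a condition until it is true or a timeout expires,
# and reports how long that took.
function Wait-ForCondition {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 600,
        [int] $PollSeconds    = 15
    )
    $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
    $met = $false
    while (-not $met -and $stopwatch.Elapsed.TotalSeconds -lt $TimeoutSeconds) {
        $met = [bool](& $Condition)
        if (-not $met) { Start-Sleep -Seconds $PollSeconds }
    }
    $stopwatch.Stop()
    [PSCustomObject]@{
        ConditionMet   = $met
        ElapsedSeconds = [math]::Round($stopwatch.Elapsed.TotalSeconds, 1)
    }
}

# Stand-in condition: becomes true on the third poll (no Hyper-V required)
$state = @{ Attempts = 0 }
$result = Wait-ForCondition -PollSeconds 0 -TimeoutSeconds 30 -Condition {
    $state.Attempts++
    $state.Attempts -ge 3
}
Write-Output "Condition met: $($result.ConditionMet) after $($state.Attempts) polls"
```

On the recovery host, the condition could be something like `{ (Get-VM -Id $recoveredVM.VMId).Heartbeat -like "Ok*" }`, assuming integration services are running in the guest. A heartbeat is a stronger signal than the `Running` state, which only says the VM is powered on.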

Step 3: Verify Application Health on the Recovered VM

Booting successfully is not the same as recovering successfully. Once the isolated VM is running, connect to it via Hyper-V console (not the network — it is disconnected) and verify every critical service and application dependency is functional.

# Run these commands on the recovery Hyper-V host, in an elevated PowerShell session
# PowerShell Direct runs the script block inside the guest without network connectivity

$recoveredVM  = Get-VM -Name "appserver01 - recovered"
$adminCred    = Get-Credential -Message "Enter local admin credentials for recovered VM"

# Connect via PowerShell Direct (Hyper-V host to guest, no network required)
$session = New-PSSession -VMName $recoveredVM.Name -Credential $adminCred

$healthReport = Invoke-Command -Session $session -ScriptBlock {
    $results = @{}

    # Check critical Windows services
    $services = @("W3SVC", "MSSQLSERVER", "WinRM", "EventLog", "BITS")
    $results.Services = foreach ($svc in $services) {
        $s = Get-Service -Name $svc -ErrorAction SilentlyContinue
        [PSCustomObject]@{
            Service = $svc
            Status  = if ($s) { $s.Status } else { "Not Found" }
        }
    }

    # Check IIS application pools
    if (Get-Module -ListAvailable -Name WebAdministration) {
        Import-Module WebAdministration
        $results.AppPools = Get-WebConfiguration -Filter "system.applicationHost/applicationPools/add" |
            Select-Object name, state
        $results.Sites = Get-Website | Select-Object Name, State, PhysicalPath
    }

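    # Check SQL Server databases (Invoke-Sqlcmd ships with the SqlServer module;
    # skip the check if the cmdlet is not available in the guest)
    if (Get-Command Invoke-Sqlcmd -ErrorAction SilentlyContinue) {
        $results.Databases = Invoke-Sqlcmd -ServerInstance "localhost" `
            -Query "SELECT name, state_desc FROM sys.databases ORDER BY name" `
            -ErrorAction SilentlyContinue
    }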

    # Check event log for boot errors
    $results.BootErrors = Get-WinEvent -LogName System -MaxEvents 100 |
        Where-Object { $_.LevelDisplayName -in @("Error", "Critical") -and
                       $_.TimeCreated -gt (Get-Date).AddHours(-1) } |
        Select-Object TimeCreated, Id, ProviderName, Message

    # Check disk space
    $results.DiskSpace = Get-PSDrive -PSProvider FileSystem |
        Select-Object Name, @{N="UsedGB";E={[math]::Round($_.Used/1GB,1)}},
                      @{N="FreeGB";E={[math]::Round($_.Free/1GB,1)}}

    return $results
}

# Output results
Write-Output "=== Service Status ==="
$healthReport.Services | Format-Table -AutoSize
Write-Output "`n=== IIS Sites ==="
$healthReport.Sites | Format-Table -AutoSize
Write-Output "`n=== SQL Databases ==="
$healthReport.Databases | Format-Table -AutoSize
Write-Output "`n=== Boot Errors (last hour) ==="
$healthReport.BootErrors | Format-Table TimeCreated, Id, ProviderName -AutoSize

Remove-PSSession $session
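
Reading the tables by eye works for one VM, but a pass/fail comparison against an expected baseline is easier to repeat. The following sketch assumes service results shaped like `$healthReport.Services` above (objects with `Service` and `Status` properties). `Test-ServiceExpectation`, the expected-state table, and the sample data are hypothetical.

```powershell
# Hypothetical helper: compares observed service states with an expected baseline
function Test-ServiceExpectation {
    param(
        [Parameter(Mandatory)] [object[]]  $Services,   # objects with Service and Status properties
        [Parameter(Mandatory)] [hashtable] $Expected    # service name -> expected status
    )
    foreach ($name in $Expected.Keys) {
        $actual = $Services | Where-Object { $_.Service -eq $name } | Select-Object -First 1
        # Cast to string so enum values and plain strings compare the same way
        $status = if ($actual) { [string]$actual.Status } else { "Not Found" }
        [PSCustomObject]@{
            Service  = $name
            Expected = $Expected[$name]
            Actual   = $status
            Passed   = ($status -eq $Expected[$name])
        }
    }
}

# Synthetic sample data in the same shape as $healthReport.Services
$sampleServices = @(
    [PSCustomObject]@{ Service = "W3SVC";       Status = "Running" }
    [PSCustomObject]@{ Service = "MSSQLSERVER"; Status = "Stopped" }
)
# Assumed baseline: every listed service should be running after recovery
$expected = @{ W3SVC = "Running"; MSSQLSERVER = "Running"; BITS = "Running" }

$checks = Test-ServiceExpectation -Services $sampleServices -Expected $expected
$failed = @($checks | Where-Object { -not $_.Passed })

$checks | Sort-Object Service | Format-Table -AutoSize
Write-Output "Failed checks: $($failed.Count)"
```

A service that is missing entirely is reported as `Not Found` and counted as a failure, which is usually what you want when a role did not survive the restore.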

Step 4: Test Active Directory Restore (Non-Authoritative and Authoritative)

AD restore testing is the highest-stakes DR test you can perform because an incorrect restore — especially an authoritative restore in the wrong context — can replicate deleted or outdated objects across all domain controllers. Always perform AD restore tests on completely isolated VMs with no AD connectivity to production.

# On the isolated recovered DC VM (via PowerShell Direct or console)
# This VM must have NO network connectivity to production AD

# Step 1: Non-authoritative restore test (normal boot after backup restore)
# Verify the DC boots and AD DS service starts
Get-Service ADWS, NTDS, Netlogon, DNS, KDC | Select-Object Name, Status, StartType

# Check AD database integrity. ntdsutil can only run this check offline: boot into DSRM
# or stop the NTDS service first. Only do this on the isolated recovered VM.
& ntdsutil.exe @("activate instance ntds", "files", "integrity", "quit", "quit")

# Check replication metadata (will fail on isolated VM - expected)
repadmin /showrepl 2>&1 | Select-Object -First 20

# Step 2: Authoritative restore test - recover deleted objects
# Only safe in isolated environment with NO connectivity to production

# Simulate locating a "deleted" OU to restore
# In DSRM mode (boot with F8, select Directory Services Restore Mode)
# After restoring backup, mark objects authoritative:
$authRestoreCommands = @"
The authoritative restore process requires DSRM boot.
In a real recovery:
1. Boot DC to DSRM (hold Shift during restart, or bcdedit /set safeboot dsrepair)
2. Restore the AD backup (Windows Server Backup or equivalent)
3. Run: ntdsutil "activate instance ntds" "authoritative restore" "restore subtree ou=Finance,dc=contoso,dc=com" "quit" "quit"
4. If safeboot was set with bcdedit, clear it (bcdedit /deletevalue safeboot), then reboot normally
5. Verify replicated to other DCs (on isolated test: verify locally)
"@

Write-Output $authRestoreCommands

# Verify DNS is resolving correctly on isolated DC
Resolve-DnsName -Name "contoso.com" -Server "127.0.0.1" -ErrorAction SilentlyContinue

# Test Kerberos ticket issuance (requires network in real test)
klist purge
klist

Step 5: Test Azure Site Recovery Failover

If your DR strategy includes ASR for physical-to-Azure replication, test failover within ASR is the cleanest way to validate cloud recovery. It spins up the Azure VM from the latest recovery point in an isolated VNet, allowing you to verify services without affecting production replication.

# Initiate ASR test failover (Azure PowerShell - run from management workstation)
Connect-AzAccount
$vault = Get-AzRecoveryServicesVault -Name "rsv-contoso-dr" -ResourceGroupName "rg-dr-eastus"
Set-AzRecoveryServicesAsrVaultContext -Vault $vault

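# Get the protected item. The cmdlet requires the protection container it lives in;
# this assumes a single fabric and container, so filter by name if you have several
$fabric        = Get-AzRecoveryServicesAsrFabric | Select-Object -First 1
$container     = Get-AzRecoveryServicesAsrProtectionContainer -Fabric $fabric | Select-Object -First 1
$protectedItem = Get-AzRecoveryServicesAsrReplicationProtectedItem -FriendlyName "appserver01" -ProtectionContainer $container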

# Get the most recent application-consistent recovery point
$recoveryPoints = Get-AzRecoveryServicesAsrRecoveryPoint -ReplicationProtectedItem $protectedItem
$latestAppConsistent = $recoveryPoints |
    Where-Object { $_.RecoveryPointType -eq "AppConsistent" } |
    Sort-Object RecoveryPointTime -Descending |
    Select-Object -First 1

Write-Output "Recovery point: $($latestAppConsistent.RecoveryPointTime) [$($latestAppConsistent.RecoveryPointType)]"

# Calculate RPO at time of test
$rpoActual = (Get-Date) - $latestAppConsistent.RecoveryPointTime
Write-Output "Actual RPO at test time: $($rpoActual.TotalMinutes.ToString('F1')) minutes"

# Record start time for RTO measurement (before the job is submitted)
$rtoStart = Get-Date
Write-Output "Test failover started at: $rtoStart"

# Start test failover to isolated VNet
$isolatedVNet = Get-AzVirtualNetwork -Name "vnet-asr-test-isolated" -ResourceGroupName "rg-dr-eastus"

$tfJob = Start-AzRecoveryServicesAsrTestFailoverJob `
    -ReplicationProtectedItem $protectedItem `
    -Direction PrimaryToRecovery `
    -RecoveryPoint $latestAppConsistent `
    -AzureVMNetworkId $isolatedVNet.Id

# Monitor until complete
do {
    $status = Get-AzRecoveryServicesAsrJob -Job $tfJob
    Write-Output "$(Get-Date -Format 'HH:mm:ss') - State: $($status.State)"
    Start-Sleep -Seconds 30
} until ($status.State -in @("Succeeded", "Failed", "Cancelled"))

$rtoActual = (Get-Date) - $rtoStart
Write-Output "Test failover RTO: $($rtoActual.TotalMinutes.ToString('F1')) minutes"
Write-Output "Test failover status: $($status.State)"
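
The RPO calculation is easy to get subtly wrong: the newest recovery point is often crash-consistent, and mixing local and UTC timestamps skews the result. The sketch below isolates that logic so it can be checked against synthetic data. `Get-RecoveryPointAge` is a hypothetical helper, the sample recovery points are made up, and the type strings are assumptions that may differ from what your replication provider reports.

```powershell
# Hypothetical helper: finds the newest recovery point of a given type and
# returns its age in minutes relative to a reference time (both in UTC).
function Get-RecoveryPointAge {
    param(
        [Parameter(Mandatory)] [object[]] $RecoveryPoints,   # objects with RecoveryPointTime and RecoveryPointType
        [Parameter(Mandatory)] [datetime] $ReferenceTimeUtc,
        [string] $TypePattern = "App*"                       # wildcard for the type string (assumed naming)
    )
    $latest = $RecoveryPoints |
        Where-Object { $_.RecoveryPointType -like $TypePattern } |
        Sort-Object RecoveryPointTime -Descending |
        Select-Object -First 1
    if (-not $latest) { return $null }
    [PSCustomObject]@{
        RecoveryPointTime = $latest.RecoveryPointTime
        RecoveryPointType = $latest.RecoveryPointType
        AgeMinutes        = [math]::Round(($ReferenceTimeUtc - $latest.RecoveryPointTime).TotalMinutes, 1)
    }
}

# Synthetic recovery points: the newest one is crash-consistent
$now = [datetime]::new(2026, 5, 17, 12, 0, 0, [System.DateTimeKind]::Utc)
$points = @(
    [PSCustomObject]@{ RecoveryPointTime = $now.AddMinutes(-5);   RecoveryPointType = "CrashConsistent" }
    [PSCustomObject]@{ RecoveryPointTime = $now.AddMinutes(-50);  RecoveryPointType = "AppConsistent" }
    [PSCustomObject]@{ RecoveryPointTime = $now.AddMinutes(-110); RecoveryPointType = "AppConsistent" }
)

$age = Get-RecoveryPointAge -RecoveryPoints $points -ReferenceTimeUtc $now
Write-Output "Newest app-consistent point is $($age.AgeMinutes) minutes old"
```

In this sample the application-consistent RPO is 50 minutes even though a crash-consistent point exists from 5 minutes earlier, so which type you measure against should match what the runbook says you will actually fail over to.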

Step 6: Document RTO/RPO Actuals and Clean Up

The measurements you collect during a DR test are the most valuable outputs of the exercise. Compare actual RTO and RPO against your documented targets, identify gaps, and capture lessons learned before cleaning up the test environment.

# Create a DR test report
$drTestReport = [PSCustomObject]@{
    TestDate               = Get-Date -Format "yyyy-MM-dd"
    Tester                 = $env:USERNAME
    TestType               = "ASR Test Failover + Hyper-V VM Restore"
    TargetRTO_Minutes      = 60
    ActualRTO_Minutes      = $rtoActual.TotalMinutes
    RTO_Met                = ($rtoActual.TotalMinutes -le 60)
    TargetRPO_Minutes      = 15
    ActualRPO_Minutes      = $rpoActual.TotalMinutes
    RPO_Met                = ($rpoActual.TotalMinutes -le 15)
    ServicesVerified       = "IIS, SQL Server, AD Auth, DNS"
    IssuesFound            = "SQL Agent service failed to start - dependency on MSDTC not documented"
    CorrectiveActions      = "Added MSDTC auto-start to DR runbook pre-checks"
    NextTestDate           = (Get-Date).AddMonths(3).ToString("yyyy-MM-dd")
}

$drTestReport | Export-Csv -Path "C:\DRDocs\DRTestReport-$(Get-Date -Format 'yyyyMMdd').csv" -NoTypeInformation
$drTestReport | Format-List

# Clean up ASR test failover (critical - must be done or next test failover will fail)
$cleanupJob = Start-AzRecoveryServicesAsrTestFailoverCleanupJob `
    -ReplicationProtectedItem $protectedItem `
    -Comment "Test validated on $(Get-Date -Format 'yyyy-MM-dd'). RTO: $([math]::Round($rtoActual.TotalMinutes,1)) min. Issues: SQL Agent dependency."

Get-AzRecoveryServicesAsrJob -Job $cleanupJob | Select-Object State, DisplayName, EndTime

# Clean up isolated Hyper-V recovered VMs
$isolatedVMs = Get-VM | Where-Object { $_.Name -like "*recovered*" -or $_.Name -like "*restore-test*" }
foreach ($vm in $isolatedVMs) {
    Stop-VM -VM $vm -Force -TurnOff
    Remove-VM -VM $vm -Force
    Write-Output "Removed: $($vm.Name)"
}

# Delete the VHD files left behind by the import (Remove-VM does not delete virtual disks)
Get-ChildItem -Path "C:\RecoveryVHDs" -Filter "*.vhd*" |
    Remove-Item -Force
Write-Output "Recovery VHD files removed."

Step 7: Maintain the DR Runbook

A DR test that does not result in an updated runbook is only half-complete. Every gap found, every step that took longer than expected, and every undocumented dependency discovered during the test must be reflected in the runbook before the next test cycle.

# Generate a current-state runbook data dump to feed into documentation
# Run this after every test to capture the current state

$runbookData = @{
    GeneratedOn       = Get-Date -Format "yyyy-MM-dd HH:mm"
    DomainControllers = Get-ADDomainController -Filter * | 
        Select-Object Name, IPv4Address, IsGlobalCatalog, Site, OperationMasterRoles
    FSMORoles         = Get-ADDomain | Select-Object PDCEmulator, RIDMaster, InfrastructureMaster
    DNSZones          = Get-DnsServerZone -ErrorAction SilentlyContinue | 
        Select-Object ZoneName, ZoneType, IsDsIntegrated
    CriticalServers   = Invoke-Command -ComputerName @("appserver01","fileserver01","webserver01") -ScriptBlock {
        [PSCustomObject]@{
            Server   = $env:COMPUTERNAME
            IP       = (Get-NetIPAddress -AddressFamily IPv4 | Where-Object { $_.PrefixOrigin -ne "WellKnown" }).IPAddress -join ","
            Services = (Get-Service | Where-Object { $_.Status -eq "Running" -and $_.StartType -eq "Automatic" }).Count
        }
    }
}

# Export as XML for offline reference
$runbookData | Export-Clixml -Path "C:\DRDocs\RunbookData-$(Get-Date -Format 'yyyyMMdd').xml"
Write-Output "Runbook data exported to C:\DRDocs\RunbookData-$(Get-Date -Format 'yyyyMMdd').xml"

# Verify backup jobs completed successfully before closing out the test
Get-WBSummary | Select-Object LastBackupResultHR, LastSuccessfulBackupTime, LastBackupTime |
    Format-List
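
A dated snapshot becomes much more useful when it is compared with the previous one, because the differences are the runbook entries that need updating. The sketch below shows one way to do that with `Compare-Object`. The two snapshots are synthetic stand-ins for the `CriticalServers` data, and the server names and IP addresses are made up.

```powershell
# Synthetic snapshots in the shape of the CriticalServers entries (illustrative values)
$previous = @(
    [PSCustomObject]@{ Server = "APPSERVER01";  IP = "10.0.1.20" }
    [PSCustomObject]@{ Server = "FILESERVER01"; IP = "10.0.1.30" }
)
$current = @(
    [PSCustomObject]@{ Server = "APPSERVER01";  IP = "10.0.1.25" }   # IP changed since last test
    [PSCustomObject]@{ Server = "FILESERVER01"; IP = "10.0.1.30" }   # unchanged
    [PSCustomObject]@{ Server = "WEBSERVER01";  IP = "10.0.1.40" }   # new server
)

# Compare-Object returns only the Server/IP pairs that differ between snapshots
$drift = Compare-Object -ReferenceObject $previous -DifferenceObject $current -Property Server, IP |
    ForEach-Object {
        $change = if ($_.SideIndicator -eq "=>") { "Only in current snapshot" } else { "Only in previous snapshot" }
        [PSCustomObject]@{
            Server = $_.Server
            IP     = $_.IP
            Change = $change
        }
    } | Sort-Object Server, Change

$drift | Format-Table -AutoSize
```

A changed IP address shows up as two rows for the same server (old value and new value), while a new server shows up as a single row. With real data, the previous snapshot could be loaded with `Import-Clixml` from an earlier export and its `CriticalServers` entry passed as the reference object.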

Conclusion

Disaster recovery testing on Windows Server 2025 is not a once-a-year checkbox activity. It is a continuous engineering discipline that closes the gap between the DR plan you wrote and what your team can actually execute under pressure. By combining tabletop exercises to surface documentation gaps, isolated Hyper-V VM restores to verify backup integrity, ASR test failovers to validate cloud recovery, and AD restore tests to confirm your most critical infrastructure can be rebuilt, you build a picture of your actual recovery capability rather than your theoretical one. Measure RTO and RPO actuals on every test, publish those numbers to leadership, and use them to drive infrastructure investment decisions, whether that means better replication bandwidth, more frequent backup schedules, or a dedicated DR environment. A DR plan tested quarterly, with lessons captured and acted on, is far more valuable than a perfect document that has never been challenged by reality.