Why Disaster Recovery Testing Matters

A disaster recovery plan that is never tested is likely to fail when you need it most. In Windows Server 2022 environments, backups may fail silently, recovery procedures may have drifted from reality, and key personnel may have changed, and none of this becomes apparent until an actual incident. DR testing validates that your recovery time objectives (RTO) and recovery point objectives (RPO) are achievable, identifies gaps in documentation, trains staff on recovery procedures, and satisfies audit requirements from frameworks such as ISO 27001, SOC 2, and NIST SP 800-34.

A mature DR testing programme moves beyond simple backup checks to include full application validation, network restoration, database consistency verification, and stakeholder communication drills. This article covers the complete DR testing process for Windows Server 2022 environments.

Defining RTO and RPO

Before testing, your organisation must have formally documented RTO and RPO targets for each system. These define what success looks like in a DR test:

Recovery Time Objective (RTO): The maximum acceptable time from the moment of failure to the moment the system is available to users. For example, an RTO of 4 hours means the system must be back online within 4 hours of a disaster. RTO drives infrastructure choices — a 4-hour RTO may be achievable with tape restore, while a 15-minute RTO requires hot standby or cloud failover.

Recovery Point Objective (RPO): The maximum acceptable amount of data loss measured in time. An RPO of 1 hour means you can tolerate losing up to 1 hour of data — meaning backups or replication must run at least every hour. An RPO of zero requires synchronous replication.

Document your RTO/RPO per system in a table stored in your DR runbook. A practical format:

# Example RTO/RPO documentation (stored as part of DR runbook)
System Name          | Tier | RTO    | RPO     | Backup Method
---------------------|------|--------|---------|------------------
SQL-PROD-01          | 1    | 1 hr   | 15 min  | SQL Always On + Azure ASR
FILE-SERVER-01       | 2    | 4 hrs  | 1 hr    | Robocopy + Veeam
DC-01 (Domain Ctrl)  | 1    | 2 hrs  | 30 min  | AD replication + Veeam
WEB-APP-01           | 2    | 4 hrs  | 2 hrs   | Veeam Backup & Replication
MGMT-SERVER-01       | 3    | 8 hrs  | 4 hrs   | wbadmin full backup
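
To make these targets checkable during a test, the same table can also be kept in machine-readable form. The following is a minimal sketch, assuming all targets are expressed in minutes; the function name Test-RecoveryObjective and the property names are hypothetical and chosen only for illustration.

```powershell
# Hypothetical machine-readable copy of part of the RTO/RPO table (values in minutes).
$targets = @(
    [pscustomobject]@{ System = 'SQL-PROD-01';    Tier = 1; RtoMinutes = 60;  RpoMinutes = 15 }
    [pscustomobject]@{ System = 'FILE-SERVER-01'; Tier = 2; RtoMinutes = 240; RpoMinutes = 60 }
    [pscustomobject]@{ System = 'WEB-APP-01';     Tier = 2; RtoMinutes = 240; RpoMinutes = 120 }
)

function Test-RecoveryObjective {
    param(
        [Parameter(Mandatory)] [object[]] $Targets,
        [Parameter(Mandatory)] [string]   $System,
        [Parameter(Mandatory)] [double]   $ActualRtoMinutes,
        [Parameter(Mandatory)] [double]   $ActualRpoMinutes
    )
    # Look up the documented target for the system under test.
    $target = $Targets | Where-Object { $_.System -eq $System } | Select-Object -First 1
    if (-not $target) { throw "No RTO/RPO target documented for '$System'." }

    # A measured value passes when it does not exceed the documented target.
    [pscustomobject]@{
        System  = $System
        RtoPass = $ActualRtoMinutes -le $target.RtoMinutes
        RpoPass = $ActualRpoMinutes -le $target.RpoMinutes
    }
}

# Example: a restore that took 48 minutes with 35 minutes of data loss.
Test-RecoveryObjective -Targets $targets -System 'WEB-APP-01' -ActualRtoMinutes 48 -ActualRpoMinutes 35
```

Keeping the targets as data rather than prose means the same values can feed both the runbook table and any scripted pass/fail check.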

DR Testing Methodologies

There are four main DR testing approaches, ordered by increasing thoroughness and disruption:

Tabletop Exercise: A discussion-based walkthrough where key personnel talk through the DR procedure step by step without actually executing anything. The facilitator presents a disaster scenario and team members describe what they would do at each step. Tabletops are low-risk and can be conducted in 2–4 hours. They are effective for identifying procedural gaps, training new staff, and reviewing updated runbooks. They do not validate that backups are valid or that systems actually recover.

Walkthrough Test: Team members actually review the DR runbook and verify that all prerequisites are in place — backup storage is accessible, recovery media exists, credentials are current, and contact lists are up to date. This is slightly more thorough than a tabletop but still does not involve actually restoring any system.

Simulation Test: A simulated disaster is declared for a subset of systems. Recovery procedures are executed in a non-production environment (isolated network or test lab). Systems are restored from backup and tested for functionality. Production is not interrupted. This is the most practical test type for regular use.

Full Interruption Test: Production systems are actually failed over or taken offline, and full recovery is performed. This is the most thorough test but carries the highest risk of extended downtime if the recovery fails. Full interruption tests should be rare (annually at most) and require executive sign-off, maintenance window scheduling, and rollback plans.

Testing Backup Restores with wbadmin

For Windows Server 2022 systems backed up with the built-in Windows Server Backup (wbadmin), restore testing involves selecting a backup version and restoring individual files or a full volume to a test location.

List available backups on a backup target:

wbadmin get versions -backupTarget:\\backupserver\WS2022Backups

Get details of a specific version (note the VersionIdentifier from the output above):

wbadmin get items -version:05/16/2026-23:00 -backupTarget:\\backupserver\WS2022Backups

Restore a specific folder to an alternate location for validation:

wbadmin start recovery `
    -version:05/16/2026-23:00 `
    -backupTarget:\\backupserver\WS2022Backups `
    -itemType:File `
    -items:C:\Data\CriticalDocs `
    -recursive `
    -recoveryTarget:D:\TestRestore\CriticalDocs `
    -quiet

After restore, validate file integrity by comparing checksums. Files that changed on the source after the backup was taken will legitimately differ, so treat mismatches as items to review rather than as automatic failures:

# Compare SHA256 hashes of original vs restored files
$sourceRoot  = "C:\Data\CriticalDocs"
$restoreRoot = "D:\TestRestore\CriticalDocs"

foreach ($src in Get-ChildItem $sourceRoot -Recurse -File) {
    # Path of the file relative to the source root
    $relPath = $src.FullName.Substring($sourceRoot.Length).TrimStart('\')
    $rst = Join-Path $restoreRoot $relPath
    if (Test-Path -LiteralPath $rst) {
        $srcHash = (Get-FileHash -LiteralPath $src.FullName -Algorithm SHA256).Hash
        $rstHash = (Get-FileHash -LiteralPath $rst -Algorithm SHA256).Hash
        if ($srcHash -ne $rstHash) {
            Write-Warning "HASH MISMATCH: $relPath"
        }
    } else {
        Write-Warning "MISSING in restore: $relPath"
    }
}
Write-Host "File integrity check complete."

VM Failover Testing with Hyper-V Replica

If you use Hyper-V Replica to replicate Windows Server 2022 VMs to a secondary Hyper-V host, a test failover creates a temporary copy of the replicated VM using a separate virtual switch (so it does not conflict with the production VM). This is non-disruptive — the primary VM keeps running and replication continues.

Initiate a test failover in PowerShell on the replica host:

# On the Replica Hyper-V host
Start-VMFailover -VMName "WEB-APP-01" -AsTest

# Start the test VM
Start-VM -Name "WEB-APP-01 - Test"

Connect to the test VM console and validate:

vmconnect.exe localhost "WEB-APP-01 - Test"

After validation, remove the test failover VM:

Stop-VMFailover -VMName "WEB-APP-01"

Check replication health after the test to confirm replication resumed normally:

Get-VMReplication -VMName "WEB-APP-01" |
    Select-Object VMName, State, Health, LastReplicationTime, FrequencySec
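
The health check above can also be turned into a pass/fail result for the test report. This is a minimal sketch, assuming the replication object exposes the Health and LastReplicationTime properties used above and that the VM's RPO is known in minutes; the function name Test-ReplicaHealth is hypothetical.

```powershell
function Test-ReplicaHealth {
    param(
        [Parameter(Mandatory)] $Replication,        # object returned by Get-VMReplication
        [Parameter(Mandatory)] [int] $RpoMinutes,   # documented RPO for this VM
        [datetime] $Now = (Get-Date)
    )
    # Age of the most recent replica, compared against the RPO target.
    $ageMinutes = ($Now - $Replication.LastReplicationTime).TotalMinutes

    [pscustomobject]@{
        VMName     = $Replication.VMName
        HealthOk   = "$($Replication.Health)" -eq 'Normal'
        AgeMinutes = [math]::Round($ageMinutes, 1)
        WithinRpo  = $ageMinutes -le $RpoMinutes
    }
}

# Usage on the replica host (requires the Hyper-V PowerShell module):
# Get-VMReplication -VMName "WEB-APP-01" |
#     ForEach-Object { Test-ReplicaHealth -Replication $_ -RpoMinutes 120 }
```

Passing the current time as a parameter keeps the function easy to exercise with sample data before running it against a live host.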

Azure Site Recovery Test Failover

For VMs replicated to Azure via Azure Site Recovery, test failovers can be triggered from the Azure portal or via PowerShell. A test failover boots the VM in an isolated Azure VNet without affecting the protected item or production replication:

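# Assumes the vault context is already set (Set-AzRecoveryServicesAsrVaultContext) and that
# $protectionContainer was retrieved beforehand with Get-AzRecoveryServicesAsrProtectionContainer
$rpi = Get-AzRecoveryServicesAsrReplicationProtectedItem `
    -Name "RP-WS2022-AppServer01" `
    -ProtectionContainer $protectionContainer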

# Get the latest application-consistent recovery point
$recoveryPoint = Get-AzRecoveryServicesAsrRecoveryPoint -ReplicationProtectedItem $rpi |
    Where-Object { $_.RecoveryPointType -eq "AppConsistent" } |
    Sort-Object RecoveryPointTime -Descending |
    Select-Object -First 1

# Trigger test failover to isolated VNet
$testJob = Start-AzRecoveryServicesAsrTestFailoverJob `
    -ReplicationProtectedItem $rpi `
    -Direction PrimaryToRecovery `
    -AzureVMNetworkId "/subscriptions/<subscription-id>/resourceGroups/rg-test/providers/Microsoft.Network/virtualNetworks/vnet-isolated" `
    -RecoveryPoint $recoveryPoint

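# Wait for completion: poll until the job leaves the NotStarted/InProgress states
do {
    Start-Sleep -Seconds 30
    $testJob = Get-AzRecoveryServicesAsrJob -Job $testJob
} while ($testJob.State -in "InProgress", "NotStarted")
$testJob.State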

Once the VM is online in Azure, validate by connecting to it via Azure Bastion or RDP, checking that services are running, and running application-specific checks (e.g., querying a SQL Server database, verifying a web app responds on port 80/443).

Clean up after the test:

Start-AzRecoveryServicesAsrTestFailoverCleanupJob `
    -ReplicationProtectedItem $rpi `
    -Comment "DR test 2026-05-17: all checks passed"

Validating Application Functionality Post-Restore

Restoring a VM or file does not guarantee the application is working correctly. Post-restore validation should be scripted wherever possible to ensure consistent, repeatable checks. For a Windows Server 2022 IIS web application:

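# Check IIS service is running on the restored server
# (return the status as a string so the comparison is not affected by remoting serialization)
$iisStatus = Invoke-Command -ComputerName "WEB-APP-01-TEST" -ScriptBlock {
    (Get-Service W3SVC).Status.ToString()
}

if ($iisStatus -ne "Running") {
    Write-Error "IIS not running on restored server (status: $iisStatus)"
}

# Check HTTP response; Invoke-WebRequest throws on connection failures and HTTP error codes
try {
    $response = Invoke-WebRequest -Uri "http://WEB-APP-01-TEST" -UseBasicParsing -TimeoutSec 15
    if ($response.StatusCode -eq 200) {
        Write-Host "HTTP check passed: 200 OK"
    } else {
        Write-Error "HTTP check failed: $($response.StatusCode)"
    }
} catch {
    Write-Error "HTTP check failed: $($_.Exception.Message)"
}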

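# Verify SQL Server connectivity from the web app server
# (in an isolated test network, SQL-PROD-01 must resolve to the restored SQL instance, not production)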
$sqlCheck = Invoke-Command -ComputerName "WEB-APP-01-TEST" -ScriptBlock {
    $conn = New-Object System.Data.SqlClient.SqlConnection
    $conn.ConnectionString = "Server=SQL-PROD-01;Database=AppDB;Integrated Security=True;Connect Timeout=5"
    try { $conn.Open(); $conn.State } catch { "FAILED: $_" } finally { $conn.Close() }
}
Write-Host "SQL connectivity: $sqlCheck"

Documenting Test Results and Gaps

Every DR test must produce a written record. Create a test report that includes: date and time of test, systems tested, test type (tabletop/walkthrough/simulation/full interruption), who participated, the simulated failure scenario, what passed, what failed, actual RTO/RPO achieved versus target, and corrective actions identified. Store this report with your DR runbook and track corrective actions in your change management system.

Example summary structure:

# DR Test Report - 2026-05-17
Test Type:        Simulation (isolated environment)
Systems Tested:   WEB-APP-01, SQL-PROD-01
Participants:     SysAdmin1, DBAdmin1, AppOwner1

Results:
  WEB-APP-01 restore from ASR test failover: PASSED
    Actual RTO: 48 minutes  (Target: 4 hrs)  PASS
    Actual RPO: 35 minutes  (Target: 2 hrs)  PASS
    HTTP health check: PASS
    SQL connectivity: PASS

  SQL-PROD-01 backup restore (wbadmin): FAILED
    Actual RTO: 3h 12min    (Target: 1 hr)   FAIL
    Actual RPO: 18 minutes  (Target: 15 min) FAIL (marginally)

Gaps Identified:
  1. SQL restore took 3+ hrs - investigate Veeam instant recovery
  2. wbadmin backup had a failed run on 2026-05-15 - investigate
  3. DR runbook recovery steps for SQL outdated - update required

Corrective Actions:
  Action 1: Evaluate Veeam Instant VM Recovery for SQL-PROD-01 (Owner: SysAdmin1, Due: 2026-06-01)
  Action 2: Fix wbadmin backup failure on SQL-PROD-01 (Owner: SysAdmin1, Due: 2026-05-20)
  Action 3: Update SQL recovery runbook section (Owner: DBAdmin1, Due: 2026-05-24)
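
The "Actual RTO" figures in a report like this are easier to trust when each recovery step is timed by script rather than by hand. The following is a minimal sketch, assuming each step can be wrapped in a script block; the function name Measure-RecoveryStep and the placeholder steps are hypothetical.

```powershell
function Measure-RecoveryStep {
    param(
        [Parameter(Mandatory)] [string]      $Name,
        [Parameter(Mandatory)] [scriptblock] $Step
    )
    $timer  = [System.Diagnostics.Stopwatch]::StartNew()
    $status = 'PASS'
    try {
        # Only terminating errors (throw, -ErrorAction Stop) mark the step as failed.
        & $Step | Out-Null
    } catch {
        $status = "FAIL: $($_.Exception.Message)"
    }
    $timer.Stop()

    [pscustomobject]@{
        Step    = $Name
        Status  = $status
        Minutes = [math]::Round($timer.Elapsed.TotalMinutes, 2)
    }
}

# Example with placeholder steps; replace the script blocks with real recovery commands.
$steps = @(
    Measure-RecoveryStep -Name 'Restore'  -Step { Start-Sleep -Milliseconds 200 }
    Measure-RecoveryStep -Name 'Validate' -Step { throw 'health check failed' }
)
$steps | Format-Table -AutoSize
"Actual RTO (minutes): {0}" -f ($steps | Measure-Object Minutes -Sum).Sum
```

Summing the per-step durations gives the elapsed recovery time to compare against the target, and the per-step rows show where the time went.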

Updating the DR Runbook

A DR runbook is only useful if it reflects current reality. After every test, review the runbook and update any steps that were inaccurate, missing, or outdated. Specifically verify: server names and IP addresses, credential locations (e.g., break-glass accounts in a password vault), backup locations, network topology diagrams, third-party vendor contact numbers, and escalation paths.

Store the runbook in at least two locations — one on-premises (printed copy or local share) and one in a cloud repository (SharePoint, Azure DevOps, or a secure cloud storage location) so it is accessible even if the on-premises environment is unavailable. Version-control the runbook to maintain history of changes.

DR Testing Frequency Best Practices

Industry guidance and compliance frameworks generally recommend the following DR testing schedule:

Monthly: Automated backup validation — verify that the last backup completed successfully, the backup file is restorable (test a file restore), and backup logs show no errors. This can be fully automated with a PowerShell script that runs after each backup job.

Quarterly: Simulation test for Tier 1 systems. Restore VMs in an isolated environment and validate application functionality against your checklist. Document results.

Annually: Tabletop exercise including business stakeholders for all critical systems. Consider a full interruption test for one Tier 1 system per year during a scheduled maintenance window.

On major change: After any significant infrastructure change (new server, DR site change, backup tool upgrade, network reconfiguration), run a simulation test for affected systems before the change goes into production.
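
The monthly backup validation described above can be partly automated. This is a minimal sketch, assuming Windows Server Backup is the backup tool and that the summary object from Get-WBSummary exposes LastBackupResultHR and LastSuccessfulBackupTime; the function name Test-BackupSummary and the 24-hour threshold are assumptions for illustration. It checks only completion status and backup age, so a test file restore is still needed to prove restorability.

```powershell
function Test-BackupSummary {
    param(
        [Parameter(Mandatory)] $Summary,     # object returned by Get-WBSummary
        [int] $MaxAgeHours = 24,             # assumed threshold; align with the system's RPO
        [datetime] $Now = (Get-Date)
    )
    $problems = @()

    # A non-zero HRESULT means the most recent backup run did not complete cleanly.
    if ($Summary.LastBackupResultHR -ne 0) {
        $problems += "Last backup returned HRESULT $($Summary.LastBackupResultHR)"
    }

    if (-not $Summary.LastSuccessfulBackupTime) {
        $problems += 'No successful backup recorded'
    } else {
        $age = $Now - $Summary.LastSuccessfulBackupTime
        if ($age.TotalHours -gt $MaxAgeHours) {
            $problems += "Last successful backup is $([int]$age.TotalHours) hours old"
        }
    }

    [pscustomobject]@{ Passed = ($problems.Count -eq 0); Problems = $problems }
}

# Usage on the backed-up server (requires the Windows Server Backup feature):
# $result = Test-BackupSummary -Summary (Get-WBSummary)
# if (-not $result.Passed) { $result.Problems | ForEach-Object { Write-Warning $_ } }
```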

Notifying Stakeholders

DR tests should follow a defined communication process. Before the test: notify the change management board, inform application owners, and schedule the test during a low-impact window. During the test: maintain a live status log in a shared location (Teams channel, shared document) so all participants can see progress. After the test: send a brief executive summary covering what was tested, pass/fail results, and corrective actions. For compliance audits, retain all DR test records for a minimum of three years, or as required by your applicable framework.

How to Perform Disaster Recovery Testing on Windows Server 2022