What Is Cluster-Aware Updating and Why It Matters

Cluster-Aware Updating (CAU) is a Windows Server feature that automates the process of applying software updates to failover cluster nodes with zero planned downtime. Without CAU, patching a cluster requires a manual, error-prone procedure: drain one node, patch it, reboot, wait for it to rejoin, verify cluster health, then move to the next node. CAU automates every step of this cycle, including pausing nodes, live-migrating workloads, rebooting, validating cluster health, and resuming normal operations before moving to the subsequent node.
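
To make the manual cycle concrete, here is a rough sketch of what an administrator would otherwise script for each node. It uses the FailoverClusters cmdlets Suspend-ClusterNode, Resume-ClusterNode, and Get-ClusterNode; the function name Update-NodeManually and the $InstallUpdates script block are hypothetical placeholders, and the actual patch installation is left to the caller.

```powershell
# Hypothetical helper: patch one cluster node by hand (the cycle CAU automates).
function Update-NodeManually {
    param(
        [Parameter(Mandatory)] [string]      $ClusterName,
        [Parameter(Mandatory)] [string]      $NodeName,
        # Caller-supplied script block that installs updates on the node (assumption).
        [Parameter(Mandatory)] [scriptblock] $InstallUpdates
    )

    # 1. Pause the node and move its roles to other nodes.
    Suspend-ClusterNode -Cluster $ClusterName -Name $NodeName -Drain -Wait

    # 2. Install updates, then reboot and wait until PowerShell remoting answers again.
    & $InstallUpdates $NodeName
    Restart-Computer -ComputerName $NodeName -Wait -For PowerShell -Force

    # 3. Resume the node and fail roles back to it.
    Resume-ClusterNode -Cluster $ClusterName -Name $NodeName -Failback Immediate

    # 4. Confirm the node is Up before moving on to the next one.
    $state = (Get-ClusterNode -Cluster $ClusterName -Name $NodeName).State
    if ($state -ne 'Up') { throw "Node $NodeName is '$state', not 'Up'; stopping." }
}
```

Repeating this for every node, with health checks in between, is the work CAU takes over.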

CAU is built into Windows Server 2022 and integrates directly with Windows Update and Windows Server Update Services (WSUS). It supports the common cluster workloads: Hyper-V clusters, Storage Spaces Direct (S2D) hyper-converged clusters, SQL Server Failover Cluster Instances (FCI), Scale-Out File Server (SOFS), and generic service clusters. CAU is managed via the Cluster-Aware Updating UI (reachable from Failover Cluster Manager), PowerShell, or the CAU clustered role, which lets the cluster host its own Updating Run Coordinator.

CAU Operating Modes: Self-Updating vs Remote-Updating

CAU operates in two distinct modes, each suited to different operational preferences.

Self-Updating Mode — A CAU Clustered Role is added to the cluster itself. The cluster runs its own Updating Run Coordinator as a highly available role. Patching runs are triggered automatically on a schedule (weekly, monthly, etc.) without requiring an external machine. This mode is ideal for unattended, lights-out operations.

Remote-Updating Mode — An administrator triggers Updating Runs from a remote machine (or the CAU UI on their workstation). The remote machine acts as the Updating Run Coordinator. This mode is preferred when you want manual approval before each patch cycle or when integrating CAU into an existing change management workflow.

You can switch between modes at any time. The most common production pattern is to use Self-Updating Mode with a monthly schedule but to disable the automatic trigger and invoke runs manually after reviewing patch content in WSUS — giving you schedule-awareness without fully automated application.
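
One way to realize this pattern is to keep the clustered role configured but suspend its scheduler, then start runs by hand. A minimal sketch, assuming the CAU clustered role already exists and reusing the SQLCLUSTER01 cluster name from the examples in this article; Disable-CauClusterRole and Enable-CauClusterRole are the ClusterAwareUpdating cmdlets that suspend and resume self-updating.

```powershell
# Assumption: the CAU clustered role has already been added to SQLCLUSTER01.
$cluster = "SQLCLUSTER01"

# Suspend the self-updating schedule; the role and its settings stay in place.
Disable-CauClusterRole -ClusterName $cluster -Force

# ...review and approve updates in WSUS, then start a run by hand...
Invoke-CauRun -ClusterName $cluster -CauPluginName "Microsoft.WindowsUpdatePlugin" -Force

# Re-enable the schedule if fully automatic runs are wanted again.
Enable-CauClusterRole -ClusterName $cluster -Force
```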

Installing the CAU Feature and Tools

CAU tools are included in the Failover Clustering feature set. Ensure the required components are present on your management workstation and cluster nodes:

# Install on each cluster node
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools

# Install CAU tools on a remote management server (RSAT; Install-WindowsFeature requires Windows Server)
Install-WindowsFeature -Name RSAT-Clustering-Mgmt, RSAT-Clustering-PowerShell

Verify the CAU PowerShell module is available:

Get-Command -Module ClusterAwareUpdating | Select-Object Name

This should return commands including Add-CauClusterRole, Invoke-CauRun, Get-CauReport, Set-CauClusterRole, and Test-CauSetup.

Configuring Self-Updating Mode with Add-CauClusterRole

To enable Self-Updating Mode, add the CAU Clustered Role to your cluster. This creates a clustered role that owns the Updating Run Coordinator function and runs on a schedule:

Add-CauClusterRole -ClusterName "SQLCLUSTER01" `
    -EnableFirewallRules `
    -Force `
    -CauPluginName "Microsoft.WindowsUpdatePlugin" `
    -MaxFailedNodes 1 `
    -MaxRetriesPerNode 2 `
    -RequireAllNodesOnline `
    -WarnAfter (New-TimeSpan -Hours 2) `
    -StopAfter (New-TimeSpan -Hours 6) `
    -RebootTimeoutMinutes 15 `
    -DaysOfWeek Sunday `
    -IntervalWeeks 4 `
    -StartDate ([datetime]::Today.AddHours(2))  # earliest start: today at 02:00

Breaking down key parameters:

MaxFailedNodes: Maximum number of nodes on which updating may fail before CAU stops the run; one more failure beyond this number aborts the run. On a 2-node cluster keep it at 0 or 1, because a second failure would leave no healthy node. Use a small number for larger clusters.

RequireAllNodesOnline: Prevents the run from starting if any node is offline. Recommended to avoid updating an already-degraded cluster.

StopAfter: Hard ceiling on run duration. If patching is still in progress after 6 hours, CAU stops the run. This prevents indefinitely stalled updates from blocking cluster operations.

RebootTimeoutMinutes: How long CAU waits for a node to come back online after a reboot before declaring it failed.

Verify the CAU role was created successfully:

Get-CauClusterRole -ClusterName "SQLCLUSTER01" | Format-List

Triggering a Manual CAU Run with Invoke-CauRun

Even with Self-Updating Mode configured, you can trigger an immediate Updating Run manually using Invoke-CauRun. This is useful after a critical security patch is released outside the normal schedule:

# Perform a WhatIf dry run first to see what would happen
Invoke-CauRun -ClusterName "SQLCLUSTER01" `
    -CauPluginName "Microsoft.WindowsUpdatePlugin" `
    -MaxFailedNodes 1 `
    -MaxRetriesPerNode 2 `
    -RequireAllNodesOnline `
    -RebootTimeoutMinutes 15 `
    -WhatIf

Review the WhatIf output, then execute the real run:

Invoke-CauRun -ClusterName "SQLCLUSTER01" `
    -CauPluginName "Microsoft.WindowsUpdatePlugin" `
    -MaxFailedNodes 1 `
    -MaxRetriesPerNode 2 `
    -RequireAllNodesOnline `
    -RebootTimeoutMinutes 15 `
    -Force

Invoke-CauRun runs synchronously, reporting progress to the console until the run completes. To check on a run from another session, use Get-CauRun.
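
Two companion cmdlets are worth knowing alongside Invoke-CauRun. Invoke-CauScan lists the updates a run would initially apply to each node without installing anything, and Get-CauRun reports on a run that is in progress. A brief sketch, again assuming the SQLCLUSTER01 cluster:

```powershell
# Preview: list applicable updates per node without installing them.
Invoke-CauScan -ClusterName "SQLCLUSTER01" `
    -CauPluginName "Microsoft.WindowsUpdatePlugin" |
    Format-List

# From a second session: show the state of the current Updating Run, if any.
Get-CauRun -ClusterName "SQLCLUSTER01"

# Or block until the run in progress has finished.
Get-CauRun -ClusterName "SQLCLUSTER01" -WaitForCompletion
```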

Pointing CAU at WSUS Instead of Windows Update

In air-gapped or controlled enterprise environments, you likely use WSUS or SCCM/MECM rather than public Windows Update. CAU integrates with WSUS via Group Policy — if the cluster nodes are already pointed at WSUS through GPO, the Microsoft.WindowsUpdatePlugin will use that WSUS server automatically. No additional CAU configuration is needed.

Verify WSUS configuration on a node:

$wuReg = "HKLM:\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate"
Get-ItemProperty -Path $wuReg | Select-Object WUServer, WUStatusServer
# UseWUServer is stored under the AU subkey
Get-ItemProperty -Path "$wuReg\AU" | Select-Object UseWUServer

For more granular control over exactly which packages are installed, use the Microsoft.HotfixPlugin instead. This plugin installs updates from a root folder on an SMB file share rather than from WSUS, and uses a hotfix configuration file that defines how each package type is installed:

Invoke-CauRun -ClusterName "SQLCLUSTER01" `
    -CauPluginName "Microsoft.HotfixPlugin" `
    -CauPluginArguments @{HotfixRootFolderPath = "\\fileserver\CauPatches\2026-05"} `
    -MaxFailedNodes 1 `
    -Force
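
If the aim is only to narrow which updates the Windows Update Agent selects, the Microsoft.WindowsUpdatePlugin also accepts a QueryString plugin argument. The sketch below previews the effect with Invoke-CauScan rather than installing anything. The query shown is, to the best of my knowledge, the plugin's default and is used here purely for illustration; any extra criteria you append are your own assumption and should be checked against the update source in use.

```powershell
# Preview only: Invoke-CauScan reports matching updates and installs nothing.
# Extend the query with further Windows Update Agent search criteria to
# narrow the selection before using the same arguments with Invoke-CauRun.
Invoke-CauScan -ClusterName "SQLCLUSTER01" `
    -CauPluginName "Microsoft.WindowsUpdatePlugin" `
    -CauPluginArguments @{ QueryString = "IsInstalled=0 and Type='Software' and IsHidden=0 and IsAssigned=1" }
```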

Pre-Update and Post-Update Scripts

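CAU supports pre-update and post-update scripts. The pre-update script runs on each node before that node is placed in maintenance mode and updated; the post-update script runs on each node after its updates are installed and it has left maintenance mode. If the pre-update script fails on a node, that node is not updated. These scripts are useful for sending notifications, taking VSS snapshots, or pausing monitoring alerts to avoid false positives during the patch window.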

Set-CauClusterRole -ClusterName "SQLCLUSTER01" `
    -PreUpdateScript "\\dc01\scripts\cau-preupdate.ps1" `
    -PostUpdateScript "\\dc01\scripts\cau-postupdate.ps1" `
    -Force

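Example pre-update script that sends a Slack/Teams webhook notification and marks where a monitoring maintenance window could be opened:

# cau-preupdate.ps1
# Defaults are resolved locally in case the script is invoked without arguments.
param(
    [string]$ClusterName   = (Get-Cluster).Name,
    [string]$UpdateRunName = "CAU run"
)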

$webhookUrl = "https://hooks.example.com/services/YOUR/WEBHOOK/HERE"
$body = @{
    text = "CAU Updating Run starting on cluster $ClusterName at $(Get-Date -Format 'yyyy-MM-dd HH:mm'). Maintenance window active."
} | ConvertTo-Json

Invoke-RestMethod -Uri $webhookUrl -Method Post -Body $body -ContentType "application/json"

# Disable monitoring for cluster nodes during patching
# Set-MonitoringMaintenance -Hosts (Get-ClusterNode -Cluster $ClusterName).Name -Duration 120
Write-Host "Pre-update script completed."
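
A matching post-update script could look like the sketch below. It is an illustration rather than a reference implementation: the helper name New-CauNotificationBody is hypothetical, the webhook URL is the same placeholder as in the pre-update script, and the notification is wrapped in try/catch on the assumption that a failed webhook call should not fail the post-update step.

```powershell
# cau-postupdate.ps1 (hypothetical companion to cau-preupdate.ps1)

# Build the JSON payload; kept separate so it can be checked without sending.
function New-CauNotificationBody {
    param([string]$NodeName, [string]$Phase)
    $stamp = Get-Date -Format 'yyyy-MM-dd HH:mm'
    @{ text = "CAU $Phase on $NodeName at $stamp." } | ConvertTo-Json
}

$webhookUrl = "https://hooks.example.com/services/YOUR/WEBHOOK/HERE"  # placeholder
$body = New-CauNotificationBody -NodeName $env:COMPUTERNAME -Phase "post-update script finished"

try {
    Invoke-RestMethod -Uri $webhookUrl -Method Post -Body $body `
        -ContentType "application/json" -TimeoutSec 15 | Out-Null
}
catch {
    # A failed notification should not fail the post-update step.
    Write-Warning "Notification failed: $($_.Exception.Message)"
}
```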

CAU with Storage Spaces Direct (S2D) Hyper-Converged Clusters

Storage Spaces Direct clusters require extra care during patching because each node contributes both compute and storage capacity. Draining a node for patching removes storage capacity from the pool, which can trigger S2D to begin repair operations. CAU handles this correctly by pausing and draining the node — S2D storage remains accessible through the other nodes — but you should ensure your S2D cluster is healthy and has sufficient resiliency before running CAU.

Check S2D health before triggering CAU:

# Verify all virtual disks are healthy
Get-VirtualDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus

# Verify storage pool health
Get-StoragePool | Where-Object IsPrimordial -eq $false | Select-Object FriendlyName, HealthStatus

# Check that no repair jobs are already running
Get-StorageJob
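
These checks can be folded into a simple gate that refuses to start a run when storage is not ready. The following is a minimal sketch under stated assumptions: the function name Test-StorageReadyForCau is hypothetical, and it expects objects shaped like the output of Get-VirtualDisk (HealthStatus, FriendlyName) and Get-StorageJob (JobState, Name).

```powershell
# Hypothetical pre-flight gate: returns $true only when storage looks ready.
function Test-StorageReadyForCau {
    param(
        [object[]]$VirtualDisks = @(),   # e.g. output of Get-VirtualDisk
        [object[]]$StorageJobs  = @()    # e.g. output of Get-StorageJob
    )

    # Any virtual disk that does not report Healthy blocks the run.
    $unhealthy = @($VirtualDisks | Where-Object { $null -ne $_ -and "$($_.HealthStatus)" -ne 'Healthy' })
    # Any storage job that is still running blocks the run.
    $running   = @($StorageJobs  | Where-Object { $null -ne $_ -and "$($_.JobState)" -eq 'Running' })

    foreach ($d in $unhealthy) { Write-Warning "Virtual disk '$($d.FriendlyName)' is $($d.HealthStatus)." }
    foreach ($j in $running)   { Write-Warning "Storage job '$($j.Name)' is still running." }

    return ($unhealthy.Count -eq 0 -and $running.Count -eq 0)
}

# Usage on a cluster node (commented out so the function can be loaded anywhere):
# if (-not (Test-StorageReadyForCau -VirtualDisks (Get-VirtualDisk) -StorageJobs (Get-StorageJob))) {
#     throw "Storage is not healthy; postponing the CAU run."
# }
```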

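On S2D clusters, CAU also takes the node’s storage contribution into account. When CAU drains a node, it pauses the node with the drain option (the equivalent of Suspend-ClusterNode -Drain) and the node’s drives go into storage maintenance mode. While the node is down, volumes stay online from the copies held on the remaining nodes and S2D tracks the changes. When the node returns, those changes are resynchronized, and the next node should not be drained until the resulting storage jobs have finished. What protects the data if a node fails to come back after the reboot is therefore the resiliency on the other nodes, not a copy made at drain time.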

For S2D clusters, increase the RebootTimeoutMinutes significantly because after rebooting, the node must resync with the storage pool before being considered healthy. A value of 30–60 minutes is appropriate for large S2D deployments:

Set-CauClusterRole -ClusterName "S2DCLUSTER01" `
    -RebootTimeoutMinutes 45 `
    -Force

Reviewing CAU Run History with Get-CauReport

CAU maintains a run history that you can query with Get-CauReport. This is essential for compliance reporting, post-patch verification, and troubleshooting failed runs:

# List summaries of all past runs
Get-CauReport -ClusterName "SQLCLUSTER01" | Format-Table -AutoSize

# Get the report for the most recent run (-Last is a switch and takes no value)
$latestRun = Get-CauReport -ClusterName "SQLCLUSTER01" -Last
$latestRun | Format-List

# Get node-level detail from the last run
Get-CauReport -ClusterName "SQLCLUSTER01" -Last -Detailed
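
For compliance reporting, the detailed report can also be written to a file with Export-CauReport, which supports HTML and CSV output. A short sketch; the output folder and file name are placeholders and the folder is assumed to exist:

```powershell
# Assumption: C:\Reports exists; the path and file name are placeholders.
$report = Get-CauReport -ClusterName "SQLCLUSTER01" -Last -Detailed
$report | Export-CauReport -Format Html -Path "C:\Reports\SQLCLUSTER01-cau-latest.html" -Force
```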

CAU also writes events to the event log. Check the CAU channel:

Get-WinEvent -LogName "Microsoft-Windows-ClusterAwareUpdating/Admin" |
    Where-Object { $_.TimeCreated -gt (Get-Date).AddDays(-7) } |
    Select-Object TimeCreated, Id, LevelDisplayName, Message |
    Format-List

Validating CAU Readiness with Test-CauSetup

Before your first CAU run, or after cluster topology changes, run Test-CauSetup to validate that CAU can function correctly. This cmdlet checks firewall rules, WMI accessibility, cluster health, and CAU plugin availability:

Test-CauSetup -ClusterName "SQLCLUSTER01" -Verbose

Address any warnings or errors returned before proceeding. Common issues include Windows Firewall blocking the required ports (TCP 135, 445, dynamic RPC) and the Remote Registry service being disabled on cluster nodes.

CAU for SQL Server FCI and Hyper-V Clusters

For SQL Server Failover Cluster Instances, CAU drains a node by failing over the SQL Server role to another node before patching. Clients keep connecting through the same FCI virtual network name, but the failover, while fast (typically under 30 seconds), does briefly interrupt active connections, so FCI patching should be coordinated with SQL maintenance windows.

For Hyper-V clusters, draining a node live-migrates its running VMs to other nodes before the node is patched. Verify live migration is configured and functional before relying on CAU:

# Verify live migration settings on all Hyper-V hosts in cluster
$clusterNodes = (Get-ClusterNode -Cluster "HVCLUSTER01").Name
foreach ($node in $clusterNodes) {
    $vmHost = Get-VMHost -ComputerName $node
    # Parentheses are required so the two strings are concatenated before Write-Host receives them
    Write-Host ("$node - LiveMigration: $($vmHost.VirtualMachineMigrationEnabled), " +
                "MaxConcurrent: $($vmHost.MaximumVirtualMachineMigrations)")
}

With CAU properly configured, Windows Server 2022 failover clusters can apply monthly cumulative updates, .NET patches, and Windows Defender definition updates with zero planned downtime and full audit trails, transforming cluster maintenance from a high-risk weekend operation into a routine automated process.

Summary

Cluster-Aware Updating on Windows Server 2022 provides a robust, built-in solution for zero-downtime cluster patching. By choosing the appropriate mode (Self-Updating for automation, Remote-Updating for manual control), configuring sensible safety parameters like MaxFailedNodes and RebootTimeoutMinutes, leveraging pre/post scripts for notifications, and validating health before runs — especially on S2D clusters — you can make cluster patching a low-risk, repeatable operation that meets both security requirements and uptime SLAs.