How to Configure Failover Clustering on Windows Server 2022

Windows Server Failover Clustering (WSFC) provides high availability for server roles and applications by grouping multiple servers (nodes) into a cluster. If one node fails, the cluster automatically moves the workload (a clustered role) to another healthy node, a process called failover. Windows Server 2022 builds on failover clustering improvements from recent releases, including cluster sets, Azure integration, improved cluster upgrade processes, and enhanced diagnostics. Failover Clustering is the foundation for SQL Server Always On Availability Groups, Hyper-V high availability, Scale-Out File Servers, and Storage Spaces Direct.

Prerequisites

Before building a failover cluster, verify that your environment meets these requirements. All nodes must run the same edition and version of Windows Server. Nodes should be in the same Active Directory domain (workgroup clusters are supported but have limitations). For a traditional cluster with shared storage, nodes must be connected to the same shared storage (iSCSI, Fibre Channel, or SAS). Each node needs at least two network adapters: one for management/client traffic, one for cluster heartbeat and storage traffic (separate networks prevent a single NIC failure from triggering false failovers).

Hardware should be on the Windows Server Catalog. All nodes should be identical in hardware configuration for predictable failover behavior. For the cluster IP address and cluster name, ensure your DNS server will accept dynamic DNS registration or create the DNS record manually in advance.
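
The "same edition and version" requirement can be checked before validation. The following is a minimal sketch, not an official tool: Test-NodeOsConsistency is a hypothetical helper name, and the sample records are made-up values standing in for data you would collect from each node.

```powershell
# Hypothetical helper (not part of the FailoverClusters module): reports whether
# every node record shares the same OS caption (edition) and version.
function Test-NodeOsConsistency {
    param(
        [Parameter(Mandatory)]
        [object[]]$NodeInfo   # objects with Node, Caption, and Version properties
    )

    # Group by edition + version; a single group means all nodes match
    $groups = @($NodeInfo | Group-Object -Property Caption, Version)

    [pscustomobject]@{
        Consistent = ($groups.Count -eq 1)
        Variants   = @($groups | ForEach-Object { $_.Name })
    }
}

# Sample data (hypothetical values); node03 runs a different edition
$sample = @(
    [pscustomobject]@{ Node = 'node01'; Caption = 'Microsoft Windows Server 2022 Datacenter'; Version = '10.0.20348' }
    [pscustomobject]@{ Node = 'node02'; Caption = 'Microsoft Windows Server 2022 Datacenter'; Version = '10.0.20348' }
    [pscustomobject]@{ Node = 'node03'; Caption = 'Microsoft Windows Server 2022 Standard';   Version = '10.0.20348' }
)
$result = Test-NodeOsConsistency -NodeInfo $sample
$result.Consistent   # False, because two distinct edition/version combinations exist

# To gather real records, something like this could be run from a management machine:
# Invoke-Command -ComputerName "node01","node02","node03" -ScriptBlock {
#     Get-CimInstance Win32_OperatingSystem |
#         Select-Object @{ n = 'Node'; e = { $env:COMPUTERNAME } }, Caption, Version
# }
```

Test-Cluster performs a far more thorough comparison; this check is only a quick early warning.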

Installing the Failover Clustering Feature

Install the Failover Clustering feature on all nodes that will be cluster members. You can install from a management machine targeting remote nodes, or run locally on each node:

# Install on all nodes simultaneously using Invoke-Command
$nodes = @("node01","node02","node03")
Invoke-Command -ComputerName $nodes -ScriptBlock {
    Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools -Restart
}

# Alternatively, install locally on each node
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools -Restart

# After reboot, verify installation
Get-WindowsFeature -Name Failover-Clustering | Select DisplayName, InstallState, FeatureType

If you are building a Hyper-V cluster, also install the Hyper-V role on all nodes before cluster creation:

Invoke-Command -ComputerName $nodes -ScriptBlock {
    Install-WindowsFeature -Name Hyper-V, Hyper-V-PowerShell, RSAT-Hyper-V-Tools -Restart
}

Running Cluster Validation (Test-Cluster)

Cluster validation is mandatory before creating a production cluster. The Test-Cluster cmdlet runs a comprehensive suite of tests that verify hardware compatibility, network configuration, storage connectivity, and Active Directory prerequisites. Microsoft Support will not assist with clusters that have not passed validation. Run Test-Cluster from one of the nodes or a management machine:

# Full validation — tests all categories
Test-Cluster -Node "node01","node02","node03" -ReportName "ClusterValidation_$(Get-Date -Format yyyyMMdd)"

# Run validation with specific test categories
Test-Cluster -Node "node01","node02","node03" -Include "Inventory","Network","System Configuration","Storage" -ReportName "ClusterValidation_Storage"

# For S2D clusters — include S2D-specific tests and exclude legacy storage tests
Test-Cluster -Node "node01","node02","node03" -Include "Storage Spaces Direct","Inventory","Network","System Configuration" -ReportName "ClusterValidation_S2D"

Test-Cluster generates an HTML report and returns a file object whose path shows where the report was saved. Open it and review every item. Warnings in the Network category about node connectivity or adapter teaming configuration must be understood and accepted. Failures in the Storage or System Configuration categories typically must be resolved before creating the cluster.

Creating the Failover Cluster

After validation passes, create the cluster with New-Cluster. The cluster name becomes a computer object in Active Directory (the Cluster Name Object, or CNO). Ensure the account creating the cluster has permissions to create computer objects in the target AD OU:

# Create a cluster with a static management IP
New-Cluster -Name "PROD-CLUSTER01" -Node "node01","node02","node03" -StaticAddress "10.10.1.100" -AdministrativeAccessPoint ActiveDirectoryAndDns

# Create a cluster with DNS-only access point (no AD CNO — useful for workgroup or when AD unavailable)
New-Cluster -Name "PROD-CLUSTER01" -Node "node01","node02","node03" -StaticAddress "10.10.1.100" -AdministrativeAccessPoint Dns

# For S2D clusters — do not add storage during creation
New-Cluster -Name "S2D-CLUSTER01" -Node "node01","node02","node03" -NoStorage -StaticAddress "10.10.1.200"

# Verify the cluster was created
Get-Cluster -Name "PROD-CLUSTER01" | Select Name, SharedVolumesRoot, Domain

After cluster creation, verify all nodes are online and healthy:

Get-ClusterNode -Cluster "PROD-CLUSTER01" | Select Name, State, NodeWeight | Format-Table -AutoSize

Quorum Configuration

The quorum mechanism determines how many votes (nodes plus an optional witness) are needed for the cluster to remain operational. Without quorum (a majority of votes), the cluster shuts down to prevent a split-brain scenario where two isolated node groups both believe they own the workload. Windows Server 2022 supports four quorum configurations, one without a witness and three witness types:

Node Majority: No witness. Works for clusters with an odd number of nodes (3, 5, 7). More than half of the nodes, that is floor(N/2)+1, must remain online.
Disk Witness: A small clustered disk acts as a vote. Suitable for traditional clusters with shared storage.
File Share Witness: A file share on a remote server acts as a vote. Does not store cluster data.
Cloud Witness: An Azure Blob Storage account acts as the witness. Best for geographically distributed clusters or when a local share/disk witness is not available.
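
The vote arithmetic behind these options can be sketched as follows. Get-QuorumSummary is a hypothetical helper written for illustration; it computes static majority only and deliberately ignores dynamic quorum, which adjusts votes at runtime.

```powershell
# Hypothetical helper: static quorum arithmetic for a given node count,
# with or without a witness vote. Dynamic quorum is not modeled here.
function Get-QuorumSummary {
    param(
        [Parameter(Mandatory)][ValidateRange(1, 64)][int]$NodeCount,
        [switch]$Witness
    )

    # Each node has one vote; a witness (disk, file share, or cloud) adds one more
    $totalVotes = $NodeCount + [int]$Witness.IsPresent

    # A majority is more than half of all votes: floor(votes / 2) + 1
    $needed = [int][math]::Floor($totalVotes / 2) + 1

    [pscustomobject]@{
        TotalVotes        = $totalVotes
        VotesNeeded       = $needed
        TolerableVoteLoss = $totalVotes - $needed
    }
}

Get-QuorumSummary -NodeCount 3            # 3 votes, 2 needed, can lose 1
Get-QuorumSummary -NodeCount 4            # 4 votes, 3 needed, can lose 1
Get-QuorumSummary -NodeCount 4 -Witness   # 5 votes, 3 needed, can lose 2
```

The last two lines show why a witness is usually added to clusters with an even number of nodes: it raises the number of vote losses the cluster can tolerate from one to two.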

# Set quorum to Node Majority (no witness) — for 3+ node clusters with odd count
Set-ClusterQuorum -Cluster "PROD-CLUSTER01" -NodeMajority

# Set a File Share Witness
Set-ClusterQuorum -Cluster "PROD-CLUSTER01" -FileShareWitness "\\witness-server\ClusterWitness\PROD-CLUSTER01"

# Set a Disk Witness (specify the cluster disk resource name)
Set-ClusterQuorum -Cluster "PROD-CLUSTER01" -DiskWitness "Cluster Disk 1"

# Set a Cloud Witness (Azure Blob Storage)
Set-ClusterQuorum -Cluster "PROD-CLUSTER01" -CloudWitness -AccountName "myclusterwitness" -AccessKey "base64encodedkey=="

# Check current quorum configuration
Get-ClusterQuorum -Cluster "PROD-CLUSTER01" | Select Cluster, QuorumResource, QuorumType

Configuring Cloud Witness (Azure Blob)

Cloud Witness requires an Azure Storage Account (Standard tier, LRS or GRS redundancy). The cluster writes a small blob file to the storage account, which serves as the witness vote. Create the storage account in a region closest to your cluster for lowest latency:

# Create Azure Storage Account (requires Az PowerShell module on management machine)
Connect-AzAccount
$rg = New-AzResourceGroup -Name "ClusterWitness-RG" -Location "East US"
$sa = New-AzStorageAccount -ResourceGroupName "ClusterWitness-RG" -Name "prodclusterwitness" -SkuName "Standard_LRS" -Kind "StorageV2" -Location "East US"

# Get the primary access key
$key = (Get-AzStorageAccountKey -ResourceGroupName "ClusterWitness-RG" -Name "prodclusterwitness")[0].Value

# Set Cloud Witness on the cluster using the storage account name and key
Set-ClusterQuorum -Cluster "PROD-CLUSTER01" -CloudWitness -AccountName "prodclusterwitness" -AccessKey $key

# Verify Cloud Witness is configured
Get-ClusterQuorum -Cluster "PROD-CLUSTER01"

For clusters running on Azure virtual machines, Cloud Witness does not require outbound internet access from the nodes if you configure a service endpoint for Azure Storage on the virtual network. For on-premises clusters, ensure outbound TCP 443 (HTTPS) to *.blob.core.windows.net is permitted through your firewall.
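
Before setting the witness, it may help to confirm that each node can reach the storage endpoint on port 443. This is a rough sketch: both function names are hypothetical, and the endpoint suffix assumes the Azure public cloud (sovereign clouds use different suffixes).

```powershell
# Hypothetical helper: builds the blob endpoint host name for a storage account.
# Assumes the default public-cloud suffix blob.core.windows.net.
function Get-CloudWitnessEndpoint {
    param([Parameter(Mandatory)][string]$AccountName)
    "$($AccountName.ToLowerInvariant()).blob.core.windows.net"
}

# Hypothetical helper: checks outbound HTTPS reachability from the local machine.
# Test-NetConnection ships with Windows; TcpTestSucceeded is True when the port is reachable.
function Test-CloudWitnessPort {
    param([Parameter(Mandatory)][string]$AccountName)
    $endpoint = Get-CloudWitnessEndpoint -AccountName $AccountName
    Test-NetConnection -ComputerName $endpoint -Port 443 |
        Select-Object ComputerName, RemotePort, TcpTestSucceeded
}

Get-CloudWitnessEndpoint -AccountName 'prodclusterwitness'
# → prodclusterwitness.blob.core.windows.net

# Run on each cluster node:
# Test-CloudWitnessPort -AccountName 'prodclusterwitness'
```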

Cluster Network Configuration

After cluster creation, review and configure cluster networks. Each distinct IP subnet is identified as a separate cluster network. Assign roles to networks: management/client traffic, cluster internal (heartbeat), and storage:

# List all cluster networks and their roles
Get-ClusterNetwork -Cluster "PROD-CLUSTER01" | Select Name, State, Role, Address, AddressMask | Format-Table -AutoSize

# Rename cluster networks for clarity
(Get-ClusterNetwork -Cluster "PROD-CLUSTER01" "Cluster Network 1").Name = "Management"
(Get-ClusterNetwork -Cluster "PROD-CLUSTER01" "Cluster Network 2").Name = "Heartbeat"
(Get-ClusterNetwork -Cluster "PROD-CLUSTER01" "Cluster Network 3").Name = "Storage"

# Set network roles:
# Role 0 = Do not allow cluster network communication
# Role 1 = Allow cluster network communication only
# Role 3 = Allow clients to connect (management + heartbeat)
(Get-ClusterNetwork -Cluster "PROD-CLUSTER01" "Heartbeat").Role = 1  # Heartbeat only
(Get-ClusterNetwork -Cluster "PROD-CLUSTER01" "Management").Role = 3  # Management + heartbeat
(Get-ClusterNetwork -Cluster "PROD-CLUSTER01" "Storage").Role = 0  # Storage — no cluster comms

# Verify network interface assignments per node
Get-ClusterNetworkInterface -Cluster "PROD-CLUSTER01" | Select Node, Name, Network, IPv4Addresses | Format-Table -AutoSize
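
The numeric Role values used above are easy to misread in reports. A small lookup, sketched here with a hypothetical helper name, translates them into the role names that the cluster tools display.

```powershell
# Lookup for the numeric Role values of a cluster network
$ClusterNetworkRoleNames = @{
    0 = 'None (no cluster communication)'
    1 = 'Cluster (cluster communication only)'
    3 = 'ClusterAndClient (cluster and client communication)'
}

# Hypothetical helper: returns a readable name for a Role value
function Get-NetworkRoleName {
    param([Parameter(Mandatory)][int]$Role)
    if ($ClusterNetworkRoleNames.ContainsKey($Role)) {
        $ClusterNetworkRoleNames[$Role]
    }
    else {
        "Unknown role value $Role"
    }
}

Get-NetworkRoleName -Role 1   # Cluster (cluster communication only)
Get-NetworkRoleName -Role 3   # ClusterAndClient (cluster and client communication)

# Possible use against a live cluster:
# Get-ClusterNetwork -Cluster "PROD-CLUSTER01" |
#     Select-Object Name, @{ n = 'RoleName'; e = { Get-NetworkRoleName -Role ([int]$_.Role) } }
```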

Adding Storage to the Cluster

For traditional clusters with shared storage (iSCSI/FC/SAS), after connecting the shared disks, add them to the cluster. Cluster Shared Volumes (CSV) allow all nodes to access the same disk simultaneously:

# Scan for eligible disks and add them to the cluster
Get-ClusterAvailableDisk -Cluster "PROD-CLUSTER01" | Add-ClusterDisk

# List cluster disks
Get-ClusterResource -Cluster "PROD-CLUSTER01" | Where-Object { $_.ResourceType -eq "Physical Disk" } | Format-Table Name, State, OwnerNode -AutoSize

# Add a cluster disk to CSV (Cluster Shared Volumes)
Add-ClusterSharedVolume -Name "Cluster Disk 2" -Cluster "PROD-CLUSTER01"

# List all CSVs
Get-ClusterSharedVolume -Cluster "PROD-CLUSTER01" | Select Name, State, OwnerNode | Format-Table -AutoSize

Cluster-Aware Updating (CAU)

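Cluster-Aware Updating (CAU) automates the process of applying Windows Updates to cluster nodes while maintaining application availability. CAU coordinates node evacuation, patching, and rejoining one node at a time. The CAU cmdlets and console ship with the Failover Clustering management tools, so they are already present on nodes where the feature was installed with -IncludeManagementTools. To add them on a separate management machine:

Install-WindowsFeature -Name RSAT-Clustering-PowerShell, RSAT-Clustering-Mgmt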

# Configure CAU self-updating mode (cluster manages its own updates)
Add-CauClusterRole -ClusterName "PROD-CLUSTER01" -CauPluginName "Microsoft.WindowsUpdatePlugin" -MaxFailedNodes 1 -RequireAllNodesConnected -EnableFirewallRules -Force

# Run a CAU preview to see what would be updated
Invoke-CauScan -ClusterName "PROD-CLUSTER01" -CauPluginName "Microsoft.WindowsUpdatePlugin" | Select NodeName, UpdateId, Title | Format-Table -AutoSize

# Trigger a CAU run immediately
Invoke-CauRun -ClusterName "PROD-CLUSTER01" -CauPluginName "Microsoft.WindowsUpdatePlugin" -MaxFailedNodes 1 -MaxRetriesPerNode 2 -RequireAllNodesConnected -Force

# Check CAU run status
Get-CauRun -ClusterName "PROD-CLUSTER01"

Monitoring Cluster Health

Monitoring cluster health is critical for early detection of problems before they lead to downtime. Use Get-ClusterNode, Get-ClusterResource, and event logs:

# Check all node states (Up/Down/Paused)
Get-ClusterNode -Cluster "PROD-CLUSTER01" | Select Name, State, NodeWeight, DrainStatus | Format-Table -AutoSize

# Check all cluster resources and their states
Get-ClusterResource -Cluster "PROD-CLUSTER01" | Select Name, State, OwnerNode, ResourceType | Format-Table -AutoSize

# Get cluster groups (roles) and which node owns them
Get-ClusterGroup -Cluster "PROD-CLUSTER01" | Select Name, State, OwnerNode, Priority | Format-Table -AutoSize

# Check for failed cluster resources
Get-ClusterResource -Cluster "PROD-CLUSTER01" | Where-Object { $_.State -ne "Online" } | Format-Table Name, State, OwnerNode -AutoSize

# Get cluster health summary
Get-Cluster -Name "PROD-CLUSTER01" | Select SharedVolumesRoot, QuorumResourceName, WitnessDynamicWeight, CrossSiteDelay | Format-List

Cluster Event Log Analysis

The cluster logs are the primary diagnostic tool for understanding failover events, node departures, resource failures, and quorum changes. Cluster events are recorded in the Windows event logs and are queryable via Get-WinEvent. The detailed text log (Cluster.log) is generated on demand with Get-ClusterLog, by default under C:\Windows\Cluster\Reports on each node:

# Query the Failover Clustering operational event log
Get-WinEvent -LogName "Microsoft-Windows-FailoverClustering/Operational" -MaxEvents 100 | Select TimeCreated, Id, LevelDisplayName, Message | Format-Table -AutoSize -Wrap

# Get cluster diagnostic log (very detailed — useful for support)
Get-ClusterLog -Cluster "PROD-CLUSTER01" -Destination "C:\ClusterLogs" -TimeSpan 60  # Last 60 minutes of logs

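# Look for key cluster events (1069 = resource failure, 1135 = node removed from cluster membership, 1177 = quorum lost)
# -FilterHashtable filters at the source instead of reading the whole System log
Get-WinEvent -FilterHashtable @{ LogName = "System"; ProviderName = "Microsoft-Windows-FailoverClustering"; Id = 1069,1135,1177 } | Select TimeCreated, Id, Message | Format-Table -AutoSize -Wrap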
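
When scanning many events, a short description next to each ID makes the timeline easier to read. The sketch below uses a hypothetical function name and sample records with made-up timestamps in place of real Get-WinEvent output.

```powershell
# Short descriptions for the event IDs queried above
$ClusterEventMeaning = @{
    1069 = 'A clustered resource failed'
    1135 = 'A node was removed from active cluster membership'
    1177 = 'The Cluster service is shutting down because quorum was lost'
}

# Hypothetical helper: adds a Meaning column to event records from the pipeline
function Add-ClusterEventMeaning {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object]$InputObject   # any object with TimeCreated and Id properties
    )
    process {
        $id = [int]$InputObject.Id
        $meaning = if ($ClusterEventMeaning.ContainsKey($id)) {
            $ClusterEventMeaning[$id]
        }
        else {
            'Not in lookup table'
        }
        [pscustomobject]@{
            TimeCreated = $InputObject.TimeCreated
            Id          = $id
            Meaning     = $meaning
        }
    }
}

# Sample records standing in for Get-WinEvent output (hypothetical timestamps)
$sampleEvents = @(
    [pscustomobject]@{ TimeCreated = [datetime]'2024-01-15T02:14:00'; Id = 1135 }
    [pscustomobject]@{ TimeCreated = [datetime]'2024-01-15T02:14:05'; Id = 1069 }
)
$annotated = $sampleEvents | Add-ClusterEventMeaning
$annotated | Format-Table -AutoSize
```

Real events can be piped in the same way, because Get-WinEvent records expose TimeCreated and Id properties.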

Failover Testing

Test cluster failover regularly to ensure it works before you need it in a real failure. For controlled testing, move cluster groups between nodes manually and verify the workload continues operating:

# Move all cluster groups off a node gracefully (drains the node)
Suspend-ClusterNode -Name "node01" -Cluster "PROD-CLUSTER01" -Drain -Wait

# Verify node is Paused and all roles have moved
Get-ClusterNode -Cluster "PROD-CLUSTER01" | Select Name, State | Format-Table -AutoSize
Get-ClusterGroup -Cluster "PROD-CLUSTER01" | Select Name, OwnerNode | Format-Table -AutoSize

# Resume the node (allows roles to fail back if preferred node is set)
Resume-ClusterNode -Name "node01" -Cluster "PROD-CLUSTER01" -Failback Immediate

# Simulate a hard failure by stopping the cluster service (run on the node directly)
# WARNING: This will cause all resources on the node to fail over immediately
Stop-Service -Name ClusSvc -Force

# Move a specific clustered role (group) to a specific node
Move-ClusterGroup -Name "SQL Server (MSSQLSERVER)" -Node "node02" -Cluster "PROD-CLUSTER01" -Wait

After failover testing, verify all cluster resources are online on the new owner node and that the application is serving requests normally. Review the cluster log to confirm the failover was clean (no resource failures, correct quorum maintained). Document the recovery time measured during testing and compare it against the RTO (Recovery Time Objective) in your SLA requirements.
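
Comparing a measured recovery time against the objective can be sketched as follows. Compare-RecoveryTime is a hypothetical helper, and the timestamps and the 60-second RTO are illustrative values only; in practice the two timestamps would come from your own test notes or the cluster log.

```powershell
# Hypothetical helper: compares a measured failover duration with an RTO target
function Compare-RecoveryTime {
    param(
        [Parameter(Mandatory)][datetime]$FailoverStart,     # when the failure or move began
        [Parameter(Mandatory)][datetime]$ServiceRestored,   # when the application answered again
        [Parameter(Mandatory)][int]$RtoSeconds              # target from the SLA
    )

    $elapsed = ($ServiceRestored - $FailoverStart).TotalSeconds

    [pscustomobject]@{
        RecoverySeconds = [math]::Round($elapsed, 1)
        RtoSeconds      = $RtoSeconds
        WithinRto       = ($elapsed -le $RtoSeconds)
    }
}

# Illustrative values: a 42-second failover measured against a 60-second RTO
$check = Compare-RecoveryTime -FailoverStart ([datetime]'2024-01-15T02:14:00') `
                              -ServiceRestored ([datetime]'2024-01-15T02:14:42') `
                              -RtoSeconds 60
$check   # RecoverySeconds = 42, WithinRto = True
```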

Windows Server 2022 Failover Clustering is a mature, reliable platform for high-availability workloads. Combining proper hardware selection, network isolation, tested quorum configuration, and regular maintenance with Cluster-Aware Updating provides a strong foundation for critical services requiring minimal downtime.
