How to Set Up Pacemaker and Corosync for High Availability on RHEL 8

December 26, 2025

Pacemaker and Corosync together form the de facto high-availability (HA) cluster stack for Linux. Corosync provides the cluster messaging and quorum layer, while Pacemaker manages resources: it ensures that services such as virtual IP addresses and web servers start, stop, and migrate between nodes according to defined constraints. On RHEL 8, both packages are available through the High Availability Add-On repository. This tutorial walks through installing and configuring a two-node active/passive HA cluster: configuring STONITH fencing to prevent split-brain scenarios, creating a VirtualIP resource and a managed httpd resource, and tying the two together with colocation and ordering constraints.

Prerequisites

Two RHEL 8 nodes (node1 at 192.168.1.11, node2 at 192.168.1.12) with hostname resolution via /etc/hosts or DNS
Root or sudo access on both nodes
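The RHEL High Availability Add-On repository enabled on both nodes (for example, subscription-manager repos --enable=rhel-8-for-x86_64-highavailability-rpms), which requires an HA Add-On subscription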
A cluster virtual IP address to allocate (example: 192.168.1.100)
SSH access to both nodes for administration (pcs itself does not need passwordless root SSH; it authenticates to the pcsd daemon on TCP port 2224 using the hacluster account)
firewalld running and managing the active zone on both nodes
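
As a minimal sketch of the hostname-resolution prerequisite, the helper below prints /etc/hosts lines for the two example nodes. The function name and the node1.example.com / node2.example.com FQDNs are hypothetical placeholders introduced for illustration; substitute your own names and addresses.

```shell
# Print /etc/hosts lines for the two example nodes.
# The FQDNs are hypothetical placeholders, not part of the original setup.
cluster_hosts_entries() {
    # printf reuses the format for each group of three arguments.
    printf '%s\t%s %s\n' \
        192.168.1.11 node1.example.com node1 \
        192.168.1.12 node2.example.com node2
}

# Review the output first; to apply it, run as root on BOTH nodes:
#   cluster_hosts_entries >> /etc/hosts
cluster_hosts_entries
```

After appending the entries, getent hosts node1 node2 should return both addresses on each node.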

Step 1 — Install Pacemaker, PCS, and Fence Agents

Install the cluster stack packages on both nodes. The pcs tool is the high-level management interface that wraps Pacemaker and Corosync configuration. fence-agents-all provides STONITH fencing agents for a wide range of hardware and virtual platforms.

# Run on BOTH node1 and node2
dnf install -y pacemaker pcs fence-agents-all

# Enable and start the PCS daemon (required for cluster auth and management)
systemctl enable --now pcsd

# Set the password for the hacluster user (same password on both nodes)
echo "ClusterPass1!" | passwd --stdin hacluster

# Open the required firewall ports on both nodes
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

# Verify pcsd is running
systemctl status pcsd
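
The checks above can be bundled into one helper that is easy to rerun on each node. This is a sketch under the assumption that firewalld's default zone is the one serving the cluster network; check_node_ready is a hypothetical name, not a pcs command.

```shell
# Sketch: confirm that Step 1 took effect on the local node.
# check_node_ready is a hypothetical helper name.
check_node_ready() {
    # pcsd must be running before the nodes can be authenticated.
    if ! systemctl is-active --quiet pcsd; then
        echo "pcsd is not active" >&2
        return 1
    fi
    # --query-service exits 0 only if the service is allowed in the default zone.
    if ! firewall-cmd --query-service=high-availability >/dev/null 2>&1; then
        echo "high-availability service is not allowed in the default zone" >&2
        return 1
    fi
    echo "pcsd is active and the high-availability firewall service is allowed"
}

# Usage (as root, on each node):
#   check_node_ready
```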

Step 2 — Authenticate Nodes and Set Up the Cluster

Cluster setup is performed from node1 only. The pcs host auth command authenticates against the pcsd daemon on each node using the hacluster credentials. After authentication, pcs cluster setup generates the Corosync configuration and distributes it to both nodes.

# Run on node1 only — authenticate both nodes
pcs host auth node1 node2 -u hacluster -p ClusterPass1!

# Create the cluster (generates corosync.conf on both nodes)
pcs cluster setup mycluster node1 node2

# Start the cluster on all nodes simultaneously
pcs cluster start --all

# Enable cluster services to start on boot
pcs cluster enable --all

# Verify cluster status (may take 15-30 seconds to fully form)
pcs status
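
Because the cluster can take a while to form, a script that continues straight on to the next step may run before membership has settled. The sketch below polls pcs quorum status until it reports the partition as quorate; the function name and the 60-second default timeout are assumptions chosen for illustration.

```shell
# Sketch: wait until the local node reports a quorate cluster.
# wait_for_quorum and its default timeout are illustrative assumptions.
wait_for_quorum() {
    timeout_s=${1:-60}
    waited=0
    # pcs quorum status prints a "Quorate: Yes" line once quorum is reached.
    until pcs quorum status 2>/dev/null | grep -Eq '^Quorate:[[:space:]]+Yes'; do
        if [ "$waited" -ge "$timeout_s" ]; then
            echo "cluster not quorate after ${timeout_s}s" >&2
            return 1
        fi
        sleep 2
        waited=$((waited + 2))
    done
    echo "cluster is quorate"
}

# Usage (on node1, after pcs cluster start --all):
#   wait_for_quorum 60 && pcs status
```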

Step 3 — Configure STONITH Fencing

STONITH (Shoot The Other Node In The Head) fencing prevents split-brain by forcibly powering off or rebooting a failed node before its resources are recovered elsewhere. Fencing is enabled by default (the stonith-enabled cluster property), and until at least one fence device is configured Pacemaker will refuse to start any resources. For virtual machines, the fence_virt or fence_vmware_soap agents are common; for bare metal, use the agent that matches your server’s management controller (IPMI, iDRAC, or iLO).

# Example using fence_ipmilan (substitute your BMC IPs and credentials)
# Run on node1

# Create STONITH resource for node1 (fences node1 via its BMC)
pcs stonith create fence_node1 fence_ipmilan \
    pcmk_host_list="node1" \
    ipaddr="192.168.1.21" \
    login="admin" \
    passwd="ipmipw" \
    lanplus=1 \
    op monitor interval=60s

# Create STONITH resource for node2
pcs stonith create fence_node2 fence_ipmilan \
    pcmk_host_list="node2" \
    ipaddr="192.168.1.22" \
    login="admin" \
    passwd="ipmipw" \
    lanplus=1 \
    op monitor interval=60s

# Verify fence devices are registered
pcs stonith status
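
A registered fence device is only useful if the agent can actually reach the BMC. As a non-disruptive sketch, the helper below calls the fence_ipmilan agent directly with the status action, which queries power state without fencing anything. The helper name bmc_status is hypothetical, and the address and credentials are the example values used above.

```shell
# Sketch: query a node's power state through its BMC without fencing it.
# Arguments: BMC address, username, password.
# bmc_status is a hypothetical helper name.
bmc_status() {
    fence_ipmilan --ip="$1" --username="$2" --password="$3" \
        --lanplus --action=status
}

# Usage (example values from this tutorial):
#   bmc_status 192.168.1.22 admin ipmipw
#
# A full end-to-end test is disruptive because it really fences the target:
#   pcs stonith fence node2
```

Run the disruptive test only in a maintenance window, since the target node is powered off or rebooted.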

Step 4 — Create a Virtual IP Resource

A Virtual IP (VIP) resource is a floating IP address that follows the active cluster node. The ocf:heartbeat:IPaddr2 resource agent handles bringing the IP up with ip addr add, sending a gratuitous ARP to update network switches, and removing it cleanly during failover.

# Create the VirtualIP resource
pcs resource create VirtualIP ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 \
    cidr_netmask=24 \
    op monitor interval=30s

# Check resource status — it should start on one node
pcs status resources

# Prefer node1 for the VIP (a score of 50 is a soft preference, not a hard pin)
pcs constraint location VirtualIP prefers node1=50

# View all configured constraints
pcs constraint show
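
To confirm where the floating address actually landed, you can look for it in the kernel's address list on each node. The helper below is a small sketch; node_has_ip is a hypothetical name, and it assumes an IPv4 address such as the example VIP.

```shell
# Sketch: succeed if the given IPv4 address is configured on this node.
# node_has_ip is a hypothetical helper name.
node_has_ip() {
    # -o prints one line per address; match "inet <address>/" exactly.
    ip -o -4 addr show | grep -qF "inet $1/"
}

# Usage (run on each node; only the active node should report the VIP):
#   if node_has_ip 192.168.1.100; then
#       echo "VIP is active on $(hostname)"
#   fi
```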

Step 5 — Create a Web Server Resource and Add Ordering

Add an Apache HTTP server resource managed by Pacemaker. The systemd:httpd resource agent wraps the httpd systemd unit. An ordering constraint ensures the VirtualIP comes online before httpd starts, and a colocation constraint keeps both resources on the same node.

# Run on BOTH nodes: install httpd, but do NOT enable it in systemd directly
dnf install -y httpd
systemctl disable httpd   # Pacemaker starts and stops it

# Create the WebServer resource
pcs resource create WebServer systemd:httpd \
    op monitor interval=30s

# Colocation: WebServer must run on the same node as VirtualIP
pcs constraint colocation add WebServer with VirtualIP INFINITY

# Ordering: VirtualIP must start before WebServer
pcs constraint order VirtualIP then WebServer

# Verify the cluster is healthy and both resources are running
pcs status

# Test failover — standby node1, watch WebServer move to node2
pcs node standby node1
pcs status
pcs node unstandby node1
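
To see how long the service is unreachable during the standby test, it can help to probe the virtual IP from a third machine while the resources move. The following is a sketch only: probe_vip is a hypothetical helper, and the probe count and interval defaults are arbitrary assumptions.

```shell
# Sketch: poll a URL and print a timestamp plus the HTTP status code.
# Arguments: URL, number of probes (default 30), seconds between probes (default 1).
# probe_vip is a hypothetical helper name.
probe_vip() {
    url=$1
    count=${2:-30}
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        # curl prints 000 as the status code when the connection fails.
        code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url")
        echo "$(date +%T) $code"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Usage (from a client machine, started just before pcs node standby node1):
#   probe_vip http://192.168.1.100/ 60 1
```

A short run of 000 codes followed by normal responses indicates the window during which the VIP and httpd were moving between nodes.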

Step 6 — Monitor Cluster Health

Use pcs status as your primary health dashboard. It reports node membership, resource states, and recent failures. The crm_mon tool provides a live updating view, and pcs resource debug-start helps diagnose resource agent failures.

# Full cluster overview
pcs status

# Live updating monitor (press Ctrl+C to exit); use crm_mon -1 for a one-shot snapshot
crm_mon

# View cluster log for recent events
journalctl -u corosync -u pacemaker --since "1 hour ago"

# Clear a resource's fail count and failure history so Pacemaker retries it
pcs resource cleanup VirtualIP

# Show Pacemaker's scoring for resource placement
crm_simulate -sL

# Validate the CIB (cluster configuration database)
crm_verify -L -V
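
For unattended checks, for example from cron or a monitoring agent, the commands above can be reduced to exit codes. This is a rough sketch, assuming that crm_verify exits non-zero when it finds configuration errors and that pcs status fails when the cluster is not running on the local node; cluster_health is a hypothetical name.

```shell
# Sketch: summarize basic cluster health as a single exit code.
# cluster_health is a hypothetical helper name.
cluster_health() {
    status=0
    # Validate the live CIB; output is discarded, only the exit code is used.
    if ! crm_verify -L >/dev/null 2>&1; then
        echo "WARN: crm_verify reported configuration problems"
        status=1
    fi
    # pcs status fails if the cluster is not running on this node.
    if ! pcs status >/dev/null 2>&1; then
        echo "WARN: pcs status failed (is the cluster running on this node?)"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "OK: configuration valid and cluster reachable"
    fi
    return "$status"
}

# Usage:
#   cluster_health || journalctl -u corosync -u pacemaker --since "1 hour ago"
```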

Conclusion

You have built a two-node Pacemaker/Corosync HA cluster on RHEL 8, configured STONITH fencing to safely recover from node failures, created a floating VirtualIP resource using ocf:heartbeat:IPaddr2, added a Pacemaker-managed httpd instance with colocation and ordering constraints, and tested failover by putting a node into standby. The cluster services start automatically at boot, and when a node fails Pacemaker moves its resources to the surviving node once the failed node has been fenced. Proper fencing configuration is the most critical safety element: never run a production cluster with fencing disabled.

Next steps: Configuring GFS2 Shared Storage for a Pacemaker Cluster on RHEL 8, Adding a DRBD Resource to Pacemaker for Active/Passive Storage Replication, and Setting Up a Three-Node Quorum with Corosync QDevice.