How to Set Up Pacemaker and Corosync for High Availability on RHEL 7
High availability (HA) clustering prevents individual server failures from causing service outages by automatically detecting node failures and restarting workloads on surviving nodes. On RHEL 7, the standard HA stack is built from three components: Corosync, which provides reliable cluster messaging and quorum; Pacemaker, the cluster resource manager that monitors services and orchestrates failover; and pcs (Pacemaker/Corosync Configuration System), the command-line and web-based management tool that ties both together. This tutorial walks through building a two-node HA cluster that hosts a virtual IP address and an nginx web service, and demonstrates resource creation, placement constraints, STONITH fencing, and manual failover testing.
Prerequisites
- Two RHEL 7 servers: node1 (192.168.1.101) and node2 (192.168.1.102)
- Root access on both nodes
- Both nodes resolvable by hostname (add entries to /etc/hosts if DNS is unavailable)
- Firewall ports open between nodes: TCP 2224 (pcsd), UDP 5404–5405 (Corosync)
- Shared or replicated storage if the clustered service requires it (not covered here)
- A fencing device (IPMI, iLO, DRAC, or a virtual fence agent for lab use)
Step 1: Prepare Both Nodes — Hostname and /etc/hosts
Run the following on both nodes unless otherwise indicated.
# On node1
hostnamectl set-hostname node1.example.com
# On node2
hostnamectl set-hostname node2.example.com
# Add host entries on both nodes (if DNS is not available)
cat >> /etc/hosts <<'EOF'
192.168.1.101 node1.example.com node1
192.168.1.102 node2.example.com node2
EOF
# Verify connectivity
ping -c3 node2 # from node1
ping -c3 node1 # from node2
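Running the cat >> /etc/hosts command above more than once appends duplicate lines. The following is a minimal sketch of a guard against that, assuming a hypothetical helper named add_host_entry (it is not part of RHEL or pcs) that appends an entry only when no line for that IP address exists yet. The demonstration writes to a scratch file rather than /etc/hosts.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME...
# Hypothetical helper: append "IP NAME..." to FILE unless FILE already
# contains a line whose first field is exactly IP.
add_host_entry() {
    hosts_file=$1
    ip=$2
    shift 2
    # awk exits 0 only if some line already starts with this exact IP
    if awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' \
        "$hosts_file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$*" >> "$hosts_file"
}

# Demonstration against a scratch file
demo=$(mktemp)
add_host_entry "$demo" 192.168.1.101 node1.example.com node1
add_host_entry "$demo" 192.168.1.101 node1.example.com node1   # no-op: entry exists
add_host_entry "$demo" 192.168.1.102 node2.example.com node2
cat "$demo"
rm -f "$demo"
```

To use it for real, pass /etc/hosts as the first argument on each node.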
Step 2: Install Pacemaker, Corosync, and pcs
# Install the HA cluster packages on both nodes
yum install -y pacemaker corosync pcs fence-agents-all
# Optionally install nginx for the service resource example
yum install -y nginx
The pcs package provides the pcsd daemon, which listens on TCP port 2224 and handles authenticated communication between cluster nodes. You must enable and start it before running any pcs cluster commands.
Step 3: Enable pcsd and Set the hacluster Password
# Enable and start pcsd on both nodes
systemctl enable pcsd
systemctl start pcsd
# Set the password for the built-in 'hacluster' user (same password on both nodes)
# This account is used exclusively for cluster authentication
passwd hacluster
# Enter the same password on both nodes when prompted
# Open required firewall ports on both nodes
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload
Step 4: Authenticate the Cluster Nodes
Run the following commands from node1 only. The pcs cluster auth command authenticates node1 with node2 using the hacluster credentials, establishing the trust required for subsequent cluster operations.
# Authenticate both nodes (run from node1)
pcs cluster auth node1 node2 -u hacluster -p YourPassword --force
# Expected output:
# node1: Authorized
# node2: Authorized
Step 5: Create and Start the Cluster
# Create the cluster named 'mycluster' with both nodes (run from node1)
pcs cluster setup --name mycluster node1 node2
# Start the cluster on all nodes simultaneously
pcs cluster start --all
# Enable the cluster to start automatically at boot on all nodes
pcs cluster enable --all
# Verify cluster status
pcs status
# Check Corosync ring status
corosync-cfgtool -s
# Show the full cluster configuration
pcs config show
The pcs status output should show both nodes as Online. At this stage there are no resources configured, so the cluster is operational but idle.
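For scripted checks, the Online line can be parsed instead of read by eye. This is a rough sketch that assumes pcs status prints a line of the form "Online: [ node1 node2 ]", which is the layout used by the Pacemaker version shipped with RHEL 7; other versions may format it differently. The function names online_nodes and nodes_online are hypothetical.

```shell
#!/bin/sh
# online_nodes: read `pcs status` text on stdin and print the online
# node names, one per line. Assumes a line like "Online: [ node1 node2 ]".
online_nodes() {
    sed -n 's/^[ *]*Online: \[ \(.*\) \]$/\1/p' | tr ' ' '\n'
}

# nodes_online NODE...: succeed only if every named node appears in the
# Online list of the `pcs status` text supplied on stdin.
nodes_online() {
    online=$(online_nodes)
    for n in "$@"; do
        printf '%s\n' "$online" | grep -qx -- "$n" || return 1
    done
}

# Only query the cluster when pcs is actually installed
if command -v pcs >/dev/null 2>&1; then
    if pcs status | nodes_online node1 node2; then
        echo "both nodes are online"
    else
        echo "at least one node is not online"
    fi
fi
```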
Step 6: Disable STONITH Temporarily for Initial Testing
STONITH (Shoot The Other Node In The Head) is a fencing mechanism that ensures a failed node is powered off before its resources are restarted elsewhere, preventing split-brain data corruption. Pacemaker requires fencing by default. During initial lab setup it is common to disable this requirement temporarily, but you must configure a real fencing device before using the cluster in production.
# Disable the STONITH requirement for lab testing only
pcs property set stonith-enabled=false
# Set no-quorum-policy to ignore so resources keep running when quorum is lost
# (lab use only; for a two-node cluster, pcs on RHEL 7 already enables Corosync's two_node mode)
pcs property set no-quorum-policy=ignore
# Verify property changes
pcs property list
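Quorum needs special handling on two nodes, and the arithmetic shows why. The sketch below assumes the usual majority rule with one vote per node, so quorum is floor(N/2) + 1 votes. The function name quorum_votes is hypothetical.

```shell
#!/bin/sh
# quorum_votes N: votes required for quorum in an N-node cluster,
# assuming one vote per node and a simple majority rule.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

for n in 2 3 4 5; do
    q=$(quorum_votes "$n")
    echo "nodes=$n quorum=$q tolerated_failures=$(( n - q ))"
done
# prints:
# nodes=2 quorum=2 tolerated_failures=0
# nodes=3 quorum=2 tolerated_failures=1
# nodes=4 quorum=3 tolerated_failures=1
# nodes=5 quorum=3 tolerated_failures=2
```

With two nodes, losing either one drops the cluster below quorum, so a plain majority rule would stop all resources at exactly the moment failover is needed.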
Step 7: Add a Virtual IP Resource
A virtual IP (VIP) is the most fundamental cluster resource. It floats between nodes and provides a stable endpoint for clients regardless of which physical node is currently active.
# Create a Virtual IP resource
# ocf:heartbeat:IPaddr2 is the standard virtual IP resource agent
pcs resource create VirtualIP \
    ocf:heartbeat:IPaddr2 \
    ip=192.168.1.200 \
    cidr_netmask=24 \
    op monitor interval=30s
# Verify the resource was created and is running
pcs status
# Show details of the resource
pcs resource show VirtualIP
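The VIP is normally chosen from the same subnet as the nodes' own addresses, so that the resource agent can place it on an existing interface. As a quick sanity check before creating the resource, the sketch below compares the network portion of two IPv4 addresses for a given prefix length. The helper names ip_to_int and same_subnet are hypothetical and the code handles IPv4 only.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: print the IPv4 address as a single 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the dotted quad into four positional parameters
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet IP1 IP2 PREFIX: succeed if both addresses share the same
# network when masked with a PREFIX-bit netmask.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    mask=$(( (4294967295 << (32 - $3)) & 4294967295 ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

if same_subnet 192.168.1.200 192.168.1.101 24; then
    echo "VIP 192.168.1.200 is on node1's /24 subnet"
else
    echo "VIP 192.168.1.200 is NOT on node1's /24 subnet"
fi
```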
Step 8: Add an nginx Service Resource
# Ensure nginx does NOT start at boot independently (Pacemaker manages it); run on both nodes
systemctl disable nginx
# Create an nginx service resource
pcs resource create WebService \
    systemd:nginx \
    op monitor interval=30s
# Verify both resources are running
pcs status
Step 9: Add Resource Constraints
Constraints control where resources run, in what order they start, and which resources must run together. These are the three constraint types:
Colocation Constraint — Run VIP and nginx on the same node
# WebService must run on the same node as VirtualIP
pcs constraint colocation add WebService with VirtualIP INFINITY
# List all constraints
pcs constraint list
Order Constraint — VirtualIP must start before nginx
# VirtualIP must be started before WebService
pcs constraint order VirtualIP then WebService
pcs constraint list
Location Constraint — Prefer node1 as the primary node
# Give node1 a higher preference score for the VirtualIP resource
pcs constraint location VirtualIP prefers node1=100
pcs constraint list
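To confirm that the colocation constraint took effect, compare the node each resource reports in pcs status. This is a sketch only: it assumes resource lines look like "VirtualIP (ocf::heartbeat:IPaddr2): Started node1", as printed by the Pacemaker version on RHEL 7, and the helper names resource_node and colocated are hypothetical.

```shell
#!/bin/sh
# resource_node NAME: read `pcs status` text on stdin and print the node
# on which resource NAME is reported as Started (prints nothing otherwise).
resource_node() {
    awk -v res="$1" 'NF >= 3 && $1 == res && $(NF-1) == "Started" { print $NF }'
}

# colocated RES1 RES2: succeed if both resources are Started on the same node
# according to the `pcs status` text on stdin.
colocated() {
    status_text=$(cat)
    a=$(printf '%s\n' "$status_text" | resource_node "$1")
    b=$(printf '%s\n' "$status_text" | resource_node "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Only query the cluster when pcs is actually installed
if command -v pcs >/dev/null 2>&1; then
    if pcs status | colocated VirtualIP WebService; then
        echo "VirtualIP and WebService run on the same node"
    else
        echo "VirtualIP and WebService are not running together"
    fi
fi
```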
Step 10: Configure a STONITH Fencing Device
In a production cluster, re-enable STONITH and configure a fencing agent appropriate to your hardware. Common agents include IPMI, iLO, DRAC, and (for virtual machines) the VMware or libvirt fence agents.
# Re-enable STONITH
pcs property set stonith-enabled=true
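# Example: configure an IPMI (fence_ipmilan) fencing device for node2
# (Replace ipaddr, login, and passwd with your BMC address and credentials)
pcs stonith create node2-fence fence_ipmilan \
    ipaddr=192.168.1.252 \
    login=admin \
    passwd=secretpassword \
    lanplus=1 \
    pcmk_host_list=node2 \
    op monitor interval=60s
# Create a matching device for node1 (its own BMC address, pcmk_host_list=node1) so either node can be fenced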
# For lab VMs, use the fence_virt or fence_xvm agent instead
# pcs stonith create node2-fence fence_virt pcmk_host_list=node2
# Verify the fencing resource
pcs stonith show
Step 11: Test Failover
# Confirm which node currently runs the resources
pcs status
# Put node1 into standby so that all of its resources move to node2
pcs node standby node1
# Verify resources have moved to node2
pcs status
ping -c3 192.168.1.200 # VIP should still respond
# Bring node1 back into the cluster
pcs node unstandby node1
# Check where the resources run now; with the node1=100 preference and no resource stickiness set, they normally return to node1
pcs status
# Force a resource back to node1 if desired
pcs resource move VirtualIP node1
# Remove the temporary location constraint created by the move
pcs resource clear VirtualIP
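Failover is not instantaneous, so a scripted test usually polls rather than checking once. Here is a minimal sketch of such a polling loop, using a hypothetical helper named wait_until that retries a command once per second until it succeeds or a timeout expires.

```shell
#!/bin/sh
# wait_until TIMEOUT CMD...
# Hypothetical helper: run CMD once per second until it succeeds (returns 0)
# or until roughly TIMEOUT seconds have passed (returns 1).
wait_until() {
    remaining=$1
    shift
    while [ "$remaining" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        remaining=$(( remaining - 1 ))
    done
    return 1
}

# Example usage after `pcs node standby node1` (left commented out so that
# sourcing this file does not touch the cluster):
#   wait_until 30 ping -c1 -W1 192.168.1.200 \
#       && echo "VIP reachable after failover" \
#       || echo "VIP still unreachable after 30s"
```

The 30-second timeout is an arbitrary starting point; the monitor interval and resource start times in your cluster determine a realistic value.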
Conclusion
You have built a two-node Pacemaker/Corosync high availability cluster on RHEL 7 and authenticated the nodes using pcs. You added a floating virtual IP and a managed nginx service, then applied colocation and ordering constraints so that the service and its VIP always run together and start in the correct sequence. Finally, you configured a STONITH fencing device to protect against split-brain scenarios and verified failover behavior by placing a node into standby. This foundation extends to more complex scenarios: multi-node clusters, DRBD-backed replicated storage, clustered NFS, Oracle RAC, PostgreSQL with Pacemaker resource agents, and active-active topologies. Test failover regularly, including in production: a cluster that has never been exercised is a cluster whose failover behavior is unknown.