Table of contents
  1. Symptom & Impact
  2. Environment & Reproduction
  3. Root Cause Analysis
  4. Quick Triage
  5. Step-by-Step Diagnosis
  6. Solution – Primary Fix
  7. Solution – Alternative Approaches
  8. Verification & Acceptance Criteria
  9. Rollback Plan
  10. Prevention & Hardening
  11. Related Errors & Cross-Refs
  12. References & Further Reading

Symptom & Impact

Cluster resources stop or relocate unpredictably because fencing requests fail or time out.

Environment & Reproduction

On RHEL 8 High Availability (Pacemaker) clusters, the problem appears during node-failure tests: the STONITH operation against the failed node does not complete successfully.

Root Cause Analysis

Fencing does not complete when the fence agent's credentials are invalid, the fence device is unreachable, or the configured timeout values are too short for the device to respond.

Quick Triage

Use pcs status, journalctl -u pacemaker, and fence agent debug tests to identify the failing stage.
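The triage commands above can be collected in one pass. The following is a minimal Python sketch, not a definitive tool: the helper names `run_triage` and `first_failure` and the one-hour journal window are assumptions made for illustration. The fence agent debug test is left out because its options depend on which agent the cluster uses.

```python
import subprocess

# Commands named in the triage step. The one-hour window is an arbitrary
# choice for this sketch; widen it to cover the time of the failed fence.
TRIAGE_COMMANDS = [
    ["pcs", "status"],
    ["journalctl", "-u", "pacemaker", "--since", "1 hour ago", "--no-pager"],
]


def run_triage(commands=TRIAGE_COMMANDS, runner=subprocess.run):
    """Run each command and return (command, exit code, combined output)."""
    results = []
    for cmd in commands:
        proc = runner(cmd, capture_output=True, text=True)
        results.append((" ".join(cmd), proc.returncode, proc.stdout + proc.stderr))
    return results


def first_failure(results):
    """Return the first command that exited non-zero, or None if all passed."""
    for cmd, code, _ in results:
        if code != 0:
            return cmd
    return None
```

Calling `run_triage()` on a cluster node (typically as root) returns the captured output for review. A zero exit code only means the command ran, so the output still has to be read for fencing errors.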

Step-by-Step Diagnosis

Validate network reachability to fence endpoints, confirm credentials, and review fencing topology configuration.
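For fence devices managed over a TCP service (for example an HTTPS or SSH management interface), reachability can be probed from each cluster node. This is a minimal sketch under that assumption; the function name `is_reachable` is hypothetical and the host and port must be supplied for your own device.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

A call such as `is_reachable("bmc-node1.example.com", 443)` uses a placeholder hostname. A TCP probe does not apply to devices reached over UDP, such as IPMI over LAN, and it does not validate credentials; for those checks, run the fence agent's own status action.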

Figure: Reviewing Pacemaker and fencing agent timeout diagnostics (illustrative mockup).

Solution – Primary Fix

Update STONITH device parameters and timeout values, then retest fencing and recover cluster resources.
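As an illustration of the parameter update, the sketch below builds the `pcs stonith update` command line from a dictionary of options. The helper name `build_stonith_update`, the device ID `fence-node1`, and the timeout value are hypothetical; appropriate values depend on how long your fence device takes to respond.

```python
def build_stonith_update(device_id, params):
    """Return the argv for 'pcs stonith update <device_id> key=value ...'."""
    if not params:
        raise ValueError("at least one parameter is required")
    options = [f"{key}={value}" for key, value in params.items()]
    return ["pcs", "stonith", "update", device_id] + options


# Example with placeholder values (not a recommendation):
# build_stonith_update("fence-node1", {"pcmk_reboot_timeout": "120s"})
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a cluster node. Retesting with `pcs stonith fence <node>` really does fence the target node, so run it only in a maintenance window.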

Figure: Correcting STONITH parameters and restoring healthy cluster quorum (illustrative mockup).

Solution – Alternative Approaches

Introduce redundant fencing paths or alternate agents supported by the hardware platform.
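Redundant fencing paths are usually expressed as fencing levels, where Pacemaker tries the next level only if the current one fails. The sketch below generates the `pcs stonith level add` commands for one node. The helper name, node name, and device IDs are hypothetical, and the devices must already exist as STONITH resources.

```python
def build_fencing_levels(node, devices_by_level):
    """Return one 'pcs stonith level add' argv per level, lowest level first.

    devices_by_level maps a level number to the list of stonith device IDs
    tried together at that level.
    """
    commands = []
    for level in sorted(devices_by_level):
        commands.append(
            ["pcs", "stonith", "level", "add", str(level), node]
            + list(devices_by_level[level])
        )
    return commands


# Example with placeholder names: management-board fencing first, then a PDU.
# build_fencing_levels("node1", {1: ["fence-ipmi-node1"], 2: ["fence-pdu-node1"]})
```

Passing the device IDs as separate arguments is an assumption about the pcs version in use; check `pcs stonith level add --help` on your system before running the generated commands.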

Verification & Acceptance Criteria

Manual fence tests succeed, quorum remains stable, and failover scenarios complete predictably.

Rollback Plan

Revert to the previous pcs configuration backup and restore the prior validated fencing definitions.
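A rollback is only possible if a backup was taken before the change. The sketch below pairs a timestamped `pcs config backup` command with the matching `pcs config restore` command. The helper names and the `fencing-change` prefix are hypothetical, and the `.tar.bz2` suffix on the restore path assumes pcs appends that extension to the backup name.

```python
from datetime import datetime


def backup_name(prefix="fencing-change", now=None):
    """Return a timestamped backup name such as fencing-change-20240102-030405."""
    now = now or datetime.now()
    return f"{prefix}-{now:%Y%m%d-%H%M%S}"


def rollback_plan(name):
    """Return the backup command to run before the change and the restore command."""
    return {
        "backup": ["pcs", "config", "backup", name],
        # Assumption: pcs writes the archive as <name>.tar.bz2.
        "restore": ["pcs", "config", "restore", name + ".tar.bz2"],
    }
```

Run the backup command before editing any STONITH definitions and keep the archive off the cluster nodes. Check the pcs manual page for restore preconditions, such as whether the cluster must be stopped first.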

Prevention & Hardening

Schedule regular fence validation drills and monitor pacemaker event logs for early drift indicators.

Related Errors & Cross-Refs

Fencing failures are commonly connected to DNS resolution issues, management network ACL changes, and certificate expiry.

References & Further Reading

Consult Red Hat High Availability and Pacemaker fencing best-practice documentation for RHEL 8.
