Symptom & Impact
`crm status` run on each node reports that node as the DC and its peer as offline or `UNCLEAN`; resources flap and STONITH fires repeatedly.
Environment & Reproduction
Triggered on SLES 16 HAE clusters that have only one corosync ring (link), connected through a switch that intermittently drops or delays traffic.
Root Cause Analysis
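When corosync heartbeats are lost on the only ring, the nodes stop seeing each other. Each partition then forms its own membership, elects its own DC and, where the quorum settings let a single node keep quorum, claims quorum independently and tries to fence its peer.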
Quick Triage
Run `corosync-cfgtool -s` and `crm status` on each node to compare ring (link) status, membership and the reported DC.
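The DC comparison can be scripted. The following is a minimal sketch, not a tested tool: the helper names `dc_of` and `same_dc` are hypothetical, `node1` and `node2` are placeholder hostnames, and the parsing assumes the `Current DC:` summary line that `crm_mon -1` prints in recent Pacemaker releases.

```shell
#!/bin/sh
# Hypothetical triage helpers (names are illustrative, not part of crmsh).

# dc_of: read `crm_mon -1` output on stdin and print the DC node name.
# Assumes a summary line of the form "... Current DC: <node> ...".
dc_of() {
    awk '/Current DC:/ {
        for (i = 1; i < NF; i++)
            if ($i == "DC:") { print $(i + 1); exit }
    }'
}

# same_dc: succeed only if at least one DC name is given and all
# arguments (one DC name per node) are identical and non-empty.
same_dc() {
    [ "$#" -gt 0 ] || return 1
    first=$1
    [ -n "$first" ] || return 1
    for dc in "$@"; do
        [ "$dc" = "$first" ] || return 1
    done
    return 0
}

# Usage sketch (assumes SSH access to both placeholder hosts):
#   dc1=$(ssh node1 crm_mon -1 | dc_of)
#   dc2=$(ssh node2 crm_mon -1 | dc_of)
#   same_dc "$dc1" "$dc2" || echo "nodes disagree on the DC: $dc1 vs $dc2"
```

If the nodes disagree on the DC, the cluster is partitioned and the ring status from `corosync-cfgtool -s` is the next thing to inspect.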
Step-by-Step Diagnosis
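Tail `/var/log/cluster/corosync.log` and `/var/log/pacemaker.log` during the event and look for membership changes and fencing requests around the time the nodes lose contact. On recent Pacemaker releases the log may instead be at `/var/log/pacemaker/pacemaker.log`.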

Solution – Primary Fix
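Add a second corosync ring on an independent network path, or a QDevice arbitrator, then run `crm cluster restart` on each node, one at a time, so the new configuration takes effect.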
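Before restarting, it can help to confirm that every node entry really has a second address. The sketch below assumes the `nodelist` in `corosync.conf` uses `ring0_addr` and `ring1_addr` keys, as in a typical two-link setup; the function name `has_redundant_links` is hypothetical, and the check is a simple line count rather than a full parser.

```shell
#!/bin/sh
# has_redundant_links: hypothetical helper, not part of corosync or crmsh.
# Succeeds (exit 0) if the file has at least one ring0_addr entry and the
# same number of ring1_addr entries; exit 2 if the file is unreadable.
has_redundant_links() {
    conf=${1:-/etc/corosync/corosync.conf}
    [ -r "$conf" ] || return 2
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    n0=$(grep -c '^[[:space:]]*ring0_addr:' "$conf" || true)
    n1=$(grep -c '^[[:space:]]*ring1_addr:' "$conf" || true)
    [ "$n0" -gt 0 ] && [ "$n0" -eq "$n1" ]
}

# Usage sketch:
#   has_redundant_links /etc/corosync/corosync.conf \
#       || echo "some node entries have no ring1_addr"
```

A passing check only shows that the second address is configured; whether the link is actually up still has to be confirmed with `corosync-cfgtool -s` after the restart.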

Solution – Alternative Approaches
As an interim measure, use hardware-watchdog-based STONITH (for example `sbd`) so that a node that loses contact is fenced cleanly until the heartbeat network stabilises.
Verification & Acceptance Criteria
`crm_mon -1` reports the same single DC on every node, and all resources are `Started` on their expected nodes.
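Part of this acceptance check can be automated roughly as follows. This is a sketch under assumptions: the function name `cluster_looks_healthy` is hypothetical, and it relies on the `Current DC:`, `partition with quorum`, `OFFLINE` and `UNCLEAN` strings that `crm_mon -1` prints in recent Pacemaker releases. It does not verify resource placement, which still needs a manual look.

```shell
#!/bin/sh
# cluster_looks_healthy: hypothetical helper that reads `crm_mon -1`
# output on stdin. Succeeds only if there is exactly one DC line, that
# line reports quorum, and no node is listed as OFFLINE or UNCLEAN.
cluster_looks_healthy() {
    out=$(cat)

    # Exactly one "Current DC:" line is expected.
    dc_lines=$(printf '%s\n' "$out" | grep -c 'Current DC:' || true)
    [ "$dc_lines" -eq 1 ] || return 1

    # The DC's partition must have quorum (case-sensitive match, so
    # "partition WITHOUT quorum" does not pass).
    printf '%s\n' "$out" | grep 'Current DC:' \
        | grep -q 'partition with quorum' || return 1

    # Any offline or unclean node fails the check.
    if printf '%s\n' "$out" | grep -Eq 'UNCLEAN|OFFLINE'; then
        return 1
    fi
    return 0
}

# Usage sketch (run on every node):
#   crm_mon -1 | cluster_looks_healthy && echo "membership looks healthy"
```

Running it on every node matters, because during a split each node can look healthy from its own point of view except for the peer it reports as offline.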
Rollback Plan
If recovery is slow, put the cluster into maintenance mode, restart corosync on one node at a time, and take the cluster out of maintenance mode once membership is stable.
Prevention & Hardening
Run periodic ring health checks with `corosync-cfgtool -s` and raise Prometheus alerts when a link is reported as faulty.
Related Errors & Cross-Refs
Often seen together with `sbd` fencing alarms, which are raised when watchdog timeouts expire.
References & Further Reading
SUSE Linux Enterprise High Availability Extension administration guide.