Corosync + Pacemaker - For high availability clusters

How to switch the VIP to another host ?

  1. identify the master node : it currently holds the VIP and will have to release it
  2. as root, on the master node, release the VIP by putting the node (temporarily) in standby :
    crm node standby
  3. make sure the VIP actually switched to another host :
    crm status
    the VIP resource should now be reported as Started on the other node
  4. still on the same node, put it back online (it rejoins as the slave) :
    crm node online
  5. check that the master / slave roles have switched :
    crm status
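
The steps above could be wrapped in a small script. This is only a sketch, to be run as root on the master node : the function names and the SETTLE_SECONDS variable are made up for illustration, VIP_SQL is the resource name used in the examples of this page, and the 5 second pause is an arbitrary stand-in for "a few seconds".

```shell
# Print the node running a resource, reading `crm status` output on stdin.
# Assumes lines shaped like: "VIP_SQL (ocf::heartbeat:IPaddr2): Started nodeA"
vip_holder() {
    awk -v res="${1:-VIP_SQL}" '$1 == res && $3 == "Started" { print $4 }'
}

# Hypothetical helper: standby, check, back online (steps 2 to 4 above).
# Returns 0 only if the VIP was seen on a different node after the standby.
switch_vip() {
    before=$(crm status | vip_holder)
    crm node standby                      # step 2: release the VIP
    sleep "${SETTLE_SECONDS:-5}"          # arbitrary settle time
    after=$(crm status | vip_holder)      # step 3: who holds the VIP now ?
    crm node online                       # step 4: rejoin the cluster
    echo "VIP holder: ${before:-?} -> ${after:-?}"
    [ -n "$after" ] && [ "$after" != "$before" ]
}
```

Usage : `switch_vip && echo switched`. Run on the slave instead of the master, it should report the same holder twice and return a non-zero status.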

How to toggle a node standby / online ?

As root, on the node you want to alter (tested from nodeA) :

standby :

  1. Before, crm status reports :
    
    Online: [ nodeA nodeB ]				both nodes are online
    
     Resource Group: RESOURCES-SQL
    	VIP_SQL		(ocf::heartbeat:IPaddr2):	Started nodeA
     Master/Slave Set: MasterMaster_MYSQL [MYSQL]
    	Masters: [ nodeA ]				nodeA is the master
    	Slaves: [ nodeB ]				nodeB is the slave
  2. put the node in standby :
    • the current node : crm node standby
    • any node : crm node standby nodeName
  3. after a few seconds, check :
    crm status
    
    Node nodeA: standby					nodeA is in standby (still a cluster member, but runs no resources)
    Online: [ nodeB ]					nodeB is the only one online
    
     Resource Group: RESOURCES-SQL
    	VIP_SQL		(ocf::heartbeat:IPaddr2):	Started nodeB
     Master/Slave Set: MasterMaster_MYSQL [MYSQL]
    	Masters: [ nodeB ]				nodeB has become the master
    	Stopped: [ nodeA ]
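
To check the standby state from a script rather than by eye, the "Node ...: standby" line of the output above can be filtered. A minimal sketch, assuming the output format of this Pacemaker version (other versions may print it differently) ; standby_nodes is a made-up name.

```shell
# List the nodes reported as standby, reading `crm status` output on stdin.
# Assumes lines shaped like: "Node nodeA: standby"
standby_nodes() {
    awk '$1 == "Node" && $NF == "standby" { sub(/:$/, "", $2); print $2 }'
}
```

Usage : `crm status | standby_nodes` prints one node name per line, or nothing when no node is in standby.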

back online :

  1. "Before" status : see above
  2. crm node online
  3. after a few seconds, check :
    crm status
    
    Online: [ nodeA nodeB ]				both nodes are back online
    
     Resource Group: RESOURCES-SQL
    	VIP_SQL		(ocf::heartbeat:IPaddr2):	Started nodeB
     Master/Slave Set: MasterMaster_MYSQL [MYSQL]
    	Masters: [ nodeB ]				nodeB stays the master
    	Slaves: [ nodeA ]				nodeA is now the slave
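
"After a few seconds" can be replaced by polling the "Online:" line until the node shows up. A sketch only : is_online and wait_until_online are hypothetical names, the default of 10 tries is arbitrary, and the parsing assumes the "Online: [ ... ]" format shown above.

```shell
# Succeed if the given node appears in the "Online: [ ... ]" line
# of the `crm status` output read on stdin.
is_online() {
    awk -v node="$1" '
        $1 == "Online:" { for (i = 3; i < NF; i++) if ($i == node) found = 1 }
        END { exit !found }'
}

# Poll `crm status` once per second until the node is online.
# $1 = node name, $2 = maximum number of tries (arbitrary default: 10).
wait_until_online() {
    node=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if crm status | is_online "$node"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

Usage : `crm node online && wait_until_online nodeA`.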

How to identify the master / slave nodes ?

As root, on either node (both report the same cluster state) :

On nodeA :

crm status

============
Last updated: Tue Feb 14 10:27:56 2017
Last change: Tue Feb 14 10:05:49 2017 via crm_attribute on nodeB
Stack: openais
Current DC: nodeB - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ nodeA nodeB ]		both nodes are online

 Resource Group: RESOURCES-SQL
	VIP_SQL		(ocf::heartbeat:IPaddr2):	Started nodeA
 Master/Slave Set: MasterMaster_MYSQL [MYSQL]
	Masters: [ nodeA ]		nodeA is the master
	Slaves: [ nodeB ]		nodeB is the slave

On nodeB :

crm status

============
Last updated: Tue Feb 14 10:27:58 2017
Last change: Tue Feb 14 10:05:49 2017 via crm_attribute on nodeB
Stack: openais
Current DC: nodeB - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ nodeA nodeB ]

 Resource Group: RESOURCES-SQL
	VIP_SQL		(ocf::heartbeat:IPaddr2):	Started nodeA
 Master/Slave Set: MasterMaster_MYSQL [MYSQL]
	Masters: [ nodeA ]
	Slaves: [ nodeB ]

crm_mon -1 prints the same one-shot status, but is harder to remember.
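
The answer can also be extracted from the "Masters:" / "Slaves:" lines with a small filter. A minimal sketch : role_nodes is a made-up name, and it assumes a single Master/Slave set, as in the output above (with several sets, it would print the nodes of all of them).

```shell
# Print the nodes holding a role, reading `crm status` output on stdin.
# $1 = "Masters" or "Slaves". Assumes lines shaped like: "Masters: [ nodeA ]"
role_nodes() {
    awk -v role="$1:" '
        $1 == role { for (i = 3; i <= NF && $i != "]"; i++) print $i }'
}
```

Usage : `crm status | role_nodes Masters`, or `crm_mon -1 | role_nodes Slaves`.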

crm status reports Failed actions

Situation

crm status returns :

Failed actions:
	MYSQL:1_monitor_31000 (node=nodeA, call=59, rc=1, status=complete): unknown error

Details

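Root cause not investigated. The line itself says that the recurring monitor operation of MYSQL:1 (interval : 31000 ms) failed on nodeA with rc=1, the generic OCF error, which is displayed as "unknown error".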

Solution

"Fixing" this was as simple as :
  1. crm_resource -P				-P is short for --reprobe
    Waiting for 1 replies from the CRMd. OK
  2. crm status
    The Failed actions message is gone.
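
To watch for this from a script, the failed action lines can be reduced to "operation node rc". A sketch only, assuming the exact line format shown in the Situation above (later Pacemaker versions print failures differently) ; failed_actions is a made-up name.

```shell
# Summarize failed actions, reading `crm status` output on stdin.
# Assumes lines shaped like:
#   "MYSQL:1_monitor_31000 (node=nodeA, call=59, rc=1, status=complete): unknown error"
# and prints: "<operation> <node> <rc>"
failed_actions() {
    sed -n 's/^[[:space:]]*\([^[:space:]]*\) (node=\([^,]*\), call=[0-9]*, rc=\([0-9]*\),.*/\1 \2 \3/p'
}
```

Usage : `crm status | failed_actions` prints one line per failed action, and nothing once the Failed actions message is gone.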

Corosync + Pacemaker

In essence, Corosync enables servers to communicate as a cluster, while Pacemaker provides the ability to control how the cluster behaves.

Corosync (source) :

The Corosync APIs provide the following capabilities to projects such as Apache Qpid and Pacemaker :
  • membership : a list of peers
  • messaging : the ability to talk to processes on those peers
  • quorum : do we have a majority ?
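
The "majority" question is plain arithmetic : more than half of the expected votes must be present. The sketch below only illustrates that rule ; has_majority is a made-up name, not a Corosync command, and Corosync's real quorum handling has more options than this. How a cluster reacts to losing quorum depends on its configuration, which this page does not show.

```shell
# Succeed when the votes present are a strict majority of the expected votes.
# $1 = votes present, $2 = expected votes
has_majority() {
    present=$1
    expected=$2
    [ $((present * 2)) -gt "$expected" ]
}
```

Usage : with the "2 expected votes" reported by crm status on this cluster, `has_majority 2 2` succeeds while `has_majority 1 2` fails.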

Pacemaker :

to do