{"id":450,"date":"2017-08-04T13:21:41","date_gmt":"2017-08-04T03:21:41","guid":{"rendered":"https:\/\/icicimov.com\/blog\/?p=450"},"modified":"2017-08-05T00:24:02","modified_gmt":"2017-08-04T14:24:02","slug":"pacemaker-vm-cluster-fencing-in-proxmox-with-fence_pve","status":"publish","type":"post","link":"https:\/\/icicimov.com\/blog\/?p=450","title":{"rendered":"Pacemaker VM cluster fencing in Proxmox with fence_pve"},"content":{"rendered":"<p>We can use the <code>fence_pve<\/code> agent to fence\/stonith peers in a Pacemaker cluster running on VMs in Proxmox PVE host(s). This has been tested on <code>Ubuntu-14.04<\/code> with <code>Pacemaker-1.1.12<\/code> from the Hastexo PPA repository. Use:<\/p>\n<pre><code>$ sudo add-apt-repository ppa:hastexo\/ha\n<\/code><\/pre>\n<p>to add it and then run:<\/p>\n<pre><code>$ sudo aptitude update\n$ sudo aptitude install pacemaker=1.1.12-0ubuntu2 libcib4 libcrmcluster4 \\\n  libcrmcommon3 libcrmservice1 liblrmd1 libpe-rules2 libpe-status4 \\\n  libpengine4 libstonithd2 libtransitioner2 pacemaker-cli-utils\n<\/code><\/pre>\n<p>to install the needed packages. Accept the solution that removes <code>libcib3<\/code> during the process.<\/p>\n<p>OS and Pacemaker details in the VMs:<\/p>\n<pre><code>root@sl01:~# lsb_release -a\nNo LSB modules are available.\nDistributor ID: Ubuntu\nDescription:    Ubuntu 14.04.5 LTS\nRelease:    14.04\nCodename:   trusty\n\nroot@sl01:~# dpkg -l pacemaker | grep ^ii\nii  pacemaker    1.1.12-0ubuntu2   amd64      HA cluster resource manager\n<\/code><\/pre>\n<p>For the fencing agent I used the current PVE <code>fence-agents-4.0.20<\/code> release from the proxmox\/fence-agents-pve repository on GitHub; download, extract and build it:<\/p>\n<pre><code>$ wget https:\/\/github.com\/proxmox\/fence-agents-pve\/raw\/master\/fence-agents-4.0.20.tar.gz\n$ tar -xzf fence-agents-4.0.20.tar.gz\n$ cd fence-agents-4.0.20\n$ .\/autogen.sh\n$ sudo pip install suds\n$ .\/configure\n$ make\n$ sudo make install\n<\/code><\/pre>\n<p>The agents are installed under <code>\/usr\/sbin\/<\/code>. 
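<\/p>\n<p>Note that Pacemaker only invokes fencing devices when STONITH is enabled cluster-wide, so make sure the relevant cluster properties are set before relying on the devices below. A minimal sketch using <code>crm<\/code> (the <code>stonith-timeout<\/code> value here is just an illustrative choice, not from the original setup):<\/p>\n<pre><code>$ sudo crm configure property stonith-enabled=true\n$ sudo crm configure property stonith-timeout=60s\n<\/code><\/pre>\n<p>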
To get the resource metadata (the supported input parameters) run:<\/p>\n<pre><code>$ \/usr\/sbin\/fence_pve -o metadata\n<\/code><\/pre>\n<p>Run a manual test to confirm the fence_pve agent is working; here I&#8217;ll check the status of the two VMs the Pacemaker cluster is running on:<\/p>\n<pre><code>$ \/usr\/sbin\/fence_pve --ip=192.168.0.100 --nodename=virtual --username=root@pam --password=&lt;password&gt; --plug=126 --action=status\nStatus: ON\n$ \/usr\/sbin\/fence_pve --ip=192.168.0.100 --nodename=virtual --username=root@pam --password=&lt;password&gt; --plug=149 --action=status\nStatus: ON\n<\/code><\/pre>\n<p>Now we need to link the agent into Pacemaker&#8217;s stonith plugin directory:<\/p>\n<pre><code>$ sudo mkdir -p \/usr\/lib\/stonith\/plugins\/pve\n$ sudo ln -s \/usr\/sbin\/fence_pve \/usr\/lib\/stonith\/plugins\/pve\/fence_pve\n<\/code><\/pre>\n<p>and configure the primitives (on one of the nodes). Note the <code>delay=\"15\"<\/code> on only one of them: it staggers the fencing actions so that in a split-brain the two nodes don&#8217;t shoot each other simultaneously:<\/p>\n<pre><code>primitive p_fence_sl01 stonith:fence_pve \\\n    params ipaddr=\"192.168.0.100\" inet4_only=\"true\" node_name=\"virtual\" \\\n           login=\"root@pam\" passwd=\"&lt;password&gt;\" port=\"126\" delay=\"15\" action=\"reboot\" \\\n    op monitor interval=\"60s\" \\\n    meta target-role=\"Started\" is-managed=\"true\"\nprimitive p_fence_sl02 stonith:fence_pve \\\n    params ipaddr=\"192.168.0.100\" inet4_only=\"true\" node_name=\"virtual\" \\\n           login=\"root@pam\" passwd=\"&lt;password&gt;\" port=\"149\" action=\"reboot\" \\\n    op monitor interval=\"60s\" \\\n    meta target-role=\"Started\" is-managed=\"true\"\nlocation l_fence_sl01 p_fence_sl01 -inf: sl01\nlocation l_fence_sl02 p_fence_sl02 -inf: sl02\n<\/code><\/pre>\n<p>Now if we check the cluster status:<\/p>\n<pre><code>root@sl01:~# crm status\nLast updated: Fri Aug  4 13:57:23 2017\nLast change: Fri Aug  4 13:56:52 2017 via crmd on sl01\nStack: corosync\nCurrent DC: sl02 (2) - partition with quorum\nVersion: 1.1.12-561c4cf\n2 Nodes configured\n10 Resources configured\n\nOnline: [ sl01 
sl02 ]\n\n Master\/Slave Set: ms_drbd [p_drbd_r0]\n     Masters: [ sl01 sl02 ]\n Clone Set: cl_dlm [p_controld]\n     Started: [ sl01 sl02 ]\n Clone Set: cl_fs_gfs2 [p_fs_gfs2]\n     Started: [ sl01 sl02 ]\n p_fence_sl01   (stonith:fence_pve):    Started sl02 \n p_fence_sl02   (stonith:fence_pve):    Started sl01 \n Clone Set: cl_clvmd [p_clvmd]\n     Started: [ sl01 sl02 ]\n<\/code><\/pre>\n<p>we can see the fencing devices up and ready.<\/p>\n<h3>Testing<\/h3>\n<p>I will shut down corosync on node <code>sl01<\/code> to simulate a failure, and monitor the status of the VMs and the cluster logs on node <code>sl02<\/code>:<\/p>\n<pre><code>root@sl02:~# while true; do echo -n \"126: \" &amp;&amp; \/usr\/sbin\/fence_pve --ip=192.168.0.100 --nodename=virtual --username=root@pam --password=&lt;password&gt; --plug=126 --action=status; echo -n \"149: \" &amp;&amp; \/usr\/sbin\/fence_pve --ip=192.168.0.100 --nodename=virtual --username=root@pam --password=&lt;password&gt; --plug=149 --action=status; sleep 1; done\n126: Status: ON\n149: Status: ON\n...\n126: Status: ON\n149: Status: ON\n126: Status: OFF\n149: Status: ON\n126: Status: ON\n^C\nroot@sl02:~#\n<\/code><\/pre>\n<p>We can see the <code>sl01<\/code> VM being restarted (plug 126 goes OFF and then back ON), and in the logs:<\/p>\n<pre><code>Aug  4 14:22:04 sl02 corosync[1173]:   [MAIN  ] Completed service synchronization, ready to provide service.\nAug  4 14:22:04 sl02 dlm_controld[22329]: 82908 fence request 1 pid 6631 nodedown time 1501820524 fence_all dlm_stonith\nAug  4 14:22:04 sl02 kernel: [82908.857103] dlm: closing connection to node 1\n...\nAug  4 14:22:05 sl02 pengine[1230]:  warning: process_pe_message: Calculated Transition 102: \/var\/lib\/pacemaker\/pengine\/pe-warn-0.bz2\nAug  4 14:22:05 sl02 crmd[1232]:   notice: te_fence_node: Executing reboot fencing operation (64) on sl01 (timeout=60000)\nAug  4 14:22:05 sl02 crmd[1232]:   notice: te_rsc_command: Initiating action 78: notify p_drbd_r0_pre_notify_demote_0 on sl02 
(local)\nAug  4 14:22:05 sl02 stonithd[1227]:   notice: handle_request: Client crmd.1232.44518730 wants to fence (reboot) 'sl01' with device '(any)'\nAug  4 14:22:05 sl02 stonithd[1227]:   notice: initiate_remote_stonith_op: Initiating remote operation reboot for sl01: 9b1fe415-c935-4acf-bb43-6ffd9183e5f8 (0)\nAug  4 14:22:05 sl02 crmd[1232]:   notice: process_lrm_event: Operation p_drbd_r0_notify_0: ok (node=sl02, call=103, rc=0, cib-update=0, confirmed=true)\nAug  4 14:22:06 sl02 stonithd[1227]:   notice: can_fence_host_with_device: p_fence_sl01 can fence (reboot) sl01: dynamic-list\n...\nAug  4 14:22:28 sl02 stonithd[1227]:   notice: log_operation: Operation 'reboot' [6684] (call 2 from crmd.1232) for host 'sl01' with device 'p_fence_sl01' returned: 0 (OK)\nAug  4 14:22:28 sl02 stonithd[1227]:  warning: get_xpath_object: No match for \/\/@st_delegate in \/st-reply\nAug  4 14:22:28 sl02 stonithd[1227]:   notice: remote_op_done: Operation reboot of sl01 by sl02 for crmd.1232@sl02.9b1fe415: OK\nAug  4 14:22:28 sl02 crmd[1232]:   notice: tengine_stonith_callback: Stonith operation 2\/64:102:0:7d571539-fab2-43fe-8574-ebfb48664083: OK (0)\nAug  4 14:22:28 sl02 crmd[1232]:   notice: tengine_stonith_notify: Peer sl01 was terminated (reboot) by sl02 for sl02: OK (ref=9b1fe415-c935-4acf-bb43-6ffd9183e5f8) by client crmd.1232\n...\nAug  4 14:22:55 sl02 crm-fence-peer.sh[6913]: INFO peer is fenced, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-r0-ms_drbd'\nAug  4 14:22:55 sl02 kernel: [82959.650435] drbd r0: helper command: \/sbin\/drbdadm fence-peer r0 exit code 7 (0x700)\nAug  4 14:22:55 sl02 kernel: [82959.650453] drbd r0: fence-peer() = 7 &amp;&amp; fencing != Stonith !!!\nAug  4 14:22:55 sl02 kernel: [82959.650549] drbd r0: fence-peer helper returned 7 (peer was stonithed)\n...\n<\/code><\/pre>\n<p>we can see STONITH in operation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We can use the fence_pve agent to fence\/stonith peers in Pacemaker 
cluster running on VM&#8217;s in Proxmox PVE host(s). This works and has been tested on Ubuntu-14.04 with Pacemaker-1.1.12 from Hastexo PPA repository. Use: $ sudo add-apt-repository ppa:hastexo\/ha to add&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13],"tags":[26,25,20,23],"class_list":["post-450","post","type-post","status-publish","format-standard","hentry","category-virtualization","tag-cluster","tag-high-availability","tag-pacemaker","tag-proxmox"],"_links":{"self":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/450","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=450"}],"version-history":[{"count":1,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/450\/revisions"}],"predecessor-version":[{"id":451,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/450\/revisions\/451"}],"wp:attachment":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=450"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=450"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=450"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}