{"id":400,"date":"2017-02-09T11:57:43","date_gmt":"2017-02-09T00:57:43","guid":{"rendered":"https:\/\/icicimov.com\/blog\/?p=400"},"modified":"2017-02-23T18:05:34","modified_gmt":"2017-02-23T07:05:34","slug":"postgresql-high-availibility-with-pacemaker","status":"publish","type":"post","link":"https:\/\/icicimov.com\/blog\/?p=400","title":{"rendered":"PostgreSQL High Availibility with Pacemaker"},"content":{"rendered":"<p>[serialposts]<\/p>\n<p>Setting up PostgreSQL synchronous or asynchronous replication cluster with Pacemaker is described in couple of resources like the official Pacemaker site <a href=\"http:\/\/wiki.clusterlabs.org\/wiki\/PgSQL_Replicated_Cluster\">PgSQL Replicated Cluster<\/a> and the GitHub wiki of the OCF agent creator <a href=\"https:\/\/github.com\/t-matsuo\/resource-agents\/wiki\/Resource-Agent-for-PostgreSQL-9.1-streaming-replication\">Resource Agent for PostgreSQL 9.1 streaming replication<\/a>. My setup is displayed in the ASCII chart below:<\/p>\n<pre><code>                        GW:192.168.0.1\/24\n                               |\n  VIP(Master):192.168.0.241\/24 | VIP(Slave):192.168.0.242\/24\n  ------------------------------------------------\n         |                              |\n         |eth0:192.168.0.134\/24         |eth0:192.168.0.135\/24\n   ------------                   ------------ \n   | oattic01 |                   | oattic02 |\n   ------------                   ------------\n      |   |eth1:10.10.1.16\/24        |     |eth1:10.10.1.17\/24\n   ---|---x--------------------------|-----x------\n   ---x------------------------------x------------\n      eth2:10.20.1.10\/24             eth2:10.20.1.18\/24\n             VIP(Replication):10.20.1.200\/24\n<\/code><\/pre>\n<p>The <code>10.10.1.0\/24<\/code> network will be used for <code>Corosync<\/code> communication ie the primary <code>ring0<\/code> channel. 
The <code>10.20.1.0\/24<\/code> network will be used for the PostgreSQL replication link and as a secondary standby <code>ring1<\/code> for Corosync in case of ring0 failure. The <code>192.168.0.0\/24<\/code> is the network where the clustered PostgreSQL service will be available to the clients. This network is also used to provide Internet access to the boxes. The <code>Master VIP<\/code> is the IP on which the PGSQL service will be provided in the cluster. The <code>Slave VIP<\/code> is a read-only IP that will always be associated with the Slave, so the applications can use it for read operations only (this is optional and can be dropped from the setup if not needed). The <code>Replication VIP<\/code> is the IP on which the PGSQL replication will be running, over a separate link from the service one to avoid any mutual interference.<\/p>\n<p>The PostgreSQL version installed is 9.3.5 in this case.<\/p>\n<h1>Streaming Replication Setup<\/h1>\n<p>Create our PostgreSQL streaming replication configuration file <code>\/etc\/postgresql\/9.3\/main\/custom.conf<\/code> on <code>oattic01<\/code>, which we want to be our initial Master:<\/p>\n<pre><code># REPLICATION AND ARCHIVING #\nlisten_addresses = '*'\nlog_line_prefix = '%t [%p] %u@%d '\nwal_level = hot_standby\narchive_mode = on\narchive_command = 'test ! 
-f \/var\/lib\/postgresql\/9.3\/main\/pg_archive\/%f &amp;&amp; cp %p \/var\/lib\/postgresql\/9.3\/main\/pg_archive\/%f'\nmax_wal_senders = 5\nwal_keep_segments = 32\ncheckpoint_segments = 16\nhot_standby = on\nhot_standby_feedback = on\nwal_sender_timeout = 5000\nwal_receiver_status_interval = 2\nmax_standby_streaming_delay = -1\nmax_standby_archive_delay = -1\nsynchronous_commit = on\nrestart_after_crash = off\n<\/code><\/pre>\n<p>and include it at the bottom of the PGSQL configuration file <code>\/etc\/postgresql\/9.3\/main\/postgresql.conf<\/code>:<\/p>\n<pre><code>include_if_exists = 'custom.conf'   # include file only if it exists\n<\/code><\/pre>\n<p>Then create a replication user:<\/p>\n<pre><code>root@oattic01:~$ sudo -u postgres psql -c \"CREATE ROLE replication WITH REPLICATION PASSWORD 'password' LOGIN;\"\n<\/code><\/pre>\n<p>and restart the service:<\/p>\n<pre><code>user@oattic01:~$ sudo service postgresql restart\n<\/code><\/pre>\n<p>Then we move to <code>oattic02<\/code>, stop the service and perform an initial sync with the <code>oattic01<\/code> database:<\/p>\n<pre><code>user@oattic02:~$ sudo service postgresql stop\n\nuser@oattic02:~$ sudo -u postgres rm -rf \/var\/lib\/postgresql\/9.3\/main\/\n\nuser@oattic02:~$ sudo -u postgres pg_basebackup -h 10.20.1.10 -D \/var\/lib\/postgresql\/9.3\/main -U replication -v -P\n57253\/57253 kB (100%), 1\/1 tablespace                                         \nNOTICE:  pg_stop_backup complete, all required WAL segments have been archived\npg_basebackup: base backup completed\n<\/code><\/pre>\n<p>Then create the same streaming replication file <code>\/etc\/postgresql\/9.3\/main\/custom.conf<\/code> on <code>oattic02<\/code>.<\/p>\n<p>To test the replication, we create a recovery file <code>\/var\/lib\/postgresql\/9.3\/main\/recovery.conf<\/code> with the following content:<\/p>\n<pre><code>standby_mode = 'on'\nprimary_conninfo = 'host=10.20.1.10 port=5432 user=postgres application_name=oattic02'\nrestore_command = 'cp 
\/var\/lib\/postgresql\/9.3\/main\/pg_archive\/%f %p'\nrecovery_target_timeline='latest'\n<\/code><\/pre>\n<p>and start the service:<\/p>\n<pre><code>user@oattic02:~$ sudo service postgresql start\n * Starting PostgreSQL 9.3 database server           [OK ] \n<\/code><\/pre>\n<p>while monitoring the DB log file at the same time:<\/p>\n<pre><code>user@oattic02:~$ tail -f \/var\/log\/postgresql\/postgresql-9.3-main.log\n2017-02-08 16:50:19 AEDT [6019] [unknown]@[unknown] LOG:  incomplete startup packet\n2017-02-08 16:50:20 AEDT [6022] postgres@postgres FATAL:  the database system is starting up\ncp: cannot stat \u2018\/var\/lib\/postgresql\/9.3\/main\/pg_archive\/00000002.history\u2019: No such file or directory\n2017-02-08 16:50:20 AEDT [6018] @ LOG:  entering standby mode\ncp: cannot stat \u2018\/var\/lib\/postgresql\/9.3\/main\/pg_archive\/000000010000000000000002\u2019: No such file or directory\n2017-02-08 16:50:20 AEDT [6027] @ LOG:  started streaming WAL from primary at 0\/2000000 on timeline 1\n2017-02-08 16:50:20 AEDT [6030] postgres@postgres FATAL:  the database system is starting up\n2017-02-08 16:50:20 AEDT [6018] @ LOG:  redo starts at 0\/2000028\n2017-02-08 16:50:20 AEDT [6018] @ LOG:  consistent recovery state reached at 0\/20000F0\n2017-02-08 16:50:20 AEDT [6017] @ LOG:  database system is ready to accept read only connections\n<\/code><\/pre>\n<p>To confirm all has gone well, we check the <code>xlog<\/code> location on the Master:<\/p>\n<pre><code>postgres@oattic01:~$ psql\npsql (9.3.15)\nType \"help\" for help.\n\npostgres=# \\l\n                                    List of databases\n      Name      |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   \n----------------+----------+----------+-------------+-------------+-----------------------\n postgres       | postgres | UTF8     | en_AU.UTF-8 | en_AU.UTF-8 | \n template0      | postgres | UTF8     | en_AU.UTF-8 | en_AU.UTF-8 | =c\/postgres          +\n                |          |  
        |             |             | postgres=CTc\/postgres\n template1      | postgres | UTF8     | en_AU.UTF-8 | en_AU.UTF-8 | =c\/postgres          +\n                |          |          |             |             | postgres=CTc\/postgres\n(3 rows)\n\npostgres=# SELECT pg_current_xlog_location();\n pg_current_xlog_location \n--------------------------\n 0\/3000208\n(1 row)\n\npostgres=# \n<\/code><\/pre>\n<p>and the xlog replay location on the current Slave oattic02:<\/p>\n<pre><code>root@oattic02:~# su - postgres\npostgres@oattic02:~$ psql\npsql (9.3.15)\nType \"help\" for help.\n\npostgres=# \\l\n                                    List of databases\n      Name      |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   \n----------------+----------+----------+-------------+-------------+-----------------------\n postgres       | postgres | UTF8     | en_AU.UTF-8 | en_AU.UTF-8 | \n template0      | postgres | UTF8     | en_AU.UTF-8 | en_AU.UTF-8 | =c\/postgres          +\n                |          |          |             |             | postgres=CTc\/postgres\n template1      | postgres | UTF8     | en_AU.UTF-8 | en_AU.UTF-8 | =c\/postgres          +\n                |          |          |             |             | postgres=CTc\/postgres\n(3 rows)\n\npostgres=# select pg_last_xlog_replay_location();\n pg_last_xlog_replay_location \n------------------------------\n 0\/3000208\n(1 row)\n\npostgres=#\n<\/code><\/pre>\n<p>Now that we are confident the streaming replication is working, we can move on to setting up the cluster.<\/p>\n<h1>Cluster Setup<\/h1>\n<p>Install the needed clustering packages on both nodes:<\/p>\n<pre><code>$ sudo aptitude install heartbeat pacemaker corosync fence-agents openais cluster-glue resource-agents\n<\/code><\/pre>\n<p>which will result in the following:<\/p>\n<pre><code>root@oattic02:~# dpkg -l | grep -E \"pacemaker|corosync|resource-agents\"\nii  corosync                             2.3.3-1ubuntu3                
       amd64        Standards-based cluster framework (daemon and modules)\nii  crmsh                                1.2.5+hg1034-1ubuntu4                all          CRM shell for the pacemaker cluster manager\nii  libcorosync-common4                  2.3.3-1ubuntu3                       amd64        Standards-based cluster framework, common library\nii  pacemaker                            1.1.10+git20130802-1ubuntu2.3        amd64        HA cluster resource manager\nii  pacemaker-cli-utils                  1.1.10+git20130802-1ubuntu2.3        amd64        Command line interface utilities for Pacemaker\nii  resource-agents                      1:3.9.3+git20121009-3ubuntu2         amd64        Cluster Resource Agents\n<\/code><\/pre>\n<p>Then create the Corosync config file <code>\/etc\/corosync\/corosync.conf<\/code> on <code>oattic01<\/code>:<\/p>\n<pre><code>totem {\n    version: 2\n\n    # How long before declaring a token lost (ms)\n    token: 3000\n\n    # How many token retransmits before forming a new configuration\n    token_retransmits_before_loss_const: 10\n\n    # How long to wait for join messages in the membership protocol (ms)\n    join: 60\n\n    # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)\n    consensus: 3600\n\n    # Turn off the virtual synchrony filter\n    vsftype: none\n\n    # Number of messages that may be sent by one processor on receipt of the token\n    max_messages: 20\n\n    # Stagger sending the node join messages by 1..send_join ms\n    send_join: 45\n\n    # Limit generated nodeids to 31-bits (positive signed integers)\n    clear_node_high_bit: yes\n\n    # Disable encryption\n    secauth: off\n\n    # How many threads to use for encryption\/decryption\n    threads: 0\n\n    # Optionally assign a fixed node id (integer)\n    # nodeid: 1234\n\n    # Cluster name, needed for DLM or DLM wouldn't start\n    cluster_name: openattic\n\n    # This specifies the mode of 
redundant ring, which may be none, active, or passive.\n    rrp_mode: passive\n\n    interface {\n        ringnumber: 0\n        bindnetaddr: 10.10.1.16\n        mcastaddr: 226.94.1.1\n        mcastport: 5405\n    }\n    interface {\n        ringnumber: 1\n        bindnetaddr: 10.20.1.10\n        mcastaddr: 226.94.41.1\n        mcastport: 5407\n    }\n    transport: udpu\n}\n\nnodelist {\n    node {\n        name: oattic01\n        nodeid: 1\n        quorum_votes: 1\n        ring0_addr: 10.10.1.16\n        ring1_addr: 10.20.1.10\n\n    }\n    node {\n        name: oattic02\n        nodeid: 2\n        quorum_votes: 1\n        ring0_addr: 10.10.1.17\n        ring1_addr: 10.20.1.18\n    }\n}\n\nquorum {\n    provider: corosync_votequorum\n    expected_votes: 2\n    two_node: 1\n    wait_for_all: 1\n}\n\namf {\n    mode: disabled\n}\n\nservice {\n    # Load the Pacemaker Cluster Resource Manager\n    # if 0: start pacemaker\n    # if 1: don't start pacemaker\n    ver:       1\n    name:      pacemaker\n}\n\naisexec {\n        user:   root\n        group:  root\n}\n\nlogging {\n        fileline: off\n        to_stderr: yes\n        to_logfile: no\n        to_syslog: yes\n        syslog_facility: daemon\n        debug: off\n        timestamp: on\n        logger_subsys {\n                subsys: QUORUM\n                debug: off\n                tags: enter|leave|trace1|trace2|trace3|trace4|trace6\n        }\n}\n<\/code><\/pre>\n<p>The file on <code>oattic02<\/code> is basically the same; we just need to replace the node-local IPs, i.e. the <code>bindnetaddr<\/code> values of the two rings, with the appropriate addresses for that node. 
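<\/p>\n<p>As a small convenience, and assuming the file layout shown above, the <code>oattic02<\/code> copy could be generated from the <code>oattic01<\/code> file with a <code>sed<\/code> sketch that swaps only the node-local <code>bindnetaddr<\/code> values:<\/p>\n<pre><code># Hypothetical helper: derive oattic02's corosync.conf from oattic01's,\n# changing only the node-local bind addresses of the two rings\nroot@oattic01:~# sed -e 's\/bindnetaddr: 10.10.1.16\/bindnetaddr: 10.10.1.17\/' \\\n    -e 's\/bindnetaddr: 10.20.1.10\/bindnetaddr: 10.20.1.18\/' \\\n    \/etc\/corosync\/corosync.conf &gt; corosync.conf.oattic02\n<\/code><\/pre>\n<p>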
After restarting the service on both nodes:<\/p>\n<pre><code>root@[ALL]:~# service corosync restart \n<\/code><\/pre>\n<p>we can see both rings as functional:<\/p>\n<pre><code>root@oattic01:~# corosync-cfgtool -s\nPrinting ring status.\nLocal node ID 1\nRING ID 0\n    id  = 10.10.1.16\n    status  = ring 0 active with no faults\nRING ID 1\n    id  = 10.20.1.10\n    status  = ring 1 active with no faults\n\nroot@oattic02:~# corosync-cfgtool -s\nPrinting ring status.\nLocal node ID 2\nRING ID 0\n    id  = 10.10.1.17\n    status  = ring 0 active with no faults\nRING ID 1\n    id  = 10.20.1.18\n    status  = ring 1 active with no faults\n<\/code><\/pre>\n<p>and the quorum between the nodes established:<\/p>\n<pre><code>root@oattic01:~# corosync-quorumtool -l\n\nMembership information\n----------------------\n    Nodeid      Votes Name\n         1          1 10.10.1.16 (local)\n         2          1 10.10.1.17\n<\/code><\/pre>\n<p>Add the needed permissions for remote access to PostgreSQL for the <code>replication<\/code> and <code>openatticpgsql<\/code> users (this cluster will be used for the OpenATTIC database in this case):<\/p>\n<pre><code>root@[ALL]:~# vi \/etc\/postgresql\/9.3\/main\/pg_hba.conf\n[...]\nhost    replication     all             10.20.1.0\/24    trust\nhost    openatticpgsql  openatticpgsql  10.20.1.0\/24    md5\n<\/code><\/pre>\n<p>If not done already, i.e. if the previous testing of replication was skipped, perform the initial sync of the Slave as described above:<\/p>\n<pre><code>user@oattic02:~$ sudo service postgresql stop\nuser@oattic02:~$ sudo -u postgres rm -rf \/var\/lib\/postgresql\/9.3\/main\/\nuser@oattic02:~$ sudo -u postgres pg_basebackup -h 10.20.1.10 -D \/var\/lib\/postgresql\/9.3\/main -U replication -v -P\n<\/code><\/pre>\n<p>Stop PostgreSQL on both nodes and replace <code>auto<\/code> with <code>disabled<\/code> in <code>\/etc\/postgresql\/9.3\/main\/start.conf<\/code> since it will only be managed by Pacemaker and we want to prevent it 
from starting out of Pacemaker&#8217;s control.<\/p>\n<p>First download the OCF resource agent from <a href=\"https:\/\/raw.githubusercontent.com\/ClusterLabs\/resource-agents\/a6f4ddf76cb4bbc1b3df4c9b6632a6351b63c19e\/heartbeat\/pgsql\">here<\/a>, and<br \/>\nreplace the default one which is buggy:<\/p>\n<pre><code>root@[ALL]:~# mv \/usr\/lib\/ocf\/resource.d\/heartbeat\/pgsql \/usr\/lib\/ocf\/resource.d\/heartbeat\/pgsql.default\nroot@[ALL]:~# wget -O \/usr\/lib\/ocf\/resource.d\/heartbeat\/pgsql https:\/\/raw.githubusercontent.com\/ClusterLabs\/resource-agents\/a6f4ddf76cb4bbc1b3df4c9b6632a6351b63c19e\/heartbeat\/pgsql\nroot@[ALL]:~# chmod +x \/usr\/lib\/ocf\/resource.d\/heartbeat\/pgsql\n<\/code><\/pre>\n<p>Then on one of the servers create a <code>CIB<\/code> config file:<\/p>\n<pre><code>root@oattic01:~# vi cib.txt\nproperty \\\n    no-quorum-policy=\"ignore\" \\\n    stonith-enabled=\"false\" \\\n    crmd-transition-delay=\"0s\"\nrsc_defaults \\\n    resource-stickiness=\"INFINITY\" \\\n    migration-threshold=\"1\"\nprimitive vip-master ocf:heartbeat:IPaddr2 \\\n    params ip=\"192.168.0.241\" nic=\"eth0\" cidr_netmask=\"24\" \\\n    op start   timeout=\"60s\" interval=\"0s\"  on-fail=\"stop\" \\\n    op monitor timeout=\"60s\" interval=\"10s\" on-fail=\"restart\" \\\n    op stop    timeout=\"60s\" interval=\"0s\"  on-fail=\"block\"\nprimitive vip-rep ocf:heartbeat:IPaddr2 \\\n    params ip=\"10.20.1.200\" nic=\"eth2\" cidr_netmask=\"24\" \\\n    meta migration-threshold=\"0\" \\\n    op start   timeout=\"60s\" interval=\"0s\"  on-fail=\"restart\" \\\n    op monitor timeout=\"60s\" interval=\"10s\" on-fail=\"restart\" \\\n    op stop    timeout=\"60s\" interval=\"0s\"  on-fail=\"block\"\nprimitive vip-slave ocf:heartbeat:IPaddr2 \\\n    params ip=\"192.168.0.242\" nic=\"eth0\" cidr_netmask=\"24\" \\\n    meta resource-stickiness=\"1\" \\\n    op start   timeout=\"60s\" interval=\"0s\"  on-fail=\"restart\" \\\n    op monitor timeout=\"60s\" interval=\"10s\" 
on-fail=\"restart\" \\\n    op stop    timeout=\"60s\" interval=\"0s\"  on-fail=\"block\"\nprimitive pgsql ocf:heartbeat:pgsql \\\n   params \\\n        pgctl=\"\/usr\/lib\/postgresql\/9.3\/bin\/pg_ctl\" \\\n        psql=\"\/usr\/lib\/postgresql\/9.3\/bin\/psql\" \\\n        pgdata=\"\/var\/lib\/postgresql\/9.3\/main\/\" \\\n        start_opt=\"-p 5432\" \\\n        config=\"\/etc\/postgresql\/9.3\/main\/postgresql.conf\" \\\n        logfile=\"\/var\/log\/postgresql\/postgresql-9.3-main.log\" \\\n        rep_mode=\"sync\" \\\n        node_list=\"oattic01 oattic02\" \\\n        restore_command=\"test -f \/var\/lib\/postgresql\/9.3\/main\/pg_archive\/%f &amp;&amp; cp \/var\/lib\/postgresql\/9.3\/main\/pg_archive\/%f %p\" \\\n        primary_conninfo_opt=\"keepalives_idle=60 keepalives_interval=5 keepalives_count=5\" \\\n        master_ip=\"10.20.1.200\" \\\n        restart_on_promote=\"true\" \\\n        stop_escalate=\"0\" \\\n    op start   interval=\"0s\" timeout=\"60s\" on-fail=\"restart\" \\\n    op monitor interval=\"4s\" timeout=\"60s\" on-fail=\"restart\" \\\n    op monitor interval=\"3s\" timeout=\"60s\" on-fail=\"restart\" role=\"Master\" \\\n    op promote interval=\"0s\" timeout=\"60s\" on-fail=\"restart\" \\\n    op demote  interval=\"0s\" timeout=\"60s\" on-fail=\"stop\" \\\n    op stop    interval=\"0s\" timeout=\"60s\" on-fail=\"block\" \\\n    op notify  interval=\"0s\" timeout=\"60s\"\nms msPostgresql pgsql \\\n    meta master-max=\"1\" master-node-max=\"1\" clone-max=\"2\" clone-node-max=\"1\" notify=\"true\" interleave=\"true\" target-role=\"Started\"\nprimitive pingCheck ocf:pacemaker:ping \\\n    params name=\"default_ping_set\" host_list=\"192.168.0.1\" multiplier=\"100\" \\\n    op start   timeout=\"60s\" \\\n    op monitor timeout=\"60s\" interval=\"10s\" \\\n    op stop    timeout=\"60s\" \\\n    op reload  timeout=\"100s\"\nclone clnPingCheck pingCheck\ngroup master-group vip-master vip-rep \\\n      meta ordered=\"false\"\nlocation 
rsc_location-1 vip-slave \\\n    rule  200: pgsql-status eq \"HS:sync\" \\\n    rule  100: pgsql-status eq \"PRI\" \\\n    rule  -inf: not_defined pgsql-status \\\n    rule  -inf: pgsql-status ne \"HS:sync\" and pgsql-status ne \"PRI\"\nlocation rsc_location-2 msPostgresql \\\n    rule -inf: not_defined default_ping_set or default_ping_set lt 100\ncolocation rsc_colocation-1 inf: msPostgresql  clnPingCheck\ncolocation rsc_colocation-2 inf: master-group  msPostgresql:Master\ncolocation rsc_colocation-3 inf: vip-slave     msPostgresql:Slave\norder rsc_order-1 0: clnPingCheck          msPostgresql\norder rsc_order-2 0: msPostgresql:promote  master-group:start   sequential=true symmetrical=false\norder rsc_order-3 0: msPostgresql:demote   master-group:stop    sequential=true symmetrical=false\n<\/code><\/pre>\n<p>Make sure there aren&#8217;t any leftover recovery files from previous testing:<\/p>\n<pre><code>root@oattic02:~# rm -f \/var\/lib\/postgresql\/9.3\/main\/recovery.conf\n<\/code><\/pre>\n<p>Create the following directory, needed by the OCF agent, on both servers:<\/p>\n<pre><code>root@[ALL]:~# mkdir -p \/var\/lib\/pgsql\/tmp\/\nroot@[ALL]:~# chown -R postgres\\: \/var\/lib\/pgsql\n<\/code><\/pre>\n<p>and then load the new configuration:<\/p>\n<pre><code>root@oattic01:~# crm configure load update cib.txt\n<\/code><\/pre>\n<p>Restart Pacemaker on both nodes and then check the cluster status:<\/p>\n<pre><code>root@oattic01:~# crm status\nLast updated: Thu Feb  9 12:29:24 2017\nLast change: Thu Feb  9 12:28:37 2017 via crm_attribute on oattic01\nStack: corosync\nCurrent DC: oattic01 (1) - partition with quorum\nVersion: 1.1.10-42f2063\n2 Nodes configured\n7 Resources configured\n\n\nOnline: [ oattic01 oattic02 ]\n\n vip-slave  (ocf::heartbeat:IPaddr2):   Started oattic01 \n Resource Group: master-group\n     vip-master (ocf::heartbeat:IPaddr2):   Started oattic01 \n     vip-rep    (ocf::heartbeat:IPaddr2):   Started oattic01 \n Clone Set: clnPingCheck [pingCheck]\n     
Started: [ oattic01 oattic02 ]\n Master\/Slave Set: msPostgresql [pgsql]\n     Masters: [ oattic01 ]\n     Slaves: [ oattic02 ]\n\nroot@oattic01:~# crm_mon -Qrf1A\nStack: corosync\nCurrent DC: oattic02 (2) - partition with quorum\nVersion: 1.1.10-42f2063\n2 Nodes configured\n7 Resources configured\n\n\nOnline: [ oattic01 oattic02 ]\n\nFull list of resources:\n\n vip-slave  (ocf::heartbeat:IPaddr2):   Started oattic02 \n Resource Group: master-group\n     vip-master (ocf::heartbeat:IPaddr2):   Started oattic01 \n     vip-rep    (ocf::heartbeat:IPaddr2):   Started oattic01 \n Clone Set: clnPingCheck [pingCheck]\n     Started: [ oattic01 oattic02 ]\n Master\/Slave Set: msPostgresql [pgsql]\n     Masters: [ oattic01 ]\n     Slaves: [ oattic02 ]\n\nNode Attributes:\n* Node oattic01:\n    + default_ping_set                  : 100       \n    + master-pgsql                      : 1000      \n    + pgsql-data-status                 : LATEST    \n    + pgsql-master-baseline             : 000000000C000090\n    + pgsql-status                      : PRI       \n* Node oattic02:\n    + default_ping_set                  : 100       \n    + master-pgsql                      : 100       \n    + pgsql-data-status                 : STREAMING|SYNC\n    + pgsql-status                      : HS:sync   \n\nMigration summary:\n* Node oattic01: \n* Node oattic02:\n<\/code><\/pre>\n<p>Confirm that a recovery file has been created on the Slave server:<\/p>\n<pre><code>root@oattic02:~# cat \/var\/lib\/postgresql\/9.3\/main\/recovery.conf\nstandby_mode = 'on'\nprimary_conninfo = 'host=10.20.1.200 port=5432 user=postgres application_name=oattic02 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'\nrestore_command = 'test -f \/var\/lib\/postgresql\/9.3\/main\/pg_archive\/%f &amp;&amp; cp \/var\/lib\/postgresql\/9.3\/main\/pg_archive\/%f %p'\nrecovery_target_timeline = 'latest'\n<\/code><\/pre>\n<p>and at the end that the Master and Slave are at the same <code>xlog<\/code> 
position:<\/p>\n<pre><code>root@oattic01:~# su - postgres\npostgres@oattic01:~$ psql\npsql (9.3.15)\nType \"help\" for help.\n\npostgres=# SELECT pg_current_xlog_location();\n pg_current_xlog_location \n--------------------------\n 0\/5000340\n(1 row)\n\nroot@oattic02:~# su - postgres\npostgres@oattic02:~$ psql \npsql (9.3.15)\nType \"help\" for help.\n\npostgres=# select pg_last_xlog_replay_location();\n pg_last_xlog_replay_location \n------------------------------\n 0\/5000340\n(1 row)\n<\/code><\/pre>\n<p>In the Master&#8217;s log we can see <code>oattic02<\/code> becoming a synchronous replication Slave:<\/p>\n<pre><code>2017-02-09 13:43:39 AEDT [3234] @ LOG:  parameter \"synchronous_standby_names\" changed to \"oattic02\"\n2017-02-09 13:43:41 AEDT [4573] postgres@[unknown] LOG:  standby \"oattic02\" is now the synchronous standby with priority 1\n<\/code><\/pre>\n<p>Another check shows the VIPs have been properly created on the Master:<\/p>\n<pre><code>root@oattic01:~# ip -f inet addr show | grep -E \"UP|inet\"\n1: lo: &lt;LOOPBACK,UP,LOWER_UP&gt; mtu 65536 qdisc noqueue state UNKNOWN group default \n    inet 127.0.0.1\/8 scope host lo\n2: eth1: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP group default qlen 1000\n    inet 10.10.1.16\/24 brd 10.10.1.255 scope global eth1\n3: eth2: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP group default qlen 1000\n    inet 10.20.1.10\/24 brd 10.20.1.255 scope global eth2\n    inet 10.20.1.200\/24 brd 10.20.1.255 scope global secondary eth2\n4: eth0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP group default qlen 1000\n    inet 192.168.0.134\/24 brd 192.168.0.255 scope global eth0\n    inet 192.168.0.241\/24 brd 192.168.0.255 scope global secondary eth0\n<\/code><\/pre>\n<p>and on the Slave we should only see the cluster read-only VIP attached to the <code>eth0<\/code> 
interface:<\/p>\n<pre><code>root@oattic02:~# ip -f inet addr show | grep -E \"UP|inet\"\n1: lo: &lt;LOOPBACK,UP,LOWER_UP&gt; mtu 65536 qdisc noqueue state UNKNOWN group default \n    inet 127.0.0.1\/8 scope host lo\n2: eth1: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP group default qlen 1000\n    inet 10.10.1.17\/24 brd 10.10.1.255 scope global eth1\n3: eth2: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP group default qlen 1000\n    inet 10.20.1.18\/24 brd 10.20.1.255 scope global eth2\n4: eth0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP group default qlen 1000\n    inet 192.168.0.135\/24 brd 192.168.0.255 scope global eth0\n    inet 192.168.0.242\/24 brd 192.168.0.255 scope global secondary eth0\n<\/code><\/pre>\n<p>If there are any issues, try cleaning up the affected resource, e.g.:<\/p>\n<pre><code>root@oattic01:~# crm resource cleanup msPostgresql\n<\/code><\/pre>\n<p>and if nothing else works, reboot both nodes.<\/p>\n<h2>Couple of Tips<\/h2>\n<p>In case the Master won&#8217;t start and the Slave is stuck in DISCONNECT state, we can promote the Slave to Master by running:<\/p>\n<pre><code># crm_attribute -l forever -N oattic01 -n \"pgsql-data-status\" -v \"LATEST\"\n<\/code><\/pre>\n<p>where in this case the disconnected Slave is on <code>oattic01<\/code>.<\/p>\n<p>Some other gotchas, as pointed out on the ClusterLabs page (see the links at the top):<\/p>\n<p>PGSQL.lock file (<code>\/var\/lib\/pgsql\/tmp\/PGSQL.lock<\/code>): the file is created on promote and deleted on demote, but only if no Slave exists. If this file remains on a node, it means the data may be inconsistent. Copy all data from the PRI and delete this lock file.<\/p>\n<p>Stop order: first stop the Slave, then stop the Master. If you stop the Master first, the PGSQL.lock file remains. 
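<\/p>\n<p>A possible recovery sequence for a node left with a stale lock file, and therefore possibly inconsistent data, is to resync it from the current Master over the replication VIP and then clear the lock (a sketch based on the commands above, here assuming <code>oattic02<\/code> is the stale node):<\/p>\n<pre><code>user@oattic02:~$ sudo service postgresql stop\nuser@oattic02:~$ sudo -u postgres rm -rf \/var\/lib\/postgresql\/9.3\/main\/\nuser@oattic02:~$ sudo -u postgres pg_basebackup -h 10.20.1.200 -D \/var\/lib\/postgresql\/9.3\/main -U replication -v -P\nuser@oattic02:~$ sudo rm -f \/var\/lib\/pgsql\/tmp\/PGSQL.lock\nroot@oattic01:~# crm resource cleanup msPostgresql\n<\/code><\/pre>\n<p>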
If PGSQL would not start on any of the nodes, check for the <code>PGSQL.lock<\/code> file and delete it if present.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>[serialposts] Setting up PostgreSQL synchronous or asynchronous replication cluster with Pacemaker is described in couple of resources like the official Pacemaker site PgSQL Replicated Cluster and the GitHub wiki of the OCF agent creator Resource Agent for PostgreSQL 9.1 streaming&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[26,25,20,38],"class_list":["post-400","post","type-post","status-publish","format-standard","hentry","category-database","tag-cluster","tag-high-availability","tag-pacemaker","tag-postgresql"],"_links":{"self":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/400","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=400"}],"version-history":[{"count":3,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/400\/revisions"}],"predecessor-version":[{"id":403,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/400\/revisions\/403"}],"wp:attachment":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=400"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=400"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/icicimov.com\/blog\/inde
x.php?rest_route=%2Fwp%2Fv2%2Ftags&post=400"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}