{"id":282,"date":"2016-09-22T11:31:45","date_gmt":"2016-09-22T01:31:45","guid":{"rendered":"https:\/\/icicimov.com\/blog\/?p=282"},"modified":"2017-01-02T22:43:59","modified_gmt":"2017-01-02T11:43:59","slug":"cluster-networking-for-multi-tenant-isolation-in-proxmox-with-openvswitch","status":"publish","type":"post","link":"https:\/\/icicimov.com\/blog\/?p=282","title":{"rendered":"Cluster Networking for Multi-tenant isolation in Proxmox with OpenVSwitch"},"content":{"rendered":"<p>This is probably the most complex part of the setup. It involves network configuration of the cluster in a way that the instances running on different nodes can still talk to each other. This is needed in order to provide clustering and HA of the VM services them self.<\/p>\n<p>Note that the config below can be done via PVE GUI as well but I prefer the manual approach. The networking on the nodes has been setup (<code>\/etc\/network\/interfaces<\/code> file as per usual) as shown below:<\/p>\n<pre><code># Internal networks\nauto eth1\niface eth1 inet static\n    address  10.10.1.185\n    netmask  255.255.255.0\n    metric 200\n\nauto eth2\niface eth2 inet static\n    address  10.20.1.185\n    netmask  255.255.255.0\n    metric 200\n\n# External network\niface eth0 inet manual\nauto vmbr0\niface vmbr0 inet static\n    address  192.168.0.185\n    netmask  255.255.255.0\n    gateway  192.168.0.1\n    bridge_ports eth0\n    bridge_stp off\n    bridge_fd 0\n    metric 100\n<\/code><\/pre>\n<p>Since the cluster nodes are going to be deployed in the provider&#8217;s infrastructure of we have no control at all, lets say physical switches to setup VLAN&#8217;s on, we need to come up with some kind of SDN ie L3 overlay network which we can configure according to our needs. 
One such solution is using <code>OpenVSwitch<\/code> and <code>GRE<\/code> or <code>VxLAN<\/code> overlay networks (tunnels), which resulted in me creating the following configuration under <code>\/etc\/network\/interfaces<\/code> on both nodes. On proxmox01:<\/p>\n<pre><code># GRE\/VXLAN network\nallow-vmbr1 eth3\niface eth3 inet manual\n        ovs_bridge vmbr1\n        ovs_type OVSPort\n        mtu 1546\n        up ip link set eth3 up\n\n# GRE\/VXLAN bridge\nallow-ovs vmbr1\niface vmbr1 inet manual\n        ovs_type OVSBridge\n        ovs_ports eth3 tep0\n        up ip link set vmbr1 up\n\n# GRE\/VXLAN interface\nallow-vmbr1 tep0\niface tep0 inet static\n        ovs_bridge vmbr1\n        ovs_type OVSIntPort\n        #ovs_options tag=11\n        address 10.30.1.185\n        netmask 255.255.255.0\n\n# Integration bridge\nallow-ovs vmbr2\niface vmbr2 inet manual\n    ovs_type OVSBridge\n    ovs_ports vx1 dhcptap0\n    up ip link set vmbr2 up\n\n# GRE\/VXLAN tunnel\nallow-vmbr2 vx1\niface vx1 inet manual\n    ovs_type OVSTunnel\n    ovs_bridge vmbr2\n    ovs_tunnel_type vxlan\n    ovs_options trunks=11,22,33\n    ovs_tunnel_options options:remote_ip=10.30.1.186 options:key=flow options:dst_port=4789\n\n# DHCP server interface for VLAN-11\nallow-vmbr2 dhcptap0\niface dhcptap0 inet static\n        ovs_bridge vmbr2\n        ovs_type OVSIntPort\n        ovs_options tag=11\n        address 172.29.240.3\n        netmask 255.255.255.0\n<\/code><\/pre>\n<p>and on proxmox02:<\/p>\n<pre><code># GRE\/VXLAN network\nallow-vmbr1 eth3\niface eth3 inet manual\n        ovs_bridge vmbr1\n        ovs_type OVSPort\n        mtu 1546\n        up ip link set eth3 up\n\n# GRE\/VXLAN bridge\nallow-ovs vmbr1\niface vmbr1 inet manual\n        ovs_type OVSBridge\n        ovs_ports eth3 tep0\n        up ip link set vmbr1 up\n\n# GRE\/VXLAN interface\nallow-vmbr1 tep0\niface tep0 inet static\n        ovs_bridge vmbr1\n        ovs_type OVSIntPort\n        #ovs_options tag=11\n        address 10.30.1.186\n        netmask 255.255.255.0\n\n# 
Integration bridge\nallow-ovs vmbr2\niface vmbr2 inet manual\n    ovs_type OVSBridge\n    ovs_ports vx1 dhcptap0\n    up ip link set vmbr2 up\n\n# GRE\/VXLAN tunnel\nallow-vmbr2 vx1\niface vx1 inet manual\n    ovs_type OVSTunnel\n    ovs_bridge vmbr2\n    ovs_tunnel_type vxlan\n    ovs_options trunks=11,22,33\n    ovs_tunnel_options options:remote_ip=10.30.1.185 options:key=flow options:dst_port=4789\n<\/code><\/pre>\n<p>The only limitation is the name of the OVS bridges, which needs to be of the format <code>vmbrX<\/code> where X is a digit, so the bridge gets recognized and activated in PVE. I have used <code>VxLAN<\/code> since it is the more recent and more efficient tunneling type and adds less packet overhead compared to <code>GRE<\/code>. The major difference though is that VxLAN uses UDP (port 4789 by default), so nearly all routers properly distribute traffic to the next hop by hashing over the 5-tuple that includes the UDP source and destination ports. I have used the network interface <code>eth3<\/code> as the physical transport for the tunnel and moved its address to the virtual interface <code>tep0<\/code>, which is an internal port of the OVS bridge <code>vmbr1<\/code>. Attaching the IP to a port instead of the bridge itself makes it possible to attach more than one network to this bridge. This makes the nodes&#8217; IP&#8217;s on this network, <code>10.30.1.185<\/code> on <code>proxmox01<\/code> and <code>10.30.1.186<\/code> on <code>proxmox02<\/code>, the endpoints of our VxLAN tunnel.<\/p>\n<p>The next part is the OVS bridge <code>vmbr2<\/code>. This is the bridge that holds the VxLAN endpoint interface <code>vx1<\/code> on each side, and the bridge that every launched VM gets connected to in order to be able to communicate with its peers running on the other node. 
This VxLAN tunnel can carry many different VLAN&#8217;s, each marked with its own dedicated tag in OVS, which takes care of the traffic flows and separation so the VLAN&#8217;s stay isolated from each other. In this case I have limited the tags to 11, 22 and 33, meaning I want to have only 3 different networks in my setup.<\/p>\n<blockquote><p>\n  <strong>NOTE:<\/strong> VxLAN by default needs multicast enabled on the network, which often is not available at cloud providers like AWS. In that case we use <code>unicast<\/code> by specifying the IP&#8217;s of the endpoints.<\/p>\n<p>  <strong>NOTE:<\/strong> Both GRE and VxLAN do network encapsulation but do not provide encryption, thus they are best suited for private LAN usage. In case of WAN, an additional tool providing encryption needs to be used, i.e. some VPN option like OpenVPN, IPSec or PeerVPN, in order to protect sensitive traffic.\n<\/p><\/blockquote>\n<p>After a networking restart this is the OVS structure that has been created:<\/p>\n<pre><code>root@proxmox01:~# ovs-vsctl show\nf463d896-7fcb-40b1-b4a1-e493b255d978\n    Bridge \"vmbr2\"\n        Port \"vmbr2\"\n            Interface \"vmbr2\"\n                type: internal\n        Port \"vx1\"\n            trunks: [11, 22, 33]\n            Interface \"vx1\"\n                type: vxlan\n                options: {dst_port=\"4789\", key=flow, remote_ip=\"10.30.1.186\"}\n    Bridge \"vmbr1\"\n        Port \"eth3\"\n            Interface \"eth3\"\n        Port \"vmbr1\"\n            Interface \"vmbr1\"\n                type: internal\n        Port \"tep0\"\n            Interface \"tep0\"\n                type: internal\n    ovs_version: \"2.3.0\"\n<\/code><\/pre>\n<p>and on node proxmox02:<\/p>\n<pre><code>root@proxmox02:~# ovs-vsctl show\n76ca2f71-3963-4a65-beb9-cc5807cf9a17\n    Bridge \"vmbr2\"\n        Port \"vmbr2\"\n            Interface \"vmbr2\"\n                type: internal\n        Port \"vx1\"\n            trunks: [11, 22, 33]\n     
       Interface \"vx1\"\n                type: vxlan\n                options: {dst_port=\"4789\", key=flow, remote_ip=\"10.10.1.185\"}\n    Bridge \"vmbr1\"\n        Port \"tep0\"\n            Interface \"tep0\"\n                type: internal\n        Port \"vmbr1\"\n            Interface \"vmbr1\"\n                type: internal\n        Port \"eth3\"\n            Interface \"eth3\"\n    ovs_version: \"2.3.0\"\n<\/code><\/pre>\n<p>The network bridges, ports and interfaces will also appear in PVE and can be seen in the <code>Networking<\/code> tab of the GUI.<\/p>\n<p>Then I went and launched two test LXC containers, one on each node, and connected both to the <code>vmbr2<\/code>. Each container was created with two network interfaces, one tagged with tag 11 and the other with tag 22. Now we can see some new interfaces added by PVE to <code>vmbr2<\/code> and tagged by the appropriate tags. On the first node where <code>lxc01<\/code> (PVE instance id 100) was launched :<\/p>\n<pre><code>root@proxmox01:~# ovs-vsctl show\nf463d896-7fcb-40b1-b4a1-e493b255d978\n    Bridge \"vmbr2\"\n        Port \"vmbr2\"\n            Interface \"vmbr2\"\n                type: internal\n        Port \"dhcptap0\"\n            tag: 11\n            Interface \"dhcptap0\"\n                type: internal\n        Port \"veth100i1\"\n            tag: 11\n            Interface \"veth100i1\"\n        Port \"veth100i2\"\n            tag: 22\n            Interface \"veth100i2\"\n        Port \"vx1\"\n            trunks: [11, 22, 33]\n            Interface \"vx1\"\n                type: vxlan\n                options: {dst_port=\"4789\", key=flow, remote_ip=\"10.10.1.186\"}\n    Bridge \"vmbr1\"\n        Port \"eth3\"\n            Interface \"eth3\"\n        Port \"vmbr1\"\n            Interface \"vmbr1\"\n                type: internal\n        Port \"tep0\"\n            Interface \"tep0\"\n                type: internal\n    ovs_version: \"2.3.0\"\n<\/code><\/pre>\n<p>we have 
<code>veth100i1<\/code> and <code>veth100i2<\/code> created, and on <code>proxmox02<\/code> we have <code>veth101i1<\/code> and <code>veth101i2<\/code> created:<\/p>\n<pre><code>root@proxmox02:~# ovs-vsctl show\n76ca2f71-3963-4a65-beb9-cc5807cf9a17\n    Bridge \"vmbr2\"\n        Port \"vmbr2\"\n            Interface \"vmbr2\"\n                type: internal\n        Port \"veth101i1\"\n            tag: 11\n            Interface \"veth101i1\"\n        Port \"dhcptap0\"\n            tag: 11\n            Interface \"dhcptap0\"\n                type: internal\n        Port \"vx1\"\n            trunks: [11, 22, 33]\n            Interface \"vx1\"\n                type: vxlan\n                options: {dst_port=\"4789\", key=flow, remote_ip=\"10.30.1.185\"}\n        Port \"veth101i2\"\n            tag: 22\n            Interface \"veth101i2\"\n    Bridge \"vmbr1\"\n        Port \"tep0\"\n            Interface \"tep0\"\n                type: internal\n        Port \"vmbr1\"\n            Interface \"vmbr1\"\n                type: internal\n        Port \"eth3\"\n            Interface \"eth3\"\n    ovs_version: \"2.3.0\"\n<\/code><\/pre>\n<p>As we can see, the PVE built-in OVS integration is working great. 
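To eyeball the port-to-VLAN mapping without scanning the whole ovs-vsctl tree, the output is easy to reduce with a little awk. A small sketch over a captured sample of the output above (the here-doc stands in for piping the live `ovs-vsctl show` command):

```shell
# Reduce `ovs-vsctl show` output to "port tag N" pairs.
# The here-doc is a captured sample; on a live node you would pipe
# `ovs-vsctl show` into the same awk program instead.
awk '
    /Port/ { port = $2; gsub(/"/, "", port) }
    /tag:/ { print port, "tag", $2 }
' <<'EOF'
    Bridge "vmbr2"
        Port "veth100i1"
            tag: 11
            Interface "veth100i1"
        Port "veth100i2"
            tag: 22
            Interface "veth100i2"
        Port "dhcptap0"
            tag: 11
            Interface "dhcptap0"
EOF
```

This prints one `port tag N` line per tagged port, which is handy when many containers are attached.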
Now if we log in to the containers and check the connectivity:<\/p>\n<pre>\nroot@lxc01:~# ip addr show eth2     \n46: eth2@if47: &lt;BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000\n    link\/ether 66:30:65:66:62:64 brd ff:ff:ff:ff:ff:ff\n    inet 172.29.250.10\/24 brd 172.29.250.255 scope global eth2\n       valid_lft forever preferred_lft forever\n    inet6 fe80::6430:65ff:fe66:6264\/64 scope link\n       valid_lft forever preferred_lft forever\n \nroot@lxc02:~# ip addr show eth2\n34: eth2@if35: &lt;BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000\n    link\/ether 62:37:61:63:65:64 brd ff:ff:ff:ff:ff:ff\n    inet 172.29.250.11\/24 brd 172.29.250.255 scope global eth2\n       valid_lft forever preferred_lft forever\n    inet6 fe80::6037:61ff:fe63:6564\/64 scope link\n       valid_lft forever preferred_lft forever\n \nroot@lxc01:~# ping -c 4 172.29.250.11\nPING 172.29.250.11 (172.29.250.11) 56(84) bytes of data.\n64 bytes from 172.29.250.11: icmp_seq=1 ttl=64 time=1.30 ms\n64 bytes from 172.29.250.11: icmp_seq=2 ttl=64 time=0.952 ms\n64 bytes from 172.29.250.11: icmp_seq=3 ttl=64 time=0.503 ms\n64 bytes from 172.29.250.11: icmp_seq=4 ttl=64 time=0.545 ms\n--- 172.29.250.11 ping statistics ---\n4 packets transmitted, 4 received, 0% packet loss, time 3001ms\nrtt min\/avg\/max\/mdev = 0.503\/0.826\/1.307\/0.329 ms\n \nroot@lxc02:~# ping -c 4 172.29.250.10\nPING 172.29.250.10 (172.29.250.10) 56(84) bytes of data.\n64 bytes from 172.29.250.10: icmp_seq=1 ttl=64 time=1.63 ms\n64 bytes from 172.29.250.10: icmp_seq=2 ttl=64 time=0.493 ms\n64 bytes from 172.29.250.10: icmp_seq=3 ttl=64 time=0.525 ms\n64 bytes from 172.29.250.10: icmp_seq=4 ttl=64 time=0.510 ms\n--- 172.29.250.10 ping statistics ---\n4 packets transmitted, 4 received, 0% packet loss, time 3071ms\nrtt min\/avg\/max\/mdev = 0.493\/0.791\/1.637\/0.488 ms\n<\/pre>\n<p>the containers on the <code>172.29.250.0\/24<\/code> 
network can see each other although running on two different nodes. Now to the last part of the setup &#8230; the DHCP.<\/p>\n<h2>Providing DHCP service to the VM networks<\/h2>\n<p>All of the above is fine as long as we configure our VM&#8217;s with static IP&#8217;s matching the VLAN they are connecting to. But what if we just want to launch a VM and not care about the IP it gets? PVE itself has an option to launch a VM with DHCP instead of a static IP, but it does not provide the DHCP service itself. The reason is probably the complexity involved in supporting a DHCP service in an HA setup: first, there can be only a single instance of the DHCP service running for a given VLAN at any given time, and second, if the node the DHCP service is running on crashes, the service needs to be moved to the second node. I decided to solve this challenge using <code>dnsmasq<\/code> and <code>keepalived<\/code>. I added the following interface to both PVE nodes in <code>\/etc\/network\/interfaces<\/code>:<\/p>\n<pre><code># DHCP server interface for VLAN-11\nallow-vmbr2 dhcptap0\niface dhcptap0 inet static\n        ovs_bridge vmbr2\n        ovs_type OVSIntPort\n        ovs_options tag=11\n        address 172.29.240.3\n        netmask 255.255.255.0\n<\/code><\/pre>\n<p>Then I configured keepalived in <code>\/etc\/keepalived\/keepalived.conf<\/code> to manage a floating VIP on this interface and attach a dnsmasq service configured as a DHCP server for the VLAN (in this case the one tagged with 11):<\/p>\n<pre><code>global_defs {\n   notification_email {\n     igorc@encompasscorporation.com\n   }\n   notification_email_from proxmox01\n   smtp_server localhost\n   smtp_connect_timeout 30\n   lvs_id dnsmasq\n}\n\nvrrp_script dnsmasq-dhcptap0 {\n    script \"kill -0 $(cat \/var\/run\/dnsmasq\/dnsmasq-dhcptap0.pid)\"\n    interval 2\n    fall 2     \n    rise 2\n    weight 20\n}\n\nvrrp_instance dnsmasq-dhcptap0 {\n    state BACKUP\n    priority 102\n    interface vmbr0\n    
virtual_router_id 47\n    advert_int 3\n    lvs_sync_daemon_interface eth2\n    nopreempt\n\n    unicast_src_ip 192.168.0.185\n    unicast_peer {\n        192.168.0.186\n    }\n\n    notify_master \"\/etc\/keepalived\/dnsmasq.sh start dhcptap0 proxmox02\"\n    notify_backup \"\/etc\/keepalived\/dnsmasq.sh stop dhcptap0\"\n    smtp_alert\n\n    virtual_ipaddress {\n        172.29.240.3\/24 dev dhcptap0 scope global\n    }\n\n    virtual_routes {\n        172.29.240.0\/24 dev dhcptap0\n    }\n\n    track_script {\n        dnsmasq-dhcptap0\n    }\n\n    track_interface {\n        eth2\n        dhcptap0\n    }\n}\n<\/code><\/pre>\n<p>on the second node proxmox02:<\/p>\n<pre><code>global_defs {\n   notification_email {\n     igorc@encompasscorporation.com\n   }\n   notification_email_from proxmox02\n   smtp_server localhost\n   smtp_connect_timeout 30\n   lvs_id dnsmasq\n}\n\nvrrp_script dnsmasq-dhcptap0 {\n    script \"kill -0 $(cat \/var\/run\/dnsmasq\/dnsmasq-dhcptap0.pid)\"\n    interval 2\n    fall 2      \n    rise 2\n    weight 20\n}\n\nvrrp_instance dnsmasq-dhcptap0 {\n    state BACKUP\n    priority 101\n    interface vmbr0\n    virtual_router_id 47\n    advert_int 3\n    lvs_sync_daemon_interface eth2\n    nopreempt\n    garp_master_delay 1\n\n    unicast_src_ip 192.168.0.186\n    unicast_peer {\n        192.168.0.185\n    }\n\n    notify_master \"\/etc\/keepalived\/dnsmasq.sh start dhcptap0 proxmox01\"\n    notify_backup \"\/etc\/keepalived\/dnsmasq.sh stop dhcptap0\"\n    smtp_alert\n\n    virtual_ipaddress {\n        172.29.240.3\/24 dev dhcptap0 scope global\n    }\n\n    virtual_routes {\n        172.29.240.0\/24 dev dhcptap0\n    }\n\n    track_script {\n        dnsmasq-dhcptap0\n    }\n\n    track_interface {\n        eth1\n        dhcptap0\n    }\n}\n<\/code><\/pre>\n<p>On startup, keepalived will promote to <code>MASTER<\/code> on one of the nodes and to <code>BACKUP<\/code> on the other. 
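Which node ends up MASTER is influenced by the `vrrp_script` health check defined above: it is simply a `kill -0` probe against the dnsmasq PID file, adding 20 to the instance priority while the process is alive. A standalone sketch of that probe (using the current shell's own PID as a stand-in for dnsmasq):

```shell
# Sketch of the vrrp_script probe: check liveness of the PID in a pidfile.
# The current shell's PID stands in for the dnsmasq process.
PIDFILE=$(mktemp)
echo $$ > "$PIDFILE"
if kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    echo "dnsmasq alive"    # script exits 0 -> keepalived adds weight 20
else
    echo "dnsmasq dead"     # failing twice in a row (fall 2) drops the priority
fi
rm -f "$PIDFILE"
```

`kill -0` sends no signal at all; it only checks that the process exists and is signalable, which is exactly what the health check needs.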
The MASTER node will then run the <code>\/etc\/keepalived\/dnsmasq.sh<\/code> script:<\/p>\n<pre><code>#!\/bin\/bash\nCRONFILE=\"\/var\/spool\/cron\/crontabs\/root\"\nLEASEFILE=\"\/var\/run\/dnsmasq\/dnsmasq-${2}.leases\"\nPIDFILE=\"\/var\/run\/dnsmasq\/dnsmasq-${2}.pid\"\ncase \"$1\" in\n  start)\n         [[ ! -d \/var\/run\/dnsmasq ]] &amp;&amp; mkdir -p \/var\/run\/dnsmasq\n         \/sbin\/ip link set dev ${2} up &amp;&amp; \\\n         \/usr\/sbin\/dnsmasq -u root --conf-file=\/etc\/dnsmasq.d\/dnsmasq-${2} &amp;&amp; \\\n         [[ $(grep -c ${2} $CRONFILE) -eq 0 ]] &amp;&amp; echo \"* * * * * \/usr\/bin\/scp $LEASEFILE ${3}:$LEASEFILE\" | tee -a $CRONFILE\n         ssh $3 \"cat $PIDFILE | xargs kill -15\"\n         ssh $3 \"sed -i '\/dnsmasq-${2}.leases\/d' $CRONFILE\"\n         \/bin\/kill -0 $(&lt; $PIDFILE) &amp;&amp; exit 0 || echo \"Failed to start dnsmasq for $2.\"\n         ;;\n   stop)\n         # double quotes needed here so ${2} gets expanded in the sed pattern\n         sed -i \"\/dnsmasq-${2}.leases\/d\" $CRONFILE || echo \"Failed to remove cronjob for $2 leases sync.\"\n         [[ -f \"$PIDFILE\" ]] &amp;&amp; \/bin\/kill -15 $(&lt; $PIDFILE) &amp;&amp; exit 0 || echo \"Failed to stop dnsmasq for $2 or process doesn't exist.\"\n         ;;\n      *)\n         echo \"Usage: $0 [start|stop] interface_name peer_hostname\"\nesac\nexit 1\n<\/code><\/pre>\n<p>which will activate the <code>dhcptap0<\/code> OVS port interface on <code>vmbr2<\/code>, start a dnsmasq DHCP process that will load its configuration from <code>\/etc\/dnsmasq.d\/dnsmasq-dhcptap0<\/code>, and attach it to <code>dhcptap0<\/code>. Lastly, it will create a cron job which will constantly copy the DHCP leases to the standby node, so in case of takeover that node has the list of IP&#8217;s that have already been allocated. 
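For reference, the dnsmasq lease file being synced is plain text, one lease per line: expiry epoch, MAC, IP, hostname, client-id. So checking what the standby node has received is a one-liner; a sketch over the two sample leases from this setup (the here-doc stands in for the real lease file path):

```shell
# Print "hostname -> IP (expires epoch)" from dnsmasq lease-file lines.
# Fields per line: expiry-epoch  MAC  IP  hostname  client-id
# The here-doc stands in for /var/run/dnsmasq/dnsmasq-dhcptap0.leases
awk '{ printf "%s -> %s (expires %s)\n", $4, $3, $1 }' <<'EOF'
1462198134 3a:64:63:36:34:39 172.29.240.176 lxc01 *
1462195647 62:36:35:61:62:33 172.29.240.192 lxc02 *
EOF
```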
The DHCP config file <code>\/etc\/dnsmasq.d\/dnsmasq-dhcptap0<\/code> looks like this:<\/p>\n<pre><code>strict-order\nbind-interfaces\ninterface=dhcptap0\npid-file=\/var\/run\/dnsmasq\/dnsmasq-dhcptap0.pid\nlisten-address=172.29.240.3\nexcept-interface=lo\ndhcp-range=172.29.240.128,172.29.240.254,12h\ndhcp-leasefile=\/var\/run\/dnsmasq\/dnsmasq-dhcptap0.leases\n<\/code><\/pre>\n<p>Upon takeover, keepalived will also send an email (thanks to <code>smtp_alert<\/code>) to a dedicated address to let us know that a switchover has occurred. Of course the script needs to be executable:<\/p>\n<pre><code># chmod +x \/etc\/keepalived\/dnsmasq.sh\n<\/code><\/pre>\n<p>And we also disable the dnsmasq daemon in <code>\/etc\/default\/dnsmasq<\/code> since we want to start multiple processes manually when needed:<\/p>\n<pre><code>[...]\nENABLED=0\n[...]\n<\/code><\/pre>\n<p>In case we need debugging we can add:<\/p>\n<pre><code>[...]\nDAEMON_ARGS=\"-D\"\n[...]\n<\/code><\/pre>\n<p>to the config as well.<\/p>\n<p>Then I configured <code>eth1<\/code> as a DHCP interface on <code>lxc01<\/code> and <code>lxc02<\/code> and after bringing up the interfaces I checked the keepalived status:<\/p>\n<pre><code>root@proxmox02:~# systemctl status keepalived.service\n keepalived.service - LSB: Starts keepalived\n   Loaded: loaded (\/etc\/init.d\/keepalived)\n   Active: active (running) since Tue 2016-03-15 14:57:17 AEDT; 2 weeks 0 days ago\n  Process: 18834 ExecStop=\/etc\/init.d\/keepalived stop (code=exited, status=0\/SUCCESS)\n  Process: 19542 ExecStart=\/etc\/init.d\/keepalived start (code=exited, status=0\/SUCCESS)\n   CGroup: \/system.slice\/keepalived.service\n           \u251c\u250019545 \/usr\/sbin\/keepalived -D\n           \u251c\u250019546 \/usr\/sbin\/keepalived -D\n           \u251c\u250019547 \/usr\/sbin\/keepalived -D\n           \u2514\u250020072 \/usr\/sbin\/dnsmasq -u root --conf-file=\/etc\/dnsmasq.d\/dnsmasq-dhcptap0\nMar 30 03:55:53 proxmox02 dnsmasq-dhcp[20072]: DHCPREQUEST(dhcptap0) 172.29.240.192 
62:36:35:61:62:33\nMar 30 03:55:53 proxmox02 dnsmasq-dhcp[20072]: DHCPACK(dhcptap0) 172.29.240.192 62:36:35:61:62:33 lxc02\nMar 30 06:39:58 proxmox02 dnsmasq-dhcp[20072]: DHCPREQUEST(dhcptap0) 172.29.240.176 3a:64:63:36:34:39\nMar 30 06:39:58 proxmox02 dnsmasq-dhcp[20072]: DHCPACK(dhcptap0) 172.29.240.176 3a:64:63:36:34:39 lxc01\nMar 30 08:59:47 proxmox02 dnsmasq-dhcp[20072]: DHCPREQUEST(dhcptap0) 172.29.240.192 62:36:35:61:62:33\nMar 30 08:59:47 proxmox02 dnsmasq-dhcp[20072]: DHCPACK(dhcptap0) 172.29.240.192 62:36:35:61:62:33 lxc02\nMar 30 11:27:56 proxmox02 dnsmasq-dhcp[20072]: DHCPREQUEST(dhcptap0) 172.29.240.176 3a:64:63:36:34:39\nMar 30 11:27:56 proxmox02 dnsmasq-dhcp[20072]: DHCPACK(dhcptap0) 172.29.240.176 3a:64:63:36:34:39 lxc01\nMar 30 13:17:30 proxmox02 dnsmasq-dhcp[20072]: DHCPREQUEST(dhcptap0) 172.29.240.192 62:36:35:61:62:33\nMar 30 13:17:30 proxmox02 dnsmasq-dhcp[20072]: DHCPACK(dhcptap0) 172.29.240.192 62:36:35:61:62:33 lxc02\n<\/code><\/pre>\n<p>we can see the requests coming through and the leases file populated and in sync on both nodes:<\/p>\n<pre><code>root@proxmox02:~# cat \/var\/run\/dnsmasq\/dnsmasq-dhcptap0.leases\n1462198134 3a:64:63:36:34:39 172.29.240.176 lxc01 *\n1462195647 62:36:35:61:62:33 172.29.240.192 lxc02 *\n\nroot@proxmox01:~# cat \/var\/run\/dnsmasq\/dnsmasq-dhcptap0.leases\n1462198134 3a:64:63:36:34:39 172.29.240.176 lxc01 *\n1462195647 62:36:35:61:62:33 172.29.240.192 lxc02 *\n<\/code><\/pre>\n<p>thanks to the cronjob set on the master node:<\/p>\n<pre><code>root@proxmox02:~# crontab -l | grep -v ^\\#\n* * * * * \/usr\/bin\/scp \/var\/run\/dnsmasq\/dnsmasq-dhcptap0.leases proxmox01:\/var\/run\/dnsmasq\/dnsmasq-dhcptap0.leases\n<\/code><\/pre>\n<p>The script moves the job to the new <code>MASTER<\/code> upon fail-over. 
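The crontab bookkeeping that moves with the MASTER role boils down to an idempotent append on promotion and a sed delete on demotion. A self-contained sketch of just that logic, run against a temporary file (the `peer:` host is a placeholder, not a real node name):

```shell
# Sketch of dnsmasq.sh's cron bookkeeping, run against a temp file.
CRONFILE=$(mktemp)
IF=dhcptap0
LEASEFILE="/var/run/dnsmasq/dnsmasq-${IF}.leases"

# promotion: append the lease-sync job only if it is not already present
[ "$(grep -c "$IF" "$CRONFILE")" -eq 0 ] && \
    echo "* * * * * /usr/bin/scp $LEASEFILE peer:$LEASEFILE" >> "$CRONFILE"
grep -c "dnsmasq-${IF}.leases" "$CRONFILE"          # prints 1

# demotion: delete it; double quotes so ${IF} expands inside the pattern
sed -i "/dnsmasq-${IF}.leases/d" "$CRONFILE"
grep -c "dnsmasq-${IF}.leases" "$CRONFILE" || true  # prints 0
rm -f "$CRONFILE"
```

The `grep -c` guard keeps repeated promotions from stacking up duplicate cron lines.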
Finally, we test the connectivity from the containers to confirm all is working well:<\/p>\n<pre>\nroot@lxc01:~# ip addr show eth1\n48: eth1@if49: &lt;BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000\n    link\/ether 3a:64:63:36:34:39 brd ff:ff:ff:ff:ff:ff\n    inet 172.29.240.176\/24 brd 172.29.240.255 scope global eth1\n       valid_lft forever preferred_lft forever\n    inet6 fe80::3864:63ff:fe36:3439\/64 scope link\n       valid_lft forever preferred_lft forever\n \nroot@lxc02:~# ip addr show eth1\n30: eth1@if31: &lt;BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000\n    link\/ether 62:36:35:61:62:33 brd ff:ff:ff:ff:ff:ff\n    inet 172.29.240.192\/24 brd 172.29.240.255 scope global eth1\n       valid_lft forever preferred_lft forever\n    inet6 fe80::6036:35ff:fe61:6233\/64 scope link\n       valid_lft forever preferred_lft forever\n \nroot@lxc01:~# ping -c 4 172.29.240.192  \nPING 172.29.240.192 (172.29.240.192) 56(84) bytes of data.\n64 bytes from 172.29.240.192: icmp_seq=1 ttl=64 time=1.30 ms\n64 bytes from 172.29.240.192: icmp_seq=2 ttl=64 time=0.906 ms\n64 bytes from 172.29.240.192: icmp_seq=3 ttl=64 time=0.591 ms\n64 bytes from 172.29.240.192: icmp_seq=4 ttl=64 time=0.672 ms\n--- 172.29.240.192 ping statistics ---\n4 packets transmitted, 4 received, 0% packet loss, time 3040ms\nrtt min\/avg\/max\/mdev = 0.591\/0.869\/1.309\/0.280 ms\n \nroot@lxc02:~# ping -c 4 172.29.240.176  \nPING 172.29.240.176 (172.29.240.176) 56(84) bytes of data.\n64 bytes from 172.29.240.176: icmp_seq=1 ttl=64 time=1.23 ms\n64 bytes from 172.29.240.176: icmp_seq=2 ttl=64 time=0.583 ms\n64 bytes from 172.29.240.176: icmp_seq=3 ttl=64 time=0.622 ms\n64 bytes from 172.29.240.176: icmp_seq=4 ttl=64 time=0.554 ms\n--- 172.29.240.176 ping statistics ---\n4 packets transmitted, 4 received, 0% packet loss, time 2999ms\nrtt min\/avg\/max\/mdev = 0.554\/0.748\/1.233\/0.281 ms\n<\/pre>\n<h2>Network 
Isolation<\/h2>\n<p>As mentioned previously, this setup offers the benefit of network isolation, meaning, as per our config, the VM&#8217;s attached to VLAN-11 for example will not be able to talk to the ones attached to VLAN-22. These VLAN&#8217;s can therefore be given to different tenants and they will not be able to see each other&#8217;s traffic, although both of them are using the same SDN. This is courtesy of the <code>L2<\/code> tagging carried over the <code>VxLAN<\/code> tunnel (and <code>GRE<\/code> as well for that matter).<\/p>\n<p>To test this I have added a new interface eth3 on both containers and set its IP in the same subnet as interface eth2, but with a different tag of 33:<\/p>\n<pre>\nroot@lxc01:~# ip addr show eth2\n46: eth2@if47: &lt;BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000\n    link\/ether 66:30:65:66:62:64 brd ff:ff:ff:ff:ff:ff\n    inet 172.29.250.10\/24 brd 172.29.250.255 scope global eth2\n       valid_lft forever preferred_lft forever\n    inet6 fe80::6430:65ff:fe66:6264\/64 scope link\n       valid_lft forever preferred_lft forever\n \nroot@lxc01:~# ip addr show eth3\n57: eth3@if58: &lt;BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000\n    link\/ether 36:32:66:37:62:39 brd ff:ff:ff:ff:ff:ff\n    inet 172.29.250.13\/24 scope global eth3\n       valid_lft forever preferred_lft forever\n    inet6 fe80::3432:66ff:fe37:6239\/64 scope link\n       valid_lft forever preferred_lft forever\n<\/pre>\n<p>Now, if I try to ping <code>172.29.250.10<\/code> or <code>172.29.250.11<\/code> (on proxmox02) from <code>172.29.250.13<\/code>:<\/p>\n<pre><code>root@lxc01:~# ping -c 4 -W 5 -I eth3 172.29.250.10\nPING 172.29.250.10 (172.29.250.10) from 172.29.250.13 eth3: 56(84) bytes of data.\nFrom 172.29.250.13 icmp_seq=1 Destination Host Unreachable\nFrom 172.29.250.13 icmp_seq=2 Destination Host Unreachable\nFrom 172.29.250.13 icmp_seq=3 Destination Host Unreachable\nFrom 
172.29.250.13 icmp_seq=4 Destination Host Unreachable\n--- 172.29.250.10 ping statistics ---\n4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3016ms\npipe 3\n\nroot@lxc01:~# ping -c 4 -W 5 -I eth3 172.29.250.11\nPING 172.29.250.11 (172.29.250.11) from 172.29.250.13 eth3: 56(84) bytes of data.\nFrom 172.29.250.13 icmp_seq=1 Destination Host Unreachable\nFrom 172.29.250.13 icmp_seq=2 Destination Host Unreachable\nFrom 172.29.250.13 icmp_seq=3 Destination Host Unreachable\nFrom 172.29.250.13 icmp_seq=4 Destination Host Unreachable\n--- 172.29.250.11 ping statistics ---\n4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 2999ms\npipe 4\n<\/code><\/pre>\n<p>we can see the connectivity is failing. Although the interfaces belong to the same <code>L3<\/code> subnet, they have been isolated at the <code>L2<\/code> layer in the SDN and thus exist as separate networks.<\/p>\n<p>[serialposts]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is probably the most complex part of the setup. It involves network configuration of the cluster in a way that the instances running on different nodes can still talk to each other. 
This is needed in order to provide&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17,22,13],"tags":[26,25,18,24,31,23],"class_list":["post-282","post","type-post","status-publish","format-standard","hentry","category-cluster","category-kvm","category-virtualization","tag-cluster","tag-high-availability","tag-iscsi","tag-kvm","tag-ovs","tag-proxmox"],"_links":{"self":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/282","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=282"}],"version-history":[{"count":6,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/282\/revisions"}],"predecessor-version":[{"id":288,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/282\/revisions\/288"}],"wp:attachment":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=282"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=282"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=282"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}