Mastodon

A highly available WireGuard VPN setup

WireGuard is a communication protocol and free and open-source software that implements encrypted virtual private networks (VPNs), and was designed with the goals of ease of use, high speed performance, and low attack surface.

My Journey to WireGuard

I’ve been using it in my home lab setup since about 2020. When, in the end of 2021, it was finally merged into the Linux mainline with release 5.9, I started to replace my former Tinc-VPN setup with it. Tinc-VPN is another great open source VPN solution. Unfortunately, its development has stalled over the last years which motivated me to look for alternatives. In contrast to WireGuard, Tinc runs as a user-space daemon and uses tun/tap devices which adds a significant processing overhead. Like WireGuard, it is also using UDP for tunneling data, but falls back to TCP in situations where direct datagram connectivity is not feasible. Another big advantage of Tinc is its ability to form a mesh of nodes and to route traffic within it when direct P2P connections are not possible due to firewall restrictions. At the same time, this mesh is also used for facilitating direct connections by signaling endpoint addresses of NATed hosts.

Tinc’s mesh capability.

This mesh functionality made Tinc quite robust against the failure of single nodes as usually we could route traffic via other paths.

Highly Available WireGuard server setup

To counteract this shortcoming, this blog post will present a highly available WireGuard setup using the Virtual Router Redundancy Protocol (VRRP) as implemented by the keepalived daemon.

That said, it is worth noting that this setup does will not bring back some of the beloved features of Tinc. Both meshing, the peer and and endpoint discovery features of Tinc are currently and will never be supported by WireGuard. Jason A. Donenfeld the author of WireGuard focused the design of WireGuard on simplicity, performance and auditability. Hence advanced features like the ones mentioned will only be available to WireGuard by additional agents / daemons which control and configure WireGuard for you. Examples for such are Tailscale, Netmaker and Netbird.

The setup presented in this post is a so called active / standby configuration consisting of two almost equal configured Linux servers running both WireGuard and the keepalived daemon. As the name suggest only one of those two servers will by actively handling WireGuard tunneling traffic while the other one stands by for the event of a failure or maintenance of the active node.

VRRP Wireguard Setup

Requirements

Before get started some requirements for the setup:

  • 2 Servers running Linux 5.9 or newer.
  • A working Wireguard configuration.
  • A local L2 network segment two which both servers are connected.
  • Upstream connectivity without NATing via gateway connected to the network segment (usually provided by your internet or hosting provider).
  • An unused address to be used as Virtual IP (VIP) which roamed between the two servers by VRRP.

An important point is here the assumption that we are running both servers in the same switched network segment as this is a requirement for VRRP. We are also assuming that the upstream gateway performs no NATing. This guide covers only IPv6 addressing. However all steps can be also adapted or repeated for a dual stack or IPv4-only setup.

Detailed steps

Here are some of the specifics for my setup which need to be adapted by you:

  • Server Key (same use by both servers)
    • Private: YIEDx+A2ONo5+uv3mrk/p7ileL3T5QQ8hleQK0IYEEI=
    • Public: XGubrkGtuECdvoykKeUiNMigk2onfLCPfEo9Im+vmx8=
  • Peer Key (In this example we only have a single peer)
    • Private: OIbpWVIVVBOtWfwkmXkFRN7Q/jBdfYtsGt7j97aHx1Q=
    • Public: 3NGl6gTOGs6ai0RE91VmVFgF+N4gw1EBG11KOeiKJAg=
  • Public Server Subnet: 2001:DB8:1::/64
    • Gateway: 2001:DB8:1::1
    • Virtual IP: 2001:DB8:1::2
    • Server A: 2001:DB8:1:::3
    • Server B: 2001:DB8:1::4
  • WireGuard Tunnel Subnet: 2001:DB8:2::1/64
    • Server: 2001:DB8:2::1 (same used by both servers)
    • Peer: 2001:DB8:2::2
  • Interface names
    • Wireguard: wg1
    • Upstream: eno1

1. Prepare servers

We start of preparing the two servers by installing WireGuard and keepalived:

sudo apt install keepalived wireguard-tools iproute2

Next we configure a WireGuard interface on both servers using exactly the same configuration file at /etc/wireguard/wg1.conf:

[Interface]
Address = 2001:DB8:2::1/64
PrivateKey = YIEDx+A2ONo5+uv3mrk/p7ileL3T5QQ8hleQK0IYEEI=
ListenPort = 51800

[Peer]
PublicKey = 3NGl6gTOGs6ai0RE91VmVFgF+N4gw1EBG11KOeiKJAg=
AllowedIPs = 2001:DB8:2::2/128
PersistentKeepalive = 25

Similarly, a reciprocal configuration file is needed on the client side which skip here for brevity. Before proceeding, we activate the interface on both servers:

systemctl enable --now wg-quick@wg1

wg show wg1 # Check if interface is up

2. Configuring Keepalived

Create a configuration file for keepalived at /etc/keepalived/keepalived.conf

global_defs {
    enable_script_security
    script_user root
}

# Check if the server the WireGuard interface configured
vrrp_script check_wg {
    script "/usr/bin/wg show wg1"
    user root
}

vrrp_instance wg_v6 {
    interface eno1
    virtual_router_id 52
    notify /usr/local/bin/keepalived-wg.sh

    state BACKUP # use BACKUP for Server B
    priority 99 # use 100 for Server B

    virtual_ipaddress {
	2001:DB8:1::1/64
    }

    track_script {
        check_wg
    }
}

Create a notification script for keepalived at /usr/local/bin/keepalived-wg.sh

#!/bin/bash

TYPE=$1
NAME=$2
STATE=$3
PRIO=$4

WGIF=wg1

case ${STATE} in
	MASTER)
		ip link set up dev ${WGIF}
		;;

	BACKUP|FAULT|STOP|DELETED)
		ip link set down dev ${WGIF}
		;;

	*)
		echo "unknown state"
		exit 1
esac

Now start the keepalived daemon:

chmod +x /usr/local/bin/keepalived-wg.sh
systemctl enable --now keepalived

4. Testing the fail over

In our configuration, Server A has a higher VRRP priority and as such will be preferred if both servers are healthy. To test our setup, we simply bring down the WireGuard interface on Server A and observe how the VIP gets moved to Server B. From the WireGuard peers perspective not much changes. In fact no connections will be dropped during the fail-over. Internally, the clients WireGuard interface renegotiate the handshake. However, that step is actually not observable by the user.

Run the following commands on Server A while alongside test the connectivity from the client side through the tunnel via ping -i0.2 2001:DB8:2::1:

# Check that keepalived has moved the VIP to interface eno1
ip addr show dev eno1

# Bring down the Wireguard interface
wg-quick down wg1

# Keepalived should now have moved the VIP to Server B
ip addr show dev eno1

Going further

In my personal network, I operate a Interior Gateway Protocol (IGP) to dynamically route traffic within and also towards other networks. Common IGPs are OSPF, ISIS or BGP. In my specific case, both Servers A & B run the Bird2 routing daemon with interior and exterior BGP sessions.

So how does the WireGuard HA setup interoperates with my interior routing? Quite well actually. As my notify script (keepalive-wg.sh) will automatically bring up/down the interface, the routes attached to the interface will be picked up by Bird’s direct protocol.

I am also planning to extend my WireGuard agent ɯice to support the synchronization of WireGuard interface configurations between multiple servers.

Background

Surprisingly, the setup works by using Keepalived and does not require any iptables or nftables magic to rewrite source IP addresses. I’ve seen some people mentioning that SNAT/DNAT would be required to convince WireGuard to use the virtual IP instead of the server addresses. However, in my experience this was not necessary.

Another concern has been that the backup Wireguard interface still might attempt to establish a handshake with its peers. This would quite certainly interfere with the handshakes originated by the current master server. However, also this has not been proven to be the case. I assume the fact that our notify script brings down the WireGuard interface on the backup server causes them to cease all communication with its peers.