PostgreSQL HA Cluster with Patroni — Ubuntu 20.04

PostgreSQL is a great open-source database, but it doesn't ship any HA story out of the box. The community answer is Patroni, a Python cluster manager that deploys and operates highly available PostgreSQL clusters, backed by a distributed configuration store such as etcd, Consul or ZooKeeper. It also handles the replication, backup and restore configuration. In this post I'm walking through a 4-node setup (two PostgreSQL nodes, one etcd node and one HAProxy node), but you can add as many PostgreSQL nodes as you like.

What you'll need:

4 Ubuntu 20.04 servers
Full SSH access between them (SSH key trust)

Server	What runs on it	IP
psql01	PostgreSQL, Patroni	172.16.16.101
psql02	PostgreSQL, Patroni	172.16.16.102
etcd	etcd	172.16.16.103
haproxy	HAProxy	172.16.16.104

If you need a PostgreSQL install walkthrough first, I covered that in a separate post.

Installing etcd

Note: only run this on the etcd node.

etcd holds the state of the PostgreSQL cluster. Whenever a node's state changes, Patroni writes the new state into etcd's key/value store. etcd is what Patroni uses to elect the leader and keep the cluster healthy.

bash

apt install etcd

Once etcd is installed, drop in the config:

bash

vim /etc/default/etcd

bash

ETCD_LISTEN_PEER_URLS="http://172.16.16.103:2380"
ETCD_LISTEN_CLIENT_URLS="http://localhost:2379,http://172.16.16.103:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://172.16.16.103:2380"
ETCD_INITIAL_CLUSTER="default=http://172.16.16.103:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://172.16.16.103:2379"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"

Swap the IP for your own etcd host. After saving, restart the service:

bash

systemctl restart etcd

systemctl status etcd
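To confirm that etcd is answering clients and not just running, you can query its /health endpoint. This is a minimal sketch rather than part of the original setup: the helper names etcd_is_healthy and check_etcd are made up for illustration, and it assumes curl is installed and that the response body carries a health field set to true.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if an etcd /health response body reports "true".
etcd_is_healthy() {
    case "$1" in
        *'"health"'*true*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: fetch /health from the etcd node and evaluate the body.
# The default host is the etcd IP from the server table; swap in your own.
check_etcd() {
    body=$(curl -s --max-time 5 "http://${1:-172.16.16.103}:2379/health") || return 1
    etcd_is_healthy "$body"
}
```

Running check_etcd && echo "etcd OK" from any node also confirms that the client port 2379 is reachable over the network, which Patroni will need later.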

Installing HAProxy

Note: only run this on the HAProxy node.

HAProxy health-checks the PostgreSQL nodes and routes client connections to the current leader.

bash

apt install haproxy

We'll come back to the HAProxy config at the end, after PostgreSQL and Patroni are running.

Installing Patroni and PostgreSQL

Note: only run this on the PostgreSQL nodes.

Use the PostgreSQL install link above if you need it. Once Postgres is in place, install Patroni:

bash

apt install python3-pip python3-dev libpq-dev -y
pip3 install --upgrade pip
pip install patroni
pip install python-etcd
pip install psycopg2
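A quick way to check that the three packages landed in the Python that Patroni will run under is to try importing them. The sketch below assumes python3 is on the PATH; the helper name missing_python_modules is hypothetical. Note that the python-etcd package is imported as etcd.

```shell
#!/bin/sh
# Hypothetical helper: print each named Python module that fails to import.
missing_python_modules() {
    for mod in "$@"; do
        python3 -c "import $mod" 2>/dev/null || echo "$mod"
    done
}

# On a PostgreSQL node, this should print nothing after a successful install:
#   missing_python_modules patroni etcd psycopg2
```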

Infrastructure is in place — now we wire up the cluster.

Configuring Patroni

Note: do this on every node that runs PostgreSQL + Patroni.

Patroni's config is a single YAML that describes the whole cluster. Patroni needs a couple of PostgreSQL users, so set those up on both nodes first — set a password for the default postgres user and create a new replicator user that Patroni will use for replication:

sql

su - postgres
psql
ALTER USER postgres PASSWORD 'sifre123';
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'sifre321';

Patroni also needs to reach a few PostgreSQL binaries, so symlink them. Run this on both Postgres nodes:

bash

ln -s /usr/lib/postgresql/12/bin/* /usr/sbin/

Last bit of prep: create the data dir Patroni will use and lock down the perms.

bash

mkdir -p /data/patroni
chown -R postgres:postgres /data/
chmod -R 700 /data/
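PostgreSQL refuses to start when its data directory is more permissive than it expects, so it is worth confirming the mode before going further. A small sketch, assuming GNU stat (which Ubuntu ships); the helper name dir_is_private is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the directory's mode is exactly 700.
# Relies on GNU stat's -c option.
dir_is_private() {
    [ "$(stat -c '%a' "$1" 2>/dev/null)" = "700" ]
}
```

For example, dir_is_private /data/patroni || echo "fix permissions on /data/patroni" prints a warning if the chmod above did not take effect.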

Now we can write patroni.yml.

SSH into psql01 first:

bash

vim /etc/patroni.yml

yaml

scope: prodcluster # cluster name
namespace: /db/
name: prodpsql01 # node name

restapi:
    listen: 172.16.16.101:8008 # node IP
    connect_address: 172.16.16.101:8008 # node IP

etcd:
    host: 172.16.16.103:2379 # etcd server IP

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true

    initdb:
    - encoding: UTF8
    - data-checksums

    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 172.16.16.101/32 md5 # psql01 IP
    - host replication replicator 172.16.16.102/32 md5 # psql02 IP
    - host all all 0.0.0.0/0 md5

    users:
        admin:
            password: admin
            options:
                - createrole
                - createdb

postgresql:
    listen: 172.16.16.101:5432 # node IP
    connect_address: 172.16.16.101:5432 # node IP
    data_dir: /data/patroni # Patroni data dir
    pgpass: /tmp/pgpass
    authentication:
        replication:
            username: replicator
            password: sifre321 # password set for replicator earlier
        superuser:
            username: postgres
            password: sifre123 # password set for postgres earlier
    parameters:
        unix_socket_directories: '.'

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

Now psql02: same file, but change the node name and swap in psql02's IP:

bash

vim /etc/patroni.yml

yaml

scope: prodcluster # cluster name
namespace: /db/
name: prodpsql02 # node name

restapi:
    listen: 172.16.16.102:8008 # node IP
    connect_address: 172.16.16.102:8008 # node IP

etcd:
    host: 172.16.16.103:2379 # etcd server IP

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true

    initdb:
    - encoding: UTF8
    - data-checksums

    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 172.16.16.101/32 md5 # psql01 IP
    - host replication replicator 172.16.16.102/32 md5 # psql02 IP
    - host all all 0.0.0.0/0 md5

    users:
        admin:
            password: admin
            options:
                - createrole
                - createdb

postgresql:
    listen: 172.16.16.102:5432 # node IP
    connect_address: 172.16.16.102:5432 # node IP
    data_dir: /data/patroni # Patroni data dir
    pgpass: /tmp/pgpass
    authentication:
        replication:
            username: replicator
            password: sifre321 # password set for replicator earlier
        superuser:
            username: postgres
            password: sifre123 # password set for postgres earlier
    parameters:
        unix_socket_directories: '.'

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false
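Because the two files are nearly identical, a copy-paste slip in one of the addresses is easy to miss. The sketch below prints any listen or connect_address line that does not mention the IP you expect for that node. The helper name mismatched_addresses is hypothetical, and it assumes each address is written as IP:port on one line, as in the files above.

```shell
#!/bin/sh
# Hypothetical helper: print listen/connect_address lines in a Patroni config
# file ($1) that do NOT contain the expected node IP ($2) followed by a port.
mismatched_addresses() {
    grep -E '^[[:space:]]*(listen|connect_address):' "$1" | grep -vF " $2:"
}
```

On psql02, for instance, mismatched_addresses /etc/patroni.yml 172.16.16.102 should print nothing. Any line it does print still points at another host.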

With the YAML in place, turn Patroni into a systemd service. Do this on both PostgreSQL nodes:

bash

vim /etc/systemd/system/patroni.service

ini

[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target

[Service]
Type=simple

User=postgres
Group=postgres

ExecStart=/usr/local/bin/patroni /etc/patroni.yml
KillMode=process
TimeoutSec=30
Restart=no

[Install]
WantedBy=multi-user.target

That's everything — start the cluster:

bash

systemctl daemon-reload
systemctl enable patroni # so Patroni starts on boot
systemctl enable postgresql
systemctl start patroni
systemctl start postgresql
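Before moving on, it can help to confirm that Patroni registered the cluster in etcd. Patroni stores its keys under the namespace and scope from patroni.yml, so the leader entry lives at /db/prodcluster/leader. The sketch below builds the URL for etcd's v2 keys HTTP API. It assumes that API is enabled, which is the default on the etcd version Ubuntu 20.04 packages but not on newer releases. The helper name patroni_key_url is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the etcd v2 keys URL for one of Patroni's entries.
# Arguments: etcd host:port, namespace, scope (cluster name), key name.
patroni_key_url() {
    etcd=$1 namespace=$2 scope=$3 key=$4
    # Trim the slashes around the namespace so "/db/" and "db" both work.
    namespace=${namespace#/}
    namespace=${namespace%/}
    printf 'http://%s/v2/keys/%s/%s/%s\n' "$etcd" "$namespace" "$scope" "$key"
}
```

Something like curl -s "$(patroni_key_url 172.16.16.103:2379 /db/ prodcluster leader)" should return a JSON document whose value is the name of the current leader node.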

PostgreSQL cluster's up. Last step: HAProxy config so we have a single endpoint for the database.

Configuring HAProxy

On the HAProxy node, edit the config. I'm using port 5000 for the database endpoint — pick whatever you want.

bash

vim /etc/haproxy/haproxy.cfg

bash

global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen postgres
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server PSQL01 172.16.16.101:5432 maxconn 100 check port 8008
    server PSQL02 172.16.16.102:5432 maxconn 100 check port 8008
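If you add more PostgreSQL nodes later, each one needs its own server line in the postgres section. As a rough sketch, the helper below prints a line in the same shape as the PSQL01 and PSQL02 entries. The helper name haproxy_server_line is hypothetical, and so are the PSQL03 label and the 172.16.16.105 address in the usage example.

```shell
#!/bin/sh
# Hypothetical helper: print a backend line for the "listen postgres" section.
# $1 is the server label, $2 is the node IP. Traffic goes to 5432, while the
# health check goes to Patroni's REST API on 8008.
haproxy_server_line() {
    printf '    server %s %s:5432 maxconn 100 check port 8008\n' "$1" "$2"
}
```

For example, haproxy_server_line PSQL03 172.16.16.105 prints a line you can paste under the existing two. Running haproxy -c -f /etc/haproxy/haproxy.cfg afterwards checks the file for syntax errors without touching the running service.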

Save it and restart HAProxy:

bash

systemctl restart haproxy

Open http://haproxy_ip:7000 in a browser to see the HAProxy stats page and the live cluster.

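Both cluster nodes are listed. PSQL02 showing as down on the stats page is expected: the health check queries Patroni's REST API on port 8008, which answers 200 only on the current leader, so HAProxy marks every replica as down and sends traffic to the leader alone. It is not an actual failure.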
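You can reproduce what HAProxy sees by requesting the same endpoint yourself. This is a minimal sketch assuming curl is installed; the helper names role_from_status and patroni_role are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: interpret the HTTP status of GET / on Patroni's
# REST API the same way the HAProxy health check does.
role_from_status() {
    case "$1" in
        200) echo "leader" ;;
        503) echo "not leader" ;;
        *)   echo "unreachable or unexpected ($1)" ;;
    esac
}

# Hypothetical wrapper: ask the node at IP $1 and print the interpretation.
patroni_role() {
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "http://$1:8008/")
    role_from_status "$code"
}
```

With the cluster from this post, patroni_role 172.16.16.101 would be expected to print leader and patroni_role 172.16.16.102 to print not leader, until a failover swaps them.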

That's it. haproxy_ip:5000 is now your database endpoint. You can also check the cluster from the CLI with patronictl:

bash

patronictl -c /etc/patroni.yml list

Output:

bash

+------------+---------------+---------+---------+----+-----------+
|   Member   |      Host     |   Role  |  State  | TL | Lag in MB |
+ Cluster: prodcluster (7138377709122875960) ----+----+-----------+
| prodpsql01 | 172.16.16.101 | Leader  | running | 20 |           |
| prodpsql02 | 172.16.16.102 | Replica | running | 20 |         0 |
+------------+---------------+---------+---------+----+-----------+
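If you want the leader's name in a script, for monitoring say, the table can be parsed with awk. The sketch below assumes the column layout shown above, with Member in the first column and Role in the third. Other Patroni versions may arrange the columns differently, and the helper name leader_from_list is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `patronictl list` table output on stdin and print
# the member name of the row whose Role column contains "Leader".
leader_from_list() {
    awk -F'|' '$4 ~ /Leader/ { gsub(/ /, "", $2); print $2 }'
}
```

Piping the command into it, as in patronictl -c /etc/patroni.yml list | leader_from_list, prints just the leader's member name.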

And that's a 4-node PostgreSQL HA cluster, ready to use. 🙂