PostgreSQL HA Cluster with Patroni — Ubuntu 20.04
PostgreSQL is a great open-source database, but it doesn't ship any HA story out of the box. The community answer is Patroni — a Python cluster manager that handles deploying and operating highly-available PostgreSQL clusters, backed by a distributed config store like etcd, Consul or ZooKeeper. It also takes care of replication, backup, and restore configs. In this post I'm walking through a 4-node setup, but you can scale the PostgreSQL nodes however you like.
What you'll need:
- 4 Ubuntu 20.04 servers
- Full SSH access between them (SSH key trust)
| Server | What runs on it | IP |
|---|---|---|
| psql01 | PostgreSQL, Patroni | 172.16.16.101 |
| psql02 | PostgreSQL, Patroni | 172.16.16.102 |
| etcd | etcd | 172.16.16.103 |
| haproxy | HAProxy | 172.16.16.104 |
If you need a PostgreSQL install walkthrough first, I covered that in a separate post.
Installing etcd
Note: only run this on the etcd node.
etcd holds the state of the PostgreSQL cluster. Whenever a node's state changes, Patroni writes the new state into etcd's key/value store. etcd is what Patroni uses to elect the leader and keep the cluster healthy.
apt install etcdOnce etcd is installed, drop in the config:
vim /etc/default/etcdETCD_LISTEN_PEER_URLS="http://172.16.16.103:2380"ETCD_LISTEN_CLIENT_URLS="http://localhost:2379,http://172.16.16.103:2379"ETCD_INITIAL_ADVERTISE_PEER_URLS="http://172.16.16.103:2380"ETCD_INITIAL_CLUSTER="default=http://172.16.16.103:2380,"ETCD_ADVERTISE_CLIENT_URLS="http://172.16.16.103:2379"ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"ETCD_INITIAL_CLUSTER_STATE="new"Swap the IP for your own etcd host. After saving, restart the service:
systemctl restart etcdsystemctl status etcdInstalling HAProxy
Note: only run this on the HAProxy node.
HAProxy watches the master/slave nodes and routes client connections to the current leader.
apt install haproxyWe'll come back to the HAProxy config at the end, after PostgreSQL and Patroni are running.
Installing Patroni and PostgreSQL
Note: only run this on the PostgreSQL nodes.
Use the PostgreSQL install link above if you need it. Once Postgres is in place, install Patroni:
apt install python3-pip python3-dev libpq-dev -ypip3 install --upgrade pippip install patronipip install python-etcdpip install psycopg2Infrastructure is in place — now we wire up the cluster.
Configuring Patroni
Note: do this on every node that runs PostgreSQL + Patroni.
Patroni's config is a single YAML that describes the whole cluster. Patroni needs a couple of PostgreSQL users, so set those up on both nodes first — set a password for the default postgres user and create a new replicator user that Patroni will use for replication:
su - postgrespsqlALTER USER postgres PASSWORD 'sifre123';CREATE USER replicator WITH ENCRYPTED PASSWORD 'sifre321';Patroni also needs to reach a few PostgreSQL binaries, so symlink them. Run this on both Postgres nodes:
ln -s /usr/lib/postgresql/12/bin/* /usr/sbin/Last bit of prep: create the data dir Patroni will use and lock down the perms.
mkdir -p /data/patronichown -R postgres:postgres /data/chmod -R 700 /data/Now we can write patroni.yml.
SSH into psql01 first:
vim /etc/patroni.ymlscope: prodcluster # cluster namenamespace: /db/name: prodpsql01 # node namerestapi: listen: 172.16.16.101:8008 # node IP connect_address: 172.16.16.101:8008 # node IPetcd: host: 172.16.16.103:2379 # etcd server IPbootstrap: dcs: ttl: 30 loop_wait: 10 retry_timeout: 10 maximum_lag_on_failover: 1048576 postgresql: use_pg_rewind: true initdb: - encoding: UTF8 - data-checksums pg_hba: - host replication replicator 127.0.0.1/32 md5 - host replication replicator 172.16.16.101/0 md5 # psql01 IP - host replication replicator 172.16.16.102/0 md5 # psql02 IP - host all all 0.0.0.0/0 md5 users: admin: password: admin options: - createrole - createdbpostgresql: listen: 10.90.214.183:5432 connect_address: 10.90.214.183:5432 data_dir: /data/patroni # Patroni data dir pgpass: /tmp/pgpass authentication: replication: username: replicator password: sifre123 superuser: username: postgres password: sifre321 parameters: unix_socket_directories: '.'tags: nofailover: false noloadbalance: false clonefrom: false nosync: falseNow psql02 — same file, just swap the IP:
vim /etc/patroni.ymlscope: prodcluster # cluster namenamespace: /db/name: prodpsql01 # node namerestapi: listen: 172.16.16.102:8008 # node IP connect_address: 172.16.16.102:8008 # node IPetcd: host: 172.16.16.103:2379 # etcd server IPbootstrap: dcs: ttl: 30 loop_wait: 10 retry_timeout: 10 maximum_lag_on_failover: 1048576 postgresql: use_pg_rewind: true initdb: - encoding: UTF8 - data-checksums pg_hba: - host replication replicator 127.0.0.1/32 md5 - host replication replicator 172.16.16.102/0 md5 # psql01 IP - host replication replicator 172.16.16.101/0 md5 # psql02 IP - host all all 0.0.0.0/0 md5 users: admin: password: admin options: - createrole - createdbpostgresql: listen: 10.90.214.183:5432 connect_address: 10.90.214.183:5432 data_dir: /data/patroni # Patroni data dir pgpass: /tmp/pgpass authentication: replication: username: replicator password: sifre123 superuser: username: postgres password: sifre321 parameters: unix_socket_directories: '.'tags: nofailover: false noloadbalance: false clonefrom: false nosync: falseWith the YAML in place, turn Patroni into a systemd service. Do this on both PostgreSQL nodes:
vim /etc/systemd/system/patroni.service[Unit]Description=Runners to orchestrate a high-availability PostgreSQLAfter=syslog.target network.target[Service]Type=simpleUser=postgresGroup=postgresExecStart=/usr/local/bin/patroni /etc/patroni.ymlKillMode=processTimeoutSec=30Restart=no[Install]WantedBy=multi-user.targetThat's everything — start the cluster:
systemctl daemon-reloadsystemctl enable patroni # so Patroni starts on bootsystemctl enable postgresqlsystemctl start patronisystemctl start postgresqlPostgreSQL cluster's up. Last step: HAProxy config so we have a single endpoint for the database.
Configuring HAProxy
On the HAProxy node, edit the config. I'm using port 5000 for the database endpoint — pick whatever you want.
vim /etc/haproxy/haproxy.cfgglobal maxconn 100defaults log global mode tcp retries 2 timeout client 30m timeout connect 4s timeout server 30m timeout check 5slisten stats mode http bind *:7000 stats enable stats uri /listen postgres bind *:5000 option httpchk http-check expect status 200 default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions server PSQL01 172.16.16.101:5432 maxconn 100 check port 8008 server PSQL02 172.16.16.102:5432 maxconn 100 check port 8008Save it and restart HAProxy:
systemctl restart haproxyOpen http://haproxy_ip:7000 in a browser to see the HAProxy stats page and the live cluster.
Both cluster nodes are listed. PSQL02 showing as down on the stats page is expected — that's how HAProxy reports replicas in this master/slave layout, not an actual failure.
That's it. haproxy_ip:5000 is now your database endpoint. You can also check the cluster from the CLI with patronictl:
patronictl -c /etc/patroni.yml listOutput:
+------------+---------------+---------+---------+----+-----------+| Member | Host | Role | State | TL | Lag in MB |+ Cluster: prodcluster (7138377709122875960) ----+----+-----------+| prodpsql01 | 172.16.16.101 | Leader | running | 20 | || prodpsql02 | 172.16.16.102 | Replica | running | 20 | 0 |+------------+---------------+---------+---------+----+-----------+And that's a 4-node PostgreSQL HA cluster, ready to use. 🙂