Setting up an Apache Cassandra cluster on Ubuntu 22.04

What is Apache Cassandra?

Apache Cassandra is a distributed NoSQL database built around horizontal scalability and no single point of failure. It's a great fit for write-heavy workloads, time-series data, and anything where you'd rather throw nodes at a problem than manually shard a relational database. This post walks through setting up a 3-node cluster on Ubuntu 22.04 with Cassandra 4.1.

What you'll need

Java 11, which Cassandra runs on:

bash
sudo apt-get install openjdk-11-jdk
java -version

Python 3.6, which cqlsh uses:

bash
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3.6
python3.6 --version

Installing Cassandra (single-node first)

Each box gets installed standalone first, then we wire them into a cluster. Add the apt repo:

bash
echo "deb https://debian.cassandra.apache.org 41x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list

Trust the Apache signing key:

bash
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -

Refresh and install:

bash
sudo apt-get update
sudo apt-get install cassandra

Cassandra is now running as a single node. Confirm:

bash
nodetool status
bash
Datacenter: 168
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens       Owns (effective)  Host ID                               Rack
UN  192.168.159.129

Don't worry about the UN in the first column: it stands for Up / Normal, not "unavailable".

cqlsh ships with the install. Drop into the shell against localhost:

bash
cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
cqlsh> exit;

To connect over the network with auth:

bash
cqlsh 10.90.214.187 9042 -u cassandra -p cassandra

Repeat the install on the other two servers before continuing.

Joining the nodes into a cluster

Edit /etc/cassandra/cassandra.yaml on each node. The minimum changes:

seed_provider

Add the IPs of every node to the seeds list:

yaml
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.159.129,192.168.159.130,192.168.159.131"

listen_address

The local IP of the node:

yaml
listen_address: 192.168.159.129
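If you'd rather not hand-edit the file on three machines, the two address settings above can be scripted. This is a minimal sketch, not an official tool: set_cassandra_addresses is a hypothetical helper, and it assumes the stock file layout where the seed list sits on a "- seeds:" line and listen_address is an uncommented top-level key.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Cassandra): rewrite the seeds list and
# listen_address in a cassandra.yaml-style file.
# Assumes the seeds live on a "- seeds:" line and listen_address is uncommented.
set_cassandra_addresses() {
  local file="$1" seeds="$2" listen_ip="$3"
  # -i.bak edits in place and keeps the original as FILE.bak
  sed -i.bak -E \
    -e "s|^([[:space:]]*- seeds:).*|\1 \"${seeds}\"|" \
    -e "s|^listen_address:.*|listen_address: ${listen_ip}|" \
    "$file"
}

# Example call on the first node (needs root for /etc/cassandra):
# set_cassandra_addresses /etc/cassandra/cassandra.yaml \
#   "192.168.159.129,192.168.159.130,192.168.159.131" 192.168.159.129
```

The seeds string is the same on every node; only the last argument changes per machine. The .bak copy gives you something to diff against if a node fails to start afterwards.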

endpoint_snitch

yaml
endpoint_snitch: RackInferringSnitch
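RackInferringSnitch derives topology from the IP address itself: the second octet becomes the datacenter name and the third octet becomes the rack. That is why nodetool status reports "Datacenter: 168" for the 192.168.159.x addresses used here. The snippet below only illustrates that mapping; infer_dc_rack is a hypothetical helper written for this post, not something Cassandra ships.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show the datacenter/rack that RackInferringSnitch
# would infer from an IPv4 address (2nd octet = datacenter, 3rd octet = rack).
infer_dc_rack() {
  printf '%s\n' "$1" | awk -F. '{ printf "dc=%s rack=%s\n", $2, $3 }'
}

# All three nodes in this post land in the same datacenter and rack.
for ip in 192.168.159.129 192.168.159.130 192.168.159.131; do
  echo "$ip -> $(infer_dc_rack "$ip")"
done
```

If your nodes differ in the second or third octet, this snitch will place them in different datacenters or racks, which changes how replicas are laid out.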

Restart Cassandra on every node:

bash
sudo service cassandra restart

The nodes will gossip and discover each other. Verify:

bash
nodetool status
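Rather than eyeballing the output, you can count the UN rows. The sketch below reads nodetool status output on stdin; count_up_normal and all_nodes_up are hypothetical helper names, and the parsing assumes the two-letter status code is the first column, as in the output shown earlier.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: parse `nodetool status` output from stdin.

# Print how many nodes are reported as UN (Up / Normal).
count_up_normal() {
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Succeed only if exactly the expected number of nodes are UN.
all_nodes_up() {
  [ "$(count_up_normal)" -eq "$1" ]
}

# Example on a cluster node:
# nodetool status | all_nodes_up 3 && echo "cluster is healthy"
```

Nodes can take a little while to show up after a restart, so a failed check right after restarting is not necessarily a problem; re-run it after a short wait.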

Quick replication test

Connect to any node with cqlsh and create a keyspace with replication factor 3 (one replica per node):

sql
CREATE KEYSPACE foo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
USE foo;
CREATE TABLE test (sno int PRIMARY KEY, pname text);
INSERT INTO test (sno, pname) VALUES (1, 'mert');
INSERT INTO test (sno, pname) VALUES (2, 'mert');
SELECT * FROM test;

 sno | pname
-----+-------
   1 |  mert
   2 |  mert

Confirm the row landed on every node:

bash
nodetool -h localhost getendpoints foo test 1
192.168.159.131
192.168.159.129
192.168.159.130

That's a 3-node Cassandra cluster up and replicating.

A few notes worth knowing

Consistency level

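The consistency level is the minimum number of replicas that must acknowledge a read or write for it to be considered successful. The default in cqlsh is ONE. For stronger guarantees, QUORUM requires (N/2 + 1) replicas, with the division rounded down, where N is the replication factor. With the replication factor of 3 used above, that is 2: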

sql
cqlsh> consistency quorum;
Consistency level set to QUORUM.

More on consistency in the DataStax docs.
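To make the quorum arithmetic concrete, here is a small sketch that computes the quorum size for a few replication factors and how many replicas can be down while QUORUM requests still succeed. The quorum function is just an illustration of the formula, not a Cassandra command.

```bash
#!/usr/bin/env bash
# Illustration of the quorum formula: floor(RF / 2) + 1.
# Shell arithmetic uses integer division, so no explicit rounding is needed.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

for rf in 1 2 3 5; do
  q="$(quorum "$rf")"
  echo "RF=$rf quorum=$q tolerates $(( rf - q )) replica(s) down"
done
```

For the 3-node, RF=3 cluster in this post, QUORUM needs 2 replicas, so reads and writes keep working with one node offline.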

Datacenter / rack mismatch on restart

If Cassandra refuses to start with "cannot start node if snitch's data center differs from previous data center", which can happen when endpoint_snitch is changed on a node that has already started once, add these flags to cassandra-env.sh:

bash
sudo vim /etc/cassandra/cassandra-env.sh
properties
JVM_OPTS="$JVM_OPTS -Dcassandra.ignore_dc=true"
JVM_OPTS="$JVM_OPTS -Dcassandra.ignore_rack=true"

This is fine for a one-off recovery, but figure out why the snitch's view changed before leaving these flags in place permanently.
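Since these flags are meant to be temporary, it helps to add and remove them in a repeatable way instead of editing by hand. The following is a sketch under that assumption; jvm_opt_line, add_jvm_opt and remove_jvm_opt are hypothetical helpers that only append or delete the exact JVM_OPTS lines shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for toggling a JVM flag line in a cassandra-env.sh-style file.

# Build the exact line, e.g. JVM_OPTS="$JVM_OPTS -Dcassandra.ignore_dc=true"
jvm_opt_line() {
  printf 'JVM_OPTS="$JVM_OPTS %s"\n' "$1"
}

# Append the flag only if that exact line is not already present.
add_jvm_opt() {
  local file="$1" line
  line="$(jvm_opt_line "$2")"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Delete every occurrence of that exact line, leaving the rest untouched.
remove_jvm_opt() {
  local file="$1" line tmp
  [ -f "$file" ] || return 1
  line="$(jvm_opt_line "$2")"
  tmp="$(mktemp)"
  # grep exits 1 when no lines remain to print; that is not an error here
  grep -vxF -- "$line" "$file" > "$tmp" || true
  cat "$tmp" > "$file"   # overwrite in place to keep owner and permissions
  rm -f "$tmp"
}

# Example (needs root for /etc/cassandra):
# add_jvm_opt /etc/cassandra/cassandra-env.sh -Dcassandra.ignore_dc=true
# remove_jvm_opt /etc/cassandra/cassandra-env.sh -Dcassandra.ignore_dc=true
```

Running add_jvm_opt twice leaves a single line, and remove_jvm_opt puts the file back once the node has started cleanly.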