Setting up an Apache Cassandra cluster on Ubuntu 22.04
What is Apache Cassandra?
Apache Cassandra is a distributed NoSQL database built around horizontal scalability and no single point of failure. It's a great fit for write-heavy workloads, time-series data, and anything where you'd rather throw nodes at a problem than manually shard a relational database. This post walks through a 3-node cluster setup on Ubuntu 22.04 with Cassandra 4.1.
What you'll need
- 3 Ubuntu 22.04 servers
- Java: the supported version depends on the Cassandra version. For 4.1, the latest OpenJDK 8 or OpenJDK 11 release both work:
sudo apt-get install openjdk-11-jdk
java -version
- Python 3.6+ (or 2.7) for cqlsh, the CQL shell:
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3.6
python3.6 --version
- Network connectivity between the nodes. Cassandra's CQL port is 9042 by default; gossip uses 7000 (or 7001 with TLS). Open these between the nodes.
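To check that the ports are actually reachable from one node to another, a small TCP probe is enough. This is a minimal sketch using only the Python standard library; the helper names `port_open` and `check_node` are made up for this post. A `False` result can mean a firewall is in the way, or simply that Cassandra is not listening on that address yet, so it is most useful once the nodes are configured.

```python
import socket
from typing import Dict

# Ports named in the prerequisites: CQL clients and inter-node gossip.
CASSANDRA_PORTS = {"cql": 9042, "gossip": 7000}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_node(host: str, ports: Dict[str, int] = CASSANDRA_PORTS) -> Dict[str, bool]:
    """Map each named port to whether it is reachable on the given host."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

For example, `check_node("192.168.159.130")` run from the first node returns a dict with one boolean per port.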
Installing Cassandra (single-node first)
Each box gets installed standalone first, then we wire them into a cluster. Add the apt repo:
echo "deb https://debian.cassandra.apache.org 41x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
Trust the Apache signing key:
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
Refresh and install:
sudo apt-get update
sudo apt-get install cassandra
Cassandra is now running as a single node. Confirm:
nodetool status
Datacenter: 168
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load  Tokens  Owns (effective)  Host ID  Rack
UN  192.168.159.129
Don't worry about the UN: it stands for Up / Normal, not "unavailable".
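The two-letter code is just the Status letter followed by the State letter from the legend that nodetool prints. As a rough illustration, here is a tiny decoder built from that legend; `decode_status` is a made-up helper for this post, not part of any Cassandra tooling.

```python
# Legend taken from the header lines of `nodetool status`.
STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def decode_status(code: str) -> str:
    """Expand a two-letter nodetool status code such as 'UN'."""
    if len(code) != 2 or code[0] not in STATUS or code[1] not in STATE:
        raise ValueError(f"unrecognised status code: {code!r}")
    return f"{STATUS[code[0]]} / {STATE[code[1]]}"


print(decode_status("UN"))  # Up / Normal
print(decode_status("DN"))  # Down / Normal
```

Anything other than UN on a node you expect to be healthy is worth a look at that node's logs.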
cqlsh ships with the install. Drop into the shell against localhost:
cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
cqlsh> exit;
To connect over the network with auth:
cqlsh 10.90.214.187 9042 -u cassandra -p cassandra
Repeat the install on the other two servers before continuing.
Joining the nodes into a cluster
Edit /etc/cassandra/cassandra.yaml on each node. The minimum changes:
seed_provider
Add the IPs of every node to the seeds list:
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.159.129,192.168.159.130,192.168.159.131"
listen_address
The local IP of the node:
listen_address: 192.168.159.129
endpoint_snitch
endpoint_snitch: RackInferringSnitch
Restart Cassandra on every node:
sudo service cassandra restart
The nodes will gossip and discover each other. Verify:
nodetool status
Quick replication test
Connect to any node with cqlsh and create a keyspace with replication factor 3 (one replica per node):
CREATE KEYSPACE foo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
USE foo;
CREATE TABLE test (sno int PRIMARY KEY, pname text);
INSERT INTO test (sno, pname) VALUES (1, 'mert');
INSERT INTO test (sno, pname) VALUES (2, 'mert');
SELECT * FROM test;
 sno | pname
-----+-------
   1 |  mert
   2 |  mert
Confirm the row landed on every node:
nodetool -h localhost getendpoints foo test 1
192.168.159.131
192.168.159.129
192.168.159.130
That's a 3-node Cassandra cluster up and replicating.
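Why does every node show up? SimpleStrategy puts the first replica on the node that owns the partition's token, then walks clockwise around the ring until it has collected as many distinct nodes as the replication factor asks for. The sketch below is a deliberately simplified model of that walk: it assumes one made-up token per node (real clusters use many virtual nodes each, with tokens from the Murmur3 partitioner), and `RING` and `replicas` are hypothetical names for illustration.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical ring: one made-up token per node, sorted by token.
RING: List[Tuple[int, str]] = [
    (-3000, "192.168.159.129"),
    (0, "192.168.159.130"),
    (3000, "192.168.159.131"),
]


def replicas(key_token: int, rf: int, ring: List[Tuple[int, str]] = RING) -> List[str]:
    """Walk clockwise from the token's owner, collecting rf distinct nodes."""
    tokens = [t for t, _ in ring]
    # Owner is the first node whose token is >= the key's token, wrapping around.
    start = bisect_left(tokens, key_token) % len(ring)
    found: List[str] = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in found:
            found.append(node)
        if len(found) == rf:
            break
    return found


print(replicas(100, 3))  # all three nodes, starting from the owner
print(replicas(100, 1))  # only the owner
```

With a replication factor of 3 on a 3-node ring the walk always visits every node, which is what getendpoints reports. Drop the factor to 1 or 2 and only a subset of the nodes holds any given row.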
A few notes worth knowing
Consistency level
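The consistency level is the minimum number of replicas that must acknowledge a read or write for it to be considered successful. The default in cqlsh is ONE. For stronger guarantees, QUORUM requires a majority of replicas: floor(RF / 2) + 1, where RF is the replication factor (2 of 3 for the keyspace above):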
cqlsh> consistency quorum;
Consistency level set to QUORUM.
More on consistency in the DataStax docs.
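The quorum arithmetic is easy to get wrong for even replication factors, so here is a quick sketch of it. The function names are made up for this post; the formula is the usual majority rule, with integer division.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, can lose {tolerated_failures(rf)}")
# RF=1: quorum=1, can lose 0
# RF=2: quorum=2, can lose 0
# RF=3: quorum=2, can lose 1
# RF=5: quorum=3, can lose 2
```

Note that going from RF 1 to RF 2 buys no extra failure tolerance at QUORUM, which is one reason odd replication factors such as 3 are the common choice.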
Datacenter / rack mismatch on restart
If Cassandra refuses to start with "cannot start node if snitch's data center differs from previous data center", add these flags to cassandra-env.sh:
sudo vim /etc/cassandra/cassandra-env.sh
JVM_OPTS="$JVM_OPTS -Dcassandra.ignore_dc=true"
JVM_OPTS="$JVM_OPTS -Dcassandra.ignore_rack=true"
This is fine for a one-off recovery, but figure out why the snitch's view changed before leaving these flags in place permanently.
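One likely culprit in a setup like this one is the snitch itself. RackInferringSnitch derives the datacenter from the second octet of a node's IP address and the rack from the third, which is where the "Datacenter: 168" in the nodetool status output earlier comes from. A node that first started under the default SimpleSnitch (which reports datacenter1), or whose IP later moved to a different subnet, will see a different datacenter name on the next start. The sketch below only mimics that octet rule for IPv4; `infer_dc_rack` is a made-up helper, not Cassandra code.

```python
from typing import Tuple


def infer_dc_rack(ip: str) -> Tuple[str, str]:
    """Mimic RackInferringSnitch: DC = 2nd octet, rack = 3rd octet of an IPv4 address."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and 0 <= int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    return octets[1], octets[2]


print(infer_dc_rack("192.168.159.129"))  # ('168', '159')
print(infer_dc_rack("10.90.214.187"))    # ('90', '214')
```

If the inferred names for a node's old and new address differ, that difference is the mismatch Cassandra is complaining about.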