Tuesday, November 17, 2015

Adding a node to an existing Cassandra instance to form a cluster


Assuming you already have a single-node Cassandra instance running with data in it and you now plan to add one or more nodes to form a cluster, the following is what needs to be done:

Instance 1: 192.168.13.156 - Has data in its data directory and is already running Cassandra

Shut down Cassandra -> kill <pid>
Make the following changes to cassandra.yaml (the seeds entry sits under seed_provider -> parameters) and start Cassandra

cluster_name: 'Test Cluster'
seeds: "192.168.13.156"
listen_address: 192.168.13.156
rpc_address: 0.0.0.0
rpc_port: 9160
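uncomment the following, i.e. remove the #, and replace the 1.2.3.4 placeholder with this node's own IP (broadcast_rpc_address must be set when rpc_address is 0.0.0.0) -
# broadcast_rpc_address: 1.2.3.4
so that it reads:
broadcast_rpc_address: 192.168.13.156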
start cassandra
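
If you would rather script the cassandra.yaml changes than edit the file by hand, something like the following could work. It is only a rough sketch: apply_settings and node_settings are hypothetical helper names, SAMPLE_YAML is a trimmed stand-in for the real file, and setting broadcast_rpc_address to the node's own IP is an assumption on my part. It rewrites matching lines as plain text rather than parsing YAML, so review the result before starting Cassandra.

```python
import re

# Trimmed stand-in for a default cassandra.yaml (assumption: the real file is much longer).
SAMPLE_YAML = """\
cluster_name: 'Test Cluster'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"
listen_address: localhost
rpc_address: localhost
rpc_port: 9160
# broadcast_rpc_address: 1.2.3.4
"""


def node_settings(node_ip, seed_ip):
    """Settings from this post for one node; values are written verbatim."""
    return {
        "cluster_name": "'Test Cluster'",
        "seeds": f'"{seed_ip}"',
        "listen_address": node_ip,
        "rpc_address": "0.0.0.0",
        "rpc_port": "9160",
        # Assumption: advertise the node's own IP to clients.
        "broadcast_rpc_address": node_ip,
    }


def apply_settings(yaml_text, settings):
    """Rewrite the first `key: ...` line for each key, uncommenting it if needed."""
    lines = yaml_text.splitlines()
    for key, value in settings.items():
        # Optional leading "#", then optional indentation and list dash
        # (the seeds entry is an indented list item).
        pattern = re.compile(r"^(#\s*)?(\s*(?:-\s*)?)" + re.escape(key) + r":.*$")
        for i, line in enumerate(lines):
            match = pattern.match(line)
            if match:
                lines[i] = f"{match.group(2)}{key}: {value}"
                break
        else:
            # Key not present at all: append it at the end of the file.
            lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Instance 1 is its own seed because it holds the data.
    print(apply_settings(SAMPLE_YAML, node_settings("192.168.13.156", "192.168.13.156")))
```

For the second node the same call would be used with node_ip set to 192.168.104.29 and seed_ip left as 192.168.13.156.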

Instance 2: 192.168.104.29 - Fresh instance
Shut down Cassandra -> kill <pid>
Delete the data dir -> rm -rf data
The new node will stream its data from the existing node when it starts up

Make the following changes to cassandra.yaml and start Cassandra

cluster_name: 'Test Cluster'
seeds: "192.168.13.156"
listen_address: 192.168.104.29
rpc_address: 0.0.0.0
rpc_port: 9160
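uncomment the following, i.e. remove the #, and replace the 1.2.3.4 placeholder with this node's own IP (broadcast_rpc_address must be set when rpc_address is 0.0.0.0) -
# broadcast_rpc_address: 1.2.3.4
so that it reads:
broadcast_rpc_address: 192.168.104.29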

start cassandra

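Run nodetool status on either node to verify the cluster. Both nodes should be listed with status UN (Up/Normal) once the new node has finished joining.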
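
To check the nodetool status output from a script instead of by eye, a small parser along these lines may help. This is a sketch under assumptions: node_states, cluster_ready and read_status are hypothetical helper names, and SAMPLE is made-up output (the load figures and host IDs are invented) that only mimics the usual layout, where each node row starts with a two-letter code such as UN (Up/Normal) or UJ (Up/Joining).

```python
import re
import subprocess

# Node rows start with a status letter (U/D) and a state letter (N/L/J/M),
# followed by the node address.
NODE_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")

# Made-up example output; values other than the two IPs are invented.
SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns    Host ID    Rack
UN  192.168.13.156  1.1 GB     256     ?       aaaa-1111  rack1
UJ  192.168.104.29  14.2 KB    256     ?       bbbb-2222  rack1
"""


def node_states(status_output):
    """Map each node address to its two-letter status/state code."""
    states = {}
    for line in status_output.splitlines():
        match = NODE_ROW.match(line.strip())
        if match:
            states[match.group(2)] = match.group(1)
    return states


def cluster_ready(status_output, expected_ips):
    """True only if every expected node is listed as UN (Up/Normal)."""
    states = node_states(status_output)
    return all(states.get(ip) == "UN" for ip in expected_ips)


def read_status():
    """Run nodetool status locally and return its text output."""
    result = subprocess.run(["nodetool", "status"], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    print(node_states(SAMPLE))
    print(cluster_ready(SAMPLE, ["192.168.13.156", "192.168.104.29"]))
```

On a real node you would pass read_status() instead of SAMPLE. While the new node is still joining, cluster_ready returns False, which is expected.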

Points to remember:

  1. Deleting the data dir on the new node ensures that fresh data is synced to it without interference from any local data already on the node.
  2. Don't configure the nodes as seeds of each other, i.e. node 1 as the seed of node 2 and node 2 as the seed of node 1. It might result in loss of data. Make sure the node that holds all the data acts as the seed for the others, at least until the cluster is balanced.
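
The seed rule in point 2 can be turned into a quick sanity check before starting any node. The sketch below assumes you have collected the seeds value from each node's cassandra.yaml yourself; parse_seeds and seed_problems are hypothetical helper names, and the check simply encodes this post's setup, where the node holding the data is the only seed everywhere.

```python
def parse_seeds(value):
    """Split a seeds string such as "10.0.0.1, 10.0.0.2" into a list of IPs."""
    return [part.strip() for part in value.split(",") if part.strip()]


def seed_problems(seeds_by_node, data_node):
    """Return one message per node whose seeds are not exactly [data_node].

    seeds_by_node maps each node's listen_address to its configured seeds string.
    """
    problems = []
    for node, seeds_value in seeds_by_node.items():
        if parse_seeds(seeds_value) != [data_node]:
            problems.append(
                f"{node}: seeds is {seeds_value!r}, expected only {data_node}"
            )
    return problems


if __name__ == "__main__":
    # Configuration from this post: both nodes point at the node with the data.
    good = {
        "192.168.13.156": "192.168.13.156",
        "192.168.104.29": "192.168.13.156",
    }
    # The setup warned against: each node lists the other as its seed.
    crossed = {
        "192.168.13.156": "192.168.104.29",
        "192.168.104.29": "192.168.13.156",
    }
    print(seed_problems(good, "192.168.13.156"))
    print(seed_problems(crossed, "192.168.13.156"))
```

An empty list means the configuration matches the layout described here; anything else is worth fixing before the new node is started.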