Introduction

This tutorial shows you how to configure a multi-node cluster with Cassandra on a VPS. Cassandra is a highly scalable open source database system that performs well when set up with multiple nodes, even across different data centers.

Installing Cassandra on Each Node

Before you begin configuring each node, Cassandra must be installed on every one of them. We have an easy tutorial on how to do that on a VPS. After you’ve installed Cassandra on every node, make sure it isn’t running. To check, list any Cassandra processes:

				
					sudo ps auwx | grep cassandra
				
			

If a process other than the “grep” command itself appears, copy its process ID (PID) and kill it:

				
					sudo kill -9 PID
				
			

<img src="images/how-to-configure-a-multi-node-cluster-with-cassandra-on-a-ubuntu-vps-section-1.png" alt="The highlighted number is the PID" />

<img src="images/how-to-configure-a-multi-node-cluster-with-cassandra-on-a-ubuntu-vps-section-1.png" alt="How to kill the process" />
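If you would rather not pick the PID out by eye, the same filtering can be sketched in Python. This is only an illustration: the function name `cassandra_pids` and the sample listing are made up, and it assumes the 11-column layout that `ps auwx` prints on Linux.

```python
# Made-up sample of `ps auwx | grep cassandra` output (header added for clarity).
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1234  1.0 20.0 999999 88888 ?        Sl   10:00   0:30 java -cp /root/cassandra/conf org.apache.cassandra.service.CassandraDaemon
root      1300  0.0  0.1   9999   888 pts/0    S+   10:05   0:00 grep cassandra
"""


def cassandra_pids(ps_output):
    """Return the PIDs of Cassandra processes found in `ps auwx` output."""
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields; the last one is the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header row or a line we cannot parse
        command = fields[10]
        # Skip the grep process itself, which also mentions "cassandra".
        if "cassandra" in command and not command.startswith("grep"):
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    print(cassandra_pids(SAMPLE))
```

Each PID it returns is a value you would pass to the kill command shown above.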

You’ll also need to clear any existing data. This permanently deletes everything Cassandra has stored on the node. Do so by running:

				
					sudo rm -rf /var/lib/cassandra/*
				
			

Configuring Cassandra

To configure Cassandra for multiple nodes, you’ll need to know beforehand how many nodes you’re going to use, and calculate a token number for each one. We’ve developed a tool to do this, and you can get it at db.tt. Simply enter the number of nodes you’re dealing with and you’ll get a token for each node. For example, if you have three nodes, you’d have these numbers:

				
					Node 0: 0
					Node 1: 3074457345618258602
					Node 2: 6148914691236517205
				
			
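The three example tokens are consistent with splitting a span of 2^63 evenly between the nodes, so they can be reproduced with a few lines of Python. This is a sketch inferred from those example values, not the tool mentioned above; the function name `initial_tokens` is made up for illustration.

```python
def initial_tokens(node_count):
    """Return one evenly spaced token per node.

    Assumption: tokens are spaced across a span of 2**63, which matches
    the three-node example values in this tutorial.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    span = 2 ** 63
    # Integer division keeps the arithmetic exact for these large numbers.
    return [index * span // node_count for index in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(initial_tokens(3)):
        print(f"Node {node}: {token}")
```

Calling `initial_tokens(3)` gives the same three numbers listed above; pass a different node count to get tokens for a larger cluster under the same assumption.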

Now you’ll need to edit your configuration file for each node. To do so, open the nano text editor by running:

				
					nano ~/cassandra/conf/cassandra.yaml
				
			

The settings you’ll need to edit are either the same on all nodes (cluster_name, seed_provider, rpc_address and endpoint_snitch) or different on each one (initial_token and listen_address). Choose one node to be your seed node, then find the lines in the configuration file that refer to each of these attributes and modify them to your needs:

				
					cluster_name: 'Name'
					initial_token: Token
					seed_provider:
					    - seeds:  "Seed IP"
					listen_address: Droplet's IP
					rpc_address: 0.0.0.0
					endpoint_snitch: RackInferringSnitch
				
			

Replace "Name" with your cluster name, "Token" with the number you generated earlier for that node, "Seed IP" with your seed node's IP address, and "Droplet's IP" with that droplet's own IP address. Do this on each node. Here is an example filled in for a three-node setup:

				
					Node 0

					cluster_name: 'MyCluster'
					initial_token: 0
					seed_provider:
					    - seeds:  "198.211.xxx.0"
					listen_address: 198.211.xxx.0
					rpc_address: 0.0.0.0
					endpoint_snitch: RackInferringSnitch

					Node 1

					cluster_name: 'MyCluster'
					initial_token: 3074457345618258602
					seed_provider:
					    - seeds:  "198.211.xxx.0"
					listen_address: 192.241.xxx.0
					rpc_address: 0.0.0.0
					endpoint_snitch: RackInferringSnitch

					Node 2

					cluster_name: 'MyCluster'
					initial_token: 6148914691236517205
					seed_provider:
					    - seeds:  "198.211.xxx.0"
					listen_address: 37.139.xxx.0
					rpc_address: 0.0.0.0
					endpoint_snitch: RackInferringSnitch
				
			
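Because some settings must match on every node while others must be unique, it can help to check the values before starting anything. The following is a minimal sketch, assuming you have copied each node's values into a Python dict by hand; the function name `find_config_problems`, the key name `seeds` and the cluster name are illustrative, and the script does not read cassandra.yaml itself.

```python
SHARED_KEYS = ("cluster_name", "seeds", "rpc_address", "endpoint_snitch")
PER_NODE_KEYS = ("initial_token", "listen_address")


def find_config_problems(nodes):
    """Return a list of problems found in the per-node settings.

    `nodes` is a list of dicts, one per node, holding the values written
    into that node's configuration file.
    """
    problems = []
    # Shared settings must be identical on every node.
    for key in SHARED_KEYS:
        values = {str(node[key]) for node in nodes}
        if len(values) > 1:
            problems.append(f"{key} differs between nodes: {sorted(values)}")
    # Per-node settings must never repeat.
    for key in PER_NODE_KEYS:
        values = [node[key] for node in nodes]
        if len(set(values)) != len(values):
            problems.append(f"{key} is repeated on more than one node")
    return problems


if __name__ == "__main__":
    shared = {
        "cluster_name": "MyCluster",  # hypothetical cluster name
        "seeds": "198.211.xxx.0",
        "rpc_address": "0.0.0.0",
        "endpoint_snitch": "RackInferringSnitch",
    }
    nodes = [
        dict(shared, initial_token=0, listen_address="198.211.xxx.0"),
        dict(shared, initial_token=3074457345618258602, listen_address="192.241.xxx.0"),
        dict(shared, initial_token=6148914691236517205, listen_address="37.139.xxx.0"),
    ]
    print(find_config_problems(nodes) or "no problems found")
```

An empty list means the shared values agree and every node has its own token and listen address. It does not confirm that the addresses themselves are correct.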

To start Cassandra, run:

				
					sudo sh ~/cassandra/bin/cassandra
				
			

Run this on the seed node first and, once it has finished starting up, repeat the process on the other nodes. If you don’t see any errors, your multi-node Cassandra setup should be successfully deployed.
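For a rough extra check that each node is up, you can test whether it accepts TCP connections on Cassandra's inter-node port. This sketch assumes the default storage port of 7000 has not been changed in cassandra.yaml and that no firewall blocks it; the function name `is_port_open` is made up for illustration, and an open port only shows that something is listening, not that the cluster is healthy.

```python
import socket
import sys


def is_port_open(host, port=7000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Usage: pass each node's IP address as a command-line argument.
    for host in sys.argv[1:]:
        state = "open" if is_port_open(host) else "closed or unreachable"
        print(f"{host}: port 7000 is {state}")
```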