
I am configuring a 6-node Cassandra cluster on AWS EC2, with 3 nodes in each of two regions:

eu-central-1
- node0   cass-db-0   10.10.37.79   eu-central-1a
- node1   cass-db-1   10.10.38.229  eu-central-1b
- node2   cass-db-2   10.10.36.76   eu-central-1a

eu-west-1
- node3   cass-db-0   10.10.37.80   eu-west-1a
- node4   cass-db-1   10.10.39.177  eu-west-1b
- node5   cass-db-2   10.10.37.231  eu-west-1a

I have completed the local configuration in cassandra.yaml.

Now I need to configure cassandra-rackdc.properties and cassandra-topology.properties, but I don't understand the network topology.

Please advise.

2 Answers


When you are building a cluster, you would typically start with the network topology. In your case, your choice of 2 regions indicates to me that you would like two logical Cassandra DCs, each with 3 nodes.

Network topology

For best practice, we recommend configuring your keyspaces with a replication factor (RF) of 3 in each DC. This means that (a) there are 3 copies of the data, and (b) your cluster is configured for high availability.

With RF=3, best practice would be to have an equivalent number of logical C* racks in each DC. In your case that isn't possible because you only have 2 AZs per region, so the topology design means you will need to place all nodes in a single logical C* rack.

Snitches

A snitch determines which DCs and racks nodes belong to. There are several snitches to choose from and your choice of snitch will determine which .properties file to configure.

GossipingPropertyFileSnitch (GPFS) automatically updates all nodes using gossip. GPFS is recommended in all cases because it will future-proof your cluster. Unless you have C* expertise and have strong preference for other snitches, it is best practice to stick with GPFS. When using GPFS, you will need to define the node's DC and rack in the cassandra-rackdc.properties file. For details, see GossipingPropertyFileSnitch.
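For example, here is a minimal sketch of the file for node0, assuming you call the single rack rack1 (the shipped default):

    # cassandra-rackdc.properties on node0 (10.10.37.79)
    dc=eu-central-1
    rack=rack1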

PropertyFileSnitch (PFS) is the precursor to GPFS and determines the network topology based on what you've configured in the cassandra-topology.properties file. With PFS, each node has a full list of all nodes in the cluster, so when you add or remove nodes you have to update the cassandra-topology.properties file on every single node (details here). This is tedious, which is why users prefer GPFS.
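For illustration only (you won't need this file if you go with GPFS), each entry maps a node's IP to DC:rack, with a default for unlisted nodes; a sketch for your cluster might look like:

    # cassandra-topology.properties (PFS only) -- illustrative sketch
    10.10.37.79=eu-central-1:rack1
    10.10.38.229=eu-central-1:rack1
    10.10.36.76=eu-central-1:rack1
    10.10.37.80=eu-west-1:rack1
    10.10.39.177=eu-west-1:rack1
    10.10.37.231=eu-west-1:rack1
    default=eu-central-1:rack1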

WARNING: If you are not using PropertyFileSnitch, we recommend that you delete the cassandra-topology.properties file on every single node because it's been known to cause intermittent gossip issues as I've documented here -- https://community.datastax.com/questions/4621/.
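For example, assuming the file sits next to cassandra.yaml in your config directory (the path varies by install method):

    sudo rm /etc/cassandra/cassandra-topology.properties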

There are other snitches available (see the docs here) but I won't go through them here since we think GPFS is the right choice in all cases. Cheers!


5 Comments

I have completed the configuration and checked it, but there is something strange I am stuck with: the seeds in eu-central-1 don't replicate with the eu-west-1 seeds. I have created a keyspace with some tables in eu-central-1; I can see them from the eu-central-1 seeds but I can't from the eu-west-1 seeds. Is that normal?
I don't quite understand your question, so I think you need to provide examples. Also, since this is really a new question (although related to this post), you should ask a new SO question. Cheers!
I have 3 seeds in eu-central-1 and 3 seeds in eu-west-1. I created the keyspace from a eu-central-1 seed as follows: CREATE KEYSPACE my_test WITH REPLICATION = {'class':'NetworkTopologyStrategy', 'eu-west-1':'3', 'eu-central-1':'3'}; after that I created several tables under this keyspace. Should that keyspace and the tables be replicated to the eu-west-1 seeds automatically?
They're not "seeds" but "replicas". 🙂 The answer is yes, the keyspace has 3 replicas in both EU West and Central so should be replicated. But this isn't really related to the original rack configuration so you really need to post a new question, otherwise we'd end up with a really long thread here. 🙂
Thank you for your prompt reply. You're right, I will add a new question for my new issue.

Erick provides some great background here, which should be helpful for you. In terms of getting to a simple solution, I'd recommend this:

  • Make sure you're using the GossipingPropertyFileSnitch in the cassandra.yaml.
  • Delete cassandra-topology.properties.
  • Edit cassandra-rackdc.properties and set dc=eu-west-1 for the 3 west nodes; likewise dc=eu-central-1 for the central nodes (a sketch of both files follows this list).
  • Leave the rack at the default, as you only have 3 nodes across 2 availability zones (AZs 1a and 1b).
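A minimal sketch of those settings, assuming default config locations and rack1 as the default rack name:

    # cassandra.yaml (all 6 nodes)
    endpoint_snitch: GossipingPropertyFileSnitch

    # cassandra-rackdc.properties on the 3 eu-central-1 nodes
    dc=eu-central-1
    rack=rack1

    # cassandra-rackdc.properties on the 3 eu-west-1 nodes
    dc=eu-west-1
    rack=rack1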

If you were using AZs 1a, 1b, and 1c I'd say to use that for the rack property. Erick mentions defining your keyspaces with a RF of 3, which is solid advice. Typically, you'll want the number of AZs to match your RF for even data distribution and availability, which is why I'd recommend leaving rack at the default value for all.

Likewise, your keyspace definitions would look something like this:

CREATE KEYSPACE keyspace_name WITH REPLICATION = 
    {'class':'NetworkTopologyStrategy',
     'eu-west-1':'3',
     'eu-central-1':'3'};

The main point to consider is that your data center names must match between the keyspace definition and the entries in the cassandra-rackdc.properties files.
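Once the nodes are up with those settings, a quick way to confirm the DC names the snitch is reporting (and therefore what your keyspace definition must match) is nodetool status, which groups nodes by datacenter. Abbreviated, illustrative output:

    $ nodetool status
    Datacenter: eu-central-1
    ========================
    ...
    Datacenter: eu-west-1
    =====================
    ...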

6 Comments

I am configuring the firewall now because when I run nodetool status it returns just the one node, even when all the nodes are running.
I have completed the configuration and checked it, but there is something strange I am stuck with: the seeds in eu-central-1 don't replicate with the eu-west-1 seeds. I have created a keyspace with some tables in eu-central-1; I can see them from the eu-central-1 seeds but I can't from the eu-west-1 seeds. Is that normal?
@Haytham No that's not normal. It sounds to me like there's a schema disagreement (you can verify that by running a nodetool describecluster). If there are multiple schema versions on different nodes, I'd run a rolling-restart of the nodes which don't have the most-recent schema.
please find the following:

    [root@ip-10-201-37-79 conf]# nodetool describecluster
    Cluster Information:
        Name: CassPoC
        Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
            e4959b38-ab9d-30fe-9fe7-ade8da47d486: [10.201.36.76, 10.201.37.79, 10.201.38.229]
in the seeds parameter, I have added all the IPs:

    - seeds: "10.201.37.79,10.201.38.229,10.201.36.76,10.200.37.80,10.200.39.177,10.200.37.231"
