How much impact does the number of Cassandra seed nodes have on network traffic?


I am wondering how much the number of seed nodes affects network traffic.

I have a 16-node cluster with 3 seed nodes, and I am trying to keep gossip traffic as low as possible in order to minimize overall network traffic. My understanding is that a seed node receives more gossip traffic than a non-seed node.

Does this mean in practice that the more seed nodes I have, the less gossip traffic I have? My guess is that with more than 3 seed nodes, gossip would be spread more widely, so the traffic to any one seed node would decrease. Is that correct? Would 4 or 5 be better?

Erick Ramirez (best answer)

Gossip traffic on a cluster is minimal compared to client traffic. Since it runs on a private network used only by the nodes to talk to each other, it isn't something to worry about.

To be clear, gossip isn't part of "general traffic" -- it's separate from client requests (reads and writes) which are on the public IP.

The general recommendation is to specify at least two nodes in each DC as seed nodes so that if one is unavailable or unresponsive, another node in the local DC is available. In the worst case, another node in a remote DC will need to be contacted by new nodes joining the cluster. For larger clusters (around 50-100 nodes), three nodes from each DC is usually sufficient.
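As a rough sketch of that rule of thumb, the hypothetical helper below builds a seed list from a map of data center names to node addresses: two seeds per DC, or three once the cluster reaches an assumed 50 nodes. The threshold, counting nodes across the whole cluster, and the addresses are all assumptions for illustration.

```python
def pick_seeds(nodes_by_dc, large_cluster_threshold=50):
    """Build a comma-separated seed list: two nodes per DC, three for large clusters.

    Hypothetical helper: the 50-node threshold is an assumption based on the
    rule of thumb above, not something Cassandra enforces.
    """
    total_nodes = sum(len(nodes) for nodes in nodes_by_dc.values())
    per_dc = 3 if total_nodes >= large_cluster_threshold else 2
    seeds = []
    for nodes in nodes_by_dc.values():
        # Take the first `per_dc` nodes listed for each DC (order is the caller's choice).
        seeds.extend(nodes[:per_dc])
    return ",".join(seeds)


# The asker's cluster: 16 nodes in one DC (addresses are made up).
cluster = {"dc1": [f"10.0.0.{i}" for i in range(1, 17)]}
print(pick_seeds(cluster))  # 10.0.0.1,10.0.0.2
```

The resulting string is in the comma-separated form that `SimpleSeedProvider` expects for its `seeds` parameter in `cassandra.yaml`; the same list would normally be used on every node.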

From a gossip perspective, more seed nodes is not always better. Every second, a node will gossip with up to 3 nodes in a cluster:

  1. gossip with a random live node
  2. gossip with a dead node to check if it's back online
  3. gossip with a random seed node if (1) was not a seed
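The three steps above can be sketched as a function that picks one node's gossip partners for a single round. This is a simplified illustration, not Cassandra's `Gossiper` code: the real implementation applies additional conditions that this sketch ignores, and the function and node names are hypothetical.

```python
import random


def gossip_targets(me, live, unreachable, seeds, rng=random):
    """Return the peers `me` gossips with in one round (at most three).

    Simplified sketch of the three steps; Cassandra's real Gossiper adds
    conditions this ignores, so treat it as illustration only.
    """
    targets = []
    live_peers = [n for n in live if n != me]
    # 1. Gossip with a random live node.
    first = rng.choice(live_peers) if live_peers else None
    if first is not None:
        targets.append(first)
    # 2. Gossip with a dead (unreachable) node to check if it is back online.
    if unreachable:
        targets.append(rng.choice(unreachable))
    # 3. Gossip with a random seed node if step 1 did not already pick a seed.
    seed_peers = [s for s in seeds if s != me]
    if seed_peers and first not in seeds:
        targets.append(rng.choice(seed_peers))
    return targets


rng = random.Random(7)
print(gossip_targets("n5", live=["n1", "n3", "n4", "n5"],
                     unreachable=["n9"], seeds=["n1", "n2"], rng=rng))
```

Whatever the cluster size, a node contacts at most three peers per round in this model; the number of seeds only changes which peers those are.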

Cassandra will always try to gossip with a seed node so it can reach gossip convergence faster. By "convergence" I mean nodes learn about the state of other nodes much quicker.

To use an analogy, imagine a street of 10 houses where neighbours gossip with each other. If each person gossiped with only one other random person per day, it would take several days for news to reach everyone on the street.

If, on the other hand, each person gossiped with one random person AND the person in house #1, news would spread much more quickly, because house #1 knows everything and passes the gossip on to everyone else on the same day.

The same principle applies to seed nodes, but in reverse if you add more of them: nodes would spread their seed gossip across more seeds, so no single seed plays the role of house #1, and it would take longer for nodes to learn about topology changes (new nodes, decommissions) and unavailable nodes. For this reason, we recommend sticking with two seed nodes per DC unless you have a large cluster.
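To see the trade-off from the street analogy on a cluster the size of the asker's, the toy simulation below spreads one piece of news through 16 nodes using only steps 1 and 3 (all nodes are assumed live, so step 2 is skipped). The other modeling choices are also assumptions: nodes take turns in a fixed order each round, a single exchange updates both sides, and the seeds are nodes `0` to `n_seeds - 1`. It reports how many rounds pass until every node knows, and how many seed contacts each seed receives per round.

```python
import random
from statistics import mean


def simulate(n_nodes, n_seeds, rng):
    """Toy gossip model: return (rounds until everyone knows, seed contacts per seed per round).

    Assumptions (not Cassandra's code): every node is live, so step 2 is skipped;
    nodes take turns in a fixed order each round; one exchange updates both
    sides; nodes 0..n_seeds-1 are the seeds (n_seeds must be at least 1).
    """
    seeds = range(n_seeds)
    informed = [False] * n_nodes
    informed[rng.randrange(n_nodes)] = True  # the news (e.g. a topology change) starts somewhere
    rounds = seed_contacts = 0
    while not all(informed):
        rounds += 1
        for node in range(n_nodes):
            # Step 1: a random live peer.
            partners = [rng.choice([p for p in range(n_nodes) if p != node])]
            # Step 3: a random seed, if step 1 did not already pick one.
            other_seeds = [s for s in seeds if s != node]
            if partners[0] not in seeds and other_seeds:
                partners.append(rng.choice(other_seeds))
            for peer in partners:
                if peer in seeds:
                    seed_contacts += 1
                # Either side that already knows the news shares it with the other.
                if informed[node] or informed[peer]:
                    informed[node] = informed[peer] = True
    return rounds, seed_contacts / (rounds * n_seeds)


rng = random.Random(1)
for n_seeds in (1, 3, 5, 16):
    runs = [simulate(16, n_seeds, rng) for _ in range(200)]
    avg_rounds = mean(r for r, _ in runs)
    print(f"{n_seeds:>2} seeds: {avg_rounds:.2f} rounds on average, "
          f"{runs[0][1]:.2f} seed contacts per seed per round")
```

In this model every node makes exactly one seed contact per round once there are at least two seeds, so each seed handles `n_nodes / n_seeds` contacts (16 / 3 ≈ 5.3 for the asker's setup): adding seeds spreads that load, but the total stays the same. With a single seed, every node has the news within two rounds; when every node is a seed, step 3 never fires and the model falls back to plain random gossip, which takes longer on average.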

If you'd like more information on how gossip works, see the following sources:

Manish Khandelwal

Seed nodes are just the initial contact points that help a new node join the cluster. Once a node has joined, the seed node information is not used anymore. The usual advice is to configure three nodes as seeds for availability reasons; a single seed node serves the same purpose, and the number of seed nodes has no impact on the performance of the cluster.
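The availability argument can be illustrated with a hypothetical helper that returns the first seed a joining node can reach. The helper name, probing the seeds in listed order, and the `is_reachable` callback are assumptions for illustration; Cassandra's own bootstrap logic is more involved.

```python
from typing import Callable, Optional, Sequence


def first_reachable_seed(seeds: Sequence[str],
                         is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first seed a joining node can reach, or None if all are down.

    Hypothetical helper: it only shows why listing several seeds helps
    availability when one of them is offline.
    """
    for seed in seeds:
        if is_reachable(seed):
            return seed
    return None


down = {"10.0.0.1"}  # pretend the first seed is offline (made-up address)
print(first_reachable_seed(["10.0.0.1", "10.0.0.2", "10.0.0.3"],
                           lambda s: s not in down))  # 10.0.0.2
```

With a single seed listed, the same outage would leave the joining node with no contact point at all.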