Apache Cassandra Node Driver Connection

Question

48 views · Asked by Suprit Gandhi on 31 March 2024 at 06:58

I am using a Node.js server for the backend. It connects to Cassandra using the cassandra-driver package for Node.js.

The connection is set up as follows:

const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
  contactPoints: ['h1', 'h2'],
  localDataCenter: 'datacenter1',
  keyspace: 'ks1'
});

1. In contactPoints, do I need to add only 'seed' nodes, or can I add any nodes from the datacenter?
2. Do I need to run a separate backend service for each datacenter? Or is there a way to connect to multiple datacenters from the same Node.js backend service?
3. Is there a recommended way to set up the backend server so that bandwidth between the Cassandra nodes and the backend server is minimized? Should the backend server run on the same machine as one of the Cassandra nodes so that data doesn't need to travel between machines? Or is it fine for the backend server to run on a completely separate machine from the Cassandra nodes? For example, on AWS EC2, data transfer charges might increase because of the traffic between the Cassandra nodes and the backend server.

There is 1 answer

**Madhavan** · Answer 1 · 2024-03-31T11:51:33+00:00

1. Yes, any node is fine. The contact points are only used for the initial handshake; as soon as the driver connects to one of them, it discovers the entire topology of the cluster.
2. Connecting to multiple datacenters from the same client is always a bad idea. The application microservices deployed in a given region/datacenter should connect to the C* datacenter in that same region, for locality. In any case, if an entire cloud region goes down, the application services in that region go down along with the cluster's datacenter there. Failover therefore happens at a layer above these, such as a load balancer that routes traffic to the services (app + db) of the appropriate region. This is also why you provide the local datacenter when creating a client. See the image below for reference (courtesy of DataStax).
3. No, the backend server should never run on the same machine as a Cassandra node. Application servers (backend servers, as you call them) should be deployed separately, in the same cloud provider and the same region as the cluster, so as to minimize cross-region data transfer. This keeps transfer charges down and gives the lowest possible latency. Each C* process should run on its own machine, with no other processes competing for the hardware resources it needs.
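
To make the first two points concrete, here is a minimal sketch of one backend instance per region, each connecting only to its local datacenter. The region names, the `REGION_CONFIG` map, the hosts `h3`/`h4`, `datacenter2`, and both helper functions are hypothetical placeholders, not something from your setup. As far as I know, with `localDataCenter` set the driver's default load-balancing policy routes queries to nodes in that datacenter, and `client.hosts` lists every node the driver discovered after the handshake, including nodes that were never listed as contact points.

```javascript
// Sketch only: REGION_CONFIG, its host names and datacenter names are
// hypothetical. Replace them with your real topology.
const REGION_CONFIG = {
  'us-east': { contactPoints: ['h1', 'h2'], localDataCenter: 'datacenter1' },
  'eu-west': { contactPoints: ['h3', 'h4'], localDataCenter: 'datacenter2' },
};

// Build the driver options for the region this service instance runs in.
// Each instance only ever points at its own (local) datacenter.
function buildClientOptions(region, keyspace = 'ks1') {
  const cfg = REGION_CONFIG[region];
  if (!cfg) {
    throw new Error(`No Cassandra datacenter configured for region "${region}"`);
  }
  return {
    contactPoints: cfg.contactPoints,      // any nodes of the local DC, not just seeds
    localDataCenter: cfg.localDataCenter,  // tells the driver which DC is "local"
    keyspace,
  };
}

// Connect, then print every node the driver discovered from the contact points.
async function connectAndListHosts(region) {
  // Loaded lazily so the options builder above can be used without the driver.
  // Requires `npm install cassandra-driver`.
  const cassandra = require('cassandra-driver');
  const client = new cassandra.Client(buildClientOptions(region));
  await client.connect();
  for (const host of client.hosts.values()) {
    console.log(host.address, host.datacenter, host.rack);
  }
  return client;
}
```

Each deployment would then call something like `connectAndListHosts(process.env.APP_REGION)`, where `APP_REGION` is an environment variable name I made up for this example, and call `client.shutdown()` when the service stops.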