Locating Cassandra partition node


I'm using Cassandra with the DataStax driver. I need to do a batch read of roughly 2000 rows. My use case: I get a list of ids in a request, and those ids are my partition keys in Cassandra. Is it a good idea to spawn 2000 threads and read from Cassandra in parallel (each read should be efficient, since it goes to just one node)? Or is there a way to group the ids that live on the same node so that I can optimize the reads (in that case I would need far fewer threads and put less overhead on Cassandra)? Please let me know how I can do a batch read efficiently without spawning so many threads. Thanks! PS: The data coming back from Cassandra is not large enough to cause an OOM.


Answer by Mikhail Baksheev (best answer)

is it possible to figure out a way to group ids which live in same node

Yes, it is. You can get the token ranges of the Cassandra cluster, check which range the token of each id falls into, and then group the ids by node.
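The grouping step can be sketched with plain JDK collections. This is only an illustration of the idea, not driver code: the ring is modelled as a `TreeMap` from each range's end token to a node name, and `tokenOf` is a hypothetical stand-in for whatever computes a key's token (in a real cluster the partitioner does this, Murmur3 by default). It also ignores replication, so each id maps to a single owner.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToLongFunction;

/** Illustrative only: groups ids by the node that owns each id's token. */
class TokenRingGrouping {

    /**
     * The ring maps the END token of each range to its owning node,
     * so a node owns the range (previous token, its own token].
     */
    static String ownerOf(TreeMap<Long, String> ring, long token) {
        // Smallest ring token that is >= the key's token.
        Map.Entry<Long, String> entry = ring.ceilingEntry(token);
        // Past the largest token: wrap around to the first node on the ring.
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** tokenOf is a hypothetical function that yields the token of an id. */
    static <K> Map<String, List<K>> groupByOwner(TreeMap<Long, String> ring,
                                                 List<K> ids,
                                                 ToLongFunction<K> tokenOf) {
        Map<String, List<K>> groups = new LinkedHashMap<>();
        for (K id : ids) {
            String node = ownerOf(ring, tokenOf.applyAsLong(id));
            groups.computeIfAbsent(node, n -> new ArrayList<>()).add(id);
        }
        return groups;
    }
}
```

Each resulting group can then be queried together and routed to the node that owns it.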

In addition:

There is no need to spawn many threads: the DataStax driver provides an asynchronous API. We use it in our project to run a lot of queries in parallel, and it works well enough, though not excellently from a performance point of view.
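The asynchronous pattern looks roughly like the sketch below, written against the JDK's `CompletableFuture` so that it runs on its own. The `fetchAsync` parameter is a hypothetical stand-in for a driver call such as `Session.executeAsync`, which returns a future of its own; adapting that future type is left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Illustrative only: issue all requests first, then collect the results. */
class AsyncFanOut {

    static <K, R> List<R> fetchAll(List<K> ids,
                                   Function<K, CompletableFuture<R>> fetchAsync) {
        // Start every request without waiting; no thread per id is needed.
        List<CompletableFuture<R>> pending = new ArrayList<>(ids.size());
        for (K id : ids) {
            pending.add(fetchAsync.apply(id));
        }
        // Collect results in request order; join() blocks until each one is done.
        List<R> results = new ArrayList<>(ids.size());
        for (CompletableFuture<R> future : pending) {
            results.add(future.join());
        }
        return results;
    }
}
```

With thousands of ids it is probably worth capping the number of requests in flight (for example with a semaphore) rather than firing them all at once.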

Having to issue thousands of requests to read your data indicates an unsuitable data model. You should design the data model around your queries, so that the number of requests is minimized and performance stays good.

Updated:

I suppose you can use the method Metadata.newToken to calculate the token on the driver side, or get the replicas directly with Metadata.getReplicas for a given partition key. Before that, you have to serialize the partition key according to its type and the protocol version.
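To give an idea of what "serialize according to its type" means, here is a minimal JDK-only sketch for two common single-column key types: a bigint is encoded as 8 big-endian bytes and a text value as its UTF-8 bytes. The class and method names are hypothetical; in real code the driver's own codecs should do this, and composite keys and collections need a different encoding that is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative only: byte encodings for two simple partition key types. */
class PartitionKeyBytes {

    /** bigint: 8 bytes, big-endian (ByteBuffer's default byte order). */
    static ByteBuffer bigint(long value) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        buffer.putLong(value);
        buffer.flip(); // make the buffer readable from position 0
        return buffer;
    }

    /** text / varchar: the UTF-8 bytes of the string. */
    static ByteBuffer text(String value) {
        return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
    }
}
```

A buffer like this is what you would then hand to Metadata.getReplicas together with the keyspace name, if I remember the 3.x driver signature correctly.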