I was using an Astyanax connection pool defined like this:
ipSeeds = "LOAD_BALANCER_HOST:9160";
conPool.setSeeds(ipSeeds)
       .setDiscoveryType(NodeDiscoveryType.TOKEN_AWARE)
       .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE);
However, my cluster has 4 nodes, and I have 8 client machines connecting to it. LOAD_BALANCER_HOST forwards requests to one of the four nodes.
On a client node, I have:
$ netstat -an | grep 9160 | awk '{print $5}' | sort | uniq -c
235 node1:9160
680 node2:9160
4 node3:9160
4 node4:9160
So although the ConnectionPoolType is TOKEN_AWARE, my client seems to connect mainly to node2, sometimes to node1, and almost never to nodes 3 and 4.
The question is: why is this happening? Shouldn't a token-aware connection pool query the ring for the node list and connect to all the active nodes using a round-robin algorithm?
William Price is totally right: the fact that you're using a TokenAwarePolicy, and possibly the default Partitioner, means that, first, your data is stored unevenly across your nodes, and second, when querying, the LoadBalancingPolicy makes your driver remember the correct nodes to ask.
You can improve your cluster's performance by using a different, or perhaps a custom, partitioner to distribute your data equally. To query nodes randomly, use either RoundRobinPolicy (http://www.datastax.com/doc-source/developer/java-apidocs/com/datastax/driver/core/policies/RoundRobinPolicy.html) or DCAwareRoundRobinPolicy (http://www.datastax.com/doc-source/developer/java-apidocs/com/datastax/driver/core/policies/DCAwareRoundRobinPolicy.html). The latter, of course, requires data centers to be defined in your keyspace.
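For reference, here is a minimal sketch of wiring up each of those two policies with the DataStax Java driver from the links above. The contact point and the data center name "DC1" are placeholder assumptions, not values from your setup:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.RoundRobinPolicy;

// Plain round robin: queries rotate across every live node in the cluster.
Cluster roundRobin = Cluster.builder()
        .addContactPoint("LOAD_BALANCER_HOST")
        .withLoadBalancingPolicy(new RoundRobinPolicy())
        .build();

// DC-aware round robin: rotates only across nodes in the named local
// data center ("DC1" is a placeholder; use your own DC name).
Cluster dcAware = Cluster.builder()
        .addContactPoint("LOAD_BALANCER_HOST")
        .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("DC1"))
        .build();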
Without any further information, I would suggest just changing the partitioner, as a TokenAware load balancing policy is usually a good idea. The main load will end up on the same nodes either way; the TokenAware policy simply gets you to the right coordinator more quickly.
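Translated back to Astyanax, which your question uses, a configuration that still discovers the ring but spreads connections evenly would look roughly like this sketch; the cluster, keyspace, and pool names are placeholders, not values from your setup:

import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
    .forCluster("MyCluster")    // placeholder name
    .forKeyspace("MyKeyspace")  // placeholder name
    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
        // RING_DESCRIBE still asks the ring for the full list of live nodes...
        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
        // ...while ROUND_ROBIN spreads connections evenly across them
        // instead of pinning them to token owners.
        .setConnectionPoolType(ConnectionPoolType.ROUND_ROBIN))
    .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("myPool")
        .setPort(9160)
        .setSeeds("LOAD_BALANCER_HOST:9160"))
    .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
    .buildKeyspace(ThriftFamilyFactory.getInstance());
context.start();
Keyspace keyspace = context.getClient();

That keeps node discovery dynamic while removing the token-based connection bias you observed in netstat.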