Sorry, this will take a bit to explain... We're testing the performance of Cassandra using YCSB. We have a 3-node setup and a 9-node setup. The 3-node setup is pretty simple: replication=1 (no copies).
Our 9 node setup contains 3 data centers (3 nodes per data center). In the 9 node setup, we also kept replication=1 because we understand that Cassandra's default NetworkTopologyStrategy is going to automatically replicate across data centers. That effectivly gives us a copy of the data at each data center which is great because we want to test this.
Our read-only test against the 9 node setup uses the DCAwareRoundRobinPolicy to query against the "local" data center only. So, we are querying against just 3 of the 9 nodes and were expecting similar results to our simple 3-node setup. In fact we'd expect the results to be a little worse because of cassandra's read repair messages and also because we are using a QUORUM read consistency.
However, we found the opposite. Our read-only test on the 3 node simple setup performance was a little worse than our more complex 3 data center/9 node setup.
Data loaded on both clusters are the same. Read-only tests were run with varying thread counts and we noticed larger disparity with more threads. The 9-node setup got better with more threads, which should not have been the case because we verified that only the 3 nodes we connected to in our "local" data center are receiving queries.
So, why are reads faster in the more complex setup when we are still hitting the same number of nodes (3)? Our write-only test did not exhibit this behaviour.
Thanks in advance!