This is a follow-up to this question: Why is my cassandra throughput not improving when I add nodes?
My schema currently looks like this (the blobs are roughly all the same size, about 140 bytes):
create keyspace nms WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };
use nms;
create table qos (
hour timestamp,
qos int,
id int,
ts timestamp,
tz int,
data blob,
PRIMARY KEY ((hour, qos), id, ts));
In both scenarios, I have a single node. Other than the obvious IP address and storage locations, the Apache C* 2.1.5 config is out of the box.
When I run the client and the single node on separate hosts, I get roughly 55K inserts/s. The cfhistograms output looks roughly like this:
nms/qos histograms
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)
50%             0.00          86.00          0.00           42510         535
75%             0.00         124.00          0.00           42510         642
95%             0.00         179.00          0.00           61214        1109
98%             0.00         215.00          0.00           61214        1109
99%             0.00         258.00          0.00           61214        1109
Min             0.00           4.00          0.00             150           3
Max             0.00       61214.00          0.00           61214        1109
When I run the client on the same host as the single node, I get roughly 90K inserts/s. A histogram snapshot looks like this (pretty much the same as above):
nms/qos histograms
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)
50%             0.00          86.00          0.00           42510         535
75%             0.00         103.00          0.00           42510         642
95%             0.00         179.00          0.00           61214        1109
98%             0.00         310.00          0.00           61214        1109
99%             0.00         535.00          0.00           61214        1109
Min             0.00           3.00          0.00             150           3
Max             0.00      126934.00          0.00           61214        1109
Why the big difference in insertion rates? I would have expected the rates to be equivalent, or even better in the split setup.
BTW, I see this odd behavior with all the permutations of hardware that I have available to me, so there is more to it than client horsepower.
Marc B, you are correct. If you see this and would like to post your comment as an answer, I will give you credit for it.
In more detail: although my own connection to the network was 1 Gb/s, the traffic was passing through an unexpected 100 Mb/s router somewhere. Once I realized this and made sure all the moving parts were on the same 1 Gb/s network, my rate jumped to 180K inserts/s.
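As a rough sanity check of that diagnosis, the sketch below estimates how much bandwidth this insert workload needs. It assumes each row's payload is just its encoded column values (8-byte timestamps, 4-byte ints, and the roughly 140-byte blob from the schema above) and ignores CQL protocol framing and TCP/IP headers, so real wire usage will be somewhat higher. The function names are hypothetical, for illustration only.

```python
# Back-of-the-envelope bandwidth estimate for the nms.qos insert workload.
# Assumption: only column values are counted; CQL framing, statement
# overhead and TCP/IP headers are ignored, so real usage is higher.

# Fixed-width CQL value sizes: timestamp = 8 bytes, int = 4 bytes.
# The blob size (~140 bytes) is taken from the question.
ROW_BYTES = {
    "hour": 8,    # timestamp
    "qos": 4,     # int
    "id": 4,      # int
    "ts": 8,      # timestamp
    "tz": 4,      # int
    "data": 140,  # blob, roughly constant size
}


def payload_bytes_per_row() -> int:
    """Sum of the encoded column values for one row."""
    return sum(ROW_BYTES.values())


def required_mbps(inserts_per_sec: int) -> float:
    """Megabits per second of raw row payload at the given insert rate."""
    return inserts_per_sec * payload_bytes_per_row() * 8 / 1_000_000


def max_inserts_per_sec(link_mbps: float) -> int:
    """Payload-only upper bound on inserts/s that a link of link_mbps can carry."""
    return int(link_mbps * 1_000_000 / 8 / payload_bytes_per_row())


if __name__ == "__main__":
    for rate in (55_000, 90_000, 180_000):
        print(f"{rate:>7} inserts/s -> {required_mbps(rate):6.1f} Mb/s of payload")
    print("100 Mb/s link ceiling:", max_inserts_per_sec(100), "inserts/s")
```

Under these assumptions each row carries 168 bytes of values, so 55K inserts/s already needs about 74 Mb/s before any protocol overhead, and a 100 Mb/s hop caps the payload-only rate at roughly 74K inserts/s. That makes the observed 55K plausible once overhead is added. By contrast, 180K inserts/s works out to about 242 Mb/s, which fits on a 1 Gb/s link. The same-host 90K run (about 121 Mb/s of payload) presumably never touched the physical network, since a client talking to a local address goes over the kernel's loopback path.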
In case anyone cares, the Linux command to check your interface speed is
The tool to test the speed between boxes is iperf.