This is a follow-up to this question: Why is my cassandra throughput not improving when I add nodes?

My schema currently looks like this (the blobs are roughly all the same size, about 140 bytes):

CREATE KEYSPACE nms WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };
USE nms;
CREATE TABLE qos (
    hour timestamp,
    qos int,
    id int,
    ts timestamp,
    tz int,
    data blob,
    PRIMARY KEY ((hour, qos), id, ts)
);

In both scenarios, I have a single node. Other than the obvious IP address and storage locations, the Apache C* 2.1.5 config is out of the box.

When I run the client and the single node on separate hosts, I get roughly 55K inserts/s. The cfhistograms output looks roughly like this:

nms/qos histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             0.00             86.00              0.00             42510               535
75%             0.00            124.00              0.00             42510               642
95%             0.00            179.00              0.00             61214              1109
98%             0.00            215.00              0.00             61214              1109
99%             0.00            258.00              0.00             61214              1109
Min             0.00              4.00              0.00               150                 3
Max             0.00          61214.00              0.00             61214              1109

When I run the client on the same host as the single node, I get roughly 90K inserts/s. A histogram snapshot looks like this (pretty much the same as above):

nms/qos histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             0.00             86.00              0.00             42510               535
75%             0.00            103.00              0.00             42510               642
95%             0.00            179.00              0.00             61214              1109
98%             0.00            310.00              0.00             61214              1109
99%             0.00            535.00              0.00             61214              1109
Min             0.00              3.00              0.00               150                 3
Max             0.00         126934.00              0.00             61214              1109

Why the big difference in insertion rates? I would have expected the rates to be equivalent, or even better in the split setup.

BTW, I see this odd behavior with all the permutations of hardware that I have available to me, so there is more to it than client horsepower.

There is 1 answer

Julio Garcia On BEST ANSWER

Marc B, you are correct. If you see this and would like to post your comment as an answer, I will give you credit for it.

In more detail, what was happening is that although my connection to the network was 1 Gb/s, my traffic was going through an unexpected 100 Mb/s router somewhere. Once I realized this and made sure all the moving parts were on the same 1 Gb/s network, my rates jumped to 180K inserts/s.
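
For anyone who wants to sanity-check the numbers, here is a rough back-of-the-envelope sketch in Python. The 140-byte blob size and the insert rates come from the question; the 60 bytes of per-insert overhead (keys, timestamps, protocol framing) is purely my assumption, not something I measured, so treat the results as ballpark figures only.

```python
# Back-of-the-envelope check: could a 100 Mb/s hop cap the insert rate?
BLOB_BYTES = 140         # blob size stated in the question
OVERHEAD_BYTES = 60      # ASSUMPTION: keys, timestamps and protocol framing per insert


def required_mbit_per_s(inserts_per_s, payload_bytes=BLOB_BYTES, overhead_bytes=OVERHEAD_BYTES):
    """Bandwidth in megabits/s needed to carry the given insert rate."""
    bytes_per_s = inserts_per_s * (payload_bytes + overhead_bytes)
    return bytes_per_s * 8 / 1_000_000


def max_inserts_per_s(link_mbit_per_s, payload_bytes=BLOB_BYTES, overhead_bytes=OVERHEAD_BYTES):
    """Upper bound on inserts/s that fit through a link of the given speed."""
    bytes_per_s = link_mbit_per_s * 1_000_000 / 8
    return bytes_per_s / (payload_bytes + overhead_bytes)


# Insert rates reported in the question and in this answer.
for rate in (55_000, 90_000, 180_000):
    print(f"{rate} inserts/s needs about {required_mbit_per_s(rate):.0f} Mb/s")

print(f"A 100 Mb/s link carries at most about {max_inserts_per_s(100):.0f} inserts/s")
```

Under that assumed overhead, 55K inserts/s already needs most of a 100 Mb/s link, while 180K inserts/s fits comfortably within 1 Gb/s, which is consistent with what I saw.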

In case someone cares, the Linux command to check your interface speed is

sudo ethtool eth0

The tool to test the speed between boxes is iperf.
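
If you would rather check the interface speed from a script, a minimal sketch is below. It assumes a Linux host that exposes the negotiated link speed in /sys/class/net/<interface>/speed (in Mb/s); the function name and the eth0 default are just illustrative choices of mine.

```python
from pathlib import Path


def link_speed_mbps(interface="eth0", sysfs_root="/sys/class/net"):
    """Return the negotiated link speed in Mb/s, or None if it is unavailable."""
    speed_file = Path(sysfs_root) / interface / "speed"
    try:
        speed = int(speed_file.read_text().strip())
    except (OSError, ValueError):
        # Interface missing, link down, or the driver does not report a speed.
        return None
    # The kernel reports -1 when the speed is unknown.
    return speed if speed > 0 else None


if __name__ == "__main__":
    print(link_speed_mbps("eth0"))
```

Keep in mind that this, like ethtool, only tells you about the local interface. It would not have revealed the slow router sitting between my boxes, which is why an end-to-end test with iperf is still needed.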