Datastax Cassandra repair service weird estimation and heavy load

428 views Asked by At

I have a 5 node cluster with around 1TB of data. Vnodes enabled. Ops Center version 5.12 and DSE 4.6.7. I would like to do a full repair within 10 days and use the repair service in Ops Center so that i don't put unnecessary load on the cluster.

The problem that I'm facing is that repair service puts to much load and is working too fast. It progress is around 30% (according to Ops Center) in 24h. I even tried to change it to 40 days without any difference.

Questions,

  1. Can i trust the percent-complete number in OpsCenter?
  2. The suggested number is something like 0.000006 days. Could that guess be related to the problem?
  3. Are there any settings/tweaks that could be useful to lower the load?
1

There are 1 answers

6
Faern On
  1. You can use OpsCenter as a guideline about where data is stored and what's going on in the cluster, but it's really more of a dashboard. The real 'tale of the tape' comes from 'nodetool' via command line on server nodes such as

#shell> nodetool status

Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 10.xxx.xxx.xx 43.95 GB 256 33.3% b1e56789-8a5f-48b0-9b76-e0ed451754d4 RAC1

What type of compaction are you using?

You've asked a sort of 'magic bullet' question, as there could be several factors in play. These are examples but not limited to:

A. Size of data, and the whole rows in Cassandra (you can see these with nodetool cf_stats table_size entries). Rows that result in a binary size of larger than 16M will be seen as "ultra" wide rows, which might be an indicator your schema in your data model needs a 'compound' or 'composite' row key.

B. Type of setup you have with respects to replication and network strategy.

C. Data entry point, how Cassandra gets it's data. Are you using Python? PHP? What inputs the data? You can get funky behavior from a cluster with a bad PHP driver (for example)

D. Vnodes are good, but can be bad. What version of Cassandra are you running? You can find out via CQLSH with cqlsh -3 then type 'show version'

E. Type of compaction is a big killer. Are you using SizeTieredCompaction or LevelCompaction?

Start by running 'nodetool cfstats' from command line on the server any given node is running on. The particular areas of interest would be (at this point)

Compacted row minimum size: Compacted row maximum size:

More than X amount of bytes in size here on systems with Y amount of RAM can be a significant problem. Be sure Cassandra has enough RAM and that the stack is tuned.

The default configuration for performance on Cassandra should normally be enough, so the next step would be to open a CQLSH interface to the node with 'cqlsh -3 hostname' and issue the command 'describe keyspaces'. Take the known key space name you are running and issue 'describe keyspace FOO' and look at your schema. Of particular interest are your primary keys. Are you using "composite rowkeys" or "composite primary key"? (as described here: http://www.datastax.com/dev/blog/whats-new-in-cql-3-0 )If not, you probably need to depending on read/write load expected.

Also check how your initial application layer is inserting data into Cassandra? Using PHP? Python? What drivers are being used? There are significant bugs in Cassandra versions < 1.2.10 using certain Thrift connectors such as the Java driver or the PHPcassa driver so you might need to upgrade Cassandra and make some driver changes.

In addition to these steps also consider how your nodes were created. Note that migration from static nodes to virtual nodes (or vnodes) has to be mitigated. You can't simply switch configs on a node that's already been populated. You will want to check your initial_token: settings in /etc/cassandra/cassandra.yaml. The questions I ask myself here are "what initial tokens are set? (no initial tokens for vnodes) were the tokens changed after the data was populated?" For static nodes which I typically run, I calculate them using a tool like: [http://www.geroba.com/cassandra/cassandra-token-calculator/] as I've run into complications with vnodes (though they are much more reliable now than before).