Cassandra performance: fewer rows with more columns vs. more rows with fewer columns


We are evaluating whether we can migrate from SQL Server to Cassandra for OLAP. Cassandra's internal storage structure allows wide rows. We almost always access data by date, and often within a date range, since the data is financial. If we use the date as the partition key to support filtering by date, we end up with few rows, each holding a huge number of columns. We process millions of transactions every day, so will performance suffer if a single row key eventually holds millions of columns?

Do we need to change the access pattern so that we have more rows with fewer columns per row?

We need some performance insight before proceeding in either direction.


There is 1 answer

medvekoma On

Wide rows are typically fine in Cassandra. There are, however, a few things to consider:

  • Make sure you never reach the limit of 2 billion columns per row.
  • The whole wide row is stored on the same node, so it needs to fit on that node's disk. Also, if some dates are accessed more frequently than others (e.g. today), you can create hotspots on the node that stores the data for that day.
  • Very wide rows can still affect performance. Aaron Morton from The Last Pickle has an interesting article about this: http://thelastpickle.com/blog/2011/07/04/Cassandra-Query-Plans.html It is somewhat old, but I believe the concepts are still valid.
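
To get a rough feel for the first point, the size of a date-keyed row can be estimated up front. This is only a minimal sketch: it assumes each transaction is stored as a fixed number of columns under its date, and the workload figures are invented for illustration, not taken from the question.

```python
# Rough estimate of how many columns one date-keyed wide row would hold.
# All workload figures below are illustrative assumptions.

MAX_COLUMNS_PER_ROW = 2_000_000_000  # the 2 billion limit mentioned above


def columns_per_day(transactions_per_day: int, columns_per_transaction: int) -> int:
    """Columns stored under one date if every transaction adds a fixed number."""
    return transactions_per_day * columns_per_transaction


def fraction_of_limit(transactions_per_day: int, columns_per_transaction: int) -> float:
    """Share of the hard limit consumed by a single day's row."""
    total = columns_per_day(transactions_per_day, columns_per_transaction)
    return total / MAX_COLUMNS_PER_ROW


if __name__ == "__main__":
    # Hypothetical workload: 5 million transactions a day, 10 columns each.
    print(columns_per_day(5_000_000, 10))    # 50000000
    print(fraction_of_limit(5_000_000, 10))  # 0.025
```

Staying well under the hard limit does not by itself make the row cheap to read; the disk, hotspot and performance points still apply.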

A good table design depends on knowing all the typical filter conditions. If there are other fields you typically filter on by exact match, you could add them to the partition key as well.
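
One way to picture this is a composite partition key made of the date plus a bucket derived from another exact-match field. The sketch below is an illustration under stated assumptions, not a recommended schema: the `account_id` field, the bucket count and the function names are all hypothetical, and the code only computes which partitions a write or a date-range read would touch.

```python
from datetime import date, timedelta
from typing import List, Tuple
from zlib import crc32

BUCKETS_PER_DAY = 16  # hypothetical; would be tuned to the daily volume


def partition_key(day: date, account_id: str) -> Tuple[str, int]:
    """Composite (day, bucket) partition key for one transaction."""
    # crc32 is stable across processes, unlike Python's built-in hash().
    bucket = crc32(account_id.encode("utf-8")) % BUCKETS_PER_DAY
    return day.isoformat(), bucket


def partitions_for_range(start: date, end: date) -> List[Tuple[str, int]]:
    """Every (day, bucket) partition to read for an inclusive date range."""
    days = (end - start).days + 1
    return [
        ((start + timedelta(days=offset)).isoformat(), bucket)
        for offset in range(days)
        for bucket in range(BUCKETS_PER_DAY)
    ]


if __name__ == "__main__":
    day, bucket = partition_key(date(2015, 3, 1), "ACC-42")
    print(day)  # 2015-03-01
    keys = partitions_for_range(date(2015, 3, 1), date(2015, 3, 3))
    print(len(keys))  # 48
```

The trade-off is that reading one date now means querying `BUCKETS_PER_DAY` partitions instead of one, but no single partition holds the whole day, and a busy date is spread over several partition keys rather than one.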