Cassandra Table needs a lot of Storage space


I just stored a 3.1 GB CSV file via the Spark-Cassandra-Connector in a table in a Cassandra cluster (5 nodes, 30 GB each, 7.5 GB RAM per instance, of which Cassandra uses ~1.8 GB).

I just saw in DataStax OpsCenter that my cluster holds 16 GB of data (~3.x GB per node) and that my storage usage has grown from 14 GB before the write to 64 GB after it.

My keyspace has the following settings:

replica_placement_strategy  org.apache.cassandra.locator.SimpleStrategy
replication_factor  2

CREATE TABLE debs.energydata10m (
  id int PRIMARY KEY,
  house_id int,
  household_id int,
  plug_id int,
  ts timestamp,
  type int,
  val float
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='{"keys":"ALL", "rows_per_partition":"NONE"}' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.000000 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

Why does Cassandra need that much storage for this 3.1 GB CSV?

Edit: Here is the output of the ls -lR /var/lib/cassandra/data/debs/ command:

ubuntu@ip-xx-xx-xx-xx:~$ ls -lR /var/lib/cassandra/data/debs/
/var/lib/cassandra/data/debs/:
total 24
drwxr-xr-x 2 cassandra cassandra     6 Jun 16 12:43 energydata1000m-52502e00142511e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra 16384 Jun 17 13:39 energydata100m-4cb23100142511e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra     6 Jun 17 08:41 energydata10m-46487f90142511e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra  4096 Jun 17 10:58 energydata10m-f17f204014d811e5b5ddabd6d8b6d1d3
drwxr-xr-x 3 cassandra cassandra    22 Jun 17 10:07 energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra     6 Jun 16 12:40 energydata-d615ace0141d11e5b5ddabd6d8b6d1d3

/var/lib/cassandra/data/debs/energydata1000m-52502e00142511e5b5ddabd6d8b6d1d3:
total 0

/var/lib/cassandra/data/debs/energydata100m-4cb23100142511e5b5ddabd6d8b6d1d3:
total 3294336
-rw-r--r-- 1 cassandra cassandra     361779 Jun 17 12:36 debs-energydata100m-ka-187-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra  943405306 Jun 17 12:36 debs-energydata100m-ka-187-Data.db
-rw-r--r-- 1 cassandra cassandra         10 Jun 17 12:36 debs-energydata100m-ka-187-Digest.sha1
-rw-r--r-- 1 cassandra cassandra   17615016 Jun 17 12:36 debs-energydata100m-ka-187-Filter.db
-rw-r--r-- 1 cassandra cassandra  254001924 Jun 17 12:36 debs-energydata100m-ka-187-Index.db
-rw-r--r-- 1 cassandra cassandra       9911 Jun 17 12:36 debs-energydata100m-ka-187-Statistics.db
-rw-r--r-- 1 cassandra cassandra    1763968 Jun 17 12:36 debs-energydata100m-ka-187-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 12:36 debs-energydata100m-ka-187-TOC.txt
-rw-r--r-- 1 cassandra cassandra      46747 Jun 17 12:25 debs-energydata100m-ka-211-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra  120719760 Jun 17 12:25 debs-energydata100m-ka-211-Data.db
-rw-r--r-- 1 cassandra cassandra         10 Jun 17 12:25 debs-energydata100m-ka-211-Digest.sha1
-rw-r--r-- 1 cassandra cassandra    2266552 Jun 17 12:25 debs-energydata100m-ka-211-Filter.db
-rw-r--r-- 1 cassandra cassandra   32799168 Jun 17 12:25 debs-energydata100m-ka-211-Index.db
-rw-r--r-- 1 cassandra cassandra       9955 Jun 17 12:25 debs-energydata100m-ka-211-Statistics.db
-rw-r--r-- 1 cassandra cassandra     227840 Jun 17 12:25 debs-energydata100m-ka-211-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 12:25 debs-energydata100m-ka-211-TOC.txt
-rw-r--r-- 1 cassandra cassandra     400275 Jun 17 13:39 debs-energydata100m-ka-353-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 1053658168 Jun 17 13:39 debs-energydata100m-ka-353-Data.db
-rw-r--r-- 1 cassandra cassandra          9 Jun 17 13:39 debs-energydata100m-ka-353-Digest.sha1
-rw-r--r-- 1 cassandra cassandra   19254504 Jun 17 13:39 debs-energydata100m-ka-353-Filter.db
-rw-r--r-- 1 cassandra cassandra  281034756 Jun 17 13:39 debs-energydata100m-ka-353-Index.db
-rw-r--r-- 1 cassandra cassandra       9911 Jun 17 13:39 debs-energydata100m-ka-353-Statistics.db
-rw-r--r-- 1 cassandra cassandra    1951696 Jun 17 13:39 debs-energydata100m-ka-353-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 13:39 debs-energydata100m-ka-353-TOC.txt
-rw-r--r-- 1 cassandra cassandra     106147 Jun 17 13:32 debs-energydata100m-ka-377-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra  275239666 Jun 17 13:32 debs-energydata100m-ka-377-Data.db
-rw-r--r-- 1 cassandra cassandra         10 Jun 17 13:32 debs-energydata100m-ka-377-Digest.sha1
-rw-r--r-- 1 cassandra cassandra    5209632 Jun 17 13:32 debs-energydata100m-ka-377-Filter.db
-rw-r--r-- 1 cassandra cassandra   74503386 Jun 17 13:32 debs-energydata100m-ka-377-Index.db
-rw-r--r-- 1 cassandra cassandra       9935 Jun 17 13:32 debs-energydata100m-ka-377-Statistics.db
-rw-r--r-- 1 cassandra cassandra     517456 Jun 17 13:32 debs-energydata100m-ka-377-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 13:32 debs-energydata100m-ka-377-TOC.txt
-rw-r--r-- 1 cassandra cassandra      63267 Jun 17 13:36 debs-energydata100m-ka-392-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra  163610575 Jun 17 13:36 debs-energydata100m-ka-392-Data.db
-rw-r--r-- 1 cassandra cassandra         10 Jun 17 13:36 debs-energydata100m-ka-392-Digest.sha1
-rw-r--r-- 1 cassandra cassandra    3146928 Jun 17 13:36 debs-energydata100m-ka-392-Filter.db
-rw-r--r-- 1 cassandra cassandra   44398512 Jun 17 13:36 debs-energydata100m-ka-392-Index.db
-rw-r--r-- 1 cassandra cassandra       9971 Jun 17 13:36 debs-energydata100m-ka-392-Statistics.db
-rw-r--r-- 1 cassandra cassandra     308400 Jun 17 13:36 debs-energydata100m-ka-392-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 13:36 debs-energydata100m-ka-392-TOC.txt
-rw-r--r-- 1 cassandra cassandra      16475 Jun 17 13:37 debs-energydata100m-ka-398-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra   42447012 Jun 17 13:37 debs-energydata100m-ka-398-Data.db
-rw-r--r-- 1 cassandra cassandra         10 Jun 17 13:37 debs-energydata100m-ka-398-Digest.sha1
-rw-r--r-- 1 cassandra cassandra     819112 Jun 17 13:37 debs-energydata100m-ka-398-Filter.db
-rw-r--r-- 1 cassandra cassandra   11540160 Jun 17 13:37 debs-energydata100m-ka-398-Index.db
-rw-r--r-- 1 cassandra cassandra       9915 Jun 17 13:37 debs-energydata100m-ka-398-Statistics.db
-rw-r--r-- 1 cassandra cassandra      80208 Jun 17 13:37 debs-energydata100m-ka-398-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 13:37 debs-energydata100m-ka-398-TOC.txt
-rw-r--r-- 1 cassandra cassandra       3307 Jun 17 13:37 debs-energydata100m-ka-399-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra    8375321 Jun 17 13:37 debs-energydata100m-ka-399-Data.db
-rw-r--r-- 1 cassandra cassandra         10 Jun 17 13:37 debs-energydata100m-ka-399-Digest.sha1
-rw-r--r-- 1 cassandra cassandra     159248 Jun 17 13:37 debs-energydata100m-ka-399-Filter.db
-rw-r--r-- 1 cassandra cassandra    2292966 Jun 17 13:37 debs-energydata100m-ka-399-Index.db
-rw-r--r-- 1 cassandra cassandra       9895 Jun 17 13:37 debs-energydata100m-ka-399-Statistics.db
-rw-r--r-- 1 cassandra cassandra      16000 Jun 17 13:37 debs-energydata100m-ka-399-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 13:37 debs-energydata100m-ka-399-TOC.txt
-rw-r--r-- 1 cassandra cassandra       3299 Jun 17 13:39 debs-energydata100m-ka-400-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra    8332947 Jun 17 13:39 debs-energydata100m-ka-400-Data.db
-rw-r--r-- 1 cassandra cassandra         10 Jun 17 13:39 debs-energydata100m-ka-400-Digest.sha1
-rw-r--r-- 1 cassandra cassandra     159088 Jun 17 13:39 debs-energydata100m-ka-400-Filter.db
-rw-r--r-- 1 cassandra cassandra    2290716 Jun 17 13:39 debs-energydata100m-ka-400-Index.db
-rw-r--r-- 1 cassandra cassandra       9895 Jun 17 13:39 debs-energydata100m-ka-400-Statistics.db
-rw-r--r-- 1 cassandra cassandra      15984 Jun 17 13:39 debs-energydata100m-ka-400-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 13:39 debs-energydata100m-ka-400-TOC.txt

/var/lib/cassandra/data/debs/energydata10m-46487f90142511e5b5ddabd6d8b6d1d3:
total 0

/var/lib/cassandra/data/debs/energydata10m-f17f204014d811e5b5ddabd6d8b6d1d3:
total 326684
-rw-r--r-- 1 cassandra cassandra     95051 Jun 17 10:30 debs-energydata10m-ka-37-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 245687780 Jun 17 10:30 debs-energydata10m-ka-37-Data.db
-rw-r--r-- 1 cassandra cassandra        10 Jun 17 10:30 debs-energydata10m-ka-37-Digest.sha1
-rw-r--r-- 1 cassandra cassandra   4617168 Jun 17 10:30 debs-energydata10m-ka-37-Filter.db
-rw-r--r-- 1 cassandra cassandra  66716856 Jun 17 10:30 debs-energydata10m-ka-37-Index.db
-rw-r--r-- 1 cassandra cassandra      9923 Jun 17 10:30 debs-energydata10m-ka-37-Statistics.db
-rw-r--r-- 1 cassandra cassandra    463376 Jun 17 10:30 debs-energydata10m-ka-37-Summary.db
-rw-r--r-- 1 cassandra cassandra        91 Jun 17 10:30 debs-energydata10m-ka-37-TOC.txt
-rw-r--r-- 1 cassandra cassandra      3379 Jun 17 10:28 debs-energydata10m-ka-38-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra   8505046 Jun 17 10:28 debs-energydata10m-ka-38-Data.db
-rw-r--r-- 1 cassandra cassandra         9 Jun 17 10:28 debs-energydata10m-ka-38-Digest.sha1
-rw-r--r-- 1 cassandra cassandra    162984 Jun 17 10:28 debs-energydata10m-ka-38-Filter.db
-rw-r--r-- 1 cassandra cassandra   2346732 Jun 17 10:28 debs-energydata10m-ka-38-Index.db
-rw-r--r-- 1 cassandra cassandra      9895 Jun 17 10:28 debs-energydata10m-ka-38-Statistics.db
-rw-r--r-- 1 cassandra cassandra     16368 Jun 17 10:28 debs-energydata10m-ka-38-Summary.db
-rw-r--r-- 1 cassandra cassandra        91 Jun 17 10:28 debs-energydata10m-ka-38-TOC.txt
-rw-r--r-- 1 cassandra cassandra      1811 Jun 17 10:58 debs-energydata10m-ka-39-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra   4475513 Jun 17 10:58 debs-energydata10m-ka-39-Data.db
-rw-r--r-- 1 cassandra cassandra        10 Jun 17 10:58 debs-energydata10m-ka-39-Digest.sha1
-rw-r--r-- 1 cassandra cassandra     86392 Jun 17 10:58 debs-energydata10m-ka-39-Filter.db
-rw-r--r-- 1 cassandra cassandra   1243818 Jun 17 10:58 debs-energydata10m-ka-39-Index.db
-rw-r--r-- 1 cassandra cassandra      9895 Jun 17 10:58 debs-energydata10m-ka-39-Statistics.db
-rw-r--r-- 1 cassandra cassandra      8704 Jun 17 10:58 debs-energydata10m-ka-39-Summary.db
-rw-r--r-- 1 cassandra cassandra        91 Jun 17 10:58 debs-energydata10m-ka-39-TOC.txt

/var/lib/cassandra/data/debs/energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3:
total 0
drwxr-xr-x 3 cassandra cassandra 40 Jun 17 10:07 snapshots

/var/lib/cassandra/data/debs/energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3/snapshots:
total 4
drwxr-xr-x 2 cassandra cassandra 4096 Jun 17 10:07 1434535647574-energydata10m

/var/lib/cassandra/data/debs/energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3/snapshots/1434535647574-energydata10m:
total 326784
-rw-r--r-- 1 cassandra cassandra     92923 Jun 17 09:15 debs-energydata10m-ka-37-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 240323836 Jun 17 09:15 debs-energydata10m-ka-37-Data.db
-rw-r--r-- 1 cassandra cassandra        10 Jun 17 09:15 debs-energydata10m-ka-37-Digest.sha1
-rw-r--r-- 1 cassandra cassandra   4520064 Jun 17 09:15 debs-energydata10m-ka-37-Filter.db
-rw-r--r-- 1 cassandra cassandra  65218608 Jun 17 09:15 debs-energydata10m-ka-37-Index.db
-rw-r--r-- 1 cassandra cassandra      9919 Jun 17 09:15 debs-energydata10m-ka-37-Statistics.db
-rw-r--r-- 1 cassandra cassandra    452976 Jun 17 09:15 debs-energydata10m-ka-37-Summary.db
-rw-r--r-- 1 cassandra cassandra        91 Jun 17 09:15 debs-energydata10m-ka-37-TOC.txt
-rw-r--r-- 1 cassandra cassandra      3307 Jun 17 09:14 debs-energydata10m-ka-38-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra   8321541 Jun 17 09:14 debs-energydata10m-ka-38-Data.db
-rw-r--r-- 1 cassandra cassandra        10 Jun 17 09:14 debs-energydata10m-ka-38-Digest.sha1
-rw-r--r-- 1 cassandra cassandra    159384 Jun 17 09:14 debs-energydata10m-ka-38-Filter.db
-rw-r--r-- 1 cassandra cassandra   2294964 Jun 17 09:14 debs-energydata10m-ka-38-Index.db
-rw-r--r-- 1 cassandra cassandra      9895 Jun 17 09:14 debs-energydata10m-ka-38-Statistics.db
-rw-r--r-- 1 cassandra cassandra     16016 Jun 17 09:14 debs-energydata10m-ka-38-Summary.db
-rw-r--r-- 1 cassandra cassandra        91 Jun 17 09:14 debs-energydata10m-ka-38-TOC.txt
-rw-r--r-- 1 cassandra cassandra      3307 Jun 17 09:15 debs-energydata10m-ka-39-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra   8316992 Jun 17 09:15 debs-energydata10m-ka-39-Data.db
-rw-r--r-- 1 cassandra cassandra        10 Jun 17 09:15 debs-energydata10m-ka-39-Digest.sha1
-rw-r--r-- 1 cassandra cassandra    159296 Jun 17 09:15 debs-energydata10m-ka-39-Filter.db
-rw-r--r-- 1 cassandra cassandra   2293614 Jun 17 09:15 debs-energydata10m-ka-39-Index.db
-rw-r--r-- 1 cassandra cassandra      9895 Jun 17 09:15 debs-energydata10m-ka-39-Statistics.db
-rw-r--r-- 1 cassandra cassandra     16000 Jun 17 09:15 debs-energydata10m-ka-39-Summary.db
-rw-r--r-- 1 cassandra cassandra        91 Jun 17 09:15 debs-energydata10m-ka-39-TOC.txt
-rw-r--r-- 1 cassandra cassandra       755 Jun 17 10:07 debs-energydata10m-ka-40-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra   1781300 Jun 17 10:07 debs-energydata10m-ka-40-Data.db
-rw-r--r-- 1 cassandra cassandra        10 Jun 17 10:07 debs-energydata10m-ka-40-Digest.sha1
-rw-r--r-- 1 cassandra cassandra     34752 Jun 17 10:07 debs-energydata10m-ka-40-Filter.db
-rw-r--r-- 1 cassandra cassandra    500220 Jun 17 10:07 debs-energydata10m-ka-40-Index.db
-rw-r--r-- 1 cassandra cassandra      9895 Jun 17 10:07 debs-energydata10m-ka-40-Statistics.db
-rw-r--r-- 1 cassandra cassandra      3552 Jun 17 10:07 debs-energydata10m-ka-40-Summary.db
-rw-r--r-- 1 cassandra cassandra        91 Jun 17 10:07 debs-energydata10m-ka-40-TOC.txt
-rw-r--r-- 1 cassandra cassandra       152 Jun 17 10:07 manifest.json

/var/lib/cassandra/data/debs/energydata-d615ace0141d11e5b5ddabd6d8b6d1d3:
total 0

Note: the data of energydata10m and energydata1000m already existed before energydata100m was written (it accounts for the 14 GB of disk space in use before the write).
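To see where the space on a node goes, the listing above can be tallied per SSTable component. The following is a minimal sketch, not part of the original post: the function name `component_totals` is hypothetical, and it assumes the usual nine-field `ls -l` layout shown in the listing.

```python
from collections import defaultdict


def component_totals(ls_lines):
    """Sum file sizes in bytes per SSTable component from `ls -l` lines."""
    totals = defaultdict(int)
    for line in ls_lines:
        fields = line.split()
        # A regular-file line has 9 fields: mode, links, owner, group,
        # size, month, day, time, name. Skip directories and headers.
        if len(fields) != 9 or not fields[0].startswith("-"):
            continue
        size, name = int(fields[4]), fields[8]
        # Names look like debs-energydata100m-ka-187-Data.db;
        # the component is the part after the last dash.
        component = name.rsplit("-", 1)[-1]
        totals[component] += size
    return dict(totals)


# Three lines copied from the listing above (SSTable generation 400).
sample = [
    "-rw-r--r-- 1 cassandra cassandra    8332947 Jun 17 13:39 debs-energydata100m-ka-400-Data.db",
    "-rw-r--r-- 1 cassandra cassandra    2290716 Jun 17 13:39 debs-energydata100m-ka-400-Index.db",
    "-rw-r--r-- 1 cassandra cassandra     159088 Jun 17 13:39 debs-energydata100m-ka-400-Filter.db",
]
for component, size in sorted(component_totals(sample).items()):
    print(f"{component:12s} {size / 1e6:8.2f} MB")
```

Applied by hand to the energydata100m directory above, the Data.db files add up to roughly 2.6 GB and the Index.db files to roughly 0.7 GB on this one node, which matches the reported total of 3294336 blocks. The snapshots directory under the older energydata10m table ID holds a further 326784 blocks that are not live data.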

************** EDIT **************
I found calculation formulas here: http://docs.datastax.com/en/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html They say that the data on disk can be much larger than the original dataset. Can someone explain how to calculate the values described on that page? I don't know which data sizes I need to plug in.

Answer by D. Müller (best answer):

The following documentation explains the data sizes and their calculation: http://docs.datastax.com/en/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html
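As a rough illustration of how such formulas can be applied, here is a minimal sketch. It rests on several assumptions that are not confirmed by the post: the overhead constants are the ones I recall from the linked 1.2 page (15 bytes per regular column, 23 bytes per row, 32 bytes plus the key size per primary key index entry, and replication adding the total times replication_factor - 1); the row count of 100 million is only guessed from the table name energydata100m; and the schema is taken from the energydata10m definition in the question. The estimate ignores compression and the newer on-disk format, so it gives an order of magnitude at best.

```python
# Overheads as recalled from the DataStax Cassandra 1.2 sizing page.
# Treat all three constants as assumptions, not verified values.
COLUMN_OVERHEAD = 15       # bytes per regular column
ROW_OVERHEAD = 23          # bytes per row
INDEX_ENTRY_OVERHEAD = 32  # bytes per primary key index entry

# Non-key columns from the CREATE TABLE in the question:
# name -> value size in bytes (int = 4, timestamp = 8, float = 4).
COLUMNS = {
    "house_id": 4,
    "household_id": 4,
    "plug_id": 4,
    "ts": 8,
    "type": 4,
    "val": 4,
}
KEY_SIZE = 4  # id int


def estimate_bytes(num_rows, replication_factor):
    """Estimate uncompressed on-disk size for the whole cluster."""
    # Each column stores its name and its value plus a fixed overhead.
    column_bytes = sum(
        len(name) + size + COLUMN_OVERHEAD for name, size in COLUMNS.items()
    )
    # Assumption: the row key is counted alongside the per-row overhead.
    row_bytes = KEY_SIZE + ROW_OVERHEAD + column_bytes
    data = num_rows * row_bytes
    index = num_rows * (INDEX_ENTRY_OVERHEAD + KEY_SIZE)
    one_copy = data + index
    # Assumption: replication multiplies data and index alike.
    replication = one_copy * (replication_factor - 1)
    return {
        "row_bytes": row_bytes,
        "data": data,
        "index": index,
        "total": one_copy + replication,
    }


# Hypothetical row count; replication_factor = 2 as in the keyspace settings.
estimate = estimate_bytes(num_rows=100_000_000, replication_factor=2)
for key, value in estimate.items():
    print(f"{key:10s} {value:,} bytes")
```

The point of the sketch is the proportion rather than the exact figure: each row carries 28 bytes of actual values, while column names and per-column overhead account for well over 100 bytes, and the replication factor of 2 then doubles everything.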