I just stored a 3.1 GB CSV file in a table on a Cassandra cluster via the Spark-Cassandra-Connector. The cluster has 5 nodes, each with 30 GB of disk and 7.5 GB of RAM, of which Cassandra uses about 1.8 GB.
In DataStax OpsCenter I then saw that my cluster holds 16 GB of data (about 3.x GB per node) and that my disk usage grew from 14 GB before the write to 64 GB after it!
My keyspace has the following settings:
replica_placement_strategy org.apache.cassandra.locator.SimpleStrategy
replication_factor 2
CREATE TABLE debs.energydata10m (
  id int PRIMARY KEY,
  house_id int,
  household_id int,
  plug_id int,
  ts timestamp,
  type int,
  val float
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='{"keys":"ALL", "rows_per_partition":"NONE"}' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.000000 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
Why does Cassandra need that much storage for this 3.1 GB CSV?
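To put the figures from the question side by side, here is a small Python sketch that uses only the numbers stated above. It treats the 16 GB OpsCenter figure as if it belonged entirely to the new table, which is an assumption and may overstate it, since other tables already held data before the write. Replication factor 2 alone predicts about 6.2 GB; the reported load is roughly 2.6 times that, and the disk growth is roughly 16 times the CSV size.

```python
# All figures are copied from the question; nothing here is measured.
CSV_GB = 3.1             # size of the source CSV
REPLICATION_FACTOR = 2   # SimpleStrategy, replication_factor 2
REPORTED_LOAD_GB = 16.0  # cluster-wide data size shown in OpsCenter (assumed to be all energydata100m)
DISK_BEFORE_GB = 14.0    # disk usage before the write
DISK_AFTER_GB = 64.0     # disk usage after the write


def storage_summary():
    """Compare what replication alone predicts with what was observed."""
    replicated = CSV_GB * REPLICATION_FACTOR  # every row is stored on 2 nodes
    growth = DISK_AFTER_GB - DISK_BEFORE_GB
    return {
        "replicated_csv_gb": replicated,
        "disk_growth_gb": growth,
        "load_vs_replicated": REPORTED_LOAD_GB / replicated,
        "growth_vs_csv": growth / CSV_GB,
    }


if __name__ == "__main__":
    for key, value in storage_summary().items():
        print(f"{key:20} {value:6.2f}")
```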
Edit: Here is the output of the ls -lR /var/lib/cassandra/data/debs/ command:
ubuntu@ip-xx-xx-xx-xx:~$ ls -lR /var/lib/cassandra/data/debs/
/var/lib/cassandra/data/debs/:
total 24
drwxr-xr-x 2 cassandra cassandra 6 Jun 16 12:43 energydata1000m-52502e00142511e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra 16384 Jun 17 13:39 energydata100m-4cb23100142511e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra 6 Jun 17 08:41 energydata10m-46487f90142511e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra 4096 Jun 17 10:58 energydata10m-f17f204014d811e5b5ddabd6d8b6d1d3
drwxr-xr-x 3 cassandra cassandra 22 Jun 17 10:07 energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra 6 Jun 16 12:40 energydata-d615ace0141d11e5b5ddabd6d8b6d1d3
/var/lib/cassandra/data/debs/energydata1000m-52502e00142511e5b5ddabd6d8b6d1d3:
total 0
/var/lib/cassandra/data/debs/energydata100m-4cb23100142511e5b5ddabd6d8b6d1d3:
total 3294336
-rw-r--r-- 1 cassandra cassandra 361779 Jun 17 12:36 debs-energydata100m-ka-187-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 943405306 Jun 17 12:36 debs-energydata100m-ka-187-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 12:36 debs-energydata100m-ka-187-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 17615016 Jun 17 12:36 debs-energydata100m-ka-187-Filter.db
-rw-r--r-- 1 cassandra cassandra 254001924 Jun 17 12:36 debs-energydata100m-ka-187-Index.db
-rw-r--r-- 1 cassandra cassandra 9911 Jun 17 12:36 debs-energydata100m-ka-187-Statistics.db
-rw-r--r-- 1 cassandra cassandra 1763968 Jun 17 12:36 debs-energydata100m-ka-187-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 12:36 debs-energydata100m-ka-187-TOC.txt
-rw-r--r-- 1 cassandra cassandra 46747 Jun 17 12:25 debs-energydata100m-ka-211-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 120719760 Jun 17 12:25 debs-energydata100m-ka-211-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 12:25 debs-energydata100m-ka-211-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 2266552 Jun 17 12:25 debs-energydata100m-ka-211-Filter.db
-rw-r--r-- 1 cassandra cassandra 32799168 Jun 17 12:25 debs-energydata100m-ka-211-Index.db
-rw-r--r-- 1 cassandra cassandra 9955 Jun 17 12:25 debs-energydata100m-ka-211-Statistics.db
-rw-r--r-- 1 cassandra cassandra 227840 Jun 17 12:25 debs-energydata100m-ka-211-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 12:25 debs-energydata100m-ka-211-TOC.txt
-rw-r--r-- 1 cassandra cassandra 400275 Jun 17 13:39 debs-energydata100m-ka-353-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 1053658168 Jun 17 13:39 debs-energydata100m-ka-353-Data.db
-rw-r--r-- 1 cassandra cassandra 9 Jun 17 13:39 debs-energydata100m-ka-353-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 19254504 Jun 17 13:39 debs-energydata100m-ka-353-Filter.db
-rw-r--r-- 1 cassandra cassandra 281034756 Jun 17 13:39 debs-energydata100m-ka-353-Index.db
-rw-r--r-- 1 cassandra cassandra 9911 Jun 17 13:39 debs-energydata100m-ka-353-Statistics.db
-rw-r--r-- 1 cassandra cassandra 1951696 Jun 17 13:39 debs-energydata100m-ka-353-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 13:39 debs-energydata100m-ka-353-TOC.txt
-rw-r--r-- 1 cassandra cassandra 106147 Jun 17 13:32 debs-energydata100m-ka-377-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 275239666 Jun 17 13:32 debs-energydata100m-ka-377-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 13:32 debs-energydata100m-ka-377-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 5209632 Jun 17 13:32 debs-energydata100m-ka-377-Filter.db
-rw-r--r-- 1 cassandra cassandra 74503386 Jun 17 13:32 debs-energydata100m-ka-377-Index.db
-rw-r--r-- 1 cassandra cassandra 9935 Jun 17 13:32 debs-energydata100m-ka-377-Statistics.db
-rw-r--r-- 1 cassandra cassandra 517456 Jun 17 13:32 debs-energydata100m-ka-377-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 13:32 debs-energydata100m-ka-377-TOC.txt
-rw-r--r-- 1 cassandra cassandra 63267 Jun 17 13:36 debs-energydata100m-ka-392-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 163610575 Jun 17 13:36 debs-energydata100m-ka-392-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 13:36 debs-energydata100m-ka-392-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 3146928 Jun 17 13:36 debs-energydata100m-ka-392-Filter.db
-rw-r--r-- 1 cassandra cassandra 44398512 Jun 17 13:36 debs-energydata100m-ka-392-Index.db
-rw-r--r-- 1 cassandra cassandra 9971 Jun 17 13:36 debs-energydata100m-ka-392-Statistics.db
-rw-r--r-- 1 cassandra cassandra 308400 Jun 17 13:36 debs-energydata100m-ka-392-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 13:36 debs-energydata100m-ka-392-TOC.txt
-rw-r--r-- 1 cassandra cassandra 16475 Jun 17 13:37 debs-energydata100m-ka-398-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 42447012 Jun 17 13:37 debs-energydata100m-ka-398-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 13:37 debs-energydata100m-ka-398-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 819112 Jun 17 13:37 debs-energydata100m-ka-398-Filter.db
-rw-r--r-- 1 cassandra cassandra 11540160 Jun 17 13:37 debs-energydata100m-ka-398-Index.db
-rw-r--r-- 1 cassandra cassandra 9915 Jun 17 13:37 debs-energydata100m-ka-398-Statistics.db
-rw-r--r-- 1 cassandra cassandra 80208 Jun 17 13:37 debs-energydata100m-ka-398-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 13:37 debs-energydata100m-ka-398-TOC.txt
-rw-r--r-- 1 cassandra cassandra 3307 Jun 17 13:37 debs-energydata100m-ka-399-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 8375321 Jun 17 13:37 debs-energydata100m-ka-399-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 13:37 debs-energydata100m-ka-399-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 159248 Jun 17 13:37 debs-energydata100m-ka-399-Filter.db
-rw-r--r-- 1 cassandra cassandra 2292966 Jun 17 13:37 debs-energydata100m-ka-399-Index.db
-rw-r--r-- 1 cassandra cassandra 9895 Jun 17 13:37 debs-energydata100m-ka-399-Statistics.db
-rw-r--r-- 1 cassandra cassandra 16000 Jun 17 13:37 debs-energydata100m-ka-399-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 13:37 debs-energydata100m-ka-399-TOC.txt
-rw-r--r-- 1 cassandra cassandra 3299 Jun 17 13:39 debs-energydata100m-ka-400-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 8332947 Jun 17 13:39 debs-energydata100m-ka-400-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 13:39 debs-energydata100m-ka-400-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 159088 Jun 17 13:39 debs-energydata100m-ka-400-Filter.db
-rw-r--r-- 1 cassandra cassandra 2290716 Jun 17 13:39 debs-energydata100m-ka-400-Index.db
-rw-r--r-- 1 cassandra cassandra 9895 Jun 17 13:39 debs-energydata100m-ka-400-Statistics.db
-rw-r--r-- 1 cassandra cassandra 15984 Jun 17 13:39 debs-energydata100m-ka-400-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 13:39 debs-energydata100m-ka-400-TOC.txt
/var/lib/cassandra/data/debs/energydata10m-46487f90142511e5b5ddabd6d8b6d1d3:
total 0
/var/lib/cassandra/data/debs/energydata10m-f17f204014d811e5b5ddabd6d8b6d1d3:
total 326684
-rw-r--r-- 1 cassandra cassandra 95051 Jun 17 10:30 debs-energydata10m-ka-37-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 245687780 Jun 17 10:30 debs-energydata10m-ka-37-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 10:30 debs-energydata10m-ka-37-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 4617168 Jun 17 10:30 debs-energydata10m-ka-37-Filter.db
-rw-r--r-- 1 cassandra cassandra 66716856 Jun 17 10:30 debs-energydata10m-ka-37-Index.db
-rw-r--r-- 1 cassandra cassandra 9923 Jun 17 10:30 debs-energydata10m-ka-37-Statistics.db
-rw-r--r-- 1 cassandra cassandra 463376 Jun 17 10:30 debs-energydata10m-ka-37-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 10:30 debs-energydata10m-ka-37-TOC.txt
-rw-r--r-- 1 cassandra cassandra 3379 Jun 17 10:28 debs-energydata10m-ka-38-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 8505046 Jun 17 10:28 debs-energydata10m-ka-38-Data.db
-rw-r--r-- 1 cassandra cassandra 9 Jun 17 10:28 debs-energydata10m-ka-38-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 162984 Jun 17 10:28 debs-energydata10m-ka-38-Filter.db
-rw-r--r-- 1 cassandra cassandra 2346732 Jun 17 10:28 debs-energydata10m-ka-38-Index.db
-rw-r--r-- 1 cassandra cassandra 9895 Jun 17 10:28 debs-energydata10m-ka-38-Statistics.db
-rw-r--r-- 1 cassandra cassandra 16368 Jun 17 10:28 debs-energydata10m-ka-38-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 10:28 debs-energydata10m-ka-38-TOC.txt
-rw-r--r-- 1 cassandra cassandra 1811 Jun 17 10:58 debs-energydata10m-ka-39-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 4475513 Jun 17 10:58 debs-energydata10m-ka-39-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 10:58 debs-energydata10m-ka-39-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 86392 Jun 17 10:58 debs-energydata10m-ka-39-Filter.db
-rw-r--r-- 1 cassandra cassandra 1243818 Jun 17 10:58 debs-energydata10m-ka-39-Index.db
-rw-r--r-- 1 cassandra cassandra 9895 Jun 17 10:58 debs-energydata10m-ka-39-Statistics.db
-rw-r--r-- 1 cassandra cassandra 8704 Jun 17 10:58 debs-energydata10m-ka-39-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 10:58 debs-energydata10m-ka-39-TOC.txt
/var/lib/cassandra/data/debs/energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3:
total 0
drwxr-xr-x 3 cassandra cassandra 40 Jun 17 10:07 snapshots
/var/lib/cassandra/data/debs/energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3/snapshots:
total 4
drwxr-xr-x 2 cassandra cassandra 4096 Jun 17 10:07 1434535647574-energydata10m
/var/lib/cassandra/data/debs/energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3/snapshots/1434535647574-energydata10m:
total 326784
-rw-r--r-- 1 cassandra cassandra 92923 Jun 17 09:15 debs-energydata10m-ka-37-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 240323836 Jun 17 09:15 debs-energydata10m-ka-37-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 09:15 debs-energydata10m-ka-37-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 4520064 Jun 17 09:15 debs-energydata10m-ka-37-Filter.db
-rw-r--r-- 1 cassandra cassandra 65218608 Jun 17 09:15 debs-energydata10m-ka-37-Index.db
-rw-r--r-- 1 cassandra cassandra 9919 Jun 17 09:15 debs-energydata10m-ka-37-Statistics.db
-rw-r--r-- 1 cassandra cassandra 452976 Jun 17 09:15 debs-energydata10m-ka-37-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 09:15 debs-energydata10m-ka-37-TOC.txt
-rw-r--r-- 1 cassandra cassandra 3307 Jun 17 09:14 debs-energydata10m-ka-38-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 8321541 Jun 17 09:14 debs-energydata10m-ka-38-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 09:14 debs-energydata10m-ka-38-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 159384 Jun 17 09:14 debs-energydata10m-ka-38-Filter.db
-rw-r--r-- 1 cassandra cassandra 2294964 Jun 17 09:14 debs-energydata10m-ka-38-Index.db
-rw-r--r-- 1 cassandra cassandra 9895 Jun 17 09:14 debs-energydata10m-ka-38-Statistics.db
-rw-r--r-- 1 cassandra cassandra 16016 Jun 17 09:14 debs-energydata10m-ka-38-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 09:14 debs-energydata10m-ka-38-TOC.txt
-rw-r--r-- 1 cassandra cassandra 3307 Jun 17 09:15 debs-energydata10m-ka-39-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 8316992 Jun 17 09:15 debs-energydata10m-ka-39-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 09:15 debs-energydata10m-ka-39-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 159296 Jun 17 09:15 debs-energydata10m-ka-39-Filter.db
-rw-r--r-- 1 cassandra cassandra 2293614 Jun 17 09:15 debs-energydata10m-ka-39-Index.db
-rw-r--r-- 1 cassandra cassandra 9895 Jun 17 09:15 debs-energydata10m-ka-39-Statistics.db
-rw-r--r-- 1 cassandra cassandra 16000 Jun 17 09:15 debs-energydata10m-ka-39-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 09:15 debs-energydata10m-ka-39-TOC.txt
-rw-r--r-- 1 cassandra cassandra 755 Jun 17 10:07 debs-energydata10m-ka-40-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 1781300 Jun 17 10:07 debs-energydata10m-ka-40-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 10:07 debs-energydata10m-ka-40-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 34752 Jun 17 10:07 debs-energydata10m-ka-40-Filter.db
-rw-r--r-- 1 cassandra cassandra 500220 Jun 17 10:07 debs-energydata10m-ka-40-Index.db
-rw-r--r-- 1 cassandra cassandra 9895 Jun 17 10:07 debs-energydata10m-ka-40-Statistics.db
-rw-r--r-- 1 cassandra cassandra 3552 Jun 17 10:07 debs-energydata10m-ka-40-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 10:07 debs-energydata10m-ka-40-TOC.txt
-rw-r--r-- 1 cassandra cassandra 152 Jun 17 10:07 manifest.json
/var/lib/cassandra/data/debs/energydata-d615ace0141d11e5b5ddabd6d8b6d1d3:
total 0
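A listing like this is easier to read when the file sizes are summed per SSTable component. The Python sketch below parses `ls -l` lines and groups them by the suffix after the last `-` in the file name (`Data.db`, `Index.db`, and so on). As a self-contained example it embeds only the eight lines of the `ka-353` SSTable copied from the listing above; pasting the full `ls -lR` output works too, but note that the whole listing also contains the `snapshots` copy of energydata10m, which would be counted alongside the live files. The parsing rule (9 whitespace-separated fields, size in field 5, name in field 9) assumes the default `ls -l` format shown here.

```python
from collections import defaultdict

# The ka-353 SSTable lines, copied verbatim from the listing above.
LISTING = """\
-rw-r--r-- 1 cassandra cassandra 400275 Jun 17 13:39 debs-energydata100m-ka-353-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 1053658168 Jun 17 13:39 debs-energydata100m-ka-353-Data.db
-rw-r--r-- 1 cassandra cassandra 9 Jun 17 13:39 debs-energydata100m-ka-353-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 19254504 Jun 17 13:39 debs-energydata100m-ka-353-Filter.db
-rw-r--r-- 1 cassandra cassandra 281034756 Jun 17 13:39 debs-energydata100m-ka-353-Index.db
-rw-r--r-- 1 cassandra cassandra 9911 Jun 17 13:39 debs-energydata100m-ka-353-Statistics.db
-rw-r--r-- 1 cassandra cassandra 1951696 Jun 17 13:39 debs-energydata100m-ka-353-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 13:39 debs-energydata100m-ka-353-TOC.txt
"""


def sizes_by_component(listing):
    """Sum regular-file sizes from `ls -l` output, keyed by SSTable component."""
    totals = defaultdict(int)
    for line in listing.splitlines():
        fields = line.split()
        # Skip "total" lines, directory headers and subdirectory entries.
        if len(fields) != 9 or not fields[0].startswith("-"):
            continue
        size, name = int(fields[4]), fields[8]
        component = name.rsplit("-", 1)[-1]  # e.g. "Data.db", "Index.db"
        totals[component] += size
    return dict(totals)


if __name__ == "__main__":
    totals = sizes_by_component(LISTING)
    for component, size in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{component:20} {size:>14,}")
    print("Index/Data ratio:", round(totals["Index.db"] / totals["Data.db"], 2))
```

For this SSTable the primary key index (`Index.db`) comes out at about 0.27 of the compressed `Data.db`, so the index is a noticeable share of the on-disk footprint, not a rounding error.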
Note: The data of energydata10m and energydata1000m already existed before I wrote energydata100m; it accounts for the 14 GB of disk usage before launching the job.
************** EDIT **************
I found calculation formulas here: http://docs.datastax.com/en/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html. They say that the data on disk can be much larger than the original dataset. Can someone explain how to apply the formulas from that page? I don't know which data sizes to plug into them.
The following documentation explains the data sizes and their calculation: http://docs.datastax.com/en/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html
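As a rough illustration of how those formulas might be applied to this table, here is a Python sketch based on my reading of the linked 1.2 page: 15 bytes of overhead per regular column (plus column name and value size), 23 bytes per row, a primary key index entry of 32 bytes plus the key size per row, and the total multiplied by the replication factor. Several inputs are assumptions, not facts from the question: column names are counted by their UTF-8 length, the `id` partition key is counted only in the index, and LZ4 compression (which the table enables) is ignored. The 1.2 formulas also predate some details of how later Cassandra versions store CQL3 rows, so treat the result as a ballpark, not a prediction. The row count is unknown; the 100 million used below is hypothetical, suggested only by the table name energydata100m.

```python
# Constants as I read them from the DataStax 1.2 "calculating user data size" page.
COLUMN_OVERHEAD = 15       # bytes per regular column (23 for counter/expiring columns)
ROW_OVERHEAD = 23          # bytes per row
INDEX_ENTRY_OVERHEAD = 32  # bytes per row in the primary key index, plus the key itself

# Non-key columns of debs.energydata10m and the fixed size of their CQL type in bytes.
COLUMNS = {
    "house_id": 4,      # int
    "household_id": 4,  # int
    "plug_id": 4,       # int
    "ts": 8,            # timestamp
    "type": 4,          # int
    "val": 4,           # float
}
KEY_SIZE = 4  # id int


def row_size(columns, key_size):
    """Estimated uncompressed bytes per row for one replica, including its index entry."""
    column_bytes = sum(
        len(name.encode("utf-8")) + value_size + COLUMN_OVERHEAD
        for name, value_size in columns.items()
    )
    index_bytes = INDEX_ENTRY_OVERHEAD + key_size
    return ROW_OVERHEAD + column_bytes + index_bytes


def cluster_size(num_rows, replication_factor, columns=COLUMNS, key_size=KEY_SIZE):
    """Total bytes across the cluster: one replica's size times the replication factor."""
    per_replica = num_rows * row_size(columns, key_size)
    # Equivalent to per_replica + per_replica * (replication_factor - 1).
    return per_replica * replication_factor


if __name__ == "__main__":
    HYPOTHETICAL_ROWS = 100_000_000  # guess based on the table name only
    print(row_size(COLUMNS, KEY_SIZE))                 # 213
    print(cluster_size(HYPOTHETICAL_ROWS, 2) / 1e9)    # 42.6
```

Under these assumptions each row costs about 213 bytes per replica, of which only 32 bytes are actual values (the `id` key plus six columns); most of the rest is per-column and per-row metadata, which is why the uncompressed estimate can end up well above the CSV size.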