I am using cqlsh 5.0.1 against an 8-node Cassandra cluster with several tables. One job scans a table and applies a simple rule: if a row is older than 6 months I delete it, otherwise I refresh its TTL. To do this I use the express-cassandra npm package and stream the table's rows with its eachRow method, but I very often get a server read-timeout error, and my program terminates because the next page needed to continue processing never arrives.
Below I have attached my table stats, schema, and code.
Keyspace: events
Read Count: 550349
Read Latency: 14.500334253355598 ms.
Write Count: 46644769
Write Latency: 0.2615331485294739 ms.
Pending Flushes: 0
Table: track
SSTable count: 18
Space used (live): 1.56 TB
Space used (total): 1.56 TB
Space used by snapshots (total): 0 bytes
Off heap memory used (total): 2.66 GB
SSTable Compression Ratio: 0.12156681850176397
Number of partitions (estimate): 222854730
Memtable cell count: 4092
Memtable data size: 8.04 MB
Memtable off heap memory used: 0 bytes
Memtable switch count: 1828
Local read count: 550349
Local read latency: 12.668 ms
Local write count: 46644784
Local write latency: 0.201 ms
Pending flushes: 0
Bloom filter false positives: 5
Bloom filter false ratio: 0.00000
Bloom filter space used: 417.49 MB
Bloom filter off heap memory used: 570.87 MB
Index summary off heap memory used: 211.54 MB
Compression metadata off heap memory used: 1.89 GB
Compacted partition minimum bytes: 43 bytes
Compacted partition maximum bytes: 765.03 MB
Compacted partition mean bytes: 44.5 KB
Average live cells per slice (last five minutes): 10.050420168067227
Maximum live cells per slice (last five minutes): 124
Average tombstones per slice (last five minutes): 9.004201680672269
Maximum tombstones per slice (last five minutes): 1597
Schema:
CREATE TABLE events.track (
"profileId" text,
"projectId" text,
"sessionId" bigint,
"anonymousId" text,
"appBuild" text,
"appName" text,
"appNamespace" text,
"appVersion" text,
attributes list<text>,
channels list<text>,
"deviceId" text,
"deviceManufacturer" text,
"deviceModel" text,
"deviceName" text,
"eventTypes" list<text>,
ip text,
"libraryName" text,
"libraryVersion" text,
locale text,
"networkCarrier" text,
"osName" text,
"osVersion" text,
"propertyIds" list<text>,
referrer text,
"screenDensity" int,
"screenHeight" int,
"screenWidth" int,
"sessionAttributes" map<text, text>,
texts list<text>,
timestamps list<timestamp>,
timezone text,
"userAgent" text,
"writeKey" text,
PRIMARY KEY (("profileId", "projectId"), "sessionId")
) WITH CLUSTERING ORDER BY ("sessionId" DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
Streaming code
EventModel.eachRow({}, { fetchSize: 10000 }, function (n, row) {
  eventsChunk.push(row);
}, function (err, result) {
  // error handling and business logic here
});
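For what it's worth, the delete-vs-refresh decision described above can live in a small pure function, kept separate from the driver code so it is easy to test on its own. This is only an illustrative sketch: the name `decideAction` and the 180-day cutoff are my own choices, not anything from express-cassandra.

```javascript
// Roughly six months, expressed in milliseconds (180 days as an approximation).
const SIX_MONTHS_MS = 180 * 24 * 60 * 60 * 1000;

// Decide what to do with a row based on its last-activity timestamp:
// rows older than the cutoff are deleted, younger rows get their TTL refreshed.
// Pure function: no driver calls, trivial to unit-test.
function decideAction(rowTimestampMs, nowMs) {
  return (nowMs - rowTimestampMs) > SIX_MONTHS_MS ? 'delete' : 'refresh-ttl';
}
```

The actual DELETE / UPDATE … USING TTL statements would then be issued per row based on the returned action.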
Any help would be highly appreciated.
The timeout occurs because a full table scan is expensive, particularly when there are over 200 million partitions on a single node.
You didn't post the CQL that the driver actually issues, but a full-table scan like this (or a range query with ALLOW FILTERING) puts the nodes under so much load that they become unresponsive. Cassandra is designed for OLTP workloads where you want to retrieve a single partition really, REALLY fast. A full table scan is an OLAP workload, so you need an analytics solution such as Apache Spark with the spark-cassandra-connector.
The connector optimises its queries against Cassandra: instead of doing one full table scan, it breaks the work into token-range segments and requests only a small portion of the table at a time. Cheers!
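To make the token-range idea concrete: with the default Murmur3Partitioner the token space runs from -2^63 to 2^63 - 1, and a connector-style scan splits that ring into segments, issuing one bounded query per segment (e.g. `SELECT ... WHERE token("profileId","projectId") > ? AND token("profileId","projectId") <= ?`). A minimal sketch of just the splitting arithmetic, using BigInt; the segment count and the function name `tokenRanges` are my own, not the connector's API:

```javascript
// Murmur3Partitioner token bounds.
const MIN_TOKEN = -(2n ** 63n);
const MAX_TOKEN = 2n ** 63n - 1n;

// Split the full token ring into `n` contiguous (start, end] segments.
// Each segment can be scanned with a token-bounded SELECT instead of
// one unbounded full-table scan.
function tokenRanges(n) {
  const span = MAX_TOKEN - MIN_TOKEN; // total width of the ring
  const step = span / BigInt(n);
  const ranges = [];
  let start = MIN_TOKEN;
  for (let i = 0; i < n; i++) {
    // The last segment absorbs the division remainder so the whole ring is covered.
    const end = i === n - 1 ? MAX_TOKEN : start + step;
    ranges.push({ start, end });
    start = end;
  }
  return ranges;
}
```

Scanning segment by segment (with a retry per segment) also means a timeout only costs you one small slice of work rather than killing the whole scan, which addresses the termination problem in the question.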