Cassandra Server timeout during read query at consistency ONE (0 replica(s) responded over 1 required)


I am using cqlsh v5.0.1 against an 8-node Cassandra cluster with several tables. I am scanning one table with a simple rule: if a row is older than 6 months, delete it; if it is younger than 6 months, refresh its TTL. To do this I use the express-cassandra npm package and stream the table's rows with its eachRow method, but I very often get the server timeout error above and my program terminates, because I never receive the next page needed to continue processing.

Below I have attached my table stats, schema, and code.

Keyspace: events
    Read Count: 550349
    Read Latency: 14.500334253355598 ms.
    Write Count: 46644769
    Write Latency: 0.2615331485294739 ms.
    Pending Flushes: 0
        Table: track
        SSTable count: 18
        Space used (live): 1.56 TB
        Space used (total): 1.56 TB
        Space used by snapshots (total): 0 bytes
        Off heap memory used (total): 2.66 GB
        SSTable Compression Ratio: 0.12156681850176397
        Number of partitions (estimate): 222854730
        Memtable cell count: 4092
        Memtable data size: 8.04 MB
        Memtable off heap memory used: 0 bytes
        Memtable switch count: 1828
        Local read count: 550349
        Local read latency: 12.668 ms
        Local write count: 46644784
        Local write latency: 0.201 ms
        Pending flushes: 0
        Bloom filter false positives: 5
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 417.49 MB
        Bloom filter off heap memory used: 570.87 MB
        Index summary off heap memory used: 211.54 MB
        Compression metadata off heap memory used: 1.89 GB
        Compacted partition minimum bytes: 43 bytes
        Compacted partition maximum bytes: 765.03 MB
        Compacted partition mean bytes: 44.5 KB
        Average live cells per slice (last five minutes): 10.050420168067227
        Maximum live cells per slice (last five minutes): 124
        Average tombstones per slice (last five minutes): 9.004201680672269
        Maximum tombstones per slice (last five minutes): 1597 

Schema:

CREATE TABLE events.track (
    "profileId" text,
    "projectId" text,
    "sessionId" bigint,
    "anonymousId" text,
    "appBuild" text,
    "appName" text,
    "appNamespace" text,
    "appVersion" text,
    attributes list<text>,
    channels list<text>,
    "deviceId" text,
    "deviceManufacturer" text,
    "deviceModel" text,
    "deviceName" text,
    "eventTypes" list<text>,
    ip text,
    "libraryName" text,
    "libraryVersion" text,
    locale text,
    "networkCarrier" text,
    "osName" text,
    "osVersion" text,
    "propertyIds" list<text>,
    referrer text,
    "screenDensity" int,
    "screenHeight" int,
    "screenWidth" int,
    "sessionAttributes" map<text, text>,
    texts list<text>,
    timestamps list<timestamp>,
    timezone text,
    "userAgent" text,
    "writeKey" text,
    PRIMARY KEY (("profileId", "projectId"), "sessionId")
) WITH CLUSTERING ORDER BY ("sessionId" DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

Streaming code

// Buffer each page of rows for processing; eachRow invokes the row
// callback once per row and the final callback when the page is
// exhausted (result.nextPage fetches the next page) or on error.
const eventsChunk = [];

EventModel.eachRow({}, { fetchSize: 10000 }, function (n, row) {
    eventsChunk.push(row);
}, function (err, result) {
    // error handling and business logic here
});
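
For context, the rule applied to each buffered row looks roughly like this (a sketch: using the last entry of the timestamps list as the row's age, and the refresh TTL value, are assumptions):

// Sketch of the per-row rule: delete rows older than 6 months,
// otherwise refresh their TTL. Field names on `row` follow the schema;
// judging age by the last entry of "timestamps" is an assumption.
const SIX_MONTHS_MS = 183 * 24 * 60 * 60 * 1000;
const REFRESH_TTL_SECONDS = 183 * 24 * 60 * 60; // placeholder TTL

function processRow(row) {
    const key = {
        profileId: row.profileId,
        projectId: row.projectId,
        sessionId: row.sessionId,
    };
    const lastSeen = row.timestamps[row.timestamps.length - 1];

    if (Date.now() - lastSeen.getTime() > SIX_MONTHS_MS) {
        // Older than 6 months: delete the whole row.
        EventModel.delete(key, function (err) {
            if (err) console.error('delete failed', err);
        });
    } else {
        // Younger than 6 months: rewrite data with a fresh TTL.
        // NB: TTL in Cassandra is per cell, so refreshing a whole row's
        // TTL means rewriting every non-key column, not just one.
        EventModel.update(key, { writeKey: row.writeKey }, { ttl: REFRESH_TTL_SECONDS }, function (err) {
            if (err) console.error('TTL refresh failed', err);
        });
    }
}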

Any help would be highly appreciated.

1 Answer

Erick Ramirez:

The timeout occurs because a full table scan is expensive, particularly when there are over 200 million partitions on just one node.

You didn't include the query you're running, but I imagine you're doing a range query with ALLOW FILTERING, which puts the nodes under enough load that they become unresponsive.

Cassandra is designed for OLTP workloads where you want to retrieve a single partition really, REALLY fast. Full table scans are OLAP workloads, so you need an analytics solution such as Apache Spark with the spark-cassandra-connector.

The connector optimises queries against Cassandra: instead of doing one full table scan, it breaks the scan into segments of token ranges and requests only a small portion at a time. Cheers!
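
For what it's worth, the same token-range segmentation can be approximated directly from Node.js with the cassandra-driver package. A minimal sketch (contact points, data centre name, and the number of segments are assumptions; column names come from the schema in the question):

const { Client, types } = require('cassandra-driver');

const client = new Client({
    contactPoints: ['127.0.0.1'],      // assumption: your cluster's contact points
    localDataCenter: 'datacenter1',    // assumption: your DC name
    keyspace: 'events',
});

// Murmur3 tokens cover the full signed 64-bit range.
const MIN_TOKEN = -(2n ** 63n);
const MAX_TOKEN = 2n ** 63n - 1n;
const SEGMENTS = 1024n;                // assumption: tune to your data size
const STEP = (MAX_TOKEN - MIN_TOKEN) / SEGMENTS;

const query =
    'SELECT "profileId", "projectId", "sessionId" FROM track ' +
    'WHERE token("profileId", "projectId") > ? ' +
    'AND token("profileId", "projectId") <= ?';

async function scanByTokenRange() {
    for (let lo = MIN_TOKEN; lo < MAX_TOKEN; lo += STEP) {
        const hi = lo + STEP < MAX_TOKEN ? lo + STEP : MAX_TOKEN;
        const params = [
            types.Long.fromString(lo.toString()),
            types.Long.fromString(hi.toString()),
        ];
        // Each segment is a small, bounded query instead of one huge scan;
        // client.stream pages through the segment's rows automatically.
        for await (const row of client.stream(query, params, { prepare: true, fetchSize: 1000 })) {
            // delete or refresh TTL for `row` as in the question
        }
    }
}

Each segment touches only a narrow slice of the token ring, so no single query has to walk hundreds of millions of partitions; the spark-cassandra-connector automates exactly this splitting and also schedules the work near the replicas that own each range.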