Why does Cassandra Full Partition Scan even with a Limit?

47 views Asked by Kiran Baktha At 29 October 2023 at 18:59

Apologies if this is a silly question. From what I read it looks like Cassandra would do a full table or partition (if partition key is specified) scan when executing a query with limit ? I was wondering why that might be?

For instance suppose I have this table

CREATE TABLE IF NOT EXISTS movies(
year int,
name text,
budget int,
PRIMARY KEY (year, budget)
) WITH CLUSTERING ORDER BY (budget DESC);

And I want to execute queries like

SELECT name FROM movies WHERE year = 2022 LIMIT 25;

To get 25 movies with the highest budget in 2022.

Or like

SELECT name FROM movies WHERE year = 2022 AND budget < 10000 LIMIT 25;

To get 25 movies with the highest budget below $100K in 2022.

I was curious why Cassandra has to scan the entire partition? Can't it just return the first 25 in the partition for the first query directly and the first 25 from the clustering key filter (guessing this should be efficient since its a cluster key) for the second query?

In terms of complexity , if n is the max number of records in a partition the the first query being O(25) and the second query being O(25 + log(n)) guessing it takes log(n) to do the cluster key filtering. Looks like Cassandra would do full partition scan and so its going to be O(n) ? If so, why wouldn't Cassandra be more efficient here?

Original Q&A

TechQA.

Why does Cassandra Full Partition Scan even with a Limit?

There are 0 answers

Related Questions in DATABASE

Related Questions in CASSANDRA

Related Questions in NOSQL

Related Questions in DATASTAX

Related Questions in CASSANDRA-3.0

Popular Questions

Trending Questions