Apologies if this is a silly question. From what I read it looks like Cassandra would do a full table or partition (if partition key is specified) scan when executing a query with limit ? I was wondering why that might be?
For instance suppose I have this table
CREATE TABLE IF NOT EXISTS movies(
year int,
name text,
budget int,
PRIMARY KEY (year, budget)
) WITH CLUSTERING ORDER BY (budget DESC);
And I want to execute queries like
SELECT name FROM movies WHERE year = 2022 LIMIT 25;
To get 25 movies with the highest budget in 2022.
Or like
SELECT name FROM movies WHERE year = 2022 AND budget < 10000 LIMIT 25;
To get 25 movies with the highest budget below $100K in 2022.
I was curious why Cassandra has to scan the entire partition? Can't it just return the first 25 in the partition for the first query directly and the first 25 from the clustering key filter (guessing this should be efficient since its a cluster key) for the second query?
In terms of complexity , if n is the max number of records in a partition the the first query being O(25) and the second query being O(25 + log(n)) guessing it takes log(n) to do the cluster key filtering. Looks like Cassandra would do full partition scan and so its going to be O(n) ? If so, why wouldn't Cassandra be more efficient here?