How to properly design a Cassandra table for CQL order by timestamp ASC

45 views Asked by At

I need to run the following type of query:

SELECT * FROM outbox 
ORDER BY timestamp ASC 
LIMIT 10

DDL:

CREATE TABLE IF NOT EXISTS stories_views.outbox (
    key_id uuid,
    message text,
    created_at timestamp,
    PRIMARY KEY ((key_id), created_at))
WITH CLUSTERING ORDER BY (created_at asc)
AND compaction = { 'class' : 'SizeTieredCompactionStrategy' };

Please tell me how to design a table so that there is no search across the entire Cassandra cluster.

2

There are 2 answers

0
Madhavan On

You have it the correct way if your query is always going to be what you had described above. Cassandra® being a distributed NoSQL database is going to do a full cluster scan if your query is badly designed. If you're leveraging point queries i.e. leveraging the full primary keys, it won't do a scan. With versions 5.x and above, we now have an efficient Storaged-Attached Index (SAI) that will help giving you the flexibility to query by a non-primary key column on your table.

For more data modeling understanding, you could leverage this free-browser-based handy tutorial where you could understand the nuances of data modeling. Cheers!

0
Mário Tavares On

If you use a limit, you won't necessarily read the whole table, but at least 1/replication_factor of the nodes, up to the whole cluster will do some work, since you're also not specifying a partition - namely all replicas in the datacenter that the application is connecting to will scan data in the table locally until either 10 results are met or the table is fully scanned.

This is considered an anti-pattern in Cassandra because it defeats the purpose of data partitioning. If you want to achieve scalable reads, it is necessary to query by partition. This will ensure that no more than 3 nodes (assuming replication factor is 3) will have to scan a subset of their data.

In this case, a scalable query could look something like this:

SELECT * FROM stories_views.outbox WHERE key_id = foo ORDER BY created_at ASC LIMIT 10;

This will return you the oldest 10 results for the partition with key_id "foo".

If the application really needs to scan all key_id entries in the table, your best (albeit not good) option may be to denormalize and create a new table with time window buckets as partitions (such as a partition representing an hour, a day, a week, etc.), and scan through the buckets until you get the number of results you need.

The first challenge there will then be sizing the buckets in order to not build enormous partitions (100MB or above).

The second problem with this approach is that partitioning time windows is also an anti-pattern that gets in the way of balanced workload distribution. High throughput demands during peak times can overload a small subset of the nodes with write requests, and maybe also reads, for the partition that represents the current time window.