Let's say I have the following table which possibly contains dozens of millions of rows and thousands of guids:
| Guid | Measurement time | Signal |
| ------ | ---------------- | ------ |
| xxx1 | now | 10 |
| xxx2 | now | 5 |
| xxx1 | now-1h | 9 |
| xxx2 | now-1h | 6 |
The table DDL:
create table sensor_data_v3
(
guid text,
measurement_time bigint,
signal float,
... some more columns ...
primary key (guid, measurement_time)
)
with clustering order by (measurement_time desc)
and caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
and compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
and compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
and dclocal_read_repair_chance = 0.1
and speculative_retry = '99PERCENTILE'
and read_repair_chance = 0;
I need to get the latest signal for each guid, how do I do that?
Cassandra version is 3.11 if I'm not mistaken.
After reading this answer - Cassandra - How group by latest timestamp it seems just I just had to run the following: