i have the following time series in my Cassandra cluster:
CREATE TABLE consumptions (
meter_id int,
date date,
meter_date_time timestamp,
data text,
PRIMARY KEY ((meter_id, date), meter_date_time)
) WITH CLUSTERING ORDER BY (meter_date_time DESC)
To calculate the consumptions of a month i need the latest meter reading from a month. In my case the query looks like this:
select * from consumtions
where meter_id = 1 and
date in (...'2016-12-30','2016-12-31'...)
limit 1
I'm, aware that an IN clause with multiple partition keys is an anti-pattern. Is there a better way for this query without denormalization?
In most of my other queries i'm working with asnyc queries - but this would return every result and i can't use the Limit.
The IN clause is generally bad, but since you are putting there 31 days at most, IMHO you can keep it without worrying about performances.
A second alternative is to run one-for-each-day-on-the-month queries async, and filter out the most recent measure at application level. That's only a bit of code.
A third alternative, if you have A LOT of sensors, is to query by day-of-month in descendant order hoping to find the last measure soon. This way you'd run a lower number of queries, removing some load from the cluster, at expense of some latency at application level, because if your last measure is on the 29th of the month, you query the 31st and find no results, go back one day and query the 30th and find no result, go back one day and query the 29th and find your result.