Cassandra 2.1: how to model for the N most recently active users?

104 views Asked by At

I need to get a list of N users with the most recent actions using cassandra.

I have tried to use the following schema:

CREATE TABLE user_actions(
  partition_key int, 
  username int,
  action_time timestamp,
PRIMARY KEY(partition_key, action_time)
WITH CLUSTERING ORDER BY(action_time DESC);

When a user first logs in it creates a new entry. However, I'm unable to update the start_time property as it's used as a clustering key. I could delete the old data and create a new entry when the user takes another action - but user actions occur frequently and we will quickly hit the tombstone limit.

Does anyone know how best I could model the data in order to achieve this?

1

There are 1 answers

2
Chris Lohfink On BEST ANSWER

You dont need to delete individual cells. Just keep appending. Add another component to the partition key to keep from getting too wide:

PRIMARY KEY((partition_key, day), action_time)

Then query using something like (pseudo code):

y = floor(time() / (60 * 60 * 24))
oldest_possible = y - 7
r = []
while len(r) < N and y >= oldest_possible:
    R.append(query('SELECT * FROM user_actions where partition_key = {x} AND day = {y} LIMIT {N};', x, y, N)
    y -= 1

to get the last N users. When theres not N in the partition, decrement day by 1 and query that too. Set TTL to 1 to 5 days or something and old records will be compacted away. This assumes you have at least N actions in 24 hours, or else you may get a not full list. If your records are super active can use hour instead of day.

The day/hour component can just be a simple floor(time() / (60*60*24))