I'm trying to analyze user retention with a cohort analysis based on event data stored in Redshift.
For example, in Redshift I have:
timestamp         action        user id
----------------  ------------  -------
2015-05-05 12:00  homepage      1
2015-05-05 12:01  product page  1
2015-05-05 12:02  homepage      2
2015-05-05 12:03  checkout      1
I would like to extract the daily retention cohort. For example:
signup_day  users_count  d1   d2  d3  d4  d5  d6  d7
----------  -----------  ---  --  --  --  --  --  --
2015-05-05  100          80   60  40  20  17  16  12
2015-05-06  150          120  90  60  30  22  18  15
Here signup_day is the first date on which we have a record of a user action, users_count is the total number of users who signed up on signup_day, d1 is the number of those users who performed any action one day after signup_day, d2 the number who did so two days after, and so on.
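To make the definitions above concrete, here is a minimal in-memory sketch in Python (not a Redshift query) that builds the same cohort table from raw event rows. It assumes timestamps look like the sample (`YYYY-MM-DD HH:MM`), that "n days after signup_day" means exactly n calendar days later, and that a user is counted at most once per day column no matter how many actions they perform that day. The names `EVENTS` and `daily_cohorts` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the event table above: (timestamp, action, user id).
EVENTS = [
    ("2015-05-05 12:00", "homepage", 1),
    ("2015-05-05 12:01", "product page", 1),
    ("2015-05-05 12:02", "homepage", 2),
    ("2015-05-05 12:03", "checkout", 1),
]


def daily_cohorts(events, max_day=7):
    """Return (signup_day, users_count, [d1..d_max_day]) per cohort, oldest first."""
    # Distinct calendar days on which each user performed any action.
    active_days = defaultdict(set)
    for ts, _action, user_id in events:
        active_days[user_id].add(datetime.strptime(ts, "%Y-%m-%d %H:%M").date())

    # signup_day = the first day we have any record of the user.
    cohorts = defaultdict(list)
    for user_id, days in active_days.items():
        cohorts[min(days)].append(user_id)

    rows = []
    for signup_day in sorted(cohorts):
        users = cohorts[signup_day]
        # counts[n - 1] = users in this cohort active exactly n days after signup.
        counts = [0] * max_day
        for user_id in users:
            for day in active_days[user_id]:  # a set, so each day counts once
                offset = (day - signup_day).days
                if 1 <= offset <= max_day:
                    counts[offset - 1] += 1
        rows.append((signup_day, len(users), counts))
    return rows


for signup_day, users_count, counts in daily_cohorts(EVENTS):
    print(signup_day, users_count, *counts)
```

With only the four sample rows, both users fall into the 2015-05-05 cohort and every d column is 0, because no one comes back on a later day.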
Is there a better way to represent the retention analysis data?
What would be the best query to achieve that with Amazon Redshift? Is it possible to do with a single query?
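One possible single-query shape, sketched here as a Python helper that generates the SQL so the d1..dN columns don't have to be typed by hand: a CTE tags each event with its user's first activity date using `MIN(...) OVER (PARTITION BY ...)`, and the outer query pivots with one `COUNT(DISTINCT CASE ...)` per day offset. This is not the query mentioned below. It is an untested sketch that assumes the hypothetical table and column names `events`, `event_time` and `user_id`; substitute the real schema names.

```python
def retention_query(max_day=7, table="events", ts_col="event_time", user_col="user_id"):
    """Build a Redshift query for the daily retention cohort table.

    table, ts_col and user_col are hypothetical placeholders for the real schema.
    """
    # One pivot column per day offset: d1 .. d{max_day}.
    day_columns = ",\n".join(
        f"       COUNT(DISTINCT CASE WHEN DATEDIFF(day, signup_day, event_day) = {n} "
        f"THEN {user_col} END) AS d{n}"
        for n in range(1, max_day + 1)
    )
    return f"""WITH activity AS (
    SELECT {user_col},
           {ts_col}::date AS event_day,
           MIN({ts_col}::date) OVER (PARTITION BY {user_col}) AS signup_day
    FROM {table}
)
SELECT signup_day,
       COUNT(DISTINCT {user_col}) AS users_count,
{day_columns}
FROM activity
GROUP BY signup_day
ORDER BY signup_day;"""


print(retention_query())
```

Using `COUNT(DISTINCT ...)` rather than `SUM` keeps a user who performs several actions on the same day from being counted more than once in that day's column.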
Eventually I found the query below, which satisfies my requirements.
It produces a slightly different table from the one I described above, but one that better suits my needs: