Cohort analysis with Amazon Redshift / PostgreSQL

Question

Cohort analysis with Amazon Redshift / PostgreSQL

2.3k views Asked by Ita At 07 June 2015 at 20:31

I'm trying analyze user retention using a cohort analysis based on event data stored in Redshift.

For example, in Redshift I have:

timestamp          action        user id
---------          ------        -------
2015-05-05 12:00   homepage      1
2015-05-05 12:01   product page  1
2015-05-05 12:02   homepage      2
2015-05-05 12:03   checkout      1

I would like to extract the daily retention cohort. For example:

signup_day  users_count d1  d2  d3  d4  d5  d6  d7 
----------  ----------- --  --  --  --  --  --  --  
2015-05-05  100         80  60  40  20  17  16  12
2015-05-06  150         120 90  60  30  22  18  15

Where signup_day represents the first date we have a record of a user action, users_count is the total amount of users who signed up on signup_day, d1 is the number of users who performed any action a day after signup_day etc...

Is there a better way to represent the retention analysis data?

What would be the best query to achieve that with Amazon Redshift? Is it possible to do with a single query?

Original Q&A

There are 1 answers

**Ita** · Answer 1 · 2015-06-17T14:12:03+00:00

Eventually I found the query below to satisfy my requirements.

WITH 

users AS (
  SELECT
    user_id,
    date_trunc('day', min(timestamp)) as activated_at
    from table
    group by 1
  )
,

events AS (
  SELECT user_id,
         action,
         timestamp AS occurred_at
    FROM table
)

SELECT DATE_TRUNC('day',u.activated_at) AS signup_date,


       TRUNC(EXTRACT('EPOCH' FROM e.occurred_at - u.activated_At)/(3600*24)) AS user_period,


       COUNT(DISTINCT e.user_id) AS retained_users
  FROM users u
  JOIN events e
    ON e.user_id = u.user_id
   AND e.occurred_at >= u.activated_at
 WHERE u.activated_at >= getdate() - INTERVAL '11 day'
 GROUP BY 1,2
 ORDER BY 1,2

It produces a slightly different table than I described above (but is better for my needs):

signup_date  user_period  retained_users
-----------  -----------  --------------
2015-05-05   0            80
2015-05-05   1            60
2015-05-05   2            40
2015-05-05   3            20
2015-05-06   0            100
2015-05-06   1            80
2015-05-06   2            40
2015-05-06   3            20

TechQA.

Cohort analysis with Amazon Redshift / PostgreSQL

There are 1 answers

Related Questions in SQL

Related Questions in AMAZON-WEB-SERVICES

Related Questions in METRICS

Related Questions in AMAZON-REDSHIFT

Popular Questions

Trending Questions