Calculating running revenue for cohorts SQL using window function sum over ()

61 views Asked by At

I am trying to calculate the running sum (revenue) within each cohort group. I am using the following query to achieve this:

round(sum(SUM(i.subtotal)) OVER (PARTITION BY cft.cohort_start order by invoice_date), 2) AS accrual_cum
  • cohort_start is the name of the cohort group
  • subtotal is the revenue amount

It doesn't help me achieve what I want as it just returns the total sum for each cohort group, whereas what I want is to show me the running sum within each cohort group, so for the first transaction in the first cohort revenue = A , with the second transaction it becomes A + B, in the third A + B + C etc.

What am I doing wrong here?

My result

This is the whole query just in case :

WITH 
invoices AS (
    SELECT * FROM {{ source ('bq_raw_data', 'invoices') }}
), 

customer_first_transaction AS (
    SELECT
        customer_id,
        MIN(invoice_date) AS cohort_start
    FROM invoices
    GROUP BY customer_id
)

SELECT
    cft.cohort_start AS customer_cohort,
    invoice_date,
    EXTRACT(DAY FROM (current_date-cft.cohort_start)) AS days_since_cohort_start,
    i.product_id AS product,
    coalesce(companies.region, '0') as region,
    round(SUM(i.subtotal), 2) as accrual_revenue,
    round(sum(SUM(i.subtotal)) OVER (PARTITION BY cft.cohort_start order by invoice_date), 2) AS accrual_cum, 
    case when i.product_id = 'stp_9' then ROUND(SUM(i.subtotal) / 12, 2) 
    else round(SUM(i.subtotal), 2) 
    end AS p_and_l_revenue, 
    ROUND(SUM(CASE WHEN i.product_id = 'stp_9' THEN sum(i.subtotal) / 12 ELSE sum(i.subtotal) END) OVER 
    (PARTITION BY cft.cohort_start ORDER BY invoice_date), 2) 
    AS p_and_l_cum

FROM invoices i
left JOIN customer_first_transaction cft ON i.customer_id = cft.customer_id
LEFT JOIN {{ source('bq_raw_data', 'customers') }} AS customers ON customers.id = i.customer_id
LEFT JOIN {{ source('bq_raw_data', 'companies') }} AS companies ON companies.id = customers.seedlegals_company_id
group by cft.cohort_start, invoice_date, i.product_id, companies.region
order by customer_cohort, invoice_date

EDIT: The result that I get when I am using the query suggested by 0xKevin:

    ROUND(i.subtotal, 2) as accrual_revenue,
    ROUND(SUM(i.subtotal) OVER (PARTITION BY cft.cohort_start ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2) AS accrual_cum,
    case when i.product_id = 'stp_9' then ROUND(i.subtotal / 12, 2) 
    else ROUND(i.subtotal, 2) 
    end AS p_and_l_revenue,
    ROUND(SUM(CASE WHEN i.product_id = 'stp_9' THEN i.subtotal / 12 ELSE i.subtotal END) OVER (PARTITION BY cft.cohort_start ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2) AS p_and_l_cum

Result after the edit

1

There are 1 answers

6
0xKevin On

The SQL query you've written seems almost correct, but there's a slight issue with how you've used the SUM() function. The query within the OVER clause should not contain a nested SUM. The SUM inside the window function already calculates the cumulative total; there should not be an additional SUM around it.

The correct usage of the window function for calculating a running total, or cumulative sum, would look like this:

SUM(i.subtotal) OVER (PARTITION BY cft.cohort_start ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS accrual_cum

And for the p_and_l_cum, it should be:

SUM(CASE WHEN i.product_id = 'stp_9' THEN i.subtotal / 12 ELSE i.subtotal END) OVER (PARTITION BY cft.cohort_start ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS p_and_l_cum

Here's how you should modify your query:

WITH 
invoices AS (
    SELECT * FROM {{ source ('bq_raw_data', 'invoices') }}
), 
customer_first_transaction AS (
    SELECT
        customer_id,
        MIN(invoice_date) AS cohort_start
    FROM invoices
    GROUP BY customer_id
)
SELECT
    cft.cohort_start AS customer_cohort,
    invoice_date,
    EXTRACT(DAY FROM (current_date - cft.cohort_start)) AS days_since_cohort_start,
    i.product_id AS product,
    coalesce(companies.region, '0') as region,
    ROUND(SUM(i.subtotal), 2) as accrual_revenue,
    ROUND(SUM(i.subtotal) OVER (PARTITION BY cft.cohort_start ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2) AS accrual_cum,
    case when i.product_id = 'stp_9' then ROUND(i.subtotal / 12, 2) 
    else ROUND(i.subtotal, 2) 
    end AS p_and_l_revenue,
    ROUND(SUM(CASE WHEN i.product_id = 'stp_9' THEN i.subtotal / 12 ELSE i.subtotal END) OVER (PARTITION BY cft.cohort_start ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2) AS p_and_l_cum
FROM invoices i
LEFT JOIN customer_first_transaction cft ON i.customer_id = cft.customer_id
LEFT JOIN {{ source('bq_raw_data', 'customers') }} AS customers ON customers.id = i.customer_id
LEFT JOIN {{ source('bq_raw_data', 'companies') }} AS companies ON companies.id = customers.seedlegals_company_id
GROUP BY cft.cohort_start, invoice_date, i.product_id, companies.region
ORDER BY customer_cohort, invoice_date;