Cumulative Dataset in SQL

74 views Asked by At

I have created a function in python and I am trying to write the same in SQL.

import pandas as pd
df['asofdate'] = pd.to_datetime(df['asofdate'])
df['month_year'] = df['asofdate'].dt.to_period('M')
lista_dat = df['month_year'].unique()
dataframe = pd.DataFrame()
for i in lista_dat:
    new = df[df['month_year'] <= i]
    new = new.drop_duplicates(subset=['issuer_id'], keep='last')
    new['month_year'] = i
    dataframe = pd.concat([new, dataframe])

Basically, I have a dataset with the issuer_id and the asofdate. I want to create the data set, where I will have cumulative dataset. Let' assume we have Jan 2023, Feb 2023 and Mar 2023 dates and also the day in asofdate. The new table will have month_year column. So Jan 2023 will have only Jan 2023, Feb 2023 will have Jan and Feb 2023 and Mar 2023 will have Jan, Feb and Mar 2023. I want to keep the most recent record.

To verify than SQL query is correct: After the creation of the dataset, after we group the data on month_year column, distinct issuer_id will be increasing throughout the whole period.

Starting table

Expected final table

I was not able to loop through the distinct values in asofdate column.

1

There are 1 answers

2
P.Salmon On

get the distinct year month in a cte then join eg

with cte as
(
select distinct cast(date_format(t.dt,'%Y%m') as unsigned)ym
from t)
select * from cte
join t on cast(date_format(t.dt,'%Y%m') as unsigned) <= cte.ym 

https://dbfiddle.uk/2BNtRz5r

if you want to create a table

create table t1 
with cte as
(
select distinct cast(date_format(t.dt,'%Y%m') as unsigned)ym
from t)
select * from cte
join t on cast(date_format(t.dt,'%Y%m') as unsigned) <= cte.ym ;

https://dbfiddle.uk/q1VitS14