I have created a function in python and I am trying to write the same in SQL.
import pandas as pd
df['asofdate'] = pd.to_datetime(df['asofdate'])
df['month_year'] = df['asofdate'].dt.to_period('M')
lista_dat = df['month_year'].unique()
dataframe = pd.DataFrame()
for i in lista_dat:
new = df[df['month_year'] <= i]
new = new.drop_duplicates(subset=['issuer_id'], keep='last')
new['month_year'] = i
dataframe = pd.concat([new, dataframe])
Basically, I have a dataset with the issuer_id and the asofdate. I want to create the data set, where I will have cumulative dataset. Let' assume we have Jan 2023, Feb 2023 and Mar 2023 dates and also the day in asofdate. The new table will have month_year column. So Jan 2023 will have only Jan 2023, Feb 2023 will have Jan and Feb 2023 and Mar 2023 will have Jan, Feb and Mar 2023. I want to keep the most recent record.
To verify than SQL query is correct: After the creation of the dataset, after we group the data on month_year column, distinct issuer_id will be increasing throughout the whole period.
I was not able to loop through the distinct values in asofdate column.
get the distinct year month in a cte then join eg
https://dbfiddle.uk/2BNtRz5r
if you want to create a table
https://dbfiddle.uk/q1VitS14