I am trying to get the daily growth rate of users per merchant and add it as another column to my dataframe. I was able to come up with the code already; however I think the issue is because of certain merchants that may have consecutive days of 0 sales. 0-> 0 doesn't have a growth rate so I believe that's why its messing up my code. Any recommendations?
Here is the code I am using: Apologies for the length
# First I filter our the dates and order category
# week
df_wk = df.loc[(df['Booked at'] >= '2023-01-01') & (df['Booked at'] <= '2023-05-31')]
#placed eats orders
df_eats = df_wk.loc[df_wk['Job Type'] == 'Eats']
# Create a dataframe with all merchants and dates within the date range
dates = pd.date_range('2023-01-01', '2023-05-31', freq='D')
merchants = df_eats['Merchant Name'].unique()
index = pd.MultiIndex.from_product([dates, merchants], names=['Booked at', 'Merchant Name'])
all_sales = pd.DataFrame(index=index).reset_index()
#including all merchants even on zero sales days and getting number of unique users per merchant and day
eats_users = df_eats.groupby(['Booked at', 'Merchant Name']).agg({'Customer Name': lambda x: x.nunique()})
eats_users.rename(columns={'Customer Name': 'Total Unique Users'}, inplace=True)
# Sort the dataframe by 'Booked at' column to ensure chronological order
eats_users.sort_values('Booked at', inplace=True)
#merge all sales with new df
unique_users = pd.merge(all_sales, eats_users, on=['Booked at', 'Merchant Name'], how='left')
unique_users.fillna(0, inplace=True)
# Sort the dataframe by 'Booked at' column to ensure chronological order
unique_users.sort_values('Booked at', inplace=True)
# Add new column that calculates the daily growth rate per merchant
previous_users = {}
for index, row in unique_users.iterrows():
current_users = row['Total Unique Users']
merchant = index[1] # Get the merchant name from the index tuple
if pd.isna(current_users):
unique_users.at[index, 'Daily Growth Rate for Unique Users'] = pd.NA
else:
if merchant in previous_users:
previous_users_count = previous_users[merchant]
growth_rate = (current_users / previous_users_count - 1) * 100
unique_users.at[index, 'Daily Growth Rate for Unique Users'] = growth_rate
else:
unique_users.at[index, 'Daily Growth Rate for Unique Users'] = pd.NA
previous_users[merchant] = current_users
Expected Output is something like this:
Booked at Merchant Name Total Unique Users \
0 2023-01-01 Merchant A 38
1 2023-01-01 Merchant B 75
2 2023-01-01 Merchant C 74
3 2023-01-02 Merchant A 33
4 2023-01-02 Merchant B 71
.. ... ... ...
447 2023-05-30 Merchant A 75
449 2023-05-30 Merchant C 13
451 2023-05-31 Merchant B 76
450 2023-05-31 Merchant A 52
452 2023-05-31 Merchant C 2
Daily Growth Rate for Unique Users
0 NaN
1 NaN
2 NaN
3 -13.157895
4 -5.333333
.. ...
447 5.633803
449 -45.833333
451 13.432836
450 -30.666667
452 -84.615385
[453 rows x 4 columns]