Nested loops over a Python DataFrame to create multiple time series forecasts


I am new to Python and need it mostly for Stats. Here is a glimpse of the dataset:

Code    City    Date    Sales   
K1  W   1/1/2017    46506.92  
K1  X   1/1/2017    187195.2  
K1  Y   1/1/2017    12858.15  
K1  Z   1/1/2017    25300.88  
K2  W   1/1/2017    87731.47  
K2  X   1/1/2017    14952.8  
K3  Y   1/1/2017    167.8204  
K4  A   1/1/2017    9602.108  
K4  B   1/1/2017    16034.13  
K4  C   1/1/2017    106.5196  
K4  D   1/1/2017    1057.269  
K5  W   1/1/2017    12346.57  
K5  X   1/1/2017    528776.5  
K5  Y   1/1/2017    7598.979  
K5  Z   1/1/2017    147969.6  
K6  W   1/1/2017    11770.68  
K6  X   1/1/2017    180867.6  
K6  Y   1/1/2017    11778.6  
K6  Z   1/1/2017    48835.3  

City is a column of strings. The same Code may appear in multiple cities, but each Code-City combination is unique and has 32 data points: the data covers 32 months and is collected on the 1st of each month. I need to create an array of RMSE values from individual forecasts, with one forecast per Code-City combination. I wrote a function for ARIMA (Prophet is not an option for me).
I tried to filter the DataFrame hierarchically, by Code and then by City for that Code, using:

df.loc[lambda x: x['Code'] in Codelist].loc[lambda x: x['City'] in Citylist]
But I get this error:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
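
For reference, this error most likely arises because `x['Code'] in Codelist` applies Python's `in` operator to the whole Series at once, so pandas is asked to collapse an elementwise comparison into a single True/False. A minimal sketch of an elementwise alternative using `Series.isin` follows; the small frame and the names `code_list` and `city_list` are hypothetical stand-ins for the real data, `Codelist` and `Citylist`.

```python
import pandas as pd

# Small illustrative frame (a few rows in the shape of the dataset above)
df = pd.DataFrame({
    "Code": ["K1", "K1", "K2", "K3", "K4"],
    "City": ["W", "X", "W", "Y", "A"],
    "Sales": [46506.92, 187195.2, 87731.47, 167.8204, 9602.108],
})

code_list = ["K1", "K2"]  # hypothetical stand-in for Codelist
city_list = ["W"]         # hypothetical stand-in for Citylist

# isin() tests each row separately and returns a boolean Series;
# & combines the two masks elementwise.
mask = df["Code"].isin(code_list) & df["City"].isin(city_list)
filtered = df[mask]
```

Note that this keeps any row whose Code is in one list and whose City is in the other; it does not check that a particular Code-City pairing was intended.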

What I want is this: if a Code exists in the list of codes, move on to a second for loop and check whether the City is present for that Code. If it is, call the ARIMA function I defined. This is needed because the same Code exists in multiple cities.

I want to store the results, which are the RMSE values of forecasts against actuals, in an array and append to it after every iteration. I am expecting an array of 5 float values from the ARIMA forecasts.

There is 1 answer

Suraj Motaparthy On

To produce individual forecasts, iterate only over the Code-City combinations that are actually present in the data, rather than trying every possible combination.

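# Iterate only over the (Code, City) pairs that actually occur in the data
for code, city in df[['Code', 'City']].drop_duplicates().values:
    # Select that pair's rows in chronological order (Date should be a datetime column)
    train_df = df[(df['Code'] == code) & (df['City'] == city)].sort_values(by='Date')
    .....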
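
One way the loop above could be completed to collect one RMSE per Code-City pair is sketched below. This is only an illustration under stated assumptions: `naive_forecast` is a hypothetical placeholder for the asker's own ARIMA function, the `rmse` helper, the two-month holdout (`horizon`), and the synthetic data are all made up for the example.

```python
import numpy as np
import pandas as pd


def naive_forecast(train, horizon):
    """Hypothetical placeholder for the ARIMA function: repeat the last value."""
    return np.repeat(train.iloc[-1], horizon)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic data: two Code-City pairs with six monthly observations each
dates = pd.date_range("2017-01-01", periods=6, freq="MS")
df = pd.DataFrame({
    "Code": ["K1"] * 6 + ["K2"] * 6,
    "City": ["W"] * 6 + ["X"] * 6,
    "Date": list(dates) * 2,
    "Sales": [10.0, 12.0, 11.0, 13.0, 15.0, 17.0,
              5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
})

horizon = 2   # assumed number of months held out for evaluation
results = []  # one (code, city, rmse) tuple per pair

for code, city in df[["Code", "City"]].drop_duplicates().values:
    pair_df = df[(df["Code"] == code) & (df["City"] == city)].sort_values(by="Date")
    series = pair_df["Sales"]
    train, test = series.iloc[:-horizon], series.iloc[-horizon:]
    forecast = naive_forecast(train, horizon)  # swap in the ARIMA call here
    results.append((code, city, rmse(test, forecast)))

# Array of RMSE values, in the order the pairs were visited
rmse_values = np.array([r[2] for r in results])
```

Collecting tuples in a plain list and converting once at the end avoids repeatedly appending to a NumPy array, and keeps each RMSE tied to its Code and City. If Date is stored as text such as 1/1/2017, converting it with `pd.to_datetime` before sorting keeps the months in chronological order.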