resample yearly to daily data with forward values

186 views Asked by At

I have 2 dataframe (one df and on series). They look like this :

ranking 

Date        type
2009-12-31  AAB000001471      Q2
            AAB000001513      Q1
            AAB000002057      Q2
            AAB000002164      Q2
            AAB000003162      Q2
                              ..
2020-12-31  AAB013530250      Q1
            AAB013530706      Q1
            AAB013531035      Q1
            AAB135329261      Q1
            AAB013536471      Q2
Name: Actif_net_part, Length: 59621, dtype: category
Categories (4, object): ['Q1' < 'Q2' < 'Q3' < 'Q4']

and :

base 

    Date    type
0   2020-10-06  AAB000006710
1   2020-09-22  AAB000006710
2   2020-09-08  AAB000006710
3   2020-08-25  AAB000006710
4   2020-02-25  AAB000006710
... ... ...
11477259    2019-05-15  AAB013282712
11477260    2019-05-14  AAB013282712
11477261    2019-05-13  AAB013282712
11477262    2019-05-10  AAB013282712
11477263    2019-05-09  AAB013282712
11477264 rows × 2 columns

the first one is yearly data whereas the second one is daily. I would like to merge both base on type/Date columns but for the yearly data create a daily one with forward data. I mean for this : 2009-12-31 AAB000001471 Q2 2010-01-01 AAB000001471 Q2 .... until the end of the year where I have a new value

I tried :

ddd = ranking.groupby(['type', pd.Grouper(level=0, freq='D')]).apply(lambda x: x.fillna(method='ffill'))

but it is not working. I still have yearly data

Thanks

1

There are 1 answers

0
jezrael On

Use GroupBy.apply with DataFrame.asfreq:

print (ranking)
Date        type        
2009-12-31  AAB000001471    Q2
            AAB000001513    Q1
            AAB000002057    Q2
            AAB000002164    Q2
            AAB000003162    Q2
2020-12-31  AAB000001471    Q1
            AAB000001513    Q1
            AAB000002057    Q1
            AAB000002164    Q1
            AAB000003162    Q2
Name: Actif_net_part, dtype: object

ddd = (ranking.reset_index(level=1)
              .groupby('type', group_keys=False)
              .apply(lambda x: x.asfreq('d', method='ffill'))
              .set_index('type', append=True)
              .sort_index())
print (ddd)
                        Actif_net_part
Date       type                       
2009-12-31 AAB000001471             Q2
           AAB000001513             Q1
           AAB000002057             Q2
           AAB000002164             Q2
           AAB000003162             Q2
...                                ...
2020-12-31 AAB000001471             Q1
           AAB000001513             Q1
           AAB000002057             Q1
           AAB000002164             Q1
           AAB000003162             Q2

[20095 rows x 1 columns]