There are two dataframes, both with datetime objects in either 5 min, df_05min, or 15 min, df_15min, increments.

df_05min = pd.DataFrame({'dt':['2008-10-2404:12:30',
                                '2008-10-2404:12:35',
                                '2008-10-2404:12:40',
                                '2008-10-2404:12:45',
                                '2008-10-2404:12:50',
                                '2008-10-2404:13:00',
                                '2008-10-2404:13:05']})

df_15min = pd.DataFrame([['2008-10-2404:12:15',  'L'],
                        ['2008-10-2404:12:30',  'r'],
                        ['2008-10-2404:12:45',  'S'  ],
                        ['2008-10-2404:13:00',  'L'],
                        ['2008-10-2404:13:15',  'L' ]], columns=['dt','col'])

The goal is to merge the df_15min dataframe into the df_05min dataframe on the datetime column, dt, copying some accompanying data into the appropriate rows. This is instead of an outer merge where non-matching values get NaN. For example, in df_15min '2008-10-2404:12:30' has a value of np.nan that I would like to copy to the 5 minute values belonging to that 15 min interval in df_05min . This means 12:30, 12:35, and 12:40 will all have values of np.nan.

The desired end product looks like this:

df_desired = pd.DataFrame(['2008-10-2404:12:15', 'L',
                        '2008-10-2404:12:30', 'r',
                        '2008-10-2404:12:35', 'r',
                        '2008-10-2404:12:40', 'r',
                        '2008-10-2404:12:45', 'S',
                        '2008-10-2404:12:50', 'S',
                        '2008-10-2404:13:00', 'L',
                        '2008-10-2404:13:15', 'L'])

2 Answers

1
jezrael On Best Solutions

Here need merge_asof with outer join, what is not implemented, so possible solution is DataFrame.merge, sort by DataFrame.sort_values, forward filling missing values and last create default index by DataFrame.reset_index:

df_05min = pd.DataFrame({'dt':['2008-10-24 04:12:30',
                                '2008-10-24 04:12:35',
                                '2008-10-24 04:12:40',
                                '2008-10-24 04:12:45',
                                '2008-10-24 04:12:50',
                                '2008-10-24 04:13:00',
                                '2008-10-24 04:13:05']})

df_15min = pd.DataFrame([['2008-10-24 04:12:15',  'L'],
                        ['2008-10-24 04:12:30',  'r'],
                        ['2008-10-24 04:12:45',  'S'  ],
                        ['2008-10-24 04:13:00',  'L'],
                        ['2008-10-24 04:13:15',  'L' ]], columns=['dt','col'])

df_05min['dt'] = pd.to_datetime(df_05min['dt'])
df_15min['dt'] = pd.to_datetime(df_15min['dt'])

df=pd.merge(df_05min, df_15min, how='outer').sort_values('dt').ffill().reset_index(drop=True)
print (df)
                   dt col
0 2008-10-24 04:12:15   L
1 2008-10-24 04:12:30   r
2 2008-10-24 04:12:35   r
3 2008-10-24 04:12:40   r
4 2008-10-24 04:12:45   S
5 2008-10-24 04:12:50   S
6 2008-10-24 04:13:00   L
7 2008-10-24 04:13:05   L
8 2008-10-24 04:13:15   L
1
U9-Forward On

Try using merge with how='outer', fillna, and sort_values:

print(df_05min.merge(df_15min,how='outer').ffill().sort_values('dt'))

Output:

                   dt col
7  2008-10-2404:12:15   L
0  2008-10-2404:12:30   r
1  2008-10-2404:12:35   r
2  2008-10-2404:12:40   r
3  2008-10-2404:12:45   S
4  2008-10-2404:12:50   S
5  2008-10-2404:13:00   L
6  2008-10-2404:13:05   L
8  2008-10-2404:13:15   L

If you care about index, use:

print(df_05min.merge(df_15min,how='outer').ffill().sort_values('dt').reset_index(drop=True))