I need to split a number of large files (several million records each) into half-hourly files using pandas, for use with some third-party software. Here's what I tried:
import datetime as dt
import numpy as np
import pandas as pd

# 1,728,000 rows at 0.1 s spacing = 48 hours of data
df = pd.DataFrame(np.random.rand(1728000, 2),
                  index=pd.date_range('1/1/2014', periods=1728000, freq='0.1S'))

# Group by (year, month, day, hour) so the same hour on different days
# stays in separate groups
df_groups = df.groupby(df.index.map(lambda t: dt.datetime(t.year, t.month,
                                                          t.day, t.hour)))

for name, group in df_groups:
    group.to_csv(str(name).replace(':', '_') + '.csv')
But this way I can only get pandas to split by hour. How can I split the data into half-hourly files instead?
A couple of things to keep in mind: a) the large files can span several days, so if I group on lambda t: t.hour alone, data from different days but with the same hour gets lumped together (see the snippet below); b) the large files have gaps, so some half-hours may be incomplete and some may be missing entirely.
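For example, a quick illustration of (a) with made-up values:

import pandas as pd

idx = pd.to_datetime(['2014-01-01 01:05:00', '2014-01-02 01:05:00'])
s = pd.Series([1.0, 2.0], index=idx)
# Both timestamps map to hour 1, so they land in the same group even
# though they are a day apart:
print(s.groupby(s.index.hour).groups)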
Make your grouper like this:

df.groupby(pd.TimeGrouper(freq='30T'))

In 0.14 this will be slightly different, e.g.

df.groupby(pd.Grouper(freq='30T'))
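Putting it together, here is a minimal sketch of the full half-hourly split; the empty-group guard and the filename cleanup are my additions, not part of the original answer:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1728000, 2),
                  index=pd.date_range('1/1/2014', periods=1728000, freq='0.1S'))

# pd.Grouper bins on the actual timestamps, so 01:00-01:30 on Jan 1 and
# 01:00-01:30 on Jan 2 end up in different groups, and each bin is a
# proper half-hour regardless of gaps in the data.
for name, group in df.groupby(pd.Grouper(freq='30T')):
    if group.empty:
        # gaps in the source data show up as empty bins; skip them
        continue
    # ':' is not allowed in filenames on some systems, so swap it for '_'
    group.to_csv(str(name).replace(':', '_') + '.csv')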