Splitting a dataframe spanning several days into half-hourly dataframes with pandas and saving them as CSV files


I need to split quite a few large files (several million records each) into half-hourly files using pandas, for use with some third-party software. Here's what I tried:

import datetime as dt
import numpy as np
import pandas as pd

# 1,728,000 rows at 0.1-second intervals = 2 days of data
df = pd.DataFrame(np.random.rand(1728000, 2),
                  index=pd.date_range('1/1/2014', periods=1728000, freq='0.1S'))

# group by the timestamp truncated to the hour
df_groups = df.groupby(df.index.map(lambda t: dt.datetime(t.year, t.month,
                                                          t.day, t.hour)))
for name, group in df_groups:
    # ':' is not allowed in filenames on some platforms, so replace it
    group.to_csv(str(name).replace(':', '_') + '.csv')

But this way I can only get pandas to split by hour. How can I split into half-hourly files instead?

A couple of things to keep in mind: a) the large files can span several days, so if I group on lambda t: t.hour alone, data from different days but the same hour ends up grouped together; b) the files have gaps, so some half-hours may be incomplete and some may be missing entirely.


1 Answer

Jeff (best answer)

Make your grouper like this:

df.groupby(pd.TimeGrouper('30T'))

In pandas 0.14 this will be slightly different, e.g. df.groupby(pd.Grouper(freq='30T')).
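
Putting this together with the write-out loop from the question, a minimal sketch (using the 0.14-style pd.Grouper) might look like the following. The filename pattern is just an illustration, and the empty-group check is a precaution for the gaps mentioned in the question:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1728000, 2),
                  index=pd.date_range('1/1/2014', periods=1728000, freq='0.1S'))

# bin the DatetimeIndex into 30-minute groups
for name, group in df.groupby(pd.Grouper(freq='30T')):
    if group.empty:
        # skip any empty bins, in case gaps in the data produce them
        continue
    # e.g. 2014-01-01 00:30:00 -> '2014-01-01 00_30_00.csv'
    group.to_csv(str(name).replace(':', '_') + '.csv')

Because the grouper bins on the full timestamp, half-hours from different days end up in different files, which addresses point a) in the question.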