I would like to count the values of a column per time period in a pandas DataFrame.

My table:

 id1       date_time               adress       a_size       
 reom      2005-8-20 22:51:10      75157.5413   ceifwekd
 reom      2005-8-20 22:55:25      3571.37946   ceifwekd
 reom      2005-8-20 11:21:01      3571.37946   tnohcve
 reom      2005-8-20 11:29:09      97439.219    tnohcve
 penr      2005-8-20 17:07:16      97439.219    ceifwekd
 penr      2005-8-20 19:10:37      7391.6258    ceifwekd
 ....

I need:

id1      time_period                     num_of_address
reom     2005-8-20 22:50:00 - 23:00:00      2
reom     2005-8-20 11:20:00 - 11:30:00      2
penr     2005-8-20 17:00:00 - 17:10:00      1

My code so far (I first created a new column, hours, extracted from date_time):

 df['num_per_10_minutes'] = df['id1'].map(df.groupby('id1', 'hours').apply(lambda x: x['date_time'].count()))

But this is not what I want. I need to count the number of "adress" entries per 10-minute interval.

Thanks

2 Answers

Chris

Create an interval column first, then use pandas.DataFrame.groupby:

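import pandas as pd

df['date_time'] = pd.to_datetime(df['date_time'])
df = df.set_index('date_time', drop=True).sort_index()

# Label each row with the 10-minute interval that contains it.
# Current pandas no longer supports Timestamp + integer, so add an explicit Timedelta.
# The interval is half-open (start <= d < end) so every row gets exactly one label.
step = pd.Timedelta(minutes=10)
df['intervals'] = ["%s - %s" % (i, i + step)
                   for i in pd.date_range('2005-08-20', '2005-08-21', freq='10min')
                   for d in df.index if i <= d < i + step]
df.groupby(['id1', 'intervals'])['adress'].count().reset_index()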

Output:

    id1                                  intervals  adress
0  penr  2005-08-20 17:00:00 - 2005-08-20 17:10:00       1
1  penr  2005-08-20 19:10:00 - 2005-08-20 19:20:00       1
2  reom  2005-08-20 11:20:00 - 2005-08-20 11:30:00       2
3  reom  2005-08-20 22:50:00 - 2005-08-20 23:00:00       2
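
The nested comprehension above compares every timestamp with every 10-minute bin and relies on a hard-coded date range. A minimal sketch of the same labelling without the loop, assuming the sample rows from the question (the names step, start, fmt and counts are illustrative and not part of the answer):

```python
import pandas as pd

# Sample rows from the question (month written zero-padded here).
df = pd.DataFrame({
    'id1': ['reom', 'reom', 'reom', 'reom', 'penr', 'penr'],
    'date_time': ['2005-08-20 22:51:10', '2005-08-20 22:55:25',
                  '2005-08-20 11:21:01', '2005-08-20 11:29:09',
                  '2005-08-20 17:07:16', '2005-08-20 19:10:37'],
    'adress': [75157.5413, 3571.37946, 3571.37946,
               97439.219, 97439.219, 7391.6258],
})
df['date_time'] = pd.to_datetime(df['date_time'])
df = df.set_index('date_time').sort_index()

step = pd.Timedelta(minutes=10)
start = df.index.floor('10min')  # start of the bin each row falls into
fmt = '%Y-%m-%d %H:%M:%S'

# Build the "start - end" label for every row in one vectorized step.
df['intervals'] = start.strftime(fmt) + ' - ' + (start + step).strftime(fmt)

counts = df.groupby(['id1', 'intervals'])['adress'].count().reset_index()
print(counts)
```

Because the bin start is derived from each timestamp, this version does not need to know the date range in advance.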

jezrael

First aggregate the counts with GroupBy.size, grouping by id1 and by the timestamps floored to 10 minutes with Series.dt.floor:

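import pandas as pd

df['date_time'] = pd.to_datetime(df['date_time'])

# Floor each timestamp to the start of its 10-minute bin, then count rows per (id1, bin).
df = df.groupby(['id1', df['date_time'].dt.floor('10min')]).size().reset_index(name='adress')
print(df)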
    id1           date_time  adress
0  penr 2005-08-20 17:00:00       1
1  penr 2005-08-20 19:10:00       1
2  reom 2005-08-20 11:20:00       2
3  reom 2005-08-20 22:50:00       2
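
Note that size counts rows per group. The question could also be read as asking for the number of distinct addresses per window; under that assumption, nunique on the adress column is one option. A small sketch comparing the two on the sample rows from the question (the names bins, rows and distinct are illustrative):

```python
import pandas as pd

# Sample rows from the question (month written zero-padded here).
df = pd.DataFrame({
    'id1': ['reom', 'reom', 'reom', 'reom', 'penr', 'penr'],
    'date_time': ['2005-08-20 22:51:10', '2005-08-20 22:55:25',
                  '2005-08-20 11:21:01', '2005-08-20 11:29:09',
                  '2005-08-20 17:07:16', '2005-08-20 19:10:37'],
    'adress': [75157.5413, 3571.37946, 3571.37946,
               97439.219, 97439.219, 7391.6258],
})
df['date_time'] = pd.to_datetime(df['date_time'])

# Start of the 10-minute bin each row falls into.
bins = df['date_time'].dt.floor('10min')

# Rows per (id1, bin), as in the answer.
rows = df.groupby(['id1', bins]).size().reset_index(name='rows')

# Distinct 'adress' values per (id1, bin).
distinct = (df.groupby(['id1', bins])['adress']
              .nunique()
              .reset_index(name='distinct_adress'))

print(rows.merge(distinct, on=['id1', 'date_time']))
```

On these rows the two columns agree; they would differ only if the same address appeared more than once within a window.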

Then format the datetimes with Series.dt.strftime, appending the end of each interval (the start plus 10 minutes):

df['date_time'] = (df['date_time'].dt.strftime('%Y-%m-%d %H:%M:%S') + 
                   (df['date_time'] + pd.Timedelta(10, unit='min')).dt.strftime(' - %H:%M:%S'))
print (df)
    id1                       date_time  adress
0  penr  2005-08-20 17:00:00 - 17:10:00       1
1  penr  2005-08-20 19:10:00 - 19:20:00       1
2  reom  2005-08-20 11:20:00 - 11:30:00       2
3  reom  2005-08-20 22:50:00 - 23:00:00       2

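Or, to repeat the full date in the end of the interval, apply this to the aggregated frame from the first step instead of the previous formatting:

df['date_time'] = (df['date_time'].dt.strftime('%Y-%m-%d %H:%M:%S') + 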
                   (df['date_time'] + pd.Timedelta(10, unit='min')).
                     dt.strftime(' - %Y-%m-%d %H:%M:%S'))
print (df)
    id1                                  date_time  adress
0  penr  2005-08-20 17:00:00 - 2005-08-20 17:10:00       1
1  penr  2005-08-20 19:10:00 - 2005-08-20 19:20:00       1
2  reom  2005-08-20 11:20:00 - 2005-08-20 11:30:00       2
3  reom  2005-08-20 22:50:00 - 2005-08-20 23:00:00       2
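
The two steps above can be combined into a small reusable function. This is only a sketch: the function name count_per_interval, its parameters, and the output column names time_period and count are hypothetical choices, and the sample frame repeats the rows from the question.

```python
import pandas as pd

def count_per_interval(df, freq='10min', time_col='date_time', key_col='id1'):
    """Hypothetical helper: count rows per key and fixed-width time window."""
    times = pd.to_datetime(df[time_col])
    start = times.dt.floor(freq)  # start of the window each row falls into
    out = df.groupby([df[key_col], start]).size().reset_index(name='count')
    end = out[time_col] + pd.Timedelta(freq)  # end of each window
    out['time_period'] = (out[time_col].dt.strftime('%Y-%m-%d %H:%M:%S')
                          + end.dt.strftime(' - %H:%M:%S'))
    return out[[key_col, 'time_period', 'count']]

# Sample rows from the question (month written zero-padded here).
sample = pd.DataFrame({
    'id1': ['reom', 'reom', 'reom', 'reom', 'penr', 'penr'],
    'date_time': ['2005-08-20 22:51:10', '2005-08-20 22:55:25',
                  '2005-08-20 11:21:01', '2005-08-20 11:29:09',
                  '2005-08-20 17:07:16', '2005-08-20 19:10:37'],
    'adress': [75157.5413, 3571.37946, 3571.37946,
               97439.219, 97439.219, 7391.6258],
})

result = count_per_interval(sample)
print(result)
```

The short label omits the date from the end of the window, so a window that crosses midnight (for example 23:50:00 - 00:00:00) is better shown with the full-date format.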