Count minutes per day over index

I have a DataFrame indexed by irregular, second-resolution timestamps that span multiple days. I would like to bucket the entries by minute and add a column holding an increasing counter. All rows that fall within the same minute should get the same counter value, the counter should increase by one for each new minute that has data on that day, and it should start from 1 again on each new day. The expected result:

                     Value  Counter
2020-01-01 10:00:00     7.        1
2020-01-01 10:00:05    45.        1
2020-01-01 10:00:10    25.        1
2020-01-01 10:02:00    85.        2
2020-01-02 07:00:00    51.        1
2020-01-02 10:00:00    52.        2

I thought about something like this:

df['Counter'] = df.groupby([df.index.dt.day, df.index.dt.minute]).count()

But this does not seem to work.

There are 2 answers

Answer by Andrej Kesely (best answer)

You can try a nested .groupby: group by date first, then number the (hour, minute) groups within each day with "ngroup":

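import pandas as pd

# Make sure the index is a DatetimeIndex
df.index = pd.to_datetime(df.index)

# Within each day, number the (hour, minute) groups; +1 makes the counter start at 1
df["Counter"] = df.groupby(df.index.date, group_keys=False).apply(
    lambda x: x.groupby([x.index.hour, x.index.minute]).transform("ngroup") + 1
)
print(df)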

Prints:

                     Value  Counter
2020-01-01 10:00:00      7        1
2020-01-01 10:00:05     45        1
2020-01-01 10:00:10     25        1
2020-01-01 10:02:00     85        2
2020-01-02 07:00:00     51        1
2020-01-02 10:00:00     52        2
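
For a self-contained check, here is a minimal sketch that rebuilds the sample data from the question and computes the same counter without a nested groupby. It is an alternative, not the code above: it assumes that dense-ranking the minute of day within each date is acceptable, and the name minute_of_day is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {"Value": [7, 45, 25, 85, 51, 52]},
    index=pd.to_datetime(
        [
            "2020-01-01 10:00:00",
            "2020-01-01 10:00:05",
            "2020-01-01 10:00:10",
            "2020-01-01 10:02:00",
            "2020-01-02 07:00:00",
            "2020-01-02 10:00:00",
        ]
    ),
)

# Minute of the day (0..1439) for every row; illustrative helper name
minute_of_day = pd.Series(df.index.hour * 60 + df.index.minute, index=df.index)

# Dense rank within each calendar day: rows in the same minute share a rank,
# and the rank restarts at 1 on every new day
df["Counter"] = (
    minute_of_day.groupby(df.index.normalize()).rank(method="dense").astype(int)
)
print(df)
```

Like the output above, this numbers only minutes that actually contain rows, so 10:02 gets 2 rather than 3.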

Answer by Dejene T.

I hope this will help you.

# Reset the index to make the timestamp a column
df.reset_index(inplace=True)

# Group the entries into minute intervals and assign a counter
df['Counter'] = df.groupby(pd.Grouper(key='index', freq='Min')).cumcount() + 1

The output I got:

                index  Value  Counter
0 2020-01-01 10:00:00      7        1
1 2020-01-01 10:00:05     45        2
2 2020-01-01 10:00:10     25        3
3 2020-01-01 10:02:00     85        1
4 2020-01-02 07:00:00     51        1
5 2020-01-02 10:00:00     52        1

I am not sure what exactly you want to achieve.
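
The difference between this output and the one the question asks for seems to come down to cumcount versus ngroup. The sketch below contrasts the two on a three-row sample made up for illustration; the variable names are hypothetical.

```python
import pandas as pd

# Small illustrative sample: two rows in one minute, one row in a later minute
idx = pd.to_datetime(
    [
        "2020-01-01 10:00:00",
        "2020-01-01 10:00:05",
        "2020-01-01 10:02:00",
    ]
)
df = pd.DataFrame({"Value": [7, 45, 85]}, index=idx)

# Group rows by their timestamp floored to the minute
grouped = df.groupby(df.index.floor("min"))

# cumcount: position of each row inside its own minute bucket
row_in_minute = grouped.cumcount() + 1

# ngroup: number of the minute bucket itself, shared by all rows in it
minute_number = grouped.ngroup() + 1

print(pd.DataFrame({"row_in_minute": row_in_minute, "minute_number": minute_number}))
```

Note that ngroup on its own keeps counting across days, so getting a counter that restarts daily still needs a per-day grouping step.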