How to find rows that fall within a time range in a dataframe?


I want to find all rows from my data frame that fall between 7am and 11am inclusive

Using this code, I read a CSV into a data frame with the relevant data:

import pandas as pd

df = pd.read_csv('info.csv')  # the file name must be passed as a string
amount_df = pd.DataFrame(df['amount'])
datetime_df = pd.DataFrame((pd.to_datetime(df['datetime'])).dt.time)
concat_df = pd.concat([datetime_df, amount_df], axis=1)

the data frame looks like this:

datetime amount
00:51:00 15.84
00:35:00 11.64
00:13:00 10.20
00:33:00 8.00
00:53:00 22.95

When I run the following code, it gives me the correct times, but it won't include the rows where time = 11:00:00:

mask = (df['datetime'].dt.hour <= 6) & (df['datetime'].dt.hour >= 11)
concat_df = concat_df[~mask]

I have tried to use .loc, but it won't return any rows at exactly 11:00:00.


There are 2 answers

2
P M On BEST ANSWER

NEW

In the meantime, I think I found a better solution: between_time, which works on a DatetimeIndex and includes both endpoints by default:

df.between_time('7:00', '11:00')

Full code:

import pandas as pd

# One row per second from 06:00:00 to 12:00:04 ("S" is deprecated in recent pandas; "s" is the current alias)
idx = pd.date_range("2024-01-01 06:00", periods=3600*6+5, freq="s")
df = pd.DataFrame(range(len(idx)), index=idx)

print(df.between_time('7:00', '11:00'))

returns:

2024-01-01 07:00:00   3600
2024-01-01 07:00:01   3601
2024-01-01 07:00:02   3602
2024-01-01 07:00:03   3603
2024-01-01 07:00:04   3604
...                    ...
2024-01-01 10:59:56  17996
2024-01-01 10:59:57  17997
2024-01-01 10:59:58  17998
2024-01-01 10:59:59  17999
2024-01-01 11:00:00  18000
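
To apply this to data shaped like the question's, the datetime column first has to become the index. Below is a minimal sketch under assumptions: the sample rows are made up for illustration (they are not the contents of info.csv), and the column names datetime and amount are taken from the question.

```python
import pandas as pd

# Hypothetical rows standing in for info.csv (values invented for illustration)
df = pd.DataFrame({
    "datetime": [
        "2024-01-01 06:59:00",
        "2024-01-01 07:00:00",
        "2024-01-01 09:30:00",
        "2024-01-01 11:00:00",
        "2024-01-01 11:00:01",
    ],
    "amount": [15.84, 11.64, 10.20, 8.00, 22.95],
})

# between_time needs a DatetimeIndex, so parse the column and make it the index
df["datetime"] = pd.to_datetime(df["datetime"])
morning = df.set_index("datetime").between_time("07:00", "11:00")

print(morning)
```

Both endpoints are included by default, so the 07:00:00 and 11:00:00 rows are kept while 06:59:00 and 11:00:01 are dropped.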

OLD

You could create a mask that additionally matches 11:00:00 explicitly. So to find all times between 7am and 11am inclusive, you could use:

mask = ((df.index.hour >= 7) & (df.index.hour <= 10)) | ((df.index.hour == 11) & (df.index.minute == 0) & (df.index.second == 0))
df1 = df[mask]

which gives me:

                         time
2024-01-01 07:00:00  07:00:00
2024-01-01 07:00:01  07:00:01
2024-01-01 07:00:02  07:00:02
                         time
2024-01-01 10:59:58  10:59:58
2024-01-01 10:59:59  10:59:59
2024-01-01 11:00:00  11:00:00
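
An alternative way to write the same mask, shown here only as a sketch, is to convert each timestamp to seconds since midnight and compare against the two bounds. The column name value and the variable secs are illustrative choices, not part of the original code.

```python
import pandas as pd

# Same kind of per-second index as in the example above
idx = pd.date_range("2024-01-01 06:00", periods=3600 * 6 + 5, freq="s")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Seconds elapsed since midnight for every timestamp in the index
secs = df.index.hour * 3600 + df.index.minute * 60 + df.index.second

# Keep 07:00:00 through 11:00:00, both ends included
mask = (secs >= 7 * 3600) & (secs <= 11 * 3600)
df1 = df[mask]

print(df1.index.min(), df1.index.max(), len(df1))
```

This avoids the special case for 11:00:00, because the upper bound is a single comparison at second precision.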
0
mozway On

Your logic and exact data are unclear.

One sure thing, you might need to double check De Morgan's law.

Your current code:

mask = (df['datetime'].dt.hour <= 6) & (df['datetime'].dt.hour >= 11)

Once inverted with ~mask, it selects the same rows as:

mask = ~(df['datetime'].dt.hour <= 6) | ~(df['datetime'].dt.hour >= 11)

(since not (A and B) = (not A) or (not B))

which is also equivalent to:

mask = (df['datetime'].dt.hour > 6) | (df['datetime'].dt.hour < 11)

since NOT >= is < and NOT <= is >.
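
The equivalence can be checked numerically. This is a small sketch that uses a Series of all 24 possible hour values instead of the real data; the names hours, a, b, lhs, rhs and rhs_simplified are illustrative.

```python
import pandas as pd

# Every possible hour of the day, standing in for df['datetime'].dt.hour
hours = pd.Series(range(24))

a = hours <= 6
b = hours >= 11

lhs = ~(a & b)                               # inverted original mask
rhs = (~a) | (~b)                            # De Morgan's law
rhs_simplified = (hours > 6) | (hours < 11)  # NOT <= is >, NOT >= is <

print(lhs.equals(rhs), lhs.equals(rhs_simplified))
```

All three expressions produce the same boolean Series for every hour.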

Thus, if you want to include 11 in the final selection, you must exclude it from your original (inverted) mask:

mask = (df['datetime'].dt.hour <= 6) & (df['datetime'].dt.hour > 11)
concat_df = concat_df[~mask]

The logic is the same if you want to include 6.

Now, back to your original issue:

"I want to find all rows from my data frame that fall between 7am and 11am inclusive"

Then (df['datetime'].dt.hour <= 6) & (df['datetime'].dt.hour >= 11) wouldn't work anyway: an hour can be <= 6 or >= 11 but never both, so the condition is always False (irrespective of > vs >=), and ~mask keeps every row.

What you probably want is:

mask = (df['datetime'].dt.hour >= 7) & (df['datetime'].dt.hour <= 11)
concat_df = concat_df[mask]

Or:

mask = df['datetime'].dt.hour.between(7, 11, inclusive='both')
concat_df = concat_df[mask]
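
Note that comparing only the hour keeps everything up to 11:59:59. If "11am inclusive" should mean exactly 11:00:00 as the upper bound, one option is to compare the time of day instead. A minimal sketch, assuming a datetime64 Series and pandas 1.3 or later (for inclusive='both'); the sample timestamps are made up:

```python
import pandas as pd

# Hypothetical timestamps around the two boundaries
s = pd.Series(pd.to_datetime([
    "2024-01-01 06:59:59",
    "2024-01-01 07:00:00",
    "2024-01-01 10:30:00",
    "2024-01-01 11:00:00",
    "2024-01-01 11:30:00",
]))

# Hour-based filter: 11:30:00 passes because its hour is 11
by_hour = s.dt.hour.between(7, 11, inclusive="both")

# Time-of-day filter: elapsed time since midnight, compared as Timedelta
time_of_day = s - s.dt.normalize()
by_time = time_of_day.between(
    pd.Timedelta(hours=7), pd.Timedelta(hours=11), inclusive="both"
)

print(pd.DataFrame({"ts": s, "by_hour": by_hour, "by_time": by_time}))
```

The two filters differ only on the 11:30:00 row, which the hour-based version keeps and the time-of-day version drops.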