Pandas: Getting 10 rows above a selected date

465 views Asked by At

I am trying to use get_loc to get the current date and then return the 10 rows above the current date from the data, but I keep getting a Key Error.

Here is my datable => posting_df5:

   Posting_date  rooms  Origin  Rooms booked         ADR       Revenue
0    2019-03-31      1       1             1  156.000000    156.000000
1    2019-04-01     13      13            13  160.720577   2089.367500
2    2019-04-02     15      15            15  167.409167   2511.137500
3    2019-04-03     21      21            21  166.967405   3506.315500
4    2019-04-04     37      37            37  162.384909   6008.241643
5    2019-04-05     52      52            52  202.150721  10511.837476
6    2019-04-06     49      49            49  199.611887   9780.982476
7    2019-04-07     44      44            44  182.233171   8018.259527
8    2019-04-08     50      50            50  187.228192   9361.409623
9    2019-04-09     37      37            37  177.654422   6573.213623
10   2019-04-10     31      31            31  184.138208   5708.284456

I tried doing the following:

idx = posting_df7.index.get_loc('2019-04-05')
posting_df7 = posting_df5.iloc[idx - 5 : idx + 5]

But I received the following error:

    indexer = self._get_level_indexer(key, level=level)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 2939, in _get_level_indexer
    code = level_index.get_loc(key)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 128, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index_class_helper.pxi", line 91, in pandas._libs.index.Int64Engine._check_type
KeyError: '2019-04-05'

So, I then tried to first index Posting_date before using get_loc but it didn't work as well:

                 rooms  Origin  Rooms booked         ADR       Revenue
   Posting_date                                                       
0  2019-03-31        1       1             1  156.000000    156.000000
1  2019-04-01       13      13            13  160.720577   2089.367500
2  2019-04-02       15      15            15  167.409167   2511.137500
3  2019-04-03       21      21            21  166.967405   3506.315500
4  2019-04-04       37      37            37  162.384909   6008.241643
5  2019-04-05       52      52            52  202.150721  10511.837476
6  2019-04-06       49      49            49  199.611887   9780.982476
7  2019-04-07       44      44            44  182.233171   8018.259527
8  2019-04-08       50      50            50  187.228192   9361.409623
9  2019-04-09       37      37            37  177.654422   6573.213623

Then I used the same get_loc function but the same error appeared. How can I select the row based no the date required.

Thanks

1

There are 1 answers

0
s3dev On BEST ANSWER

Here is a a different approach ...

Because iloc and get_loc can be tricky, this solution uses boolean masking to return the rows relative to a given date, then use the head() function to return the number of rows you require.

import pandas as pd

PATH = '/home/user/Desktop/so/room_rev.csv'

# Read in data from a CSV.
df = pd.read_csv(PATH)
# Convert the date column to a `datetime` format.
df['Posting_date'] = pd.to_datetime(df['Posting_date'], 
                                    format='%Y-%m-%d')
# Sort based on date.
df.sort_values('Posting_date')

Original Dataset:

   Posting_date  rooms  Origin  Rooms booked         ADR       Revenue
0    2019-03-31      1       1             1  156.000000    156.000000
1    2019-04-01     13      13            13  160.720577   2089.367500
2    2019-04-02     15      15            15  167.409167   2511.137500
3    2019-04-03     21      21            21  166.967405   3506.315500
4    2019-04-04     37      37            37  162.384909   6008.241643
5    2019-04-05     52      52            52  202.150721  10511.837476
6    2019-04-06     49      49            49  199.611887   9780.982476
7    2019-04-07     44      44            44  182.233171   8018.259527
8    2019-04-08     50      50            50  187.228192   9361.409623
9    2019-04-09     37      37            37  177.654422   6573.213623
10   2019-04-10     31      31            31  184.138208   5708.284456

Solution:

Replace the value in the head() function with the number of rows you want to return. Note: There is also a tail() function for the inverse.

df[df['Posting_date'] > '2019-04-05'].head(3)

Output:

  Posting_date  rooms  Origin  Rooms booked         ADR      Revenue
6   2019-04-06     49      49            49  199.611887  9780.982476
7   2019-04-07     44      44            44  182.233171  8018.259527
8   2019-04-08     50      50            50  187.228192  9361.409623