Time series filtering with .loc on a DataFrame with a two-level index


I have a time series pandas DataFrame (df) with many columns and a two-level index: "date" and "ticker". I would like to use df.loc to select a specific range of dates, say "2000-01-03" to "2000-01-06", and a specific ticker, say "A". That is, I want the values of all the other columns for the rows that match both criteria.

ex. of Data Frame

I tried the following

df.loc[("date", "2000-01-03": "2000-01-06"),"A"]

Alternatively, to select all the tickers, I tried the following:

df.loc[("date", "2000-01-03": "2000-01-06"),:]

Neither works. Any insight on how to use .loc on a DataFrame with two index levels?


There are 2 answers

dimButTries On

It would be great to see a sample of the DataFrame next time you submit a question; it eliminates the guesswork.

Based on the limited information you have provided, here are three potential ways to solve your use case.

Approach 1 - Pandas has a great function called date_range (see the pandas docs). From the docs:

Returns the range of equally spaced time points (where the difference between any two adjacent points is specified by the given frequency) such that they all satisfy start <[=] x <[=] end


import pandas as pd

# Convert the 'date' column to datetime if it's not already in datetime format
df['date'] = pd.to_datetime(df['date'])

# Set 'date' and 'ticker' columns as the index
df.set_index(['date', 'ticker'], inplace=True)

# Select the desired range of dates and the specific ticker
date_range = pd.date_range('2000-01-03', '2000-01-06')
ticker = 'A'

# Use df.loc to filter based on the date range and ticker
selected_data = df.loc[(date_range, ticker), :]
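To try Approach 1 without the original data, here is a minimal sketch on a made-up frame. The close column, the tickers "A" and "B", and all values are assumptions for illustration only, and the index is built directly as a MultiIndex instead of via set_index.

```python
import pandas as pd

# Hypothetical sample data: two tickers over five consecutive weekdays.
dates = pd.date_range("2000-01-03", "2000-01-07")
index = pd.MultiIndex.from_product([dates, ["A", "B"]], names=["date", "ticker"])
df = pd.DataFrame({"close": range(len(index))}, index=index)

# date_range includes both endpoints, so this yields four daily timestamps.
wanted_dates = pd.date_range("2000-01-03", "2000-01-06")

# A tuple of (list-like of dates, single ticker) selects on both index levels.
selected = df.loc[(wanted_dates, "A"), :]
print(selected)
```

Every date in wanted_dates exists in this sample index, so the lookup returns four rows, one per day for ticker A.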

Approach 2 - Slice your DataFrame using a boolean mask

You can use boolean conditions to filter the DataFrame based on the desired date range and ticker. We extract the 'date' and 'ticker' levels from the MultiIndex using df.index.get_level_values, and then apply the conditions.

# Set 'date' and 'ticker' columns as the index
df.set_index(['date', 'ticker'], inplace=True)

# Select the desired range of dates and the specific ticker using boolean condition
date_start = '2000-01-03'
date_end = '2000-01-06'
ticker = 'A'

selected_data = df.loc[(df.index.get_level_values('date') >= date_start) &
                       (df.index.get_level_values('date') <= date_end) &
                       (df.index.get_level_values('ticker') == ticker), :]
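A boolean mask does not need every calendar day to be present in the index, which helps when the data skips weekends. The sketch below assumes a hypothetical frame of business days with a made-up close column; none of these names come from the question.

```python
import pandas as pd

# Hypothetical sample data: business days only, so the weekend of
# 2000-01-08 and 2000-01-09 is absent from the index.
dates = pd.bdate_range("2000-01-03", "2000-01-12")
index = pd.MultiIndex.from_product([dates, ["A", "B"]], names=["date", "ticker"])
df = pd.DataFrame({"close": range(len(index))}, index=index)

# Pull each index level out once, then combine the conditions into one mask.
level_dates = df.index.get_level_values("date")
level_tickers = df.index.get_level_values("ticker")
mask = (
    (level_dates >= "2000-01-06")
    & (level_dates <= "2000-01-11")
    & (level_tickers == "A")
)

selected = df.loc[mask]
print(selected)
```

Extracting the levels into variables first is only a readability choice; the filtering logic is the same as in the answer's version.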

Approach 3 - Slicing without creating a MultiIndex (this assumes 'date' and 'ticker' are ordinary columns)

# Convert the 'date' column to datetime if it's not already in datetime format
df['date'] = pd.to_datetime(df['date'])

# Filter the DataFrame based on date range and ticker
date_start = '2000-01-03'
date_end = '2000-01-06'
ticker = 'A'

selected_data = df.loc[(df['date'] >= date_start) & (df['date'] <= date_end) & (df['ticker'] == ticker)]
Timeless On

You can use pd.IndexSlice this way:

select a specific range of dates let say ("2000-01-03": "2000-01-06") and a specific "ticker" let say ("A").

df.loc[pd.IndexSlice["2023-01-03":"2023-01-06", "AAPL"], :]

                   Adj Close  Close   High    Low   Open     Volume
Date       Ticker                                                  
2023-01-03 AAPL       124.71 125.07 130.90 124.17 130.28  112117500
2023-01-04 AAPL       125.99 126.36 128.66 125.08 126.89   89113600
2023-01-05 AAPL       124.66 125.02 127.77 124.76 127.13   80962700
2023-01-06 AAPL       129.24 129.62 130.29 124.89 126.01   87754700

Alternatively I wish to select all the tickers,

df.loc[pd.IndexSlice["2023-01-03":"2023-01-06", :], :]

                   Adj Close  Close   High    Low   Open     Volume
Date       Ticker                                                  
2023-01-03 AAPL       124.71 125.07 130.90 124.17 130.28  112117500
           GOOG        89.70  89.70  91.55  89.02  89.83   20738500
2023-01-04 AAPL       125.99 126.36 128.66 125.08 126.89   89113600
           GOOG        88.71  88.71  91.24  87.80  91.01   27046500
2023-01-05 AAPL       124.66 125.02 127.77 124.76 127.13   80962700
           GOOG        86.77  86.77  88.21  86.56  88.07   23136100
2023-01-06 AAPL       129.24 129.62 130.29 124.89 126.01   87754700
           GOOG        88.16  88.16  88.47  85.57  87.36   26612600

Input used:

#pip install yfinance
import yfinance as yf

df = (yf.download("AAPL GOOG", start="2023-01-01", end="2023-01-31")
          .stack().rename_axis(index=["Date", "Ticker"]))
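For readers without yfinance, the same selections can be reproduced on a small synthetic frame. This is only a sketch: the close column, the tickers, and the shuffling step are assumptions made for illustration. It also shows that pd.IndexSlice just builds a tuple of ordinary slice objects, and that label slicing on a MultiIndex generally expects a sorted index (on an unsorted one pandas can raise UnsortedIndexError).

```python
import pandas as pd

# Hypothetical sample data, deliberately shuffled so the index starts unsorted.
dates = pd.date_range("2000-01-03", "2000-01-07")
index = pd.MultiIndex.from_product([dates, ["A", "B"]], names=["date", "ticker"])
df = pd.DataFrame({"close": range(len(index))}, index=index)
df = df.sample(frac=1, random_state=0)

# Sort the index before slicing by label.
df = df.sort_index()

idx = pd.IndexSlice

# IndexSlice is shorthand for writing the slice objects by hand.
key = idx["2000-01-03":"2000-01-06", "A"]
same_key = (slice("2000-01-03", "2000-01-06"), "A")

one_ticker = df.loc[key, :]                                 # date range, ticker "A"
all_tickers = df.loc[idx["2000-01-03":"2000-01-06", :], :]  # date range, every ticker
print(one_ticker)
print(all_tickers)
```

The plain tuple same_key can be passed to df.loc in place of key, which is the working counterpart of the slice syntax attempted in the question.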