Create an order column in a pandas dataframe based on one date column relative to another date column

Question

Create an order column in a pandas dataframe based on one date column relative to another date column

72 views Asked by Eoin Vaughan At 03 March 2024 at 11:30

I have an extract where I'm required to identify a certain type of surgery X seen in the Surg Type column.

I need to retain medical appointments seen as distinct rows over a window/period of time, 3 appointment's prior (-3, -2, -1) and 3 appointment's post (+1, +2, +3)

I must include this order as an additional column.

On top of this I need to exclude any appointments outside of the window and any other type of Surg Type, in this example any surgery's denoted Z.

In this example 7/9 rows/record's I want retained and an additional column Prior Post

*** UPDATED EXAMPLE ***

Original Df

| Patient ID | Surg ID | Surg Type | Surg Date  | Medical Appt Date |
|------------|---------|-----------|------------|-------------------|
| 1          | 1       | X         | 2022-09-03 | 2022-01-01        |
| 1          | 1       | X         | 2022-09-03 | 2022-03-04        |
| 1          | 1       | X         | 2022-09-03 | 2022-05-04        |
| 1          | 1       | X         | 2022-09-03 | 2022-06-04        |
| 1          | 1       | X         | 2022-09-03 | 2022-11-04        |
| 1          | 1       | X         | 2022-09-03 | 2022-11-29        |
| 1          | 2       | Z         | 2022-12-01 | 2022-11-01        |
| 1          | 1       | X         | 2022-09-03 | 2023-01-02        |
| 1          | 1       | X         | 2022-09-03 | 2023-01-13        |



Desired Df

| Patient ID | Surg ID | Surg Type | Surg Date  | Medical Appt Date | Inclusion |
|------------|---------|-----------|------------|-------------------|-----------|
| 1          | 1       | X         | 2022-09-03 | 2022-01-01        | Exclude   |
| 1          | 1       | X         | 2022-09-03 | 2022-03-04        | -3        |
| 1          | 1       | X         | 2022-09-03 | 2022-05-04        | -2        |
| 1          | 1       | X         | 2022-09-03 | 2022-06-04        | -1        |
| 1          | 1       | X         | 2022-09-03 | 2022-11-04        | 1         |
| 1          | 1       | X         | 2022-09-03 | 2022-11-29        | 2         |
| 1          | 2       | Z         | 2022-12-01 | 2022-11-01        | -1        |
| 1          | 1       | X         | 2022-09-03 | 2023-01-02        | 3         |
| 1          | 1       | X         | 2022-09-03 | 2023-01-13        | Exclude   |

Original Q&A

There are 2 answers

Panda Kim On 03 March 2024 at 11:54

Code

at first make datetime

df['Surg Date'] = pd.to_datetime(df['Surg Date'], errors='coerce')
df['Medical Appt Date'] = pd.to_datetime(df['Medical Appt Date'], errors='coerce')

make condition and use np.select

import numpy as np
cond1 = df['Surg Type'].eq('X')
cond2 = df['Surg Date'].gt(df['Medical Appt Date'])
cond3 = df['Surg Date'].lt(df['Medical Appt Date'])
s1 = (cond1 & cond2)[::-1].cumsum().mul(-1)
s2 = (cond1 & cond3).cumsum()

condlist = [~cond1 | s1.lt(-3) | s2.gt(3), cond2, cond3]

df['Inclusion '] = np.select(condlist, ['Exclude Row', s1[::-1], s2], np.NaN)

df:

Example Code

import pandas as pd
data1 = {'Patient ID': [1, 1, 1, 1, 1, 1, 1, 1, 1], 'Surg ID': [1, 1, 1, 1, 1, 1, 2, 1, 1], 'Surg Type': ['X', 'X', 'X', 'X', 'X', 'X', 'Z', 'X', 'X'], 'Surg Date': ['2022-09-03', '2022-09-03', '2022-09-03', '2022-09-03', '2022-09-03', '2022-09-03', '2022-12-01', '2022-09-03', '2022-09-03'], 'Medical Appt': ['Y', 'Y', 'Y', 'N', 'Y', 'Y', 'N', 'Y', 'Y'], 'Medical Appt Date': ['2022-01-01', '2022-03-04', '2022-05-04', 'NaT', '2022-11-04', '2022-11-29', 'NaT', '2023-01-02', '2023-01-13']}
df = pd.DataFrame(data1)

**mozway** · Accepted Answer · 2024-03-03T11:51:09+00:00

You can filter the surgeries X, then compute the time difference between the appointment and the surgery and compute the rank for negative and positive numbers per group with groupby.apply. Finally, mask the ranks outside of threshold.

# number of medical appointments to keep before/after a surgery
N = 3

# columns to use a grouper
group_cols = ['Patient ID', 'Surg ID']

# ensure datetime
df[['Surg Date', 'Medical Appt Date']] = df[['Surg Date', 'Medical Appt Date']].apply(pd.to_datetime)

# custom function to compute the rank of the Appt Date
def flag_nearest(g, N):
    diff = g['Medical Appt Date'].sub(g['Surg Date'])
    m = diff>'0'
    ranks = (diff.where(m).rank()
             .fillna(-diff.mask(m).rank(ascending=False))
             # optional, to have integers in the output
             .astype(int).astype(object)
            )
    return ranks.where(ranks.abs()<=N, 'Excluded').to_frame()

# identify non-X rows
m = df['Surg Type'].eq('X')

# compute the rank and flag columns outside of threshold
df['Inclusion'] = (df[m].groupby(group_cols)
                        .apply(flag_nearest, N)
                        .droplevel(group_cols)
                        # optional, to fill the non-X
                        .reindex(df.index, fill_value='non-X')
                  )

Output:

   Patient ID  Surg ID Surg Type  Surg Date Medical Appt Date Inclusion
0           1        1         X 2022-09-03        2022-01-01  Excluded
1           1        1         X 2022-09-03        2022-03-04        -3
2           1        1         X 2022-09-03        2022-05-04        -2
3           1        1         X 2022-09-03        2022-06-04        -1
4           1        1         X 2022-09-03        2022-11-04         1
5           1        1         X 2022-09-03        2022-11-29         2
6           1        2         Z 2022-12-01        2022-11-01     non-X
7           1        1         X 2022-09-03        2023-01-02         3
8           1        1         X 2022-09-03        2023-01-13  Excluded

Extended example:

    Patient ID  Surg ID Surg Type  Surg Date Medical Appt Date Inclusion
0            1        1         X 2022-09-03        2022-01-01  Excluded
1            1        1         X 2022-09-03        2022-03-04        -3
2            1        1         X 2022-09-03        2022-05-04        -2
3            1        1         X 2022-09-03        2022-06-04        -1
4            1        1         X 2022-09-03        2022-11-04         1
5            1        1         X 2022-09-03        2022-11-29         2
6            1        2         Z 2022-12-01        2022-11-01     non-X
7            1        1         X 2022-09-03        2023-01-02         3
8            1        1         X 2022-09-03        2023-01-13  Excluded
9            2        1         X 2022-09-05        2022-01-01  Excluded
10           2        1         X 2022-09-05        2022-01-02  Excluded
11           2        1         X 2022-09-05        2022-03-04        -3
12           2        1         X 2022-09-05        2022-05-04        -2
13           2        1         X 2022-09-05        2022-06-04        -1
14           2        1         X 2022-09-05        2022-11-04         1
15           2        1         X 2022-09-05        2022-11-29         2
16           2        2         Z 2022-12-02        2022-11-01     non-X
17           2        1         X 2022-09-05        2023-01-02         3
18           2        1         X 2022-09-05        2023-01-13  Excluded

TechQA.

Create an order column in a pandas dataframe based on one date column relative to another date column

There are 2 answers

Related Questions in PANDAS

Related Questions in RANK

Popular Questions

Trending Questions