Below dataset is my dataset and want expected output (Shown below) using overlaps function:
Input dataset:
values_1 | Id | DeptId | Start_DateTime | End_DateTime |
---|---|---|---|---|
PNM | 4579 | FHDU | 2023-09-04 14:15:29 | 2023-09-04 18:25:22 |
PNM | 1278 | FHDU | 2023-09-04 14:45:28 | 2023-09-04 18:35:19 |
POH | 8579 | ASKG | 2023-09-04 15:35:29 | 2023-09-04 17:25:22 |
POH | 3449 | ASKG | 2023-09-04 15:45:28 | 2023-09-04 18:35:19 |
WEH | 9124 | ASKG | 2023-09-04 17:25:28 | 2023-09-04 19:43:13 |
WEH | 4579 | ASKG | 2023-09-04 16:15:21 | 2023-09-04 18:24:02 |
Expected Output:
values_1 | Id | DeptId | Start_DateTime | End_DateTime |
---|---|---|---|---|
PNM | 4579 | FHDU | 2023-09-04 14:15:29 | 2023-09-04 18:25:22 |
WEH | 4579 | ASKG | 2023-09-04 16:15:21 | 2023-09-04 18:24:02 |
For this code below, we know that it provides overlapping for 2 intervals (Start_DateTime and End_DateTime) but my question is: How do I incorporate "Id" column to be included in this function below so the output is showing same as Expected Output:.
Thank you for your time and appreciate your help in advance!
from itertools import combinations
def find_overlaps(g):
idx = pd.IntervalIndex.from_arrays(
g["Start_DateTime"], g["End_DateTime"], closed="both"
)
to_keep = set()
for (i1, interval1), (i2, interval2) in combinations(zip(g.index, idx.values), 2):
if interval1.overlaps(interval2):
to_keep.add(i1)
to_keep.add(i2)
return g.loc[list(to_keep)]
df["Start DateTime"] = pd.to_datetime(df["Start_DateTime"])
df["End DateTime"] = pd.to_datetime(df["End_DateTime"])
out = df.groupby("values_1", group_keys=False).apply(find_overlaps)
print(out)
Thank you again!