Filter out rows based on unique column values (Id) in pandas

40 views Asked by At

Below dataset is my dataset and want expected output (Shown below) using overlaps function:

Input dataset:

values_1 Id DeptId Start_DateTime End_DateTime
PNM 4579 FHDU 2023-09-04 14:15:29 2023-09-04 18:25:22
PNM 1278 FHDU 2023-09-04 14:45:28 2023-09-04 18:35:19
POH 8579 ASKG 2023-09-04 15:35:29 2023-09-04 17:25:22
POH 3449 ASKG 2023-09-04 15:45:28 2023-09-04 18:35:19
WEH 9124 ASKG 2023-09-04 17:25:28 2023-09-04 19:43:13
WEH 4579 ASKG 2023-09-04 16:15:21 2023-09-04 18:24:02

Expected Output:

values_1 Id DeptId Start_DateTime End_DateTime
PNM 4579 FHDU 2023-09-04 14:15:29 2023-09-04 18:25:22
WEH 4579 ASKG 2023-09-04 16:15:21 2023-09-04 18:24:02

For this code below, we know that it provides overlapping for 2 intervals (Start_DateTime and End_DateTime) but my question is: How do I incorporate "Id" column to be included in this function below so the output is showing same as Expected Output:.

Thank you for your time and appreciate your help in advance!

        from itertools import combinations
    
    
    def find_overlaps(g):
        idx = pd.IntervalIndex.from_arrays(
            g["Start_DateTime"], g["End_DateTime"], closed="both"
        )
    
        to_keep = set()
        for (i1, interval1), (i2, interval2) in combinations(zip(g.index, idx.values), 2):
            if interval1.overlaps(interval2):
                to_keep.add(i1)
                to_keep.add(i2)
    
        return g.loc[list(to_keep)]
    
    
    df["Start DateTime"] = pd.to_datetime(df["Start_DateTime"])
    df["End DateTime"] = pd.to_datetime(df["End_DateTime"])
    
    out = df.groupby("values_1", group_keys=False).apply(find_overlaps)
    print(out)

Thank you again!

0

There are 0 answers