Pandas find consecutive ones, column wise

I have an output data frame like the one below, and I want to reshape it so that I can use it further down the pipeline.

A few pointers about the data frame:

1) This data frame holds the weekly workload data for employees.
2) load0, load30, load100, etc. each represent a half-hour slot.
3) The first "1" in a row marks the start of a shift, and "BREAK" marks a break slot.

For example, in row 1, employee 1234 starts his shift at 12:00 and ends it at 2:30, with a break from 1:00 to 1:30 in between.
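
To make the column naming concrete, here is a minimal sketch of how a label could be turned into a time offset. It assumes that load0 is the 12:00 slot and that the digits read as hours and minutes after it (so load130 is 1 h 30 min later); the helper name load_label_to_offset is only illustrative.

```python
from datetime import timedelta


def load_label_to_offset(label: str) -> timedelta:
    """Turn a label such as 'load130' into its offset from the first slot (load0)."""
    digits = int(label.replace("load", ""))   # 'load130' -> 130
    hours, minutes = divmod(digits, 100)      # 130 -> (1, 30)
    return timedelta(hours=hours, minutes=minutes)


# Offsets for the columns used in the example frame.
labels = ["load0", "load30", "load100", "load130", "load200", "load230", "load300"]
offsets = {label: load_label_to_offset(label) for label in labels}
```

Adding each offset to the date at 12:00 gives the start of the slot, and the end is 30 minutes later.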

employee  date       store  load0  load30  load100  load130  load200  load230  load300
1234      2021-12-1  450    1      1       BREAK    1        1        0        0
1234      2021-12-2  450    0      1       1        BREAK    1        1        0
5678      2021-12-1  650    0      0       0        0        1        1        0
5678      2021-12-2  650    0      0       1        1        BREAK    1        0

For the above example the output should be something like:

Start            End            Segment type
date+12:00:00    date+1:00:00   Regular segment
date+1:00:00     date+1:30:00   Break segment
date+1:30:00     date+2:30:00   Regular segment

P.S. There are around 350 employees, and each employee has a schedule like this for fewer than 7 days in a week.

I want the output to look like this:

   employee  store  Start                 End                   SegmentType
0  1234      450    2021-12-1T12:00:00Z   2021-12-1T12:30:00Z   REGULAR_SEGMENT
1  1234      450    2021-12-1T1:00:00Z    2021-12-1T1:30:00Z    BREAK_SEGMENT
2  1234      450    2021-12-1T1:30:00Z    2021-12-1T2:00:00Z    REGULAR_SEGMENT
3  1234      450    2021-12-2T12:30:00Z   2021-12-2T1:00:00Z    REGULAR_SEGMENT
4  1234      450    2021-12-2T1:30:00Z    2021-12-2T2:20:00Z    BREAK_SEGMENT
5  1234      450    2021-12-2T2:00:00Z    2021-12-2T2:30:00Z    REGULAR_SEGMENT
6  5678      650    2021-12-1T2:00:00Z    2021-12-1T2:30:00Z    REGULAR_SEGMENT
7  5678      650    2021-12-2T1:00:00Z    2021-12-1T2:30:00Z    REGULAR_SEGMENT
8  5678      650    2021-12-2T2:00:00Z    2021-12-2T2:00:00Z    BREAK_SEGMENT
9  5678      650    2021-12-2T2:30:00Z    2021-12-2T2:30:00Z    REGULAR_SEGMENT
         

There is 1 answer

William Rosenbaum

I hope this will work!

from datetime import timedelta

import pandas as pd


def segment_type(df: pd.DataFrame) -> pd.DataFrame:
    # Long format: one row per employee, date and half-hour slot.
    df_melt = df.melt(id_vars=['employee', 'date', 'store'], var_name='time')

    # 'load130' -> 130 -> 13:30. load0 is taken to be 12:00 on a 24-hour
    # clock, so load100 is 13:00 (1:00 in the question's notation).
    slot = df_melt['time'].str.replace('load', '', regex=False).astype(int)
    offset = (pd.to_timedelta(slot // 100 + 12, unit='h')
              + pd.to_timedelta(slot % 100, unit='m'))

    df_melt['start'] = pd.to_datetime(df_melt['date']) + offset
    df_melt['end'] = df_melt['start'] + timedelta(minutes=30)

    # Compare as strings so that both the integer 1 and the text '1' match.
    df_melt['value'] = df_melt['value'].astype(str).str.strip()
    df_melt = df_melt[df_melt['value'].isin(['1', 'BREAK'])].copy()
    df_melt['SegmentType'] = ['REGULAR_SEGMENT' if x == '1' else
                              'BREAK_SEGMENT' for x in df_melt['value']]

    # Keep 'store', which the requested output asks for.
    df_melt = df_melt[['employee', 'store', 'start', 'end', 'SegmentType']]
    return df_melt.sort_values(['employee', 'start'], ignore_index=True)


new_frame = segment_type(df)
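
The function above returns one row per half-hour slot. The first example in the question instead merges back-to-back slots of the same type into a single segment (12:00 to 1:00, then the break, then 1:30 to 2:30). A minimal sketch of that merging step follows; it assumes a frame with the columns employee, start, end and SegmentType, and the name merge_consecutive and the demo data are illustrative only, not part of the original answer.

```python
import pandas as pd


def merge_consecutive(slots: pd.DataFrame) -> pd.DataFrame:
    """Collapse back-to-back slots of the same SegmentType into one row."""
    slots = slots.sort_values(["employee", "start"]).reset_index(drop=True)
    # Previous slot of the same employee (missing for each employee's first slot).
    prev = slots.groupby("employee")[["end", "SegmentType"]].shift()
    # A new run starts when the type changes or the slot does not
    # begin exactly where the previous one ended.
    new_run = (slots["SegmentType"] != prev["SegmentType"]) | (slots["start"] != prev["end"])
    slots["run"] = new_run.cumsum()
    merged = slots.groupby(["employee", "run"], as_index=False).agg(
        start=("start", "min"),
        end=("end", "max"),
        SegmentType=("SegmentType", "first"),
    )
    return merged.drop(columns="run")


# Demo: employee 1234 on 2021-12-01, using a 24-hour clock.
starts = pd.to_datetime([
    "2021-12-01 12:00", "2021-12-01 12:30", "2021-12-01 13:00",
    "2021-12-01 13:30", "2021-12-01 14:00",
])
demo = pd.DataFrame({
    "employee": [1234] * 5,
    "start": starts,
    "end": starts + pd.Timedelta(minutes=30),
    "SegmentType": ["REGULAR_SEGMENT", "REGULAR_SEGMENT", "BREAK_SEGMENT",
                    "REGULAR_SEGMENT", "REGULAR_SEGMENT"],
})
merged = merge_consecutive(demo)
print(merged)
```

Other identifying columns, such as store, can be added to the group keys if they should be carried through to the result.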