Pandas: How to extract and calculate the number of "hour" per row in a Dataframe

365 views Asked by At

I have a data frame representing the schedule of some restaurants in a week.

  • What i want to do is to add a column week_hours to my initial Dataframe df that represents the total number of hours the restaurant is open per week.

The Data (generated with df.head(20).to_dict('split'))

The days of the weeks in this example are in French enter image description here

{'index': [0,
  1,
  2,
  3,
  4,
  5,
  6,
  7,
  8,
  9,
  10,
  11,
  12,
  13,
  14,
  15,
  16,
  17,
  18,
  19],
 'columns': ['restaurant_id',
  'lundi',
  'mardi',
  'mercredi',
  'jeudi',
  'vendredi',
  'samedi',
  'dimanche'],
 'data': [['lCwqJWMxvIUQt1Re_tDn4w',
   '0:0-0:0',
   '0:0-0:0',
   '0:0-0:0',
   '0:0-0:0',
   '0:0-0:0',
   '0:0-0:0',
   '0:0-0:0'],
  ['pd0v6sOqpLhFJ7mkpIaixw',
   '11:0-20:0',
   '11:0-20:0',
   '11:0-20:0',
   '11:0-20:0',
   '11:0-22:0',
   '11:0-22:0',
   '11:0-17:0'],
  ['0vhi__HtC2L4-vScgDFdFw',
   '11:30-22:0',
   '11:30-22:0',
   '11:30-22:0',
   '11:30-22:0',
   '11:30-22:0',
   '12:0-22:0',
   '16:30-21:30'],
  ['t65yfB9v9fqlhAkLnnUXdg',
   '11:30-21:0',
   '11:30-21:0',
   '11:30-21:0',
   '11:30-21:0',
   '11:30-21:0',
   nan,
   '11:30-21:0'],
  ['i7_JPit-2kAbtRTLkic2jA',
   '11:30-22:0',
   '11:30-22:0',
   '11:30-23:0',
   '11:30-23:0',
   '11:30-23:0',
   nan,
   nan],
  ['vMh4madPU3qhNX7P7d8WGA', nan, nan, nan, nan, nan, nan, nan],
  ['BsvCTCVG7lrzXZ68VyyIcg',
   '0:0-0:0',
   '11:0-2:30',
   '11:0-2:30',
   '11:0-2:30',
   '11:0-2:30',
   '11:0-2:30',
   '11:0-2:30'],
  ['es3Fq9KNp6Ry994x4T4ZYg',
   '6:30-16:0',
   '6:30-16:0',
   '6:30-16:0',
   '6:30-16:0',
   '6:30-16:0',
   '7:0-14:0',
   nan],
  ['Xb7jOAa17xtT_uA4sCCAsg', nan, nan, nan, nan, nan, nan, nan],
  ['1vrrpIhpK628PUA0XWWd8g',
   '9:0-18:0',
   '9:0-18:0',
   '9:0-18:0',
   '9:0-18:0',
   '9:0-18:0',
   '9:0-18:0',
   '9:0-18:0'],
  ['NYKxikYKbkacWumJ82TxzA',
   '11:0-2:0',
   '11:0-2:0',
   '11:0-2:0',
   '11:0-2:0',
   '11:0-2:0',
   '11:0-2:0',
   '11:0-2:0'],
  ['4sRJvmKh43AqMRrjdwEdwA',
   '11:0-22:0',
   '11:0-22:0',
   '11:0-22:0',
   '11:0-22:0',
   '11:0-23:0',
   '11:0-23:0',
   '11:0-23:0'],
  ['laac2uH1lQVzBjKFUjuA1Q',
   '7:0-14:0',
   '7:0-14:0',
   '7:0-14:0',
   '7:0-14:0',
   '7:0-14:0',
   '7:0-14:0',
   '8:0-14:0'],
  ['vVOoL5H8Fr-qlQv-_DdoMA',
   '10:0-22:0',
   '10:0-22:0',
   '10:0-22:0',
   '10:0-22:0',
   '10:0-23:0',
   '10:0-23:0',
   '10:0-22:0'],
  ['k1c4gg8Ri5dre6ruPUKxJQ',
   '9:0-21:30',
   '9:0-21:30',
   '9:0-21:30',
   '9:0-21:30',
   '9:0-22:30',
   '9:0-22:30',
   '12:0-21:0'],
  ['x9f9NBMweyyjCQHuc9K4sw',
   '11:0-17:0',
   '10:0-17:0',
   '10:0-17:0',
   '10:0-17:0',
   '10:0-18:0',
   '10:0-18:0',
   nan],
  ['KWfLQddMBZNoh1bVcgASfA',
   '12:0-23:0',
   '12:0-23:0',
   '12:0-23:0',
   '12:0-23:0',
   '12:0-23:0',
   '12:0-23:0',
   '12:0-23:0'],
  ['4ScLXRii_WwBn5PbGBI-eg', nan, nan, nan, nan, nan, nan, nan],
  ['LAswzVTnT3uCvnKr-SwxEg', nan, nan, nan, nan, nan, nan, nan],
  ['G_wqVaqV3TBsZPAIIRCU-Q',
   '5:0-0:0',
   '5:0-0:0',
   '5:0-0:0',
   '5:0-0:0',
   '5:0-0:0',
   '5:0-0:0',
   '5:0-0:0']]}

what is the syntax allowing me to extract the number of hours from each column (representing days in this case) to calculate the number of business hours in the week inside a new column?

If any clarifications or simpler example needed, please ask.

Edit - tried a solution listed below but the results does not make sens (in the first line for example?) enter image description here

1

There are 1 answers

5
D-E-N On BEST ANSWER

I think this should do your stuff. It saves the new data as float (in hours). With datetime (https://docs.python.org/3/library/datetime.html) you can easily calulate times, if you want to. This applays the function calculate_hours to the 7 given columns (days):

def calculate_hours(row: pd.Series) -> float:
    try:
        # split the given times to start and end time
        opening_time, closing_time = row.split("-")
        # split hours and minutes
        start_hour, start_minute = opening_time.split(":")
        end_hour, end_minute = closing_time.split(":")
        # calculate start time (in hours)
        start_time = float(start_hour) + float(start_minute) / 60
        # calculate end time (in hours)
        end_time = float(end_hour) + float(end_minute) / 60
        # handle overneight and 24h openings
        if start_time >= end_time:
            end_time += 24
        # return the duration from start time to end time
        return end_time - start_time
    # bare except are not recommended, you should look for your data, what could go wrong
    except:
        return 0.0


# Save the given data to the new column "open"
# sums up values for each day
df["open"] = df["lundi"].apply(calculate_hours) +\
             df["mardi"].apply(calculate_hours) +\
             df["mercredi"].apply(calculate_hours) +\
             df["jeudi"].apply(calculate_hours) +\
             df["vendredi"].apply(calculate_hours) +\
             df["samedi"].apply(calculate_hours) +\
             df["dimanche"].apply(calculate_hours)

PS: I used this to "import" your data, it does not look perfect, but i didn't know how to use your data better:

import pandas as pd
import datetime

all = {'index': [0,
                  1,
                  2,
                  3,
                  4,
                  5,
                  6,
                  7,
                  8,
                  9,
                  10,
                  11,
                  12,
                  13,
                  14,
                  15,
                  16,
                  17,
                  18,
                  19],
        'columns': ['restaurant_id',
                    'lundi',
                    'mardi',
                    'mercredi',
                    'jeudi',
                    'vendredi',
                    'samedi',
                    'dimanche'],
        'data': [['lCwqJWMxvIUQt1Re_tDn4w',
                  '0:0-0:0',
                  '0:0-0:0',
                  '0:0-0:0',
                  '0:0-0:0',
                  '0:0-0:0',
                  '0:0-0:0',
                  '0:0-0:0'],
                 ['pd0v6sOqpLhFJ7mkpIaixw',
                  '11:0-20:0',
                  '11:0-20:0',
                  '11:0-20:0',
                  '11:0-20:0',
                  '11:0-22:0',
                  '11:0-22:0',
                  '11:0-17:0'],
                 ['0vhi__HtC2L4-vScgDFdFw',
                  '11:30-22:0',
                  '11:30-22:0',
                  '11:30-22:0',
                  '11:30-22:0',
                  '11:30-22:0',
                  '12:0-22:0',
                  '16:30-21:30'],
                 ['t65yfB9v9fqlhAkLnnUXdg',
                  '11:30-21:0',
                  '11:30-21:0',
                  '11:30-21:0',
                  '11:30-21:0',
                  '11:30-21:0',
                  None,
                  '11:30-21:0'],
                 ['i7_JPit-2kAbtRTLkic2jA',
                  '11:30-22:0',
                  '11:30-22:0',
                  '11:30-23:0',
                  '11:30-23:0',
                  '11:30-23:0',
                  None,
                  None],
                 ['vMh4madPU3qhNX7P7d8WGA', None, None, None, None, None, None, None],
                 ['BsvCTCVG7lrzXZ68VyyIcg',
                  '0:0-0:0',
                  '11:0-2:30',
                  '11:0-2:30',
                  '11:0-2:30',
                  '11:0-2:30',
                  '11:0-2:30',
                  '11:0-2:30'],
                 ['es3Fq9KNp6Ry994x4T4ZYg',
                  '6:30-16:0',
                  '6:30-16:0',
                  '6:30-16:0',
                  '6:30-16:0',
                  '6:30-16:0',
                  '7:0-14:0',
                  None],
                 ['Xb7jOAa17xtT_uA4sCCAsg', None, None, None, None, None, None, None],
                 ['1vrrpIhpK628PUA0XWWd8g',
                  '9:0-18:0',
                  '9:0-18:0',
                  '9:0-18:0',
                  '9:0-18:0',
                  '9:0-18:0',
                  '9:0-18:0',
                  '9:0-18:0'],
                 ['NYKxikYKbkacWumJ82TxzA',
                  '11:0-2:0',
                  '11:0-2:0',
                  '11:0-2:0',
                  '11:0-2:0',
                  '11:0-2:0',
                  '11:0-2:0',
                  '11:0-2:0'],
                 ['4sRJvmKh43AqMRrjdwEdwA',
                  '11:0-22:0',
                  '11:0-22:0',
                  '11:0-22:0',
                  '11:0-22:0',
                  '11:0-23:0',
                  '11:0-23:0',
                  '11:0-23:0'],
                 ['laac2uH1lQVzBjKFUjuA1Q',
                  '7:0-14:0',
                  '7:0-14:0',
                  '7:0-14:0',
                  '7:0-14:0',
                  '7:0-14:0',
                  '7:0-14:0',
                  '8:0-14:0'],
                 ['vVOoL5H8Fr-qlQv-_DdoMA',
                  '10:0-22:0',
                  '10:0-22:0',
                  '10:0-22:0',
                  '10:0-22:0',
                  '10:0-23:0',
                  '10:0-23:0',
                  '10:0-22:0'],
                 ['k1c4gg8Ri5dre6ruPUKxJQ',
                  '9:0-21:30',
                  '9:0-21:30',
                  '9:0-21:30',
                  '9:0-21:30',
                  '9:0-22:30',
                  '9:0-22:30',
                  '12:0-21:0'],
                 ['x9f9NBMweyyjCQHuc9K4sw',
                  '11:0-17:0',
                  '10:0-17:0',
                  '10:0-17:0',
                  '10:0-17:0',
                  '10:0-18:0',
                  '10:0-18:0',
                  None],
                 ['KWfLQddMBZNoh1bVcgASfA',
                  '12:0-23:0',
                  '12:0-23:0',
                  '12:0-23:0',
                  '12:0-23:0',
                  '12:0-23:0',
                  '12:0-23:0',
                  '12:0-23:0'],
                 ['4ScLXRii_WwBn5PbGBI-eg', None, None, None, None, None, None, None],
                 ['LAswzVTnT3uCvnKr-SwxEg', None, None, None, None, None, None, None],
                 ['G_wqVaqV3TBsZPAIIRCU-Q',
                  '5:0-0:0',
                  '5:0-0:0',
                  '5:0-0:0',
                  '5:0-0:0',
                  '5:0-0:0',
                  '5:0-0:0',
                  '5:0-0:0']]}


data = all["data"]
df = pd.DataFrame(data)
df.columns = all["columns"]