I have a dataframe with two date fields, as shown below. I want to use this data to calculate 'adjusted pay' for an employee. If the employee joined after the 15th of a month (e.g. March), they are paid for 15 days of March + April on the 10th of the month (payday). Equally, if they leave in April, the calculation should only consider the days worked in April.
Hire_Date | Leaving_Date
_________________________
01/02/2007 | NaN
02/03/2007 | NaN
23/03/2020 | NaN
01/01/1999 | 04/04/2020
Oh and the above data didn't pull through in datetime format, and there are of course plenty of NaNs in the leaving_date field :)
Therefore, I did the following:
Converted the data to datetime format, retained the date, and filled N/As with a random date (not too happy about this, but this is only missing in a few records so not worried about the impact).
df['Hire_Date'] = pd.to_datetime(df['Hire_Date'])
df['Hire_Date'] = [a.date() for a in df['Hire_Date']]
df['Hire_Date'] = df['Hire_Date'].fillna('1800-01-01')
Repeated for Leaving date. Only difference here is that I've filled the NaNs with 0, given that we don't have that many leavers.
df['Leaving_Date'] = pd.to_datetime(df['Leaving_Date'])
df['Leaving_Date'] = [a.date() for a in df['Leaving_Date']]
df['Leaving_Date'] = df['Leaving_Date'].fillna('0')
I then ended up creating a fresh column to capture workdays, and here's where I run into the issue. My code is given below.
I identified the first day of the hire month, and attempted to work out the number of days worked in March, using an np.where() function.
df['z_First_Day_H_Month'] = df['Hire_Date'].values.astype('datetime64[M]')
df['March_Workdays'] = np.where((df['z_First_Day_H_Month'] >= '2020-03-01'),
(np.busday_count(df['z_First_Day_H_Month'], '2020-03-31')), 'N/A')
Similar process repeated, albeit a simpler calculation to work out the number of days worked in the termination month.
df['z_First_Day_T_Month'] = df.apply(lambda x: '2020-04-01').astype('datetime64[M]')
df['T_Mth_Workdays'] = df.apply(lambda x: np.busday_count(x['z_First_Day_T_Month'],
x['Leaving_Date']), axis=1)
However, the above process returns the following error:
TypeError: Iterator operand 0 dtype could not be cast from dtype('<M8[ns]') to dtype('<M8[D]') according to the rule 'safe'
Please can I get some help to fix this issue? Thanks!
I did a bit of research, and it seems that the datetime unit is the problem. The [ns] dtype has nanosecond precision, whereas np.busday_count expects dates with day precision, which is [D]. The mismatch causes the error. Take a look at the NumPy datetime documentation and check the "Datetime Units" section.
Also take a look at this post: "Numpy, TypeError: Could not be cast from dtype('<M8[us]') to dtype('<M8[D]')". It is exactly the same problem as yours!
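To make the unit point concrete, here is a minimal sketch rather than a definitive fix. It assumes the dates are day-first strings, that missing values are kept as NaT instead of being filled with placeholder strings, and that "days worked in March" means business days from the hire date to the end of March 2020. The sample frame and the variable names hire, leave, march_start and april_start are hypothetical, made up for illustration. The key step is casting to datetime64[D] before calling np.busday_count, and masking out NaT because np.busday_count does not accept it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the table in the question (day-first dates).
df = pd.DataFrame({
    "Hire_Date": ["01/02/2007", "02/03/2007", "23/03/2020", "01/01/1999"],
    "Leaving_Date": [None, None, None, "04/04/2020"],
})

# Parse to datetimes; missing values become NaT (no filler strings needed).
for col in ["Hire_Date", "Leaving_Date"]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y")

# np.busday_count needs day precision, so cast the arrays to datetime64[D].
hire = df["Hire_Date"].to_numpy().astype("datetime64[D]")
leave = df["Leaving_Date"].to_numpy().astype("datetime64[D]")

march_start = np.datetime64("2020-03-01")
april_start = np.datetime64("2020-04-01")
may_start = np.datetime64("2020-05-01")

# Business days from the hire date to the end of March (end date is exclusive).
hired_in_march = (hire >= march_start) & (hire < april_start)
march_workdays = np.full(len(df), np.nan)
march_workdays[hired_in_march] = np.busday_count(hire[hired_in_march], april_start)

# Business days from 1 April up to (not including) the leaving date.
# NaT rows are masked out first, since busday_count raises on NaT.
left_in_april = ~np.isnat(leave)
left_in_april[left_in_april] = (
    (leave[left_in_april] >= april_start) & (leave[left_in_april] < may_start)
)
april_workdays = np.full(len(df), np.nan)
april_workdays[left_in_april] = np.busday_count(april_start, leave[left_in_april])

df["March_Workdays"] = march_workdays
df["T_Mth_Workdays"] = april_workdays
print(df)
```

With this sample, the employee hired on 23/03/2020 gets 7 March workdays and the employee leaving on 04/04/2020 gets 3 April workdays. All other rows stay NaN. Note that np.busday_count excludes the end date, so add one day to the leaving date if the last day should count as worked.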