I am analyzing staff turnover and need to (1) count the number of staff hired and exited in a given year and (2) compute a cumulative "total staff" count across years. I have hire and exit dates like this:
ssh<-structure(list(HireDate = structure(c(1358, 4291, 5121, 6923, 9678, 12037, 16353, 17003, 18976, 19312, 19312, 19011), class = "Date"), ExitDate = structure(c(15861, 15401, 17140, 17347, NA, NA, 16911, 18856, 19193, NA, NA, NA), class = "Date"), id = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12")), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))
ssh$hireyear<-lubridate::year(ssh$HireDate)
ssh$exityear<-lubridate::year(ssh$ExitDate)
ssh$group<-c("a","b","c","a","","b","a","","","b","b","c")
For the simple accounting of hires and exits, I'd like to have a dummy variable for EACH year. So for the data above, if a staff member was hired in 2014, create a new column hire2014 equal to 1, else 0, like this:
ssh$hire1984<-ifelse(ssh$hireyear==1984,1,0)
or
ssh$exit2012<-ifelse(ssh$exityear==2012,1,0)
My full dataset ranges between 1972 and 2023, so I'd like an efficient method to compute all possible variables for any date range. This would yield a dataframe with many columns -- one for each year.
Next, I'd like to format the resulting dataframe by year, something like this:
Year  NumberHired  NumberExited  NetChange  CurrentTotal
1972            4             0          4             4
1973            2             1          1             5
1974            3             4         -1             4
...
2010           25            11         14           541
...etc
I experimented with creating a lookup table of years that I might populate with an aggregation of the dummy variables, but I am hitting a wall. Also, other solutions I've found on Stack Overflow tend to focus on creating only one dummy variable.
Ideas? Thx!
We can use tidyr::pivot_wider to make your dummies:
And the summary table is fairly easily computed from your original data:
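The code that went with these two steps is not shown, so here is one possible sketch rather than the exact original. It assumes dplyr and tidyr are installed, rebuilds the question's data without the group column, and uses format(..., "%Y") in place of lubridate::year. The names events, dummies, ssh_dummies and summary_tbl are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same hire/exit dates as in the question (days since 1970-01-01).
ssh <- tibble(
  id = as.character(1:12),
  HireDate = as.Date(c(1358, 4291, 5121, 6923, 9678, 12037, 16353, 17003,
                       18976, 19312, 19312, 19011), origin = "1970-01-01"),
  ExitDate = as.Date(c(15861, 15401, 17140, 17347, NA, NA, 16911, 18856,
                       19193, NA, NA, NA), origin = "1970-01-01")
)

# One row per (id, event, year); staff with no exit date simply get no "exit" row.
events <- ssh %>%
  transmute(id,
            hire = as.integer(format(HireDate, "%Y")),
            exit = as.integer(format(ExitDate, "%Y"))) %>%
  pivot_longer(c(hire, exit), names_to = "event", values_to = "year",
               values_drop_na = TRUE)

# Step 1: dummies such as hire1984 / exit2012, filled with 0 where no event occurred.
dummies <- events %>%
  mutate(flag = 1L) %>%
  pivot_wider(id_cols = id, names_from = c(event, year), names_sep = "",
              names_sort = TRUE, values_from = flag, values_fill = 0L)
ssh_dummies <- left_join(ssh, dummies, by = "id")

# Step 2: yearly summary, with years that had no hires or exits filled in as 0.
summary_tbl <- events %>%
  count(year, event) %>%
  pivot_wider(names_from = event, values_from = n, values_fill = 0L) %>%
  complete(year = seq(min(year), max(year)),
           fill = list(hire = 0L, exit = 0L)) %>%
  arrange(year) %>%
  transmute(Year = year,
            NumberHired = hire,
            NumberExited = exit,
            NetChange = NumberHired - NumberExited,
            CurrentTotal = cumsum(NetChange))
```

This only creates dummy columns for years that actually occur in the data. CurrentTotal is the head count at the end of each year, so someone hired and exited in the same year nets to zero for that year.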