I'm trying to add a dummy-variable column to panel data recording whether a treatment was applied to a firm. If a treatment (grant) was applied in a particular year, the variable should equal 1 for all years corresponding to that firm. I know it would be appropriate to use lapply/sapply or a dplyr group_by(), but I'm not really sure how to apply it. Below is the original data:
head(q3data_a)
# A tibble: 6 x 30
year fcode employ sales avgsal scrap rework tothrs union grant d89 d88 totrain hrsemp lscrap lemploy
<int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 1987 410032 100 4.70e7 35000 NA NA 12 0 0 0 0 100 12 NA 4.61
2 1988 410032 131 4.30e7 37000 NA NA 8 0 0 0 1 50 3.05 NA 4.88
3 1987 410440 12 1.56e6 10500 NA NA 12 0 0 0 0 12 12 NA 2.48
4 1988 410440 13 1.97e6 11000 NA NA 12 0 0 0 1 13 12 NA 2.56
5 1987 410495 20 7.50e5 17680 NA NA 50 0 0 0 0 15 37.5 NA 3.00
6 1988 410495 25 1.10e5 18720 NA NA 50 0 0 0 1 10 20 NA 3.22
# ... with 14 more variables: lsales <dbl>, lrework <dbl>, lhrsemp <dbl>, lscrap_1 <dbl>, grant_1 <int>,
# clscrap <dbl>, cgrant <int>, clemploy <dbl>, clsales <dbl>, lavgsal <dbl>, clavgsal <dbl>,
# cgrant_1 <int>, chrsemp <dbl>, clhrsemp <dbl>
And below is my ad-hoc solution. It works, but it does not generalize: it relies on each firm occupying exactly two consecutive rows, so it would be difficult to adapt to more than two time periods.
# Encodes the treatment across all time periods:
# if a firm receives a grant in 1988, it also gets a 1 in 1987.
dummy1 <- rep(0, nrow(q3data_a))
for (i in seq_len(nrow(q3data_a))) {
  # Even rows are the 1988 observations; row i - 1 is the same firm in 1987.
  if (i %% 2 == 0 && q3data_a$grant[i] == 1) {
    dummy1[i - 1] <- 1
    dummy1[i] <- 1
  }
}
Thanks for any advice.
Is this what you need?
df looks like this:
Output is:
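For reference, one way to do this without a loop is to compute the dummy within each firm. The following is only a minimal sketch in base R, assuming a small made-up data frame: the values in df and the column name ever_grant are hypothetical and not taken from q3data_a.

```r
# Hypothetical toy panel: three firms, two years each (values are made up).
df <- data.frame(
  fcode = c(1, 1, 2, 2, 3, 3),
  year  = c(1987, 1988, 1987, 1988, 1987, 1988),
  grant = c(0, 1, 0, 0, 0, 1)
)

# ave() applies FUN within each fcode group and returns a vector
# aligned with the original rows, so every year of a firm gets the
# same value: 1 if the firm received a grant in any year, else 0.
df$ever_grant <- ave(
  df$grant, df$fcode,
  FUN = function(g) as.integer(any(g == 1, na.rm = TRUE))
)

print(df)
```

Because the grouping is done on fcode rather than on row position, this should work for any number of years per firm. If dplyr is preferred, the same idea can probably be written as `q3data_a %>% group_by(fcode) %>% mutate(ever_grant = as.integer(any(grant == 1, na.rm = TRUE))) %>% ungroup()`.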