Subdividing panel data to apply a function

I'm trying to add a dummy-variable column to panel data recording whether a treatment was applied to a firm. If a treatment (grant) was applied in any year, the variable should equal 1 for all years corresponding to that firm. I know it would be appropriate to use lapply/sapply or a dplyr group_by(), but I'm not really sure how to apply it. Below is the original data:

head(q3data_a)
# A tibble: 6 x 30
   year  fcode employ  sales avgsal scrap rework tothrs union grant   d89   d88 totrain hrsemp lscrap lemploy
  <int>  <dbl>  <int>  <dbl>  <dbl> <dbl>  <dbl>  <int> <int> <int> <int> <int>   <int>  <dbl>  <dbl>   <dbl>
1  1987 410032    100 4.70e7  35000    NA     NA     12     0     0     0     0     100  12        NA    4.61
2  1988 410032    131 4.30e7  37000    NA     NA      8     0     0     0     1      50   3.05     NA    4.88
3  1987 410440     12 1.56e6  10500    NA     NA     12     0     0     0     0      12  12        NA    2.48
4  1988 410440     13 1.97e6  11000    NA     NA     12     0     0     0     1      13  12        NA    2.56
5  1987 410495     20 7.50e5  17680    NA     NA     50     0     0     0     0      15  37.5      NA    3.00
6  1988 410495     25 1.10e5  18720    NA     NA     50     0     0     0     1      10  20        NA    3.22
# ... with 14 more variables: lsales <dbl>, lrework <dbl>, lhrsemp <dbl>, lscrap_1 <dbl>, grant_1 <int>,
#   clscrap <dbl>, cgrant <int>, clemploy <dbl>, clsales <dbl>, lavgsal <dbl>, clavgsal <dbl>,
#   cgrant_1 <int>, chrsemp <dbl>, clhrsemp <dbl>

And below is my ad-hoc solution. It works, but it does not generalize (it would be difficult to extend to more than two time periods, for example).

# Encode the treatment across all time periods: if a firm receives a grant
# in 1988 (the even row), it also gets a 1 for 1987 (the preceding row).
dummy1 <- rep(0, nrow(q3data_a))
for (i in seq_len(nrow(q3data_a))) {
  if (i %% 2 == 0) {
    if (q3data_a$grant[i] == 1) {
      dummy1[i - 1] <- 1
      dummy1[i] <- 1
    }
  }
}

Thanks for any advice.

There is 1 answer

ekoam (best answer)

Is this what you need?

library(dplyr)
df %>% group_by(fcode) %>% mutate(dummy1 = as.integer(any(grant > 0)))

df looks like this:

# A tibble: 12 x 3
    year  fcode grant
   <int>  <dbl> <int>
 1  1985 410032     0
 2  1986 410032     1
 3  1987 410032     1
 4  1988 410032     1
 5  1985 410440     1
 6  1986 410440     0
 7  1987 410440     1
 8  1988 410440     1
 9  1985 410495     0
10  1986 410495     0
11  1987 410495     0
12  1988 410495     0

Output is

# A tibble: 12 x 4
# Groups:   fcode [3]
    year  fcode grant dummy1
   <int>  <dbl> <int>  <int>
 1  1985 410032     0      1
 2  1986 410032     1      1
 3  1987 410032     1      1
 4  1988 410032     1      1
 5  1985 410440     1      1
 6  1986 410440     0      1
 7  1987 410440     1      1
 8  1988 410440     1      1
 9  1985 410495     0      0
10  1986 410495     0      0
11  1987 410495     0      0
12  1988 410495     0      0
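
Since the question also mentions lapply/sapply, here is a rough base R sketch of the same per-firm idea using ave(), which applies a function within each group and recycles the result over that group's rows. The example data frame is rebuilt by hand from the table above, and na.rm = TRUE is my own addition on the assumption that grant could contain missing values in the real data.

```r
# Rebuild the example data shown above (values copied from the printed table).
df <- data.frame(
  year  = rep(1985:1988, times = 3),
  fcode = rep(c(410032, 410440, 410495), each = 4),
  grant = c(0L, 1L, 1L, 1L,
            1L, 0L, 1L, 1L,
            0L, 0L, 0L, 0L)
)

# For each firm (fcode), flag every row with 1 if the firm received a grant
# in any year, and 0 otherwise. ave() spreads the single per-firm value
# over all rows belonging to that firm.
# na.rm = TRUE is an assumption: it treats missing grant values as "no grant".
df$dummy1 <- ave(
  df$grant,
  df$fcode,
  FUN = function(g) as.integer(any(g > 0, na.rm = TRUE))
)

print(df)
```

This does not depend on how many years each firm has or on the row order, so it should carry over to panels with more than two periods.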