How to replace NA with latest value in unbalanced panel?

162 views Asked by At

I would like to replace NA with previous value in an unbalanced panel data (long format). Before replace, the data would look like

Firm    Date    Var_1       
AAA 19990430    NA      
AAA 19990531    10      
AAA 19990630    NA      
AAA 19990731    NA      
AAA 19990831    12      
AAA 19990930    NA      
BBB 20040331    NA      
BBB 20040430    NA      
BBB 20040531    8       
BBB 20040630    NA      
BBB 20040731    NA      
BBB 20040831    12      
BBB 20040930    NA      

After, it will look like:

Firm    Date    Var_1       
AAA 19990430    NA      
AAA 19990531    10      
AAA 19990630    10      
AAA 19990731    10      
AAA 19990831    12      
AAA 19990930    12      
BBB 20040331    NA      
BBB 20040430    NA      
BBB 20040531    8       
BBB 20040630    8       
BBB 20040731    8       
BBB 20040831    12      
BBB 20040930    12

The point is the two NAs at the beginning of Firm BBB shall not be replaced by the numbers from firm AAA. Thanks in advance.

1

There are 1 answers

0
David On

Take a look at zoo and the function na.locf().

Let's say your data.frame above is called df.

df$Firm<-factor(df$Firm)
library(zoo)

#turn your df into a zoo object
df.zoo<-zoo(df)

#split on "Firm" and put df into a list.
df.list<-split(df.zoo,f=df.zoo$Firm)

#apply na.locf function to both Firms
df.list$`AAA`<-na.locf(df.list$`AAA`, na.rm=F)
df.list$`BBB`<-na.locf(df.list$`BBB`, na.rm=F)

#paste your df back together
rbind(
data.frame(na.locf(df.list$`AAA`, na.rm=F)),
data.frame(na.locf(df.list$`BBB`, na.rm=F))
)

You will get

   Firm     Date Var_1
1   AAA 19990430  <NA>
2   AAA 19990531    10
3   AAA 19990630    10
4   AAA 19990731    10
5   AAA 19990831    12
6   AAA 19990930    12
7   BBB 20040331  <NA>
8   BBB 20040430  <NA>
9   BBB 20040531     8
10  BBB 20040630     8
11  BBB 20040731     8
12  BBB 20040831    12
13  BBB 20040930    12