my problem is this: I get NA
where I should get some values in the computation of robust standard errors.
I am trying to do a fixed effect panel regression with cluster-robust standard errors. For this, I follow Arai (2011) who on p. 3 follows Stock/ Watson (2006) (later published in Econometrica, for those who have access). I would like to correct the degrees of freedom by (M/(M-1)*(N-1)/(N-K)
against downward bias as my number of clusters is finite and I have unbalanced data.
Similar problems have been posted before [1, 2] on StackOverflow and related problems [3] on CrossValidated.
Arai (and the answer in the 1st link) uses the following code for functions (I provide my data below with some further comment):
gcenter <- function(df1,group) {
variables <- paste(
rep("C", ncol(df1)), colnames(df1), sep=".")
copydf <- df1
for (i in 1:ncol(df1)) {
copydf[,i] <- df1[,i] - ave(df1[,i], group,FUN=mean)}
colnames(copydf) <- variables
return(cbind(df1,copydf))}
# 1-way adjusting for clusters
clx <- function(fm, dfcw, cluster){
# R-codes (www.r-project.org) for computing
# clustered-standard errors. Mahmood Arai, Jan 26, 2008.
# The arguments of the function are:
# fitted model, cluster1 and cluster2
# You need to install libraries `sandwich' and `lmtest'
# reweighting the var-cov matrix for the within model
library(sandwich);library(lmtest)
M <- length(unique(cluster))
N <- length(cluster)
K <- fm$rank
dfc <- (M/(M-1))*((N-1)/(N-K))
uj <- apply(estfun(fm),2, function(x) tapply(x, cluster, sum));
vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)*dfcw
coeftest(fm, vcovCL) }
,where the gcenter
computes deviations from the mean (fixed effect). I then continue and do the regression with DS_CODE
being my cluster variable (I have named my data 'data').
centerdata <- gcenter(data, data$DS_CODE)
datalm <- lm(C.L1.retE1M ~ C.MCAP_SEC + C.Impact_change + C.Mom + C.BM + C.PD + C.CashGen + C.NITA + C.PE + C.PEdummy + factor(DS_CODE), data=centerdata)
M <- length(unique(data$DS_CODE))
dfcw <- datalm$df / (datalm$df - (M-1))
and want to calculate
clx(datalm, dfcw, data$DS_CODE)
However, when I want to compute uj (see formula clx
above) for the variance, I get only at the beginning some values for my regressors, then lots of zeros. If this input uj is used for the variance, only NAs
result.
My data
Since my data may be of special structure and I can't figure out the problem, I post the entire thing as a link from Hotmail. The reason is that with other data (taken from Arai (2011)) my problem does not occur. Sorry in advance for the mess but I'd be very grateful if you could have a look at it nevertheless. The file is a 5mb .txt file containing purely data.
After some time playing around, it works for me and gives me:
This leaves us with the question why it doesn't for you. I guess it has something to do with the format of your data. Is everything numeric? I converted the column classes and it looks like that for me:
What does
str(data)
return for you?