When estimating from clustered survey data with the survey package in R, is it possible to produce estimates at the cluster level? For example, take the following survey design:
library(survey)
data(api)
dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
This example is reproduced from the survey package. Here, dnum is the district and fpc is the number of districts in the population, which is used for the finite population correction. In this case, can we create a subset at the district level? For example, to estimate total enrollment for the district with code 637:
sub1=subset(dclus1, dnum==637)
svytotal(~enroll, sub1)
I got the following output:
        total     SE
enroll 205824 203774
I do not know whether this is the correct method. Any help would be greatly appreciated.
I think it depends. Survey statisticians may disagree about whether you can do this in specific cases, but most would probably agree that, at the very least, you need to consider what it means for the data you have before you can call your analysis defensible.
Consider how the sample was drawn and how many observations there are within the cluster. Most complex sample surveys are not simple random samples, so the individual clusters and strata are not necessarily representative on their own; the survey design was built to yield a sample that is representative in aggregate, not at the level of a single sampling cluster.
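A minimal sketch of that check, assuming the apiclus1 data and the dclus1 design from the question: it counts the sampled schools in each sampled district and then requests every district total in one call with svyby(), so the estimates can be read next to their standard errors. The names schools_per_district and by_district are introduced here only for illustration.

```r
library(survey)
data(api)

# Same one-stage cluster design as in the question: districts are the sampling units
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# How many sampled schools fall in each sampled district?
schools_per_district <- table(apiclus1$dnum)
print(schools_per_district)

# Domain totals of enrollment for every sampled district, with standard errors
by_district <- svyby(~enroll, ~dnum, dclus1, svytotal)
print(by_district)
```

Districts with only a handful of schools, or standard errors about as large as the totals themselves, are a hint that the cluster-level numbers should not be read as stand-alone estimates.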
As one example, the Bureau of Labor Statistics does not consider analyses using the region variable to be acceptable for the Consumer Expenditure Survey (region is correlated with their sampling design).
It's possible that a cluster consists only of under-represented groups within some small village. That is an extreme example, but I'd recommend proceeding with caution when subsetting your microdata using the design variables.
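One way to read the standard error in the question, offered as a sketch rather than a verdict: subset() on a design object keeps all sampled clusters for variance estimation, so a single-district domain has one nonzero cluster total and zeros for every other sampled district. Assuming apiclus1 contains 15 sampled districts and an fpc value of 757 (both are assumptions to check against your copy of the data), the usual one-stage cluster variance formula can be evaluated by hand:

```r
total_hat <- 205824  # estimated total reported in the question
n_psu <- 15          # sampled districts in apiclus1 (assumed)
N_psu <- 757         # fpc value in apiclus1 (assumed)

# Weighted cluster totals for the domain: one nonzero district, zeros elsewhere
z <- c(total_hat, rep(0, n_psu - 1))

# One-stage cluster variance: (1 - n/N) * n/(n-1) * sum((z_i - mean(z))^2)
var_hat <- (1 - n_psu / N_psu) * n_psu / (n_psu - 1) * sum((z - mean(z))^2)
se_hat <- sqrt(var_hat)
print(se_hat)
```

Under these assumptions the result lands very close to the reported SE of 203774. That suggests the standard error mostly reflects the chance of district 637 being drawn at all, which is why it is nearly as large as the estimate itself.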