I am trying to apply the expectation-maximization algorithm to estimate missing count data but all the packages in R, such as missMethods, assume a multivariate Gaussian distribution. How would I apply the expectation-maximization algorithm to estimate missing count data assuming a Poisson distribution?
Say we have data that look like this:
x <- c(100, 96, 79, 109, 111, NA, 93, 95, 119, 90, 121, 96, NA,
NA, 85, 95, 110, 97, 87, 104, 101, 87, 87, NA, 89, NA,
113, NA, 95, NA, 119, 115, NA, 105, NA, 80, 90, 108, 90,
99, 111, 93, 99, NA, 87, 89, 87, 126, 101, 106)
Applying impute_EM using missMethods (missMethods::impute_EM(x, stochastic = FALSE)) gives an answer but the data are not continuous but discrete.
I understand that questions like these require a minimum, reproducible example, but I honestly do not know where to start. Even suggested reading to point me in the right direction would be helpful.
Defining
x0:The Jeffreys/reference prior for a Poisson distribution with mean
lambdais1/sqrt(lambda). From the observed values, this results inlambdahaving a gamma reference posterior with a shape parametersum(x0) + 0.5and a rate parameter1/length(x0). You could takensamples oflambdawith:Then sample
nmissing values (xm) withAlternatively, since a Gamma-Poisson compound distribution can be formulated as a negative binomial (after integrating out
lambda):As a function:
This will return a matrix with
ncolumns where each column contains the original vectorxwith allNAvalues imputed. Each column could be used separately in further analysis, then the results can be aggregated.