I have a dataset consisting of two continuous variables, X1 and X2, with missing values in both, and I need to impute the missing data. I am working with the mice package in R. The trouble is that the values in one column are constrained by the other, specifically X1 >= X2. However, when I run mice, it imputes values that violate this condition.
Here is a minimal working example:
library(MASS)
library(tidyverse)
library(mice)
p1 <- 0.7
p2 <- 0.65
sample_size <- 100
sample_meanvector <- c(5, 5)
sample_covariance_matrix <- matrix(c(10, 5, 5, 9), ncol = 2)
mvrnorm(
  n = sample_size,
  mu = sample_meanvector,
  Sigma = sample_covariance_matrix) %>%
  data.frame() %>%
  as_tibble() %>%
  mutate(R1 = rbinom(sample_size, 1, p1)) %>%
  mutate(R2 = rbinom(sample_size, 1, p2)) %>%
  mutate(X1 = ifelse(R1 == 1, X1, NA)) %>%
  mutate(X2 = ifelse(R2 == 1, X2, NA)) %>%
  dplyr::select(X1, X2) %>%
  filter(X1 >= X2 | is.na(X1) | is.na(X2)) -> sample_data
sample_data %>%
  ggplot(aes(x = X1, y = X2)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = 'red')
mice(sample_data, m = 1) -> mids
complete(mids, 1) -> imputed_data
imputed_data %>%
  ggplot(aes(x = X1, y = X2)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = 'red')
I understand that I need to use the post feature somehow, but I cannot find sufficiently detailed documentation on it, particularly for the situation where the imputed values are constrained by other imputed values in the same dataset. Please help.
The easiest solution to your problem is to use a different R package: smcfcs.
If you do want to use mice, what is the specific conditioning that you need? The conditional imputation example in FIMD squeezes the imputed values within a certain range.
Otherwise, take a look at the mice postprocessing vignette or this answer.
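For the mice route, here is a rough sketch of how the post argument could be used to enforce X1 >= X2. It follows the same pattern as the FIMD squeeze example, but clamps each variable against the current values of the other one with pmax/pmin instead of a fixed range. The toy data, the seeds, and the object names (toy, ini, imp) are my own assumptions, and the post strings rely on objects (imp, data, where, i, j) that exist inside the mice sampler when the string is evaluated, so treat this as a sketch rather than a documented recipe.

```r
library(mice)

set.seed(1)

# Hypothetical toy data in which X1 >= X2 holds by construction.
n  <- 200
x2 <- rnorm(n, mean = 5, sd = 3)
x1 <- x2 + rexp(n, rate = 0.5)
toy <- data.frame(X1 = x1, X2 = x2)
toy$X1[sample(n, 60)] <- NA
toy$X2[sample(n, 70)] <- NA

# Dry run (maxit = 0) to obtain the default, empty post-processing vector.
ini  <- mice(toy, maxit = 0, printFlag = FALSE)
post <- ini$post

# Each string is evaluated right after the variable has been imputed.
# imp[[j]][, i] holds the fresh imputations for variable j in imputation i;
# data holds the current completed data, and where[, j] flags the imputed rows.
# Assumption: these internal names are available when the string is evaluated.
post["X1"] <- "imp[[j]][, i] <- pmax(imp[[j]][, i], data$X2[where[, j]])"
post["X2"] <- "imp[[j]][, i] <- pmin(imp[[j]][, i], data$X1[where[, j]])"

imp <- mice(toy, m = 1, post = post, seed = 123, printFlag = FALSE)
imputed_data <- complete(imp, 1)

# Count rows that still violate the constraint.
sum(imputed_data$X1 < imputed_data$X2)
```

Note that clamping like this piles imputed values up on the X1 = X2 boundary, so it is worth plotting the imputed data again before relying on it.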