Using mice in R to limit the possible values imputed for a variable

I have a question very similar to Alexia's (Imputing a categorical variable with MICE but restricting the possible values). However, I don't think I can use the answer that makes the most sense to me, because my dataset has multiple columns that need imputing and that also contain values I do not want mice to treat as valid, imputable responses. For example, a person's answer to a question about how many cigarettes they smoke per day is coded -7 because they indicated on an earlier item that they don't smoke cigarettes. I don't want mice to impute -7 for this item in rows/observations that currently contain an NA. Some of my variables have several such values that I want to exclude as possible outcomes. Usually, the value I want to exclude represents data that are missing for a reason I don't want to impute, like the skip code above.

So if I remove every row that contains -7 in the "how many cigarettes?" column, I may also remove NAs in other columns of those rows that I want imputed. I've searched for solutions to this and can't find anything.
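
One way to make the restriction explicit, sketched below under my own assumptions, is a named list that maps each variable to the codes that must never be imputed for it. The names `restricted` and `imputed_restricted()` are hypothetical and not part of mice, and the codes listed are only illustrative. Given the original data and a completed data set (for example one returned by mice's `complete()`), the helper flags every originally missing cell that ended up with a restricted code, which at least makes the problem checkable after imputation.

```r
# Hypothetical per-variable codes that must never be imputed (illustrative only)
restricted <- list(var1 = c("-5", "-7"),
                   var2 = c("-7"),
                   var3 = c("-7"))

# Flag cells that were NA in `original` but hold a restricted code in `completed`.
# Returns a logical matrix with one column per variable named in `restricted`.
imputed_restricted <- function(original, completed, restricted) {
  vars <- names(restricted)
  flags <- sapply(vars, function(v) {
    was_missing <- is.na(original[[v]])
    # as.character() so factor and character columns are compared the same way
    was_missing & as.character(completed[[v]]) %in% restricted[[v]]
  })
  # sapply() returns a plain vector when there is only one row; force a matrix
  matrix(flags, nrow = nrow(original), dimnames = list(NULL, vars))
}

# Usage with a hand-made "completed" data set standing in for real mice output
original  <- data.frame(var1 = c(NA, "1", "-7"), var2 = c("2", NA, "-7"), var3 = c(NA, "3", "1"))
completed <- data.frame(var1 = c("-7", "1", "-7"), var2 = c("2", "4", "-7"), var3 = c("3", "3", "1"))
flags <- imputed_restricted(original, completed, restricted)
colSums(flags)  # var1 = 1, var2 = 0, var3 = 0
```

Only row 1 of var1 is flagged: it was NA and received -7. The observed -7 in row 3 is left alone because it was never missing.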

The code below creates a dataset that approximates my problem.

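```r
# Toy data: real NA values (not the string "NA"), all columns stored as factors
df <- data.frame(var1 = c(NA, "1", "-5", "3", "-7"),
                 var2 = c("-7", "2", "4", "-7", "5"),
                 var3 = c("1", "2", "3", "-7", NA),
                 stringsAsFactors = TRUE)
print(df)
#>   var1 var2 var3
#> 1 <NA>   -7    1
#> 2    1    2    2
#> 3   -5    4    3
#> 4    3   -7   -7
#> 5   -7    5 <NA>
```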

Assume I want to use all variables as predictors and impute missing values in all variables, and assume all variables are factors. If I remove rows 1, 4 and 5 to eliminate the possibility of R imputing -7 as a valid value for any variable, I also prevent R from imputing a value for var1 in row 1 and var3 in row 5. Additionally, I remove 3 as a valid, imputable response for var1, because row 4 is the only row where it appears. I'm working with ~6200 observations, so I doubt this last situation would occur in my actual dataset, but it is theoretically possible, so I'd prefer to avoid it too. There is NOT this much missingness in my actual data; I'm simply trying to illustrate the problem(s) I'm experiencing.
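
To make the trade-off concrete, here is a small base-R sketch (no mice calls) that applies the row-deletion idea to the toy data and counts what is lost. It assumes the toy data is stored with real `NA`s and factor columns, and it only looks at the code -7.

```r
# Toy data from the question, with real NAs and factor columns
df <- data.frame(var1 = c(NA, "1", "-5", "3", "-7"),
                 var2 = c("-7", "2", "4", "-7", "5"),
                 var3 = c("1", "2", "3", "-7", NA),
                 stringsAsFactors = TRUE)

# Rows with -7 in any column (NA cells are skipped, not counted as -7)
has_code <- rowSums(df == "-7", na.rm = TRUE) > 0
which(has_code)   # rows 1, 4 and 5

# NA cells that would be discarded along with those rows
lost_na <- colSums(is.na(df[has_code, ]))
lost_na           # var1: 1 (row 1), var2: 0, var3: 1 (row 5)

# Levels of var1 that no longer occur once those rows are dropped
kept <- droplevels(df[!has_code, ])
lost_levels <- setdiff(levels(df$var1), levels(kept$var1))
lost_levels       # "-7" (intended) and "3" (not intended); order may depend on locale
```

Deleting the rows removes -7 as intended, but it also throws away two NAs that should be imputed and the only observed 3 in var1.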

I haven't tried anything yet because I haven't found a solution. Post-processing with squeeze() doesn't work (I don't think) with a factor variable. I considered using the "where" argument with a matrix in which no restricted value is marked for imputation, but I don't know whether that stops mice from using those values when it builds its imputations. I have ~530 variables, ~30 of which I want imputed, and I haven't been eager to write the code for that without more assurance that it would actually restrict the imputation. This GitHub comment (https://github.com/amices/mice/issues/224#issuecomment-693935305) seemed like a possible solution, but I'm not a natural coder (by any means), so I wasn't certain how to implement it with my dataset.
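
Building the `where` matrix for ~30 of ~530 variables does not need much code. The sketch below uses a small random stand-in data set: `dat` and `targets` are hypothetical placeholders for the real data and the real list of variables to impute. It marks only the missing cells of the target columns as cells to impute; cells holding -7 are observed, so they are never marked.

```r
# Hypothetical stand-in: `dat` plays the role of the full data set and
# `targets` the names of the variables that should be imputed
set.seed(1)
dat <- data.frame(matrix(sample(c(1:3, -7, NA), 50 * 10, replace = TRUE), nrow = 50))
targets <- c("X2", "X5", "X9")

# where[i, j] is TRUE only for cells that should receive an imputation:
# missing cells in the target columns; everything else stays FALSE
where <- is.na(dat)
where[, setdiff(colnames(dat), targets)] <- FALSE

colSums(where)  # non-zero counts only for X2, X5 and X9
```

The matrix would then be passed as mice(dat, where = where). The mice documentation describes `where` as indicating where in the data imputations should be created, so whether it also keeps observed -7 values out of the imputation model is exactly the open question here.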

Am I missing something obvious? Help? Thank you!