Obtaining standardized coefficients after multiple imputation with MICE in R


I ran multiple imputations to deal with my missing data. Then I used the with() and pool() functions to run a linear regression for my dataset and get a pooled estimate. I am trying to predict a score across two groups (intervention and control).

Because I have so many variables and scores, I ran the imputations in blocks: the questions belonging to one scale are imputed together, then the questions of the next scale, and so on.

Now I want to get standardized coefficients.

I tried to standardize my dataset before the imputation, but the standardized estimate is very close to the unstandardized estimate (1.53 vs. -1.82). Does that make sense?

When I standardize the final scale directly instead of standardizing each question and then summing them at the regression step, I get a very small standardized coefficient (-0.24).
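One possible reason the two approaches differ so much: a sum of z-scored items is not itself on a standard-deviation scale, because the items are correlated and their sum has an SD well above 1. The sketch below illustrates this with simulated data in base R. The variable names (`items`, `group`, `latent`) and the data-generating process are assumptions made for illustration only, not the real dataset.

```r
set.seed(1)
n <- 200
group <- rep(c(0, 1), each = n / 2)      # hypothetical 0/1 group indicator
latent <- rnorm(n) - 0.3 * group         # shared trait behind all six items
items <- sapply(1:6, function(i) latent + rnorm(n))  # six correlated items
colnames(items) <- paste0("Q", 1:6)

# Approach A: standardize each item, then sum the z-scores
sum_of_z <- rowSums(scale(items))

# Approach B: sum the raw items, then standardize the total
z_of_sum <- as.numeric(scale(rowSums(items)))

sd(sum_of_z)  # well above 1 because the items are positively correlated
sd(z_of_sum)  # 1 by construction

# Slopes for the group effect under each outcome definition
b_a <- coef(lm(sum_of_z ~ group))["group"]
b_b <- coef(lm(z_of_sum ~ group))["group"]

# Dividing slope A by the SD of its outcome expresses it in SD units
b_a_std <- b_a / sd(sum_of_z)
c(sum_of_z = unname(b_a), z_of_sum = unname(b_b), rescaled_A = unname(b_a_std))
```

Under these assumptions, only approach B (or approach A after rescaling) gives a group difference in SD units of the outcome. Since the predictor here is a binary group, this is a partially standardized coefficient: the outcome is standardized, the group indicator is not.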

My two questions are:

  1. Which method is more accurate: standardizing each question, or standardizing the final scale?
  2. How can I obtain standardized betas after imputation?

Here is my code to explain the things above.

`# Load packages: dplyr for %>% and select(), mice for imputation and pooling
library(dplyr)
library(mice)

# Read data
data <- read.csv("post_for_imputation.csv")

# Columns to impute
columns_to_check4 <- c(
  "post_BSocialMAddictionS_Q1", "post_BSocialMAddictionS_Q2",
  "post_BSocialMAddictionS_Q3", "post_BSocialMAddictionS_Q4",
  "post_BSocialMAddictionS_Q5", "post_BSocialMAddictionS_Q6"
)

# Keep only those columns
selected_columns <- data %>%
  select(all_of(columns_to_check4))

# Use the scale() function to standardize each item
j <- scale(selected_columns)
j_df <- as.data.frame(j)

# Add my independent variable. It is categorical and does not work with
# scale(), which is why I add it after scaling. It has no missing values.
column_to_add <- data$group_post

# Add the column to the scaled data frame
j_df <- cbind(j_df, group_post = column_to_add)

# Run my imputation
imputed_data <- mice(j_df, m = 5, maxit = 10, seed = 500)

# Fit the regression in each imputed dataset, then pool the results
X <- with(imputed_data, lm(
  I(as.numeric(post_BSocialMAddictionS_Q1) +
    as.numeric(post_BSocialMAddictionS_Q2) +
    as.numeric(post_BSocialMAddictionS_Q3) +
    as.numeric(post_BSocialMAddictionS_Q4) +
    as.numeric(post_BSocialMAddictionS_Q5) +
    as.numeric(post_BSocialMAddictionS_Q6))
  ~ group_post))
summary(pool(X))`

This method gives me a standardized coefficient that is very close to the unstandardized one. Is there a better way to do this? Is it even accurate? And which one should I report: the standardized coefficient from standardizing the summed scale directly, or the one from summing the standardized items in the regression (as in the code above)?
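One way to approach the second question, offered as a sketch rather than a definitive answer: impute the raw (unstandardized) items, then standardize the summed scale inside each completed dataset and pool the resulting coefficients with `pool()`. The example below uses simulated data, and the names `dat`, `Q1` to `Q6`, and the missingness pattern are assumptions for illustration. Only `mice()`, `with()`, `pool()`, and `summary()` from the original code are relied on.

```r
library(mice)

set.seed(500)
n <- 200
group_post <- factor(rep(c("control", "intervention"), each = n / 2))
latent <- rnorm(n) - 0.3 * (group_post == "intervention")

# Six hypothetical items driven by one latent trait
dat <- data.frame(sapply(1:6, function(i) round(latent + rnorm(n) + 3)))
names(dat) <- paste0("Q", 1:6)
dat$group_post <- group_post

# Set 30 randomly chosen responses per item to missing (illustration only)
for (q in paste0("Q", 1:6)) dat[sample(n, 30), q] <- NA

# Impute the raw items; group_post is complete and serves as a predictor
imp <- mice(dat, m = 5, maxit = 10, seed = 500, printFlag = FALSE)

# In each completed dataset: sum the items, standardize the total, fit the model
fit <- with(imp, lm(as.numeric(scale(Q1 + Q2 + Q3 + Q4 + Q5 + Q6)) ~ group_post))

# Pool the per-dataset estimates
pooled <- summary(pool(fit))
pooled
```

The pooled `group_post` estimate is then the group difference in SD units of the total score. Standardizing after imputation means each completed dataset uses its own mean and SD, which is a design choice. Standardizing before imputation, as in the code above, instead fixes the scaling on the observed cases only.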
