Produce Wally Plot using R

102 views Asked by At

I want to produce Wally plot using R and the dataset savings from the faraway package.

You can load the data and fit a linear model using this command:

library(faraway)
model <- lm(sr ~ ., data = savings)
summary(model)

Wally plots consist of several "ideal" reference plots that are obtained through simulations. Their intention is to reflect data for which model assumptions hold. Wally plots can be obtained as follows:

  • simulate new responses using estimates from "model" together with simulated error terms verifying the necessary assumptions,
  • refit the model using the simulated responses,
  • re-calculate fitted values and residuals, and
  • plot fitted values vs. residuals
  • (these steps might be repeated e.g. 10 times to produce several plots)

I need the R code to produce these plots. I hope everything is clear !

Thanks for your help.

2

There are 2 answers

3
Allan Cameron On BEST ANSWER

You could do

library(MESS)

wallyplot(model)

enter image description here

0
Adrien Riaux On

Another way to do it without using the MESS package is:

library(faraway)
library(dplyr)
library(ggplot2)

# Fit a linear model
model <- lm(sr ~ ., data = savings)

# Use the model to predict values
new_data <- select(savings, -sr)
y_pred <- predict(model, newdata = new_data)

# Simulate new responses using the fitted values and add normal distribution errors
sim.responses <- matrix(
  y_pred + rnorm(n = nrow(savings) * 9, mean = 0, sd = sigma(model)), 
  ncol = 9
)

# Add names to the matrix
colnames(sim.responses) <- paste("sim", 1:9, sep = "")
# Convert into a dataframe
sim.responses <- data.frame(sim.responses)

# Create a list for each new models
sim.models <- list()

# Loop through each column in sim.responses
for (col in colnames(sim.responses)) {
  # Fit a linear model using the column as the response variable
  model <- lm(paste(col, " ~ .", sep = ""), data = cbind(select(savings, -sr), sim.responses[col]))

  # Add the model to the list
  sim.models[[col]] <- model
}

# Create a dataframe for residuals and fitted values
resid_df <- data.frame()

for (name in colnames(sim.responses)){
  # Compute standardize residuals and fitted values
  std_residual <- rstandard(sim.models[[name]])
  fit_val <- fitted.values(sim.models[[name]])

  # Combine residuals and fitted values into data.frame
  model_df <- data.frame(Name = rep(name, length(std_residual)),
                         Fitted_Value = fit_val,
                         Standardized_Residual = std_residual)

  # Append model_df to resid_df
  resid_df <- rbind(resid_df, model_df)
}

# Plot
ggplot(data = resid_df, aes(x = Fitted_Value, y = Standardized_Residual)) +
  geom_point(alpha = 0.7) +
  geom_hline(yintercept = 0, alpha = 0.7, linetype = "dashed") +
  facet_wrap(~ Name, ncol = 3)

Here it's an example to produce 9 plots.

The resulting plot