How to plot a confidence interval in R

I need to plot a confidence interval for a prediction I ran. The prediction itself works, but when I plot it I get a line running through all of my data points rather than the actual confidence interval.

GunRate <- seq(0,100, length = 51)

LinearPredictionA <- predict(ModelA, 
    interval = "confidence", 
    newdata = data.frame(ProportionAdultsLivingWithGun = GunRate, 
                         LogMedianIncome = FinalSet$LogMedianIncome, 
                         PctofPeopleinMetro = FinalSet$PctofPeopleinMetro, 
                         PovertyRate = FinalSet$PovertyRate))

##This is my prediction model

plot(x = FinalSet$ProportionAdultsLivingWithGun, 
     y = FinalSet$ViolentCrime1K, 
     col = "red", 
     xlim = c(0, 80), ylim = c(0, 15), 
     xlab ="Proportion of Adults Living With a Gun", 
     ylab = "Violent Crime Rate per 1000", 
     main = "Violent Crime vs. Gun Ownership", 
     sub = "All 50 States & D.C.")

## This plot shows the actual data we used to obtain the prediction


lines(GunRate, LinearPredictionA[, "fit"], type = "l")
lines(GunRate, LinearPredictionA[, "lwr"], lty = "dashed", col = "green")
lines(GunRate, LinearPredictionA[, "upr"], lty = "dashed", col = "green")

These lines() calls are supposed to draw my CI, but instead I get the following graph:

[image: plot]

There is 1 answer

eipi10

Here's an example of what's going wrong, using the built-in mtcars data frame:

# Regression model
m1 = lm(mpg ~ wt + hp + cyl, data=mtcars)

Now let's get predictions of mpg vs. wt, but with 2 different alternating values of hp and 3 different alternating values of cyl:

predData = data.frame(wt=seq(1,5,length=60), hp=rep(c(200,300), 30), cyl=rep(c(4,6,8), 20))
predData = cbind(predData, predict(m1, newdata=predData, interval="confidence"))

Note how the prediction jumps around, because hp and cyl change for each successive value of wt:

plot(predData$wt, predData$fit, type="l")
lines(predData$wt, predData$lwr, type="l", col="red")
lines(predData$wt, predData$upr, type="l", col="red")
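
The jumping can also be confirmed without a plot by checking whether the fitted values keep moving in one direction as wt increases, compared with a grid where hp and cyl never change. This is a minimal sketch that rebuilds the model so it runs on its own; the names jumpy, fixed and changes_direction are only illustrative:

```r
# Refit the example model on the built-in mtcars data
m1 <- lm(mpg ~ wt + hp + cyl, data = mtcars)
wt_grid <- seq(1, 5, length.out = 60)

# Grid where hp and cyl alternate from row to row (same layout as predData)
jumpy <- data.frame(wt = wt_grid,
                    hp = rep(c(200, 300), 30),
                    cyl = rep(c(4, 6, 8), 20))
jumpy$fit <- predict(m1, newdata = jumpy)

# Grid where hp and cyl are held at one value each (illustrative choice)
fixed <- data.frame(wt = wt_grid, hp = 300, cyl = 6)
fixed$fit <- predict(m1, newdata = fixed)

# TRUE if successive fitted values go both up and down along the grid
changes_direction <- function(fit) length(unique(sign(diff(fit)))) > 1

changes_direction(jumpy$fit)
changes_direction(fixed$fit)
```

If the explanation above holds, the first call should report TRUE (the fit zigzags) and the second FALSE (the fit moves steadily in one direction).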

But when we keep hp and cyl fixed, we get a straight line prediction for mpg vs. wt:

predData2 = data.frame(wt=seq(1,5,length=60), hp=rep(300,60), cyl=rep(6, 60))
predData2 = cbind(predData2, predict(m1, newdata=predData2, interval="confidence"))

plot(predData2$wt, predData2$fit, type="l")
lines(predData2$wt, predData2$lwr, type="l", col="red")
lines(predData2$wt, predData2$upr, type="l", col="red")
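
Applied to the question, the same idea means giving predict() a newdata in which only the x-axis variable varies and every other predictor is constant. ModelA and FinalSet are not available here, so the sketch below uses mtcars as a stand-in. Holding the other predictors at their sample means is just one possible choice (an assumption, not something the question specifies); any fixed values would give a smooth line:

```r
# Stand-in for ModelA: a linear model with several predictors
m1 <- lm(mpg ~ wt + hp + cyl, data = mtcars)

# Vary only wt; hold hp and cyl at their sample means (assumed choice).
# data.frame() recycles the single values to the length of wt.
newData <- data.frame(wt = seq(1, 5, length.out = 60),
                      hp = mean(mtcars$hp),
                      cyl = mean(mtcars$cyl))
ci <- predict(m1, newdata = newData, interval = "confidence")

# Observed data first, then the fitted line and the dashed CI bounds
plot(mtcars$wt, mtcars$mpg, col = "red",
     xlab = "Weight", ylab = "MPG")
lines(newData$wt, ci[, "fit"])
lines(newData$wt, ci[, "lwr"], lty = "dashed", col = "green")
lines(newData$wt, ci[, "upr"], lty = "dashed", col = "green")
```

For the model in the question, the analogous change would be to pass, for example, LogMedianIncome = mean(FinalSet$LogMedianIncome) instead of the whole FinalSet$LogMedianIncome column, and likewise for the other two predictors.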

Instead of a single line, you can also plot predicted mpg vs. wt lines for several values of another variable. Below is an example where we plot a line for each value of cyl that we used to create predData. This is easier with ggplot2 so I've used that package. Using lines for the confidence intervals would make the plot difficult to understand, so I've shown the CI with a fill instead:

library(ggplot2)

ggplot(subset(predData, hp==200), aes(wt, fit, fill=factor(cyl), colour=factor(cyl))) +
  geom_ribbon(aes(ymin=lwr, ymax=upr), alpha=0.2, colour=NA) +
  geom_line() +
  labs(x="Weight", y="Predicted MPG", colour="Cylinders", fill="Cylinders") +
  theme_bw()
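
If you'd rather stay in base graphics, a filled band can be drawn with polygon(). This is a rough sketch for a single line with hp and cyl fixed at the same values as predData2; the names band_x and band_y and the colour choice are only illustrative:

```r
m1 <- lm(mpg ~ wt + hp + cyl, data = mtcars)

# Prediction grid with hp and cyl held fixed
pd <- data.frame(wt = seq(1, 5, length.out = 60), hp = 300, cyl = 6)
pd <- cbind(pd, predict(m1, newdata = pd, interval = "confidence"))

# Outline of the band: lower bound left to right, then upper bound right to left
band_x <- c(pd$wt, rev(pd$wt))
band_y <- c(pd$lwr, rev(pd$upr))

# Empty plot sized to the band, then the shaded CI, then the fitted line on top
plot(pd$wt, pd$fit, type = "n", ylim = range(band_y),
     xlab = "Weight", ylab = "Predicted MPG")
polygon(band_x, band_y, col = adjustcolor("red", alpha.f = 0.2), border = NA)
lines(pd$wt, pd$fit)
```

Setting ylim from the band keeps the interval from being clipped, which can happen when the axis range is taken from the fitted values alone.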