4. Now fit a linear model for each metric and use the confint function to compare the estimates. (Batting)

Here's what I've done so far. I'm having difficulty fitting the regression line and getting the confidence intervals.

  1. Before we get started, we want to generate two tables: one for 2002 and another for the average of the 1999-2001 seasons. We want to define per plate appearance statistics. Here is how we create the 2002 table, keeping only players with more than 100 plate appearances. Now compute a similar table but with rates computed over 1999-2001.
library(Lahman)
library(dplyr)  # provides %>%, filter, mutate, select, inner_join, summarise
data("Batting")
avg <- Batting %>% filter(yearID %in% 1999:2001) %>%
  mutate(pa = AB + BB, 
         singles = (H - X2B - X3B - HR) / pa, bb = BB / pa) %>%
  filter(pa >= 100) %>%
  group_by(playerID) %>%  # one row per player, averaged over the qualifying seasons
  summarize(avg_singles = mean(singles), avg_bb = mean(bb))

dat <- Batting %>% filter(yearID == 2002) %>%
  mutate(pa = AB + BB, 
         singles = (H - X2B - X3B - HR) / pa, bb = BB / pa) %>%
  filter(pa >= 100) %>%
  select(playerID, singles, bb)
  2. Compute the correlation between 2002 and the previous seasons for singles and BB.
dat <- inner_join(dat, avg, by = "playerID")
rdat <- dat %>% 
  summarise(singles_r = cor(singles,avg_singles ), bb_r = cor(bb, avg_bb ))
rdat
  3. Note that the correlation is higher for BB. To quickly get an idea of the uncertainty associated with this correlation estimate, we will fit a linear model and compute confidence intervals for the slope coefficient. However, first make scatterplots to confirm that fitting a linear model is appropriate.
library(ggplot2)
dat %>% 
  ggplot(aes(singles,avg_singles))+
  geom_point(alpha = 0.5)

dat %>% 
  ggplot(aes(bb,avg_bb))+
  geom_point(alpha = 0.5)
  4. Now fit a linear model for each metric and use the confint function to compare the estimates.

There are 2 answers

abdullah On

I would use the lm function to fit the model, with the 2002 rate as the response and the 1999-2001 average as the predictor.
Example:

lm(singles ~ avg_singles, data = dat)

Do the same for bb, regressing bb on avg_bb.
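The question also asks for confint, so here is a minimal sketch of the full step. It rebuilds the tables so that it runs on its own, and it averages the 1999-2001 rates per player, which is an assumption about what "average of 1999-2001" means here. The names rates, dat_2002, avg_9901, fit_singles and fit_bb are made up for this example and do not come from the question.

```r
library(Lahman)
library(dplyr)

# Per-plate-appearance rates, using the question's definition pa = AB + BB
rates <- Batting %>%
  mutate(pa = AB + BB,
         singles = (H - X2B - X3B - HR) / pa,
         bb = BB / pa) %>%
  filter(pa >= 100)

# 2002 table, as in the question
dat_2002 <- rates %>%
  filter(yearID == 2002) %>%
  select(playerID, singles, bb)

# Assumption: one averaged row per player for 1999-2001
avg_9901 <- rates %>%
  filter(yearID %in% 1999:2001) %>%
  group_by(playerID) %>%
  summarize(avg_singles = mean(singles), avg_bb = mean(bb))

dat <- inner_join(dat_2002, avg_9901, by = "playerID")

# One model per metric: 2002 rate regressed on the earlier average
fit_singles <- lm(singles ~ avg_singles, data = dat)
fit_bb <- lm(bb ~ avg_bb, data = dat)

# 95% confidence intervals (the default level) for the slope only
confint(fit_singles, "avg_singles")
confint(fit_bb, "avg_bb")
```

Calling confint(fit_singles) without the second argument also returns the interval for the intercept. Comparing the two slope intervals shows how precisely each relationship is estimated.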

DnLusho On

What is the correlation between 2002 singles rates and 1999-2001 average singles rates?

The following code can be used to determine the correlation:

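# bat_02 and bat_99_01 are not defined in the question; they are presumably the 2002 table
# and the per-player 1999-2001 averages (with columns mean_singles and mean_bb)
dat <- inner_join(bat_02, bat_99_01, by = "playerID")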
cor(dat$singles, dat$mean_singles)

# Correct answer:

[1] 0.5509222

What is the correlation between 2002 BB rates and 1999-2001 average BB rates?

The following code can be used to determine the correlation:

cor(dat$bb, dat$mean_bb)

# Correct answer:

[1] 0.7174787
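The tables bat_02 and bat_99_01 used above are not defined anywhere in the question. As a rough sketch, they could be built as follows, assuming the same rate definitions as the question (pa = AB + BB, at least 100 plate appearances) and a per-player mean over the 1999-2001 seasons. The column names mean_singles and mean_bb are taken from the calls above; everything else is a guess at the intended setup.

```r
library(Lahman)
library(dplyr)

# 2002 rates per plate appearance
bat_02 <- Batting %>%
  filter(yearID == 2002) %>%
  mutate(pa = AB + BB,
         singles = (H - X2B - X3B - HR) / pa,
         bb = BB / pa) %>%
  filter(pa >= 100) %>%
  select(playerID, singles, bb)

# 1999-2001 rates, averaged per player (assumed meaning of "average")
bat_99_01 <- Batting %>%
  filter(yearID %in% 1999:2001) %>%
  mutate(pa = AB + BB,
         singles = (H - X2B - X3B - HR) / pa,
         bb = BB / pa) %>%
  filter(pa >= 100) %>%
  group_by(playerID) %>%
  summarize(mean_singles = mean(singles), mean_bb = mean(bb))

dat <- inner_join(bat_02, bat_99_01, by = "playerID")

cor(dat$singles, dat$mean_singles)
cor(dat$bb, dat$mean_bb)
```

The exact values can depend on the installed version of the Lahman package, so they may not match the numbers quoted above to every digit.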