Loops in R - linear regression

9.6k views Asked by At

I've just started out learning R and can't seem to get this loop to work. I have a data frame containing 250 rows and 503 columns (y) and another data frame containing 250 rows and 1 column (x).

I am trying to get a loop to run 503 separate regressions without having to type it in individually ie.

(output_1 <- lm(y$1st column ~ x))
(output_2 <- lm(y$2nd column ~ x))

across all 250 rows in each regression.

I tried this loop:

for (i in 1:503) {
output_loop <- lm(y[,i]~x)
}
output_total <- cbind(output$coefficients)

but this only gave me one intercept and one coefficient, as opposed to 503 intercepts and 503 coefficients.

The rows of each data frame have time markers that are aligned in the format yyyy-mm-dd but I don't believe this affects the regression as the intercept and coefficients output sought are independent of time.

I have also tried using a basic lm:

(output <- lm(y~x))
 output_total <- cbind(output$coefficients)

and this gives 503 intercepts and 503 coefficients, however the output is wrong when I spot checked the output against some columns (running individual regression as above).

Any help on this loop is much appreciated!

Thank you

3

There are 3 answers

6
Eric Green On

I'm not sure you're approaching this the best way, but here's something that I think will achieve what you described.

# create some toy data to match your description

set.seed(340)
y <- data.frame(replicate(503, runif(250, 0, 1)))
x <- data.frame(v1=runif(250, 0, 1))


out <- data.frame(NULL)              # create object to keep results
for (i in 1:length(y)) {
  m <- summary(lm(y[,i] ~ x[,1]))    # run model
  out[i, 1] <- names(y)[i]           # print variable name
  out[i, 2] <- m$coefficients[1,1]   # intercept
  out[i, 3] <- m$coefficients[2,1]   # coefficient
}
names(out) <- c("y.variable", "intercept", "coef.x")
head(out)

# y.variable intercept       coef.x
# 1         X1 0.4841710 -0.015186852
# 2         X2 0.4972775 -0.002306964
# 3         X3 0.4410326  0.096450185
# 4         X4 0.4547249  0.041582039
# 5         X5 0.5039661  0.062429142
# 6         X6 0.5331573 -0.092806309
2
Seth On

Your loop is close you just need to create a place to catch the results.

output_loop=list(NA)
for (i in 1:503) {
output_loop[[i]] <- lm(y[,i]~x)

}

If you just want the coefficients in a data.frame then restructure things to catch just the two coefficients from each model

output_loop=data.frame(int=NA,slope=NA)
for (i in 1:503) {
  output_loop[i,] <- coefficients(lm(y[,i]~x))

}
0
albgarre On

Here is a "tidy" solution based on the tidyverse and broom packages:

library(tidyverse)
library(broom)


set.seed(340)
y <- data.frame(replicate(503, runif(250, 0, 1)))
x <- data.frame(v1=runif(250, 0, 1))

y %>%
    pivot_longer(everything(), names_to = "variable", values_to = "y") %>%
    split(.$variable) %>%
    map(., ~ mutate(., x = x$v1)) %>% # Note that v1 is the name of MY data frame x
    map(., ~ lm(y ~ x, data = .)) %>%
    map(tidy) %>% # This bit here extracts the model parameters. If you want any other information, you could use summary()
    imap_dfr(., ~ mutate(.x, variable = .y))

#> # A tibble: 1,006 x 6
#>    term         estimate std.error statistic  p.value variable
#>    <chr>           <dbl>     <dbl>     <dbl>    <dbl> <chr>   
#>  1 (Intercept)  0.484       0.0361  13.4     3.72e-31 X1      
#>  2 x           -0.0152      0.0631  -0.241   8.10e- 1 X1      
#>  3 (Intercept)  0.499       0.0356  14.0     2.84e-33 X10     
#>  4 x           -0.00601     0.0622  -0.0967  9.23e- 1 X10     
#>  5 (Intercept)  0.492       0.0344  14.3     3.03e-34 X100    
#>  6 x            0.000321    0.0600   0.00534 9.96e- 1 X100    
#>  7 (Intercept)  0.530       0.0361  14.7     1.65e-35 X101    
#>  8 x           -0.0972      0.0631  -1.54    1.25e- 1 X101    
#>  9 (Intercept)  0.534       0.0357  15.0     1.77e-36 X102    
#> 10 x           -0.0144      0.0623  -0.231   8.18e- 1 X102    
#> # ... with 996 more rows

Created on 2019-12-04 by the reprex package (v0.3.0)