All possible Regression in R: Saving coefficients in a matrix

8.1k views Asked by At

I am running code for all possible models of a phylogenetic generalised linear model. The issue I am having is extracting and saving the beta coefficients for each model.

I want to save the coefficients into a matrix, where the columns correspond to a specific variable and the rows correspond to a formula.The issue arises because the variables are different for every model. So one cannot simply row bind the coefficients to the matrix.

The example below shows where I am up to in the problem:

y = rnorm(10)
inpdv = matrix(c(rnorm(10), runif(10), rpois(10, 1)), ncol = 3)
colnames(inpdv) = c("A", "B", "C")
data = cbind(y, inpdv)

model.mat = expand.grid(c(TRUE,FALSE), c(TRUE,FALSE), c(TRUE,FALSE))
names(model.mat) = colnames(inpdv)

formula = apply(model.mat, 1, function(x)
                       paste(colnames(model.mat)[x], collapse=" + "))
formula = paste("y", formula, sep = " ~ ")
formula[8] = paste(formula[8], 1, sep = "")

beta = matrix(NA, nrow = length(formula), ncol = 3)

for(i in 1:length(formula)){
   fit = lm(formula(formula), data)
   ## Here I want to extract the beta coeffecients here into a n * k matrix
   ## However, I cannot find a way to assign the value to the right cell in the matrix

}

So I imagine each coefficient will need to be placed into the respective cell, but I cannot think of a quick and efficient way of doing so.

The true analysis will take place around 30, 000 times, so any tips on efficiency would also be appreciated.

Edit: So as an example, the output for a model of y ~ a + c will need to be in the form of

a NA b 

Where the letters represent the coefficient for that model. The next model may be y ~ b + c which would then be added in the bottom. So the result would now look like

a  NA b
NA b c
2

There are 2 answers

0
Simon O'Hanlon On BEST ANSWER

How about using names and %in% to subset the right columns. Extract the coefficient values using coef. Like this:

beta = matrix(NA, nrow = length(formula), ncol = 3)
colnames(beta) <- colnames(inpdv)

for(i in 1:length(formula)){
   fit = lm(formula(formula[i]), data)
    coefs <- coef(fit)
    beta[ i , colnames(beta) %in% names( coefs ) ] <- coefs[ names( coefs ) %in% colnames( beta ) ]
}
#              A          B         C
#[1,] -0.4229837 -0.0519900 0.3787666
#[2,]         NA  0.7015679 0.0555350
#[3,] -0.4165834         NA 0.3692974
#[4,]         NA         NA 0.1346726
#[5,] -0.2035173  0.7049951        NA
#[6,]         NA  0.7978726        NA
#[7,] -0.2229959         NA        NA
#[8,]         NA         NA        NA

I think it's perfectly acceptable to use a for loop here. For starters using something like lapply sometimes keep increasing memory usage as you run more and more of the simulations. R will sometimes not mark objects from earlier models as trash until the lapply loop finishes so so can sometimes get a memory allocation error. Using the for loop I find that R will reuse memory allocated to the previous iteration of the loop if necessary so if you can run the loop once, you can run it lots of times.

The other reason not to use a for loop is speed, but I would assume that the time to iterate is negligible compared to the time to fit the model so I would use it.

4
zero323 On
for(i in 1:length(formula)){
    fit = lm(formula(formula), data)
     beta[i, 1:length(fit$coefficients)] <- fit$coefficients
}

Update

Idea: name your columns after coefficients, and assign values to columns by name.

It is just a dummy example but should help you: Create your output matrix:

beta <- matrix(NA,  nrow=7, ncol=4)
colnames(beta) <- c("(Intercept)", 'A', 'B', 'C')

Create some dummy data:

 A <- rnorm(10)
 B <- rpois(10, 1)
 C <- rnorm(10, 2)
 Y <- rnorm(10, -1)

Now you can do something like that:

fit <- lm(Y ~ A + B + C)
beta[1, names(fit$coefficients)] <- fit$coefficients

fit <- lm(Y ~ A + B)
beta[2, names(fit$coefficients)] <- fit$coefficients

fit <- lm(Y ~ A + C)
beta[3, names(fit$coefficients)] <- fit$coefficients

fit <- lm(Y ~ B + C)
beta[4, names(fit$coefficients)] <- fit$coefficients

fit <- lm(Y ~ A)
beta[5, names(fit$coefficients)] <- fit$coefficients

fit <- lm(Y ~ B)
beta[6, names(fit$coefficients)] <- fit$coefficients

fit <- lm(Y ~ C)
beta[7, names(fit$coefficients)] <- fit$coefficients