Pass offset arguments into lm function


I am fitting a linear regression and would like to fix the coefficients of some inputs. I have found that this can be done with offset. Here is an example:

set.seed(145)
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))

summary(lm(formula = a ~ . + offset(0.1*c) - c + offset(0.05*d) - d, data = df))

The problem is that I have many more variables, and I would like to generate my lm formula automatically.

Let's say I want to pass the names of the inputs (which are columns of the data passed to lm) and the values of their coefficients, for example like this:

inputs_fix <- c("c", "d")
inputs_fix_coef <- c(0.1, 0.05)

Then I need a function that writes a formula like the one above, but I don't know how to build the expression offset(0.1*c) - c + offset(0.05*d) - d from the inputs_fix and inputs_fix_coef objects.

Is it possible? Is there another (more elegant) way to fix coefficients? I appreciate any help.

UPDATE: following @Jan van der Laan's suggestion, I built the formula with paste0 and as.formula:

my.formula <- paste0(" + offset(", inputs_fix_coef, "*", inputs_fix, ") - ", inputs_fix, collapse = " ")
lm.fit <- lm(formula = as.formula(paste0("a ~ .", my.formula)), data = df)

It is less readable, but it keeps all the inputs in the lm object (lm.fit$model), whereas they are lost with the approach in @Jan van der Laan's answer. It also avoids duplicating the data.frame.
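The paste0/as.formula approach from the update can be wrapped in a small function, which is what the question asks for. This is only a sketch: the name make_offset_formula and its arguments are made up for illustration, and it assumes the fixed variables are plain column names of the data.

```r
# Hypothetical helper (not part of base R): builds
#   response ~ . + offset(coef1 * var1) - var1 + offset(coef2 * var2) - var2 ...
make_offset_formula <- function(response, vars, coefs) {
  stopifnot(length(vars) == length(coefs))
  # One "offset(coef * var) - var" piece per fixed input
  pieces <- paste0("offset(", coefs, " * ", vars, ") - ", vars)
  as.formula(paste(response, "~ . +", paste(pieces, collapse = " + ")))
}

set.seed(145)
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))
inputs_fix <- c("c", "d")
inputs_fix_coef <- c(0.1, 0.05)

f <- make_offset_formula("a", inputs_fix, inputs_fix_coef)
fit_auto <- lm(f, data = df)

# Same model written by hand, for comparison
fit_manual <- lm(a ~ . + offset(0.1*c) - c + offset(0.05*d) - d, data = df)
same_coefs <- isTRUE(all.equal(coef(fit_auto), coef(fit_manual)))
```

If same_coefs is TRUE, the generated formula fits the same model as the hand-written one from the question.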

1 Answer

Jan van der Laan (best answer)

One way of handling this would be to calculate a new column with your total offset and remove the columns used in your offset from the data set:

# create a copy of the data without the columns used in the offset
dat <- df[-match(inputs_fix, names(df))]

# calculate offset
dat$offset <- 0
for (i in seq_along(inputs_fix)) 
  dat$offset <- dat$offset + df[[inputs_fix[i]]]*inputs_fix_coef[i]

# run regression
summary(lm(formula = a ~ . + offset(offset) - offset, data = dat))
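As a possible variation on the code above, the total offset can be computed in one step as a matrix product and passed through the offset argument of lm instead of being stored as a column. This is a sketch, assuming all fixed inputs are numeric columns; the names off, fit_offset and fit_manual are only illustrative.

```r
set.seed(145)
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))
inputs_fix <- c("c", "d")
inputs_fix_coef <- c(0.1, 0.05)

# Total offset: 0.1 * c + 0.05 * d, computed as a matrix-vector product
off <- as.vector(as.matrix(df[inputs_fix]) %*% inputs_fix_coef)

# Copy of the data without the columns used in the offset
dat <- df[setdiff(names(df), inputs_fix)]

# The offset is not a column of dat, so "." does not pick it up
fit_offset <- lm(a ~ ., data = dat, offset = off)

# Model from the question, for comparison
fit_manual <- lm(a ~ . + offset(0.1*c) - c + offset(0.05*d) - d, data = df)
same_coefs <- isTRUE(all.equal(coef(fit_offset), coef(fit_manual)))
```

Because the offset lives outside dat, the formula no longer needs the "+ offset(offset) - offset" part.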

It is also always possible to generate your formula as a character string (using paste etc.) and then convert it to a formula object using as.formula, but I suspect the solution above is cleaner.