For Loop alternatives for progressive operations

I have to apply a regression function progressively to time series data (vectors "time" and "tm"), and I'm using a for loop as follows:

top <- length(time)
slope <- numeric(top)                # slope[k] = fitted slope using the first k points
for (k in 2:top) {
    lin.regr <- lm(tm[1:k] ~ log(time[1:k]))
    slope[k] <- coef(lin.regr)[2]
}

But for vector lengths of about 10k it becomes very slow. Is there a faster alternative (maybe using an apply function)?

As a simpler example: if I have a vector like x <- c(1:10), how can I build a vector y containing (for example) the progressive sum of the values of x? Like:

x
1 2 3 4 5 6 7 8 9 10
y
1  3  6 10 15 21 28 36 45 55

There are 2 answers

darckeen (BEST ANSWER)
results <- sapply(2:top,function (k) coef(lm(tm[1:k] ~ log(time[1:k])))[2])

The *apply family of functions is generally the fastest way to iterate in R.

You can also look at using lm.fit() to speed up your regression a bit.

cumsum(1:10)

answers the second question.
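
For example, with the vector from the question:

> x <- c(1:10)
> cumsum(x)
 [1]  1  3  6 10 15 21 28 36 45 55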

Joris Meys

Well, there is no fast alternative to a loop unless you can vectorize. In some circumstances functions like ave, aggregate, ddply, tapply, ... can give you a substantial win, but often the trick lies in using faster functions, like cumsum (cf. the answer of user615147).
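
For instance, a small sketch of that kind of win, using ave on some toy data to get a group-wise cumulative sum without an explicit loop:

x <- c(1, 2, 3, 4, 5, 6)
g <- c("a", "a", "b", "b", "a", "b")   # grouping vector
ave(x, g, FUN = cumsum)                # cumulative sum restarted within each group
# [1]  1  3  3  7  8 13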

To illustrate with some timings:

top <- 1000
tm <- rnorm(top,10)   
time <- rnorm(top,10)

> system.time(
+ results <- sapply(2:top,function (k) coef(lm(tm[1:k] ~ log(time[1:k])))[2])
+ )
   user  system elapsed 
   4.26    0.00    4.27 

> system.time(
+ results <- lapply(2:top,function (k) coef(lm(tm[1:k] ~ log(time[1:k])))[2])
+ )
   user  system elapsed 
   4.25    0.00    4.25 

> system.time(
+ results <- for(k in 2:top) coef(lm(tm[1:k] ~ log(time[1:k])))[2]
+ )
   user  system elapsed 
   4.25    0.00    4.25 

> system.time(
+ results <- for(k in 2:top) lm.fit(matrix(log(time[1:k]),ncol=1),
+                                 tm[1:k])$coefficients[2]
+ )
   user  system elapsed 
   0.43    0.00    0.42 

The only faster solution is lm.fit(). Don't be mistaken: the timings differ a bit every time you run the analysis, so a difference of 0.02 seconds is not significant in R. sapply, for and lapply are all essentially equally fast here. The trick is to use lm.fit().
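
One thing to keep in mind: lm.fit() expects a ready-made design matrix, so to reproduce coef(lm(...))[2] you have to supply the intercept column yourself (the quick timing example above leaves it out). A rough sketch along those lines, precomputing log(time) once, should give the same slopes as the lm() version:

X <- cbind(1, log(time))             # design matrix: intercept + log(time)
slopes <- sapply(2:top, function(k)
  lm.fit(X[1:k, , drop = FALSE], tm[1:k])$coefficients[2]   # slope on the first k points
)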

If you have a data frame called Data, you could use something like:

Data <- data.frame(Y = rnorm(top), X1 = rnorm(top), X2 = rnorm(top))

mf <- model.matrix(Y ~ X1 + X2, data = Data)        # design matrix: (Intercept), X1, X2
results <- sapply(2:top, function(k)
  lm.fit(mf[1:k, ], Data$Y[1:k])$coefficients[2]    # coefficient of X1 from the first k rows
)

as a more general solution.
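
Since model.matrix() names its columns, the fitted coefficients should also be retrievable by name instead of by position, e.g.:

results_X1 <- sapply(2:top, function(k)
  lm.fit(mf[1:k, ], Data$Y[1:k])$coefficients["X1"]   # same as [2] above, but explicit
)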