I have two single vector data frames of unequal length
aa<-data.frame(c(2,12,35))
bb<-data.frame(c(1,2,3,4,5,6,7,15,22,36))
For each observation in aa I want to count the number of instances bb is less than aa
My result:
bb<aa
1 1
2 7
3 9
I have been able to do it two ways by creating a function and using apply, but my datasets are large and I let one run all night without end.
What I have:
fun1<-function(a,b){k<-colSums(b<a)
k<-k*.000058242}
system.time(replicate(5000,data.frame(apply(aa,1,fun1,b=bb))))
user system elapsed
3.813 0.011 3.883
Secondly,
fun2<-function(a,b){k<-length(which(b<a))
k<-k*.000058242}
system.time(replicate(5000,data.frame(apply(aa,1,fun2,b=bb))))
user system elapsed
3.648 0.006 3.664
The second function is slightly faster in all my tests, but I let the first run all night on a dataset where bb>1.7m and aa>160k
I found this post, and have tried using with() but cannot seem to get it to work, also tried a for loop without success.
Any help or direction is appreciated.
Thank you!
Some more realistic examples:
So with
length(aa) = 1.6e4
this takes about 2.5 min (on my system), and the process scales asO(length(aa))
- no surprise there. Therefore, with your full dataset, it should run in about 25 min. Still kind of slow. Maybe someone else will come up with a better way.