I try to deal with multiple loops with data.table for quite some time and got frustrated. With sql it seamed quite intuitive, but with R I experience some problems.
For example, I would want to fread one txt file (as I have hundreds of files, each around 1 GB), make the calculations (sum price and quant, when time>my.time and for some selected isin's, grouped by my.time, isin and price), write the results to some csv file, remove original txt file from R memory; then redo these calculations one-by-one for all txt files and append the output csv file.
Lets start with the example data (quite small, just two identical files for illustration):
time<-format(seq.POSIXt(as.POSIXct(Sys.Date()), as.POSIXct(Sys.Date()+1), by = "1 sec"),"%H:%M:%S")
n<-length(time)
isin<-paste("US",1:n,sep="")
price<-rnorm(n,101,1)
quant<-rnorm(n,5,1)
dt<-data.table(time,isin,price,quant)
write.table(dt,"raw.txt",append = FALSE,sep = ",",col.names = TRUE, row.names = FALSE)
write.table(dt,"raw2.txt",append = FALSE,sep = ",",col.names = TRUE, row.names = FALSE)
my.files <- list.files(pattern = "raw*.txt")
my.time<-format(seq.POSIXt(as.POSIXct(Sys.Date()), as.POSIXct(Sys.Date()+1), by = "5 min"),"%H:%M:%S")
my.isin<-c("US100","US150","US225","US250","US1050")
Then I try these 2 simple loops:
for (i in my.files){
for (j in my.time){
dt<-fread(i)
write.table(dt[which(isin %in% my.isin & time>j),
.(sprice=sum(price),squant=sum(quant),**time.my=j**), by = .(isin,price)],
"output.csv",append = TRUE,sep = ",",col.names = TRUE)
rm(dt)
}}
Second edit: The loop with j finally started working for me (due to the bolded part). Maybe it would be possible to work without for loops and get the same results?
Thanks very much for help!
The problem you have is the output of your
which
statement is returning zero rows. First I would convert your time totime
type. I've then created a 5 minute grouping variable.This would first aggregate your tables.
you can then filter it by your
my.isn
or time5 greater than custom times?