How can I create a POSIXct vector in ffdf?

I've had a look around and can't quite get a grasp of what is going on with this. I'm using R in Eclipse. The file I'm trying to import is 700 MB, with around 15 million rows and 6 columns. As I was having problems loading it, I have started using the ff package.

library(ff)
FDF = read.csv.ffdf(file='C:\\Users\\William\\Desktop\\R Data\\GBPUSD.1986.2014.txt', header = FALSE, colClasses=c('factor','factor','numeric','numeric','numeric','numeric'), sep=',')
names(FDF)= c('Date','Time','Open','High','Low','Close')
#names the columns in the ffdf file
dim(FDF)
# produces dimensions of the file

I then want to create a POSIXct sequence, which will later be joined against the imported file. I tried:

tm1 = seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins")
tm1 = data.frame (DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))

However, R kept crashing. I then tested this in RStudio and saw that there were constraints on the vector. It did, however, produce the correct output for:

dim(tm1)
names(tm1)
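For a sense of the scale involved, the size of the requested sequence can be estimated before it is built. The following is a rough sketch, not part of the original post: it assumes UTC so that daylight saving time does not affect the count, and the variable names are illustrative only.

```r
# Assumption: UTC is used so the count is not affected by daylight saving time.
start <- as.POSIXct("1986-12-01 00:00", tz = "UTC")
end   <- as.POSIXct("2014-09-04 23:59", tz = "UTC")

# Number of one-minute steps, counting both endpoints.
n_minutes <- as.numeric(difftime(end, start, units = "mins")) + 1

# A POSIXct vector stores one 8-byte double per element.
posixct_mb <- n_minutes * 8 / 1024^2

cat("elements:", format(n_minutes, big.mark = ","), "\n")
cat("approx. size as POSIXct (MB):", round(posixct_mb), "\n")
```

The formatted strings produced by strftime are likely to need considerably more memory than the numeric POSIXct values, since every one of them is a distinct character string.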

So I went back to Eclipse, thinking this had something to do with memory allocation. I attempted the following:

library(ff)
tm1 = as.ffdf(seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins")) 
tm1 = as.ffdf(DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))
names(tm1) = c('DateTime')
dim(tm1)
names(tm1)

This gives the error:

no applicable method for 'as.ffdf' applied to an object of class "c('POSIXct', 'POSIXt')"

I can't seem to work around this. I then tried ...

library(ff)
tm1 = as.ff(seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins")) 
tm1 = as.ff(DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))

This produces the dates as output, but not in the correct format. In addition, when ...

dim(tm1)
names(tm1)

were executed, they both returned NULL.
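The NULL results are consistent with how plain vectors behave in base R: dim() and names() describe data frames, whereas a bare vector has only a length. A small base R sketch of the difference (it does not use ff, and the object names are illustrative):

```r
# Assumption: a three-element UTC sequence stands in for the full vector.
v <- seq(as.POSIXct("1986-12-01 00:00", tz = "UTC"), by = "min", length.out = 3)

dim(v)      # NULL: a vector has no dimensions
names(v)    # NULL: no names were set
length(v)   # the vector's size is reported by length()

# Wrapping the vector in a data frame gives it dimensions and a column name.
df <- data.frame(DateTime = v)
dim(df)
names(df)
```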

Question

  1. How can I produce a POSIXct seq in the format I require above?
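One point worth separating out is that formatting a POSIXct value turns it into text, so a vector in the '%Y.%m.%d %H:%M' layout is no longer POSIXct. The sketch below illustrates this on five minutes of data; the explicit tz = "UTC" is an assumption made for the example, whereas the code in the post relies on the local time zone.

```r
# Assumption: tz = "UTC" is used for illustration only.
x <- seq(as.POSIXct("1986-12-01 00:00", tz = "UTC"),
         as.POSIXct("1986-12-01 00:04", tz = "UTC"),
         by = "min")
class(x)   # POSIXct / POSIXt

# Formatting returns character strings; the date-time class is dropped.
s <- format(x, format = "%Y.%m.%d %H:%M", tz = "UTC")
class(s)   # character

# Parsing the text with the same format string recovers the POSIXct values.
y <- as.POSIXct(s, format = "%Y.%m.%d %H:%M", tz = "UTC")
identical(as.numeric(x), as.numeric(y))
```

If the sequence only needs to match the text layout of the imported file for a join, the formatted strings are what is required; if true date-time arithmetic is needed, the unformatted POSIXct values are the ones to keep.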
There is 1 answer

AudioBubble On BEST ANSWER

Well, we got there in the end.

I believe the problem was the RAM available while creating the full vector. Because of this, I broke the vector into three parts, converted each to ffdf format to free up RAM, and then used rbind to bind them together.

The problem with formatting the vector once it had been created was, I believe, also down to RAM: every time I tried this, R crashed.

Even with the workaround below, my machine (4 GB of RAM) is slowing down. I've ordered some more RAM and hope this will smooth future operations.

Below is the working code:

library(ff)
library(ffbase)

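tm1 = seq(from = as.POSIXct('1986-12-01 00:00'), to = as.POSIXct('2000-12-01 23:59'), by = 'min')
tm1 = data.frame(DateTime=strftime(tm1, format='%Y.%m.%d %H:%M'), stringsAsFactors = TRUE)
# creates the data frame within memory constraints
# stringsAsFactors = TRUE keeps DateTime a factor on R >= 4.0, where the default changed to FALSE
tm1 = as.ffdf(tm1)
# converts to ffdf format
memory.size()
# reports the memory in use (Windows only)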

tm2 = seq(from = as.POSIXct('2000-12-02 00:00'), to = as.POSIXct('2010-12-01 23:59'), by = 'min')
tm2 = data.frame(DateTime=strftime(tm2, format='%Y.%m.%d %H:%M'), stringsAsFactors = TRUE)
# creates the data frame within memory constraints
tm2 = as.ffdf(tm2)
# converts to ffdf format
memory.size()

tm3 = seq(from = as.POSIXct('2010-12-02 00:00'), to = as.POSIXct('2014-09-04 23:59'), by = 'min')
tm3 = data.frame(DateTime=strftime(tm3, format='%Y.%m.%d %H:%M'), stringsAsFactors = TRUE)
# creates the data frame within memory constraints
memory.size()
tm3 = as.ffdf(tm3)
# converts to ffdf format
memory.size()

tm4 = rbind(tm1, tm2, tm3)
# binds ffdf objects into one
dim(tm4)
# checks the row numbers
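
The three hand-written segments above could, in principle, be generalised into a loop over yearly chunks. The following is only a sketch in base R, not the code that was actually run: process_in_chunks and handle_chunk are hypothetical names, tz = "UTC" is an assumption made to avoid daylight-saving irregularities, and the callback merely counts rows where the ffdf conversion would otherwise go.

```r
# Hypothetical helper (not from the original post): walk a minute-by-minute
# range in yearly chunks and pass each formatted chunk to a callback.
process_in_chunks <- function(start, end, handle_chunk) {
  # Chunk boundaries: the start, then one boundary per following year.
  breaks <- seq(start, end, by = "year")
  total <- 0
  for (i in seq_along(breaks)) {
    chunk_start <- breaks[i]
    # Each chunk stops one minute before the next boundary; the last stops at `end`.
    chunk_end <- if (i < length(breaks)) breaks[i + 1] - 60 else end
    minutes <- seq(chunk_start, chunk_end, by = "min")
    chunk <- data.frame(
      DateTime = format(minutes, format = "%Y.%m.%d %H:%M", tz = "UTC"),
      stringsAsFactors = FALSE
    )
    handle_chunk(chunk)
    total <- total + nrow(chunk)
  }
  total
}

# Usage on a shortened range, so the example stays light.
start <- as.POSIXct("1986-12-01 00:00", tz = "UTC")
end   <- as.POSIXct("1988-01-15 23:59", tz = "UTC")
rows_seen <- 0
n <- process_in_chunks(start, end, function(chunk) {
  # In the approach described above, the conversion to ffdf and the
  # binding onto the on-disk result would happen here.
  rows_seen <<- rows_seen + nrow(chunk)
})
```

Only one chunk of formatted strings is held in memory at a time, which is the same idea as the three manual segments but without fixing the split points by hand.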