So, I have the following problem: I have a data set, A (data.table object), of the following structure:
date days rate
1996-01-02 9 5.763067
1996-01-02 15 5.745902
1996-01-02 50 5.673317
1996-01-02 78 5.608884
1996-01-02 169 5.473762
1996-01-03 9 5.763067
1996-01-03 14 5.747397
1996-01-03 49 5.672263
1996-01-03 77 5.603705
1996-01-03 168 5.470584
1996-01-04 11 5.729460
1996-01-04 13 5.726104
1996-01-04 48 5.664931
1996-01-04 76 5.601891
1996-01-04 167 5.468961
Note that the days column and its size may differ for each day. My goal is now to (piecewise linearly) interpolate rate along days. I am doing this for each day via
approx(x=A[,days],y=A[,rate],xout=days_vec,rule=2)
where days_vec <- min_days:max_days
, i.e. the days range I am interested in (say 1:100).
I have two problems here:
approx only interpolates, i.e. it does not create a linear fit across min(x) and max(x). If I am now interested in days 1:100, I first need to do it by hand using days 9 and 15 (first 2 lines of A) via:
first_days <- 1:(A[1,days]-1) #1:8 rate_vec[first_days] <- A[1,rate] + (first_days - A[1,days])/(A[2,days]-A[1,days])*(A[2,rate]-A[1,rate])
and then using the approx line above for rate_vec[9:100]
. Is there a way of doing this in 1 step?
- Right now, given that I need two steps and the shift point between the two procedures (here 9) differs among dates, I cannot see an implementation via data.table, although this would be vastly preferred (using data.table methods to interpolate/extrapolate and then returning the expanded data.table object). Thus, I currently run a for loop through the dates, which is of course much, much slower.
Question: Is the problem above better implementable and also, is this somehow doable with data.table methods instead of looping through A?
How about something like this.
1. Create NA values of rate for days in range 1:100 that aren't in dataset
2. For each value of date, use a linear model to predict NA values of rate.
Gives you:
If there are values of
date
without at least two observations ofrate
, you will probably get an error because you won't have enough points to fit a line.Alternative: Rolling joins solution for piecewise linear interpolation
This requires rolling joins to left and right, and an average of the two that ignores NA values.
This doesn't do well for extrapolation, though, since it's just a constant (either the first or last obs) outside the indices of observations.