I'm trying to find an answer, but I'm still new to dynamic time warping in R. I have a data set with over 20,000 observations, 20 IDs, and an outcome that was measured two or three times per hour. My data looks something like this:
#ID Hour outcome
#1 00:30 3.4
#1 00:50 2.3
#... ......
#1 23:40 0.5
#2 00:21 2.3
#... ......
So for each ID I have around 1,500 time points, but the time series are not the same length (some IDs start sooner or later, and the time series have different time intervals).
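For reference, here is a minimal sketch of how a long table like this could be turned into the list of numeric vectors that tsclust takes as input. The data frame `df` and its values are made up for illustration; only the column names ID, Hour and outcome come from the sample above. It also assumes all rows fall on the same day, so that sorting the "HH:MM" strings as text gives the right order.

```r
# Toy data in the same long layout (values are invented for illustration).
df <- data.frame(
  ID      = c(1, 1, 1, 2, 2),
  Hour    = c("00:30", "00:50", "23:40", "00:21", "01:05"),
  outcome = c(3.4, 2.3, 0.5, 2.3, 1.8)
)

# Sort by time within each ID. Text sorting of "HH:MM" is only correct
# when all rows belong to the same day (an assumption of this sketch).
df <- df[order(df$ID, df$Hour), ]

# One numeric vector per ID; the lengths may differ between IDs.
time_series_list <- split(df$outcome, df$ID)

# Number of observations per ID.
lengths(time_series_list)
```

With real data spanning several days, a full date-time column would be needed for the ordering step.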
I wrote a custom distance function (to build the distance matrix) that drops NAs before running DTW:
dtwOmitNA <- function(x, y) {
  a <- na.omit(x)
  b <- na.omit(y)
  return(dtw(a, b, distance.only = TRUE)$normalizedDistance)
}
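For context, dtwclust looks up distances by name in the proxy registry, so a custom function like this normally has to be registered before `distance = "dtwOmitNA"` can be resolved. The sketch below follows the pattern from the dtwclust documentation. The added `...` in the signature and the description string are my own assumptions, not part of the function above; the `...` is there so that extra arguments forwarded by tsclust (such as window.size) are accepted, and they are simply ignored here.

```r
library(dtw)    # provides dtw()
library(proxy)  # distance registry that tsclust looks names up in

# Same idea as the function above, with `...` added (assumption) so that
# extra arguments passed along by tsclust are accepted and ignored.
dtwOmitNA <- function(x, y, ...) {
  a <- as.numeric(na.omit(x))
  b <- as.numeric(na.omit(y))
  dtw(a, b, distance.only = TRUE)$normalizedDistance
}

# Register the function under the name used in `distance = "dtwOmitNA"`.
# The guard avoids an error if the entry already exists in this session.
if (!pr_DB$entry_exists("dtwOmitNA")) {
  pr_DB$set_entry(FUN = dtwOmitNA, names = c("dtwOmitNA"),
                  loop = TRUE, type = "metric", distance = TRUE,
                  description = "Normalized DTW with NAs removed")
}

# Quick check on two short toy series of different lengths.
dtwOmitNA(c(1, 2, NA, 3, 4), c(1, 3, 4))
```

Because `loop = TRUE` makes proxy call the R function once per pair of series, a distance registered this way is generally slower than a distance implemented in compiled code.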
I want to use this distance in tsclust with DBA centroids. The call looks something like this:
clustering_result <- tsclust(
  time_series_list,
  k = 2L:19L,                       # number of clusters
  distance = "dtwOmitNA",           # dissimilarity function
  centroid = "dba",                 # DTW Barycenter Averaging
  trace = FALSE,
  seed = seed,
  norm = "L2", window.size = NULL,  # for DBA
  args = tsclust_args(cent = list(trace = FALSE, window.size = 18L),
                      dist = list(window.size = 18L))
  # , normalize = TRUE   # distance normalized
  # , sqrt.dist = FALSE
)
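As a point of comparison, here is a hedged sketch of the same kind of call using dtwclust's built-in "dtw_basic" distance and a single value of k, which can help gauge how long one clustering run takes before trying a whole range. The toy series, the window size of 10 and the seed are invented for illustration, and "dtw_basic" is not the NA-omitting normalized distance defined above, so the results are not equivalent.

```r
library(dtwclust)

# Toy series of slightly different lengths (invented data).
set.seed(1)
toy_series <- lapply(1:8, function(i) cumsum(rnorm(50 + i)))

res <- tsclust(
  toy_series,
  type     = "partitional",
  k        = 2L,              # a single k first, to gauge the run time
  distance = "dtw_basic",     # dtwclust's built-in DTW implementation
  centroid = "dba",           # DTW Barycenter Averaging
  seed     = 42L,
  trace    = FALSE,
  args     = tsclust_args(
    dist = list(window.size = 10L),
    cent = list(window.size = 10L)
  )
)

# Cluster label assigned to each series.
res@cluster
```

When k is a vector such as 2L:19L, tsclust runs one clustering per value and returns a list, so the total time grows with the number of values tried.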
The problem is that tsclust takes too long to run, and I don't know if I made a mistake somewhere. Maybe the problem is that I have too many observations (because I measure each ID multiple times per hour)?
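One way to check whether series length is the bottleneck is to shorten each series and time the clustering again, since the cost of unconstrained DTW grows roughly with the product of the two series lengths. The helper `shorten` below is hypothetical (base R only, linear interpolation), and the target length of 200 is an arbitrary choice for illustration.

```r
# Linearly resample a numeric vector to `new_len` points (base R only).
# `shorten` is a hypothetical helper written for this sketch.
shorten <- function(x, new_len) {
  x <- as.numeric(na.omit(x))
  approx(seq_along(x), x, n = new_len)$y
}

# Toy series with lengths similar to the real data (invented values).
set.seed(1)
long_series  <- lapply(c(1400, 1500, 1600), function(n) cumsum(rnorm(n)))

# Every series is reduced to the same, much shorter length.
short_series <- lapply(long_series, shorten, new_len = 200)
lengths(short_series)
```

Resampling by index ignores the uneven spacing of the original time stamps, so this is only a quick diagnostic, not a substitute for proper aggregation by time.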
I tried searching for other examples, but the only thing I could work out from what I found was how to set window.size.