How to apply dtw algorithm on multiple time series in R?

Question

3.9k views Asked by umair durrani At 29 August 2017 at 18:11

Problem

I have speed time series for several vehicles. My ultimate objective is to cluster the vehicles by how similar their speed profiles are over time. To do that, I need a distance matrix in which each cell holds the distance between a pair of vehicle speed time series. I want to use Dynamic Time Warping (DTW) as the distance measure, so I need to apply DTW to every pair of speed time series.

Data

Here are some sample data that contain only 8 observations per car and only 3 cars:

> dput(c)
structure(list(file.ID2 = c("Cars_03", "Cars_03", "Cars_03", 
"Cars_03", "Cars_03", "Cars_03", "Cars_03", "Cars_03", "Cars_04", 
"Cars_04", "Cars_04", "Cars_04", "Cars_04", "Cars_04", "Cars_04", 
"Cars_04", "Cars_05", "Cars_05", "Cars_05", "Cars_05", "Cars_05", 
"Cars_05", "Cars_05", "Cars_05"), speed.kph.ED = c(129.3802848, 
129.4022304, 129.424176, 129.4461216, 129.4680672, 129.47904, 
129.5009856, 129.5229312, 127.8770112, 127.8221472, 127.7672832, 
127.7124192, 127.6575552, 127.6026912, 127.5478272, 127.4929632, 
134.1095616, 134.1205344, 134.1315072, 134.1534528, 134.1644256, 
134.1753984, 134.1863712, 134.197344)), row.names = c(NA, -24L
), class = c("tbl_df", "tbl", "data.frame"), .Names = c("file.ID2", 
"speed.kph.ED"))

What I tried

I can find the dtw::dtw() distance for one pair as follows:

    library(dplyr) 
    library(dtw) 
    c3 <- c %>% filter(file.ID2=="Cars_03")  
    c4 <- c %>% filter(file.ID2=="Cars_04")  
    query <- c4$speed.kph.ED  
    reference <- c3$speed.kph.ED  
    dtw_results <- dtw(x = query, y = reference)
    dtw_results$distance

But my question is: is there a way to automatically compute dtw()$distance for every pair and build a distance matrix? In this example, that means these pairs:

Cars_03 - Cars_03
Cars_03 - Cars_04
Cars_03 - Cars_05
Cars_04 - Cars_03
Cars_04 - Cars_04
Cars_04 - Cars_05
and so on

I know a for loop is one way to do this. But since dtw itself requires a lot of RAM, I'm worried that a for loop would slow the process down even further. Are there any alternatives? I'm sorry if this is a silly question, but I'm quite new to using dtw.

Original Q&A

There are 2 answers

user2313186 On 30 August 2017 at 06:47

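DTW only takes a lot of memory if it is implemented recursively or if the full cost matrix is kept. An iterative implementation that needs only the distance can keep just two rows of that matrix at a time, so its extra space is linear in the series length rather than quadratic.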

With a warping window width constraint, you can build a distance matrix for, say, 300 time series of length 1,000 in a few minutes (at most). If you have even more data, try TADPOLE.

I suggest you read this tutorial:

http://www.cs.unm.edu/~mueen/DTW.pdf
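
To illustrate the iterative, windowed computation described above, here is a minimal base-R sketch. The function name dtw_distance_two_rows and its window argument are hypothetical (they are not part of the dtw package). It uses the absolute difference as the local cost and the classic three-way minimum, so its values will generally not match the default step pattern of dtw::dtw(). Only two rows of the cumulative cost matrix are held in memory at any time:

```r
# Hypothetical helper (not from the dtw package): returns the DTW distance only,
# computed iteratively while keeping just two rows of the cumulative cost matrix.
dtw_distance_two_rows <- function(x, y, window = Inf) {
  n <- length(x)
  m <- length(y)
  # The band must be at least |n - m| wide, otherwise the end cell is unreachable
  window <- max(window, abs(n - m))

  prev <- rep(Inf, m + 1)  # row i - 1; index j + 1 stores column j
  prev[1] <- 0             # virtual start cell (0, 0)

  for (i in seq_len(n)) {
    curr <- rep(Inf, m + 1)
    j_lo <- max(1, i - window)
    j_hi <- min(m, i + window)
    for (j in j_lo:j_hi) {
      cost <- abs(x[i] - y[j])
      curr[j + 1] <- cost + min(prev[j + 1],  # step from (i - 1, j)
                                curr[j],      # step from (i, j - 1)
                                prev[j])      # step from (i - 1, j - 1)
    }
    prev <- curr
  }
  prev[m + 1]
}

# Unconstrained, and restricted to the diagonal (window = 0)
d_free <- dtw_distance_two_rows(c(1, 2, 3), c(2, 3, 4))
d_diag <- dtw_distance_two_rows(c(1, 2, 3), c(2, 3, 4), window = 0)
```

With the dtw package itself, a comparable band can be requested through window.type = "sakoechiba" together with window.size, and distance.only = TRUE skips the backtracking step.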

**CPak** · Accepted Answer · 2017-09-05T12:36:40+00:00

The following works.

Split your data frame (called df here; it is c in the question) into a list by file.ID2:

    ds <- split(df, df$file.ID2)

Use expand.grid to make all combinations of your names (file.ID2) and of your values:

    Names <- expand.grid(unique(df$file.ID2), unique(df$file.ID2))
    Values <- expand.grid(ds, ds)

purrr::map_dbl iterates through all row-combinations of Values and returns a vector of doubles:

    library(dtw)
    library(purrr)
    # One DTW distance per row of Values, i.e. per pair of vehicles
    Dist <- map_dbl(seq_len(nrow(Values)), ~ dtw(
      x = Values[.x, ]$Var1[[1]]$speed.kph.ED,
      y = Values[.x, ]$Var2[[1]]$speed.kph.ED
    )$distance)

Bind the distances to Names:

    library(dplyr)
    ans <- Names %>%
      mutate(distance = Dist)

Output

     Var1    Var2 distance
1 Cars_03 Cars_03  0.00000
2 Cars_04 Cars_03 25.66538
3 Cars_05 Cars_03 69.72117
4 Cars_03 Cars_04 25.66538
5 Cars_04 Cars_04  0.00000
6 Cars_05 Cars_04 96.00103
7 Cars_03 Cars_05 69.72117
8 Cars_04 Cars_05 96.00103
9 Cars_05 Cars_05  0.00000
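
As a possible shortcut, the dtw package also provides dtwDist(), which computes DTW distances between the rows of a matrix. The following is a minimal sketch, not the answer's original method. It assumes every vehicle has the same number of observations, so the series can be stacked as rows. The object names (series, mx, dist_mat, hc) and the choice of average linkage with two clusters are illustrative only:

```r
library(dtw)

# Sample data from the question, rebuilt here so the example is self-contained
df <- data.frame(
  file.ID2 = rep(c("Cars_03", "Cars_04", "Cars_05"), each = 8),
  speed.kph.ED = c(
    129.3802848, 129.4022304, 129.424176, 129.4461216,
    129.4680672, 129.47904, 129.5009856, 129.5229312,
    127.8770112, 127.8221472, 127.7672832, 127.7124192,
    127.6575552, 127.6026912, 127.5478272, 127.4929632,
    134.1095616, 134.1205344, 134.1315072, 134.1534528,
    134.1644256, 134.1753984, 134.1863712, 134.197344
  ),
  stringsAsFactors = FALSE
)

# One row per vehicle (assumption: all series have equal length)
series <- split(df$speed.kph.ED, df$file.ID2)
mx <- do.call(rbind, series)

# Full pairwise DTW distance matrix; dimnames are set explicitly for clarity
dist_mat <- dtwDist(mx)
dimnames(dist_mat) <- list(rownames(mx), rownames(mx))

# Convert to a "dist" object and cluster the vehicles hierarchically
hc <- hclust(as.dist(dist_mat), method = "average")
clusters <- cutree(hc, k = 2)
```

If the series have different lengths, they cannot be stacked into a plain matrix this way. In that case, looping over the unique pairs (for example with combn()) and calling dtw() on each pair halves the work compared with evaluating every ordered combination.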
