Time series clustering: problem converting a dplyr data frame into a list of time series


I'd like to run time series clustering with the dtwclust package. The problem is converting my data.frame into a list of time series. Each block ID (STAND) has 180 days, stored as negative values in DATE_TIME, and B2_MAX is my response variable. In my example:

library(dplyr)
library(ggplot2)
library(dtwclust)

all.B2_MAX.stands <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/my_ts_data.csv")

all.B2_MAX.tsc <-  all.B2_MAX.stands %>%
  group_by(STAND) %>%
  summarise(var = list(B2_MAX[order(DATE_TIME)]), 
            var_ts = purrr::map(var, ts))

clusters <- tsclust(all.B2_MAX.tsc[-1], 
                   type="partitional", 
                   k=2L, 
                   distance="dtw",
                   centroid = "pam")

#plot
plot(clusters, type = "sc")

#Error in lapply(series, base::as.numeric) : 
#  'list' object cannot be coerced to type 'double'

Any help would be appreciated.


There are 2 answers

Leprechault (best answer)

In this case, splitting the response variable (B2_MAX) by block ID (STAND) and then passing the resulting list to tsclust works well:

d <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/my_ts_data.csv")
l <- split(d$B2_MAX,d$STAND)
o <- tsclust(l, 
        type="partitional", 
        k=2L, 
        distance="dtw_basic",
        centroid = "pam")
#plot
plot(o)
o

# partitional clustering with 2 clusters
# Using dtw_basic distance
# Using pam centroids

# Time required for analysis:
#    user  system elapsed 
#    1.13    0.00    0.16 

# Cluster sizes with average intra-cluster distance:

#   size       av_dist
# 1   14 3.518299e+198
# 2   50  4.526561e+08
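
One thing to watch: split() keeps the values in whatever row order the CSV has, so this relies on the file already being sorted by DATE_TIME within each STAND. If that is not guaranteed, a minimal sketch like the one below sorts first. The small data frame is made up for illustration (it is not the real my_ts_data.csv), and the length check is just a sanity step I'd assume is useful since every stand should have the same number of days.

```r
# Made-up data standing in for my_ts_data.csv: two stands, rows deliberately shuffled
d <- data.frame(
  STAND     = c("A", "B", "A", "B", "A", "B"),
  DATE_TIME = c(-1, -3, -3, -2, -2, -1),
  B2_MAX    = c(30, 1, 10, 2, 20, 3)
)

# Sort by stand and then by time, so each series ends up in chronological order
d <- d[order(d$STAND, d$DATE_TIME), ]

# One numeric vector per stand: the same list shape that is passed to tsclust() above
l <- split(d$B2_MAX, d$STAND)

# Sanity check: every stand should contribute the same number of observations
series_lengths <- lengths(l)

print(l)
print(series_lengths)
```

With the real data, the sorted list l can then be passed to tsclust() exactly as in the code above.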

Marwi

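The problem seems to come from how the time series are built and passed to tsclust. The ts function requires a numeric vector and tsclust expects a list of numeric series, while all.B2_MAX.tsc[-1] is still a data frame whose columns (var and var_ts) are themselves lists, i.e. a list of lists.

Try unlisting var with the unlist() function before calling ts(), and pass only the var_ts column to tsclust: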

library(dplyr)
library(ggplot2)
library(dtwclust)
library(purrr)

all.B2_MAX.stands <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/my_ts_data.csv")

all.B2_MAX.tsc <-  all.B2_MAX.stands %>%
  group_by(STAND) %>%
  summarise(var = list(B2_MAX[order(DATE_TIME)])) %>% 
  mutate(var_ts = purrr::map(var, ~ts(unlist(.))))

clusters <- tsclust(all.B2_MAX.tsc$var_ts, 
                   type="partitional", 
                   k=2L, 
                   distance="dtw",
                   centroid = "pam")

#plot
plot(clusters, type = "sc")
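
To see why the original call fails, here is a rough base R sketch with made-up list columns that mimic the summarise() result (the object names tsc, bad and good are only for illustration). The error in the question mentions lapply(series, base::as.numeric), so the sketch applies as.numeric the same way; I'm assuming that is essentially what happens to the input.

```r
# Made-up list columns mimicking the question's summarise() result
tsc <- data.frame(STAND = c("A", "B"))
tsc$var    <- list(c(10, 20, 30), c(1, 2, 3))
tsc$var_ts <- lapply(tsc$var, ts)

# Dropping the first column still leaves a data frame: a list of two list columns
bad <- tsc[-1]

# Each element of 'bad' is itself a list of series, so as.numeric() fails on it;
# tryCatch() captures the error message instead of stopping the script
err <- tryCatch(lapply(bad, as.numeric), error = function(e) conditionMessage(e))
print(err)

# Extracting a single column gives a plain list with one numeric series per stand
good <- tsc$var_ts
ok <- vapply(good, is.numeric, logical(1))
print(ok)
```

So the key difference is all.B2_MAX.tsc$var_ts (a list of series) versus all.B2_MAX.tsc[-1] (a data frame of list columns).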