Measures of similarity for time series data

I've got two years' worth of energy data in 15-minute increments and need to develop a similarity score for a forecasted day, i.e. identify past days that are similar to the forecasted day.

I started by splitting the initial dataframe into a list (called trading_days below) of 730 dataframes, one for each 24-hour period. The plan is to feed the forecasted day and this list into a function that calculates a similarity score, then rank the historic days by that score.

I'm struggling to decide which similarity measure would be better. Any help would be hugely appreciated!

I tried Euclidean distance and it worked fine, but it's clearly too primitive and doesn't take the trend over time into account.
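
For reference, a minimal sketch of that kind of Euclidean ranking. The data here is synthetic: `trading_days` and `first_day` are small hypothetical stand-ins built only so the snippet runs on its own, and `euclid` and `ranking` are illustrative names, not part of the original code.

```r
# Hypothetical stand-ins: 5 historic days and one forecast day, 96 readings each.
set.seed(42)
base_shape <- 1000 + 200 * sin(2 * pi * (1:96) / 96)
trading_days <- lapply(1:5, function(d) {
  data.frame(`NI Demand` = base_shape + rnorm(96, sd = 20 * d),
             check.names = FALSE)
})
first_day <- data.frame(`NI Demand` = base_shape, check.names = FALSE)

# Euclidean distance between the forecast day and each historic day.
euclid <- vapply(trading_days, function(day) {
  sqrt(sum((day[["NI Demand"]] - first_day[["NI Demand"]])^2))
}, numeric(1))

# Indices of the historic days, ordered from most to least similar.
ranking <- order(euclid)
```

The distance is a sum over matching time slots, so it would be unchanged if the 96 slots of both days were shuffled in the same way. That is one way to see why it ignores the ordering of the values in time.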

I tried cross-correlation using ccf(), adapting some AI-generated code, to compare a new day (called first_day) to just one of the past days as a test. It gave me 96 numbers (as expected, the cross-correlation at each lag), but each of these 96 values is the same number! My code is shown below:

cross_corr2 <- function(vec1, vec2) {
  # Initialize a vector to store cross-correlation values
  ccf_values <- numeric(length(vec1))

  # Iterate over each timestamp in vec1
  for (i in 1:length(vec1)) {
    # Calculate cross-correlation at the current timestamp
    ccf_result <- ccf(vec1, vec2, lag.max = i - 1, plot = FALSE)
    # Extract the cross-correlation value at lag 0
    ccf_values[i] <- ccf_result$acf[i]
  }

  return(ccf_values)
}

attempt2 <- cross_corr2(first_day[["NI Demand"]], trading_days[[12]]$`NI Demand`)
attempt2

I would have expected 96 different values as my output, but it was just the same number repeated 96 times. Changing which day from "trading_days" I used made this number change, but it was always repeated 96 times.
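
One likely reading of this output, offered as a sketch rather than a definitive answer: `ccf()` called with `lag.max = k` returns 2k + 1 values, for lags -k through k. With `lag.max = i - 1`, the element at position `i` is the middle one, which is lag 0, on every iteration. The snippet below uses synthetic data (`day_a` and `day_b` are hypothetical stand-ins for the real series) to show this and to pull every lag from a single call instead.

```r
# Synthetic stand-ins for two days of 96 quarter-hourly readings (hypothetical data).
set.seed(1)
idx <- seq_len(96)
day_a <- sin(2 * pi * idx / 96) + rnorm(96, sd = 0.1)
day_b <- sin(2 * pi * (idx - 4) / 96) + rnorm(96, sd = 0.1)

# A single call returns the cross-correlation at every lag from -95 to +95.
cc <- ccf(day_a, day_b, lag.max = length(day_a) - 1, plot = FALSE)
lags <- as.vector(cc$lag)
vals <- as.vector(cc$acf)

# The lag-0 value from the full result.
lag0 <- vals[lags == 0]

# Reproduce the loop from the question: position i of a result computed with
# lag.max = i - 1 is the centre of a vector of length 2 * (i - 1) + 1, i.e. lag 0.
same_each_time <- vapply(1:96, function(i) {
  ccf(day_a, day_b, lag.max = i - 1, plot = FALSE)$acf[i]
}, numeric(1))

# Cross-correlation at lags 0..95, one value per lag.
nonneg <- vals[lags >= 0]

# Lag with the highest cross-correlation.
best_lag <- lags[which.max(vals)]
```

If a single similarity score per day is wanted, something like `max(vals)` or the lag-0 value alone could serve as a starting point. Which of these suits the ranking is a design choice this sketch does not settle.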
