I have two time series of temperature data. Provided they are expressed in the same time zone, both have very similar underlying functions (annual and daily seasonality).

Signal A is recorded in UTC at hourly frequency. A measurement of, e.g., 5°C at 1pm means that the average temperature between noon and 1pm was 5°C.

Signal B's time zone is unknown, and it is measured at 10-minute intervals. A measurement of, e.g., 5°C at 1pm means that the average temperature between 12:50 and 1pm was 5°C. Furthermore, Signal B's time zone may change several times over the series.
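
To make the two timestamp conventions concrete, here is a minimal sketch of bringing Signal B onto Signal A's hourly, interval-end convention. It assumes pandas; the series `signal_b` and its values are made up for illustration and are not my real data.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-minute series for Signal B (values 0..11 are placeholders).
# Each timestamp marks the END of the interval it averages over.
idx = pd.date_range("2021-01-01 00:10", periods=12, freq="10min")
signal_b = pd.Series(np.arange(12, dtype=float), index=idx)

# closed="right", label="right": the hour stamped 01:00 averages the six
# readings stamped 00:10 ... 01:00, which matches Signal A's convention
# (a value at 1pm covers noon to 1pm).
hourly_b = signal_b.resample("60min", closed="right", label="right").mean()

print(hourly_b)  # 01:00 -> 2.5, 02:00 -> 8.5
```

With the default `closed="left"`, `label="left"`, each reading would instead be grouped with the hour it ends in rather than the hour it covers, which introduces an artificial offset before any lag estimation starts.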

I would like to write an algorithm that is able to detect the time zone(s) of Signal B.

I thought cross-correlation of multiple intervals could be the way to approach this problem, so I tried the following:

  1. I aggregated signal B to an hourly frequency by using the mean of each hour.

  2. I fit a model to Signal A to get the residuals of Signal A.

  3. I subtracted the model of Signal A from Signal B to get the residuals of Signal B.

  4. I computed the cross-correlation of the two residual series at multiple lags over consecutive intervals (I tried interval lengths from several weeks to three months).
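
Steps 2 to 4 can be sketched roughly as follows. This is a minimal illustration on synthetic hourly data, not the actual pipeline: the harmonic regression model, the shared `weather` term, the 14-day window and all names are assumptions chosen for the example. In particular, it assumes both sensors see a common non-seasonal component, since that is what the residual cross-correlation has to lock onto.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# --- Synthetic stand-ins (assumptions, not real data) ---------------------
n = 24 * 365                              # one year of hourly samples
t = np.arange(n)
season = 10 * np.sin(2 * np.pi * t / (24 * 365.25)) + 5 * np.sin(2 * np.pi * t / 24)
weather = rng.normal(0, 2, n)             # shared non-seasonal variation
a = pd.Series(season + weather + rng.normal(0, 0.2, n))
truth_b = season + weather + rng.normal(0, 0.2, n)

# From the midpoint on, B's timestamps are 3 h ahead of UTC:
# the value stamped k was really observed at k - 3.
true_shift, change_at = 3, n // 2
b_vals = truth_b.copy()
b_vals[change_at:] = truth_b[change_at - true_shift : n - true_shift]
b = pd.Series(b_vals)

# --- Step 2: fit a model to A (assumed: intercept + daily/annual harmonics)
def harmonic_design(t):
    cols = [np.ones(len(t))]
    for period in (24.0, 24.0 * 365.25):
        w = 2 * np.pi * t / period
        cols += [np.sin(w), np.cos(w)]
    return np.column_stack(cols)

X = harmonic_design(t)
coef, *_ = np.linalg.lstsq(X, a.to_numpy(), rcond=None)
model_a = X @ coef

# --- Step 3: residuals of both signals relative to A's model --------------
res_a = a - model_a
res_b = b - model_a

# --- Step 4: lag with the highest correlation, per consecutive window -----
def best_lag(x, y, max_lag=12):
    """Return the lag l (in samples) maximising corr(x[t], y[t + l])."""
    lags = range(-max_lag, max_lag + 1)
    corrs = [x.corr(y.shift(-l)) for l in lags]
    return lags[int(np.nanargmax(corrs))]

window = 24 * 14                          # 14-day windows, non-overlapping
starts = range(0, n - window + 1, window)
window_lags = pd.Series(
    {s: best_lag(res_a.iloc[s:s + window], res_b.iloc[s:s + window]) for s in starts}
)
print(window_lags)
```

Under the sign convention used in `best_lag`, a positive lag means Signal B's timestamps are ahead of Signal A's. The window that straddles the change point mixes both regimes, so its estimate should not be trusted.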

Unfortunately, the results I get are erratic and inconsistent from one interval to the next.

'''
For this example I am making the assumption that the signal consists of two sine waves.
'''

import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Creating the two signals
x = np.arange(0, 4*math.pi, 0.01)

signalA = np.sin(x) + np.sin(x*10) + np.random.randn(len(x))*0.1
signalB = np.sin(x) + np.sin(x*10) + np.random.randn(len(x))*0.1
data = pd.DataFrame()
data['A'] = signalA
data['B'] = signalB
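# From index 500 on, delay B by 10 samples (the first 10 values of the slice become NaN)
data.loc[500:, 'B'] = data.loc[500:, 'B'].shift(10)
# From index 800 on, advance B by 15 samples, giving a net shift of -5 there
data.loc[800:, 'B'] = data.loc[800:, 'B'].shift(-15)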

# Pre-whiten
data['Aw'] = data.A - (np.sin(x) + np.sin(x*10))
data['Bw'] = data.B - (np.sin(x) + np.sin(x*10))

# Find max cross correlation
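def crosscorr(d1, d2, lag):
    # Pearson correlation between d1 and d2 shifted by `lag` samples
    return d1.corr(d2.shift(lag))

def max_lag(d1, d2):
    # Key the dict by lag, not by correlation, so NaN or tied
    # correlations cannot overwrite or corrupt entries
    corrs = {lag: crosscorr(d1, d2, lag) for lag in range(-30, 31)}
    # idxmax skips NaN values and returns the lag with the highest correlation
    return pd.Series(corrs).idxmax()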

d_index = []
d_lag = []

for i in range(100, len(x), 10):
    d_index.append(i)
    d_lag.append(max_lag(data[i-100:i].Aw, data[i-100:i].Bw))

# Plot result
plt.scatter(d_index, d_lag)
plt.xlabel('window end index')
plt.ylabel('lag with maximum correlation')
plt.show()

I would like the argmax of the cross-correlation function to be 0 for 0:500, 10 for 500:800 and -5 for 800:end, but the results I get are all over the place.
