Cross correlation of multiple sequences avoiding for loop

3k views Asked by At

Reading this and having tried np.correlate and cv2.matchTemplate I still have a question that I don't seem to be able to solve.

I've got two numpy arrays each with the shape (6000,50). 6000 sequences with each 50 values. Now I would like to do a cross-correlation of two 1-dimensional sequences of this array to detect the timeshift. I tried openCV briefly, but for me this returns a single number (I expect the highest correlation), so now I use numpy.correlate like this:

np.correlate(x[2500], y[2500], mode='same')

(In the cross-correlation plot I'm not looking for the highest peak, but I'm looking for the first peak using this. See plot for an example)

enter image description here

As you might expect, I would like to do this for all 6000 sequences, but hoping to avoid iteration. I was hoping this would work:

np.correlate(x, y, mode='same')

But this gives me the following error: ValueError: object too deep for desired array.

Is there any change that this is possible with NumPy or OpenCV. Or will i have to do it like this :(

for i in range(x.shape[0]):
    np.correlate(x[i], y[i], mode='same')
1

There are 1 answers

3
Jaime On BEST ANSWER

scipy.ndimage.correlate1d seemed like what you are after, but it only broadcasts on the first array, the second has to be strictly 1D, so no luck there. And the functions in scipy.signal do multi-dimensional correlation, not 1D like you are after. So there doesn't seem to be anything in the stack that solves your problem.

Just for the fun of it, you could always do this with FFTs and the cross-correlation theorem:

def correlate1(a, b):
    c = np.empty_like(a)
    for j in range(len(a)):
        c[j] = np.correlate(a[j], b[j], 'same')
    return c

def correlate2(a, b):
    n = a.shape[-1]
    a_fft = np.fft.fft(a, n=2*n)
    b_fft = np.fft.fft(b, n=2*n)
    cc = np.fft.ifft(a_fft * b_fft.conj()).real
    return np.concatenate((cc[..., -n//2:], cc[..., :(n-1)//2 + 1]), axis=-1)

With your use case this isn't a great idea:

In [11]: a = np.random.rand(6000, 50)
    ...: b = np.random.rand(6000, 50)
    ...: 

In [12]: np.allclose(correlate1(a, b), correlate2(a, b))
Out[12]: True

In [13]: %timeit correlate1(a, b)
10 loops, best of 3: 37.5 ms per loop

In [14]: %timeit correlate2(a, b)
10 loops, best of 3: 71.8 ms per loop

But the approach does have its merits, mostly for larger sequences:

In [15]: a = np.random.rand(50, 6000)
    ...: b = np.random.rand(50, 6000)
    ...: 

In [16]: %timeit correlate1(a, b)
1 loops, best of 3: 516 ms per loop

In [17]: %timeit correlate2(a, b)
10 loops, best of 3: 89.2 ms per loop