Suppose I have two very simple arrays with numpy:
import numpy as np
reference=np.array([0,1,2,3,0,0,0,7,8,9,10])
probe=np.zeros(3)
I would like to find which slice of array reference has the highest Pearson correlation coefficient with array probe. To do that, I slice reference into overlapping sub-arrays in a for loop, shifting by one element at a time, and compare each slice against probe. I did the slicing using the inelegant code below:
from statistics import correlation
for i in range(0,len(reference)):
    #get the slice of the data
    sliced_data=reference[i:i+len(probe)]
    #only calculate the correlation when probe and reference have the same number of elements
    if len(sliced_data)==len(probe):
        my_rho = correlation(sliced_data, probe)
I have one issue and one question about this code:
1- When I run the code, I get the error below:
    my_rho = correlation(sliced_data, probe)
  File "/usr/lib/python3.10/statistics.py", line 919, in correlation
    raise StatisticsError('at least one of the inputs is constant')
statistics.StatisticsError: at least one of the inputs is constant
2- Is there a more elegant way of doing such slicing with Python?
You can use sliding_window_view to get the successive windows. For a vectorized computation of the correlation, use a custom function.
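The code that originally accompanied this answer is not available here, so the following is only a minimal sketch of the idea, not the original implementation. It assumes a hypothetical helper name, sliding_pearson, and a hypothetical non-constant probe [1, 2, 3] for the demonstration. The Pearson coefficient is undefined when either input has zero variance, which is why statistics.correlation raises the error in the question: the probe np.zeros(3) is constant. The sketch returns NaN in that case instead of raising.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_pearson(reference, probe):
    """Pearson correlation of `probe` with every window of `reference`.

    Hypothetical helper: entries are NaN where the window or the probe
    is constant, because the coefficient is undefined there.
    """
    reference = np.asarray(reference, dtype=float)
    probe = np.asarray(probe, dtype=float)

    # One row per overlapping window, shape (len(reference) - len(probe) + 1, len(probe))
    windows = sliding_window_view(reference, probe.size)

    # Center each window and the probe around their own means
    w_centered = windows - windows.mean(axis=1, keepdims=True)
    p_centered = probe - probe.mean()

    numerator = w_centered @ p_centered
    denominator = np.sqrt((w_centered ** 2).sum(axis=1) * (p_centered ** 2).sum())

    # Divide only where the denominator is non-zero; keep NaN elsewhere
    rho = np.full(numerator.shape, np.nan)
    np.divide(numerator, denominator, out=rho, where=denominator > 0)
    return rho


reference = np.array([0, 1, 2, 3, 0, 0, 0, 7, 8, 9, 10])
probe = np.array([1, 2, 3])  # assumed non-constant probe, for illustration only

rho = sliding_pearson(reference, probe)
best_start = int(np.nanargmax(rho))  # start index of the best-matching window
print(rho)
print(best_start, reference[best_start:best_start + probe.size])
```

With the all-zero probe from the question every entry of the result is NaN, and np.nanargmax raises a ValueError on an all-NaN input, so a constant probe has to be handled separately. The same applies to the window [0, 0, 0] inside reference, which yields NaN for any probe.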