I have a really big NumPy array (145000 rows * 550 cols), and I want to create rolling slices within each subarray (row). I tried to implement this with a function. The function lagged_vals behaves as expected, but np.lib.stride_tricks.as_strided does not behave the way I want it to:
import numpy as np

def lagged_vals(series, l):
    # Garbage implementation but still right
    return np.concatenate(
        [[x[i:i+l] for i in range(x.shape[0]) if i+l <= x.shape[0]] for x in series],
        axis=0)
# Sample 2D numpy array
something = np.array([[1,2,2,3],[2,2,3,3]])
lagged_vals(something,2) # Works as expected
# array([[1, 2],
# [2, 2],
# [2, 3],
# [2, 2],
# [2, 3],
# [3, 3]])
np.lib.stride_tricks.as_strided(something,
                                (something.shape[0]*something.shape[1], 2),
                                (8, 8))
# array([[1, 2],
# [2, 2],
# [2, 3],
# [3, 2], <--- across subarray stride, which I do not want
# [2, 2],
# [2, 3],
# [3, 3]])
How do I remove that particular row in the np.lib.stride_tricks implementation? And how can I scale this cross-array stride removal to a big NumPy array?
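Sure, that's possible with np.lib.stride_tricks.as_strided. One way is to build a 3D view that holds the windows of each row separately, so that no window crosses a row boundary, and then reshape it to 2D.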
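A minimal sketch of that approach, assuming a 2D input and a window length no larger than the number of columns. The helper name `strided_lagged_vals` and its parameter names are made up for illustration. It reads the strides from the array itself rather than hard-coding `(8, 8)`, so it does not depend on the dtype:

```python
import numpy as np

def strided_lagged_vals(a, L):
    # Hypothetical helper; assumes a is 2D and 1 <= L <= a.shape[1].
    m, n = a.shape
    s0, s1 = a.strides  # byte steps between rows and between columns
    # 3D view of shape (m, n-L+1, L): for each row, n-L+1 windows of length L.
    # Windows are only built inside a row, so none spans two rows.
    windows = np.lib.stride_tricks.as_strided(
        a, shape=(m, n - L + 1, L), strides=(s0, s1, s1), writeable=False)
    # Flattening the first two axes gives the 2D layout (this makes a copy).
    return windows.reshape(-1, L)

something = np.array([[1, 2, 2, 3], [2, 2, 3, 3]])
out = strided_lagged_vals(something, 2)
print(out)
# [[1 2]
#  [2 2]
#  [2 3]
#  [2 2]
#  [2 3]
#  [3 3]]
```

On the question's sample this gives the same six rows as lagged_vals, without the unwanted `[3, 2]` row.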
Note that the last step of reshaping forces a copy. That can't be avoided if we need the final output to be 2D. If we are okay with a 3D output, we can skip that reshape and get a view instead.
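The 3D variant can be sketched on the question's sample array as follows. The variable names are illustrative only, and `np.shares_memory` is used just to check which result is a view and which is a copy:

```python
import numpy as np

something = np.array([[1, 2, 2, 3], [2, 2, 3, 3]])
L = 2  # window length
m, n = something.shape
s0, s1 = something.strides

# 3D windowed view: axis 0 = row, axis 1 = window start, axis 2 = position in window.
view3d = np.lib.stride_tricks.as_strided(
    something, shape=(m, n - L + 1, L), strides=(s0, s1, s1), writeable=False)

# The 3D result reuses the original buffer, so no data is copied.
is_view = np.shares_memory(view3d, something)

# Reshaping to 2D cannot be expressed with strides alone, so NumPy copies.
flat2d = view3d.reshape(-1, L)
is_copy = not np.shares_memory(flat2d, something)

print(view3d.shape, is_view, is_copy)
```

For the 145000 x 550 array from the question, the view costs no extra memory, while the 2D copy needs roughly L times the memory of the original. Passing `writeable=False` is a precaution, since writing through overlapping windows would modify several windows at once.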