I have a long array of 32-bit integers representing a time series, e.g.
[1,2,3,4,5,6,7,8,9,10,11,12...]
(though mine is more or less random)
I would like to do two things with this data to prepare it for its Machine Learning Fate™ in TensorFlow. The first is to convert each integer to its bitwise representation (32 bits), e.g.
import numpy as np

width = 32
x = np.arange(100, dtype=np.uint32)  # toy time series
bits = ((x[:, None] & (1 << np.arange(width, dtype=np.uint64))) > 0).astype(int)
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
...
Endianness isn't particularly important here. The second thing I would like to do is slice this data into overlapping windows of length n, to create time-series samples like (t-4, t-3, t-2, t-1). For example, on the above sequence with n=4
we get
[[0,1,2,3], [1,2,3,4], [2,3,4,5], ...
The excellent answer at Split Python sequence (time series/array) into subsequences with overlap provides an efficient way to do this:
def subsequences(ts, window):
    # One window starts at every position; consecutive windows advance one
    # element, so both strides equal the element stride of the original array.
    shape = (ts.size - window + 1, window)
    strides = ts.strides * 2
    return np.lib.stride_tricks.as_strided(ts, shape=shape, strides=strides)
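For instance, on a small toy array:

>>> subsequences(np.arange(10), 4)
array([[0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6],
       [4, 5, 6, 7],
       [5, 6, 7, 8],
       [6, 7, 8, 9]])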
Each of these solutions works individually, but I don't know how to compose them; what I want is a three-dimensional array that contains the bitwise expansion of each timestep in each window:
[
[
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0],
[1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0],
[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0],
[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
],
[
[1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0],
[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0],
[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0],
[0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
],
...
Is there a way to do this that takes advantage of NumPy's vectorized implementations of these operations? Clearly running the bitwise expansion before the stride is faster, but I am unsure of the correct shape and strides to pass to stride_tricks.as_strided. It's also possible this is a solved problem in either TensorFlow or NumPy.
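For what it's worth, here is my best guess at composing the two (the name bit_subsequences and the uint64 cast are my own; the stride arithmetic is exactly the part I'm unsure about):

import numpy as np

def bit_subsequences(ts, window, width=32):
    ts = np.asarray(ts, dtype=np.uint64)
    # Expand every integer to its bit representation first: shape (N, width)
    bits = ((ts[:, None] & (1 << np.arange(width, dtype=np.uint64))) > 0).astype(np.uint8)
    # Window over the first axis only: consecutive windows start one row apart,
    # so the first two strides are both the row stride of `bits`.
    shape = (bits.shape[0] - window + 1, window, width)
    strides = (bits.strides[0], bits.strides[0], bits.strides[1])
    return np.lib.stride_tricks.as_strided(bits, shape=shape, strides=strides)

This appears to produce the shape I want, but I don't know whether reusing bits.strides[0] for both of the first two axes is actually correct, or whether there is a more idiomatic way to do this.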