Transforming a sequence of integers into the binary representation of that sequence's strides


I have a long array of 32-bit integers that represents a time series, e.g. [1,2,3,4,5,6,7,8,9,10,11,12...] (though mine is more or less random).

I would like to do two things to this data to prepare it for its Machine Learning Fate™ in TensorFlow. The first is to convert each of these integers to its bitwise representation (32 bits), e.g.:

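import numpy as np
width = 32                               # number of bits per integer
x = np.arange(100, dtype=np.uint32)      # sample data; unsigned so the bit masks share its dtype
bits = ((x[:, None] & (1 << np.arange(width, dtype=np.uint32))) > 0).astype(int)  # least significant bit first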
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
...

Endianness isn't particularly important here. The second thing I would like to do is slice this data into overlapping windows of length n, in order to create time-series samples such as (t-4, t-3, t-2, t-1). For example, on the above sequence with n=4 we get [[0,1,2,3], [1,2,3,4], [2,3,4,5], ...]. The excellent answer at "Split Python sequence (time series/array) into subsequences with overlap" gives an efficient way to do this:

def subsequences(ts, window):
    shape = (ts.size - window + 1, window)
    strides = ts.strides * 2
    return np.lib.stride_tricks.as_strided(ts, shape=shape, strides=strides)
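
For comparison, NumPy 1.20 and later also ship `sliding_window_view`, which as far as I can tell produces the same overlapping windows as a read-only view without hand-written strides. A minimal sketch on a toy series (the names `ts` and `windows` are just for illustration):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

ts = np.arange(8)                      # toy time series 0..7
windows = sliding_window_view(ts, 4)   # read-only view, one window of length 4 per row
print(windows.shape)
print(windows[:3])
```

Since the result is a view, no data is copied until something downstream forces a copy.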

Each of these solutions works individually, but the problem is that I don't know how to compose them; I would like a three-dimensional array that contains the bitwise array of each timestep:

[
    [
        [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0],
        [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0],
        [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0],
        [1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    ],
    [
        [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0],
        [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0],
        [1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0],
        [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    ],
    ...

Is there a way to do this that takes advantage of NumPy's vectorized implementations of these operations? Clearly, running the bitwise expansion before the windowing is faster, since each integer is then expanded only once, but I am unsure of the correct shape and strides to pass to stride_tricks.as_strided for the resulting two-dimensional bit array. It's also possible this is a solved problem in either TensorFlow or NumPy.
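
One possible composition, offered as a sketch rather than a definitive answer: expand to bits first, then window over the row axis by reusing the row stride for both the window index and the position inside the window. The helper name `bit_windows` and its parameters are made up for illustration, and the sketch assumes non-negative values that fit in 32 bits and a series at least `window` elements long.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def bit_windows(ts, window, width=32):
    """Hypothetical helper: returns shape (len(ts) - window + 1, window, width)."""
    ts = np.ascontiguousarray(ts, dtype=np.uint32)
    # Step 1: expand each integer into `width` bits, least significant bit first.
    shifts = np.arange(width, dtype=np.uint32)
    bits = ((ts[:, None] >> shifts) & 1).astype(np.uint8)    # shape (N, width)
    # Step 2: overlapping windows along axis 0. Moving to the next window and
    # moving to the next timestep inside a window both advance by one row.
    row_stride, bit_stride = bits.strides
    shape = (bits.shape[0] - window + 1, window, width)
    strides = (row_stride, row_stride, bit_stride)
    # writeable=False because the windows overlap in memory.
    return as_strided(bits, shape=shape, strides=strides, writeable=False)

out = bit_windows(np.arange(10), window=4)
print(out.shape)
print(out[1, :, :4])   # first four bits of the timesteps 1, 2, 3, 4
```

The result is a view onto the two-dimensional bit array, so the overlapping windows cost no extra memory until something makes a copy (for example, converting to a tensor).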
