For a compression application I am trying to implement the simplest possible differential pulse-code modulation (DPCM). If you do not know DPCM: it is just a differential encoding scheme where you quantize the prediction error of a predictor and send the quantized prediction error to the decoder, which can invert the process. In the simplest case you compute
e(n) = x(n) - xhat(n-1)
with xhat(n) being the reconstructed sample. You then quantize e(n) and reconstruct x(n) according to
xhat(n) = xhat(n-1) + Q(e(n))
where Q denotes the quantizer. I implemented this in TensorFlow, but the resulting code is extremely slow because of the for loop, which I believe is necessary: each reconstruction depends on the previous one, so I do not see how the computation could be vectorized. My current code is:
import numpy as np
import tensorflow as tf

class DPCM(tf.keras.Model):
    def __init__(self, **kwargs):
        super(DPCM, self).__init__(**kwargs)
        self.quantizer = None

    def quantize(self, x):
        # Map each prediction error to its nearest k-means cluster center.
        x_np = x.numpy().astype(np.float32)
        x_np_q = self.quantizer.cluster_centers_[self.quantizer.predict(x_np), :]
        return x_np_q

    def SetQuantizer(self, quantizer, bypass=False):
        self.quantizer = quantizer

    # @tf.function
    def call(self, inputs):
        if self.quantizer is not None:
            reconstructed = tf.TensorArray(tf.float32, size=tf.shape(inputs)[1], dynamic_size=True)
            last_sample = tf.zeros(shape=(tf.shape(inputs)[0], 1, tf.shape(inputs)[2]))
            # Sequential loop over time: each reconstruction depends on the previous one.
            for i in range(tf.shape(inputs)[1]):
                pred_error = inputs[:, i, :] - last_sample
                pred_error_q = tf.py_function(self.quantize, [pred_error[:, 0, :]], tf.float32)
                pred_error_q = tf.expand_dims(pred_error_q, axis=0)
                reconstructed = reconstructed.write(i, pred_error_q + last_sample)
                last_sample = reconstructed.read(i)
            out = tf.transpose(reconstructed.stack(), [1, 2, 0, 3])
            out = tf.squeeze(out, axis=0)
            return out
        else:
            return inputs
inputs is of shape [batchsize, 3999, 8]. The quantizer is just scikit-learn's KMeans codebook after fitting it to the raw prediction errors. This code works, but it is EXTREMELY slow. Is it possible to speed it up somehow? Recurrent neural networks are apparently implemented in TensorFlow without a problem, so I assume it must be possible to do this much faster.
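For reference, here is a minimal plain-NumPy sketch of the whole scheme as I described it above (the signal, the codebook size and the variable names are placeholders, only for illustration); the sequential dependence on xhat(n-1) inside the loop is exactly what I do not know how to express efficiently in TensorFlow:

import numpy as np
from sklearn.cluster import KMeans

# Toy signal: one channel, length 1000 (placeholder values).
x = np.random.randn(1000, 1).astype(np.float32)

# Fit the codebook on the raw prediction errors, as described above.
raw_errors = np.diff(x, axis=0, prepend=np.zeros((1, 1), dtype=np.float32))
quantizer = KMeans(n_clusters=16, n_init=10).fit(raw_errors)

# DPCM: quantize e(n) = x(n) - xhat(n-1), then xhat(n) = xhat(n-1) + Q(e(n)).
xhat = np.zeros_like(x)
last = np.zeros((1, 1), dtype=np.float32)
for n in range(x.shape[0]):
    e = x[n:n + 1, :] - last                                    # prediction error
    e_q = quantizer.cluster_centers_[quantizer.predict(e), :]   # Q(e(n))
    xhat[n:n + 1, :] = last + e_q                               # reconstruction
    last = xhat[n:n + 1, :]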
OK, I had tunnel vision. This may never be efficient on a GPU, since GPUs are optimized for large parallel operations rather than this kind of sequential, sample-by-sample loop. However, if I switch to the CPU using with tf.device('/cpu:0'):, the speed improves drastically to the level I expected. Still, training remains surprisingly slow for such a small model (< 10000 neurons).
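In case it helps anyone, this is roughly how I pin the DPCM pass to the CPU (a sketch; quantizer and batch stand in for my actual fitted KMeans object and input tensor):

import tensorflow as tf

dpcm = DPCM()
dpcm.SetQuantizer(quantizer)  # the fitted scikit-learn KMeans object

# Run the sample-by-sample DPCM loop on the CPU instead of the GPU.
with tf.device('/cpu:0'):
    reconstructed = dpcm(batch)  # batch: [batchsize, 3999, 8] float32 tensor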