I am trying to index into a 1-D tensor to get a slice or a single element. I find that there is a significant performance difference (almost 30-40%) between the NumPy way of indexing with [:] / slicing and tf.gather. I also observe that tf.gather has significant overhead when used on scalars (looping over an unstacked tensor) as opposed to a tensor of indices. Is this a known issue?
Example code (inefficient):

    for node_idxs in graph.nodes():
        # Unstack the index tensor into a list of scalar tensors
        node_indices_list = tf.unstack(node_idxs)
        result = []
        for nodeid in node_indices_list:
            # One gather op per scalar index
            x = tf.gather(..., nodeid)
            y = tf.gather(..., nodeid)
            result.append(tf.mul(x, y))
        return tf.stack(result)
As opposed to example code (efficient):

    for node_idxs in graph.nodes():
        # A single vectorized gather over the whole index tensor
        x = tf.gather(..., node_idxs)
        y = tf.gather(..., node_idxs)
        return tf.mul(x, y)
I understand that the first, inefficient implementation is doing more work (unstacking, stacking, looping, and many more gather operations), but I was not expecting a 100x slowdown when I am only operating on a few hundred nodes. Is the unstacking, plus the per-scalar overhead of gather, really that slow? In the first case I have many more gather operations, each operating on a single element, as opposed to one gather over a tensor of offsets. Is there a faster way of indexing? I tried NumPy indexing and slicing, which turned out to be slower than gather.
First, the code doesn't really compare gather vs NumPy indexing; it compares vectorized indexing (tf.gather) with looped indexing (a Python for loop). No surprise that looping is slow.
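To make the difference concrete, here is a minimal sketch of the two graph structures, assuming a TF 1.x-style graph/session API and made-up params/indices tensors (not your actual data); the looped version builds one gather node per index, while the vectorized version builds a single gather node:

    import time
    import tensorflow as tf

    params = tf.constant(list(range(1000)), dtype=tf.float32)  # hypothetical data
    indices = tf.constant(list(range(300)), dtype=tf.int32)    # a few hundred indices

    # Looped: one gather op per scalar index -> hundreds of tiny ops to dispatch
    looped = tf.stack([tf.gather(params, i) for i in tf.unstack(indices)])

    # Vectorized: a single gather op over the whole index tensor
    vectorized = tf.gather(params, indices)

    with tf.Session() as sess:
        for name, op in [("looped", looped), ("vectorized", vectorized)]:
            start = time.time()
            sess.run(op)
            print(name, time.time() - start)

The per-op dispatch overhead of the hundreds of scalar gathers (plus the unstack/stack) dominates, which is consistent with the slowdown you describe.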
Note that NumPy-like indexing (tensor[idxs]) is in any case restricted in TensorFlow: it supports basic slicing (contiguous ranges and strides) rather than NumPy's full fancy indexing with arbitrary index arrays. So use tf.gather for general applications.
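For example, a minimal sketch with a hypothetical 1-D tensor t (in graph mode these are still tensors that need to be evaluated in a session):

    import tensorflow as tf

    t = tf.constant([10.0, 20.0, 30.0, 40.0, 50.0])

    # Basic slicing (contiguous ranges and strides) works via [:]
    a = t[1:4]   # [20.0, 30.0, 40.0]
    b = t[::2]   # [10.0, 30.0, 50.0]

    # Arbitrary (unordered, possibly repeated) indices need tf.gather
    idx = tf.constant([4, 0, 0, 2])
    c = tf.gather(t, idx)   # [50.0, 10.0, 10.0, 30.0]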