Indexing in TensorFlow is slower than tf.gather


I am trying to index into 1-D tensors to get a slice or a single element. I see a significant performance difference (roughly 30-40%) between the NumPy way of indexing ([:]) and slice versus tf.gather.

I also observe that tf.gather has significant overhead when applied to scalars (looping over an unstacked tensor) as opposed to a whole tensor. Is this a known issue?

Example code (inefficient):

for node_idxs in graph.nodes():
    node_indices_list = tf.unstack(node_idxs)
    result = []
    for nodeid in node_indices_list:
        x = tf.gather(..., nodeid)
        y = tf.gather(..., nodeid)
        result.append(tf.multiply(x, y))
return tf.stack(result)

As opposed to example code (efficient):

for node_idxs in graph.nodes():
    x = tf.gather(..., node_idxs)
    y = tf.gather(..., node_idxs)
return tf.multiply(x, y)

I understand that the first, inefficient implementation does more work (unstacking, looping, many more gather operations, and stacking), but I was not expecting a 100x slowdown when operating on only a few hundred nodes. Are unstacking and the overhead of gather on a single scalar really that slow? In the first case I have many more gather operations, each operating on a single element rather than on a tensor of offsets. Is there a faster way of indexing? I tried NumPy-style indexing and slice, and both turned out to be slower than gather.


There is 1 answer

Maciej Skorski

First, the code doesn't really compare tf.gather with NumPy-style indexing; it compares vectorized indexing (a single tf.gather call over all indices) with looped indexing (a Python "for" loop that issues one gather per index). It is no surprise that the loop is slow.
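To make the distinction concrete, here is a minimal sketch, assuming TensorFlow 2.x in eager mode. The tensors `values_a` and `values_b` and the index vector `node_idxs` are made-up stand-ins for the elided arguments in the question, and the function names are hypothetical. Both functions should produce the same products; the looped one issues a pair of gathers per index, while the vectorized one issues two gathers in total.

```python
import tensorflow as tf

# Hypothetical data standing in for the question's elided tensors.
values_a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0])
values_b = tf.constant([10.0, 20.0, 30.0, 40.0, 50.0])
node_idxs = tf.constant([4, 0, 2])


def looped_product(idxs):
    # One pair of scalar gathers per index, then a stack: many small ops.
    result = []
    for node_id in tf.unstack(idxs):
        x = tf.gather(values_a, node_id)
        y = tf.gather(values_b, node_id)
        result.append(tf.multiply(x, y))
    return tf.stack(result)


def vectorized_product(idxs):
    # Two gathers over the whole index vector: a constant number of ops.
    x = tf.gather(values_a, idxs)
    y = tf.gather(values_b, idxs)
    return tf.multiply(x, y)


looped = looped_product(node_idxs)
vectorized = vectorized_product(node_idxs)
```

The number of ops in the looped version grows with the number of indices, so per-op overhead is a plausible source of the slowdown reported in the question, rather than gather itself.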

Note that NumPy-style indexing, tensor[idxs], is restricted in TensorFlow anyway:

Only integers, slices (:), ellipsis (...), tf.newaxis (None) and scalar tf.int32/tf.int64 tensors are valid indices

So use tf.gather for general applications.
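As a rough illustration, assuming TensorFlow 2.x in eager mode and a made-up tensor `t`, the sketch below uses the index forms the quoted message lists as valid, then falls back to tf.gather for a vector of indices, which is not among those forms.

```python
import tensorflow as tf

# Hypothetical 1-D tensor used only for illustration.
t = tf.constant([10, 20, 30, 40, 50])

# Index forms the quoted message lists as valid:
single = t[2]                   # Python integer
window = t[1:4]                 # slice
by_scalar = t[tf.constant(3)]   # scalar tf.int32 tensor

# A vector of indices goes through tf.gather instead.
idxs = tf.constant([4, 0, 2])
picked = tf.gather(t, idxs)
```

Here `picked` holds the elements at positions 4, 0 and 2, in that order, which is what NumPy would return for an integer-array index.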