I was wondering whether it was possible to vectorise this implementation of VLAD computation.
For context:
feats = numpy array of shape (T, N, F)
kmeans = KMeans object from scikit-learn initialised with K clusters.
Current method
k = kmeans.n_clusters  # K
centers = kmeans.cluster_centers_  # (K, F)
vlad_feats = []
for feat in feats:
    # feat shape - (N, F)
    cluster_label = kmeans.predict(feat)  # (N,)
    vlad = np.zeros((k, feat.shape[1]))  # (K, F)
    # computing the differences for all the clusters (visual words)
    for i in range(k):
        # if there is at least one descriptor in that cluster
        if np.sum(cluster_label == i) > 0:
            # add the differences
            vlad[i] = np.sum(feat[cluster_label == i, :] - centers[i], axis=0)
    vlad = vlad.flatten()  # (K*F,)
    # L2 normalization
    vlad = vlad / np.sqrt(np.dot(vlad, vlad))
    vlad_feats.append(vlad)
vlad_feats = np.array(vlad_feats)  # (T, K*F)
Getting the kmeans predictions as a batch is not a problem, since we can do the following:
feats2 = feats.reshape(-1, F) # (T*N, F)
labels = kmeans.predict(feats2) # (T*N, )
But I'm stuck at computing the per-cluster differences from the centers in a vectorized way.
While @MadPhysicist's answer vectorizes, I've found it hurts performance.
Below, looping() is essentially a re-written version of OP's algorithm and naivec() employs vectorization through the exploded 4D tensor. Indeed, see below for a benchmark.
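Roughly, the two versions look like this (a sketch only: the exact benchmarked code is not reproduced here, and the array sizes in the driver at the bottom are illustrative):

import numpy as np
from sklearn.cluster import KMeans

def looping(kmeans, feats):
    # essentially the OP's algorithm: per-sample loop, per-cluster masked reduction
    k = kmeans.n_clusters
    centers_kf = kmeans.cluster_centers_
    t, _, f = feats.shape
    vlad_tkf = np.zeros((t, k, f))
    for vlad_kf, local_lf in zip(vlad_tkf, feats):
        label_l = kmeans.predict(local_lf)
        for i in range(k):
            vlad_kf[i] = np.sum(local_lf[label_l == i] - centers_kf[i], axis=0)
    vlad_tkf = vlad_tkf.reshape(t, -1)
    return vlad_tkf / np.linalg.norm(vlad_tkf, axis=1, keepdims=True)

def naivec(kmeans, feats):
    # vectorizes by exploding a (T, N, K, F) tensor of residuals and masking it
    k = kmeans.n_clusters
    centers_kf = kmeans.cluster_centers_
    t, n, f = feats.shape
    label_tn = kmeans.predict(feats.reshape(-1, f)).reshape(t, n)
    mask_tnk = (label_tn[..., None] == np.arange(k)).astype(feats.dtype)  # (T, N, K)
    resid_tnkf = feats[:, :, None, :] - centers_kf                        # (T, N, K, F)
    vlad_tkf = np.einsum('tnk,tnkf->tkf', mask_tnk, resid_tnkf)
    vlad_tkf = vlad_tkf.reshape(t, -1)
    return vlad_tkf / np.linalg.norm(vlad_tkf, axis=1, keepdims=True)

if __name__ == '__main__':
    import timeit
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((64, 200, 32))        # illustrative (T, N, F)
    kmeans = KMeans(n_clusters=16, n_init=10).fit(feats.reshape(-1, 32))
    assert np.allclose(looping(kmeans, feats), naivec(kmeans, feats))
    for fn in (looping, naivec):
        print(fn.__name__, timeit.timeit(lambda: fn(kmeans, feats), number=3))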
An idiomatic vectorization which avoids memory growing beyond output size (asymptotically) would leverage a scatter reduction.
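For example, with np.add.at (a sketch; the function name scattering is just for illustration):

import numpy as np

def scattering(kmeans, feats):
    # scatter-reduce each residual into its (sample, cluster) slot with np.add.at,
    # so memory stays proportional to the (T, K, F) output rather than (T, N, K, F)
    k = kmeans.n_clusters
    centers_kf = kmeans.cluster_centers_
    t, n, f = feats.shape
    flat_xf = feats.reshape(-1, f)                  # (T*N, F)
    label_x = kmeans.predict(flat_xf)               # (T*N,)
    sample_x = np.repeat(np.arange(t), n)           # sample index of each descriptor
    vlad_tkf = np.zeros((t, k, f))
    np.add.at(vlad_tkf, (sample_x, label_x), flat_xf - centers_kf[label_x])
    vlad_tkf = vlad_tkf.reshape(t, -1)
    return vlad_tkf / np.linalg.norm(vlad_tkf, axis=1, keepdims=True)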
However, disappointingly, this also only gets us about 200 ms of computation time. This boils down to two reasons:

- looping() already does its heavy lifting inside vectorized NumPy reductions, so there is less to gain than one might expect
- np.add.at actually cannot use vectorized CPU instructions, unlike the original strided reduction np.sum(local_lf[label_l == i] - centers_kf[i], axis=0)

A performant vectorized version of the VLAD algorithm requires some sophisticated techniques to leverage contiguous array accesses. This version gets a 40% improvement over looping(), but requires a lot of setup; see my blog on the approach here.
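As one illustration of the contiguous-access idea (a generic sort-plus-prefix-sum sketch, not the version described in the blog): sort the descriptors by (sample, cluster) so each group occupies one contiguous block, take a single cumulative sum, and recover each group's residual sum as sum(x) - count * center.

import numpy as np

def contiguous(kmeans, feats):
    # illustrative only: group descriptors contiguously by (sample, cluster),
    # then get per-group sums from one cumulative sum over the sorted array
    k = kmeans.n_clusters
    centers_kf = kmeans.cluster_centers_
    t, n, f = feats.shape
    flat_xf = feats.reshape(-1, f)
    label_x = kmeans.predict(flat_xf)
    group_x = np.repeat(np.arange(t), n) * k + label_x     # combined (sample, cluster) id
    order = np.argsort(group_x, kind='stable')
    sorted_xf = flat_xf[order]                             # group members are now adjacent
    counts_g = np.bincount(group_x, minlength=t * k)       # descriptors per group
    ends_g = np.cumsum(counts_g)
    starts_g = ends_g - counts_g
    csum = np.vstack((np.zeros((1, f)), np.cumsum(sorted_xf, axis=0)))
    sums_gf = csum[ends_g] - csum[starts_g]                # per-group sums (zero if empty)
    # sum_i (x_i - c) == sum_i x_i - count * c, so empty clusters stay at zero
    vlad_tkf = sums_gf.reshape(t, k, f) - counts_g.reshape(t, k, 1) * centers_kf
    vlad_tkf = vlad_tkf.reshape(t, -1)
    return vlad_tkf / np.linalg.norm(vlad_tkf, axis=1, keepdims=True)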