I have a dataset of a thousand 128-dimensional features, in the shape of e.g. (1000,128).
I want to find the sorted nearest neighbors of a single 128-dimensional feature in the shape of (128,1).
The distance is calculated via a matrix multiplication between the dataset (1000,128) and the feature (128,1), which gives an array of similarities in the shape of (1000,1):
DATASET (1000,128) x FEATURE (128,1) = SIMILARITIES (1000,1)
This is done via:
    # features.shape=(1000,128) ; feature.shape=(128,1) ; similarities.shape=(1000,1)
    similarities = features.dot(feature)
After calculating the distance (similarities), I'm finding the nearest neighbors using the code below:
    import numpy as np

    # Flatten (1000,1) to (1000,) so argpartition runs over the data points
    similarities = similarities.ravel()

    # The n Nearest Neighbors Indexes (But Not Sorted)
    nearest_neighbours_indexes_unsorted = np.argpartition(similarities, kth=-n)[-n:]

    # The n Nearest Neighbors (But Not Sorted)
    nearest_neighbours_similarities_unsorted = similarities[nearest_neighbours_indexes_unsorted]

    # The Indexes of n Nearest Neighbors Sorted (most similar first)
    nearest_neighbours_indexes_sorted = np.flip(nearest_neighbours_indexes_unsorted[np.argsort(nearest_neighbours_similarities_unsorted)], axis=0)
This code works very fast for millions of data points (I'd be interested if someone has a tip to make it faster). But I want to be able to find the nearest neighbors of more than one feature in one go:
DATASET (1000,128) x FEATURE (128,n) = SIMILARITIES (1000,n)
One way is to run the above code for each feature in a loop (which is slow). The other way is to change the code to accommodate multidimensional indexing, and here's where I'm stuck: I don't know how to write the above code for features in the shape of (128,n) rather than (128,1).
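For reference, here is a rough sketch of what the multidimensional version might look like. It assumes NumPy 1.15 or newer (for np.take_along_axis) and applies the same argpartition-then-argsort idea along axis 0, one column per query feature. The names top_k_sorted, queries, k and m are made up for this illustration (k is the number of neighbors, m the number of query features, to keep the two counts apart), and the random data is only a placeholder.

```python
import numpy as np

def top_k_sorted(similarities, k):
    """Return a (k, m) array: for each column, the row indexes of the
    k largest similarities, ordered from most to least similar."""
    # Partition every column so its k largest entries land in the last k rows (unsorted)
    part = np.argpartition(similarities, kth=-k, axis=0)[-k:]
    # Gather the similarity values at those indexes, shape (k, m)
    part_vals = np.take_along_axis(similarities, part, axis=0)
    # Sort only those k values per column, in descending order
    order = np.argsort(-part_vals, axis=0)
    # Reorder the partitioned indexes accordingly
    return np.take_along_axis(part, order, axis=0)

# Placeholder data: 1000 dataset features, m = 5 query features (hypothetical sizes)
rng = np.random.default_rng(0)
features = rng.standard_normal((1000, 128))
queries = rng.standard_normal((128, 5))

similarities = features.dot(queries)        # shape (1000, 5)
nearest = top_k_sorted(similarities, k=10)  # shape (10, 5)
```

Column j of nearest then holds the indexes of the k nearest neighbors of query j, most similar first. Only the k selected values per column are fully sorted, so the cost should stay close to that of the single-feature version repeated per column, minus the Python loop.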