I have a folder of images. I want to compute VLAD features from each image.
I loop over each image, load it, and obtain the SIFT descriptors as follows:
```matlab
repo = '/media/data/images/';
filelist = dir([repo '*.jpg']);
sift_descr = {};
for i = 1:numel(filelist)
    I = imread([repo filelist(i).name]);
    I = single(rgb2gray(I));
    [f, d] = vl_sift(I);
    sift_descr{i} = d;
end
```
However, VLAD requires the descriptors to be in a single 2-D matrix (see the VLFeat VLAD tutorial). What is the correct way to process my SIFT descriptors before VLAD encoding? Thank you.
First, you need to obtain a dictionary of visual words, or to be more specific: cluster the SIFT features of all images using k-means clustering. In [1], a coarse clustering using e.g. 64 or 256 clusters is recommended.
For that, we have to concatenate all descriptors into one matrix, which we can then pass to the `vl_kmeans` function. Further, we convert the descriptors from `uint8` to `single`, as `vl_kmeans` requires the input to be either `single` or `double`.

Second, you have to create an assignment matrix with the dimensions NumberOfClusters-by-NumberOfDescriptors, which assigns each descriptor to a cluster. You have a lot of flexibility in creating this assignment matrix: you can do soft or hard assignments, and you can use simple nearest-neighbor search, kd-trees, or other approximate or hierarchical nearest-neighbor schemes at your discretion.
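The concatenation and clustering step might look like this (a sketch; the variable names and the choice of 64 clusters are assumptions, not fixed requirements):

```matlab
% Concatenate all per-image descriptor matrices into one 128-by-N matrix,
% converting from uint8 to single as vl_kmeans requires.
all_descr = single([sift_descr{:}]);

% Cluster into a coarse vocabulary of 64 visual words.
centroids = vl_kmeans(all_descr, 64);
```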
In the tutorial, they use kd-trees, so let's stick to that: first, a kd-tree has to be built. This operation belongs right after finding the centroids.

Then, we are ready to construct the VLAD vector for each image. Thus, we have to go through all images again and calculate their VLAD vectors independently: first, we create the assignment matrix exactly as described in the tutorial; then, we can encode the SIFT descriptors using the `vl_vlad` function. The resulting VLAD vector has the size NumberOfClusters * SiftDescriptorSize, i.e. 64 * 128 in our example.

Finally, we have the high-dimensional VLAD vectors for all images in the database. Usually, you'll want to reduce the dimensionality of the VLAD descriptors, e.g. using PCA.
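The kd-tree construction and the per-image encoding loop described above might be sketched as follows (assuming the `sift_descr` cell array from the question and the `centroids` from the clustering step, with 64 clusters):

```matlab
% Build a kd-tree over the centroids for fast nearest-neighbor search.
kdtree = vl_kdtreebuild(centroids);

% Encode each image as a 64*128-dimensional VLAD vector.
enc = zeros(64 * 128, numel(sift_descr));
for i = 1:numel(sift_descr)
    descr = single(sift_descr{i});

    % Find the nearest centroid for each descriptor.
    nn = vl_kdtreequery(kdtree, centroids, descr);

    % Hard-assignment matrix: NumberOfClusters-by-NumberOfDescriptors,
    % with a 1 in the row of the assigned cluster.
    assignments = zeros(64, numel(nn), 'single');
    assignments(sub2ind(size(assignments), double(nn), 1:numel(nn))) = 1;

    % VLAD encoding of this image's descriptors.
    enc(:, i) = vl_vlad(descr, centroids, assignments);
end
```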
Now, given a new image which is not in the database, you can extract the SIFT features using `vl_sift`, create the assignment matrix with `vl_kdtreequery`, and create the VLAD vector for that image using `vl_vlad`. So, you don't have to find new centroids or build a new kd-tree.

[1] Arandjelovic, R., & Zisserman, A. (2013). All About VLAD. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1578–1585. https://doi.org/10.1109/CVPR.2013.207
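For completeness, encoding such a query image could be sketched like this, reusing the `centroids` and `kdtree` variables from above (`query.jpg` is a hypothetical filename):

```matlab
% Extract SIFT descriptors from the new image ('query.jpg' is a placeholder).
I = single(rgb2gray(imread('query.jpg')));
[~, d] = vl_sift(I);
descr = single(d);

% Assign each descriptor to its nearest existing centroid.
nn = vl_kdtreequery(kdtree, centroids, descr);
assignments = zeros(64, numel(nn), 'single');
assignments(sub2ind(size(assignments), double(nn), 1:numel(nn))) = 1;

% VLAD vector for the query image, comparable to the database vectors.
vlad_query = vl_vlad(descr, centroids, assignments);
```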