Several questions have been asked about the SIFT algorithm, but they all seem focussed on a simple comparison between two images. Instead of determining how similar two images are, would it be practical to use SIFT to find the closest matching image out of a collection of thousands of images? In other words, is SIFT scalable?
For example, would it be practical to use SIFT to generate keypoints for a batch of images, store the keypoints in a database, and then find the stored image whose keypoints have the shortest Euclidean distance to the keypoints generated for a "query" image?
When calculating the Euclidean distance, would you ignore the x, y, scale, and orientation parts of the keypoints, and only look at the descriptor?
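Here is roughly what I have in mind, as a sketch rather than working code: it assumes OpenCV's SIFT implementation (`cv2.SIFT_create`), uses a plain in-memory dict as the "database", and the file names are placeholders. Only the 128-dimensional descriptors are compared (Euclidean distance plus Lowe's ratio test); x, y, scale, and orientation are thrown away.

```python
import cv2

sift = cv2.SIFT_create()

def descriptors(path):
    """Return the Nx128 float32 SIFT descriptor matrix for an image file."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _keypoints, desc = sift.detectAndCompute(img, None)
    return desc  # keep only the descriptors; x, y, scale, orientation are dropped

# "Database": image path -> descriptor matrix (placeholder file names).
database = {p: descriptors(p) for p in ["a.jpg", "b.jpg", "c.jpg"]}

def score(query_desc, stored_desc, ratio=0.75):
    """Count query descriptors whose nearest stored descriptor
    (Euclidean distance) passes Lowe's ratio test."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(query_desc, stored_desc, k=2)
    return sum(1 for pair in matches
               if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance)

query_desc = descriptors("query.jpg")
best = max(database, key=lambda p: score(query_desc, database[p]))
print("closest image:", best)
```

This is a linear scan over the whole collection, which is why I am asking whether it scales to thousands of images.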
There are several approaches.
One popular approach is the so-called bag-of-words representation, which matches images based solely on how many of their descriptors agree, ignoring the location part of each keypoint (x, y, scale, and orientation) and looking only at the descriptor itself.
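A rough sketch of that idea, assuming scikit-learn's k-means for the vocabulary and reusing the `descriptors` helper and `database` dict from the question's sketch; the vocabulary size of 500 is arbitrary:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_vocabulary(descriptor_sets, n_words=500):
    """Cluster the pooled SIFT descriptors into a visual vocabulary."""
    kmeans = MiniBatchKMeans(n_clusters=n_words, random_state=0)
    kmeans.fit(np.vstack(descriptor_sets))
    return kmeans

def bow_histogram(desc, vocabulary):
    """Quantize descriptors to visual words and return a normalized histogram."""
    words = vocabulary.predict(desc)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)

# Each image becomes one fixed-length vector instead of a variable-size descriptor set.
vocabulary = build_vocabulary(list(database.values()))
index = {p: bow_histogram(d, vocabulary) for p, d in database.items()}

query_hist = bow_histogram(descriptors("query.jpg"), vocabulary)
best = min(index, key=lambda p: np.linalg.norm(index[p] - query_hist))
```

The payoff is that every image is now a single fixed-length vector, so comparing against the whole collection is one cheap distance computation per image rather than pairwise descriptor matching.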
Efficiently querying a large database usually relies on approximate nearest-neighbor methods such as locality-sensitive hashing (LSH).
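For example, here is a toy random-hyperplane LSH over the bag-of-words vectors from the previous sketch; the number of hash bits and the single hash table are purely illustrative, and real systems use several tables to trade recall against speed:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def make_hasher(dim, n_bits=16):
    """Hash a vector to n_bits according to which side of n_bits random hyperplanes it falls on."""
    planes = rng.standard_normal((n_bits, dim))
    def hash_vector(v):
        bits = (planes @ v) > 0
        return bits.tobytes()  # hashable bucket key
    return hash_vector

hasher = make_hasher(dim=len(query_hist))

# Bucket the stored histograms; nearby vectors tend to share a bucket.
buckets = defaultdict(list)
for path, hist in index.items():
    buckets[hasher(hist)].append(path)

# Query: compare exactly only against images that landed in the same bucket.
candidates = buckets.get(hasher(query_hist), [])
best = min(candidates, default=None,
           key=lambda p: np.linalg.norm(index[p] - query_hist))
```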
Other approaches rely on vocabulary trees or similar indexing data structures.
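A vocabulary tree is essentially hierarchical k-means: each node splits its descriptors into a few clusters, so quantizing a descriptor means walking down the tree with a handful of distance computations instead of comparing it against every visual word. The branch factor and depth below are illustrative, and in practice the leaf ids would feed an inverted index:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_tree(desc, branch=4, depth=3):
    """Recursively split descriptors with k-means (hierarchical k-means)."""
    if depth == 0 or len(desc) < branch:
        return None  # leaf node
    kmeans = KMeans(n_clusters=branch, n_init=4, random_state=0).fit(desc)
    children = [build_tree(desc[kmeans.labels_ == i], branch, depth - 1)
                for i in range(branch)]
    return {"centers": kmeans.cluster_centers_, "children": children}

def quantize(d, node, path=()):
    """Return a leaf id (the path of branch indices) for one descriptor."""
    if node is None:
        return path
    i = int(np.argmin(np.linalg.norm(node["centers"] - d, axis=1)))
    return quantize(d, node["children"][i], path + (i,))

tree = build_tree(np.vstack(list(database.values())))
query_leaves = [quantize(d, tree) for d in descriptors("query.jpg")]
```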
For an efficient method that also takes the location information into account, look into pyramid match kernels.
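I won't reproduce the full kernel here, but the gist is easy to sketch in its spatial form (close to Lazebnik et al.'s spatial pyramid matching, a simplification of the pyramid match idea): visual-word histograms are intersected over progressively finer grids of the image plane, so matches that also agree on location are weighted more. The input format, (word, x, y) tuples with coordinates normalized to [0, 1), is an assumption and requires keeping keypoint locations, unlike the earlier sketches:

```python
from collections import Counter

def pyramid_match(features_a, features_b, levels=3):
    """Multi-resolution histogram intersection over (visual word, x, y) features."""
    def histogram(features, level):
        cells = 2 ** level  # grid resolution at this level
        return Counter((w, min(int(x * cells), cells - 1), min(int(y * cells), cells - 1))
                       for w, x, y in features)

    def intersection(ha, hb):
        return sum(min(ha[k], hb[k]) for k in ha.keys() & hb.keys())

    # Matches found at each grid resolution, from coarse (level 0) to fine.
    I = [intersection(histogram(features_a, l), histogram(features_b, l))
         for l in range(levels + 1)]

    # Finest-level matches count fully; matches only found at coarser
    # levels are down-weighted according to how coarse the level is.
    score = I[levels]
    for l in range(levels):
        score += (I[l] - I[l + 1]) / 2 ** (levels - l)
    return score
```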