I have been studying the word2vec model from Google. I was able to generate vectors for a text corpus with up to 300 dimensions. It is a very impressive tool, and its accuracy improves further on large datasets.
I am curious whether there is any way to use word2vec to generate vectors for grayscale images. I imagine the approach is the same: generate vectors based on pixel intensity and then compute cosine similarity between them.
I am trying to build a model to compute a similarity distance between grayscale images. Is there any library capable of doing this besides word2vec or GloVe, which work on text?
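To make the idea concrete, here is a minimal sketch of what I have in mind, just flattening pixel intensities into a vector and taking cosine similarity (the file names are placeholders; requires NumPy and Pillow):

```python
# Naive idea: treat each grayscale image as a flat vector of pixel
# intensities and compare two images with cosine similarity.
import numpy as np
from PIL import Image

def image_to_vector(path, size=(64, 64)):
    # Load as grayscale, resize so all vectors have the same length,
    # then flatten to a 1-D float vector.
    img = Image.open(path).convert("L").resize(size)
    return np.asarray(img, dtype=np.float32).ravel()

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = image_to_vector("image1.png")  # placeholder path
v2 = image_to_vector("image2.png")  # placeholder path
print(cosine_similarity(v1, v2))
```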
Word2vec is not a good model for images; however, I think what you really need is a bag-of-words model. In a basic image-comparison method, you convert each image to a list of keypoint features (e.g. SIFT or SURF), then you match clusters of points between images (e.g. with FLANN).
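A rough sketch of that keypoint pipeline with OpenCV might look like this (the file names and the ratio-test threshold are placeholders, not a tuned recipe):

```python
# Sketch: extract SIFT keypoint descriptors from two grayscale images
# and match them with a FLANN-based matcher.
import cv2

img1 = cv2.imread("image1.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
img2 = cv2.imread("image2.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# FLANN parameters for SIFT's float descriptors (KD-tree index).
index_params = dict(algorithm=1, trees=5)   # 1 == FLANN_INDEX_KDTREE
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1, des2, k=2)

# Lowe's ratio test to keep only distinctive matches.
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
print(f"{len(good)} good matches out of {len(matches)}")
```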
The large number of features in an image and the uncertainty of each point's representation make it difficult to use a basic single-layer network such as word2vec for image recognition. You may find better examples in these tutorials.
UPDATE after 3 years: I should also mention ConvNets and the many pre-trained models now available, from which you can extract visual features directly from pixels.
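For example, something along these lines (ResNet-18 via torchvision is just one possible choice of pre-trained model, assuming torchvision >= 0.13; the file names are placeholders):

```python
# Sketch: use a pre-trained ConvNet as a feature extractor and compare
# two images with cosine similarity over the extracted features.
import torch
import torchvision.models as models
from PIL import Image

weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.fc = torch.nn.Identity()      # drop the classifier, keep 512-d features
model.eval()

preprocess = weights.transforms()   # resize/normalize as the model expects

def embed(path):
    # Grayscale images are converted to RGB to fit the 3-channel input.
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        return model(preprocess(img).unsqueeze(0)).squeeze(0)

v1 = embed("image1.png")  # placeholder path
v2 = embed("image2.png")  # placeholder path
print(torch.nn.functional.cosine_similarity(v1, v2, dim=0).item())
```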