I need to find the similarity between a reference document and a set of documents in a repository.
Method:
1. I build the term-document matrix for all the documents, including the reference document.
2. I compute the SVD of this matrix.
3. I take the V matrix (the third result).
4. I transpose this matrix so that each row represents a document.
5. The first row represents the reference document.
6. I compute the cosine similarity between this row and each of the other rows.
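To make step 6 concrete, here is a minimal sketch in plain Java of comparing the first row against the others. The class name `RowCosine`, the method names, and the numbers in `docRows` are hypothetical and chosen only for illustration; they are not the output of a real SVD. In the actual code the rows would come from the V matrix produced by the SVD library (with or without a transpose, depending on whether the library returns V or its transpose).

```java
import java.util.Arrays;

class RowCosine {

    /** Cosine similarity of two equal-length vectors; returns 0 if either vector has zero norm. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // similarity is undefined for a zero vector; 0 is a convention used here
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Treats row 0 as the reference document and returns its similarity to every other row. */
    static double[] similaritiesToReference(double[][] docRows) {
        double[] sims = new double[docRows.length - 1];
        for (int i = 1; i < docRows.length; i++) {
            sims[i - 1] = cosine(docRows[0], docRows[i]);
        }
        return sims;
    }

    public static void main(String[] args) {
        // Hypothetical document rows: row 0 is the reference document.
        double[][] docRows = {
            {1.0, 0.0},
            {2.0, 0.0},
            {0.0, 3.0}
        };
        System.out.println(Arrays.toString(similaritiesToReference(docRows)));
        // prints [1.0, 0.0]
    }
}
```

Here the second row points in the same direction as the reference row (similarity 1.0) and the third row is orthogonal to it (similarity 0.0).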
My questions:
Since I have only about 7 documents in my database, the V matrix (document matrix) I get is only 8×8 (the 7 documents plus the reference document). Will I get a correct result if I compute the cosine similarity from these 8 values alone?
Is this method generally used?
I am coding this in Java and use the JAMA package to compute the SVD.
You can read an example of LSA here.