Doubts regarding LSA

I have to find the similarity between a reference document and a set of documents in a repository.

Method:

1. I build the term-document matrix for all the documents, including the reference document.
2. I compute the SVD of this matrix.
3. I take the V matrix (the third result).
4. I transpose this matrix so that each row represents a document.
5. The first row represents the reference document.
6. I compute the cosine similarity between this row and each of the other rows.
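
Step 6 can be sketched in plain Java as follows. This is a minimal illustration, not the asker's actual code: the class `LsaCosine` and its method names are hypothetical, and it assumes the document vectors are already available as a `double[][]` with one row per document and the reference document in row 0 (for example, taken from a JAMA `Matrix` via `getArray()`).

```java
/** Hypothetical helper for illustration; not part of JAMA. */
final class LsaCosine {

    /** Cosine similarity of two equal-length vectors; 0.0 if either vector has zero norm. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            // Assumption: an all-zero vector is treated as dissimilar to everything.
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    /**
     * Compares row 0 (the reference document) with every other row.
     * Entry i - 1 of the result is the similarity between row 0 and row i.
     */
    static double[] similaritiesToReference(double[][] docVectors) {
        if (docVectors.length == 0) {
            throw new IllegalArgumentException("At least one document vector is required");
        }
        double[] result = new double[docVectors.length - 1];
        for (int i = 1; i < docVectors.length; i++) {
            result[i - 1] = cosine(docVectors[0], docVectors[i]);
        }
        return result;
    }
}
```

Calling `LsaCosine.similaritiesToReference(rows)` returns one score per repository document, in the same order as the rows after the reference row.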

My doubts:

  1. Since I have only about 7 documents in my database, I get only an 8×8 V matrix (the document matrix). Will I get a correct result if I compute the cosine similarity using these 8 values alone?

  2. Is this method generally used?

I am coding this in Java, and I use the JAMA package to compute the SVD.

There is 1 answer

Debaditya On
  • I have tried this in MATLAB using the TMG toolbox. It works fine.
  • For better accuracy, use a larger data set.
  • In LSA, the SVD is the dimension-reduction step. To calculate the cosine similarity, you need the last matrix of the decomposition A = U * S * V^t, that is, V^t.
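
The dimension-reduction step can be sketched as below. The sketch rests on stated assumptions, not on the original answer's code: the class `LsaReduce` and its `reduce` method are hypothetical names, `v` is assumed to hold one row per document (which is what JAMA's `getV().getArray()` gives for a term × document matrix), and `singularValues` is assumed to be in descending order (as from JAMA's `getSingularValues()`). Scaling each kept column by its singular value is one common convention, not the only one. Note that if every column of a square V is kept unscaled, its rows are mutually orthogonal, so the cosine between any two different documents is 0; choosing k smaller than the number of documents avoids that.

```java
/** Hypothetical helper for illustration; not part of JAMA. */
final class LsaReduce {

    /**
     * Keeps the first k columns of v and multiplies column j by singularValues[j],
     * giving one k-dimensional vector per document.
     *
     * @param v              matrix with one row per document
     * @param singularValues singular values, largest first
     * @param k              number of latent dimensions to keep
     */
    static double[][] reduce(double[][] v, double[] singularValues, int k) {
        if (k < 1 || k > singularValues.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of singular values");
        }
        double[][] reduced = new double[v.length][k];
        for (int doc = 0; doc < v.length; doc++) {
            if (v[doc].length < k) {
                throw new IllegalArgumentException("Row " + doc + " has fewer than k columns");
            }
            for (int j = 0; j < k; j++) {
                reduced[doc][j] = v[doc][j] * singularValues[j];
            }
        }
        return reduced;
    }
}
```

The rows of the returned array are the reduced document vectors on which the cosine similarity would then be computed.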

You can read an example of LSA Here