
Doubts regarding LSA

Asked by CTsiddharth on 27 January 2012 at 02:53 (463 views)

I need to find the similarity between a reference document and the set of documents in a repository.

Method:

1. I build the term-document matrix for all the documents, including the reference document.
2. I compute the SVD of this matrix.
3. I take the V array (the third result).
4. I transpose this matrix so that each row represents a document.
5. The first row represents the reference document.
6. I compute the cosine similarity between this row and each of the other rows.
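Steps 1 and 6 of this method can be sketched in plain Java as follows. This is a hypothetical illustration, not the asker's code: it skips the SVD (steps 2 to 5) and compares the raw count columns directly, which is the usual non-LSA baseline to compare LSA results against. The class name `TermDocCosine`, the helper names and the tokenizer (lowercase, split on non-letters) are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermDocCosine {

    // Step 1: raw term counts; rows are terms, columns are documents (column 0 = reference).
    // Lowercasing and splitting on non-letters is an assumed, simplistic tokenizer.
    static double[][] termDocumentMatrix(List<String> docs) {
        Map<String, Integer> row = new HashMap<>();   // term -> row index
        List<String[]> tokens = new ArrayList<>();
        for (String d : docs) {
            String[] toks = d.toLowerCase().split("[^a-z]+");
            tokens.add(toks);
            for (String t : toks)
                if (!t.isEmpty()) row.putIfAbsent(t, row.size());
        }
        double[][] a = new double[row.size()][docs.size()];
        for (int j = 0; j < docs.size(); j++)
            for (String t : tokens.get(j))
                if (!t.isEmpty()) a[row.get(t)][j] += 1.0;
        return a;
    }

    // Copy column j of a into a new vector.
    static double[] column(double[][] a, int j) {
        double[] c = new double[a.length];
        for (int i = 0; i < a.length; i++) c[i] = a[i][j];
        return c;
    }

    // Step 6: cosine similarity; returns 0 if either vector is all zeros.
    static double cosine(double[] x, double[] y) {
        double dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return (nx == 0 || ny == 0) ? 0.0 : dot / Math.sqrt(nx * ny);
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "latent semantic analysis uses the svd",        // reference document
            "the svd is used in latent semantic analysis",
            "java code for matrix algebra");
        double[][] a = termDocumentMatrix(docs);
        double[] ref = column(a, 0);
        for (int j = 1; j < docs.size(); j++)
            System.out.printf("doc %d: %.3f%n", j, cosine(ref, column(a, j)));
    }
}
```

With the sample strings in `main`, document 1 scores about 0.72 (five shared terms, with six and eight distinct terms in the two documents) and document 2 scores 0, since it shares no terms with the reference.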

My doubts:

Since I have around 7 documents in my DB, I get only an 8 × 8 V array (the document matrix). So will I get a correct result if I compute the cosine similarity with these 8 values alone?
Is such a method generally adopted?

I use Java to code this, and I use the JAMA package to compute the SVD.
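On the doubt about the 8 × 8 V array: if V is the full square matrix of right singular vectors, it is orthogonal, so its rows (and its columns) are orthonormal. The cosine between two different rows is then 0 up to rounding, whatever the documents contain, so those 8 values alone cannot rank anything. LSA usually keeps only the k largest singular values and compares rows of V_k S_k. Also, with terms as rows (the m ≥ n case that JAMA's SVD is documented for), row j of `getV()` should already correspond to document j, so the transpose in step 4 appears unnecessary; either way the rows are orthonormal. The sketch below is a hypothetical illustration that avoids the JAMA dependency: it obtains V and the squared singular values from a Jacobi eigen-decomposition of A^T A (since A^T A = V S^2 V^T), a swapped-in technique that matches the SVD's V up to the sign of each column (and the choice of basis for repeated singular values). The class name `LsaRankDemo`, the 6 × 4 count matrix and k = 2 are assumptions.

```java
import java.util.Arrays;

public class LsaRankDemo {

    // Cyclic Jacobi eigen-decomposition of a symmetric matrix s (s itself is not modified).
    // On return eig[r] is eigenvalue r and column r of vecs is its unit eigenvector.
    static void jacobiEigen(double[][] s, double[] eig, double[][] vecs) {
        int n = s.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) {
            a[i] = s[i].clone();
            Arrays.fill(vecs[i], 0.0);
            vecs[i][i] = 1.0;
        }
        for (int sweep = 0; sweep < 100; sweep++) {
            double off = 0;
            for (int p = 0; p < n; p++)
                for (int q = p + 1; q < n; q++) off += a[p][q] * a[p][q];
            if (off < 1e-24) break;
            for (int p = 0; p < n; p++) {
                for (int q = p + 1; q < n; q++) {
                    if (a[p][q] == 0.0) continue;
                    // Rotation J chosen so that (J^T A J)[p][q] becomes zero.
                    double theta = (a[q][q] - a[p][p]) / (2 * a[p][q]);
                    double t = (theta >= 0 ? 1.0 : -1.0)
                            / (Math.abs(theta) + Math.sqrt(theta * theta + 1));
                    double c = 1 / Math.sqrt(t * t + 1), sn = t * c;
                    for (int k = 0; k < n; k++) { // A <- A J
                        double akp = a[k][p], akq = a[k][q];
                        a[k][p] = c * akp - sn * akq;
                        a[k][q] = sn * akp + c * akq;
                    }
                    for (int k = 0; k < n; k++) { // A <- J^T A
                        double apk = a[p][k], aqk = a[q][k];
                        a[p][k] = c * apk - sn * aqk;
                        a[q][k] = sn * apk + c * aqk;
                    }
                    for (int k = 0; k < n; k++) { // V <- V J
                        double vkp = vecs[k][p], vkq = vecs[k][q];
                        vecs[k][p] = c * vkp - sn * vkq;
                        vecs[k][q] = sn * vkp + c * vkq;
                    }
                }
            }
        }
        for (int i = 0; i < n; i++) eig[i] = a[i][i];
    }

    // One row per document (column of a): the first k columns of V, optionally scaled by S.
    // Singular values are sqrt of the eigenvalues of A^T A, taken largest first.
    static double[][] docVectors(double[][] a, int k, boolean scaleBySigma) {
        int m = a.length, n = a[0].length;
        double[][] ata = new double[n][n]; // A^T A = V S^2 V^T
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int r = 0; r < m; r++) ata[i][j] += a[r][i] * a[r][j];
        double[] eig = new double[n];
        double[][] v = new double[n][n];
        jacobiEigen(ata, eig, v);
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (x, y) -> Double.compare(eig[y], eig[x])); // largest first
        double[][] docs = new double[n][k];
        for (int j = 0; j < n; j++)
            for (int r = 0; r < k; r++) {
                double sigma = Math.sqrt(Math.max(eig[order[r]], 0.0));
                docs[j][r] = v[j][order[r]] * (scaleBySigma ? sigma : 1.0);
            }
        return docs;
    }

    // Cosine similarity; returns 0 if either vector is all zeros.
    static double cosine(double[] x, double[] y) {
        double dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i]; nx += x[i] * x[i]; ny += y[i] * y[i];
        }
        return (nx == 0 || ny == 0) ? 0.0 : dot / Math.sqrt(nx * ny);
    }

    public static void main(String[] args) {
        // Hypothetical counts: 6 terms (rows) x 4 documents (columns); column 0 is the reference.
        double[][] a = {{1, 1, 0, 0}, {1, 1, 0, 0}, {1, 0, 1, 0},
                        {0, 1, 0, 0}, {0, 0, 1, 1}, {0, 0, 1, 1}};
        double[][] plainV = docVectors(a, 4, false); // full V, unscaled (as in the question)
        double[][] fullVS = docVectors(a, 4, true);  // full V*S: same cosines as raw columns of A
        double[][] lsa = docVectors(a, 2, true);     // truncated V_k*S_k with k = 2
        for (int j = 1; j < 4; j++)
            System.out.printf("doc %d: V %.3f   V*S %.3f   V_k*S_k %.3f%n", j,
                cosine(plainV[0], plainV[j]), cosine(fullVS[0], fullVS[j]), cosine(lsa[0], lsa[j]));
    }
}
```

Run as-is, the first number on each line is 0 up to rounding, the second reproduces the raw term-space cosines of the columns of A (2/3, 1/3 and 0), and the third gives the similarities in the two-dimensional latent space, where document 1 ranks closest to the reference. In JAMA itself, `A.svd().getV()` and `getSingularValues()` would supply V and the singular values directly.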

1 answer


Answered by Debaditya on 27 January 2012 at 05:33

I have tried this in MATLAB using the TMG toolbox, and it works fine.
For better results (or more accuracy), use larger data sets.
In LSA, the SVD is only one part of the process (it is used for dimension reduction). To calculate your cosine similarity, you will need the final matrix you get from the calculation A = U * S * V^T using the reduced (truncated) factors.
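If the "final matrix" is read as the rank-k reconstruction A_k (an interpretation; the answer does not say which k to keep), the cosine between the reference column and another column of A_k equals the cosine between the corresponding rows of V_k S_k, because U_k has orthonormal columns. So the full m × n product never has to be formed. Here U_k and V_k keep the first k columns of U and V, S_k the k largest singular values, e_j is the j-th standard basis vector, and column 0 is the reference document.

```latex
\begin{aligned}
A &= U S V^{T}, \qquad A_k = U_k S_k V_k^{T}, \qquad U_k^{T} U_k = I_k \\
\hat{d}_j &= S_k V_k^{T} e_j \quad (\text{row } j \text{ of } V_k S_k, \text{ as a column}) \\
A_k e_j &= U_k \hat{d}_j \\
\cos(A_k e_0,\, A_k e_j)
  &= \frac{\hat{d}_0^{T} U_k^{T} U_k \hat{d}_j}{\lVert U_k \hat{d}_0 \rVert \, \lVert U_k \hat{d}_j \rVert}
   = \frac{\hat{d}_0^{T} \hat{d}_j}{\lVert \hat{d}_0 \rVert \, \lVert \hat{d}_j \rVert}
\end{aligned}
```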

You can read an example of LSA here.