Latent Semantic Indexing

Question

857 views Asked by avd At 20 November 2009 at 15:06

LSI is said to produce matrices (U, A and V) that bring together documents containing synonyms. For example, if we search for "car", we also get documents that contain "automobile". But LSI is nothing more than matrix manipulation: it takes only term frequency into account, not semantics. So what is behind this magic that I am missing? Please explain.

There are 2 answers

Jason Orendorff On 20 November 2009 at 15:18

According to the Wikipedia article, "LSI is based on the principle that words that are used in the same contexts tend to have similar meanings." That is, if two words seem to be used interchangeably, they might be synonyms.

It's not infallible.
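The principle quoted above can be illustrated without any matrix algebra. The following is only a sketch on a made-up corpus (the sentences and the helper names `context_counts` and `cosine` are assumptions for illustration, and this is not LSI itself): it compares words by the other words they appear alongside.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus: "car" and "automobile" never share a sentence,
# but they appear alongside the same neighbouring words.
sentences = [
    "the car needs a new engine",
    "the automobile needs a new engine",
    "the car has four wheels",
    "the automobile has four wheels",
    "the flower has red petals",
]

def context_counts(word, sentences):
    """Count the words that co-occur with `word` in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

car = context_counts("car", sentences)
automobile = context_counts("automobile", sentences)
flower = context_counts("flower", sentences)

sim_car_automobile = cosine(car, automobile)  # identical contexts
sim_car_flower = cosine(car, flower)          # only common words overlap

print(sim_car_automobile, sim_car_flower)
```

Here "car" and "automobile" score as near-identical because their contexts match, while "car" and "flower" still get a nonzero score from shared filler words such as "the" and "has", which is one way the approach can mislead.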

**Jerry Coffin** · Accepted Answer · 2009-11-21T02:52:39+00:00

LSI basically creates a frequency profile of each document and looks for documents with similar frequency profiles. If the rest of two profiles is enough alike, it will classify the two documents as fairly similar, even if one systematically substitutes some words. Conversely, if the frequency profiles differ, it can classify documents as different even if they share frequent use of a few specific terms (e.g., "file" relating to a computer in some cases, and to a tool used to cut and smooth metal in others).
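As a rough sketch of how this plays out for the "car" / "automobile" case from the question, the example below builds a tiny term-document count matrix, truncates its SVD to two dimensions with NumPy, and compares a one-word query against each document. The corpus, the choice of `k = 2`, and the projection onto the leading left singular vectors are assumptions made for illustration; real systems typically also apply term weighting.

```python
import numpy as np

terms = ["car", "automobile", "engine", "wheel", "flower", "petal"]

# Rows are terms, columns are documents (raw counts, made-up corpus):
# d0 = "car engine wheel", d1 = "automobile engine wheel",
# d2 = d3 = "flower petal"
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [1, 1, 0, 0],  # wheel
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # petal
], dtype=float)

def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Query containing only the word "car".
q = np.zeros(len(terms))
q[terms.index("car")] = 1.0

# Similarity in the original term space: d1 shares no word with the query.
raw = [cosine(q, A[:, j]) for j in range(A.shape[1])]

# Truncated SVD: keep the k strongest directions of variation.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k = U[:, :k]

# Project documents and query into the same k-dimensional space.
docs_k = U_k.T @ A
q_k = U_k.T @ q
lsi = [cosine(q_k, docs_k[:, j]) for j in range(A.shape[1])]

print("raw:", np.round(raw, 3))
print("lsi:", np.round(lsi, 3))
```

In the raw term space the "automobile" document has zero similarity to the query. After truncation, d0 and d1 fall onto the same direction because the rest of their profiles ("engine", "wheel") match, so the query "car" retrieves both, while the flower documents stay unrelated.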

LSI is also typically used with relatively large groups of documents. The other documents can help in finding similarities as well -- even if document A and B look substantially different, if document C uses quite a few terms from both A and B, it can help in finding that A and B are really fairly similar.
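The bridging effect described above can be sketched the same way. In this made-up corpus (the documents, the terms and `k = 2` are all assumptions for illustration), documents A and B share no terms at all, and the only thing linking them is document C, which uses the vocabulary of both.

```python
import numpy as np

doc_names = ["A", "B", "C", "D", "E"]

# Rows are terms, columns are documents (raw counts, hypothetical corpus):
# A = "car road", B = "automobile highway",
# C = "car road automobile highway", D = E = "flower petal"
X = np.array([
    [1, 0, 1, 0, 0],  # car
    [1, 0, 1, 0, 0],  # road
    [0, 1, 1, 0, 0],  # automobile
    [0, 1, 1, 0, 0],  # highway
    [0, 0, 0, 1, 1],  # flower
    [0, 0, 0, 1, 1],  # petal
], dtype=float)

def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

a_idx, b_idx, d_idx = (doc_names.index(n) for n in ("A", "B", "D"))

# In the original term space A and B have no term in common.
raw_ab = cosine(X[:, a_idx], X[:, b_idx])

# Project every document onto the top-k left singular vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
docs_k = U[:, :k].T @ X

lsi_ab = cosine(docs_k[:, a_idx], docs_k[:, b_idx])
lsi_ad = cosine(docs_k[:, a_idx], docs_k[:, d_idx])

print("raw A-B:", round(raw_ab, 3))
print("lsi A-B:", round(lsi_ab, 3))
print("lsi A-D:", round(lsi_ad, 3))
```

Because C makes the four vehicle terms co-occur, the strongest latent direction covers all of them, and A and B land on that same direction despite having no word in common. A still stays unrelated to the flower documents.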
