ABSTRACT

Latent Semantic Indexing (LSI) is a well-established indexing and retrieval method based on an automated mathematical technique, the singular value decomposition (SVD). Given a large information database, LSI uses the SVD to build a “semantic space” of the document collection in which both terms and documents are represented. It does so by producing a reduced-dimensional vector space in which the underlying, or “latent,” semantic structure in the collection's pattern of word usage emerges. Similarities between terms, between terms and documents, or between documents are then based on semantic content rather than on individual terms. This ability to extract the meaning of terms and documents has made LSI successful in many different applications.
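
As a rough illustration of the idea, the following minimal sketch applies a truncated SVD to a tiny term-document matrix using NumPy. The five-term, four-document collection, the helper names `lsi` and `cosine`, and the choice of k = 2 dimensions are all assumptions made for this example; they are not taken from the paper, and practical systems typically weight the matrix entries and keep far more dimensions.

```python
import numpy as np

# Hypothetical toy collection (not from the paper):
# rows are terms, columns are documents, entries are raw counts.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)


def lsi(matrix, k):
    """Return k-dimensional term and document vectors from a truncated SVD."""
    # Thin SVD: matrix = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    term_vecs = U[:, :k] * s[:k]    # one row per term
    doc_vecs = Vt[:k, :].T * s[:k]  # one row per document
    return term_vecs, doc_vecs


def cosine(x, y):
    """Cosine similarity between two nonzero vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


term_vecs, doc_vecs = lsi(A, k=2)

# Documents 0 and 1 share only "engine" in the raw matrix.
raw_docs = cosine(A[:, 0], A[:, 1])
lsi_docs = cosine(doc_vecs[0], doc_vecs[1])

# "car" and "automobile" never occur in the same document.
raw_terms = cosine(A[0], A[1])
lsi_terms = cosine(term_vecs[0], term_vecs[1])

# A vehicle document against a flower document.
lsi_unrelated = cosine(doc_vecs[0], doc_vecs[2])

print(f"documents 0 and 1: raw {raw_docs:.2f}, reduced space {lsi_docs:.2f}")
print(f"car vs automobile: raw {raw_terms:.2f}, reduced space {lsi_terms:.2f}")
print(f"documents 0 and 2: reduced space {lsi_unrelated:.2f}")
```

In this toy case, "car" and "automobile" never co-occur, yet they land on the same direction in the two-dimensional space because both appear alongside "engine". This is the sense in which similarity comes to depend on the pattern of word usage rather than on shared individual terms.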