LSI (Latent Semantic Indexing)
What is LSI (Latent Semantic Indexing)?
LSI (Latent Semantic Indexing) – A search engine indexing method that identifies relationships between words and phrases to form a better understanding of a text’s subject matter. Latent semantic indexing helps search engines return more precise results for queries.
Latent semantic indexing, sometimes referred to as latent semantic analysis, is a mathematical method developed in the late 1980s to improve the accuracy of information retrieval. It applies a technique called singular value decomposition (SVD) to a matrix that records which terms occur in which documents, and uses the resulting lower-dimensional approximation to identify relationships between the terms and concepts in otherwise unstructured text.
In principle, it finds the hidden (latent) relationships between words (semantics) in order to improve information understanding (indexing).
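The idea can be illustrated with a minimal sketch in Python using NumPy. The five-document corpus, the choice of k = 2 and the helper name `cosine` are all assumptions made up for illustration, and this is not how any particular search engine implements LSI. The words "car" and "automobile" never appear in the same document, so their raw similarity is zero, yet they end up pointing in the same direction in the latent space because both co-occur with "engine" and "road".

```python
import numpy as np

# Hypothetical toy corpus: two topics with no shared vocabulary.
docs = [
    "car engine road",
    "automobile engine road",
    "flower garden petal",
    "rose garden petal",
    "rose flower garden",
]

# Term-document count matrix: one row per term, one column per document.
vocab = sorted({word for doc in docs for word in doc.split()})
row = {term: i for i, term in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        A[row[word], j] += 1

# Singular value decomposition, truncated to the k largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # number of latent dimensions kept (an arbitrary choice for this toy case)
term_vecs = U[:, :k] * s[:k]  # one k-dimensional vector per term


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Similarity of "car" and "automobile" before and after the decomposition.
raw = cosine(A[row["car"]], A[row["automobile"]])
latent = cosine(term_vecs[row["car"]], term_vecs[row["automobile"]])
# Similarity of "car" and a word from the other topic, in the latent space.
unrelated = cosine(term_vecs[row["car"]], term_vecs[row["flower"]])
print(f"raw: {raw:.2f}  latent: {latent:.2f}  unrelated: {unrelated:.2f}")
```

In this toy corpus the raw similarity is zero, the latent similarity is close to 1, and the similarity to the unrelated word stays close to 0. Real collections are far noisier, and the number of dimensions to keep is typically tuned by experiment.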
It provided a significant step forward for the field of text comprehension as it accounted for the contextual nature of language.
Earlier technologies struggled with the synonyms that characterize natural language, and with the way a word’s meaning changes with its context.
For example, the words ‘bike’ and ‘week’ may seem easy to understand, but both have multiple definitions depending on how they are used. Put them together (‘bike week’) and you have a new concept altogether.
So how can we train a machine to adapt to these distinctions?
This is a long-standing problem in language processing, and LSI was an early step toward helping computers understand language in use.
It works best on static content and on small sets of documents, since the decomposition is computed over the whole collection and generally needs updating when documents are added or changed. That suited its original purposes. LSI also allows documents to be clustered together based on their thematic commonalities, which was a very useful capability for early search engines.
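The clustering capability can be sketched as follows, again in Python with NumPy. The corpus, the value k = 2, the 0.9 threshold and the greedy grouping rule are assumptions chosen for illustration; the grouping rule is a deliberately simple stand-in, not the clustering method used by any specific search engine.

```python
import numpy as np

# Hypothetical toy corpus (illustrative only).
docs = [
    "car engine road",
    "automobile engine road",
    "flower garden petal",
    "rose garden petal",
    "rose flower garden",
]
vocab = sorted({w for d in docs for w in d.split()})
# Term-document count matrix: rows are terms, columns are documents.
A = np.array([[d.split().count(t) for d in docs] for t in vocab], dtype=float)

# Project every document into a k-dimensional latent space.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_vecs = (Vt[:k] * s[:k, None]).T  # one row per document
# Normalize to unit length so a dot product equals cosine similarity.
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

# Greedy grouping: a document joins the first cluster whose first member
# it resembles (cosine above the threshold); otherwise it starts a new one.
threshold = 0.9
clusters = []
for i, vec in enumerate(doc_vecs):
    for members in clusters:
        if vec @ doc_vecs[members[0]] > threshold:
            members.append(i)
            break
    else:
        clusters.append([i])

for members in clusters:
    print([docs[i] for i in members])
```

On this corpus the two vehicle documents land in one group and the three garden documents in another. With a larger or changing collection, the decomposition would have to be recomputed or updated before regrouping, which is one reason the method favors small, static document sets.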
LSI (Latent Semantic Indexing) can be summarized as follows:
- A technology developed in the late 1980s for information retrieval, in response to earlier technologies that could not handle synonymy (different words with the same meaning) or polysemy (one word with several meanings).
- A specific approach that tries to grasp the underlying structure of meaning in language.
- Capable of inferring from that structure the categories into which terms and concepts fall.
- Originally useful for working on small sets of static documents.