2.1. info retrieval

Some notes on information retrieval, based on UVA's Info Retrieval course.

2.1.1. introduction

  • building blocks of search engines

    • search (user initiates)

    • recommendations - a proactive search engine (the program initiates, e.g. Pandora, Netflix)

    • information retrieval - activity of obtaining info relevant to an information need from a collection of resources

    • information overload - too much information to process

    • memex - a device that stores records so they can be consulted with exceeding speed and flexibility (an early vision of the search engine)

  • IR pieces

    1. Indexed corpus (static)

      • crawler and indexer - gathers information continuously, taking the whole web as input and outputting a representation of each document

        • web crawler - an automatic program that systematically browses the web

      • document analyzer - knows which section contains what; takes in the metadata and outputs a condensed index, managing content to provide efficient access to web documents

    2. User

      • query parser - parses the search terms into the system's internal representation

    3. Ranking

      • ranking model - takes in the query representation and the indices, sorts documents by relevance, and outputs the results

      • result display - the results also need a clear presentation

      • query logs - record users’ search histories

      • user modeling - assesses users’ satisfaction

  • steps

    1. repository -> document representation

    2. query -> query representation

    3. ranking is performed between the two representations, and the results are returned to the user

    4. evaluation - by users

  • applications of information retrieval:

    1. recommendation

    2. question answering

    3. text mining

    4. online advertisement
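The web crawler listed under IR pieces ("automatic program that systematically browses the web") can be sketched as a breadth-first traversal of links. This is a minimal sketch under loud assumptions: the web is replaced by a hypothetical in-memory link graph (`LINKS`), and `crawl` and `max_pages` are illustrative names, not part of any real crawler library. A real crawler would also fetch pages over HTTP and keep re-crawling to gather information continuously.

```python
from collections import deque

# Hypothetical link graph standing in for the web: page -> outgoing links.
LINKS = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html", "d.html"],
    "c.html": ["a.html"],
    "d.html": [],
}


def crawl(seed: str, links: dict[str, list[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from `seed`, visiting each page at most once."""
    frontier = deque([seed])  # URLs waiting to be visited, oldest first
    visited: set[str] = set()
    order: list[str] = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue  # the same page may be queued from several links
        visited.add(url)
        order.append(url)  # a real crawler would fetch and store the page here
        for out in links.get(url, []):
            if out not in visited:
                frontier.append(out)
    return order


print(crawl("a.html", LINKS))
```

Breadth-first order visits pages close to the seed first, and the `visited` set keeps a page linked from several places (here `c.html`) from being stored twice.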
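The steps listed in these notes (repository to document representation, query to query representation, ranking between the two) can be illustrated with a toy pipeline. This is a sketch under stated assumptions: the corpus `DOCS` is hypothetical, `tokenize` stands in for both the document analyzer and the query parser, the document representation is an inverted index of term counts, and the ranking model is swapped in as plain TF-IDF scoring, since the notes do not name a specific model.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus; doc ids and texts are illustrative only.
DOCS = {
    1: "information retrieval finds relevant documents",
    2: "a web crawler browses the web",
    3: "ranking sorts documents by relevance to the query",
}


def tokenize(text: str) -> list[str]:
    """Very simple analyzer / query parser: lowercase and keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, dict[int, int]]:
    """Step 1: repository -> document representation (term -> {doc_id: count})."""
    index: dict[str, dict[int, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def rank(query: str, index: dict[str, dict[int, int]], n_docs: int) -> list[tuple[int, float]]:
    """Steps 2-3: parse the query, then score documents with TF-IDF (an assumed model)."""
    scores: dict[int, float] = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term never seen in the corpus
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = build_index(DOCS)
print(rank("relevant documents", index, len(DOCS)))
```

For the query "relevant documents", document 1 should rank above document 3: both contain "documents", but only document 1 contains the rarer term "relevant", which carries a higher IDF weight.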