info retrieval

6.3. info retrieval

building blocks of search engines
- search (user initiates)
- recommendations - proactive search engine (program initiates, e.g. Pandora, Netflix)
- information retrieval - activity of obtaining info relevant to an information need from a collection of resources
- information overload - too much information to process
- memex - hypothetical device (proposed by Vannevar Bush) which stores records so they can be consulted with exceeding speed and flexibility (precursor of the search engine)
IR pieces
1. Indexed corpus (static)
  - crawler and indexer - continuously gathers information; takes the whole web as input and outputs a representation of each document
    - web crawler - automatic program that systematically browses web
  - document analyzer - knows which section contains what; takes in the metadata and outputs the (condensed) index, managing content to provide efficient access to web documents
2. User
  - query parser - parses the search terms into the system's internal representation
3. Ranking
  - ranking model - takes in the query representation and the index, sorts documents by relevance, and outputs the results
  - also need nice display
  - query logs - record user’s search history
  - user modeling - assess user’s satisfaction
steps
1. repository -> document representation
2. query -> query representation
3. ranking is computed between the 2 representations and the results are given to the user
4. evaluation - by users
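
The four steps can be sketched end to end under some assumptions the notes do not specify: TF-IDF weights as the document and query representation, cosine similarity as the ranking function, and precision@k over a hypothetical set of user relevance judgments as the evaluation. All function names and the sample data are illustrative.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Step 1: repository -> document representation (sparse term -> weight dicts)."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n = len(docs)
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    # Adding 1 keeps terms that occur in every document from getting zero weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = {
        d: {t: c * idf[t] for t, c in Counter(toks).items()}
        for d, toks in tokenized.items()
    }
    return vecs, idf


def query_vector(query, idf):
    """Step 2: query -> query representation; terms unseen in the corpus are dropped."""
    counts = Counter(query.lower().split())
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def rank_documents(query, vecs, idf):
    """Step 3: rank documents by similarity between the two representations."""
    q = query_vector(query, idf)
    return sorted(vecs, key=lambda d: (-cosine(q, vecs[d]), d))


def precision_at_k(ranked, relevant, k):
    """Step 4: fraction of the top-k results that users judged relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


# Made-up repository and relevance judgments for illustration.
docs = {
    "d1": "search engines rank documents",
    "d2": "recommendation systems suggest movies",
    "d3": "search engines crawl and index documents",
}
vecs, idf = tf_idf_vectors(docs)
ranked = rank_documents("search documents", vecs, idf)
p_at_2 = precision_at_k(ranked, relevant={"d1", "d3"}, k=2)
print(ranked, p_at_2)
```

In practice the relevance judgments would come from signals such as the query logs and user modeling mentioned above rather than a hand-written set.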
applications of information retrieval:
1. recommendation
2. question answering
3. text mining
4. online advertisement