A Brief History of Information Retrieval

The Classic Model for Information Retrieval

Information retrieval (IR) can be understood as the task of finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). Formal retrieval models have formed the basis of IR research. Since the early 1960s, a number of different models have been developed to describe aspects of the retrieval task: document content and structure, inter-document linkage, queries, users, their information needs and the context in which the retrieval task is embedded. The reliance on formal retrieval models is one of the great strengths of IR research [1,2,3,4].

When using an IR system, a user, driven by an information need, constructs a query in some query language. The query is then submitted to a system that selects, from a collection of documents (the corpus), those documents that match the query according to certain matching rules. A query refinement process might be used to create a new query and/or to refine the results.
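The query, match and refine cycle can be sketched in a few lines of Python. This is only a minimal illustration, assuming a Boolean AND matching rule over lowercase word sets; the corpus contents and the names `tokenize` and `match` are hypothetical and not taken from any particular IR system.

```python
def tokenize(text):
    """Split text into a set of lowercase words (a deliberately naive rule)."""
    return set(text.lower().split())


def match(query, corpus):
    """Return the ids of documents that contain every query word (Boolean AND)."""
    query_terms = tokenize(query)
    return [doc_id for doc_id, text in corpus.items()
            if query_terms <= tokenize(text)]


# Hypothetical three-document corpus, for illustration only.
corpus = {
    "d1": "formal retrieval models for text documents",
    "d2": "web search engines and query logs",
    "d3": "retrieval models for web search",
}

first_try = match("retrieval models", corpus)    # initial, broad query
refined = match("retrieval models web", corpus)  # refined query narrows the result
print(first_try, refined)
```

Here the refinement step is simply the user adding a word to the query; real systems may also refine automatically, for example through relevance feedback.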

An IR system typically consists of three main subsystems: document representation, representation of the user's requirements (queries), and the algorithms used to match user requirements (queries) with document representations. A document collection consists of many documents containing information about various subjects or topics of interest [5]. Document contents are transformed, either manually or automatically, into a document representation that is easy to match against queries and that correctly reflects the author's intention [4,5]. The primary concern in representation is how to select proper index terms. Typically, representation proceeds by extracting keywords that are considered content identifiers and organizing them into a given format. Queries transform the user's information need into a form that correctly represents the user's underlying information requirement and is suitable for the matching process [6,7]. A matching algorithm matches a user's requests (in terms of queries) with the document representations and retrieves the documents that are most likely to be relevant to the user. Many models and techniques from natural language processing, statistical text analysis, word stemming, stop lists and information theory have been tried in IR systems.

Two paradigms for finding useful information are well established in traditional information retrieval. Searching is a discovery paradigm suited to a user who knows precisely what to look for, while browsing is suited to a user who is either unfamiliar with the content of the data collection or has only casual knowledge of the jargon used in a particular discipline. Browsing and searching complement each other, and they are most effective when used together [6,7].
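The representation and matching subsystems described above can be illustrated with a small Python sketch. It assumes a tiny illustrative stop list, a crude suffix-stripping `stem` function standing in for a real stemmer, and a simple term-overlap score; all names and the sample documents are hypothetical, and actual IR systems use far more refined weighting schemes.

```python
from collections import Counter

# Illustrative stop list (an assumption); real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "to", "is", "on"}


def stem(word):
    """Crude suffix stripping, standing in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Represent a document or query as counts of stemmed non-stop words."""
    words = (w.strip(".,;:()?!\"'") for w in text.lower().split())
    return Counter(stem(w) for w in words if w and w not in STOP_WORDS)


def rank(query, documents):
    """Score each document by the summed frequency of shared terms, best first."""
    query_rep = index_terms(query)
    scores = {}
    for doc_id, text in documents.items():
        doc_rep = index_terms(text)
        score = sum(doc_rep[term] for term in query_rep)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical documents, for illustration only.
documents = {
    "d1": "Indexing documents: selecting index terms for documents.",
    "d2": "Browsing the collection of web pages.",
    "d3": "Matching queries to document representations.",
}
ranking = rank("document indexing", documents)
print(ranking)
```

Because the query and the documents pass through the same `index_terms` function, "indexing" in the query matches "index" in a document. That shared representation is the point of the sketch, not the particular scoring rule.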

Since human-computer interaction factors and cognitive aspects play a significant role in the Web context [9], it is useful to detail this model further, as in Figure 1.2. IR systems recognize that the information need is associated with some task. This need is verbalized (usually mentally, not aloud) and translated into a query posed to a search engine. This process of deriving a query from an information need in the Web context has received a great deal of attention.

Evolution of Modern WebIR

In the mid-1990s, everything changed with the rise of the Web. The Web is the largest collection of information ever created by humans, and it changes continuously as new objects are created and old ones are removed. To adapt to this changed scenario, a new discipline emerged: Web Information Retrieval (WebIR) [8,9]. It uses some concepts of traditional IR and introduces many innovative ones. Modern WebIR [10] has exploited some of the classical results of information retrieval to develop innovative models of information access. A recent report showed that 80% of Web surfers discover the new sites they visit through search engines [10] (such as Ask, Google, MSN or Yahoo).

1. Ellis D., "Behavioural Approach to Information Retrieval", Journal of Documentation, Vol. 46, pp. 191-213, 1989.

2. Ellis D., "Modeling the Information Seeking Patterns of Academic Users: a Grounded Theory Approach", Library Quarterly, Vol. 63(4), pp. 69-86, 1993.

3. Finin Tim, Mayfield James, Joshi Anupam, "Information Retrieval and the Semantic Web", Proceedings of the 38th Hawaii International Conference on System Sciences, 2005.

4. Salton G. and McGill M., "Introduction to Modern Information Retrieval", McGraw-Hill, New York, 1983.

5. Salton G. and Buckley C., "Improving retrieval performance by relevance feedback", J. ASIST, Vol. 41(4), pp. 288-297, 1990.

6. Jansen Bernard J., "Paid Search", IEEE Internet Computing Report, 2005.

7. Järvelin K. and Kekäläinen J., "Cumulated gain-based evaluation of IR techniques", ACM Trans. Inf. Syst., Vol. 20(4), pp. 422-446, 2002.

8. Pitkow James Edward, "Characterizing World Wide Web Ecologies", Thesis, Georgia Institute of Technology, 1997.

9. Baeza-Yates Ricardo and Ribeiro-Neto Berthier, "Modern Information Retrieval", Addison-Wesley, 1999.

10. Singhal Amit (Google, Inc.), "Modern Information Retrieval: A Brief Overview", Bulletin of the IEEE Computer Society Technical Committee on Data Engineering.

