The Classic Model for Information Retrieval
Information retrieval (IR) can be understood as the task of finding material
(usually documents) of an unstructured nature (usually text) that satisfies an
information need from within large collections (usually stored on computers).
Formal retrieval models have formed the basis of IR research. Since the early
1960s, a number of different models have been developed to describe aspects of
the retrieval task: document content and structure, inter-document linkage,
queries, users, their information needs, and the context in which the retrieval
task is embedded. The reliance on formal retrieval models is one of the great
strengths of IR research [1,2,3,4].
While using an IR system, a user, driven by an information need, constructs a
query in some query language. The query is then submitted to a system that
selects, from a collection of documents (the corpus), those documents that
match the query according to certain matching rules. A query refinement process
might be used to create a new query and/or to refine the results.
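The query, match, and refine cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy corpus, the function names `build_index` and `match`, and the matching rule (a document matches when it contains every query word) are all hypothetical choices, not something prescribed by the text.

```python
from collections import defaultdict


def build_index(corpus):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def match(index, query):
    """Assumed matching rule: a document matches if it contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)


# Hypothetical three-document corpus, used only for illustration.
corpus = {
    1: "formal models of information retrieval",
    2: "web search engines and information access",
    3: "retrieval models for web search",
}
index = build_index(corpus)

first = match(index, "retrieval models")        # initial query
refined = match(index, "retrieval models web")  # refined query narrows the results
```

Here the initial query matches documents 1 and 3, and adding one more word in the refinement step narrows the result to document 3. Real systems typically rank results instead of applying a strict all-words rule.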
An IR system typically consists of three main subsystems: document
representation, representation of the user's requirements (queries), and the
algorithms used to match user requirements (queries) with document
representations. A document collection consists of many documents containing
information about various subjects or topics of interest [5]. Document contents
are transformed, either manually or automatically, into a document
representation. The representation should make matching against queries easy
and should correctly reflect the author's intention [4,5]. The primary concern
in representation is how to select proper index terms. Typically,
representation proceeds by extracting keywords that are considered content
identifiers and organizing them into a given format.
Queries transform the user's information need into a form that correctly
represents the user's underlying information requirement and is suitable for
the matching process [6,7]. A matching algorithm matches a user's requests
(in terms of queries) with the document representations and retrieves documents
that are most likely to be relevant to the user. Many techniques and
theoretical models from natural language processing, statistical text analysis,
word stemming, stop lists, and information theory have been tried in IR
systems. Two paradigms for finding useful information are well established in
traditional information retrieval. Searching is a discovery paradigm that is
useful for a user who knows precisely what to look for, while browsing is
useful for a user who is either unfamiliar with the content of the data
collection or has only casual knowledge of the jargon used in a particular
discipline. Browsing and searching complement each other, and they are most
effective when used together [6,7].
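As a rough illustration of the three subsystems discussed above (document representation, query representation, and matching), the following Python sketch extracts index terms with a small stop list, weights them with TF-IDF, and ranks documents by cosine similarity. TF-IDF and cosine similarity are swapped in here as one common choice; the text does not name a specific weighting or matching scheme. The stop list, the corpus, and all function names are assumptions made for the example, and stemming is left out for brevity.

```python
import math
import re
from collections import Counter

# Illustrative stop list (an assumption; real stop lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "to", "is"}


def index_terms(text):
    """Extract lowercase keywords and drop stop words (no stemming here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_representations(corpus):
    """Document representation: one TF-IDF weighted term vector per document."""
    term_counts = {d: Counter(index_terms(text)) for d, text in corpus.items()}
    n_docs = len(corpus)
    doc_freq = Counter(t for counts in term_counts.values() for t in counts)
    idf = {t: math.log(n_docs / df) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        d: {t: tf * idf[t] for t, tf in counts.items()}
        for d, counts in term_counts.items()
    }
    return vectors, idf


def represent_query(query, idf):
    """Query representation: same term space; terms unseen in the corpus are ignored."""
    counts = Counter(index_terms(query))
    return {t: tf * idf[t] for t, tf in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, vectors, idf):
    """Matching: return ids of documents with non-zero similarity, best first."""
    q = represent_query(query, idf)
    scored = [(cosine(q, vec), d) for d, vec in vectors.items()]
    return [d for score, d in sorted(scored, reverse=True) if score > 0.0]


# Hypothetical corpus, used only for illustration.
corpus = {
    "d1": "Indexing of documents for information retrieval",
    "d2": "Browsing the web",
    "d3": "Statistical analysis of text and information theory",
}
vectors, idf = build_representations(corpus)
ranking = rank("information retrieval", vectors, idf)
```

With this corpus, `ranking` is `["d1", "d3"]`: the first document shares both query terms, the third shares only one, and the second shares none and is dropped.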
Since, in the Web context, human–computer interaction factors and cognitive
aspects play a significant role [9], it is useful to detail this model further
as in Figure 1.2. IR systems recognize that the information need is associated
with some task. This need is verbalized (usually mentally, not aloud) and
translated into a query posed to a search engine. This process of deriving a
query from an information need in the Web context has received a great deal of
attention.
Evolution of Modern WebIR
In the mid-1990s, everything changed with the rise of the Web. Web objects form
the largest collection of information ever created by humans, and this
collection changes continuously as new objects are created and old ones are
removed. To adapt to this changed scenario, a new discipline has been created:
Web Information Retrieval [8,9]. It uses some concepts of traditional IR and
introduces many innovative ones. Modern WebIR [10] is a discipline that has
exploited some of the classical results of information retrieval, thereby
developing innovative models of information access. A recent report showed that
80% of Web surfers discover the new sites they visit through search engines
such as Ask, Google, MSN, or Yahoo [10].
1. Ellis D., "Behavioural Approach to Information Retrieval", Journal of
Documentation, Vol. 46, pp. 191-213, 1989.
2. Ellis D., "Modeling the Information Seeking Patterns of Academic Users: a
Grounded Theory Approach", Library Quarterly, Vol. 63, No. 4, pp. 69-86, 1993.
3. Finin Tim, Mayfield James, Joshi Anupam, "Information Retrieval and the
Semantic Web", Proceedings of the 38th Hawaii International Conference on
System Sciences, 2005.
4. Salton G. and McGill M., "Introduction to Modern Information Retrieval",
McGraw-Hill, New York, 1983.
5. Salton G. and Buckley C., "Improving retrieval performance by relevance
feedback", J. ASIST, 41(4):288-297, 1990.
6. Jansen Bernard J., "Paid Search", IEEE Internet Computing Report, 2005.
7. Järvelin K. and Kekäläinen J., "Cumulated gain-based evaluation of IR
techniques", ACM Trans. Inf. Syst., 20(4):422–446, 2002.
8. Pitkow James Edward, "Characterizing World Wide Web Ecologies", Thesis,
Georgia Institute of Technology, 1997.
9. Baeza-Yates Ricardo and Ribeiro-Neto Berthier, "Modern Information
Retrieval", Addison-Wesley, 1999.
10. Singhal Amit (Google, Inc.), "Modern Information Retrieval: A Brief
Overview", Bulletin of the IEEE Computer Society Technical Committee on Data
Engineering.