The classic model for IR
An IR system typically consists of three main
subsystems: document representation, representation of user’s requirements
(queries), and the algorithms used to match user requirements (queries) with
document representations. A document collection consists of many documents
containing information about various subjects or topics of interest [1].
Document contents are transformed, either manually or automatically, into a
document representation. The representation should make matching against
queries easy and should correctly reflect the author's intention [2]. The
primary concern in representation is how to select proper
index terms. Typically, representation proceeds by extracting keywords that are
considered as content identifiers and organizing them into a given format.
Queries transform the user's information need into a form that correctly
represents the user's underlying information requirement and is suitable for
the matching process [3,4]. A matching algorithm matches a user's requests
(in terms of queries) with the document representations and retrieves documents
that are most likely to be relevant to the user.
Many models and techniques from natural language processing, statistical text
analysis, word stemming, stop lists, and information theory have been tried in
IR systems. Traditional information retrieval has two well-established
paradigms for finding useful information: searching and browsing. Searching is
a discovery paradigm suited to a user who knows precisely what to look for,
while browsing suits a user who is either unfamiliar with the content of the
data collection or has only casual knowledge of the jargon used in a
particular discipline. Browsing and searching complement each other, and they
are most effective when used together [5,6].
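The three subsystems described above (document representation, query representation, and matching) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the stop list, the toy documents, and the function names are hypothetical, and cosine similarity over raw term counts is one simple choice of matching algorithm, not one prescribed by the text.

```python
import math
import re
from collections import Counter

# Hypothetical, very short stop list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def index_terms(text):
    """Extract lowercase keywords (content identifiers), dropping stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def represent(text):
    """Represent a document or a query as a bag of index terms with counts."""
    return Counter(index_terms(text))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, top_k=3):
    """Rank documents by similarity to the query; keep only non-zero scores."""
    q = represent(query)
    scored = [(doc_id, cosine(q, represent(text))) for doc_id, text in documents.items()]
    ranked = sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)
    return ranked[:top_k]


# Toy collection, invented for illustration only.
DOCS = {
    "d1": "Stemming and stop lists in statistical text analysis",
    "d2": "Browsing a collection of documents on the Web",
    "d3": "Matching queries with document representations",
}

results = search("matching user queries with documents", DOCS)
for doc_id, score in results:
    print(doc_id, round(score, 2))
```

With this toy collection, d3 ranks above d2 because it shares two index terms with the query rather than one, and d1 is not retrieved at all. Note that "document" and "documents" are treated as different terms here, which is the kind of mismatch that word stemming is meant to reduce.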
Because human–computer interaction factors and cognitive aspects play a
significant role in the Web context [7], it is useful to detail this model
further. The detailed model recognizes that the information need is associated
with some task. This need is verbalized (usually mentally, not aloud) and
translated into a query posed to a search engine. This process of deriving a
query from an information need in the Web context has received a great deal of
attention.
1. Allan J., Carterette B., and Lewis J., "When will information retrieval be good enough?", In SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 433–440, New York, NY, USA, ACM Press, 2005.
2. Lee U. et al., "Automatic identification of user goals in web search", Proceedings of WWW 2005, ACM Press.
3. Järvelin K. and Kekäläinen J., "Cumulated gain-based evaluation of IR techniques", ACM Trans. Inf. Syst., 20(4):422–446, 2002.
4. Järvelin K. and Kekäläinen J., "IR evaluation methods for retrieving highly relevant documents", In Proceedings of the ACM Conference on Research and Development on Information Retrieval (SIGIR), 2000.
5. Hellmann M., "Fuzzy Logic Introduction", Epsilon Nought Radar Remote Sensing Tutorials, 2001.
6. Montebello M., "Wrapping WWW Information Sources", Proceedings of the 2000 International Database Engineering and Applications Symposium (IDEAS'00).
7. Lieberman H. and Selker T., "Out of context: Computer systems that adapt to, and learn from, context", IBM Systems Journal 39(3 & 4), 2000.