
Journey of Information Retrieval

Background

The explosion of the World Wide Web (commonly called the Web) as an important information source has shaped the behaviour of many information seekers and consumers [1,2,3]. With the Web's popularity, a new discipline built on the concepts of traditional information retrieval (IR), called Web information retrieval (WebIR), has emerged, and many innovative concepts have been introduced along with it. In 1999, the Web was estimated to have only one to two billion publicly accessible pages, but it was growing exponentially. Search systems, once viewed primarily as tools for topical research, are now used for a growing number of tasks, including navigation and shopping assistance. As more and more users rely on the Internet for information, search engines have emerged as a handy tool for information retrieval. This is clearly apparent on the Web, where the growth of available information and services has made search engine usage the second most common online activity after email [4]. Google currently claims that its index contains over eight billion pages; others also claim index sizes in the billions [5]. The strength of a search engine lies in its search algorithm, which plays a major role in fetching results for a user query [6]. Apart from being popular as an information-seeking vehicle, search engines have emerged as a preferred medium for advertising. Google AdSense, Yahoo! Publisher Network and MSN adCenter provide contextual advertising that has attracted many online ad publishers. Their advertising strategies have not only drawn potential customers to advertisers' websites but have also generated revenue for the search engines. Thanks to this revenue, search engines are able to provide free search results to seekers. In short, the online industry now has a relatively sustainable model [214].

Though search engines have established themselves as a revolutionary working metaphor, the field of information retrieval (IR) is not a new discipline. Information retrieval can be understood as the branch of computer science that deals with facilitating access to large collections of data [7]. The field spans a number of sub-areas: information retrieval per se, as performed by the users of Internet search engines or digital libraries; text categorization, which labels text documents with one or more predefined categories (possibly organized in a hierarchy); information filtering (or routing), which matches input documents with users' interest profiles; and question answering, which aims to extract specific (and preferably short) answers rather than providing the full documents containing them [7, 161, 193, 211]. Rigorous formal testing of IR systems was first done in the Cranfield experiments, beginning in the late 1950s [8]. In the 1960s, Gerard Salton developed SMART, an experimental information retrieval system. He characterized the traditional task of IR as retrieving the most “relevant” set of documents from a collection for a given query. The seminal series of early IR experiments were those on the SMART system by Salton and colleagues [8,9]. User studies on the effectiveness of IR systems began more recently and have since gained popularity.
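To make the traditional IR task concrete, here is a minimal sketch of ranking a small collection against a query. It assumes a simple tf-idf weighting with cosine similarity, in the spirit of vector-space retrieval; it is not the SMART implementation, and the function names (`tokenize`, `rank`), the smoothed idf formula and the sample documents are all illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems also stem and remove stop words).
    return text.lower().split()


def rank(query, docs):
    """Return (doc_index, score) pairs sorted by descending cosine similarity."""
    doc_tokens = [tokenize(d) for d in docs]
    n = len(docs)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))

    # Smoothed inverse document frequency (one common variant, chosen here for illustration).
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

    def vectorize(tokens):
        # Sparse tf-idf vector; terms unseen in the collection are dropped.
        tf = Counter(tokens)
        return {term: tf[term] * idf[term] for term in tf if term in idf}

    def cosine(a, b):
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query_vec = vectorize(tokenize(query))
    scores = [(i, cosine(query_vec, vectorize(tokens))) for i, tokens in enumerate(doc_tokens)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-document collection, for illustration only.
docs = [
    "web search engines index billions of pages",
    "relevance feedback improves retrieval performance",
    "contextual advertising generates revenue",
]
ranked = rank("retrieval relevance", docs)
print(ranked[0][0])  # index of the best-matching document
```

Only the second document shares terms with the query, so it is ranked first and the other two receive a score of zero. A production search engine would add much more (link analysis, stemming, an inverted index), but the ranking step has the same shape.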


1. Bruce Harry, "A User Oriented View of Internet as Information Infrastructure", Proceedings of an International Conference on Information Seeking in Context, ACM, 1997.

2. Hewson Claire, Yule Peter, Laurent Diana and Vogel Carl, "Internet Research Methods: A Practical Guide to the Social and Behavioral Sciences", Sage Publications, London, United Kingdom, 2002.

3. Montebello M., "Wrapping WWW Information Sources", Proceedings of the 2000 International Database Engineering and Applications Symposium (IDEAS'00).

4. Gena Cristina and Weibelzahl Stephan, "Usability Engineering for the Adaptive Web", http://www.springerlink.com/content/c87l5h7872762163/.

5. Brin Sergey and Page Larry, "Google search engine", http://google.stanford.edu.

6. Xing Bo and Lin Zhangxi, "The Impact of Search Engine Optimization on Online Advertising Market", ACM International Conference Proceeding Series, Vol. 156, 2004.

7. Gulli Antonio, "On Two Web IR Boosting Tools: Clustering and Ranking", Ph.D. Thesis, University of Pisa, May 2006.

8. Salton G. and McGill M., "Introduction to Modern Information Retrieval", McGraw-Hill, New York, 1983.

9. Salton G. and Buckley C., "Improving retrieval performance by relevance feedback", J. ASIST, 1990, 41(4), 288-297.





