Background
The explosion of the World Wide Web (more commonly
referred to as the Web) as an important information source has shaped the
behaviour of many information seekers and consumers [1,2,3]. Given
this popularity, a new discipline called Web information retrieval
(WebIR) has emerged. It builds on the concepts of traditional information
retrieval (IR) and introduces many innovative ones of its own.
In 1999, the Web was estimated to have only one to two billion publicly accessible
pages, but it was growing exponentially. Search systems, once viewed primarily as tools
for topical research, are now used for a growing number of tasks,
including navigation and shopping assistance. As more
and more users rely on the Internet for information, search engines have
emerged as a handy tool for information retrieval. This is most apparent on the Web,
where the growth of available information and services has made search engine
usage the second most common online activity after email [4]. Google
currently claims that its index contains over eight billion pages; others
also claim index sizes in the billions [5]. The
strength of a search engine lies in its search algorithm, which plays a major
role in fetching results for a user query [6]. Apart from being popular as an
information-seeking vehicle, search engines have emerged as a preferred medium
for advertising. Google AdSense, Yahoo! Publisher Network and MSN adCenter
provide contextual advertising that has attracted many online
ad publishers. These advertising strategies have not only drawn potential customers
to advertisers’ websites but have also generated revenue for the search engines.
Thanks to this revenue, search engines are able to provide search results free of charge
to information seekers. In short, the online industry now has a relatively
sustainable model [214].
Although search engines have established themselves as a revolutionary
working metaphor, information retrieval (IR) is not a new
discipline. Information retrieval can
be understood as the branch of computer science that deals with facilitating
access to large collections of data [7]. The field
spans a number of sub-areas: information retrieval per se, as performed by the users of
Internet search engines or digital libraries; text categorization, which labels
text documents with one or more predefined categories (possibly organized in a
hierarchy); information filtering (or routing), which matches input documents
against users’ interest profiles; and question answering, which aims to extract
specific (and preferably short) answers rather than returning the full documents
that contain them [7, 161, 193, 211]. Rigorous formal testing of IR systems was
first carried out in the Cranfield experiments, beginning in the late 1950s [8]. In
the 1960s, Gerard Salton developed SMART, an experimental information retrieval
system. He showed that the traditional task of IR was to retrieve the most
“relevant” set of documents from a collection for a given query.
The seminal series of early IR experiments were those conducted on the SMART system by
Salton and colleagues [8,9]. User
studies on the effectiveness of IR systems began more recently and have
gained popularity since.
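The traditional IR task described above, returning the most relevant documents
from a collection for a given query, can be illustrated with a small sketch.
The code below is not SMART's implementation. It assumes a simple TF-IDF
weighting with cosine similarity as a stand-in for a relevance measure, and the
function names (tokenize, build_index, cosine, rank) and the sample documents
are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Return one TF-IDF vector per document plus the collection's IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant) so that common terms keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return indices of documents with a non-zero score, best match first."""
    vectors, idf = build_index(docs)
    # Query terms that never occur in the collection are ignored.
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for score, i in scored if score > 0.0]


# Hypothetical three-document collection.
DOCS = [
    "web search engines index billions of pages",
    "salton developed the smart retrieval system",
    "contextual advertising funds free search",
]

print(rank("smart retrieval", DOCS))  # -> [1]
```

Only documents that share at least one term with the query are returned, so a
query with no matching terms yields an empty list. A real Web search engine
combines many more signals than term overlap.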
1. Bruce Harry, “A User Oriented View of Internet as Information
Infrastructure”, Proceedings of an International Conference on Information
Seeking in Context, ACM, 1997.
2. Hewson Claire, Yule Peter, Laurent Diana and Vogel Carl, “Internet Research
Methods: A Practical Guide to the Social and Behavioral Sciences”, Sage
Publications, London, United Kingdom, 2002.
3. Montebello M., “Wrapping WWW Information Sources”, Proceedings of the 2000
International Database Engineering and Applications Symposium (IDEAS’00).
4. Gena Cristina and Weibelzahl Stephan, “Usability Engineering for the
Adaptive Web”, http://www.springerlink.com/content/c87l5h7872762163/.
5. Brin Sergey and Page Larry, “Google search engine”,
http://google.stanford.edu.
6. Xing Bo and Lin Zhangxi, “The Impact of Search Engine Optimization on
Online Advertising Market”, ACM International Conference Proceeding
Series, Vol. 156, 2004.
7. Gulli Antonio, “On Two Web IR Boosting Tools: Clustering and Ranking”,
Ph.D. Thesis, University of Pisa, May 2006.
8. Salton G. and McGill M., “Introduction to Modern Information Retrieval”,
McGraw-Hill, New York, 1983.