Journey of Information Retrieval

Background

The explosion of the World Wide Web (more commonly referred to as the Web) as an important information source has shaped the behaviour of many information seekers and consumers [1,2,3]. With the popularity of the Web, a new discipline built on the concepts of traditional information retrieval (IR), called Web information retrieval (WebIR), has emerged, and many innovative concepts have been introduced along with it. In 1999, the Web was estimated to have only one to two billion publicly accessible pages, but it was growing exponentially. Search systems, once viewed primarily as tools for topical research, are now used for a growing number of tasks, including navigation and shopping assistance. As more and more users rely on the Internet for information, search engines have emerged as a handy tool for information retrieval. This is clearly apparent on the Web, where the growth of available information and services has made search engine usage the second most common online activity after email [4]. Google currently claims that its index contains over eight billion pages; others also claim index sizes in the billions [5]. The strength of a search engine lies in its search algorithm, which plays a major role in fetching results for a user query [6].

Apart from being popular as an information-seeking vehicle, search engines have emerged as a preferred medium for advertising. Google AdSense, Yahoo! Publisher Network and MSN adCenter provide contextual advertising that has attracted many online ad publishers. These advertising strategies have not only drawn potential customers to advertisers' websites but have also generated revenue for the search engines. Thanks to this revenue, search engines are able to provide search results to seekers free of charge. In short, the online industry now has a relatively sustainable model [214].

Though search engines have established themselves as a revolutionary working metaphor, information retrieval (IR) is not a new discipline. It can be understood as the branch of computer science that deals with facilitating access to large collections of data [7]. The field spans a number of sub-areas: information retrieval per se, as performed by the users of Internet search engines or digital libraries; text categorization, which labels text documents with one or more predefined categories (possibly organized in a hierarchy); information filtering (or routing), which matches incoming documents against users' interest profiles; and question answering, which aims to extract specific (and preferably short) answers rather than returning the full documents that contain them [7, 161, 193, 211]. Rigorous formal testing of IR systems was first done in the Cranfield experiments, beginning in the late 1950s [8]. In the 1960s, Gerard Salton developed SMART, an experimental information retrieval system. He characterized the traditional task of IR as retrieving the most “relevant” set of documents from a collection for a given query. The experiments on SMART by Salton and colleagues form the seminal series of early IR experiments [8,9]. User studies on the effectiveness of IR systems began more recently and have since gained popularity.
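The traditional IR task described above, returning the most relevant documents from a collection for a given query, can be illustrated with a small sketch. It assumes a TF-IDF weighting with cosine similarity, a common textbook formulation of the vector space model; the function names `tokenize` and `rank`, the smoothed IDF formula, and the sample documents are illustrative assumptions and are not taken from SMART or from any of the cited works.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def rank(query, documents):
    """Return (document index, score) pairs, best match first.

    Scores are cosine similarities between TF-IDF vectors. This is an
    assumed, simplified weighting scheme chosen for illustration only.
    """
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in doc_tokens for term in set(tokens))

    def vectorize(tokens):
        tf = Counter(tokens)
        # Smoothed IDF keeps every weight positive, even for terms
        # that occur in all documents or in none.
        return {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
                for t in tf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q_vec = vectorize(tokenize(query))
    scores = [(i, cosine(q_vec, vectorize(tokens)))
              for i, tokens in enumerate(doc_tokens)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-document collection.
documents = [
    "web search engines rank pages",
    "executive information systems summarize data",
    "linear programming with artificial variables",
]
ranked = rank("search engines", documents)
```

Here `ranked` lists the first document ahead of the other two, since it is the only one sharing terms with the query. A production system would add stemming, stop-word removal and an inverted index rather than scoring every document.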


1. Bruce Harry, "A User Oriented View of Internet as Information Infrastructure", Proceedings of an International Conference on Information Seeking in Context, ACM, 1997.

2. Hewson Claire, Yule Peter, Laurent Diana and Vogel Carl, "Internet Research Methods: a Practical Guide to the Social and Behavioral Sciences", Sage Publications, London, United Kingdom, 2002.

3. Montebello M., "Wrapping WWW Information Sources", Proceedings of the 2000 International Database Engineering and Applications Symposium (IDEAS'00).

4. Gena Cristina and Weibelzahl Stephan, "Usability Engineering for the Adaptive Web", http://www.springerlink.com/content/c87l5h7872762163/.

5. Brin Sergey and Page Larry, Google search engine, http://google.stanford.edu.

6. Xing Bo and Lin Zhangxi, "The Impact of Search Engine Optimization on Online Advertising Market", ACM International Conference Proceeding Series, Vol. 156, 2004.

7. Gulli Antonio, "On Two Web IR Boosting Tools: Clustering and Ranking", Ph.D. Thesis, University of Pisa, May 2006.

8. Salton G. and McGill M., "Introduction to Modern Information Retrieval", McGraw-Hill, New York, 1983.

9. Salton G. and Buckley C., "Improving retrieval performance by relevance feedback", J. ASIST, 1990, 41(4), 288-297.





