
A Brief History of Information Retrieval

The Classic Model for Information Retrieval


Information retrieval (IR) can be understood as the task of finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). Formal retrieval models have formed the basis of IR research. Since the early 1960s, a number of different models have been developed to describe aspects of the retrieval task: document content and structure, inter-document linkage, queries, users, their information needs, and the context in which the retrieval task is embedded. The reliance on formal retrieval models is one of the great strengths of IR research [1,2,3,4].

When using an IR system, a user, driven by an information need, constructs a query in some query language. The query is then submitted to a system that selects, from a collection of documents (the corpus), those documents that match the query according to certain matching rules. A query refinement process might then be used to create a new query and/or to refine the results.

An IR system typically consists of three main subsystems: document representation, representation of the user's requirements (queries), and the algorithms used to match user requirements (queries) with document representations. A document collection consists of many documents containing information about various subjects or topics of interest [5]. Document contents are transformed, either manually or automatically, into a document representation. This representation should make matching against queries easy, and it should correctly reflect the author's intention [4,5]. The primary concern in representation is how to select proper index terms. Typically, representation proceeds by extracting keywords that are considered content identifiers and organizing them into a given format. Queries transform the user's information need into a form that correctly represents the user's underlying information requirement and is suitable for the matching process [6,7]. A matching algorithm matches a user's requests (expressed as queries) with the document representations and retrieves the documents that are most likely to be relevant to the user. Many theoretical models and techniques drawn from natural language processing, statistical text analysis, word stemming, stop lists, and information theory have been tried in IR systems.

Two paradigms for finding useful information are well established in traditional information retrieval. Searching is a discovery paradigm that is useful for a user who knows precisely what to look for, while browsing is a paradigm useful for a user who is either unfamiliar with the content of the data collection or who has only casual knowledge of the jargon used in a particular discipline. Browsing and searching complement each other, and they are most effective when used together [6,7].
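The three subsystems described above (document representation, query representation, and matching) can be illustrated with a minimal sketch. It is not taken from any of the cited models: the stop list, the function names (index_terms, build_representations, match), the sample corpus, and the scoring rule (summed frequency of shared index terms) are all assumptions chosen for brevity, and stemming is left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; real systems use far larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "on"}

def index_terms(text):
    """Extract index terms: lowercase, tokenize, drop stop words, count."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

def build_representations(corpus):
    """Document representation: map each document id to its index terms."""
    return {doc_id: index_terms(text) for doc_id, text in corpus.items()}

def match(query, representations):
    """Matching: rank documents by how often they contain the query's terms."""
    query_terms = index_terms(query)  # the query gets the same representation
    scores = {}
    for doc_id, terms in representations.items():
        score = sum(terms[t] for t in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken by document id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Illustrative corpus (assumed for this example only).
corpus = {
    "d1": "Information retrieval is the task of finding documents in a collection.",
    "d2": "The value chain of a company is part of a larger value system.",
    "d3": "Search engines apply retrieval models to Web documents.",
}
reps = build_representations(corpus)
results = match("retrieval of documents", reps)
print(results)
```

Representing queries and documents with the same function keeps the matching step a simple comparison of index terms. A more realistic system would add stemming and term weighting, which this sketch omits.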

Because human–computer interaction factors and cognitive aspects play a significant role in the Web context [9], it is useful to detail this model further, as in Figure 1.2. IR systems recognize that the information need is associated with some task. This need is verbalized (usually mentally, not aloud) and translated into a query posed to a search engine. This process of deriving a query from an information need has received a great deal of attention in the Web context.

Evolution of Modern WebIR

In the mid-1990s, everything changed with the rise of the Web. The Web is the largest collection of information ever created by humans, and this collection changes continuously as new objects are created and old ones are removed. To adapt to this changed scenario, a new discipline was created: Web Information Retrieval [8,9]. It uses some concepts of traditional IR and introduces many innovative ones. Modern WebIR [10] is a discipline that has exploited some of the classical results of information retrieval to develop innovative models of information access. A report found that 80% of Web surfers discover the new sites they visit through search engines such as Ask, Google, MSN or Yahoo [10].

1. Ellis D., "Behavioural Approach to Information Retrieval", Journal of Documentation, Vol. 46, pp. 191-213, 1989.

2. Ellis D., "Modeling the Information Seeking Patterns of Academic Users: a Grounded Theory Approach", Library Quarterly, Vol. 63, 4, pp. 69-86, 1993.

3. Finin Tim, Mayfield James, Joshi Anupam, "Information Retrieval and the Semantic Web", Proceedings of the 38th Hawaii International Conference on System Sciences, 2005.

4. Salton G. and McGill M., "Introduction to Modern Information Retrieval", McGraw-Hill, New York, 1983.

5. Salton G. and Buckley C., "Improving retrieval performance by relevance feedback", J. ASIST, 41(4), pp. 288-297, 1990.

6. Jansen Bernard J., "Paid Search", IEEE Internet Computing Report, 2005.

7. Järvelin K. and Kekäläinen J., "Cumulated gain-based evaluation of IR techniques", ACM Trans. Inf. Syst., 20(4), pp. 422-446, 2002.

8. Pitkow James Edward, "Characterizing World Wide Web Ecologies", Thesis, Georgia Institute of Technology, 1997.

9. Baeza-Yates Ricardo and Ribeiro-Neto Berthier, "Modern Information Retrieval", Addison-Wesley, 1999.

10. Singhal Amit (Google, Inc.), "Modern Information Retrieval: A Brief Overview", Bulletin of the IEEE Computer Society Technical Committee on Data Engineering.


