Shruti Speak's

Posts

Showing posts from May 23, 2013

Web Search Technology [Lecture notes Information retrieval]

Web search engines work by storing information about many web pages, which they retrieve from the Web itself. Crawler-based search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site [1]. This is what it means when someone refers to a site being "spidered" or "crawled". The spider returns to the site on a regular basis, such as every month or two, to look for changes. Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalogue, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with the new information. Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been "spidered" but not yet "indexed". Until ...

Lecture Notes Infromation Retrieval (cont)

Acronyms and Abbreviations CSV: Comma Separated Values. DARPA: Defense Advanced Research Projects Agency. EOI: Effort of Improvement. 4. ISP: Internet Service Provider. 5. NIST: National Institute of Standards and Technology 6. PEOU: Perceived Ease of Usefulness. PU: Perceived Usefulness. RBS: Rule Based System. SEO: Search Engine Optimization. SERP: Search Engine Result Pages. SU: System Usage. TAM: Technology Acceptance Model. Text Retrieval Conference UDA: User Dependency Algorithm. UI: User Intention. URL: Uniform Resource Locator. VDC: Vicious Dependency Cycle. WG: Web Graph.

Modern Web IR

Evolution of Modern WebIR In 1995, everything changed with the creation of the web. Web objects are the largest collection of information ever created by humans, and this collection changes continuously when new objects are created and old ones removed. In order to adapt to this changed scenario, a new discipline has been created: Web Information Retrieval [1,2,3]. It uses some concepts of traditional IR, and introduces many innovative ones. Modern WebIR [4] is a discipline which has exploited some of the classical results of information retrieval, thereby developing innovative models of information access. A recent report showed that 80% of Web surfers discover new sites (that they visit) through search engines [4,5] (such as Ask, Google, MSN or Yahoo). 1.3.1 Types of Modern WebIR Information retrieval on the Web can be broadly classified into two technologi...

Classic Model of Web IR

The classic model for IR An IR system typically consists of three main subsystems: document representation, representation of user’s requirements (queries), and the algorithms used to match user requirements (queries) with document representations. A document collection consists of many documents containing information about various subjects or topics of interests [1]. Document contents are transformed into a document representation (either manually or automatically) which is done in a way such that matching these with queries is easy and these representations should correctly reflect the author's intention [2]. The primary concern in representation is how to select proper index terms. Typically, representation proceeds by extracting keywords that are considered as content identifiers and organizing them into a given format. Queries transform the user's information need into a form that correctly represents the user's underlying information requirement and is suitable...

A Brief History of Information Retrieval

The Classic Model for Information Retrieval Information retrieval (IR) can be understood as the task of finding material (usually documents) of an unstructured nature (usually text), which satisfies information need from within large collections (usually stored on computers). Formal retrieval models have formed the basis of IR research. Since early 1960s, a number of different models have been developed to describe aspects of the retrieval task: document content and structure, inter-document linkage, queries, users, their information needs and the context in which the retrieval task is embedded. The reliability on formal retrieval models is one of the great strengths of IR research [1,2,3, 4]. While using an IR system, a user, driven by an information need, constructs a query in some query language. The query is then submitted to a system...

Journey of Information Retrieval

Background The explosion of the World Wide Web (more commonly referred to as the Web) as an important information source has moulded the behaviour of many information seekers and consumers [1,2,3]. With such a popularity of the Web, a new discipline based on the concepts of traditional information retrieval (IR), called the Web information retrieval (WebIR) has been created; many innovative ones have also been introduced. In 1999, the Web was estimated to have only one–two billion publicly accessible pages, but was growing exponentially. Search systems, primarily viewed as tools for topical research, are now often used in a growing number of tasks, including navigation and shopping assistance. A s more and more users are relying on the Internet for information, search engin...