

Showing posts with the label "lecture notes Information retrieval"

Web Analytics

Due to the growth of WWW-related technologies, the number of web sites on the Internet has increased rapidly, and daily life has come to depend on sites such as shopping sites, official sites of enterprises, and promotion sites for events. Each website carries different types of content, e.g. articles, blogs, newsletters, and training videos. These web sites contain a variety of content and complex link structures. Optimizing that content therefore requires an understanding of what draws users' attention and how users interact with it. Since content is one of the most important elements of any website, we need to optimize it.   For content optimization we need metrics that tell us how each aspect of the content performs. How does the content on the web site affect traffic patterns? Does it lead users to the site? Is there content on the site that performs better than we expect it to? Web site administrators, who are constantly required to
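To make the idea of content metrics concrete, here is a minimal sketch of computing two basic ones, pageviews and unique visitors per page, from a toy access log. The log records and field layout are assumptions for illustration, not a real analytics format:

```python
from collections import defaultdict

# Hypothetical access-log records as (visitor_id, page) pairs;
# a real log would also carry timestamps, referrers, etc.
log = [
    ("u1", "/blog/seo-basics"),
    ("u2", "/blog/seo-basics"),
    ("u1", "/products"),
    ("u3", "/blog/seo-basics"),
    ("u2", "/products"),
]

pageviews = defaultdict(int)   # page -> total views
visitors = defaultdict(set)    # page -> distinct visitor ids

for visitor, page in log:
    pageviews[page] += 1
    visitors[page].add(visitor)

for page in sorted(pageviews):
    print(page, pageviews[page], len(visitors[page]))
```

Comparing pageviews against unique visitors per page is one simple way to see which content draws repeat attention.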

Web Search Technology [Lecture notes Information retrieval]

Web search engines work by storing information about many web pages, which they retrieve from the Web itself. Crawler-based search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site [1]. This is what it means when someone refers to a site being "spidered" or "crawled". The spider returns to the site on a regular basis, such as every month or two, to look for changes. Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalogue, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with the new information. Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been "spidered" but not yet "indexed". Until
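The spider-and-index mechanics described above can be sketched in a few lines. To keep the example self-contained, the "web" here is an assumed in-memory link graph rather than real pages fetched over HTTP:

```python
from collections import deque

# Toy web: url -> (page content, outgoing links). A real spider would
# fetch pages over HTTP and parse links out of the HTML.
web = {
    "/home":  ("welcome page", ["/about", "/news"]),
    "/about": ("about us", ["/home"]),
    "/news":  ("latest news", ["/home", "/about"]),
}

def crawl(start):
    index = {}                        # the "catalogue": url -> content
    seen, frontier = {start}, deque([start])
    while frontier:
        url = frontier.popleft()      # the spider visits a page...
        content, links = web[url]
        index[url] = content          # ...and hands it to the indexer
        for link in links:            # then follows unseen links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index

index = crawl("/home")
print(sorted(index))
```

In a real engine the visit and the indexing are decoupled, which is exactly why a page can have been "spidered" but not yet "indexed".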

Lecture Notes Information Retrieval (cont)

Acronyms and Abbreviations
CSV: Comma-Separated Values.
DARPA: Defense Advanced Research Projects Agency.
EOI: Effort of Improvement.
ISP: Internet Service Provider.
NIST: National Institute of Standards and Technology.
PEOU: Perceived Ease of Use.
PU: Perceived Usefulness.
RBS: Rule-Based System.
SEO: Search Engine Optimization.
SERP: Search Engine Result Pages.
SU: System Usage.
TAM: Technology Acceptance Model.
TREC: Text Retrieval Conference.
UDA: User Dependency Algorithm.
UI: User Intention.
URL: Uniform Resource Locator.
VDC: Vicious Dependency Cycle.
WG: Web Graph.

Modern Web IR

Evolution of Modern WebIR In 1995, everything changed with the creation of the web. Web objects form the largest collection of information ever created by humans, and this collection changes continuously as new objects are created and old ones are removed. In order to adapt to this changed scenario, a new discipline has been created: Web Information Retrieval [1,2,3]. It uses some concepts of traditional IR, and introduces many innovative ones. Modern WebIR [4] is a discipline that has exploited some of the classical results of information retrieval, thereby developing innovative models of information access. A recent report showed that 80% of Web surfers discover new sites (that they visit) through search engines [4,5] (such as Ask, Google, MSN or Yahoo). 1.3.1 Types of Modern WebIR Information retrieval on the Web can be broadly classified into two technologies: 1. Question Answering Systems (QA): In information retrieval, question ans

Classic Model of Web IR

The classic model for IR An IR system typically consists of three main subsystems: document representation, representation of the user's requirements (queries), and the algorithms used to match user requirements (queries) with document representations. A document collection consists of many documents containing information about various subjects or topics of interest [1]. Document contents are transformed (either manually or automatically) into a document representation in a way that makes matching against queries easy; these representations should correctly reflect the author's intention [2]. The primary concern in representation is how to select proper index terms. Typically, representation proceeds by extracting keywords that are considered content identifiers and organizing them into a given format. Queries transform the user's information need into a form that correctly represents the user's underlying information requirement and is suitable
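The three subsystems above can be sketched with a small inverted index. This is a minimal illustration assuming whitespace tokenization as the "keyword extraction" step and conjunctive Boolean matching as the algorithm; real systems use stemming, stop-word removal, and ranked models:

```python
from collections import defaultdict

# A tiny document collection (doc id -> text).
docs = {
    1: "web search engines index web pages",
    2: "information retrieval matches queries with documents",
}

# Document representation: extract keywords (plain lowercase tokens here)
# into an inverted index mapping term -> set of document ids.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term].add(doc_id)

def match(query):
    # Query representation uses the same tokenization; matching returns
    # the documents that contain every query term.
    results = set(docs)
    for term in query.lower().split():
        results &= inverted.get(term, set())
    return sorted(results)

print(match("web pages"))
```

Note that query and documents must share one representation scheme; if they were tokenized differently, matching would silently fail.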

A Brief History of Information Retrieval

The Classic Model for Information Retrieval   Information retrieval (IR) can be understood as the task of finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). Formal retrieval models have formed the basis of IR research. Since the early 1960s, a number of different models have been developed to describe aspects of the retrieval task: document content and structure, inter-document linkage, queries, users, their information needs, and the context in which the retrieval task is embedded. The reliance on formal retrieval models is one of the great strengths of IR research [1,2,3,4]. While using an IR system, a user, driven by an information need, constructs a query in some query language. The query is then submitted to a system that selects from a collection of documents (corpus) those documents which match the query as ind

Journey of Information Retrieval

Background The explosion of the World Wide Web (more commonly referred to as the Web) as an important information source has moulded the behaviour of many information seekers and consumers [1,2,3]. With the Web's popularity, a new discipline based on the concepts of traditional information retrieval (IR), called Web information retrieval (WebIR), has been created; many innovative concepts have also been introduced. In 1999, the Web was estimated to have only one to two billion publicly accessible pages, but was growing exponentially. Search systems, primarily viewed as tools for topical research, are now often used in a growing number of tasks, including navigation and shopping assistance. As more and more users rely on the Internet for information, search engines have emerged as a handy tool for information retrieval. This is clearly apparent on the World Wide Web, where the growth of available information

Set of Organizations

• Internet Society (ISOC): Founded in 1992, an international nonprofit professional organization that provides administrative support for the Internet and is the organizational home for the Internet's standardization bodies.
• Internet Engineering Task Force (IETF): Forum that coordinates the development of new protocols and standards. Organized into working groups, each devoted to a specific topic or protocol. Working groups document their work in reports called Requests for Comments (RFCs).
• Internet Research Task Force (IRTF): Composed of a number of focused, long-term, small research groups.
• Internet Architecture Board (IAB): A technical advisory group of the Internet Society; provides oversight of the architecture for the protocols and the standardization process.
• Internet Engineering Steering Group (IESG): The IESG is responsible for technical managem