Evolution of Modern WebIR
In the mid-1990s, everything changed with the advent of the Web. Web objects form the largest collection of information ever created by humans, and this collection changes continuously as new objects are created and old ones removed. To adapt to this new scenario, a new discipline emerged: Web Information Retrieval [1,2,3]. Modern WebIR [4] exploits some of the classical results of information retrieval while developing innovative models of information access. A recent report showed that 80% of Web surfers discover the new sites they visit through search engines [4,5] (such as Ask, Google, MSN or Yahoo).
1.3.1 Types of Modern WebIR
Information retrieval on the Web can be broadly classified into two technologies:
1. Question Answering Systems (QA): In information retrieval, question answering (QA) is the task of automatically answering a question posed in natural language. To find the answer to a question, a QA programme may use either a pre-structured database or a collection of natural language documents (a text corpus such as the World Wide Web or some local collection). QA is regarded as requiring more complex natural language processing (NLP) techniques than other types of information retrieval such as document retrieval; thus, natural language search engines are sometimes regarded as the next step beyond current search engines. QA research attempts to deal with a wide range of question types, including fact, list, definition, how, why, hypothetical, semantically constrained and cross-lingual questions.
· Closed-domain question answering deals with questions within a specific domain (for example, medicine or automotive maintenance), and can be seen as an easier task because NLP systems can exploit domain-specific knowledge, frequently formalized in ontologies.
· Open-domain question answering deals with questions about nearly everything, and can rely only on general ontologies and world knowledge. Such systems usually have much more data available from which to extract answers to a given query [5,6].
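The corpus-based QA approach described above can be sketched very simply: match the question's content words against candidate sentences in a document collection and return the best-scoring sentence. The function name, stop-word list and toy corpus below are illustrative assumptions, not any real system's design.

```python
import re

def answer(question, corpus):
    """Naive corpus-based QA: return the sentence sharing the most
    content words with the question (illustrative only)."""
    stop = {"what", "is", "the", "a", "an", "of", "in", "who", "how", "why"}
    q_terms = set(re.findall(r"\w+", question.lower())) - stop
    best, best_score = None, 0
    for doc in corpus:
        # Split each document into candidate answer sentences.
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            terms = set(re.findall(r"\w+", sentence.lower()))
            score = len(q_terms & terms)
            if score > best_score:
                best, best_score = sentence, score
    return best

corpus = ["The Web was created at CERN. Spiders crawl web pages."]
print(answer("Who created the Web?", corpus))
# -> The Web was created at CERN.
```

Real open-domain QA systems replace this word-overlap scoring with much deeper NLP (parsing, answer-type detection, entity extraction), but the retrieve-then-extract shape is the same.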
2. Search Engines: The goal of a modern Web search engine is to retrieve documents “relevant” to a user query from a given collection. Nowadays, a user query is modelled as a set of keywords drawn from a large dictionary of words; a document is typically a web page, PDF, PostScript or doc file, or any other file that can be parsed into a set of tokens. Global search engines serve as de facto Internet portals, while local search engines are embedded in numerous individual websites. Search engines can be broadly categorized as follows, on the basis of their functionality [6]:
a. Crawler-Based Search Engines: Crawler-based search engines use automated software programmes to survey and categorize web pages [7,8]. The programmes the search engines use to access web pages are called ‘spiders’, ‘crawlers’, ‘robots’ or ‘bots’. A spider finds a web page, downloads it and analyses the information it presents, all without human intervention; the page is then added to the search engine’s database. When a user performs a search, the engine checks its database of web pages for the keywords the user searched for and presents a list of result links [8]. The results (a list of suggested links) are listed in order of how ‘close’ each page is, as judged by the engine’s ranking algorithm, to what the user wants to find online. Crawler-based search engines constantly search the Internet for new web pages and update their database of information with these new pages. Examples of crawler-based search engines are Google and AskJeeves.
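The crawl-parse-index-query cycle just described can be sketched with a toy in-memory inverted index. The `pages` dictionary below stands in for downloaded pages, and the boolean-AND `search` function is a deliberate simplification of real ranking; all names and URLs are illustrative.

```python
import re
from collections import defaultdict

# Toy stand-in for pages a spider has already fetched: URL -> page text.
pages = {
    "http://a.example": "web search engines crawl web pages",
    "http://b.example": "directories use human editors",
}

# Indexing step: tokenize each page and record which URLs contain each term.
index = defaultdict(set)  # term -> set of URLs containing it
for url, text in pages.items():
    for token in re.findall(r"\w+", text.lower()):
        index[token].add(url)

def search(query):
    """Return the URLs containing every query keyword (boolean AND)."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    results = index[terms[0]].copy()
    for term in terms[1:]:
        results &= index[term]
    return results

print(search("web crawl"))  # only a.example contains both terms
```

A production engine adds ranking (e.g., term weighting and link analysis) on top of this lookup, but the database the spiders build is, at heart, such an inverted index.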
b. Directories: A ‘directory’ uses human editors who decide which category a site belongs to and place websites within specific categories in the directory’s database. The human editors comprehensively check each website and rank it, based on the information they find, using a pre-defined set of rules. Two major directories, (i) the Yahoo! Directory and (ii) the Open Directory, have gained popularity.
c. Hybrid Search Engines: A hybrid search engine uses a combination of both crawler-based results and directory results. More and more search engines these days are moving to a hybrid model. Popular hybrid search engines include Yahoo! and Google.
d. Meta Search Engines: Unlike other search engines, meta search engines gather results from several different search engines and combine them into one large listing. Popular meta search engines include MetaCrawler and Dogpile.
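The combining step a meta search engine performs can be approximated by rank fusion over the per-engine result lists. The sketch below uses reciprocal-rank fusion, one common fusion heuristic (not necessarily what any named engine uses); the engine lists and URLs are hypothetical.

```python
from collections import defaultdict

def merge(rankings, k=60):
    """Reciprocal-rank fusion: combine several ranked result lists into
    one, rewarding URLs that appear near the top of many lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, url in enumerate(ranking, start=1):
            scores[url] += 1.0 / (k + rank)
    # Sort URLs by fused score, best first.
    return sorted(scores, key=scores.get, reverse=True)

engine_a = ["u1", "u2", "u3"]  # hypothetical results from engine A
engine_b = ["u2", "u3", "u4"]  # hypothetical results from engine B
print(merge([engine_a, engine_b]))
# -> ['u2', 'u3', 'u1', 'u4']
```

Note how u2, ranked well by both engines, beats u1, which only one engine returned; that consensus effect is the main appeal of meta search.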
e. Specialty Search Engines: Specialty search engines have been developed to cater to the demands of niche areas, promising higher relevancy within a given context. All of these specialization approaches share one goal: increased precision with simple queries. There are various dimensions of search specialization, some of which are listed below:
(i) File Format: Specialization is done by the file or MIME type of a document (e.g., Yahoo! Image Search).
(ii) Geography/Language: Major search engines offer language-specific search applications (e.g., google.co.in and google.co.uk).
(iii) Transient Information: Specialization is done by focusing on transient information; Google News and DayPop are examples of search engines with this focus.
(iv) Document Type: Specialization is done by the intent of the document.
(v) User Intent: Specialization is done by looking at the user’s intent (what the user is searching for) at the moment of search. For example, search engines could base specialization on Broder’s [7] classification of queries into informational, navigational and transactional queries.
(vi) Search Context: Specialization is done by applying and
leveraging the search context to augment the query.
There are many specialty search engines dedicated to shopping, e.g., Froogle, Yahoo! Shopping, BizRate and PriceGrabber; some conduct local searches, e.g., NZPages (a New Zealand website directory) and SearchNZ (a New Zealand search engine). Some search engines are used specifically for domain name searches, e.g., iServe and FreeParking. Search engines like Tucows and CNET Download.com are used for freeware- and shareware-related software searches.
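Broder’s informational/navigational/transactional split mentioned under User Intent above is often approximated in practice with simple query-string heuristics. The keyword rules below are illustrative guesses for a sketch of the idea, not Broder’s own method or any engine’s actual classifier.

```python
def classify(query):
    """Crude heuristic query-intent classifier (illustrative only)."""
    q = query.lower()
    # Transactional: the user wants to perform an action, e.g. a purchase.
    if any(w in q for w in ("buy", "download", "price", "order")):
        return "transactional"
    # Navigational: the user wants to reach a particular site.
    if q.endswith((".com", ".org")) or "homepage" in q or "login" in q:
        return "navigational"
    # Informational: the user wants information on a topic.
    return "informational"

for q in ("buy camera", "yahoo.com", "history of the web"):
    print(q, "->", classify(q))
```

A specialty engine could route each class differently, e.g. sending transactional queries to a shopping index and navigational queries straight to a URL lookup.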
1. Ellis D., “A Behavioural Approach to Information Retrieval”, Journal of Documentation, Vol. 46, pp. 191-213, 1989.
2. Pitkow James Edward, “Characterizing World Wide Web Ecologies”, PhD Thesis, Georgia Institute of Technology, 1997.
3. Ricardo Baeza-Yates and Berthier Ribeiro-Neto, “Modern Information Retrieval”, Addison-Wesley, 1999.
4. Singhal Amit (Google, Inc.), “Modern Information Retrieval: A Brief Overview”, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering.
5. Torrey C., Churchill E.F., McDonald D.W., “Learning How: The Search for Craft Knowledge on the Internet”, CHI 2009, April 2009, Boston.
6. Sergey Brin, Lawrence Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, 1998, http://www-db.stanford.edu/pub/papers/google.pdf
7. Belkin N.J., “Helping people find what they don’t know”, Comm. ACM, 43, 8, 58-61, 2000.
8. Sergey Brin and Larry Page, Google search engine, http://google.stanford.edu.
9. Anagnostopoulos A., Broder A. and Carmel D., “Sampling search engine results”, In Proceedings of the 14th International World Wide Web Conference, pages 245-256, Chiba, Japan, 2005.