Understanding WWW

To begin with, a simple definition: the WWW, also known as the web, is “a global information space where people can communicate via computers connected to the Internet”. WWW is an acronym for World Wide Web. Some people use "internet" and "the web" interchangeably, even though the web is a service that operates over the Internet.

Technically, the WWW can be understood as a system of Internet servers through which several Internet protocols can be accessed using a web browser. It is one of the most commonly used services on the Internet. Almost all protocols available on the Internet can be reached through the Web. This flexibility makes it a convenient and user-friendly environment through which email, FTP, Telnet, Usenet News, etc. can be accessed. Because the Web can work with multimedia and advanced programming languages, it has grown at rocket speed and become the most interesting part of the Internet.

Users navigate the Web with browser software such as Internet Explorer, Netscape, Google Chrome, Mozilla Firefox, etc. The WWW also makes valuable use of hypertext along with high-quality graphics. Unlike text-based services such as FTP and Telnet, the Web is a graphical medium, with most web pages containing some images. Today we also have web pages with sound and video embedded in them.

Fig 1.1 Sample Picture describing web and its associates

The figure above represents the Internet, including various services such as FTP, SMTP, Telnet and the World Wide Web.

WWW Protocol

As far as protocols are concerned, the celebrity of the web is the HyperText Transfer Protocol (HTTP). It is the application protocol that makes the web work. It is an application-level protocol because it rides on top of the TCP layer in the protocol stack and is used by specific applications to talk to one another. In this case the applications are web browsers and web servers. These protocols are covered in detail in the discussions below.

Do not confuse HTTP with the HyperText Markup Language (HTML): HTML is the markup language used to create web pages. Web pages are often called hypertext documents. Hypertext documents are documents that contain links connecting them to other documents or files. The user can activate these links (through a mouse click, for example), and the target document is then transferred to the client machine; if it is a web page, it is displayed in the browser. These links can be placed on text, pictures, etc. in the hypertext document. A single hypertext document can contain multiple hyperlinks. It is because of all this "linking" between web pages that a virtual web of connections is created.

Characteristics of HTTP

HTTP is a text-based protocol that is often described as connectionless. Clients (web browsers) send requests to web servers for web elements such as web pages and images. Before a client can make an HTTP request, it must open a connection to the server. In the original form of the protocol (HTTP/1.0), the connection between client and server is closed once the server has serviced the request, so a new connection is established each time the client makes a request. Later versions (HTTP/1.1 onwards) can keep a connection open and reuse it for several requests, but each request is still handled independently of the ones before it.

Also note: Many protocols are connection oriented, i.e. the two computers communicating with each other keep the connection open for the whole exchange. Basic HTTP does not.

When you type a URL into a web browser, HTTP works as follows:

1. If the URL contains a domain name (like www.yahoo.com), the browser first connects to a domain name server and retrieves the corresponding IP address for the web server. A URL can be understood as the unique address of a file that can be accessed through the Internet.
Note: URLs are covered in detail in a later sub-section.
2. The web browser connects to the web server and sends an HTTP request for the desired web page.
3. The web server receives the request and checks for the desired page. If the page exists, the web server sends it. If the server cannot find the requested page, it sends an HTTP 404 error message. (404 means 'Page Not Found'; anyone who spends a lot of time on the net will have seen it!)
4. The web browser receives the page back and the connection is closed. The browser then parses the page and looks for other elements it needs to complete it. These usually include images, applets, etc. For each element needed, the browser makes an additional connection and HTTP request to the server. Once the browser has finished loading all images, applets, etc., the page is displayed in the browser window.
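
To make steps 2 and 3 a little more concrete, here is a minimal Python sketch of what the text of an HTTP request and the first line of a response look like. It does not touch the network. The helper names `build_request` and `parse_status`, the host www.example.com, and the hand-written sample response are all assumptions made for illustration; a real browser sends many more headers.

```python
# Illustrative only: builds the text of a simple HTTP/1.0 request and reads
# the status line of a response, without opening any network connection.

def build_request(host, path="/"):
    """Return the bytes a very simple client might send to ask for a page."""
    lines = [
        f"GET {path} HTTP/1.0",  # request line: method, path, protocol version
        f"Host: {host}",         # which site on the server we are asking for
        "",                      # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status(response):
    """Extract the numeric status code and reason phrase from a response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    _version, code, reason = status_line.split(" ", 2)
    return int(code), reason


request = build_request("www.example.com", "/index.html")
print(request.decode("ascii"))

# A hand-written response standing in for what a server might send back
# when the requested page does not exist.
sample_response = b"HTTP/1.0 404 Not Found\r\nContent-Length: 0\r\n\r\n"
print(parse_status(sample_response))
```

The status code in the first line of the response is how the browser learns whether the page was found (200) or not (404).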
1.2.1 Internet Basics

A basic rule of thumb: no single person or entity owns the Internet; everyone uses it! There are millions of contributors and billions of users of the Internet!

The Internet is a worldwide collection of computer networks. When you surf the Internet, you are part of a community of millions who use computers to communicate with one another and share ideas and information.

The Internet is also called the information superhighway or cyberspace. As you travel the highway, you will encounter many different information communities. Information and its accessibility are the real treasure of the Internet!

The Internet has revolutionized the computer and communications world like nothing before. The invention of the telegraph, telephone, radio, and computer set the stage for this unprecedented integration of capabilities. The Internet is at once a world-wide broadcasting capability, a mechanism for information dissemination, and a medium for collaboration and interaction between individuals and their computers irrespective of geographic location. The Internet also represents one of the most successful examples of the benefits of sustained investment and commitment to research and development of information infrastructure.

Client/Server Model

The Internet is based on the client/server model. It basically involves two types of computers:

1. Servers: Servers are computers that provide services or information required by other computers. Servers run special software (on the Web, this is called web server software) to respond to clients' requests. You will learn about them in a later section of this unit.
2. Clients: Computers that request information from a server are called clients.

1.2.2. History

In 1969, the United States Department of Defense (DoD) started a network called ARPANET (Advanced Research Projects Agency Network). It was an experiment to find out whether networking could be made reliable. The network was set up for the military, to ensure that communication did not break down in the event of war. The DoD wanted to maintain contact with military research contractors and universities in the event of war. The DoD also wanted these agencies to share software and hardware resources that they could not otherwise afford! Later, the military allowed universities to join the network. Students at these universities caught on to the network and developed much of the software that gives it its present shape.

ARPANET quickly grew to cover the entire American continent and became a big success. Every university in the country wanted to be part of it! To structure things better, the network was split into MILNET for managing military sites and ARPANET for managing non-military sites.

From there, it evolved into the huge network presently known as the Internet.

Figure 1.2: Internet History

Growth of Internet

• Internet doubles each year
• Reasons for success:
• Decisions not politically based
• Internet is distributed in operations
• Open standards, free (or inexpensive) software
• Easy to operate

1.2.3 Internet Protocols

Internet protocols are used to transfer data from one machine to another. All computers on the Internet communicate with each other using the Transmission Control Protocol / Internet Protocol (TCP/IP). Thus, data is sent from the server to the client (and vice versa) using TCP/IP.
Usually, the client is your browser and the server is a program running on a different computer. You use the browser on your computer (called the client machine in Internet lingo) to access information on another computer (called the server machine). Interestingly, this server machine can be located thousands of miles from your workplace!

There are a variety of other protocols, such as the File Transfer Protocol (FTP) used in FTP applications and the HyperText Transfer Protocol (HTTP) employed on the World Wide Web.

The File Transfer Protocol (FTP)

File Transfer Protocol is a robust method for transferring files (downloading and sending) from one computer to another on the Internet. When a file is large, or you have multiple files to send across, email is not the right choice and FTP fits the job very well. The goals of FTP are to:
• promote file sharing
• transfer data efficiently and easily
• provide a common platform for file storage among different hosts

The HyperText Transfer Protocol (HTTP)

HTTP provides a set of instructions for accurate information exchange. Communication between the client (your browser) and the server (software located on a remote computer) involves requests sent by the client and responses from the server.

The Telnet Protocol

The Telnet protocol allows you to connect to another machine. Once connected, your computer behaves like a terminal of the distant machine, and you can use all the resources on the remote system for which you have the required permissions.

Some older Internet protocols

Protocols such as Gopher, Archie, etc. were once used extensively on the Internet, but they have now faded into oblivion. Why? Thanks to the WWW.

The Email Protocol

Email is one of the most used applications on the Internet. Email allows users to communicate with each other almost instantly across the globe. Each email message consists of a header and a body. The header contains the following information:
• Recipient's email address
• Sender's email address
• Email addresses of the people to whom a carbon copy (Cc) or blind carbon copy (Bcc) has been sent
• The subject line
The main text of the message resides in the email body.
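
The header/body split can be seen with Python's standard `email` package. This is only a sketch: the addresses and the subject are made-up placeholders, and nothing is actually sent.

```python
from email.message import EmailMessage

# Build a message in memory; all addresses below are placeholders.
msg = EmailMessage()
msg["From"] = "sender@example.com"        # sender's email address
msg["To"] = "recipient@example.com"       # recipient's email address
msg["Cc"] = "colleague@example.com"       # carbon copy
msg["Subject"] = "Meeting notes"          # the subject line
msg.set_content("The main text of the message goes in the body.")

# The serialized message shows the header lines first, then an empty line,
# then the body.
print(msg.as_string())
```

In the printed output, everything before the first empty line is the header and everything after it is the body.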
1.2.4 The Way Internet Works

The most peculiar thing about the Internet is the way data is transferred from one computer to another. This is what happens to every type of data (e.g. a web page or an email) when it is transferred over the Internet:
• It is broken up into a whole lot of small, similarly sized pieces (technically speaking, packets).
• A header is added to each packet. This header gives the source and destination of the packet, and also how it fits among the rest of the packets.
• Each packet is sent from computer to computer until it finds its way to its destination. Each computer along the way decides where to send the packet next. This can depend on a number of things, such as how busy the other computers are when the packet is received. It is quite possible that the packets do not all take the same route, yet they still arrive safely!
• At the destination, the packets are examined. If any packet is missing or damaged, a message is sent asking for those packets to be resent. This continues until all the packets have been received intact and the message/data is complete.
• The packets are reassembled into their original form and passed on.

Fig 1.3 depicts the routing of packets on the Internet.

Fig 1.3 Working of Internet
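
The splitting and reassembly described above can be sketched in a few lines of Python. This is a toy model, not how real network software is written: the packet is a plain dictionary, the field names, the addresses 10.0.0.1 and 10.0.0.2, and the 8-byte packet size are all assumptions chosen for illustration.

```python
import random


def to_packets(message, size, src, dst):
    """Split a message into packets, each with a small header."""
    packets = []
    for seq, start in enumerate(range(0, len(message), size)):
        packets.append({
            "src": src,                              # where the packet comes from
            "dst": dst,                              # where it is going
            "seq": seq,                              # its place among the other packets
            "payload": message[start:start + size],  # one piece of the message
        })
    return packets


def reassemble(packets):
    """Put packets back in order and join their payloads."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    return b"".join(p["payload"] for p in ordered)


message = b"Packets may take different routes but arrive intact."
packets = to_packets(message, 8, "10.0.0.1", "10.0.0.2")

# Simulate packets arriving in a different order than they were sent.
random.shuffle(packets)

print(reassemble(packets) == message)
```

Because every packet carries its own sequence number, the order of arrival does not matter to the receiver.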

The Internet uses the TCP/IP protocol suite for communication. TCP (Transmission Control Protocol) specifies how computers connect, send, and receive information, whereas IP (Internet Protocol) specifies how packets are routed between two computers. A message is split into IP packets, which contain the following information:

• Pieces of message
• Information about sender
• Information about receiver
• Sequence number
• Error checking information

When the packets have been received by the destination computer, it reassembles the message. If any packet is corrupted, the receiving computer sends a request back to the sender for the corrupt packets. The advantages of using packets can be summarized as follows:

• Error recovery
• Load distribution
• Flexibility
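
Error recovery is the easiest of these advantages to illustrate. In the sketch below, each packet carries error-checking information, the receiver detects the damaged packet, and only that packet is requested again. Note the assumptions: CRC-32 from Python's `zlib` module stands in for the error check (real TCP/IP uses a different, 16-bit checksum), and the dictionary layout and helper names are made up for this example.

```python
import zlib


def make_packet(seq, payload):
    """Attach a sequence number and error-checking information to a payload."""
    return {"seq": seq, "payload": payload, "checksum": zlib.crc32(payload)}


def is_intact(packet):
    """Recompute the checksum and compare it with the one in the packet."""
    return zlib.crc32(packet["payload"]) == packet["checksum"]


chunks = [b"Hello, ", b"Inter", b"net!"]
sent = [make_packet(seq, chunk) for seq, chunk in enumerate(chunks)]

# Simulate damage in transit: one payload is altered, its checksum is not.
received = [dict(p) for p in sent]
received[1]["payload"] = b"Intxr"

# The receiver asks only for the packets that failed the check.
to_resend = [p["seq"] for p in received if not is_intact(p)]
print("resend request for packets:", to_resend)

# The sender transmits those packets again and the message is completed.
for seq in to_resend:
    received[seq] = dict(sent[seq])

print(b"".join(p["payload"] for p in received))
```

Only the damaged piece travels twice; with one big unsplit message, the whole thing would have to be sent again.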

Fig 1.4 Delivery of E-mail on Internet

The figure above (Fig 1.4) shows the detailed path an email message takes from one computer to another.

Overview of TCP/IP and its Services

TCP/IP stands for Transmission Control Protocol/Internet Protocol. The layers that make up TCP/IP, and some of their protocols, are:

1. Network Access Layer

The TCP/IP design hides the function of this layer from users. Its main objective is to push traffic across whatever physical network is in use (e.g. Ethernet, optical, Token Ring, etc.). The functions performed at this level include encapsulating IP datagrams into the frames that are transmitted by the network, and mapping IP addresses to the physical addresses used by the network. Thanks to the TCP/IP addressing scheme, it is possible to uniquely identify every computer on the network. This IP address is converted into whatever address is appropriate for the physical network over which the datagram is transmitted.

2. Inter-network Layer

The best-known TCP/IP protocol at the inter-network layer is the Internet Protocol (IP).
IP provides the basic packet delivery service for all TCP/IP networks. An important concept: in addition to the physical node addresses used at the network access layer, IP implements a system of logical host addresses called IP addresses. IP addresses are used by the inter-network and higher layers to identify devices and to perform inter-network routing. The Address Resolution Protocol (ARP) is used to identify the physical address that matches a given IP address. IP is used by all protocols in the layers above and below it to deliver data. This means all TCP/IP data flows through IP when it is sent and received, regardless of its final destination.

3. Internet Protocol

Using IP is like dropping traditional postal mail into a post box. Although the destination address is written on it, you cannot be 100% sure that it will reach its destination.

Similarly, IP is a connectionless protocol. This means IP does not exchange control information (called a handshake) to establish an end-to-end connection before transmitting data. In contrast, a connection-oriented protocol exchanges control information with the remote computer to verify that it is ready to receive data before sending it. When the handshake is successful, the computers are said to have established a connection.
IP relies on protocols in other layers to establish the connection when connection-oriented services are required. IP also relies on protocols in other layers to provide error detection and error recovery for the data it carries. Because it does not do this itself, it is sometimes called an unreliable protocol.

The functions performed at this layer are as follows:
1. Define the datagram, which is the basic unit of transmission in the Internet.
2. Define the Internet addressing scheme.
3. Move data between the Network Access Layer and the Host-to-Host Transport Layer.
4. Route datagrams to remote hosts.
5. Fragment and reassemble datagrams.

4. Transport Layer (host-to-host layer)

The protocol layer just above the inter-network layer is known as the host-to-host layer.
It is like a “referee” on the playground, i.e. this layer is responsible for end-to-end data integrity. The two most important protocols employed at this layer are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). TCP provides reliable, full-duplex connections by ensuring that data is resent when transmission results in an error (end-to-end error detection and correction). TCP enables hosts to maintain multiple, simultaneous connections. UDP provides an unreliable, connectionless datagram service that can improve network throughput at the host-to-host transport layer because it avoids the overhead of connections. Both protocols deliver data between the application layer and the inter-network layer.

a. User Datagram Protocol: The User Datagram Protocol gives application programs direct access to a datagram delivery service, like the delivery service that IP provides. It is an unreliable, connectionless datagram protocol. "Unreliable" merely means that the protocol has no technique for verifying that the data reached the other end of the network correctly. Within your computer, UDP will deliver data correctly. This protocol is ideal when the amount of data to be transmitted is small, because the overhead of creating connections and ensuring reliable delivery may be greater than the work of retransmitting the entire data set.
b. Transmission Control Protocol: Applications that require the host-to-host transport protocol to provide reliable data delivery use TCP, as it verifies that the data is delivered across the network accurately and in the proper sequence. TCP is a reliable, connection-oriented, byte-stream protocol.
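
The difference between the two can be seen with Python's standard `socket` module. The sketch below sends one short message each way on the local machine only (the loopback address 127.0.0.1), so it needs no Internet connection. The variable names and messages are assumptions made for illustration, and error handling is left out to keep it short.

```python
import socket

LOOPBACK = "127.0.0.1"  # assumption: sender and receiver run on the same machine

# --- UDP: connectionless, data is sent without any handshake ---
udp_receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_receiver.bind((LOOPBACK, 0))   # port 0 lets the operating system pick a free port
udp_receiver.settimeout(2.0)       # UDP gives no delivery guarantee, so do not wait forever
udp_sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_sender.sendto(b"hello via UDP", udp_receiver.getsockname())
udp_data, _ = udp_receiver.recvfrom(1024)

# --- TCP: connection-oriented, connect() performs the handshake first ---
tcp_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp_server.bind((LOOPBACK, 0))
tcp_server.listen(1)
tcp_client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp_client.connect(tcp_server.getsockname())  # the connection is established here
connection, _ = tcp_server.accept()
connection.settimeout(2.0)
tcp_client.sendall(b"hello via TCP")
tcp_data = connection.recv(1024)   # a short message normally arrives in one piece

print(udp_data, tcp_data)

for sock in (udp_sender, udp_receiver, tcp_client, connection, tcp_server):
    sock.close()
```

Notice that the UDP sender never calls `connect()` or waits for the other side to be ready, while the TCP client cannot send anything until the connection exists.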

5. Application Layer

Widely known and implemented TCP/IP application layer protocols are FTP, HTTP, Telnet, and the Simple Mail Transfer Protocol (SMTP). In addition to these, the application layer includes the following protocols:
• Domain Name System (DNS). Also called name service; this application maps IP addresses to the names assigned to network devices. E.g. courtesy of DNS, we don't have to memorize the IP addresses of websites!
• Routing Information Protocol (RIP). Routing is central to the way TCP/IP works. RIP is used by network devices to exchange routing information.
• Simple Network Management Protocol (SNMP). It is an asset to any NMS (network management station): it is the protocol used to collect management information from network devices.
• Network File System (NFS). Working on the Sun platform is not child's play; one needs special skills to work on it. NFS is a system developed by Sun Microsystems that enables computers to mount drives on remote hosts and operate them as if they were local drives.

1.2.5 Noteworthy Stats about internet

Have a look at the figures below, which provide insight into the way the Internet is growing. Fig 1.5 shows the growing number of Internet users in different parts of the world.

Fig 1.5 Internet Users

Fig 1.6 shows the growth rate of Internet users in Asia.

Fig 1.6 Internet growth in different countries in Asia

1.2.6 Language of Internet

Does the Internet have a language?

Internet World Stats presents its latest estimates for Internet users by language, because of the importance of this research and the lack of other sources. It publishes several tables and charts with analysis and details for the top ten languages, and also for the top three languages, in use by Internet users.

Fig 1.7 Top 10 languages of Internet

1.2.7 Internet Management

Can you guess who among biggies such as Bill Gates, Larry Page, and Jack Welch owns the Internet? Well, none of them! There is no central control, administration, or management of the Internet. While this is generally true, there are some noteworthy organizations that work together day and night in a relatively well-structured and roughly democratic environment to collectively participate in the research, development, and management of the Internet.

Internet management organizations are described in the following sections, where the ASO, ccNSO, and GNSO are part of ICANN:
• ISOC -- Internet Society
• IAB -- Internet Architecture Board
• IETF -- Internet Engineering Task Force
• IRTF -- Internet Research Task Force
• ICANN -- Internet Corporation For Assigned Names And Numbers
• IANA -- Internet Assigned Numbers Authority
• NSI -- Network Solutions
• Accredited Domain Name Registrars.
A couple of other organizations that play a role in the management of the Internet are listed below:

Other Internet Organizations
• W3C -- World Wide Web Consortium
• Create A Usenet 8 newsgroup
• Create A Usenet Alt newsgroup
• Find IRC networks
• Find MUD servers
• Find mailing lists.

1.3 URL
URLs - What is a URL?

URL stands for Uniform Resource Locator, which means it is a uniform (same throughout the world) way to locate a resource (file or document) on the Internet. The URL specifies the address of a file, and every file on the Internet has a unique address. Web software, such as your browser, uses the URL to retrieve a file from the computer on which it resides.

The actual address of the computer is an IP address, a set of four numbers separated by periods. An example of this would be 202.147.23.8, but as these are difficult for humans to use, addresses are represented in an alphanumeric form that is more descriptive and easier to remember. Thus, the site whose IP address is 209.62.20.192 can also be reached as www.justdail.com.
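
For the curious, the four numbers in a dotted address such as 202.147.23.8 are simply the four bytes of one 32-bit number. A small sketch using Python's standard `ipaddress` module shows this; the variable names are just for illustration.

```python
import ipaddress

text = "202.147.23.8"  # the example address from the paragraph above

# Let the standard library parse the dotted form.
addr = ipaddress.IPv4Address(text)
print(int(addr))      # the same address as a single 32-bit integer
print(addr.packed)    # the same address as 4 raw bytes

# The same conversion done by hand: each number is one byte of the integer.
octets = [int(part) for part in text.split(".")]
value = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
print(value == int(addr))

# And back again from the integer to the dotted form.
print(ipaddress.IPv4Address(value))
```

Each of the four numbers must fit in one byte, which is why none of them can be larger than 255.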

Try this

At the computer's command prompt, try the following (while connected to the Internet, of course):

In the above exercise you are looking up details for the web portal www.justdelhi.com. In the first case you are typing the numeric address (IP address) and in the second case the domain name. In both cases you are sending packets to the same destination. Hence, behind the scenes the web portal www.justdelhi.com has a unique identity, which is its public IP address 209.62.20.192 (probably held as an entry at a DNS server, i.e. Domain Name Server).
Also, to locate the IP address of a website, check out the following URL and look for the IP address tab: [ http://www.networksolutions.com/whois-search/justdail.com]

URL Format:

Protocol://site address/path/filename

Let's take an example to understand it better.
The URL of a company site is: http://www.google.com/
One of the pages on this site is addressed as:
http://www.google.com/intl/en-GB/mobile/default/mail.html

The above URL consists of:

• Protocol: http
• Host computer name: www
• Domain name: google
• Domain type: com
• Path: /intl/en-GB/mobile/default
• File name: mail.html

A protocol can be defined as a ‘set of rules’.
Protocols of the transport layer of the Internet Protocol Suite, usually the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP), but also other protocols, use a numerical identifier for the endpoints of host-to-host communications. Such an endpoint is known as a port and the identifier is the port number.

By default, the HTTP protocol works on port 80.
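
The breakdown of the example URL can be reproduced with Python's standard `urllib.parse` module. This is a sketch for this one URL: splitting the host on dots into exactly three parts is an assumption that happens to hold for www.google.com, and the `DEFAULT_PORTS` table is a made-up helper holding the well-known HTTP port.

```python
from urllib.parse import urlsplit

url = "http://www.google.com/intl/en-GB/mobile/default/mail.html"
parts = urlsplit(url)

protocol = parts.scheme                               # "http"
host, domain, domain_type = parts.hostname.split(".")  # assumes exactly three labels
directory, _, filename = parts.path.rpartition("/")    # split path from file name

# The URL does not name a port, so fall back to the protocol's default.
DEFAULT_PORTS = {"http": 80}  # hypothetical helper table for this example
port = parts.port or DEFAULT_PORTS.get(protocol)

print("Protocol:   ", protocol)
print("Host name:  ", host)
print("Domain name:", domain)
print("Domain type:", domain_type)
print("Path:       ", directory)
print("File name:  ", filename)
print("Port:       ", port)
```

A URL can also name the port explicitly, for example http://www.google.com:8080/, in which case `parts.port` holds that number instead.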

Domain Name

The site address consists of the host computer name, the domain name and the domain type.
Such names should be descriptive for easy comprehension; the domain name is usually the name of the organization or company. Noteworthy examples of domain types:

• com: specifies commercial entities
• net: highlights networks
• org: organizations (usually non-profit)
• edu: colleges and universities (education providers)
• gov: government organizations
• mil: military entities of the United States of America

These days, localized domains are also popular. A domain can represent a country with a two-letter extension standardized by the International Organization for Standardization as ISO 3166. A couple of country-specific codes are:

• in: India
• sg: Singapore
• bd: Bangladesh
• cu: Cuba
• dk: Denmark
• cn: China
• uk: United Kingdom

Shruti Speak's
