The Internet and the World Wide Web
Charles A. Cornell, Multi-Media Communications Internet Services
What is the Internet?
Before you can understand what the World Wide Web is, you have
to understand what the Internet is, since the World Wide Web is
a subset of the Internet.
The Internet is a world-wide network of networks with gateways
linking organizations around the world. The organizations are
administratively independent from one another. There is no central,
worldwide, technical control point. Yet, working together these
organizations have created what to a user seems to be a single
virtual network that spans the globe.
On one level, the Internet appears to be a single switched network
that allows any computer attached to it to establish communications
with any other computer attached to it. The analogy that I like
best is to compare the Internet with the worldwide telephone
system. The telephone system consists of many different networks
that have worked out how to cooperate with each other. This allows
you to place a call through your local telephone company and have it
handed off to a long distance company, then perhaps to a foreign
telephone agency, before reaching the party you called. You don't
have to worry about any of it because there are standards that
are adhered to by all the telephone networks participating in
the worldwide telephone system.
The physical circuits that make this possible are provided by
government, educational, and commercial organizations. More and
more, Internet data is carried over the same networks that carry
telephone data. Each of the independent networks is managed by
its own administration. There is no one organization called "The
Internet". Contrary to popular understanding, the Internet is not
free. Each of the network organizations bears the cost of its own
network. The networks all use a common suite of networking protocols,
TCP/IP, which stands for Transmission Control Protocol/Internet
Protocol. It is because of this commonality of protocols and
network functionality, and the resulting interoperability, that
the networks provide what appears to be a seamless, integrated
virtual network, regardless of the heterogeneity of the underlying
computer hardware or communications transport.
How Do I Connect to the Internet?
Up until a few years ago, only organizations such as universities and large
companies had direct connections to the Internet. Now it is increasingly common
for individuals to connect their personal computers to the Internet by signing
up with a commercial Internet access provider. These commercial access vendors
provide dial-in modems that a user reaches over a standard telephone
connection. The access provider then assigns the user's system an IP address
from a block assigned to the provider, thus connecting the system
to the Internet. Most commercial, government, and educational organizations
have dedicated line connections to their Internet access provider. Connection
speeds range from 28.8K or 56K BPS over dial-up modems to 1.5M BPS and higher
over dedicated lines.
All of the computers attached to the Internet use the TCP/IP protocol
to send messages between various application programs. You really
don't need to know what TCP/IP is to use it, any more than you
need to know what the telephone switching protocols are to make
a telephone call. The TCP part of TCP/IP defines how packets of
information are organized so that they can be safely transported
over the physical networks. If you want to ship something by truck
and the thing is bigger than a truck, you have to disassemble
it, and then provide instructions so that whoever you are sending
it to can put it back together correctly. That is what TCP does
to the data going over the Internet. It defines how it is to be
broken up into standard sized data packets, and how those packets
are to be put back together on the receiving end.
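The disassemble-and-reassemble idea above can be sketched in a few lines of Python. This is only a toy illustration, not how TCP is actually implemented: real TCP also handles checksums, acknowledgements, and retransmission. The function names and the 8-byte packet size are assumptions made up for this example.

```python
import random

def split_into_packets(data, size=8):
    """Break data into numbered packets of at most `size` bytes each."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]

def reassemble(packets):
    """Sort packets by sequence number and join the pieces back together."""
    return b"".join(chunk for _seq, chunk in sorted(packets))

message = b"If the thing is bigger than a truck, disassemble it."
packets = split_into_packets(message)
random.shuffle(packets)               # packets may arrive in any order
assert reassemble(packets) == message
```

The sequence number plays the role of the assembly instructions in the truck analogy: it tells the receiver where each piece belongs, whatever order the pieces arrive in.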
To be attached to the Internet, a system must have an IP address.
To go back to our telephone analogy, IP addresses are like the
main number for a company or organization. IP addresses are 32
bit numbers that are written as four numbers between 0 and 255
separated by periods. For example, 127.0.0.1 and 199.173.190.2 are
IP addresses. To be connected to the Internet, an organization must
apply for a registered IP address from a central agency called the
Internet Assigned Numbers Authority (IANA). This address is the most
significant part of the IP address of all machines in the organization's
network accessible from the Internet. The organization picks a human
understandable name, called the domain name, which is registered with
IANA so it can be associated with the IP address. The domain name ends
with a suffix which indicates the type of organization or, for
international organizations, its geographic location. The following
are some examples of domain names:
- mmcis.com - commercial organization
- mnu.edu - educational organization
- nasa.gov - federal government
- internic.net - network provider
- uniforum.org - organization
- vmark.co.uk - commercial organization in UK
Typically, the domain name is associated with the first 2 or 3
numbers in the IP address, which allows the organization to assign
individual system addresses in its domain with the last 2 or 1
numbers. So if a company is assigned the domain name abc.com, which
has associated with it a Class C IP address of 199.173.190, then
it can assign up to 254 systems their own IP addresses using the last
number (0 and 255 are reserved for the network itself and for
broadcasts). So valid IP addresses in that domain would be 199.173.190.1,
199.173.190.2, through 199.173.190.254.
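To make the addressing arithmetic concrete, here is a minimal Python sketch that converts between the dotted form and the underlying 32 bit number, and lists the host addresses available when the first three numbers are fixed. The function names are invented for this illustration, and the prefix 199.173.190 is simply borrowed from the example address given earlier. The sketch assumes the usual convention that the last numbers 0 and 255 are reserved (for the network itself and for broadcasts), which leaves 254 host addresses.

```python
def dotted_to_int(address):
    """Convert a dotted address such as '199.173.190.2' to a 32 bit integer."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("not a valid IP address: " + address)
    value = 0
    for part in parts:
        value = (value << 8) | part   # each number fills the next 8 bits
    return value

def int_to_dotted(value):
    """Convert a 32 bit integer back to dotted form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def class_c_hosts(prefix):
    """List the host addresses for a three-number prefix (last number 1-254)."""
    return [prefix + "." + str(n) for n in range(1, 255)]

hosts = class_c_hosts("199.173.190")
print(len(hosts), hosts[0], hosts[-1])
```

Note that every one of the four numbers must fit in 8 bits, so `dotted_to_int` rejects any address containing a number above 255.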
Domain Name Service
Just as the telephone system provides directory services so that
you can find out the telephone number of a person or company,
the Internet provides directory services that return the IP address
for machines in registered domains. Each Internet access provider
is responsible for maintaining Domain Name service information
for all the addresses assigned to it.
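As a rough illustration of this directory lookup, the following Python sketch asks the local resolver for the IP address that goes with a name, using the standard library's `socket.gethostbyname`. The wrapper function `resolve` is a name made up for this example, and a lookup of a real domain only works on a machine with a working Internet connection and name server.

```python
import socket

def resolve(hostname):
    """Return the IP address registered for hostname, or None if lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        # Raised when the name is unknown or no name server can be reached.
        return None

# Example (requires network access): resolve("www.whitehouse.gov")
# A name that is already an IP address is returned unchanged:
print(resolve("127.0.0.1"))
```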
What Can I Do On the Internet?
The last piece of the picture we need to talk about before we
can actually use the Internet is the idea of clients and servers.
The TCP/IP protocols make it possible to electronically connect
any two systems on the Internet, but for them to do something
useful with that connection, they must have cooperating applications
running on them. On the telephone system I can have successful
transactions between two people using telephones or two people
using a fax machine. But a person on a telephone talking to a
fax machine is usually not successful. So it is on the Internet.
A successful transaction on the Internet depends on the system
requesting information (the client) running a program that is compatible
with the one on the system delivering the information (the server).
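The client and server roles can be sketched with Python's standard `socket` module. This is a deliberately tiny illustration that runs both sides on one machine over the loopback address 127.0.0.1; the function names and the "You asked for" reply format are invented for the example and do not correspond to any real Internet service.

```python
import socket
import threading

def serve_once(listener):
    """Server side: accept one client, read its request, send back a reply."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"You asked for: " + request)

def ask(port, request):
    """Client side: connect to the server, send a request, return the reply."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(request)
        client.shutdown(socket.SHUT_WR)    # tell the server the request is done
        reply = b""
        while True:
            chunk = client.recv(1024)
            if not chunk:                  # empty read: server closed the link
                break
            reply += chunk
        return reply

def demo():
    """Start a one-shot server in a background thread and query it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))        # port 0: let the system pick a port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    server.start()
    try:
        return ask(port, b"weather report")
    finally:
        server.join(timeout=5)
        listener.close()

print(demo())
```

The exchange only works because both programs agree on the conversation: the client sends a request and then waits, and the server reads a request and then answers. That agreement is the compatibility the telephone and fax comparison is pointing at.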
Session Protocols
Between the low level packet and transport protocols (TCP/IP)
and the high level application protocols discussed below there
are a few standard session protocols that are supported by clients
and servers that are shipped with most implementations of TCP/IP
software. These include:
- ping - any system on the Internet can use the ping command
to test whether it can establish TCP/IP communications with any
other system. Most systems respond to a ping whenever TCP/IP is
active, although a system or an intervening firewall can be
configured to ignore pings.
- finger - a user can use the finger command to get information
about a user or users on another system. This information can
include the user's real name (as opposed to login name), and other
information that the user can provide in standard files in his
or her home account. A system administrator can decide whether
or not to run the finger daemon. If a system is not running the
finger daemon then other systems cannot obtain finger information
from it.
- telnet - the telnet command is used to log in to another system
as if the user were on a locally attached character terminal.
The system accessed by telnet must be running a telnet daemon
in order for telnet to work. The remote user must have a valid
login name and password to log in successfully. There are many
systems on the Internet which publish telnet information and allow
anyone on the net to log in to the system.
- ftp - the Internet file transfer protocol allows a user on
one system to transfer files (bidirectionally if permissions allow
it) between the local system and a remote system.
Internet Applications
While there are many ways to use the Internet today, and many
more to be discovered or invented in the future, the following
are the primary ways that people use the Internet today.
mail
You can use various editors and mail front end programs to send
an email message to anyone who has an account on any system that
is registered on the Internet. There are people on the Internet
who maintain mailing lists of users who are interested in a particular
subject so that items of interest to that group can be mailed
to everyone by using the list alias as the addressee on an email
message.
news
News is a method for collecting and disseminating messages (called
posts) that are grouped into a hierarchy of subject interests.
Each special interest news group must have a system that acts
as the collector of news items for that group. People post items
of interest to the news group by using a news reader to send their
message (via the Internet address of the collector) to the collector
system. The collector system periodically broadcasts the latest
items for that news group to systems that are designated as news
servers. Most Internet access providers provide news servers.
Internet subscribers specify what news groups they want to receive.
All the items in those news groups are transferred as files in
a hierarchy of directories that maps the special interest groups.
Individual users can then use one of the publicly available news
reader programs to select which of the locally available groups
they want to read. Most readers also allow a user to reply to
a post, thus creating a news thread which is a series of posts,
replies to posts, replies to replies, etc., on a particular subject
in the special interest group. News groups are named by a series
of hierarchical nodes separated by periods (.). A group that contains
discussion of Silicon Graphics hardware related stuff is comp.sys.sgi.hardware,
which is pronounced comp dot sys dot sgi dot hardware. News groups
range from fairly serious and technical ones like comp.os.ms-windows.programmer.win32
to the humorous like rec.humor.funny to the frivolous like alt.silly-group.radish-therapy.
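The naming scheme is easy to work with in code. The short Python sketch below splits a group name into the levels of its hierarchy, spells it out the way it is pronounced, and maps it to a directory path. The function names are invented for this example, and the one-directory-per-node layout is an assumption based on the description above rather than a statement about any particular news server.

```python
import posixpath

def hierarchy(group):
    """Return each level of the hierarchy, from broadest to most specific."""
    nodes = group.split(".")
    return [".".join(nodes[:i]) for i in range(1, len(nodes) + 1)]

def pronounce(group):
    """Spell out a group name the way it is read aloud."""
    return group.replace(".", " dot ")

def spool_path(group):
    """Map a group name to a directory path, one directory per node."""
    return posixpath.join(*group.split("."))

print(hierarchy("comp.sys.sgi.hardware"))
print(pronounce("comp.sys.sgi.hardware"))
print(spool_path("comp.sys.sgi.hardware"))
```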
electronic distribution
There are many systems on the Internet that maintain file servers
called ftp servers. These systems allow a user to use the ftp
command to login to the system as a user named anonymous. The
user can then download (or upload as permitted by the server system)
files from the server. These files may be textual material, computer
programs, or multimedia documents. An indexing protocol that uses
a client called gopher is often used to index files that are on
ftp servers. World Wide Web servers are rapidly replacing the
rudimentary ftp servers as the protocol used to access and index
these files. There are literally hundreds of thousands of files
that can be accessed this way. Some of the available servers are:
- ftp.microsoft.com - contains product announcements, bug fixes,
knowledge bases, and other Microsoft related information
- wx.atmos.uiuc.edu - University of Illinois at Urbana weather
information gopher server. Contains up to the minute National
Weather Service forecasts, and multimedia satellite maps.
- english-server.hss.cmu.edu - White House papers.
electronic publishing
In addition to many academic journals which are made available
on the Internet via gopher, ftp, or WWW servers, there is an ever
growing number of commercial publishing efforts getting started.
All of these new efforts use World Wide Web technology.
Finally, What is the Web?
The World Wide Web (from here on called simply the Web) is that part of the Internet using a session protocol called HTTP (HyperText Transfer Protocol).
The Web merges the techniques of networked information and hypertext to make an easy but powerful global information system. The project represents any information accessible over the network as part of a seamless hypertext information space.
The Web was originally developed at CERN (the European Laboratory for Particle Physics in Geneva) to allow information sharing within internationally dispersed teams, and the dissemination of information by support groups. Originally aimed at the High Energy Physics community, it has become the "killer app" of the Internet, spurring explosive growth. It is currently the most advanced information system deployed on the Internet, and embraces within its data model most information in previous networked information systems. In fact, the Web is an architecture designed to accommodate future advances in technology as well, including new networks, protocols, object types and data formats.
Client (information reader or browser) and server (information supplier)
applications exist for many platforms and are under continual development.
The Information Reader's View
The Web consists of documents and indexes. Indexes are special documents which,
rather than being read, may be searched. The result of such a search is another
("virtual") document containing links to the documents found. A simple protocol
(HyperText Transfer Protocol or HTTP) is used to allow a browser program to request
a keyword search by a remote information server. There are many vendors providing
Web browsers. The most popular are Netscape Navigator by Netscape Communications
and Internet Explorer by Microsoft.
The Web contains documents in many formats. The most effective are hypermedia documents, which contain links to other documents or to places within documents. All documents, whether real, virtual or indexes, look similar to the reader and are contained within the same addressing scheme. Documents can contain graphic images, sounds, and full motion video segments, as well as text.
To follow a link, a reader clicks with a mouse. To search an index, a reader gives keywords (or other search criteria). These are the only operations necessary to access the entire world of data. In addition to the display of hypermedia data, the Web clients support on-line forms which can transmit data from the client to a server, making the Web an interactive medium.
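One common way for a browser to hand keywords or form data to a server is to encode them into the query part of the request URL. The Python sketch below shows both ends of that exchange using the standard `urllib.parse` module. The index address http://www.mmcis.com/search and the field name `keywords` are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode, parse_qs

def build_search_url(index_url, keywords):
    """Client side: attach the reader's keywords to the index URL."""
    return index_url + "?" + urlencode({"keywords": " ".join(keywords)})

def read_search_terms(query_string):
    """Server side: recover the keywords from the query string."""
    return parse_qs(query_string)["keywords"][0].split()

# Hypothetical index location and field name.
url = build_search_url("http://www.mmcis.com/search", ["weather", "satellite"])
print(url)
print(read_search_terms(url.split("?", 1)[1]))
```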
The Information Supplier's View
To provide information to the Web, you must create the hypermedia documents and any supporting indexes, and then install Web server software on a machine connected to the Internet. More sophisticated Web servers provide database support for both serving information and collecting information from forms filled in by clients. The Web model gets over the frustrating incompatibilities of data format between suppliers and readers by allowing negotiation of format between a smart browser and a smart server. This provides a basis for extension into multimedia, and allows those who share application standards to make full use of them across the Web.
Hypermedia documents on the Web are formatted using a special markup language called HTML (HyperText Markup Language). An HTML document is a simple ASCII file that contains the plain text information along with special markup "tags" that tell the browser how to format the text and where to put graphics, audio, video, or other media information. There are hundreds of tools for producing HTML documents from existing documents in word processing formats, as well as for authoring new documents directly.
The most important markup tag in terms of the power of hypermedia documents is the link, which associates another document with part of the text or graphics in a document. An HTML link in a document can point to a document anywhere on the Web.
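To show what such a document looks like, here is a small Python sketch containing a sample HTML page and a few lines that pull out its links with the standard library's `html.parser` module. The page itself is invented for this illustration; its two link targets reuse server names mentioned elsewhere in this article. The `<a href="...">` tag is the link tag described above.

```python
from html.parser import HTMLParser

# A made-up page: plain text plus markup tags, including two links.
PAGE = """<html>
<head><title>Sample page</title></head>
<body>
<h1>Sample page</h1>
<p>Visit the <a href="http://www.whitehouse.gov">White House</a> or
download a <a href="ftp://ftp.microsoft.com/">file</a>.</p>
</body>
</html>"""

class LinkCollector(HTMLParser):
    """Collect the target of every <a href="..."> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html_text):
    """Return the link targets found in html_text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links

print(extract_links(PAGE))
```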
As with Web browsers, there are many vendors offering Web server software.
The most popular are the freeware Apache Server, which is used by many ISPs,
Internet Information Server (IIS) from Microsoft, Netscape's Enterprise Server,
and O'Reilly's WebSite for NT.
How Do I Point To Information On the Web?
Information on the Web is addressed using a Uniform Resource Locator, or URL. A URL contains a protocol specification, the name of the system containing the information, and the pathname of the file on that server system. The URL http://www.mmcis.com/HomePage.html says to use the HTTP protocol to access a file named HomePage.html on the system named www.mmcis.com. The URL ftp://ftp.microsoft.com/deskapps/word/winword-public/ia/wordia.exe says to use the FTP protocol to transfer the file wordia.exe from the directory path /deskapps/word/winword-public/ia/ on the system ftp.microsoft.com.
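The three parts of a URL can be separated mechanically. As a small sketch, the Python below uses the standard `urllib.parse.urlsplit` function to take apart the two example URLs from the paragraph above. The helper name `describe` and the dictionary keys are made up for this illustration.

```python
import posixpath
from urllib.parse import urlsplit

def describe(url):
    """Split a URL into protocol, system name, path, and file name."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "system": parts.hostname,
        "path": parts.path,
        "file": posixpath.basename(parts.path),
    }

for example in (
    "http://www.mmcis.com/HomePage.html",
    "ftp://ftp.microsoft.com/deskapps/word/winword-public/ia/wordia.exe",
):
    print(describe(example))
```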
For the most part, readers never even see a URL. They simply click on specially marked text or graphics in an HTML document and the browser generates the URL that is linked to that specially marked text or graphic. The convention for servers on the Web is to name the main Web server system www.domain.name, and to provide a default menu so that a simple URL like http://www.mmcis.com or http://www.whitehouse.gov will give a reader access to the top of the document tree stored on that organization's Web servers.
Most Web browsers also ship with built-in links to several different indexing servers on the Internet so that even the most inexperienced net surfer can plunge into the Web without a steep learning curve.
What Next?
If you have read this far, you obviously find this Web thing fascinating. The
next thing you need to do is to get yourself an Internet connection and start
exploring the Web. To find out how World Wide Web technology can be a strategic
tool for any organization that has to publish information to a large number of
people, or that wants to provide an interactive medium with a dispersed group
of people, contact Multi-Media Communications. Email
us at cac@mmcis.com, leave us an action
request on our Web site, or even call us at (508) 653-3392 and ask for Charlie
Cornell, and we will be glad to show you how to use the Web to your advantage.
Charles Cornell (cac@mmcis.com), Multi-Media Communications Internet
Services




