Accessing the Web
It was difficult to decide where to put this topic. You can't have access to the Web without
networking; however, it loses much of its impact unless you are using a graphical interface like
X. Because the Web is a network of machines accessed in a
common manner, I figured the networking chapter would be the best place to talk about it. I think
this is a good choice, since there are character-based programs that do not require X.
So what is the Web? Well, as I just mentioned, it is a network
of machines. Not all machines on the Internet are part of the Web, but we can safely say that all
machines on the Web are part of the Internet. The Web is the shortened version of World Wide Web,
and, as its name implies, it connects machines all over the world.
Created in 1989 at the internationally renowned CERN research lab in Switzerland, the Web was
begun as a means of linking physicists from all over the world. Because it is easy to
use and integrate into an existing network, the Web has grown to a community
of tens of thousands of sites with millions of users accessing it. With the integration of Web
access software, on-line services have opened the Web up to millions of people who couldn't have
used it before.
What the Web really is, is a vast network
of inter-linked documents, or resources. These resources may be pure text,
but can also include images, sound, and even video. The links between resources are made by means
of hypertext. Now, hypertext is not something new. It has been used for
years in on-line help systems, such as those in MS-Windows programs. Certain words or
phrases are presented in a different format (often a different color or underlined). These
words or phrases are linked to other resources. When you click on one, the linked resource
is called up. This resource could be the next page, a graphics image, or even a video.
Resources are loaded from their source by means of the Hypertext Transfer Protocol (HTTP).
In principle, this is very much like ftp,
in that resources are files that are transferred to the requesting site. It is then up to the
requesting application to make use of that resource, such as displaying an image
or playing an animation. In many cases, files are actually retrieved using ftp instead of HTTP and
the application simply saves the file on the local machine.
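To make the comparison with ftp a little more concrete, here is a minimal sketch in Python of the
text a client sends when it asks for a resource over HTTP, and of how it might read the first line
of the reply. The function names are made up for illustration, and the request uses the simple
HTTP/1.0 form; real browsers send considerably more headers.

```python
def build_get_request(host, path="/"):
    """Return the text of a minimal HTTP/1.0 GET request (illustrative helper)."""
    lines = [
        f"GET {path} HTTP/1.0",  # request line: method, resource, protocol version
        f"Host: {host}",         # the server the request is intended for
        "",                      # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.0 200 OK' into its three parts."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason
```

The server answers with a status line, some headers, and then the resource itself, which is the
part the requesting application displays, plays, or saves.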
The program that is used to access the Web is called a Web browser. Web resources are provided by Web
servers. A Web server is simply a machine running the HTTP daemon, httpd.
Like other network daemons, httpd receives requests from a Web
client (such as Mozilla or Konqueror) and sends it the requested resource.
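The request-and-response exchange between a Web client and a Web server can be tried out on a single
machine. The following is only a sketch using Python's standard library, not a stand-in for a real
httpd: the handler class and the fetch_once function are names invented for this example, and the
server listens on the local loopback address on whatever free port the system assigns.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a small plain-text resource."""

    def do_GET(self):
        body = f"You asked for {self.path}\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # stay quiet; the default implementation logs each request to stderr


def fetch_once(path="/index.html"):
    """Start a throwaway server, request one resource from it, then stop it."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS pick
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", path)          # the client sends its request
        response = connection.getresponse()      # the server sends the resource
        result = response.status, response.read().decode("utf-8")
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once())
```

Here the "browser" is just a few lines that ask for a path and read the reply; what a real browser
adds is the work of displaying what comes back.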
Like the ftp
daemon, httpd is a relatively secure means of allowing anonymous
access to your system. You can define a root
directory which, as with ftp, prevents users from going "above" it. Access to
files or directories can be defined on a per-machine basis, and you can even provide password control over
them.
When httpd starts, it reads its configuration files and begins listening for requests from a
document viewer (one that uses the HTTP protocol). When a document is
requested, httpd checks for the file relative to the DocumentRoot (defined in srm.conf).
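The idea of looking up a file relative to the DocumentRoot, while refusing anything that would climb
"above" it, can be sketched as follows. This is an illustration only and not how httpd itself is
implemented: the function name and the /var/www/html directory are assumptions made for the example,
and a real server also has to deal with things like symbolic links.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def resolve_request(document_root, request_path):
    """Map a requested URL path to a file name under document_root.

    Returns None when the request would escape above the root directory.
    """
    path = unquote(urlsplit(request_path).path)  # drop any ?query, decode %xx
    parts = []
    for part in path.split("/"):
        if part in ("", "."):
            continue              # empty and "." components change nothing
        if part == "..":
            if not parts:
                return None       # attempt to go above the root: refuse
            parts.pop()
        else:
            parts.append(part)
    return posixpath.join(document_root, *parts)


if __name__ == "__main__":
    root = "/var/www/html"  # hypothetical DocumentRoot
    print(resolve_request(root, "/docs/index.html"))
    print(resolve_request(root, "/../etc/passwd"))
```

The second request returns None instead of a file name, because following it would leave the
directory tree that the server is supposed to expose.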
Web pages are written in the Hypertext Markup Language (HTML). An HTML page is a plain-text file that
can be edited with any editor, such as vi. As a result of the increasing popularity of the Web,
dozens, if not hundreds, of commercial HTML editors have also become available.
The HTML commands are similar to, and also simpler than, those used by troff. In addition to formatting
commands, there are built-in commands that tell the Web browser to go out and retrieve a document.
You can also create links to specific locations (labels) within that document. Access to the
document is by means of a Uniform Resource Locator (URL).
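To give a feel for what these commands look like, here is a small, made-up page held in a Python
string, together with a short sketch that picks out its links and labels using the standard
html.parser module. The page content, the file name intro.html, and the class and function names
are all invented for the example; the label uses the older "a name" form of anchor.

```python
from html.parser import HTMLParser

SAMPLE_PAGE = """\
<html>
<head><title>A sample page</title></head>
<body>
<h1>Welcome</h1>
<p>Read the <a href="http://www.domain.name/intro.html">introduction</a>
or jump straight to the <a href="#contact">contact details</a>.</p>
<h2><a name="contact">Contact</a></h2>
<p>Send mail to the webmaster.</p>
</body>
</html>
"""


class LinkCollector(HTMLParser):
    """Collect link targets (href) and labels (name) from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "href" in attributes:
            self.links.append(attributes["href"])   # a document to retrieve
        if "name" in attributes:
            self.labels.append(attributes["name"])  # a location within this page


def collect(page):
    """Return (links, labels) found in the given HTML text."""
    collector = LinkCollector()
    collector.feed(page)
    return collector.links, collector.labels


if __name__ == "__main__":
    print(collect(SAMPLE_PAGE))
```

The first link points to another document by its URL, while the second, beginning with "#", points
to the label "contact" further down the same page.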
There are several types of URLs that perform different functions. Several different programs can
be used to access these resources, such as ftp,
HTTP, gopher, or even telnet.
If you leave off the program name, the Web browser assumes that the URL refers to a file on your local
system. However, just as with ftp or telnet, you can
explicitly refer to the local machine.
I encourage using absolute names like that, as it makes transferring Web pages that much easier.
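The parts of a URL are easy to see if you take a few apart. The following sketch uses Python's
urllib.parse module; the describe function is a name invented here, and the example URLs (other
than the placeholder www.domain.name and ftp.netscape.com) are hypothetical.

```python
from urllib.parse import urlsplit


def describe(url):
    """Return (program, machine, path) for a URL; missing parts are '' or None."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


if __name__ == "__main__":
    examples = [
        "http://www.domain.name/index.html",
        "ftp://ftp.netscape.com/",
        "telnet://host.domain.name",          # hypothetical machine name
        "file:///home/user/index.html",       # hypothetical local file
        "index.html",                         # no program name at all
    ]
    for url in examples:
        print(url, "->", describe(url))
```

The last example has no program name and no machine name, only a path, which is the case where it
is left to the browser to decide what the name is relative to.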
All that you need to access the Web is an Internet connection. If you can use ftp
and telnet, then you can probably use the Web. So, assuming you have a Web browser and an Internet
connection, the question is: where do you go? The question is comparable to "Given a plane ticket
of unlimited value, where do you go on vacation?" The sky is the limit.
As I mentioned, the convention is that the Web server's machine name is www.domain.name.
To access its home page, the URL would be http://www.domain.name. For
example, to get to your home page, the URL is http://www.yourdomain.com. To keep
from typing so much, I will simply refer to the domain.name and you can expand
it the rest of the way. In some cases, where the convention is not followed, I'll give you the
complete URL.
I remember when comet Shoemaker-Levy 9 was making history by plowing into the backside of
Jupiter. The Jet Propulsion Laboratory has a Web site on which it regularly updated the
images of Jupiter. I still remember my friends asking me if I had seen the latest images.
If they were more than three hours old, I would shrug them off as ancient history.
If you are interested in free software (did I say the magic word?), check out
en.downloadastro.com. You can download gigabytes worth of games and utilities and GIFs and
ray-traces and source code and full-text copies of Alice in Wonderland. Most of these are available
from sites spread out over the Internet. It's really nice to have them all in one place.
The issue of Usenet newsgroups opens up a whole can of worms. Without oversimplifying
too much, we could say that Usenet was the first nationwide on-line bulletin board. Whereas the
more commercial services like CompuServe store their messages in a central location, Usenet is based
on the "store and forward" principle. That is, messages are stored on one machine and forwarded to the
next at regular intervals. If those intervals are long, it may be hours or even days
before messages are propagated to every site.
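A little arithmetic shows how those delays add up. The sketch below is a deliberately simplified
model, not a description of any real news software: it assumes the sites form a single chain, that
each site forwards its stored messages at hours 0, interval, 2 x interval, and so on, and that the
transfer itself takes no time. The function name and the sample intervals are made up.

```python
import math


def propagate(posted_at, forward_intervals):
    """Return the hour at which each site in a chain has the message.

    posted_at          hour at which the message reaches the first site
    forward_intervals  for each site in turn, how often (in hours) it forwards
                       its stored messages to the next site in the chain
    """
    arrivals = [posted_at]
    current = posted_at
    for interval in forward_intervals:
        # The message sits in storage until this site's next scheduled transfer.
        current = math.ceil(current / interval) * interval
        arrivals.append(current)
    return arrivals


if __name__ == "__main__":
    # Three hops that forward every 6, 24, and 12 hours respectively.
    print(propagate(1, [6, 24, 12]))  # prints [1, 6, 24, 24]
```

A message posted at hour 1 does not reach the last site until hour 24 in this model, and almost all
of that time is spent waiting for the one site that forwards only once a day.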
Messages are organized into a hierarchical tree structure, very much like many things in
UNIX (although you don't have to be running a UNIX machine to access
Usenet). Groups range from things like rec.arts.startrek.fandom to alt.sex.bondage to
Although I would love to go into more detail, this really goes beyond the scope of this book.
Instead, I would like to recommend Using UUCP and Usenet by Grace Todino
and Dale Dougherty, and Managing UUCP and Usenet by Tim O'Reilly and Grace Todino, both from
O'Reilly and Associates. In addition, there is a relatively new book that goes into more detail
about how Usenet is organized, what newsgroups are available, and some general information about
behavior and interaction with others when participating in Usenet. This is Usenet
Netnews for Everyone by Jenny Fristrup, from Prentice Hall.
Note that the browser that is provided on the CD does not support many of the advanced features of
some of the commercial ones, such as Netscape. Because there is a Netscape version that runs on
Linux, I would suggest that you get it before you get too involved in the Web. You can get
Netscape Navigator via anonymous FTP from ftp.netscape.com.