Problem-solving starts before you have even installed your system. Because a
detailed knowledge of your system is important to figuring out what's causing
problems, you need to keep track of your system from the very beginning. One of
the most effective problem-solving tools costs about $2 and can be found in grocery
stores, gas stations, and office supply stores. Interestingly enough, I can't
remember ever seeing it in a store that specialized in either computer hardware
or software. I am talking about a notebook. Although a bound notebook will do
the job, I find a loose-leaf notebook to be more effective because I can add
pages more easily as my system develops.
In the notebook I
include all the configuration information from my system, the make and model of
all my hardware, and every change that I make to my system. This is a running
record of my system, so the information should include the date and time of the
entry, as well as the person making the entry. Every change I make, from
adding new software to changing kernel parameters, should
be recorded in my log book.
In putting together your notebook, don't be terse with comments like, "Added SCSI
patch and relinked." This should be detailed, like, "Added patch for Adaptec AIC-7xxx.
Rebuild and reboot successful." Although it seems like busy work, I also believe
things like adding users and making backups should be logged. If messages
appear on your system, these, too, should be recorded with details of the
circumstance. The installation guide should contain an "installation checklist."
I recommend that you complete this before you install and keep a copy of this in
the log book.
Something else that's very important to include in the notebook
is a record of the problems you have encountered and the steps that were necessary to correct
them. One support engineer with whom I worked told me he calls this his
As you assemble your system, write down everything you can about the hardware
components. If you have access to the invoice, a copy of this can be useful for
keeping track of the components. If you have any control over it, have your
reseller include details about the make and model of all the components. I have
seen enough cases in which the invoice or delivery slip contains generic terms
like Intel 800MHz CPU, cartridge tape drive, and 5GB hard
disk. Often this doesn't even tell you whether the hard disk is
Next, write down all the settings of all the
cards and other hardware in your machine. The jumpers or switches on hardware
are almost universally labeled. This may be something as simple as J3 but as
detailed as IRQ. Linux installs at the
defaults on a wide range of cards, and generally there are few conflicts unless
you have multiple cards of the same type. However, the world is not perfect and
you may have a combination of hardware that neither I nor the Linux developers have
ever seen. Therefore, knowing what all the settings are can become an
One suggestion is to write this information on gummed
labels or cards that you can attach to the machine. This way you have the
information right in front of you every time you work on the machine.
Many companies have a "fax back" service
in which you can call a number and have them fax you documentation of their
products. For most hardware, this is rarely more than a page or two. For
something like the settings on a hard disk, however, this is enough. Requesting
faxed documentation has a couple of benefits. First, you have the phone number
for the manufacturer of each of your hardware components. The time to go hunting
for it is not when your system has crashed. Next, you have (fairly) complete
documentation of your hardware. Last, by collecting the information on your
hardware, you know what you have. I can't count the number of times I have talked
with customers who don't even know what kind of hard disk they have, let alone
what the settings are.
Another great place to get technical information is
the World Wide Web. I recently bought a SCSI
hard disk that did not have any documentation. A couple
of years ago, that might have bothered me. However, when I got home, I quickly
connected to the Web site of the drive manufacturer and got the full drive
specs, as well as a diagram of where the jumpers are. If you are not sure of the
company's name, take a guess, as I did. I tried www.conner.com, and it
worked the first time.
When it comes time to install the
operating system, the first step is to read the release notes and installation
HOWTO and any
documentation that comes with your distribution. I am not suggesting reading
them cover to cover, but look through the table of contents completely to
ensure that there is no mention of potential conflicts with your
host adapter or the particular way your video card needs to
be configured. The extra hour you spend doing that will save you several hours
later, when you can't figure out why your system doesn't reboot when you finish
As you are actually doing the installation,
the process of documenting your system continues. Depending on what type of
installation you choose, you may or may not have the opportunity to see many of
the programs in action. If you choose an automatic installation, many of the
programs run without your interaction, so you never have a chance to see and
therefore document the information.
The information you need to document is
the same kind of thing I talked about in the section on finding out how your
system was configured. It includes the hard disk geometry and partitions
(fdisk), file systems (mount and
/etc/fstab), the hardware settings
(/var/log/messages), and every patch you have ever
installed. You can send the
output of all of these commands to a file that you can print out and stick in
the notebook.
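
As a rough sketch of how that record could be collected, the script below gathers the sources mentioned above into one dated file. The file name and location are my own assumptions, fdisk usually needs root to list partitions, and each step falls back to a note if the command or file is not available.

```shell
#!/bin/sh
# Hypothetical output location and name; adjust to taste.
report="${TMPDIR:-/tmp}/sysconfig.$(date +%d%b%y).txt"

{
    echo "=== System record: $(date) ==="
    echo "=== Partitions (fdisk -l; usually needs root) ==="
    fdisk -l 2>/dev/null || echo "(fdisk unavailable or not run as root)"
    echo "=== Mounted file systems (mount) ==="
    mount 2>/dev/null || echo "(mount unavailable)"
    echo "=== /etc/fstab ==="
    cat /etc/fstab 2>/dev/null || echo "(no readable /etc/fstab)"
    echo "=== Last lines of /var/log/messages ==="
    tail -n 50 /var/log/messages 2>/dev/null || echo "(no readable /var/log/messages)"
} > "$report"

echo "Wrote $report"
```

Print the resulting file and put it in the notebook next to the handwritten entries.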
No matter how many times I have said it, and no matter how many articles (both mine and
others') it has appeared in, some people just don't want to listen. They
often treat their computer systems like a new toy at Christmas. They first want
to get everything installed that is visible to the outside world, such as
terminals and printers. In this age of "Net-in-a-box," often that extends to
getting their system on the Internet as soon as possible.
Although being able to download the synopsis of the next Deep Space Nine
episode is an honorable goal for some, Chief O'Brien is not going to come to
your rescue when your system crashes. (I think even he would have trouble with
the antiquated computer systems of today.)
Once you have finished installing the operating system,
the very first device you need
to install and configure correctly is your tape drive. If you don't have a tape
drive, buy one! Stop reading right now and go out and buy one. It has been
estimated that a "down" computer system costs a company, on the average, $5,000
an hour. You can certainly convince your boss that a tape drive that costs
one-tenth as much is a good investment.
One of the first crash calls I
received while I was in tech support was from the system administrator
at a major airline. After
about 20 minutes, it became clear that the situation was hopeless. I had
discussed the issue with one of the more senior engineers, who determined that
the best course of action was to reinstall the OS and restore the data from
backups.
I can still remember their system administrator
saying, "What backups? There are no backups."
"Why not?" I asked.
"We don't have a tape drive."
"My boss said it was too expensive."
At that point the only solution was a data recovery service.
"You don't understand," he said. "There is more than $1,000,000 worth of
flight information on that machine."
"Not any more."
What is that lost data worth to you? Even
before I started writing my first book, I bought a tape drive for my home
machine. For me, it's not really a question of data but rather, time. I don't
have that much data on my system. Most of it can fit on a half-dozen floppies.
This includes all the configuration files that I have changed since my system
was installed. However, if my system were to crash, the
time I save restoring everything from tape compared to reinstalling
from floppies is worth the money I spent.
As technology progressed, CD writers became cheaper than tape drives. Currently,
I make backups of the most important data onto CD-ROMs, so I can get to it
quickly, but I use my tape drive to back up all of the data and system
files, as they won't fit on a floppy.
The first thing to do once the tape drive is installed is to test it. The fact that it
appears at boot says
nothing about its functionality. It happens often enough that the drive appears to work
fine, all the commands behave correctly, and it even looks as though it is
writing to the tape. However, it is not until the system goes down and the data
is needed that you realize you cannot read the tape.
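
One way to avoid that surprise is to read every test backup back in and compare it with the original. The following is only a sketch under my own assumptions: it uses a scratch directory in place of a real directory and a regular file in place of the tape device, so substitute your own paths and device when you try it on a real drive.

```shell
#!/bin/sh
# Stand-ins (assumptions): a scratch directory instead of a real system
# directory, and a regular file instead of the tape device.
work=$(mktemp -d)
mkdir -p "$work/src/etc" "$work/restore"
echo "127.0.0.1 localhost" > "$work/src/etc/hosts"
echo "/dev/hda1 / ext2 defaults 1 1" > "$work/src/etc/fstab"
archive="$work/test.tar"

# 1. Write the backup.
tar -cf "$archive" -C "$work/src" etc

# 2. Read it back: list the contents, then restore to a scratch area.
tar -tf "$archive"
tar -xf "$archive" -C "$work/restore"

# 3. Compare the restored files with the originals.
if diff -r "$work/src/etc" "$work/restore/etc" >/dev/null; then
    verify_status=ok
else
    verify_status=FAILED
fi
echo "verify: $verify_status"
```

The important part is step 3. A backup that can be written but not restored and compared is not a backup you can rely on.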
I suggest first trying the tape drive by backing up a small
subdirectory, such as /etc. There
are enough files to give the tape drive a quick workout, but you don't have to
wait for hours for it to finish. Once you have verified that the basic utilities
work (like tar or cpio), then try backing up the entire system. If you don't
have some third-party back-up software, I recommend that you use cpio.
Although tar can back up most of your system, it cannot back up
If the Linux commands are too cumbersome (and they are for many newcomers), a couple of
commercial products are available. One such product is Lone-Tar from Cactus
International. I have used Lone-Tar for years on a few systems and have found it
very easy to use. The front end is mostly shell scripts that you can modify to
fit your needs.
In general, Lone-Tar takes a differential approach to
making backups. You create one Master Backup and all subsequent backups contain
those files that have changed since the master was created. I find this the best
approach if your master backup takes more than one tape. However, if it all fits
on one tape, you can configure Lone-Tar always to do masters.
Cactus also produces several other products for Linux, including Kermit, and
some excellent DOS
tools. I suggest you check them out. Demo versions are available from the
Cactus Web site.
Like religion, it's a matter of personal preference. I use Lone-Tar for
Linux along with their DOS
Tar product because I have a good relationship with the company
president, Jeff Hyman. Lone-Tar makes
backups easy to make and easy to restore. There is even a Linux demo on the
Lone-Tar Web site. The Craftworks distribution has a
demo version of the BRU backup software.
After you are sure that
the tape drive works correctly, you should create a boot/root floppy. A
boot/root floppy is a pair of floppies that you use to
boot your system. The first floppy contains the necessary
files to boot and the root floppy contains the root file system.
Now that you are sure that your tape drive
and your boot/root floppy set work, you can begin to install the rest of your
software and hardware. My preference is to completely install the rest of the
software first, before moving on to the hardware. There is less to go wrong with
the software (at least, little that keeps the system from booting) and you can,
therefore, install several products in succession. When installing hardware, you
should install and test each component before you go on to the next one.
I think it is a
good idea to make a copy of your kernel
source (/usr/src/linux) before you make any changes to your
hardware configuration or add any patches. That way, you can quickly restore the
entire directory and don't have to worry about restoring from tape or the
I suggest that
you use a name that is clearer than /usr/src.BAK. Six months after you create
it, you'll have no idea how old it is or whether the contents are still valid.
If you name it something like /usr/src.06AUG95, it is obvious when it was created.
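
A dated copy like this is easy to script. The sketch below assumes a scratch tree standing in for the real /usr/src and builds the date suffix in the same style as the example name. The paths and the sample file are hypothetical.

```shell
#!/bin/sh
# Stand-in (assumption): a scratch tree instead of the real /usr/src.
base=$(mktemp -d)
mkdir -p "$base/src/linux"
echo "CONFIG_SCSI=y" > "$base/src/linux/.config"

# Date stamp in the style of the example name, for instance 06AUG95.
stamp=$(date +%d%b%y | tr '[:lower:]' '[:upper:]')
backup="$base/src.$stamp"

# -R copies recursively, -p preserves modes and timestamps.
cp -Rp "$base/src" "$backup"
echo "Copied to $backup"
```

On a real system you would need enough free disk space for a second copy of the source tree, and root permissions to write under /usr.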
Now, make the changes and test the new
kernel. After you are sure that the new kernel
works correctly, make a new copy of the kernel source
and make more changes. Although this is a slow process, it does limit the
potential for problems. If you do run into problems, you can easily back
out by restoring the backup copy of the kernel source.
As you make the changes, remember to record all the hardware and software
settings for anything you install. Although you can quickly restore the
previous copy of the kernel source if
something goes wrong, writing down the changes can be helpful if you need to
call tech support or post a message to the Internet.
Once the system is configured the way you want, make a backup
of the entire installed system on a different tape from the one that holds just the base
operating system. I like to
have the base operating system
on a separate tape in case I want to make some major revisions to my
software and hardware configuration. That way, if something major goes wrong, I
don't have to pull out pieces, hoping that I didn't forget something. I have a
known starting point from which I can build.
At this point, you should come up with a back-up schedule. One of the first
things to consider is that you should back up as often as necessary. If you can
only afford to lose one day's worth of work, then backing up every night is fine.
Some people back up once during lunch and once at the end of the day. More often
than twice a day may be too great a load on the system. If you feel that you
have to do it more often, you might want to consider
or some other level of RAID.
The latest kernel
versions support RAID 0 (disk striping), which, although it
provides an improvement in performance, has no redundancy. Currently, I am not
aware of any software RAID solutions, though some hardware solutions might work
The type of backup you
do depends on several factors. If it takes 10 tapes to do a backup, then doing a
full backup of the system (that is, backing up everything) every night
is difficult to swallow. You might consider getting a larger tape drive. In a
case where a full backup every night is not possible, you have a few
First, you can make a list of the directories that
change, such as /home and
/etc. You can then use tar just to
back up those directories. This has the disadvantage that you must manually find
the directories that change, and you might miss something or back up too much.
Next, there are incremental
backups. These start with a master, which is a backup of the entire system. The
next backup only records the things that have changed since the last
incremental. This can be expanded to several levels. Each level backs up
everything that has changed since the last backup of that or the next lower level.
For example, level 2 backs up everything since the last level 1 or the last level
0 (whichever is more recent). You might do a level 0 backup once a month (which
is a full backup of everything), then a level 1 backup every Wednesday
and Friday and a level 2 backup every other day of the week. Therefore, on
Monday, the level 2 will back up everything that has changed since the level 1
backup on Friday. The level 2 backup on Tuesday will back up everything since
the level 2 backup on Monday. Then on Wednesday, the level 1 backup backs up
everything since the level 1 backup on the previous Friday.
At the end of the month, you do a level 0 backup that backs
up everything. Let's assume this is on a Tuesday. This would normally be a level
2. The level 1 backup on Wednesday backs up everything since the level 0 backup
(the day before) and not since the level 1 backup on the previous Friday.
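
The rule for choosing what a given level compares against can be sketched in a few lines of shell. This is only an illustration under my own assumptions: each finished backup is assumed to leave behind a time-stamp file named stamp.<level>, which is a hypothetical layout and not something tar or cpio does for you. The dates are arbitrary and mirror the schedule described above.

```shell
#!/bin/sh
# Assumption: each finished backup leaves a time-stamp file named
# stamp.<level> in $stampdir (a hypothetical layout, not a standard one).
stampdir=$(mktemp -d)

# Print the stamp file of the most recent backup whose level is
# less than or equal to the requested level.
reference_stamp() {
    level=$1
    newest=""
    n=0
    while [ "$n" -le "$level" ]; do
        f="$stampdir/stamp.$n"
        if [ -e "$f" ]; then
            if [ -z "$newest" ] || [ -n "$(find "$f" -newer "$newest")" ]; then
                newest=$f
            fi
        fi
        n=$((n + 1))
    done
    echo "$newest"
}

# Simulated history: level 0 at the start of the month, level 1 on a
# Friday, level 2 on the following Monday.
touch -t 200001010200 "$stampdir/stamp.0"
touch -t 200001070200 "$stampdir/stamp.1"
touch -t 200001100200 "$stampdir/stamp.2"

ref_tuesday=$(basename "$(reference_stamp 2)")     # Tuesday's level 2
ref_wednesday=$(basename "$(reference_stamp 1)")   # Wednesday's level 1

# A new level 0 at the end of the month becomes the reference for
# the next level 1.
touch -t 200002010200 "$stampdir/stamp.0"
ref_after_master=$(basename "$(reference_stamp 1)")

echo "Tuesday level 2 compares against:    $ref_tuesday"
echo "Wednesday level 1 compares against:  $ref_wednesday"
echo "Level 1 after a new level 0 against: $ref_after_master"
```

Once the reference stamp is known, the files to back up are simply those modified more recently than that stamp.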
A somewhat simpler scheme uses differential backups.
Here, there is also a master. However, subsequent backups will record
everything that has changed (is different) from the master. If you do
a master once a week and differentials once a day, then something that is
changed on the day after the master is recorded on every subsequent backup.
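
A differential backup can be approximated with find and tar. The sketch below makes several assumptions of mine: a scratch directory stands in for the real data, regular files stand in for tapes, and a marker file called master.stamp (a hypothetical name) records when the master was made. File times are set by hand so that the example behaves the same every time it is run.

```shell
#!/bin/sh
# Stand-ins (assumptions): a scratch tree for the data and regular files
# for the archives; master.stamp is a hypothetical marker file.
work=$(mktemp -d)
mkdir -p "$work/data"
echo "old" > "$work/data/unchanged.txt"
echo "old" > "$work/data/report.txt"
# Pretend these files were last modified before the master was made.
touch -t 200001010000 "$work/data/unchanged.txt" "$work/data/report.txt"

# Master backup: archive everything and record when it was taken.
tar -cf "$work/master.tar" -C "$work" data
touch -t 200001020000 "$work/master.stamp"

# Later, one file changes (its modification time becomes "now").
echo "new" > "$work/data/report.txt"

# Differential: only files modified since the master.
(cd "$work" && find data -type f -newer master.stamp) > "$work/changed.list"
tar -cf "$work/diff.tar" -C "$work" -T "$work/changed.list"
diff_contents=$(tar -tf "$work/diff.tar")
echo "$diff_contents"
```

Because master.stamp is not touched again until the next master, every later differential picks up report.txt again, which is exactly the behavior described above.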
A modified version of the differential backup does a complete, level 0 backup
on Friday. Then on each of the other days, a level 1 backup is done. Therefore,
the backup Monday-Thursday will back up everything since the day before. This is
easier to maintain, but you may have to go through five tapes.
The third type, the simplest method, is where you do a master backup every day
and forget about increments and differences. This is the method I prefer if the
whole system fits on one tape because you save time when you have to restore
your system. With either of the other methods, you will probably need to go
through at least two tapes to recover your data, unless the crash occurs on the
day after the last master. If you do a full backup every night, then there is
only one backup to load. If the backup fits on a single tape (or at most, two),
then I highly recommend doing a full backup every night. Remember that the key
issue is getting your people back to work as soon as possible. The average
$5,000 per hour you stand to lose is much more than the cost of a large (8GB) tape drive.
This brings up another issue, and that is rotating tapes. If
you are making either incremental or differential backups, then you
must have multiple tapes. It is illogical to make a master then make an
incremental on the same tape. There is no way to get the information from the master.
If you make a master backup on the same tape every
night, you can run into serious problems as well. What if the system crashes in
the middle of the backup and trashes the tape? Your system is gone and so is
the data. Also, if you discover after a couple of days that the information in a
particular file is garbage and the master is only one day old, then it is
worthless for getting the data back. Therefore, if you do full backups every
night, use at least five tapes, one for each day of the week. (If you run seven
days a week, then seven tapes is likewise a good idea.)
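
If the nightly backup is run from a script, the script can at least tell the operator which tape belongs in the drive. This is a small sketch assuming a labeling scheme of my own, with one tape per weekday named after the day.

```shell
#!/bin/sh
# Hypothetical labeling scheme: one tape per weekday, named after the day.
day=$(date +%u)          # 1 = Monday ... 7 = Sunday
case "$day" in
    1) tape=MON ;;
    2) tape=TUE ;;
    3) tape=WED ;;
    4) tape=THU ;;
    5) tape=FRI ;;
    6) tape=SAT ;;
    7) tape=SUN ;;
esac
echo "Tonight's full backup goes on the tape labeled $tape"
```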
You don't necessarily always have to back up to tape. If the
amount of data that changes is fairly small, you could back up to floppies. This
is probably only valid if your system is acting as a Web server and the data
change at irregular intervals. As with any backup, you need to weigh the time to
recreate the data against the time to make the backup. If your data on the Web
server is also stored elsewhere (like on the development machine), it may be
easier to back up the Web server once after you get your configuration right,
and then skip the backups. However, it's your call.
Other choices for backup media include WORM (Write Once/Read
Many) drives and CD-Recordable. These are only effective if the data isn't going to
change much. You could back up your Web server to one of these media and then
quickly recover it if your machine crashes. Another option is to copy the data to another machine
on the network where a backup is done. (You could also
mount the file system you want to back up via NFS.)
Although most people get this far in
thinking about tapes, many forget about the physical safety of the tapes. If
your computer room catches fire and the tapes melt, then the most efficient
backup scheme is worthless. Some companies have fireproof safes in which they
keep the tapes. In smaller operations, the system administrator can take the
tape home from the night before. This is normally only effective when you do
masters every night. If you have a lot of tapes, you might consider companies
that provide off-site storage facilities.
Although some commercial products are available (which I will get
into in a moment), you can use the tools on your system. For example, you can
use tar or cpio. Although
tar is a bit easier to use, cpio
does have a little more functionality. The tar
command has the following basic format:
    tar options files
An example might be
    tar cvf /dev/fd0 /home /etc
This example would back
up /home and /etc
and write them to the floppy device /dev/fd0. The
c option says to create an archive,
v is verbose mode in which all the files are output to
stdout, and f says that
tar should output to the following file. In this
case, you are outputting to the device file /dev/fd0.
If you have a lot of directories, you can use the T
option to specify a file containing the directories to back up. For example, if you
had a file called file_list that contained the list of directories, the
command might look like this:
    tar cvf /dev/fd0 -T file_list
To extract files, the syntax is essentially the same, except
that you use the x option to extract. However, you can still
use both the f and T options.
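
To try the c, x, f, and T options without touching a real device, you can practice on scratch files first. In this sketch the directories, the archive file, and the list file names are all stand-ins of my own choosing. The first list names what to back up, the second names what to restore.

```shell
#!/bin/sh
# Stand-ins (assumptions): scratch directories instead of /home and /etc,
# and a regular file instead of a device such as /dev/fd0.
work=$(mktemp -d)
mkdir -p "$work/root/home/alice" "$work/root/etc" "$work/restore"
echo "notes" > "$work/root/home/alice/notes.txt"
echo "localhost" > "$work/root/etc/hostname"

# file_list names the directories to back up, one per line.
printf '%s\n' home etc > "$work/file_list"

# c = create, v = verbose, f = archive file, T = read names from a file.
tar -cvf "$work/backup.tar" -C "$work/root" -T "$work/file_list"

# x = extract; f still names the archive, and T now lists what to restore.
printf '%s\n' etc > "$work/restore_list"
(cd "$work/restore" && tar -xvf "$work/backup.tar" -T "$work/restore_list")
```

After this runs, only the etc directory appears under the restore area, because that is the only name in restore_list.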
The GNU version of tar (which comes
with most versions of Linux) has a very large number of options. One option I use is
z, which I use to either compress or uncompress the archive (depending on
which direction I am going). Because the archive is being filtered through
gzip, you need to have gzip on
your system. Although gzip is part of every Linux
distribution, it may not be on your system. Also, if you want to copy the
archive to another UNIX system, that system may not have gzip. Therefore, you
can either skip the compression or use the Z (notice the
uppercase) option to use compress and uncompress instead.
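
A script can make that decision for you by checking whether gzip is present before asking tar for compression. The following is a minimal sketch that uses a scratch directory and archive names of my own choosing.

```shell
#!/bin/sh
# Stand-in (assumption): a scratch directory instead of the real /etc.
work=$(mktemp -d)
mkdir -p "$work/etc"
echo "localhost" > "$work/etc/hostname"

# Only use z if gzip is actually installed; otherwise fall back to a
# plain, uncompressed archive.
if command -v gzip >/dev/null 2>&1; then
    archive="$work/etc.tar.gz"
    tar -czf "$archive" -C "$work" etc
    listing=$(tar -tzf "$archive")
else
    archive="$work/etc.tar"
    tar -cf "$archive" -C "$work" etc
    listing=$(tar -tf "$archive")
fi
echo "$listing"
```

The same z option is used in both directions: with c it compresses the archive as it is written, and with t or x it uncompresses it as it is read.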
Although I can imagine situations in which they might
be useful, I have only used a few of them. The best place to look is the
If your backup media can handle more than one set of
backups, you can use the mt command to manage your tape
drive. Among the functions that mt can perform is to write a
"file mark," which is simply a marker on the tape to indicate the end of an
archive. To use this function, you must first write the backup to the no-rewind
tape device (for example, /dev/nrft0). When the drive has written all of the
archive to the tape, write the file mark to indicate where the end is.
Normally, when
tar is complete and the tape device is closed, the tape rewinds. When you use
the no-rewind device, the tar process finishes, but the tape
does not rewind. You can then use the mt command to write
the file mark at the tape's current location, which is at the end of the
tar archive. Even if there are multiple archives on the single tape,
mt will find the specific location. Therefore, whenever you
need to restore, you can access any of the archives. See the mt
man-page for more detail.