If your system crashes, the most important thing is to prevent further damage. Hopefully, the messages that you see on your screen will give you some indication of what the problem is so that you can correct it. (For example, timeout errors on your hard disk might mean it's time to buy a new hard disk.)
It may happen that the crash you just experienced no longer allows you to boot your system. What then? The easiest solution (at least the easiest in terms of figuring out what to do) is to reinstall. If you have a recent backup and your tape drive is fairly fast, this is a valid alternative, provided there is no hardware problem causing the crash.
In an article I once wrote, I compared a system crash to an earthquake. The people who did well after the 1989 earthquake in Santa Cruz, California, were those who were best prepared. The people who do well after a system crash are also those who are best prepared. As with an earthquake, the first few minutes after a system crash are crucial. The steps you take can make the difference between a quick, easy recovery and a forced reinstallation.
In previous sections, I talked about the different kinds of problems that can happen on your system, so there is no need to go over them again here. Instead, I will concentrate on the steps to take after you reboot your system and find that something is wrong. It's possible that when you reboot, all is well, and it will be another six months before that exact same set of circumstances occurs. On the other hand, your screen may be full of messages as the system tries to bring itself up again.
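If you can capture those messages (from a log file or by copying them off the screen), a small script can help you pick out the lines worth a closer look. The following is only a minimal sketch in Python: the keyword list and the sample messages are assumptions made up for illustration, and the messages your kernel and drivers print will differ.

```python
import re

# Hypothetical keywords; adjust them to the messages your system prints.
SUSPECT_PATTERNS = [r"timeout", r"i/o error", r"bad (block|sector)"]


def flag_suspect_lines(log_text):
    """Return (line_number, line) pairs that match any suspect pattern."""
    matcher = re.compile("|".join(SUSPECT_PATTERNS), re.IGNORECASE)
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if matcher.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Invented sample text, not output from a real system.
    sample = "boot ok\nhda: irq timeout: status=0xd0\nroot file system mounted\n"
    for number, line in flag_suspect_lines(sample):
        print(f"line {number}: {line}")
```

A match is only a hint about where to look first, not a diagnosis.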
Because of the urgent nature of system crashes and the potential loss of income, I decided that this was one troubleshooting topic through which I would hold your hand. There is a set of common problems that occur after a system crashes and that need to be addressed. Although the cause of a crash can be any of a wide range of events, the range of results is small by comparison. With this and the importance of getting your system running again in mind, I am going to forget what I said about giving you cookbook answers to specific questions for this one topic.
Let's first talk about those cases in which you can no longer boot at all. Think back to our discussion of starting and stopping the system and consider the steps the system goes through when it boots. I talked about them in detail before, so I will review them here only as necessary to describe the problems.
As I mentioned, when you turn on a computer, the first thing that happens is the Power-On Self-Test, or POST. If something is amiss during the POST, you will usually hear a series of beeps. Hopefully, there will be some indication on your monitor of what the problem is. It can be anything from incorrect entries in the CMOS to bad RAM. If not, maybe the hardware documentation mentions something about what the beeps mean.
When finished with the POST, the computer executes code that looks for a device from which it can boot. On a Linux system, this boot device will more than likely be the hard disk. However, it could also be a floppy or even a CD-ROM. The built-in code loads and executes the code at the beginning of the disk (the master boot block), which in turn finds the active partition on the hard disk. What happens if the computer cannot find a drive from which to boot depends on your hardware. Often a message will indicate that there is no bootable floppy in drive A. It is also possible that the system will simply hang.
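To make the idea of an "active partition" more concrete, here is a minimal Python sketch that decodes the four primary partition entries from a 512-byte boot sector and reports which ones are flagged active. It assumes the classic PC layout (446 bytes of boot code followed by four 16-byte entries, with a status byte of 0x80 marking the active partition). The function names are my own, and the sector in the demonstration is built in memory instead of being read from a real disk.

```python
import struct

PARTITION_TABLE_OFFSET = 446  # the table follows 446 bytes of boot code
ENTRY_SIZE = 16               # each of the four entries is 16 bytes long
ACTIVE_FLAG = 0x80            # status byte of the active (bootable) entry
# status, 3 CHS bytes (skipped), type, 3 CHS bytes (skipped), start, size
ENTRY_FORMAT = "<B3xB3xII"


def read_partition_entries(sector):
    """Decode the four primary partition entries of a 512-byte boot sector."""
    if len(sector) < 512:
        raise ValueError("a boot sector must be at least 512 bytes long")
    entries = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        raw = sector[start:start + ENTRY_SIZE]
        status, type_id, lba_start, sectors = struct.unpack(ENTRY_FORMAT, raw)
        entries.append({
            "number": index + 1,
            "active": status == ACTIVE_FLAG,
            "type": type_id,
            "start": lba_start,
            "sectors": sectors,
        })
    return entries


def find_active(sector):
    """Return the numbers of all partitions flagged active."""
    return [e["number"] for e in read_partition_entries(sector) if e["active"]]


if __name__ == "__main__":
    sector = bytearray(512)
    # Invented example: partition 2 is active, type 0x83, start 2048.
    struct.pack_into(ENTRY_FORMAT, sector, PARTITION_TABLE_OFFSET + ENTRY_SIZE,
                     ACTIVE_FLAG, 0x83, 2048, 204800)
    print(find_active(bytes(sector)))
```

If the list comes back empty, the boot code has nothing to hand control to, which is one way to end up at a "no bootable device" message.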
If your hard disk is installed and should contain valid data, your master boot block is potentially corrupt. If you created the boot/root floppy set as I told you to do, you can use fdisk from it to recreate the partition table using the values from your notebook. Load the system from your boot/root floppy set and run fdisk.
Although fdisk runs from the floppy, the changes are made to the hard disk.
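If you want a quick way to see whether the master boot block looks damaged, a short Python sketch like the following can help. It assumes the classic PC layout, in which the block ends with the two signature bytes 0x55 0xAA and the partition table occupies bytes 446 through 509. The function name and the wording of the results are my own.

```python
SIGNATURE_OFFSET = 510
BOOT_SIGNATURE = b"\x55\xaa"
TABLE_START, TABLE_END = 446, 510


def describe_boot_sector(sector):
    """Give a one-line verdict on a 512-byte master boot block."""
    if len(sector) != 512:
        raise ValueError("expected exactly 512 bytes")
    if sector[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2] != BOOT_SIGNATURE:
        return "missing boot signature"
    if not any(sector[TABLE_START:TABLE_END]):
        return "signature present, partition table empty"
    return "signature and partition table present"


if __name__ == "__main__":
    # A sector of all zeros stands in for a wiped master boot block.
    print(describe_boot_sector(bytes(512)))
```

To check a real disk, you would read its first 512 bytes as root, for example from a device file such as /dev/hda. That device name is an assumption and will differ from system to system.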
With the floppy in the drive, you boot your system. When you get to the Boot: prompt, simply press Enter. After loading the kernel, the system prompts you to insert the root file system floppy. Do that and press Enter. A short time later, you are brought to a # prompt from which you can begin to issue commands.
When you run fdisk, you will probably see an empty table. Because you made a copy of the partition table in your notebook as I told you to do, simply fill in the values exactly as they were before. Be sure that you make the same partition active as it was previously. Otherwise, you won't be able to boot, or you could still boot but corrupt your file system. When you exit fdisk, it will write out a copy of the master boot block to the beginning of the disk. When you reboot, things will be back to normal.
(I've talked to at least one
customer who literally laughed at me when I told him to do this. He insisted
that it wouldn't work and that I didn't know what I was talking about. Fortunately
for me, each time I suggested it, it did work. However, I have worked
on many machines where it didn't work. With a success rate of more than 90
percent, it's obviously worth a try.)
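Because a single wrong value can keep the system from booting or corrupt a file system, it is worth comparing what you typed into fdisk with what is in your notebook before you write anything to disk. The following Python sketch shows one way to express that check. The `Partition` record and its fields are hypothetical; they simply mirror the kind of values you would have written down. The "exactly one active partition" rule is an assumption that matches the usual PC convention.

```python
from typing import NamedTuple


class Partition(NamedTuple):
    number: int    # position in the table, 1 through 4
    active: bool   # True for the partition marked active
    type_id: int   # partition type, e.g. 0x83
    start: int     # first sector
    size: int      # length in sectors


def check_against_notebook(notebook, entered):
    """Return a list of human-readable problems; empty means all matches."""
    problems = []
    active = [p.number for p in entered if p.active]
    if len(active) != 1:
        problems.append(
            f"expected exactly one active partition, found {len(active)}")
    by_number = {p.number: p for p in entered}
    for expected in notebook:
        actual = by_number.get(expected.number)
        if actual is None:
            problems.append(f"partition {expected.number} is missing")
        elif actual != expected:
            problems.append(f"partition {expected.number} differs from notebook")
    return problems


if __name__ == "__main__":
    # Invented values for illustration only.
    notebook = [Partition(1, True, 0x83, 63, 409600)]
    entered = [Partition(1, False, 0x83, 63, 409600)]
    for problem in check_against_notebook(notebook, entered):
        print(problem)
```

An empty result does not prove the disk is healthy; it only tells you that the values you entered match the ones you recorded.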
Table - Files Used in Problem Solving
|File|Description|
|/bin/pstat|Reports system information|
|/bin/who|Lists who is on the system|
|/bin/whodo|Determines what process each user is running|
|/etc/badblock|Checks for bad spots on your hard disk|
|/bin/rpm|Displays information about installed packages (also used to install and remove software)|
|/etc/df|Calculates available disk space on all mounted file systems|
|/etc/fdisk|Creates and administers disk partitions|
|/etc/fsck|Checks and cleans file systems|
|/etc/fuser|Indicates which users are using particular files and file systems|
|/etc/ifconfig|Configures network interface parameters|
|/etc/ps|Reports information on all processes|
|/usr/adm/hwconfig|Probes for hardware|
|/usr/bin/cpio|Creates archives of files|
|/usr/bin/last|Indicates last logins of users and teletypes|
|/usr/bin/lpstat|Prints information about status of print service|
|/usr/bin/netstat|Reports network status, such as active connections and routing tables|
|/usr/bin/sar|Reports on system activity|
|/usr/bin/tar|Creates archives of files|
|/usr/bin/w|Reports who is on the system and what they are doing|
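As a small illustration of the kind of information these tools give you, the following Python sketch produces a rough, df-like summary of free space. It is not a replacement for df. It uses the standard library call `shutil.disk_usage`, and the list of paths to check is an assumption you would adapt to your own mount points.

```python
import shutil


def free_space_report(paths):
    """Return (path, free_bytes, percent_used) for each path given."""
    report = []
    for path in paths:
        usage = shutil.disk_usage(path)
        if usage.total:
            percent_used = round(100 * usage.used / usage.total, 1)
        else:
            percent_used = 0.0
        report.append((path, usage.free, percent_used))
    return report


if __name__ == "__main__":
    # "/" is only an example; list the file systems you care about.
    for path, free_bytes, percent_used in free_space_report(["/"]):
        print(f"{path}: {free_bytes // (1024 * 1024)} MB free, "
              f"{percent_used}% used")
```

A file system that is unexpectedly full after a crash is worth investigating before you put the machine back into service.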