The Boot Process
The process of
turning on your computer and having it jump through hoops to bring up the
operating system is called booting, which derives from the term
bootstrapping. This is an allusion to the idea that a computer pulls
itself up by its bootstraps, in that smaller pieces of simple code start larger,
more complex pieces to get the system running.
The process a computer goes
through is similar among different computer types, whether it is a PC,
Macintosh, or SPARC Workstation. In the next section, I will be talking
specifically about the PC, though the concepts are still valid for other machines.
The first thing that happens is the Power-On Self-Test (POST).
Here the hardware checks itself to see that things are all right. It compares
the hardware settings in the CMOS
(Complementary Metal Oxide Semiconductor) to
what is physically on the system. Some errors, like the floppy types not
matching, are annoying, but your system still can boot.
Others, like the lack of
a video card, can keep the boot process from continuing. Often, there is nothing to
indicate what the problem is, except for a few little "beeps."
Once the POST
has completed, the hardware jumps to a specific, predefined location in memory. The
instructions located here are relatively simple and basically tell the hardware
to go look for a boot
device. Depending on how your CMOS
is configured, the
hardware first checks your floppy and then your hard disk.
When a boot
device is found (let's assume that it's a hard disk), the hardware is told to go
to the very first sector
(cylinder 0, head 0, sector 1), then load and execute
the instructions there. This is the master boot
record, or MBR
for you DOS-heads (sometimes also called the master boot
block). This code is small enough to fit
into one block but is intelligent enough to read the partition
table (stored in the same sector, just past the boot
code) and find the active partition.
Once it finds the active partition,
it begins to read and execute the instructions contained within the first block.
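To make the search for the active partition concrete, here is a minimal sketch in Python. It assumes the classic PC layout: a 512-byte sector, a four-entry partition table starting at byte 446, 16 bytes per entry, a boot flag of 0x80 marking the active entry, and the signature bytes 0x55 0xAA at the end of the sector. The function names and the demo sector are made up for illustration; nothing here reads a real disk.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446             # partition table follows the boot code
ENTRY_SIZE = 16                # four entries of 16 bytes each
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a valid boot sector
ACTIVE_FLAG = 0x80             # boot flag of the active partition

# Entry layout: flag (1), CHS start (3, skipped), type (1),
# CHS end (3, skipped), starting sector (4), sector count (4).
ENTRY_FORMAT = "<B3xB3xII"


def find_active_partition(sector):
    """Return (index, type, start_sector, sector_count) or None."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid boot sector")
    for index in range(4):
        offset = TABLE_OFFSET + index * ENTRY_SIZE
        entry = sector[offset:offset + ENTRY_SIZE]
        flag, ptype, start, count = struct.unpack(ENTRY_FORMAT, entry)
        if flag == ACTIVE_FLAG:
            return index, ptype, start, count
    return None


def make_demo_sector():
    """Build a fake sector in memory whose second entry is active."""
    sector = bytearray(SECTOR_SIZE)
    entry = struct.pack(ENTRY_FORMAT, ACTIVE_FLAG, 0x83, 63, 204800)
    offset = TABLE_OFFSET + 1 * ENTRY_SIZE
    sector[offset:offset + ENTRY_SIZE] = entry
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)


if __name__ == "__main__":
    print(find_active_partition(make_demo_sector()))
```

The real boot code does the same thing in a few dozen bytes of machine code: walk the four entries, pick the one flagged as active, and load its first sector.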
It is at this point that viruses can
infect Linux systems. The master boot
block has the same format for essentially all PC-based operating systems, and all it does
is find and execute code at the
beginning of the active partition. But if the master boot
block contains code
that tells it to go to the very last sector of the hard disk
and execute the code there, which then tells the system to execute code at
the beginning of the
active partition, you would never know anything was wrong.
Let's assume that the instructions at the
very end of the disk are larger than a single 512-byte sector.
If the instructions took up a couple of kilobytes, you could get some fairly
complicated code. Because it is at the end of the disk, you would probably never
know it was there. What if that code checked the date in the CMOS
and, if the
day of the week was Friday and the day of the month was 13, it would erase the
first few kilobytes of your hard disk? If that were the case, then your system
would be infected with the Friday the 13th virus, and you could no longer boot
from your hard disk.
Viruses that behave in this way are called "boot
viruses," as they affect the master boot
block and can only damage your
system if this is the disk from which you are booting. These kinds of viruses
can affect all PC-based systems. Some computers will allow you to configure them
(more on that later) so that you cannot write to the master boot
block. Although this is a good safeguard against older viruses, the newer ones can
change the CMOS
to allow writing to the master boot
block. So, just because you have enabled this feature does not mean your system is safe.
However, I must point out that boot
viruses can only affect Linux systems if you boot from an
infected disk. This usually will be a floppy, more than likely a DOS
floppy. Therefore, you need to be especially careful when booting from floppies.
Now back to our story...
As I mentioned, the code in the master
boot block finds the active partition
and begins executing the code there. On an
MS-DOS system, these are the IO.SYS and MSDOS.SYS files. On a Linux system,
this is often the LILO or Linux loader "program." Although IO.SYS and
MSDOS.SYS are "real" files that you can look at and even remove if you
want to, the LILO program is not. The LILO program is part of the partition,
but not part of the file system; therefore, it is not a "real" file. Regardless of what
program is booting your system and loading the kernel, it is generally referred to as a
boot loader.
Often, LILO is installed in the master boot
block of the hard disk itself. Therefore, it will be the first code to run when your system is booted.
In this case, LILO can be used to start other operating systems. On one machine,
I have LILO start either Windows 95
or one of two different versions of Linux.
In other cases, LILO is installed in the boot block
of a given partition. In this case, it is referred to as a "secondary" boot
loader and is used just to load the Linux installed on that partition.
This is useful if you have another operating system
such as OS/2 or Windows NT and you use the boot
software from that OS to load any others. However, neither of these
was designed with Linux in mind. Therefore, I usually have LILO loaded in the master boot
block and have it do all the work.
Assuming that LILO has been written to the master boot record and is, therefore, the master
boot record, it is loaded by the system BIOS into a specific memory location (0x7C00) and
executed.
The primary boot loader then uses the system BIOS to load the secondary boot loader into a
specific memory location (0x9B000). The reason that the BIOS is still used at this point is that by
including the code necessary to access the hardware, the secondary boot loader would be extremely large
(at least by comparison to its current size). Furthermore, it would need to be able to recognize and
access different hardware types such as IDE and EIDE, as well as SCSI, and so forth.
This limits LILO, because it is obviously dependent on the BIOS. As a result,
the secondary boot loader cannot access cylinders on the hard disk that are above 1023. In fact,
this is a problem for other PC-based operating systems, as well. There are two solutions to this problem.
The original solution is simply to create the partitions so that LILO and the secondary boot loader
are at cylinder 1023 or below. This is one reason for moving the boot files into the /boot
directory, which is often on a separate file system that lies at the start of the hard disk.
The other solution is something called "Logical Block Addressing" (LBA). With LBA, the BIOS
"thinks" there are fewer cylinders than there actually are. Details on LBA can be found in the
section on hard disks.
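The arithmetic behind the cylinder limit can be sketched as follows. The BIOS disk interface stores the cylinder number in 10 bits, so only cylinders 0 through 1023 can be named. The geometry used here (255 heads, 63 sectors per track, 512-byte sectors) is an assumption chosen for illustration, and the function names are hypothetical.

```python
MAX_CYLINDERS = 1024  # a 10-bit cylinder number covers 0..1023


def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Convert a cylinder/head/sector address to a linear block number.

    Sectors are numbered from 1 within a track, hence the "- 1".
    """
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def bios_limit_bytes(heads=255, sectors_per_track=63, sector_size=512):
    """Bytes reachable when only cylinders 0..1023 can be addressed."""
    return MAX_CYLINDERS * heads * sectors_per_track * sector_size


def reachable(block, heads=255, sectors_per_track=63):
    """True if the block lies on a cylinder the BIOS can address."""
    return block // (heads * sectors_per_track) < MAX_CYLINDERS


if __name__ == "__main__":
    print(chs_to_lba(0, 0, 1, 255, 63))   # the very first sector
    print(bios_limit_bytes())             # size of the reachable area
```

With the assumed geometry, the reachable area works out to roughly 7.8 GB. Anything the boot loader needs, including the kernel image, has to lie below that boundary, which is why a small /boot file system at the start of the disk is a simple fix.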
Contrary to common belief, it is actually the secondary boot loader that provides the prompt
and accepts the various options. The secondary boot loader is what reads the /boot/map file to
determine the location of the kernel image to load.
You can configure LILO with a wide range of options. Not only can you boot
different operating systems, but with Linux you can boot
different versions of the kernel
as well as use different root file systems. This is useful if you are a developer because you
can have multiple versions of the kernel
on a single system. You can then boot them and test your product in different environments.
We'll go into details about configuring LILO in the section on
Installing your Linux kernel.
In addition, I
always have three copies of my kernel
on the system and have configured LILO to
be able to boot
any one of them. The first copy is the current kernel
I am using. When I build a new kernel
and install it, the previous one gets copied to /vmlinuz.old, which is the second kernel
I can access. I then have a copy called
/vmlinuz.orig, which is the original kernel
from when I installed that
particular release. This, at least, contains the drivers necessary to boot
and access my hard disk and CD-ROM.
If I can get that far, I can reinstall what I need.
Typically on newer Linux versions, the kernel is no longer stored in the root directory, but
rather in the /boot directory. Also, you will find that it is common that the version number of the
respective kernel is added onto the end. For example, /boot/vmlinuz.2.4.18, which would indicate
that this kernel is version 2.4.18. What is important is that the kernel can be located when
the system boots and not what it is called.
During the course of writing this material, I often had more
than one distribution of Linux installed on my system. It was very useful to see
whether the application software provided with one release
was compatible with the kernel from a different distribution.
Using various options to LILO, I could boot one kernel
but use the root file system from a different version. This was also useful on at least one
occasion when I had one version that didn't have the correct drivers in the
kernel on the hard disk and I couldn't even boot it.
Once your system boots, you will see the kernel
being loaded and started. As it is loaded and begins to execute, you will see screens
of information flash past. For the uninitiated, this is overwhelming, but after
you take a closer look at it, most of the information is very straightforward.
Once you're booted, you can see this information in the file
/usr/adm/messages. Depending on your system, this file might
be in /var/adm or even
/var/log, although /var/log seems to be the most common, as of this writing. In the messages file, as well
as during the boot
process, you'll see several types of information that the
system logging daemon
(syslogd) is writing. The syslogd daemon
usually continues logging as the system is running, although you can turn it off if you
want. To look at the kernel messages after the system boots, you can use the
The general format for the entries is:
time hostname program message
where time is the system time when the message
is generated, hostname
is the host that generated the message, program is the
program that generated the message, and message
is the text
of the message. For example, a message from the kernel
might look like this:
May 13 11:34:23 localhost kernel ide0: do_ide_reset: success
As the system is booting,
all you see are the messages themselves and not the other information. Most of
what you see as the system boots are messages from the kernel,
with a few other
things, so you would see this message just as
ide0: do_ide_reset: success
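The four fields can be pulled apart with a short regular expression. This is only a sketch: the pattern and the function name parse_entry are my own, and the pattern accepts an optional colon after the program name because many systems write one there.

```python
import re

# time, hostname, program, then the rest of the line as the message.
ENTRY_RE = re.compile(
    r"^(?P<time>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<program>[^\s:]+):? "
    r"(?P<message>.*)$"
)


def parse_entry(line):
    """Split one log line into its fields, or return None if it does not fit."""
    match = ENTRY_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    line = "May 13 11:34:23 localhost kernel ide0: do_ide_reset: success"
    fields = parse_entry(line)
    print(fields["program"])   # the program that generated the message
    print(fields["message"])   # what you see on the screen during boot
```

The message field is exactly the part that scrolls past on the console while the system boots.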
Much of the information that the
kernel writes comes from device
drivers as they perform their initialization routines. If you have hardware problems
on your system, this is very useful information. One example I
encountered was with two pieces of hardware that were both
software-configurable. However, in both cases, the software wanted to configure
them with the same IRQ.
I could then change the source code and recompile so that
one was assigned a different IRQ.
You will also notice the kernel checking the
existing hardware for specific capabilities, such as whether an FPU is present,
whether the CPU
has the hlt (halt)
instruction, and so on.
What is logged and where it is logged is based on
the /etc/syslog.conf file. Each entry is
broken down into facility.priority,
where facility is the part of the system such as the kernel
or printer spooler
and priority indicates the severity of the message. The
priority ranges from none, when no messages
are logged, to emerg, which represents
very significant events like kernel
panics. Messages are generally logged to one
file or another, though emergency messages should be displayed to everyone
(usually done by default). See the syslog.conf man-page for details.
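How a facility.priority pair selects messages can be sketched like this. The sketch assumes the traditional syslogd rule that a selector matches the named priority and everything more severe, and it handles only a single selector with the "*" wildcard and the "none" keyword. The function name selector_matches is hypothetical.

```python
# Priorities from least to most severe.
PRIORITIES = ["debug", "info", "notice", "warning",
              "err", "crit", "alert", "emerg"]


def selector_matches(selector, facility, priority):
    """Return True if a message with this facility and priority is selected."""
    sel_facility, sel_priority = selector.split(".", 1)
    if sel_facility not in ("*", facility):
        return False            # a different part of the system
    if sel_priority == "none":
        return False            # nothing from this facility is logged
    if sel_priority == "*":
        return True             # every priority is logged
    # The named priority and anything more severe is logged.
    return PRIORITIES.index(priority) >= PRIORITIES.index(sel_priority)


if __name__ == "__main__":
    print(selector_matches("kern.warning", "kern", "emerg"))
    print(selector_matches("kern.warning", "kern", "info"))
```

A kernel panic logged at emerg therefore passes a kern.warning selector, while a routine info message from the kernel does not.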
One last thing that the kernel
does is start the
init process, which reads the
/etc/inittab file. It looks for any entry that should
be run when the system is initializing (the entry has a
sysinit in the third field) and then executes the
corresponding command. (I'll get into details about different run-levels and
these entries shortly.)
The first thing init
runs out of the
inittab is the script /etc/rc.d/rc.sysinit, which is similar to the bcheckrc
script on other systems. As with everything else under /etc/rc.d, this is a
shell script, so you can take a look at it if
you want. Actually, I feel that looking through and becoming familiar with
which script does what and in what order is a good way of learning about
your system.
Among the myriad
of things done here are checking and mounting file systems, removing old lock
files, and enabling the swap space.
Note that if the file system
check notes some serious problems, the rc.sysinit will stop and bring you to a
shell prompt, where you can attempt to clean up by hand. Once you exit this
shell, the next command to be executed (aside from an echo) is a reboot. This is
done to ensure the validity of the file systems.
Next, init looks through the
inittab for the line with initdefault in the third field. The initdefault entry
tells the system what run-level
to enter initially, normally run-level 3
(without X Windows) or run-level 5 (with X Windows). Other systems have the
default run-level 1 to bring you into single-user or
maintenance mode.
Here you can perform certain actions without worrying about users or too many other things
happening on your system. (Note: You can keep users out simply by creating the file
/etc/nologin. See the nologin man-page for details.)
What kind of actions can you perform here? The action with the most
impact is adding new software or updating existing software. Often, new software will affect old
software in such a way that it is better not to have other users on the system.
In such cases, the installation procedures for that software should keep you
from installing unless you are in maintenance mode.
This is also a good
place to configure hardware that you added or otherwise change the kernel.
Although these actions rarely impact users, you will have to do a kernel
rebuild. This takes up a lot of system resources and degrades overall
performance. Plus, you need to reboot after doing a kernel
rebuild and it takes
longer to reboot from run-level
3 than from run-level 1.
If the changes
you made do not require you to rebuild the kernel
(say, adding new software), you can go directly from single-user to
multi-user mode by running
init 3
The argument to init is simply the run level you want to go
into, which, for most
purposes, is run-level
3. However, to shut down the system, you could bring the
system to run-level 0 or 6. (See the init man-page
for more details.)
Init then
looks for any entry that has a 3 in the second field. This 3 corresponds to the
run-level where we currently are. Run-level 3 is the same as multi-user mode.
Within the inittab, there is a line for every run level that starts the
script /etc/rc.d/rc, passing the run level as an
argument. The /etc/rc.d/rc script,
after a little housekeeping, then starts the scripts for that run level.
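The way init picks entries out of the inittab can be sketched with a small parser. Each line has four colon-separated fields: an identifier, the run levels, the action, and the command to run. The sample text below is illustrative only and was not copied from a real system; the function names are hypothetical.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    ident: str       # first field: a short label
    runlevels: str   # second field: run levels the entry applies to
    action: str      # third field: sysinit, initdefault, wait, ...
    process: str     # fourth field: the command to run


# Illustrative sample, not taken from a real system.
SAMPLE_INITTAB = """\
# default run level
id:3:initdefault:
si::sysinit:/etc/rc.d/rc.sysinit
l1:1:wait:/etc/rc.d/rc 1
l3:3:wait:/etc/rc.d/rc 3
l5:5:wait:/etc/rc.d/rc 5
"""


def parse_inittab(text):
    """Return the entries of an inittab, skipping blanks and comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ident, runlevels, action, process = line.split(":", 3)
        entries.append(Entry(ident, runlevels, action, process))
    return entries


def default_runlevel(entries):
    """Run level named by the initdefault entry, or None."""
    for entry in entries:
        if entry.action == "initdefault":
            return entry.runlevels
    return None


def sysinit_commands(entries):
    """Commands run while the system is initializing."""
    return [e.process for e in entries if e.action == "sysinit"]


def commands_for_runlevel(entries, level):
    """Commands whose second field contains the given run level."""
    return [e.process for e in entries
            if level in e.runlevels
            and e.action not in ("initdefault", "sysinit")]


if __name__ == "__main__":
    entries = parse_inittab(SAMPLE_INITTAB)
    print(sysinit_commands(entries))
    level = default_runlevel(entries)
    print(level, commands_for_runlevel(entries, level))
```

Read in this order (sysinit first, then initdefault, then the entries for the chosen run level), the sample reproduces the sequence described above.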
For each run level, there is a directory underneath /etc/rc.d,
such as rc3.d,
which contains the scripts that will be run for that run level.
In these directories, you may find two sets of scripts. The scripts beginning with K are
the kill scripts, which are used to shut down or stop a particular subsystem. The S
scripts are the start scripts. Note that the kill and start scripts are links to
the files in /etc/rc.d/init.d. If there are K and S scripts with the same
number, these are both linked to the same file.
This is done because the
scripts are started with an argument
of either start or stop. The script itself
then changes its behavior based on whether you told it to start or stop.
Naming them something (slightly) different allows us to run only the K scripts
if we want to stop things and only the S scripts when we want to start things.
When the system changes to a particular run level, the first scripts that are started
are the K scripts. This stops any of the processes that should not be running in
that level. Next, the S scripts are run to start the processes that should be
running.
Let's look at an example. On most systems, run-level 3 is
almost the same as run-level 2. The only difference is that in
run-level 2, NFS is not running. If you were to change from
run-level 3 to run-level 2, NFS would go down. In run-level 1 (maintenance mode),
almost everything is stopped.
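The ordering of the K and S scripts during a run-level change can be sketched in a few lines. The script names are hypothetical, and the sketch assumes the scripts in each group run in sorted name order, which is what the numbers in the names are for.

```python
def plan_runlevel_change(script_names):
    """Return (script, argument) pairs in the order they would be run.

    K scripts come first and are called with "stop"; S scripts follow
    and are called with "start". Each group is run in name order.
    """
    kills = sorted(n for n in script_names if n.startswith("K"))
    starts = sorted(n for n in script_names if n.startswith("S"))
    return ([(name, "stop") for name in kills]
            + [(name, "start") for name in starts])


if __name__ == "__main__":
    # Hypothetical contents of the directory for run-level 2,
    # where NFS is not supposed to be running.
    rc2 = ["S30syslog", "K20nfs", "S10network"]
    for name, argument in plan_runlevel_change(rc2):
        print(name, argument)
```

Because the NFS entry in this made-up directory is a K script, changing to this run level stops NFS before anything else is started, which matches the move from run-level 3 to run-level 2 described above.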