Checking the Sanity of Your System
Have you ever tried to do something and it didn't behave the way you expected
it to? You read the manual and typed in the example character for character
only to find it didn't work right. Your first assumption is that the manual is
wrong, but rather than reporting a bug, you try the command on another machine
and to your amazement, it behaves exactly as you expect. The only logical reason
is that your machine has gone insane.
Well, at least that's the attitude I have
had on numerous occasions. Although this personification of the system helps
relieve stress sometimes, it does little to get to the heart of the problem. If
you want, you could check every single file on your system (or at least those
related to your problem) and ensure that permissions are correct, the size is
right, and that all the support files are there. Although this works in many
cases, often figuring out which programs and files are involved is not easy.
Fortunately, help is on the way. Linux provides several useful tools with
which you can not only check the sanity of your system but return it to normal.
I've already talked about the first set of tools. These are the monitoring tools
such as ps and vmstat. Although these programs
cannot correct your problems, they can indicate where problems lie.
If the problem is the result of
a corrupt file (either the contents are corrupt or the permissions
are wrong), the system monitoring tools cannot help much. However, several tools specifically address different aspects of your system.
Linux provides a utility to compute a checksum
on a file, called sum. It provides
three ways of determining the sum. The first is with no options at all, which
reports a 16-bit sum. The next way uses the
On many systems, there is
the md5sum command. Instead of creating a 16-bit sum, md5sum creates a 128-bit
sum. This makes it substantially more difficult to hide the fact that a file
Because of the importance of the file's checksum,
I created a shell script while I was in tech support that would
run on a freshly installed system. As it ran, it would store in a database all
the information provided in the permissions
lists, plus the size of the file (from an
for the customer, much of the information that my script and database provided
was something to which they didn't have access. Now, each system
administrator could write a similar script and call
up that information. However, most administrators do not consider this issue
until it's too late.
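A script along these lines does not have to be complicated. What follows is only a minimal sketch of the idea, not the script I used in tech support. It assumes the GNU versions of md5sum and stat are installed, and the function names record_baseline and compare_baseline are made up for this illustration. It stores a checksum, the permissions, owner, group, and size for every regular file under a directory in a flat text file rather than a real database.

```shell
#!/bin/sh
# Sketch only: assumes GNU md5sum and GNU stat (for the -c option).
# The function names are hypothetical. File names containing newlines
# are not handled.

# record_baseline DIR OUTFILE
# Write one line per regular file: checksum, mode, owner, group, size, path.
# OUTFILE should live outside DIR so that it is not recorded itself.
record_baseline() {
    dir=$1
    out=$2
    find "$dir" -type f -print | sort | while IFS= read -r f; do
        sum=$(md5sum < "$f" | cut -d' ' -f1)     # 128-bit checksum as hex
        meta=$(stat -c '%A %U %G %s' "$f")       # permissions, owner, group, size
        printf '%s %s %s\n' "$sum" "$meta" "$f"
    done > "$out"
}

# compare_baseline DIR OLDFILE
# Re-scan DIR and print any lines that differ from the stored baseline.
# Returns 0 if nothing changed, 1 otherwise.
compare_baseline() {
    dir=$1
    old=$2
    new=$(mktemp) || return 1
    record_baseline "$dir" "$new"
    if diff "$old" "$new"; then
        status=0
    else
        status=1
    fi
    rm -f "$new"
    return $status
}
```

You might run record_baseline once right after installation and keep the output somewhere safe. When a program starts misbehaving later, compare_baseline shows which files no longer match.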
We now get to the "sanity checker" with which perhaps most people are familiar:
fsck, the file system checker. Anyone who has lived through a system
crash or had the system shut down improperly has seen fsck.
One lesser-known aspect of fsck is the fact that it is actually several programs,
one for each of the different file systems. This is done because of the
complexities of analyzing and correcting problems on each file system. As a
result of these complexities, very little of the code can be shared. What can be
shared is found within the fsck program.
When it runs, fsck determines what type of file system you want to check and
runs the appropriate command. For example, if you were checking an ext2fs file
system, the program that would do the actual checking would be
fsck.ext2 (typically in the /sbin directory).
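To make the hand-off a little more concrete, here is a rough sketch of the naming convention. It is a simplification and not how fsck itself is implemented. The function name find_fsck_helper and the FSCK_DIRS variable are assumptions made for this example. Given a file system type, it looks for an executable called fsck.TYPE in a short list of directories.

```shell
#!/bin/sh
# Sketch only: mimics the fsck.<type> naming convention.
# find_fsck_helper and FSCK_DIRS are hypothetical names for this example.

# find_fsck_helper FSTYPE
# Print the path of the type-specific checker, or return 1 if none is found.
find_fsck_helper() {
    fstype=$1
    # Directories to search; /sbin is the typical location.
    for dir in ${FSCK_DIRS:-/sbin /usr/sbin}; do
        if [ -x "$dir/fsck.$fstype" ]; then
            printf '%s\n' "$dir/fsck.$fstype"
            return 0
        fi
    done
    echo "no checker found for $fstype" >&2
    return 1
}
```

Calling find_fsck_helper ext2 on a system with the ext2 tools installed would normally print the path of fsck.ext2. On systems where fsck comes from util-linux, the -N option should have fsck show what it would run without actually running it.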
Another very useful sanity checker is the RPM program itself, rpm (assuming
that your system uses the RPM file format). As I talked about earlier, the rpm program is used to install additional
software. However, you can use many more options to test the integrity of your
When the system is
installed, all of the file information is stored in several files located in
/var/lib/rpm. These are hashed files that rpm can use but mean very little to us
humans. Therefore, I am not going to go into more detail about these files.
Assuming you know what file is causing the problem, you
can use rpm to determine the package to which this file belongs. The syntax would be
This tells you that xv is part of the package
Now use the -V option to verify the package:
If rpm returns with no response, the package is fine. What if the owner and group
are wrong? You would end up with an output that looks like
Each dot represents a particular characteristic of the
file. These characteristics are
S   File size
M   Mode (permissions and file type)
If any of these
characteristics are incorrect, rpm will display the appropriate letter in place of the dot.
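If you find the string of dots and letters hard to read, a few lines of shell can spell it out. This is only a sketch. The function name decode_verify_flags is made up, and the letters other than S and M (5 for the checksum, U for the owner, G for the group, T for the modification time) are taken from my reading of the rpm documentation rather than from the list above, so check them against the man page on your system.

```shell
#!/bin/sh
# Sketch only: decode_verify_flags is a hypothetical helper.
# It walks through the flag string one character at a time.

# decode_verify_flags FLAGS
# Print one line for each characteristic that rpm flagged.
decode_verify_flags() {
    flags=$1
    while [ -n "$flags" ]; do
        rest=${flags#?}          # everything after the first character
        c=${flags%"$rest"}       # the first character itself
        flags=$rest
        case $c in
            S) echo "S: file size differs" ;;
            M) echo "M: mode differs (permissions and file type)" ;;
            5) echo "5: checksum differs" ;;
            U) echo "U: owner differs" ;;
            G) echo "G: group differs" ;;
            T) echo "T: modification time differs" ;;
            .) ;;                # a dot means this test passed
            *) echo "$c: not decoded in this sketch" ;;
        esac
    done
}
```

For the case where only the owner and group are wrong, decode_verify_flags '.....UG.' prints the two lines for U and G and nothing else.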
If you wanted to check all of the packages you could create a script that looks
The first rpm command simply lists all of the packages and pipes the list into read,
which then loops through each package and verifies it. Since each file will be
listed, you should have some separator between packages.
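A loop matching that description might look something like the following. Treat it as a sketch rather than the exact script. The function name verify_all_packages and the separator line are my own choices here, and the only rpm options used are -qa (list all installed packages) and -V (verify).

```shell
#!/bin/sh
# Sketch only: verify_all_packages is a hypothetical wrapper name.

# Verify every installed package, printing a separator before each one.
verify_all_packages() {
    rpm -qa | while read -r pkg; do
        echo "==== $pkg ===="
        # rpm -V may exit non-zero when it finds differences; keep looping.
        rpm -V "$pkg" || :
    done
}
```

Packages that verify cleanly show only their separator line. Anything listed underneath a separator deserves a closer look. On a system with many packages this can take a while, so you may want to redirect the output to a file.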