Files and Filesystems
Another thing you should monitor is how much space is left on your file
systems. I have seen many instances in which the root file system gets so close
to 100 percent full that nothing more can get done. Because the root file system
is where unnamed pipes are created by default, many processes die terrible
deaths if they cannot create a pipe. If the system does get that full, it can
prevent further logins (because each login writes to log files). Unless root is
already logged in and can remove some files, you have a real problem.
Fortunately, when mke2fs creates a file system, it reserves a small amount of
space (5 percent by default) for root. This keeps the file system from becoming
completely full, so you have a chance to do something about it.
So the solution is to monitor your file systems to ensure that none of them
get too full, especially the root file system. A rule of thumb, whose origins
are lost somewhere in UNIX mythology, is that you should
make sure that there is at least 15 percent free on your root file system.
Although 15 percent of a 200MB hard disk is one-tenth the amount of free space
of 15 percent of a 2GB drive, it is a value that is easy to monitor. Consider
10-15MB as a danger sign, and you should be safe. However, you need to be aware
of how much and how fast the system can change. If the system could
change by 15MB in a matter of hours, then 15MB may be too small a margin.
When you consider that a 100GB drive costs about $100 (early 2005), there
really is little reason not to go out and get a new drive before the old one
gets too full.
Use df to find out how much free space is on each mounted file system.
Without any options, the output of df is one file system per line, showing how
many blocks and how many inodes are free. Though this is interesting, I am
really more concerned with percentages. Very few administrators know how long it
takes to use 1,000 blocks, though most understand the significance if those
1,000 blocks mean that the file system is 95 percent full.
Because I am less concerned with how many inodes are free, the option I use
most with df is -k, which reports usage in 1K blocks along with the percentage
used on each file system:
Filesystem   1k-blocks      Used  Available  Use%  Mounted on
/dev/hda4      4553936    571984    3750616   14%  /
/dev/hda2        54447      5968      45668   12%  /boot
/dev/hdc1      4335064   3837292     277556   94%  /data
/dev/sda2      3581536    154920    3244680    5%  /home
/dev/sda1      5162796   1783160    3117380   37%  /opt
/dev/hdc2      4335064   1084964    3029884   27%  /oracle
/dev/hdc3      5160416   2542112    2356152   52%  /usr
/dev/hdc4      5897968   1532972    4065396   28%  /usr/vmware
shmfs           192072         0     192072    0%  /dev/shm
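If you want to automate this check, the percentage column is easy to test with
a short script. This is only a sketch, and the 90 percent threshold is
arbitrary:

```shell
#!/bin/sh
# Warn about any file system that is 90 percent full or more.
# NR > 1 skips the header line; adding 0 to $5 forces awk to
# treat the "95%" string as the number 95.
df -k | awk 'NR > 1 && $5 + 0 >= 90 { print "WARNING: " $6 " is " $5 " full" }'
```

Run from cron, a script like this gives you a warning before the danger point,
rather than after.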
The shortcoming of df is that it reports on entire file systems but cannot
really point to where within them the problems are located. A full file system
can be caused by one of two things. First, there can be a few large files,
which often happens when log files are not cleaned out regularly.
The other case is when you have a lot of little files. This is similar to
ants at a picnic: individually, they are not very large, but hundreds swarming
over your hotdog is not very appetizing. If the files are scattered all over
your system, then you will have a hard time figuring out where they are. At the
same time, if they are scattered across the system, the odds are that no single
program created them, so you probably want (if not need) them all. Therefore,
you simply need a bigger disk.
If, on the other hand, the files are concentrated in one directory, it is
more likely that a single program is responsible. As with the large log files,
common culprits are the files in /var/log.
To detect either case, you can use a combination of two commands. First is
find, which, you already know from previous encounters, is used to find files.
Next is du, which is used to determine disk usage. Without any options, du gives
you the disk usage for every file that you specify. If you don't specify any, it
gives you the disk usage for every file from your current directory on down.
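For example (the path here is only an illustration):

```shell
# With no options, du lists the usage, in blocks, of each
# directory under the given path; add -a to see every file.
du /var/log
```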
Note that this usage is in blocks: even if a block contains only a single byte,
the whole block is used and is no longer available to any other file. If you
look at a long listing of a file, however, you see its size in bytes, yet a
1-byte file still takes up a full data block. The size indicated in a long
directory listing will therefore usually be less than what you get if you
multiply the number of blocks by the size of the block (512 bytes). To get the
sum for a directory without seeing the individual files, use the -s (summarize)
option.
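For instance, combining -s with a numeric sort makes the biggest directories
stand out (the paths are only examples):

```shell
# Summarize each directory under /var and sort numerically so
# the largest ends up last.
du -s /var/* | sort -n
```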
To look for directories that are exceptionally large, you can find all of the
directories and run du -s on each, for example:
find / -type d -exec du -s {} \; > /tmp/fileusage
I redirected the output into the file /tmp/fileusage for two reasons. First,
I have a copy of the output that I can use later if I need to. Second, this
command is going to take a very long time, so I do not want to have to run it
again. Because I started in /, the command found this directory (/) first.
Therefore, the disk usage for the entire system (including mounted file
systems) will be calculated. Only after it has calculated the disk usage for
the entire system does it go on to the individual directories.
You can avoid this problem in a couple of ways. First, you can exclude the
root directory itself from the search, for example by using
find / -mindepth 1 -type d.
Personally, I do not find this very pretty, especially if I were going to be
using the command again. I would much rather create a list of directories and
use it as the arguments to du. That way, I can filter out those directories
that I don't need to check, or include only those that I do want to check. For
example, I already know that /var/log might contain some large files.
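A sketch of that approach, with example paths and an example filter pattern:

```shell
# Build the list of directories once, filter out the ones we
# do not care about, and feed the remainder to du -s. The
# starting paths and the excluded pattern are only examples.
find /var /home -type d > /tmp/dirlist
grep -v '^/var/spool' /tmp/dirlist | xargs du -s
```

Because the list lives in /tmp/dirlist, you can re-filter and re-run du without
paying for another full find.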
On occasion, it's nice to figure out what files a process has open. Maybe the
process is hung and you want some details before you decide to kill it.