Originally, you could only get four partitions on a hard disk. Though this
was not a major issue at first, as people got larger hard disks, there was a
greater need to break things down in a certain structure. In addition, certain
OSes needed a separate space on which to swap. If you had one partition
for your root file system, one for swap, one for user information, and one for common
data, you would have just run out.
To solve the problem and still maintain
backward compatibility, DOS-based machines were able to create an extended
partition that contained logical partitions within it. Other systems, like SCO,
allow you to have multiple file systems within a single partition
to overcome the limitation.
To be able to access data on your hard disk, there has to
be some pre-defined structure. Without structure, the unorganized data end up
looking like my desk, where there are several piles of papers that I have to
look through to find what I am looking for. Instead, the layout of a hard disk
follows a very consistent pattern, so consistent that it is even possible for
different operating systems to share the hard disk.
Basic to this structure is the concept of a partition. A partition
defines a portion of the hard disk to be used by one operating system
or another. The partition can
be any size, even the entire hard disk. Near the very beginning of the disk is
the partition table. The partition
table takes up only 64 bytes of the disk's first 512-byte sector but can still define where each partition
begins and how large it is. In addition, the partition table indicates which of the partitions
is active. This decides which partition
the system should go to when looking for an operating system
to boot. The partition
table is outside of any partition.
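As a rough illustration of how little is needed to describe the partitions, here is a minimal Python sketch that decodes a partition table. It assumes the classic PC layout, which the text above does not spell out: four 16-byte entries starting at byte 446 of the first 512-byte sector, a status byte of 0x80 marking the active partition, and a 0x55 0xAA signature at the end of the sector. The function names and the sample values are made up for this example.

```python
import struct

SECTOR_SIZE = 512            # size of the first sector on the disk
TABLE_OFFSET = 446           # where the four entries start (classic PC layout)
# status, (3 CHS bytes skipped), type, (3 CHS bytes skipped), first sector, sector count
ENTRY_FORMAT = "<B3xB3xII"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)   # 16 bytes per entry
ACTIVE_FLAG = 0x80


def parse_partition_table(sector):
    """Return the used entries of the partition table in a boot sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a classic boot sector")
    partitions = []
    for number in range(4):
        status, ptype, first_sector, sector_count = struct.unpack_from(
            ENTRY_FORMAT, sector, TABLE_OFFSET + number * ENTRY_SIZE)
        if ptype == 0:       # type 0 marks an unused slot
            continue
        partitions.append({
            "number": number + 1,
            "active": status == ACTIVE_FLAG,
            "type": ptype,
            "first_sector": first_sector,
            "size_bytes": sector_count * SECTOR_SIZE,
        })
    return partitions


def build_example_sector():
    """Build a made-up boot sector: a swap partition and an active Linux one."""
    sector = bytearray(SECTOR_SIZE)
    entries = [(0x00, 0x82, 2048, 1000), (ACTIVE_FLAG, 0x83, 4096, 20000)]
    for number, fields in enumerate(entries):
        struct.pack_into(ENTRY_FORMAT, sector,
                         TABLE_OFFSET + number * ENTRY_SIZE, *fields)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


for partition in parse_partition_table(build_example_sector()):
    print(partition)
```

The sketch works on an in-memory sector so that it can be tried without touching a real disk. To look at a real table you would read the first 512 bytes of the disk device instead, which normally requires root privileges.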
Once the system has determined which partition
is active, the CPU
knows to go to the very first block of data within that partition and begin executing
the instructions there. However, if LILO is set up to run out of your master
boot block, it doesn't care
about the active partition. It does what you tell it.
Often, special control structures that impose an additional structure are created at the
beginning of the partition.
This structure makes the partition a file system.
There are two control structures at the beginning of the file
system: the superblock and the inode table. The superblock
contains information about the type of file system, its size, how many data
blocks there are, the number of free inodes, free space available, and where the
inode table is. On the ext2 filesystem,
copies of the superblock
are stored at regular intervals for efficiency and in case the original gets
damaged.
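To make the superblock a little more concrete, the following Python sketch pulls a few of these values out of an ext2 image. The offsets are assumptions based on the on-disk ext2 layout as I understand it and are not given in the text above: the primary superblock starts 1,024 bytes into the file system, the counters are 32-bit little-endian values at the start of the superblock, and the magic number 0xEF53 sits at byte 56. The function names and sample numbers are invented for the example.

```python
import struct

SUPERBLOCK_OFFSET = 1024     # assumed start of the primary superblock
MAGIC_OFFSET = 56            # assumed position of the magic number inside it
EXT2_MAGIC = 0xEF53


def read_superblock(image):
    """Decode a handful of superblock fields from the bytes of an ext2 image."""
    (inodes, blocks, _reserved, free_blocks, free_inodes,
     _first_data_block, log_block_size) = struct.unpack_from(
        "<7I", image, SUPERBLOCK_OFFSET)
    (magic,) = struct.unpack_from("<H", image, SUPERBLOCK_OFFSET + MAGIC_OFFSET)
    if magic != EXT2_MAGIC:
        raise ValueError("this does not look like an ext2 file system")
    return {
        "inodes": inodes,
        "blocks": blocks,
        "free_blocks": free_blocks,
        "free_inodes": free_inodes,
        "block_size": 1024 << log_block_size,   # stored as a power-of-two shift
    }


def build_example_image():
    """Build a made-up image that contains only a plausible superblock."""
    image = bytearray(2048)
    struct.pack_into("<7I", image, SUPERBLOCK_OFFSET,
                     128, 1024, 51, 900, 117, 1, 0)
    struct.pack_into("<H", image, SUPERBLOCK_OFFSET + MAGIC_OFFSET, EXT2_MAGIC)
    return bytes(image)


print(read_superblock(build_example_image()))
```

On a real system the same values are easier to get from the tools that ship with the file system, so treat this purely as a picture of what "control structure" means here.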
Many users are not aware that different file systems reside on
different parts of the hard disk and, in many cases, on different physical
disks. From the user's perspective, the entire directory structure is one unit
from the top (/) down to the deepest
subdirectory. To carry out this deception, the system administrator
mounts each file system by attaching the device node associated with it
(e.g., /dev/home) to a mountpoint (e.g., /home). This can be done either manually, with the mount command,
or by having the system do it for you when it boots. This
is done with entries in
Conceptually, the mountpoint serves as a detour sign for the system.
If there is no file system mounted on the mountpoint, the system can just drive
through and access what's there. If a file system is mounted, when the system
gets to the mountpoint, it sees the detour sign and immediately diverts in
another direction. Just as roads, trees, and houses still exist on the other
side of the detour sign, any file or directory that exists underneath the
mountpoint is still there. You just can't get to it.
Let's look at an example. You have the /dev/home file
system that you are mounting on /home.
Let's say that when you first installed the system and before you first mounted
the /dev/home file system, you created some users with their home directories in /home,
for example, /home/jimmo. When you do finally mount
the /dev/home file system onto the /home directory, you no
longer see /home/jimmo. It is still there, but once the system reaches the /home directory, it is redirected somewhere else.
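The detour can be modelled in a few lines of Python. This is only a toy: the real kernel does not work on strings like this, and the device name /dev/root as well as the function name are made up for the example. It does show why /home/jimmo "disappears" once something is mounted on /home.

```python
def resolve(path, mounts):
    """Return (device, path inside that file system) for an absolute path.

    mounts maps each mountpoint to the device mounted on it and must
    contain an entry for "/".
    """
    best = "/"
    for mountpoint in mounts:
        prefix = mountpoint.rstrip("/") + "/"
        is_below = path == mountpoint or path.startswith(prefix)
        if is_below and len(mountpoint) > len(best):
            best = mountpoint          # the deepest matching mountpoint wins
    inside = path[len(best.rstrip("/")):] or "/"
    return mounts[best], inside


# Before /dev/home is mounted, /home/jimmo lives on the root file system.
before = {"/": "/dev/root"}
print(resolve("/home/jimmo", before))

# After the mount, the same name is looked up on /dev/home instead.
after = {"/": "/dev/root", "/home": "/dev/home"}
print(resolve("/home/jimmo", after))
```

The directory created on the root file system is untouched. It is simply never consulted while the second table is in effect, and it becomes visible again as soon as /dev/home is unmounted.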
The way Linux accesses
its file systems is different from what a lot of people are accustomed to.
Let's consider what happens when you open a file. All the program needs to know
is the name of the file, which it tells the operating system,
which then has to convert it to a physical location on the disk. This usually
means converting it to an inode first.
Because the conversion between a file name and the
physical location on the disk will be different for different file system types,
Linux has implemented a concept called the Virtual File System (VFS) layer. When a
program makes a system call
that accesses the file system (such as open), the kernel
actually calls a function within the VFS layer. It is then the VFS's
responsibility to call the file-system-specific code to access the data. The figure
below shows what this looks like graphically.
Figure - File System Layers
Because it has to interact with every file system type,
the VFS has a set of functions that every file system implements. It has to know
about all the normal operations that occur on a file such as opening, reading,
closing, etc., as well as know about file system structures, such as
If you want more details, there is a whole section on the VFS.
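The idea of one common set of functions with file-system-specific code behind it can be sketched in Python. None of the class or method names below come from the kernel. They are invented for the illustration, and the two "file systems" are just dictionaries.

```python
class FileSystem:
    """The operations the VFS layer expects every file system type to provide."""

    def read_file(self, name):
        raise NotImplementedError


class Ext2Like(FileSystem):
    """Stand-in for a native file system with case-sensitive names."""

    def __init__(self, files):
        self.files = files

    def read_file(self, name):
        return self.files[name]


class FatLike(FileSystem):
    """Stand-in for a DOS-style file system that stores upper-case names."""

    def __init__(self, files):
        self.files = files

    def read_file(self, name):
        return self.files[name.upper()]


class VFS:
    """Receives every request and hands it to the right file system."""

    def __init__(self):
        self.mounts = {}

    def mount(self, mountpoint, filesystem):
        self.mounts[mountpoint] = filesystem

    def read(self, path):
        candidates = [m for m in self.mounts
                      if path == m or path.startswith(m.rstrip("/") + "/")]
        if not candidates:
            raise FileNotFoundError(path)
        mountpoint = max(candidates, key=len)     # deepest mountpoint wins
        relative = path[len(mountpoint.rstrip("/")):] or "/"
        return self.mounts[mountpoint].read_file(relative)


vfs = VFS()
vfs.mount("/", Ext2Like({"/etc/motd": b"welcome\n"}))
vfs.mount("/mnt/dos", FatLike({"/README.TXT": b"hello from DOS\n"}))
print(vfs.read("/etc/motd"))
print(vfs.read("/mnt/dos/readme.txt"))
```

The caller uses the same read call in both cases and never needs to know which kind of file system is underneath, which is the point of the extra layer.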
One of the newest Linux file systems is the Second Extended File System
(ext2fs). This is an enhanced version of the Extended File System (extfs). The
ext2fs was designed to fix some problems in the extfs, as well as add some
features. Linux supports a large number of other filesystems, but as of this writing,
the ext2fs seems to be the most common. In the following discussion we will be talking
specifically about the ext2fs in order to explain how inodes work. Although the details
are specific to the ext2fs, the concepts apply to many other filesystems.
Among the things that the inode
keeps track of are the file type and permissions,
the number of links, the owner and group, the size of the file, and when it was last modified.
In the inode, you will find 15 pointers to the actual data on
the hard disk.
Note that these are pointers to the data and not the
data itself. Each one of the 15 pointers to the data is a block address
on the hard disk. For the following discussion, please refer to the figure below.
Figure - Inodes Pointing to Disk Blocks
Each of these blocks is 1,024 bytes. Therefore,
the maximum file size on a Linux system is 15 KB. Wait a minute! That doesn't
sound right, does it? It isn't. If (and that's a big if) all of these pointers
pointed to data blocks, then you could only have a file up to 15 KB. However,
dozens of files in the /bin directory
alone are larger than 15 KB. How's that?
The answer is that only 12 of these pointers actually point to data, so there is
really only 12 KB that you can access directly. These are referred to as the direct data
blocks. The thirteenth pointer points to a block on the hard disk that actually
contains the real pointers to the data. The blocks reached this way are the indirect
data blocks. The pointers are 4-byte values, so there are 256 of them in each 1,024-byte block.
In the figure above, the thirteenth entry is a pointer to block 567. Block 567
contains 256 pointers to indirect data blocks. One of these pointers points to
block 33453, which contains the actual data. Block 33453 is an indirect data
block.
Because the data blocks that the 256 pointers in block 567 point to each
contain 1,024 bytes of data, there is an additional 256 KB of data. So, with 12 KB for
the direct data blocks and 256 KB for the indirect data blocks, we now have a
maximum file size of 268 KB.
Hmmm. Still not good. There are files on your system larger than that. So that
brings us to the fourteenth pointer. This points not to data blocks, not to a block of
pointers to data blocks, but to a block of pointers to blocks that in turn point to data
blocks. These are the double-indirect data blocks.
In the figure, the fourteenth pointer contains a pointer to block 5601. Block
5601 contains pointers to other blocks, one of which is block 5151. However,
block 5151 does not contain data, but more pointers. One of these pointers
points to block 56732, and it is block 56732 that finally contains the data.
With 1,024-byte blocks and 4-byte pointers, we have a block of 256 entries that each point to a block that in turn contains
256 pointers to 1,024-byte data blocks. This gives us 64 MB, just for the
double-indirect data blocks. At this point, the additional size gained by the
single-indirect and direct data blocks is negligible. Therefore, let's just say
we can access more than 64 MB. Now, that's much better, although some files
(those of large database applications, for example) are bigger still.
However, we're not through yet. We have one pointer left.
So, not to bore you with too many of the details, let's do the math quickly. The last pointer
points to a block containing 256 pointers to other blocks, each of which points
to 256 other blocks. At this point, we already have 65,536 blocks. Each of these
65,536 blocks contains 256 pointers to the actual data blocks. Here we have
16,777,216 pointers to data blocks, which, at 1,024 bytes per block, gives us a grand total of
17,179,869,184 bytes, or 16 GB, of data (plus the comparatively insignificant amount we get from the
double-indirect data blocks). As you might have guessed, these are the
triple-indirect data blocks.
In Figure 0-7, pointer 13 contains a pointer to block
43. Block 42 contains 256 pointers, one of which points to block 1979. Block
1979 also contains 256 pointers, one of which points to block 988. Block 988
also contains 256 pointers, though these pointers point to the actual data, for
example, block 911.
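The walk from the inode to a data block can be written down as a small lookup routine. The sketch below assumes 1,024-byte blocks and 4-byte block addresses, so 256 pointers fit in one indirect block. It only works out which of the 15 pointers is used and which entries are followed in the indirect blocks. It does not read anything from a disk, and the function name is made up.

```python
DIRECT_POINTERS = 12     # the first 12 pointers lead straight to data blocks


def locate_block(file_block, block_size=1024, pointer_size=4):
    """Return (kind of pointer, entries followed) for the n-th block of a file.

    file_block counts from 0. The list holds the entry used at each level:
    the pointer number for direct blocks, otherwise the index inside each
    indirect block on the way to the data.
    """
    per_block = block_size // pointer_size
    if file_block < 0:
        raise ValueError("block numbers start at 0")
    if file_block < DIRECT_POINTERS:
        return "direct", [file_block]
    file_block -= DIRECT_POINTERS
    if file_block < per_block:
        return "single-indirect", [file_block]
    file_block -= per_block
    if file_block < per_block ** 2:
        return "double-indirect", list(divmod(file_block, per_block))
    file_block -= per_block ** 2
    if file_block < per_block ** 3:
        first, rest = divmod(file_block, per_block ** 2)
        second, third = divmod(rest, per_block)
        return "triple-indirect", [first, second, third]
    raise ValueError("beyond what the 15 pointers can address")


for n in (0, 11, 12, 268, 65804):
    print(n, locate_block(n))
```

Each extra level costs one more block read before the data is reached, which is why the first 12 blocks of a file are the cheapest to get at.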
If we increase the block size to 4 KB (4,096 bytes), we end up with more
pointers in each of the indirect blocks (1,024 four-byte pointers per block), so they can point to more blocks. In
the end, we have files the size of 4 TB. However, because the size field in the
inode is a 32-bit value, we max out at
If you want more details, there is a whole section on the ext2fs.
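The arithmetic for the pointers is easy to check with a few lines of Python. The sketch assumes 12 direct pointers, one single-, one double-, and one triple-indirect pointer, and 4-byte block addresses. It only counts what the pointers can reach and ignores any other limit, such as the width of the size field in the inode. The function names are made up.

```python
def max_file_blocks(block_size, pointer_size=4, direct=12):
    """Number of data blocks reachable through the 15 pointers of an inode."""
    per_block = block_size // pointer_size        # pointers per indirect block
    return direct + per_block + per_block ** 2 + per_block ** 3


def max_file_bytes(block_size, pointer_size=4):
    """Largest file, in bytes, that the pointers alone could address."""
    return max_file_blocks(block_size, pointer_size) * block_size


for size in (1024, 4096):
    total = max_file_bytes(size)
    print(f"{size}-byte blocks: {total:,} bytes ({total / 2**30:,.1f} GB)")
```

With 1,024-byte blocks almost all of the total comes from the triple-indirect pointer, and moving to 4,096-byte blocks multiplies the reachable size by 256, because both the number of pointers per block and the size of each data block grow by a factor of four.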
Linux's support for file systems is without a doubt the most extensive of any
operating system. In addition to "standard Linux" file systems,
there is also support for FAT, VFAT, ISO9660 (CD-ROM), NFS,
plus file systems mounted from Windows machines using Samba (via the SMB
protocol). Is that all? Nope! There are also drivers to support several
compressed formats such as Stacker and DoubleSpace. The driver for the Windows
NT file system (NTFS) can even circumvent that "annoying"
Warning: I have seen Linux certification prep books that talk about the inode being a "unique" number. This can be extremely misleading. While it is true that any given inode will only appear once in the inode table, this does not mean that multiple files cannot have the same inode. If they do, then they point to the same data on the hard disk, despite having different names.
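You can see this for yourself with a short Python sketch that gives one file a second name through a hard link and then compares what the system reports for both names. The file names are made up, and the example assumes that the temporary directory is on a file system that supports hard links.

```python
import os
import tempfile


def link_demo(directory):
    """Create a file, add a second name for it, and stat both names."""
    original = os.path.join(directory, "report.txt")
    second_name = os.path.join(directory, "same-report.txt")
    with open(original, "w") as handle:
        handle.write("shared data\n")
    os.link(original, second_name)     # hard link: another name, same inode
    return os.stat(original), os.stat(second_name)


with tempfile.TemporaryDirectory() as workdir:
    first, second = link_demo(workdir)
    print("same inode number:", first.st_ino == second.st_ino)
    print("number of links:  ", first.st_nlink)
    print("size in bytes:    ", first.st_size)
    print("permission bits:  ", oct(first.st_mode & 0o777))
```

The values printed come from the inode, not from the directory entry, which is why they are identical no matter which of the two names you ask about.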