The Virtual File System (VFS)
Figure: A Logical Diagram of the Virtual File System
The figure above
shows the relationship between the Linux kernel's Virtual File System and its real
file systems. The virtual file system must manage all of the different file systems
that are mounted at any given time. To do this it maintains data structures that
describe the whole (virtual) file system and the real, mounted, file systems.
Rather confusingly, the VFS describes the system's files in terms of superblocks
and inodes in much the same way as the EXT2 file system uses superblocks and inodes.
Like the EXT2 inodes, the VFS inodes describe files and directories within the system;
the contents and topology of the Virtual File System.
From now on, to avoid confusion, I will write about VFS inodes and VFS superblocks to
distinguish them from EXT2 inodes and superblocks.
As each file system is initialised, it registers itself with the VFS.
This happens as the operating system initialises itself at system boot time.
The real file systems are either built into the kernel itself or are built as
loadable modules. File system modules are loaded as the system needs them, so,
for example, if the VFAT file system is implemented as a kernel module,
then it is only loaded when a VFAT file system is mounted. When a block
device based file system is mounted, and this includes the root file system, the
VFS must read its superblock. Each file system type's superblock read routine must
work out the file system's topology and map that information onto a VFS superblock
data structure. The VFS keeps a list of the mounted file systems in the system
together with their VFS superblocks. Each VFS superblock contains information and
pointers to routines that perform particular functions.
So, for example, the superblock representing a mounted EXT2 file system contains
a pointer to the EXT2 specific inode reading routine. This EXT2 inode read routine,
like all of the file system specific inode read routines, fills out the fields in a
VFS inode. Each VFS superblock contains a pointer to the first VFS inode on the file
system. For the root file system, this is the inode that represents the
"/" directory. This mapping of information is very efficient for the
EXT2 file system but moderately less so for other file systems.
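The registration and mounting steps described above can be pictured with a small model. This is only a sketch in Python, not kernel code: the names register_filesystem, mount, VFSSuperblock and the toy_ext2_* routines are invented for illustration, and the toy inode read routine simply treats inode 2 as the root directory instead of reading anything from a device.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VFSInode:
    number: int
    is_dir: bool


@dataclass
class VFSSuperblock:
    fs_type: str
    device: str
    read_inode: Callable[[int], VFSInode]  # file system specific routine
    root: VFSInode                         # first VFS inode on the file system


# Registered file system types: name -> superblock read routine.
registered: Dict[str, Callable[[str], VFSSuperblock]] = {}
# Mounted file systems: mount point -> VFS superblock.
mounted: Dict[str, VFSSuperblock] = {}


def register_filesystem(name, read_super):
    """Called once per file system type, at boot or when its module loads."""
    registered[name] = read_super


def toy_ext2_read_inode(number):
    # Assumption for this sketch: no disk access, inode 2 is the root directory.
    return VFSInode(number=number, is_dir=(number == 2))


def toy_ext2_read_super(device):
    # Maps the (imaginary) on-disk layout onto a VFS superblock.
    return VFSSuperblock("ext2", device, toy_ext2_read_inode, toy_ext2_read_inode(2))


def mount(fs_type, device, mount_point):
    read_super = registered[fs_type]  # KeyError if the type never registered
    sb = read_super(device)
    mounted[mount_point] = sb
    return sb


register_filesystem("ext2", toy_ext2_read_super)
sb = mount("ext2", "/dev/hda1", "/")
```

The point of the sketch is the indirection: the generic mount code never knows how EXT2 lays out its disk; it only calls through the routines that the file system handed over when it registered.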
As the system's processes access directories and files, system routines are called
that traverse the VFS inodes in the system.
For example, typing ls for a directory or
cat for a file causes the
Virtual File System to search through the VFS inodes that represent the file system.
Because every file and directory on the system is represented by a VFS inode, a number
of inodes will be accessed repeatedly.
These inodes are kept in the inode cache which makes access to them quicker.
If an inode is not in the inode cache, then a file system specific routine must be
called in order to read the appropriate inode.
The action of reading the inode causes it to be put into the inode cache and
further accesses to the inode keep it in the cache.
The less used VFS inodes get removed from the cache.
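The behaviour of the inode cache can be sketched as follows. This is a toy model, not the kernel's implementation: the class name InodeCache is invented, and strict least-recently-used eviction is an assumption made here to stand in for "the less used VFS inodes get removed".

```python
from collections import OrderedDict


class InodeCache:
    def __init__(self, capacity, read_inode):
        self.capacity = capacity
        self.read_inode = read_inode  # file system specific read routine
        self.entries = OrderedDict()  # (device, inode number) -> inode
        self.misses = 0

    def get(self, device, number):
        key = (device, number)
        if key in self.entries:
            # A hit: further accesses keep the inode in the cache.
            self.entries.move_to_end(key)
            return self.entries[key]
        # A miss: call the file system specific routine, then cache the result.
        self.misses += 1
        inode = self.read_inode(device, number)
        self.entries[key] = inode
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used inode
        return inode


cache = InodeCache(2, lambda dev, n: {"device": dev, "number": n})
for n in (2, 11, 2, 12):
    cache.get("hda1", n)
```

With a capacity of two, the four accesses above miss three times; inode 11 is the one evicted, because inode 2 was touched again before inode 12 arrived.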
All of the Linux file systems use a common buffer cache to cache data buffers
from the underlying devices to help speed up access by all of the file systems to the
physical devices holding the file systems.
This buffer cache is independent of the file systems and is integrated into the
mechanisms that the Linux kernel uses to allocate and read and write data buffers.
It has the distinct advantage of making
the Linux file systems independent from the underlying media and from the
device drivers that support them.
All block structured devices register themselves with the Linux kernel and
present a uniform, block based, usually asynchronous interface.
Even relatively complex block devices such as SCSI devices do this.
As the real file systems read data from the underlying physical disks, this
results in requests to the block device drivers to read physical blocks from the
device that they control.
Integrated into this block device interface is the buffer cache.
As blocks are read by the file systems they are saved in the global buffer cache
shared by all of the file systems and the Linux kernel.
Buffers within it are identified by their block number and a unique identifier for
the device that read it.
So, if the same data is needed often, it will be retrieved from the buffer cache
rather than read from the disk, which would take somewhat longer.
Some devices support read ahead where data blocks are speculatively read just in
case they are needed.
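A minimal sketch of the buffer cache idea, assuming a hypothetical read_block callable that stands in for a request to a block device driver. The names BufferCache and get_block are invented for this example; the read_ahead parameter is a crude illustration of speculative reading, not how any particular driver does it.

```python
class BufferCache:
    def __init__(self, read_block):
        self.read_block = read_block  # stands in for the block device driver
        self.buffers = {}             # (device id, block number) -> data
        self.device_reads = 0

    def get_block(self, device, block, read_ahead=0):
        # Fetch the requested block plus any read-ahead blocks not yet cached.
        for b in range(block, block + read_ahead + 1):
            key = (device, b)
            if key not in self.buffers:
                self.device_reads += 1
                self.buffers[key] = self.read_block(device, b)
        return self.buffers[(device, block)]


cache = BufferCache(lambda dev, blk: f"{dev}:{blk}".encode())
cache.get_block(1, 10, read_ahead=2)  # reads blocks 10, 11 and 12 from the device
cache.get_block(1, 11)                # served from the cache
```

Because the key is the pair of device identifier and block number, the cache needs no knowledge of which file system asked for the block.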
The VFS also keeps a cache of directory lookups so that the inodes
for frequently used directories can be quickly found.
As an experiment, try listing a directory that you have not listed recently.
The first time you list it, you may notice a slight pause but the second time
you list its contents the result is immediate.
The directory cache does not store the inodes for the directories themselves; these
should be in the inode cache. The directory cache simply stores the mapping between
the full directory names and their inode numbers.
Finding a File in the Virtual File System
To find the VFS inode of a file in the Virtual File System, VFS must resolve the
name a directory at a time, looking up the VFS inode representing each of the
intermediate directories in the name.
Each directory lookup involves calling the file system specific lookup whose address
is held in the VFS inode representing the parent directory.
This works because we always have the VFS inode of the root of each file system
available and pointed at by the VFS superblock for that system.
Each time an inode is looked up by the real file system it checks the directory
cache for the directory.
If there is no entry in the directory cache, the real file system gets the VFS
inode either from the underlying file system or from the inode cache.
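Resolving a name one directory at a time, with a directory cache in front of the file system specific lookup, might look like the sketch below. Everything here is illustrative: ToyFS, resolve and the inode numbers are made up, and the directory cache is modelled as a plain dictionary from full directory name to inode number.

```python
class ToyFS:
    def __init__(self, tree):
        # tree: directory inode number -> {entry name: inode number}
        self.tree = tree
        self.lookups = 0

    def lookup(self, dir_number, name):
        # Stands in for the file system specific lookup routine.
        self.lookups += 1
        return self.tree[dir_number][name]


def resolve(fs, root_number, path, dcache):
    """Return the inode number for path, starting from the root inode."""
    number = root_number
    prefix = ""
    for name in path.strip("/").split("/"):
        if not name:
            continue
        prefix += "/" + name
        if prefix in dcache:
            number = dcache[prefix]
        else:
            number = fs.lookup(number, name)
            if number in fs.tree:  # only directories go into the directory cache
                dcache[prefix] = number
    return number


fs = ToyFS({2: {"usr": 11}, 11: {"bin": 12}, 12: {"ls": 13}})
dcache = {}
first = resolve(fs, 2, "/usr/bin/ls", dcache)
second = resolve(fs, 2, "/usr/bin/ls", dcache)
```

The first resolution calls the file system lookup three times; the second needs only one call, for the final component, because both intermediate directories are found in the directory cache.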
Creating a File in the Virtual File System
The update command is more than just a command; it is also a daemon.
When run as superuser (during system initialisation) it will periodically flush
all of the older dirty buffers out to disk. It does this by calling a system service
routine that does more or less the same thing as bdflush.
Whenever a dirty buffer is finished with, it is tagged with the system time that
it should be written out to its owning disk. Every time that update
runs it looks at all of the dirty buffers in the system looking for ones with an
expired flush time. Every expired buffer is written out to disk.
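The flush policy just described can be sketched in a few lines. This is a toy model under stated assumptions: Buffer, mark_dirty and flush_expired are invented names, time is a plain number supplied by the caller, and write_block is a hypothetical callable standing in for the write to the owning disk.

```python
class Buffer:
    def __init__(self, device, block):
        self.device = device
        self.block = block
        self.dirty = False
        self.flush_time = None


def mark_dirty(buf, now, delay):
    # Tag the buffer with the time at which it should be written out.
    buf.dirty = True
    buf.flush_time = now + delay


def flush_expired(buffers, now, write_block):
    """One pass of the periodic flush: write every dirty buffer whose time is up."""
    written = 0
    for buf in buffers:
        if buf.dirty and buf.flush_time <= now:
            write_block(buf.device, buf.block)
            buf.dirty = False
            written += 1
    return written


log = []
a, b = Buffer(1, 10), Buffer(1, 20)
mark_dirty(a, now=0, delay=30)
mark_dirty(b, now=20, delay=30)
count = flush_expired([a, b], now=35, write_block=lambda d, n: log.append((d, n)))
```

At time 35 only the first buffer has expired, so one block is written and the second buffer stays dirty until a later pass.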
The /proc File System
The /proc file system really shows the power of the Linux Virtual File System.
It does not really exist (yet another of Linux's conjuring tricks): neither
the /proc directory nor its subdirectories and files actually exist.
So how can you cat /proc/devices?
The /proc file system, like a real file system, registers
itself with the Virtual File System.
However, when the VFS makes calls to it requesting inodes as its files and directories
are opened, the /proc file system creates those files and
directories from information within the kernel. For example, the kernel's
/proc/devices file is generated from the kernel's
data structures describing its devices.
The /proc file system presents a user readable window into the kernel's
Several Linux subsystems, such as Linux kernel modules described in
the section on kernel modules, create entries in
the /proc file system.
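The idea of a file whose contents are generated on demand can be illustrated with a short sketch. It is a model only: the tables char_devices and block_devices, their entries, and the functions proc_devices and proc_read are all invented for this example and do not correspond to kernel symbols.

```python
# Hypothetical kernel state: major device number -> driver name.
char_devices = {1: "mem", 4: "tty"}
block_devices = {3: "ide0", 8: "sd"}


def proc_devices():
    # Nothing is stored anywhere; the text is built from live data on each read.
    lines = ["Character devices:"]
    lines += [f"{major:3d} {name}" for major, name in sorted(char_devices.items())]
    lines += ["", "Block devices:"]
    lines += [f"{major:3d} {name}" for major, name in sorted(block_devices.items())]
    return "\n".join(lines) + "\n"


# Subsystems create entries by registering a generator under a name.
proc_entries = {"devices": proc_devices}


def proc_read(name):
    return proc_entries[name]()


before = proc_read("devices")
block_devices[22] = "ide1"  # a new driver registers itself
after = proc_read("devices")
```

The second read reflects the newly registered device even though no file was ever written, which is the sense in which these files do not really exist.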