Files and Filesystems
Any time you access a Linux system, whether locally, across a
network, or through any other
means, both files and file systems are involved. Every program that you run
starts out as a file. Most of the time you are also reading or writing a file.
Because files (whether programs or data files) reside on file systems, every
time you access the system, you also access a file system.
Knowing how a file is represented on the disk and how the system
interprets its contents helps you understand what the system
is doing. With this understanding, you can evaluate both system and
application behavior to determine whether it is correct.
Keep in mind that it is a misnomer to say that a directory "contains" other files
and directories. What really happens is that the directory contains the information necessary to
locate those files and directories. Therefore, when you do a long listing, which shows file
sizes, the size shown for a directory does not include the sizes of its
subdirectories and files. Instead,
it simply shows how much space the directory itself takes up on the hard disk. To find the total
amount of space taken up by a directory and all of its subdirectories, you would use the command:
du -s <directory_name>
This shows you the disk usage for the directory given (the -s option says to show a
sum and not list the individual files and directories). Note that on some systems the du
command displays space in disk blocks of 512 bytes. If you use the -k option to du, you force
it to display sizes in kilobytes.
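
The difference between a directory's own size and the total for everything beneath it can be sketched in Python. This is only an illustration, not how du itself is implemented: the function name tree_totals is made up for the example, and it assumes a Unix-like system where st_blocks is counted in 512-byte units.

```python
import os


def tree_totals(root):
    """Return (apparent_bytes, allocated_kb) for root and everything below it."""
    seen = set()   # (device, inode) pairs, so hard-linked files count once
    apparent = 0   # sum of st_size, the sizes a long listing shows
    blocks = 0     # sum of st_blocks, assumed here to be 512-byte units

    def account(path):
        nonlocal apparent, blocks
        st = os.lstat(path)  # lstat: describe a soft link, do not follow it
        key = (st.st_dev, st.st_ino)
        if key in seen:
            return
        seen.add(key)
        apparent += st.st_size
        blocks += st.st_blocks

    account(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            account(os.path.join(dirpath, name))
    # Round the allocated space up to whole kilobytes.
    return apparent, (blocks * 512 + 1023) // 1024
```

Comparing os.lstat(path).st_size with tree_totals(path)[0] for the same directory shows the gap between the space the directory entry itself occupies and the space taken by its whole subtree.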
In Linux, as it is for Unix, the separate filesystems that the system may use are
not accessed by device identifiers (such as a drive number or a drive name) but instead
they are combined into a single hierarchical tree structure that represents the
filesystem as a single entity.
Linux adds each new filesystem into this single filesystem tree as it is mounted
onto a mount directory, for example /mnt/cdrom.
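
On a running Linux system the kernel lists what is currently mounted in /proc/mounts, one filesystem per line, with the device, mount directory and filesystem type as the first three fields. The following is a minimal sketch of reading that list. The function name parse_mounts and the SAMPLE text are hypothetical and exist only for illustration; real output will differ from machine to machine.

```python
def parse_mounts(text):
    """Return sorted (mount_point, fs_type, device) tuples from /proc/mounts text."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[0], fields[1], fields[2]
        mounts.append((mount_point, fs_type, device))
    return sorted(mounts)


# Hypothetical sample in the /proc/mounts layout, not taken from a real system.
SAMPLE = """\
/dev/sda1 / ext2 rw 0 0
/dev/hdc /mnt/cdrom iso9660 ro 0 0
proc /proc proc rw 0 0
"""

for mount_point, fs_type, device in parse_mounts(SAMPLE):
    print(f"{mount_point:<12} {fs_type:<8} {device}")
```

To inspect a live system instead of the sample, pass the contents of /proc/mounts, for example parse_mounts(open("/proc/mounts").read()).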
One of the most important features of Linux is its support for many different filesystems.
This makes it very flexible and well able to coexist with other operating systems.
The most popular filesystem for Linux is the EXT2 filesystem and this is the
filesystem supported by most of the Linux distributions.
A filesystem gives the user a sensible view of files and directories held on the
hard disks of the system regardless of the filesystem type or the characteristics
of the underlying physical device.
Linux transparently supports many different filesystems
(for example MS-DOS and EXT2)
and presents all of the mounted files and filesystems as one integrated virtual
filesystem.
So, in general, users and processes do not need to know what sort of filesystem any
file is part of; they just use it.
The block device drivers hide the differences between the physical block device
types (for example, IDE and SCSI) and, so far as each filesystem is
concerned, the physical devices are just linear collections of blocks of data.
The block sizes may vary between devices, for example 512 bytes is common for
floppy devices whereas 1024 bytes is common for IDE devices and, again, this
is hidden from the users of the system.
An EXT2 filesystem looks the same no matter what device holds it.
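
Treating a device as a linear collection of blocks means that a byte range maps onto block numbers by simple integer division. As a rough sketch (the function name blocks_for_range is invented for this example), the same 100-byte range lands on different block numbers depending on the device's block size:

```python
def blocks_for_range(offset, length, block_size):
    """Return (first_block, last_block) touched by a byte range on a device."""
    if offset < 0 or length <= 0 or block_size <= 0:
        raise ValueError("offset must be >= 0, length and block_size must be > 0")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return first, last


# The same 100 bytes starting at byte 1000, seen with two block sizes.
print(blocks_for_range(1000, 100, 512))   # 512-byte blocks
print(blocks_for_range(1000, 100, 1024))  # 1024-byte blocks
```

With 512-byte blocks the range covers blocks 1 and 2; with 1024-byte blocks it covers blocks 0 and 1. A user of the filesystem never sees this difference.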
At the time of writing, Linux supports 15 file systems: ext, ext2, xia,
minix, umsdos, msdos, vfat, proc, smb, ncp, iso9660,
sysv, hpfs, affs and ufs, and no doubt more will be added over time.
All file systems, of whatever type, are mounted onto a directory and
the files of the mounted file system cover up the existing contents of that directory.
This directory is known as the mount directory or mount point.
When the file system is unmounted, the mount directory's own files are once again revealed.
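
One way to tell whether a directory is currently acting as a mount point is to compare it with its parent: if the two live on different devices, another file system has been mounted there. The sketch below follows that idea. The name is_mount_point is made up for the example (Python's standard library offers os.path.ismount for the same purpose), and the check can miss cases such as a bind mount from the same device.

```python
import os


def is_mount_point(path):
    """Guess whether path is a mount point by comparing it with its parent."""
    path = os.path.realpath(path)
    here = os.lstat(path)
    parent = os.lstat(os.path.join(path, ".."))
    # Different device: a file system is mounted on this directory.
    # Same inode as the parent: this is the root of the whole tree.
    return here.st_dev != parent.st_dev or here.st_ino == parent.st_ino


print(is_mount_point("/"))
```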
When disks are initialized (using fdisk, say) they have a partition structure
imposed on them that divides the physical disk into a number of logical partitions.
Each partition may hold a single file system, for example an EXT2 file system.
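
To make the idea of a partition structure concrete, here is a hedged sketch that decodes the classic PC partition table kept in the first 512-byte sector of a disk. It assumes the traditional layout (four 16-byte entries starting at byte 446, followed by the signature bytes 0x55 0xAA); disks partitioned with other schemes will not match. The function name parse_partition_table is invented for this example.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446   # the partition table follows the boot code
ENTRY_SIZE = 16
ENTRY_COUNT = 4


def parse_partition_table(sector):
    """Decode the four primary partition entries of a classic PC boot sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a boot sector with a classic partition table")
    partitions = []
    for index in range(ENTRY_COUNT):
        start = TABLE_OFFSET + index * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        # status, (3 bytes CHS skipped), type, (3 bytes CHS skipped),
        # first sector and sector count as little-endian 32-bit values
        status, ptype, first, count = struct.unpack("<B3xB3xII", entry)
        if ptype == 0:
            continue  # type 0 marks an unused slot
        partitions.append({
            "number": index + 1,
            "bootable": status == 0x80,
            "type": ptype,          # 0x83 is the usual Linux type code
            "first_sector": first,
            "sector_count": count,
        })
    return partitions
```

Reading the first 512 bytes of a disk device such as /dev/hda normally requires root privileges, so it is safer to experiment on a disk image file.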
File systems organize files into logical hierarchical structures with directories, soft links and
so on held in blocks on physical devices.
Devices that can contain file systems are known as block devices.
The IDE disk partition /dev/hda1, the first partition of the first IDE disk drive in the
system, is a block device.
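
Whether a name such as /dev/hda1 refers to a block device is recorded in the file's mode, which any program can inspect. A small sketch follows; the function name node_kind is made up for the example.

```python
import os
import stat


def node_kind(path):
    """Describe what kind of object a path names, without following soft links."""
    mode = os.lstat(path).st_mode
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "soft link"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


print(node_kind("/"))
```

On a system that has the device node, node_kind("/dev/hda1") would be expected to report a block device; the name of the first disk partition varies between systems.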
The Linux file systems regard these block devices as simply linear collections of blocks;
they do not know or care about the underlying physical disk's geometry.
It is the task of each block device driver to map a request to read a particular
block of its device into terms meaningful to its device: the particular track, sector
and cylinder of its hard disk where the block is kept.
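
The mapping from a linear block number to a position on the disk can be sketched with the conventional cylinder/head/sector arithmetic. This is only an illustration of the idea, not the code of any real driver: it assumes one block equals one sector, and the geometry used in the example (16 heads, 63 sectors per track) is a made-up but typical value.

```python
def block_to_chs(block, heads, sectors_per_track):
    """Convert a linear block number to (cylinder, head, sector).

    Cylinders and heads count from 0; sectors conventionally count from 1.
    """
    if block < 0 or heads <= 0 or sectors_per_track <= 0:
        raise ValueError("block must be >= 0 and the geometry must be positive")
    cylinder = block // (heads * sectors_per_track)
    head = (block // sectors_per_track) % heads
    sector = block % sectors_per_track + 1
    return cylinder, head, sector


# Assumed geometry: 16 heads, 63 sectors per track.
for block in (0, 63, 1007, 1008):
    print(block, block_to_chs(block, 16, 63))
```

The file system only ever asks for block 1008; it is the driver that works out that this is the first sector under head 0 of cylinder 1 for this geometry.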
A file system has to look, feel and operate in the same way no matter what device is holding it.
Moreover, using Linux's file systems, it does not matter (at least to the system user) that
these different file systems are on different physical media controlled by
different hardware controllers.
The file system might not even be on the local system; it could just as
well be a disk remotely mounted over a network link.
Consider the following example where a Linux system has its root file system on a SCSI disk:
A         E         boot      etc       lib       opt       tmp       usr
C         F         cdrom     fd        proc      root      var       sbin
D         bin       dev       home      mnt       lost+found
Neither the users nor the programs that operate on the files themselves need know that
/C is in fact a mounted VFAT file system that is on the first IDE disk in the system.
In the example (which is actually my home Linux system), /E is the master IDE disk
on the second IDE controller.
It does not matter either that the first IDE controller is a PCI controller and that the second
is an ISA controller which also controls the IDE CDROM.
I can dial into the network where I work using a modem and the PPP network protocol,
and in this case I can remotely mount my Alpha AXP Linux system's
file systems on /mnt/remote.
The files in a file system are collections of data;
the file holding the sources to this chapter is an ASCII file called filesystems.tex.
A file system not only holds the data that is contained within the files of the
file system but also the structure of the file system.
It holds all of the information that Linux users and processes see as files, directories,
soft links, file protection information and so on.
Moreover, it must hold that information safely and securely; the basic integrity of the
operating system depends on its file systems.
Nobody would use an operating system that randomly lost data and
files.
Minix, the first file system that Linux had, is rather restrictive and lacking
in performance.
Its filenames cannot be longer than 14 characters (which is still better than 8.3 filenames)
and the maximum file size is 64 Mbytes.
64 Mbytes might at first glance seem large enough, but larger file sizes are necessary
to hold even modest databases.
The first file system designed specifically for Linux, the Extended File system, or EXT, was
introduced in April 1992 and cured a lot of the problems but it was still felt
to lack performance.
So, in 1993, the Second Extended File system, or EXT2, was added.
It is this file system that is described in detail later.
An important development took place when the EXT file system was added into Linux.
The real file systems were separated from the operating system and system services
by an interface layer known as the Virtual File system, or VFS.
VFS allows Linux to support many, often very different, file systems,
each presenting a common software interface to the VFS.
All of the details of the Linux file systems are translated by
software so that all file systems appear identical to the rest of the Linux kernel
and to programs running in the system.
Linux's Virtual File system layer allows you to transparently
mount the many different file systems at the same time.
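
The idea of a common interface sitting in front of very different file systems can be shown with a toy model. The sketch below is not kernel code and does not reflect the real VFS data structures; every class and method name in it (FileSystem, DictFileSystem, ToyVFS) is hypothetical. It only illustrates two points from the text: callers use one interface regardless of the file system behind it, and a mounted file system covers up whatever the mount directory held before.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """The common interface every toy file system presents."""

    @abstractmethod
    def read(self, relative_path):
        """Return the contents of a file as bytes."""


class DictFileSystem(FileSystem):
    """A stand-in for a real file system: files held in a dictionary."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, relative_path):
        try:
            return self._files[relative_path]
        except KeyError:
            raise FileNotFoundError(relative_path) from None


class ToyVFS:
    """Routes each path to the file system mounted closest above it."""

    def __init__(self):
        self._mounts = {}

    def mount(self, mount_point, fs):
        self._mounts[mount_point.rstrip("/") or "/"] = fs

    def unmount(self, mount_point):
        del self._mounts[mount_point.rstrip("/") or "/"]

    def read(self, path):
        best = None
        for mp in self._mounts:
            inside = path == mp or path.startswith(mp.rstrip("/") + "/")
            if inside and (best is None or len(mp) > len(best)):
                best = mp  # the longest matching mount point wins
        if best is None:
            raise FileNotFoundError(path)
        return self._mounts[best].read(path[len(best):].lstrip("/"))


vfs = ToyVFS()
vfs.mount("/", DictFileSystem({"etc/motd": b"hello\n", "C/old.txt": b"covered"}))
vfs.mount("/C", DictFileSystem({"AUTOEXEC.BAT": b"@echo off\r\n"}))
print(vfs.read("/etc/motd"))
print(vfs.read("/C/AUTOEXEC.BAT"))
```

While the second file system is mounted on /C, the root file system's own C/old.txt cannot be reached; calling vfs.unmount("/C") reveals it again.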
The Linux Virtual File system is implemented so that access to its files is as fast and
efficient as possible.
It must also make sure that the files and their data are kept correctly.
These two requirements can be at odds with each other.
The Linux VFS caches information in memory from each file system as it is mounted and
used.
A lot of care must be taken to update the file system correctly as data within these
caches is modified as files and directories are created, written to and deleted.
If you could see the file system's data structures within the running kernel,
you would be able to see data blocks being read and written by the file system.
Data structures describing the files and directories being accessed would be
created and destroyed, and all the time the device drivers would be working
away, fetching and saving data.
The most important of these caches is the Buffer Cache, which is integrated into the
way that the individual file systems access their underlying block devices.
As blocks are accessed they are put into the Buffer Cache and kept on various queues
depending on their states.
The Buffer Cache not only caches data buffers; it also helps manage the asynchronous
interface with the block device drivers.
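
A much simplified model of such a cache is sketched below. It is not the kernel's Buffer Cache: the class name ToyBufferCache is invented, least-recently-used eviction is an assumption made for the example, and a single set of dirty buffers stands in for the various queues mentioned above. It does show why care is needed: a modified block exists only in memory until it is written back.

```python
from collections import OrderedDict


class ToyBufferCache:
    """Caches blocks by (device, block number) with write-back on eviction."""

    def __init__(self, read_block, write_block, capacity=4):
        self._read_block = read_block    # callable(device, block) -> bytes
        self._write_block = write_block  # callable(device, block, data)
        self._capacity = capacity
        self._buffers = OrderedDict()    # oldest entry first
        self._dirty = set()              # keys modified but not yet written
        self.misses = 0

    def read(self, device, block):
        key = (device, block)
        if key in self._buffers:
            self._buffers.move_to_end(key)  # mark as recently used
            return self._buffers[key]
        self.misses += 1
        data = self._read_block(device, block)
        self._insert(key, data)
        return data

    def write(self, device, block, data):
        key = (device, block)
        self._insert(key, data)
        self._dirty.add(key)  # the device itself is not updated yet

    def sync(self):
        """Write every dirty buffer back to its device."""
        for key in sorted(self._dirty):
            self._write_block(key[0], key[1], self._buffers[key])
        self._dirty.clear()

    def _insert(self, key, data):
        self._buffers[key] = data
        self._buffers.move_to_end(key)
        while len(self._buffers) > self._capacity:
            old_key, old_data = self._buffers.popitem(last=False)
            if old_key in self._dirty:  # never drop modified data silently
                self._write_block(old_key[0], old_key[1], old_data)
                self._dirty.discard(old_key)
```

In use, read_block and write_block would be supplied by whatever plays the role of the block device driver; reading the same block twice costs only one call to read_block, and a written block reaches the device only on sync() or eviction.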