So that Linux can manage the processes in the system, each process is represented by a task_struct data structure (task and process are terms that Linux uses interchangeably). The task vector is an array of pointers to every task_struct data structure in the system.
This means that the maximum number of processes in the system is limited by the size of the task vector; by default it has 512 entries. As each process is created, a new task_struct is allocated from system memory and added to the task vector. To make it easy to find, the currently running process is pointed to by the current pointer.
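As a rough illustration of the task vector described above, the following Python sketch models a fixed-size array of slots plus a current pointer. The names TaskStruct, TaskVector and add are hypothetical and exist only for this example; the kernel's real structures are written in C and are far more involved.

```python
NR_TASKS = 512  # default task vector size quoted in the text


class TaskStruct:
    """Hypothetical stand-in for the kernel's task_struct."""

    def __init__(self, pid, name):
        self.pid = pid
        self.name = name


class TaskVector:
    """Toy model: a fixed-size array of references to TaskStruct objects."""

    def __init__(self, size=NR_TASKS):
        self.slots = [None] * size  # empty slots hold None
        self.current = None         # the task that is running right now

    def add(self, task):
        """Store task in the first free slot and return that slot's index."""
        for index, slot in enumerate(self.slots):
            if slot is None:
                self.slots[index] = task
                return index
        # No free slot: the vector size caps the number of processes.
        raise RuntimeError("task vector is full")
```

Once every slot is taken, add raises an error, which mirrors the way the size of the task vector limits the number of processes.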
As well as the normal type of process, Linux supports real time processes. These processes have to react very quickly to external events (hence the term “real time”) and they are treated differently from normal user processes by the scheduler. Although the task_struct data structure is quite large and complex, its fields can be divided into a number of functional areas:
- State
- As a process executes it changes state according to its circumstances.
Linux processes have the following states:
- Running
- The process is either running (it is the current process in the system) or it is ready to run (it is waiting to be assigned to one of the system’s CPUs).
- Waiting
- The process is waiting for an event or for a resource. Linux differentiates between two types of waiting process: interruptible and uninterruptible. Interruptible waiting processes can be interrupted by signals, whereas uninterruptible waiting processes are waiting directly on hardware conditions and cannot be interrupted under any circumstances.
- Stopped
- The process has been stopped, usually by receiving a signal. A process that is being debugged can be in a stopped state.
- Zombie
- This is a halted process which, for some reason, still has a task_struct data structure in the task vector. It is what it sounds like, a dead process.
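The Zombie state can be observed from user space. The sketch below assumes a Linux system with /proc mounted and a Python build that provides os.waitid; the helper name child_state_after_exit is made up for this example. It forks a child that exits at once, waits for the exit without collecting it, and reads the one-letter state that the kernel reports for the child.

```python
import os


def child_state_after_exit():
    """Fork a child that exits at once; return its state letter before reaping.

    Returns None when /proc/<pid>/stat cannot be read (for example, no /proc).
    """
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: terminate immediately
    try:
        # Block until the child has exited, but leave it uncollected (WNOWAIT)
        # so that its entry in the process table is still present.
        os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
        try:
            with open(f"/proc/{pid}/stat") as stat_file:
                stat = stat_file.read()
        except OSError:
            return None
        # The state letter is the first field after the ")" that closes
        # the command name.
        return stat.rsplit(")", 1)[1].split()[0]
    finally:
        os.waitpid(pid, 0)  # collect the child so the zombie goes away
```

On the systems this sketch assumes, the letter reported for an exited but uncollected child is Z.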
- Scheduling Information
- The scheduler needs this information in order
to decide fairly which process in the system most deserves to run.
- Identifiers
- Every process in the system has a process identifier.
The process identifier is not an index into the
task vector; it is simply a number. Each process also has
user and group identifiers, which are used to control the process's access to the files and
devices in the system.
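A process can ask the kernel for its own identifiers. The short sketch below uses Python's os module on a Unix-like system; the function name process_identifiers is an assumption made for this example.

```python
import os


def process_identifiers():
    """Return the calling process's identifiers as plain integers."""
    return {
        "pid": os.getpid(),  # process identifier
        "uid": os.getuid(),  # real user identifier
        "gid": os.getgid(),  # real group identifier
    }
```

All three values are ordinary integers, in keeping with the remark above that a process identifier is simply a number.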
- Inter-Process Communication
- Linux supports the classic Unix™ IPC mechanisms of
signals, pipes and semaphores and also the System V IPC mechanisms of shared memory,
semaphores and message queues.
The IPC mechanisms supported by Linux are described in the section on IPC.
- Links
- In a Linux system no process is independent of any other process.
Every process in the system, except the initial process, has a parent process.
New processes are not created from scratch; they are copied, or rather cloned, from previous processes.
Every task_struct representing a process keeps pointers to its parent process
and to its siblings (those processes with the same parent process) as well as to its own child
processes.
You can see the family relationship between the running processes in a Linux system
using the pstree command:
    init(1)-+-crond(98)
            |-emacs(387)
            |-gpm(146)
            |-inetd(110)
            |-kerneld(18)
            |-kflushd(2)
            |-klogd(87)
            |-kswapd(3)
            |-login(160)---bash(192)---emacs(225)
            |-lpd(121)
            |-mingetty(161)
            |-mingetty(162)
            |-mingetty(163)
            |-mingetty(164)
            |-login(403)---bash(404)---pstree(594)
            |-sendmail(134)
            |-syslogd(78)
            `-update(166)
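The parent link can also be checked directly. In the sketch below, written for a Unix-like system, a process clones itself with fork and the child reports the parent identifier it sees through a pipe. The function name parent_pid_seen_by_child is hypothetical.

```python
import os


def parent_pid_seen_by_child():
    """Fork a child and return the parent PID that the child reports."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: a clone of the parent. Report our parent's PID and exit.
        try:
            os.close(read_fd)
            os.write(write_fd, str(os.getppid()).encode())
        finally:
            os._exit(0)
    # Parent: read the child's report, then collect the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        reported = int(reader.read())
    os.waitpid(pid, 0)
    return reported
```

As long as the parent is still alive when the child asks, the value returned equals the caller's own process identifier.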
Additionally, all of the processes in the system are held in a doubly linked list whose root is the init process's task_struct data structure. This list allows the Linux kernel to look at every process in the system. It needs to do this to provide support for commands such as ps or kill.
- Times and Timers
- The kernel keeps track of a process's creation time as well as the
CPU time that it consumes during its lifetime. Each clock tick, the
kernel updates the amount of time, in jiffies, that the current process
has spent in system and in user mode.
Linux also supports process-specific interval timers: processes can use system
calls to set up timers that send them signals when the timers expire.
These timers can be single-shot or periodic.
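Both facilities are reachable from user space. The following sketch assumes a Unix-like system and code running in the main thread of the Python interpreter (a requirement of signal.signal). It arms a periodic interval timer with signal.setitimer and reads the accumulated user and system CPU time with os.times. The function names count_timer_signals and cpu_times are made up for this example.

```python
import os
import signal


def count_timer_signals(interval=0.01, wanted=3):
    """Arm a periodic interval timer and count SIGALRM deliveries."""
    ticks = 0

    def on_alarm(signum, frame):
        nonlocal ticks
        ticks += 1

    previous = signal.signal(signal.SIGALRM, on_alarm)
    try:
        # First expiry after `interval` seconds, then every `interval` seconds.
        # Passing 0 as the last argument would make the timer single-shot.
        signal.setitimer(signal.ITIMER_REAL, interval, interval)
        while ticks < wanted:
            signal.pause()  # sleep until a signal arrives
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        # Restore the earlier handler; fall back to the default action if the
        # earlier handler was not installed from Python.
        signal.signal(signal.SIGALRM,
                      previous if previous is not None else signal.SIG_DFL)
    return ticks


def cpu_times():
    """Return (user, system) CPU seconds consumed by this process so far."""
    times = os.times()
    return times.user, times.system
```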
- File system
- Processes can open and close files as they wish and the
process's task_struct
contains pointers to descriptors for each open file as well as pointers to two
VFS inodes.
The first is the root of the process (its root directory) and the second is its
current or pwd directory. pwd is derived from the Unix™ command pwd,
print working directory.
Each VFS inode uniquely describes a file or directory within a file system and also
provides a uniform interface to the underlying file systems.
How file systems are supported under Linux is described in
the section on filesystems.
These two VFS inodes have their count fields incremented to show that
one or more processes are referencing them. This is why you cannot delete
the directory that a process has set as its pwd directory, or for that
matter one of its sub-directories.
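On a Linux system with /proc mounted, the two directories that the kernel records for a process are exposed as symbolic links. The sketch below reads them; it assumes that /proc/<pid>/root and /proc/<pid>/cwd exist and are readable, and it returns None otherwise. The function name root_and_cwd is hypothetical.

```python
import os


def root_and_cwd(pid="self"):
    """Return the root and working directories recorded for a process.

    Returns None where /proc/<pid>/root and /proc/<pid>/cwd cannot be read.
    """
    try:
        return {
            "root": os.readlink(f"/proc/{pid}/root"),
            "cwd": os.readlink(f"/proc/{pid}/cwd"),
        }
    except OSError:
        return None
```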
- Virtual memory
- Most processes have some virtual memory (kernel threads and
daemons do not) and the Linux kernel must track how that virtual memory is
mapped onto the system’s physical memory.
- Processor Specific Context
- A process could be thought of as the sum total of the
system’s current state.
Whenever a process is running it is using the processor’s registers, stacks and
so on.
This is the process's context and, when a process is suspended, all of that
CPU-specific context must be saved in the task_struct for the process.
When a process is restarted by the scheduler its context is restored from here.
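The save and restore cycle described above can be pictured with a toy model. In the following Python sketch the register names (pc, sp, r0) and the classes Task and ToyCPU are invented for illustration; a real kernel does this work in architecture-specific C and assembly.

```python
from dataclasses import dataclass, field


def _blank_context():
    """Register values for a task that has never run (toy values)."""
    return {"pc": 0, "sp": 0, "r0": 0}


@dataclass
class Task:
    """Hypothetical task record holding a saved copy of the registers."""
    name: str
    saved_context: dict = field(default_factory=_blank_context)


class ToyCPU:
    """A pretend processor with three registers and one running task."""

    def __init__(self):
        self.registers = _blank_context()
        self.current = None

    def switch_to(self, next_task):
        """Suspend the current task and resume next_task."""
        if self.current is not None:
            # Save the outgoing task's registers into its own record.
            self.current.saved_context = dict(self.registers)
        # Restore the incoming task's registers from its record.
        self.registers = dict(next_task.saved_context)
        self.current = next_task
```

Switching away from a task and back again leaves its registers exactly as they were, which is the property the scheduler relies on.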