Processes
Processes carry out tasks within the operating system.
A program is a set of machine code instructions and data stored in an executable
image on disk and is, as such, a passive entity;
a process can be thought of as a computer program in action.
A process is a dynamic entity, constantly changing as the machine code instructions are executed by
the processor. As well as the program's instructions and data, the process also includes the
program counter, all of the CPU's registers, and the process stacks, which contain
temporary data such as routine parameters, return addresses, and saved variables.
In short, a process is an executing program, encompassing all of the current activity
in the microprocessor.
Linux is a multiprocessing operating system.
Each process is a separate task with its own rights and responsibilities.
If one process crashes, it will not cause another process in the system to crash.
Each individual process runs in its own virtual address space and
is not capable of interacting with another process except through secure,
kernel-managed mechanisms.
During the lifetime of a process, it will use many system resources. It will use the CPUs in the
system to run its instructions and the system's physical memory to hold the process and its data.
It will open and use files within the filesystems and may directly or indirectly use the
physical devices in the system.
Linux must keep track of the process and its system resources to fairly manage it and the
other processes in the system.
It would not be fair to the other processes in the system if one process
monopolized most of the system's physical memory or its CPUs.
The most precious resource in the system is the CPU, of which
there is usually only one.
As a multiprocessing operating system, Linux maximizes CPU utilization by
ensuring that there is a running process on each CPU in the system at all times.
If there are more processes than CPUs (and there usually are),
the remaining processes must wait until a CPU becomes free before they can run.
Multiprocessing is a simple idea; a process is executed until it must wait,
usually for some system resource. It may resume once the resource becomes available.
In a uniprocessing system like DOS, the CPU
simply sits idle until the resource becomes available, and that waiting time is wasted.
In a multiprocessing system many processes are kept in memory at the same time.
Whenever a process has to wait, the operating system takes the CPU away from
that process and gives it to another, more deserving process.
The Linux scheduler uses a number of scheduling strategies to ensure fairness
when deciding which process to run next.
Linux supports a number of different executable file formats, such as ELF
and Java. These must be managed transparently, as must the process'
use of the system's shared libraries.
From the user's perspective, perhaps the most obvious aspect of a kernel
is process management. This is the part of the kernel
that ensures that each process gets its turn to run on the CPU.
This is also the part that makes sure that the individual processes don't "trample" on other
processes by writing to areas of memory that belong to someone else. To do this, the kernel
keeps track of many different structures, which are maintained both on a per-process
basis and systemwide.
As we talked about in the section on operating system basics,
a process is the running instance of a program (a
program simply being the bytes on the disk). One of the most powerful aspects
of Linux
is its ability not only to keep many processes in memory at once but
also to switch between them fast enough to make it appear as though they were all
running at the same time. (Note: In much of the Linux code, the references are
to tasks, not to processes. Because the term process seems to be more
common in UNIX literature and I am used to that term, I will be using process.
However, there is no difference between a task and a process, so you can
interchange them to your heart's content.)
A process runs within its context.
It is also common to say that the CPU
is operating within the context of a specific process.
The context of a process is all of the characteristics, settings, values, etc.,
that a particular program uses as it runs, as well as those that it needs to run.
Even the internal
state of the CPU
and the contents of all its registers are part of the context
of the process. When a process has finished having its turn on the CPU
and
another process gets to run, the act of changing from one process to another is
called a context switch. This is represented graphically by the figure
below.
Figure: Context Switch
We can say that a process' context is defined by two
structures: its task structure (also called its
uarea or ublock in some operating system
texts)
and its process table entry. These contain the information necessary to
manage each process, such as the user ID (UID) of the process, the group ID
(GID), the system call
error return value, and dozens of other things. To see
where it is all kept (that is, the structure of the task structure), see the
task_struct in <linux/sched.h>.
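To give a feel for what is kept there, here is a much-simplified sketch of a few
such fields. This is an illustration only; the real task_struct contains many more
fields and changes between kernel versions.

    /* Sketch of a few task-structure fields (illustrative only;
     * see the real struct task_struct in <linux/sched.h>). */
    struct task_struct_sketch {
        volatile long state;      /* running, sleeping, stopped, ...  */
        long counter;             /* remaining time slice             */
        long priority;            /* scheduling priority              */
        int pid;                  /* process ID                       */
        unsigned short uid, gid;  /* user and group ID                */
        int exit_code;            /* handed to the parent on exit     */
        struct task_struct_sketch *p_pptr;  /* the parent process     */
        /* ... dozens of other fields ... */
    };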
There is a special part
of the kernel's private memory that holds the task structure of the currently
running process. When a context switch
occurs, the task structure is switched out. All other parts of the process
remain where they are. The task structure of the next process is copied into
the same place in memory as the task structure for the old process. This
way the kernel does not have to make any adjustments
and knows exactly where to look for the task structure. It
will always be able to access the task structure of the currently running
process by looking at the same area in memory. Within the kernel, this is the
current process, accessed through current, a pointer of type
struct task_struct.
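For example, kernel code can refer to whatever process is running at that moment
simply through current. The following is a sketch of kernel-side code (not a
complete, loadable module, and the exact fields vary between kernel versions):

    #include <linux/kernel.h>  /* printk() */
    #include <linux/sched.h>   /* struct task_struct and current */

    /* Sketch: report the currently running process from inside the kernel. */
    void report_current(void)
    {
        printk(KERN_INFO "current process: pid %d, uid %d\n",
               current->pid, current->uid);
    }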
One piece of
information that the process table
entry (PTE) contains is the process'
Local Descriptor Table
(LDT).
A descriptor
is a data structure
the process uses to gain access to different parts of the system (that is,
different parts of memory or different segments). Despite a common
misunderstanding, Linux does use a segmented memory architecture. In older CPUs,
segments were a way to get around memory access limitations. By referring to
memory addresses as offsets within a given segment,
more memory could be
addressed than if memory were looked at as a single block. The key difference
with Linux is that each of these segments is 4GB, not the 64K they were
originally.
The descriptors are held in descriptor tables.
The LDT
keeps track of a process' segments (also called regions).
That is, these
descriptors are local to the process. The Global Descriptor Table
(GDT) keeps track of the kernel's segments. Because there are many processes
running, there will be many LDTs. These are part of the process' context.
However, there is only one GDT,
as there is only one kernel.
Within the task structure is a pointer to another key aspect of a process' context: its
Task State Segment (TSS). The TSS
holds a copy of all the CPU's registers.
The contents of the registers define the state in which the CPU
is currently
running; in other words, the registers say what a given process is doing at any
given moment. Keeping track of these registers is vital to the concept of
multitasking.
By saving the registers in the TSS,
the system can reload them when
this process gets its turn again. Once all of the registers are restored to
their previous values, the process simply continues where it left off as though
nothing had happened.
This
brings up two new issues: system calls and stacks. A system call
is a
programming term for a very low-level function, one that is
"internal" to the operating system
and is used to access the
internals of the operating system,
such as the device drivers that ultimately
access the hardware. Compare this to library calls, which run in user space and
are often built on top of system calls.
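To make the distinction concrete, here is a small example: write() is a thin
wrapper around a system call and drops into the kernel almost immediately,
whereas printf() is a library call that does its formatting in user space and
ultimately issues write() itself.

    #include <stdio.h>   /* printf() - a library call */
    #include <unistd.h>  /* write()  - a system call  */

    int main(void)
    {
        /* Library call: the string is formatted in user space;
         * only the final output is handed to the kernel. */
        printf("hello via the library\n");

        /* System call: control passes to the kernel right away. */
        write(1, "hello via the kernel\n", 21);

        return 0;
    }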
A stack
is a means of keeping track of where a process has been. Like
a stack
of plates, objects are pushed onto the stack and popped
off again; objects pushed onto the stack are therefore
popped off in reverse order. When calling routines, certain
values are pushed onto the stack
for safekeeping, including the variables to be
passed to the function and the location to which the system should return after
completing the function. When returning from that routine, these values are
retrieved by being popped off the stack.
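A short example may help. Assuming the classic x86 calling convention, in which
arguments are passed on the stack, the call to do_sum() below pushes the
arguments and the return address; the local variable lives on the stack as
well, and everything is popped off again when the function returns.

    /* Sketch: what lands on the stack during a routine call
     * (assuming arguments are passed on the stack). */
    int do_sum(int a, int b)        /* a and b were pushed by the caller  */
    {
        int result = a + b;         /* local variable: lives on the stack */
        return result;              /* return address is popped; execution
                                     * resumes in main()                  */
    }

    int main(void)
    {
        int answer = do_sum(2, 3);  /* push 3, push 2, push return address */
        return answer;              /* 5 */
    }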
Part of the task structure is a
pointer to that process' entry in the process table.
The process table, as its
name implies, is a table containing information about all the processes on the
system, whether that process is currently running or not. Each entry in the
process table is defined in <linux/sched.h>.
The principle that a process may be in memory but not
actually running is important, and I will go into more detail about the life of
a process shortly.
The size of this table is a set value and is determined by the kernel
parameter NR_TASKS. Though you can change this value, you need to build a new kernel
and reboot for the change to take effect.
If there is a runaway process
that keeps creating more and more processes, or if you simply have a very busy system,
it is possible that the process table
will fill up. If it did, root
would be unable even to stop the runaway processes, because doing so requires
starting a new process (even if root were already logged in). The nice thing is that a set
number of processes is reserved for root. This is defined by
MIN_TASKS_LEFT_FOR_ROOT in <linux/tasks.h>. On my system, this defaults to 4.
Just how is a process created? First, one process uses the fork()
system call. Like a fork
in the road, the fork() system call
starts off as a
single entity and then splits into two. When one process uses the fork() system
call, an exact copy of itself is created in memory, and the task
structures are essentially identical. However, the memory itself is not copied;
instead, new page tables are set up to point to the same
pages as the old ones. Only when
something in those pages changes is a private copy made (copy-on-write).
The value in each CPU
register is the same, so both copies of this
process are at the exact same place in their code. Each of the variables also
has the exact same value. There are two exceptions: the process ID number and
the return value of the fork() system
call. (You can see the details of the fork()
system call
in kernel/fork.c.)
The figure below shows graphically how the fork()-exec() pair works:
Figure: Creating a New Process
Like users and their UID, each process is referred to by
its process ID number, or PID, which is a unique number.
Although your system could have approximately 32K processes at a time, on even
the busiest systems it rarely gets that high.
You may, however, find a very large PID
on your system (running ps, for example). This does not mean that there are
actually that many processes. Instead, it demonstrates the fact that the system
does not immediately re-use the PID.
This is to prevent a "race condition": for example, one process sends a signal (message)
to another process, but before the message arrives, the other process has stopped.
If the PID were re-used immediately, the wrong process could get the message.
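You can ask for these numbers yourself: the standard getpid() and getppid()
calls return the PID of the calling process and of its parent.

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Every process can look up its own PID and its parent's. */
        printf("my PID: %d, my parent's PID: %d\n",
               (int)getpid(), (int)getppid());
        return 0;
    }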
When a fork()
system call is made, the value returned by the
fork() to the calling process is the PID
of the newly created process. Because the new copy didn't actually make the
fork() call, the return value in the copy is 0. This
is how a process spawns or forks a child process. The
process that called the fork() is the parent process of this new process, which is
the child process. Note that I
intentionally said the parent process
and a child process. A process can fork
many child processes, but each process has only one parent.
Almost always, a program will keep track of that return value and will then change its
behavior based on that value. It is very common for the child to issue an exec()
system call. Although it takes the fork()
system call to create the space that will be utilized by the new process, it is the
exec() system call
that causes this space to be
overwritten with the new program.
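Putting these pieces together, a minimal fork()-exec() pair might look like the
sketch below. The choice of /bin/ls is arbitrary; it is simply a program for the
child to run.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        pid_t pid = fork();        /* one process goes in, two come out */

        if (pid < 0) {
            perror("fork");        /* no child was created */
            exit(1);
        } else if (pid == 0) {
            /* Child: fork() returned 0. Overwrite ourselves
             * with a new program. */
            execl("/bin/ls", "ls", "-l", (char *)NULL);
            perror("execl");       /* reached only if the exec failed */
            exit(1);
        } else {
            /* Parent: fork() returned the child's PID. */
            printf("spawned child with PID %d\n", (int)pid);
            wait(NULL);            /* wait for the child to finish */
        }
        return 0;
    }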
At the beginning of every executable
program is an area simply called the "header." This header
describes
the contents of the file; that is, how the file is to be interpreted. The header
contains the locations of the text
and data segments. As we talked about before,
a segment
is a portion of the program. The portion of the program that contains
the executable instructions is called the text
segment.
The portion containing
pre-initialized data is the data segment.
Pre-initialized data are
variables, structures, arrays, etc. that have their value already set even
before the program is run. The process is given descriptors for each of the
segments.
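In C terms, the pieces of a program map onto these segments roughly as in the
following sketch:

    /* Sketch: where the pieces of a C program end up. */

    int counter = 42;     /* pre-initialized variable: data segment  */

    int next(void)        /* the compiled instructions of next() and
                           * main() live in the text segment         */
    {
        int tmp = ++counter;   /* local variable: lives on the stack */
        return tmp;
    }

    int main(void)
    {
        return next();
    }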
In contrast to other operating systems running on Intel-based
CPUs, Linux has only one segment
each for the text,
data, and stack.
I haven't
mentioned the stack
segment
until now because the stack segment is created when
the process is created. Because the stack
is used to keep track of where the
process has been and what it has done, there is no need to create it until the
process starts.
Another segment
that I haven't talked about until now is
not always used. This is the shared data segment.
Shared data is an area of
memory that is accessible by more than one process. Do you remember from our
discussion on operating system
basics when I said that part of the job of the
operating system was to keep processes from accessing areas of memory that they
weren't supposed to? So, what if they need to?
What if they are supposed to? That is where the shared data region
comes in.
If one process tells the other where the shared memory segment
is (by giving a pointer to it), then any process can access it. The way
to keep unwanted processes away is simply not to tell them. In this way, each
process that is allowed can use the data, and the segment
goes away only when
the last of those processes disappears. Figure 0-3 shows how several processes would look
in memory.
Figure 0-3: Process Segments
In the figure above, we see three processes. In all three
instances, each process has its own data and stack
segments. However, process A
and process B share a text
segment.
That is, process A and process B have loaded
the same executable off the hard disk and are therefore sharing the same
instructions. Note that in reality this is more complicated, because the
two processes may not be executing the exact same instructions at any given
moment.
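One kernel-managed way for processes to share such a segment is System V shared
memory. The sketch below creates a segment with shmget() and attaches it with
shmat(); any cooperating process that uses the same key can attach the same
memory. (Error checking is omitted, and the key value 0x1234 is just an
arbitrary example.)

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        /* Create (or find) a 4K shared segment under a well-known key. */
        int shmid = shmget((key_t)0x1234, 4096, IPC_CREAT | 0600);

        /* Attach it: the kernel maps the segment into our address
         * space and hands back a pointer to it. */
        char *shared = (char *)shmat(shmid, NULL, 0);

        sprintf(shared, "hello from PID %d", (int)getpid());

        /* Detach. The segment itself lives on until the last
         * process using it is done with it. */
        shmdt(shared);
        return 0;
    }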
Each process has at least a text,
data, and stack
segment.
In
addition, each process is created in the same way. An existing process will
(normally) use the fork()-exec() system
call pair to create another process. However, this brings up an interesting
question, similar to "Who or what created God?": If every
process has to be created by another, who or what created the first
process?
When the computer is turned on, it goes through some wild
gyrations that we talk about in the section on the boot process.
At the end of the boot
process, the system loads and executes the /vmlinuz binary,
the kernel itself. One of the last things the kernel
does is "force" the creation of a single
process, which then becomes the great-grandparent of all the other processes.
The first created process
is init,
with a PID
of 1. All other
processes can trace their ancestry back to init.
It is init's job to read the
entries in the file /etc/inittab and
execute different programs. One thing it does is start the
getty program on all the login
terminals, which
eventually provides every user with a shell.
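An inittab entry has the form id:runlevels:action:process. A couple of
illustrative lines (the exact entries vary from system to system) might look
like this:

    # Excerpt from a typical /etc/inittab (illustrative only).
    # Format: id:runlevels:action:process

    # The default runlevel:
    id:3:initdefault:

    # Keep a getty running on the first two virtual consoles:
    1:2345:respawn:/sbin/getty 38400 tty1
    2:2345:respawn:/sbin/getty 38400 tty2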
Another system process is
bdflush,
the buffer
flushing daemon.
Its
job is to clean out any "dirty" buffers inside the system's buffer
cache. A dirty
buffer
contains data that has been written to by a program but
hasn't yet been written out to the disk. It is the job of
bdflush to write this data out to the hard disk
at regular intervals. These intervals are 30 seconds for data buffers and 5
seconds for metadata buffers. (Metadata is the data used to administer the
filesystem, such as the superblock.)
You may find on your system that two
daemons are running, bdflush
and
update. Both are used to write back
blocks, but with slightly different functions. The
update daemon writes back modified blocks (including
superblocks and inode tables) after a specific period
of time to ensure that blocks are not kept in memory too long without being
written to the disk. On the other hand, bdflush
writes back a specific number of dirty
buffers. This keeps the ratio of dirty blocks to
total blocks in the buffer cache
at a "safe" level.
All processes,
including those I described above, operate in one of two modes: user or system
mode (see Figure 0-4, Process Modes). In the section on the CPU
in the hardware
chapter, I will talk about privilege levels. The Intel 80386 and later CPUs have
four privilege levels, 0-3. Linux uses only the two most extreme: 0 and 3.
Processes running in user mode
run at privilege level 3 within the CPU.
Processes running in system mode
run at privilege level 0 (more on this in a
moment).
Figure 0-4: Process Modes
In user mode, a process executes instructions from within its
own text
segment,
references its own data segment, and uses its own stack.
Processes switch from user mode
to system
mode by making system calls. Once in
system mode, the instructions within the kernel's text
segment
are executed, the kernel's data segment
is used, and a system stack
is used within the process task
structure.
Although the process goes through a lot of changes when it
makes a system call,
keep in mind that this is not a context switch.
It is still the same process, just operating at a higher privilege level.