In its simplest form a semaphore is a location in memory whose value can be
tested and set by more than one process.
The test and set operation is, so far as each process is concerned, uninterruptible or
atomic; once started nothing can stop it.
The result of the test and set operation is the addition of the current value of
the semaphore and the set value, which can be positive or negative.
Depending on the result of the test and set operation one process may have to sleep
until the semphore's value is changed by another process.
Semaphores can be used to implement critical regions, areas of critical code
that only one process at a time should be executing.
Although a program variable could be considered "a location in memory whose value can be
tested and set", the key different is that with a semaphore is accessible to other processes,
whereas a variable is only accessible to the one process that created it. The fact that
it is accessible from multiple processes is the key feature of a semaphore.
Say you had many cooperating processes reading records from and writing records to a
single data file.
You would want that file access to be strictly coordinated.
You could use a semaphore with an initial value of 1 and, around the file operating
code, put two semaphore operations, the first to test and decrement the semaphore's
value and the second to test and increment it.
The first process to access the file would try to decrement the semaphore's value and
it would succeed, the semaphore's value now being 0.
This process can now go ahead and use the data file but if another process wishing to
use it now tries to decrement the semaphore's value it would fail as the result would
That process will be suspended until the first process has finished with the data file.
When the first process has finished with the data file it will increment the semaphore's
value, making it 1 again.
Now the waiting process can be awakened and this time its attempt to decrement the
semaphore will succeed.
Figure: System V IPC Semaphores
System V IPC semaphore objects each describe a semaphore array and Linux uses the
semid_ds data structure to represent this.
All of the semid_ds data structures in the system are pointed at by
the semary, a vector of pointers.
There are sem_nsems in each semaphore array, each one described by a
sem data structure pointed at by sem_base.
All of the processes that are allowed to manipulate the semaphore array of a System
V IPC semaphore object may make system calls that perform operations on them.
The system call can specify many operations and
each operation is described by three inputs: the semaphore index, the operation value and
a set of flags.
The semaphore index is an index into the semaphore array and the operation value is
a numerical value that will be added to the current value of the semaphore.
First Linux tests whether or not all of the operations would succeed.
An operation will succeed if the operation value added to the semaphore's current
value would be greater than zero or if both the operation value and the semaphore's
current value are zero.
If any of the semaphore operations would fail Linux may suspend the process but
only if the operation flags have not requested that the system call is non-blocking.
If the process is to be suspended then Linux must save the state of
the semaphore operations to be performed and put the current process onto a wait
It does this by building a sem_queue
data structure on the stack and filling it out.
The new sem_queue data structure is put at the end of this semaphore
object's wait queue (using the sem_pending and sem_pending_last
The current process is put on the wait queue in the sem_queue data structure
(sleeper) and the scheduler called to choose another process to run.
If all of the semaphore operations would have succeeded and the current process
does not need to be suspended, Linux goes ahead and applies the operations to the
appropriate members of the semaphore array.
Now Linux must check that any waiting, suspended, processes may now apply their
It looks at each member of the operations pending queue (sem_pending) in
turn, testing to see if the semphore operations will succeed this time.
If they will then it removes the sem_queue data structure from the operations
pending list and applies the semaphore operations to the semaphore array.
It wakes up the sleeping process making it available to be restarted the next time the
Linux keeps looking through the pending list from the start until there is a pass
where no semaphore operations can be applied and so no more processes can be awakened.
There is a problem with semaphores: deadlocks.
These occur when one process has altered the semaphore's value as it enters
a critical region but then fails to leave the critical region because it crashed
or was killed.
Linux protects against this by maintaining lists of adjustments to the semaphore
The idea is that when these adjustments are applied, the semaphores will be put back
to the state that they were in before the a process' set of semaphore operations
These adjustments are kept in sem_undo data
structures queued both on the semid_ds data structure and on the task_struct
data structure for the processes using these semaphore arrays.
Each individual semaphore operation may request that an adjustment be maintained.
Linux will maintain at most one sem_undo data structure per process for each
If the requesting process does not have one, then one is created when it is needed.
The new sem_undo data structure is queued both onto this process' task_struct
data structure and onto the semaphore array's semid_ds data structure.
As operations are applied to the semphores in the semaphore array the negation of
the operation value is added to this semphore's entry in the adjustment array of
this process' sem_undo data structure.
So, if the operation value is 2, then -2 is added to the adjustment entry for this
When processes are deleted, as they exit Linux works through their set of
sem_undo data structures applying the adjustments to the semaphore arrays.
If a semaphore set is deleted, the sem_undo data structures are left queued
on the process' task_struct but the semaphore array identifier is made
In this case the semaphore clean up code simply discards the sem_undo data