Based on the principle of spatial locality, a program is more likely to spend
its time executing code around the same set of instructions. This is
demonstrated by tests showing that most programs spend 80 percent of
their time executing 20 percent of their code. Cache memory takes advantage of
this behavior. Cache memory, or sometimes just cache,
is a small set of very high-speed memory. Typically, it uses SRAM,
which can be up to ten times more expensive than DRAM,
a cost that usually makes it prohibitive for anything other than cache.
When the IBM PC first came out, DRAM
was fast enough to keep up with even the fastest processor. However, as
CPU technology advanced, so did CPU speed. Soon, the CPU
began to outrun its memory. The advances in CPU technology could not be used
unless the system was filled with the more expensive, faster SRAM.
The solution to this was a compromise. Using the locality principle,
manufacturers of fast 386 and 486 machines began to include a set of cache
memory consisting of SRAM
but still populated main memory with the slower, less expensive DRAM.
To better understand the advantages of this scheme, let's cover the principle
of locality in a little more detail. For a computer program, we deal with two
types of locality: temporal (time) and spatial (space). Because programs tend to
run in loops (repeating the same instructions), the same set of instructions
must be read over and over. The longer a set of instructions is in memory
without being used, the less likely it is to be used again. This is the
principle of temporal locality. What cache memory does is
enable us to keep those regularly used instructions "closer" to the
CPU, making access to them much faster. This is shown
graphically in Figure 0-10.
Figure 0-10. Level 1 and Level 2 Caches
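The effect of temporal locality can be sketched with a toy simulation. The model below (a small fully associative cache with LRU replacement; the `run_trace` helper and the sizes are purely illustrative, not from the text) shows that a small loop quickly reaches a very high hit ratio:

```python
from collections import OrderedDict

def run_trace(trace, cache_size):
    """Simulate a tiny cache with LRU replacement and count hits."""
    cache = OrderedDict()                  # address -> True, in LRU order
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict least recently used
            cache[addr] = True
    return hits

# A loop body of 8 instructions executed 100 times: only the first
# pass through the loop misses.
trace = list(range(8)) * 100
hits = run_trace(trace, cache_size=8)
print(hits / len(trace))  # 0.99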
Spatial locality is the relationship between consecutively executed
instructions. I just said that a program spends most of its time executing the
same set of instructions. Therefore, in all likelihood, the next instruction the
program will execute lies in the next memory location. By filling
cache with more than just one instruction at a time, the
principle of spatial locality can be used.
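To illustrate the payoff of filling the cache with more than one instruction at a time, here is a minimal sketch (the 16-byte line size and the `line_of` helper are illustrative assumptions):

```python
LINE_SIZE = 16  # bytes per cache line (an illustrative size)

def line_of(addr):
    """Every byte address belongs to exactly one cache line."""
    return addr // LINE_SIZE

# Sequential byte addresses: a miss on the first byte of a line fills
# the whole line, so the next 15 accesses hit.
filled = set()
hits = misses = 0
for addr in range(0, 256):
    if line_of(addr) in filled:
        hits += 1
    else:
        misses += 1
        filled.add(line_of(addr))

print(misses, hits)  # 16 240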
Is there really such a major advantage to cache
memory? Cache performance is evaluated in terms of cache hits. A hit
occurs when the CPU requests a memory location that is
already in cache (that is, it does not have to go to main memory to get it).
Because most programs run in loops (including the OS), the principle of locality
results in a hit ratio of 85 to 95 percent. Not bad!
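The payoff is easy to quantify with a standard effective access time calculation (the 10 ns and 70 ns timings below are illustrative figures, not from the text):

```python
def effective_access_time(hit_percent, cache_ns, memory_ns):
    """Average memory access time given a cache hit ratio (in percent)."""
    return (hit_percent * cache_ns + (100 - hit_percent) * memory_ns) / 100

# With a 90 percent hit ratio, 10 ns cache, and 70 ns DRAM:
print(effective_access_time(90, 10, 70))   # 16.0 ns on average
print(effective_access_time(0, 10, 70))    # 70.0 ns with no cache hits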
On most 486 machines, two levels of cache
are used: level 1 cache and level 2 cache. Level 1 cache is internal to the
CPU. Although nothing (other than cost) prevents it from
being any larger, Intel has limited the level 1 cache in the 486 to 8K.
The level 2 cache is the kind that you buy separately from
your machine. It is often part of the advertisement you see in the paper and is
usually what people are talking about when they
say how much cache is in their systems. Level 2 cache is external to the
CPU and can be increased at any time, whereas level 1 cache
is an integral part of the CPU and the only way to get more is to buy a
different CPU. Typical sizes of level 2 cache range from 64K to 256K, usually in
increments of 64K.
There is one major problem with dealing with cache
memory: the issue of consistency. What happens when main memory is updated and
cache is not? What happens when cache is updated and main memory is not? This
is where the cache's write policy comes in.
The write policy
determines if and when the contents of the cache
are written back to memory. The write-through cache simply writes the data through the cache
directly into memory. This slows writes, but the data is consistent. Buffered write-through is a
slight modification of this, in which data are collected and everything is written at once.
Write-back improves cache performance by writing to main memory only when
necessary. Write-dirty writes to main memory only when a cache line has been
marked as dirty. Cache (or main memory, for that matter) is referred to as
"dirty" when it has been written to.
Unfortunately, the system has no way of telling whether anything has changed,
just that it is being written to. Therefore it is possible, but not likely, that
a block of cache is written back to memory even if it is
not actually dirty.
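The difference between the policies can be sketched with a toy model. The `Cache` class below is a hypothetical illustration, not a description of real hardware; it simply counts how often main memory is touched under each policy:

```python
class Cache:
    def __init__(self, policy):
        self.policy = policy       # "write-through" or "write-back"
        self.lines = {}            # address -> (value, dirty flag)
        self.memory_writes = 0     # how often main memory is touched

    def write(self, addr, value):
        if self.policy == "write-through":
            self.lines[addr] = (value, False)
            self.memory_writes += 1           # every write goes to memory
        else:                                 # write-back
            self.lines[addr] = (value, True)  # just mark the line dirty

    def evict(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:                  # write back only when the line is dirty
            self.memory_writes += 1

# Ten writes to the same location under each policy:
wt = Cache("write-through")
wb = Cache("write-back")
for i in range(10):
    wt.write(0x100, i)
    wb.write(0x100, i)
wb.evict(0x100)
print(wt.memory_writes, wb.memory_writes)  # 10 1
```

The write-through cache touched memory ten times; the write-back cache only once, at eviction, which is where its performance advantage comes from.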
Another aspect of cache
is its organization. Without going into detail (that would take most of a
chapter itself), I can generalize by saying there are four different types of
cache organization.
The first kind is fully associative, which means that every entry in the
cache has a slot in the "cache directory" to indicate where
it came from in memory. Usually these are not individual bytes, but chunks of
four bytes or more. Because each slot in the cache has a separate directory
slot, any location in RAM can be placed anywhere in the
cache. This is the simplest scheme but also the slowest because each cache
directory entry must be searched until a match (if any) is found. Therefore,
this kind of cache is often limited to just 4KB.
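That size limit follows from the search cost. Using the sequential-search model described above, a fully associative lookup amounts to a linear scan of the directory, which this hypothetical sketch (invented helper and sizes) makes explicit:

```python
def lookup_fully_associative(directory, addr):
    """Search each directory entry in turn until a match is found.
    The cost grows with the number of entries, which is why fully
    associative caches are kept small."""
    comparisons = 0
    for slot, tag in enumerate(directory):
        comparisons += 1
        if tag == addr:
            return slot, comparisons
    return None, comparisons

directory = list(range(0, 4096, 4))   # 1024 four-byte entries
slot, cost = lookup_fully_associative(directory, 4092)
print(slot, cost)  # 1023 1024 -- the worst case scans every entry
```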
The second type of cache
organization is direct-mapped or one-way set associative cache,
which requires that only a single directory entry be searched. This
speeds up access time considerably. The location in the cache is related to the
location in memory and is usually based on blocks of memory equal to the size of
the cache. For example, if the cache could hold 4K 32-bit (4-byte) entries, then
the block with which each entry is associated is also 4K x 32 bits. The first 32
bits in each block are read into the first slot of the cache, the second 32 bits
in each block are read into the second slot, and so on. The size of each entry,
or line, usually ranges from 4 to 16 bytes.
There is a mechanism called a tag, which tells us which block a given cache entry came from.
Also, because of the very nature of this method, the
cache cannot hold data from multiple blocks for the same
offset. If, for example, slot 1 was already filled with the data from block 1
and a program wanted to read the data at the same location from block 2, the
data in the cache would be overwritten. Therefore, the shortcoming in this
scheme is that when data is read at intervals that are the size of these blocks,
the cache is constantly overwritten. Keep in mind that this does not occur too
often due to the principle of spatial locality.
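The mapping and its shortcoming can both be shown in a few lines. This sketch assumes a direct-mapped cache of 1024 four-byte lines (so 4K blocks); the `slot_and_tag` and `access` helpers are invented for illustration:

```python
CACHE_SLOTS = 1024   # 1024 lines of 4 bytes each
LINE_SIZE   = 4

def slot_and_tag(addr):
    """Direct mapping: the slot is fixed by the address; the tag
    records which block the line came from."""
    line = addr // LINE_SIZE
    return line % CACHE_SLOTS, line // CACHE_SLOTS

cache = {}   # slot -> tag currently stored there

def access(addr):
    slot, tag = slot_and_tag(addr)
    hit = cache.get(slot) == tag
    cache[slot] = tag      # on a miss, the old line is overwritten
    return hit

block = CACHE_SLOTS * LINE_SIZE   # 4096 bytes
# Alternating between the same offset in two blocks: each access
# evicts the other block's line, so nothing ever hits.
results = [access(a) for a in (0, block, 0, block)]
print(results)  # [False, False, False, False]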
The third type of cache
organization is an extension of the one-way set associative cache,
called the two-way set associative. Here, there are two entries per
slot. Again, data can end up in only a particular slot, but there are two places
to go within that slot. Granted, the system is slowed a little because it has to
look at the tags for both slots, but this scheme allows data at the same offset
from multiple blocks to be in the cache at the same time. This is also extended
to four-way set associative cache. In fact, the cache internal to the 486 and
the Pentium is four-way set associative.
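A two-way set associative cache can be sketched the same way (again with invented helpers and illustrative sizes). An access pattern that alternates between the same offset in two blocks, which would constantly overwrite a direct-mapped cache, now hits after the two cold misses:

```python
CACHE_SETS = 512   # same total capacity, split into two ways per set
LINE_SIZE  = 4

sets = {}   # set index -> list of up to two tags, LRU order

def access(addr):
    line = addr // LINE_SIZE
    index, tag = line % CACHE_SETS, line // CACHE_SETS
    ways = sets.setdefault(index, [])
    if tag in ways:
        ways.remove(tag)
        ways.append(tag)   # most recently used goes last
        return True
    if len(ways) == 2:
        ways.pop(0)        # evict the least recently used way
    ways.append(tag)
    return False

block = CACHE_SETS * LINE_SIZE
# After one cold miss each, both blocks' lines coexist in the set.
results = [access(a) for a in (0, block, 0, block)]
print(results)  # [False, False, True, True]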
Although this is interesting (at least to me), you may be asking yourself,
"Why is this memory stuff important to me as a system administrator?" First,
knowing about the differences in RAM (main memory) can aid
you in making decisions about your upgrade. Also, as I mentioned earlier, it may
be necessary to set switches on the motherboard if you change the memory
configuration.
Knowledge about cache
memory is important for the same reason because you
may be the one who will adjust it. On many machines, the write policy can be
adjusted through the CMOS. For example, on my machine, I
have a choice of write-back, write-through, and write-dirty. Depending on the
applications you are running, you may want to change the write policy
to improve performance.