Obviously a page fault involves a heck of a lot of work on the part of the operating system and a very long wait for the user program before it gets to resume. But because of the principle of locality, page faults are relatively rare. If a program is well-behaved and stays within a small region of memory during each phase of its life, page faults will only occur when it changes phase. For example, the initial phase might involve reading and preprocessing an input data file. Then the program goes into the next phase, a more computationally intensive one. If it suffers a page fault here, the program will be slowed down, but since it doesn't have to go back to the previous phase again, no extra page faults will occur. Thus, for well-behaved programs (and most are) the overhead of virtual memory is bearable.
However, there is another more subtle slowdown involved, and that is looking up the page number in the page table during every access to memory. This involves a memory access in itself, in addition to the eventual, desired memory access. Thus, we would expect computers that used virtual memory to run at half the speed of their non-virtual cousins.
Again, the principle of locality saves the day. Since most memory references fall within a small number of pages, say 10, a tiny cache of memory that is almost as fast as the general-purpose registers can hold just the part of the page table that is being used repeatedly. If 90% of all requests to translate addresses are within those pages whose translations are pre-computed and stored in this cache, the machine will run only slightly slower than a computer without virtual memory. This tiny cache is called a TLB, or Translation Lookaside Buffer.
The way a TLB works is that those entries from the page table that were most recently referenced are copied into the TLB. Whenever the MMU makes an address translation, it first looks in the TLB. If the page number is there, meaning that it has been translated recently, the MMU pulls the frame number out of the TLB quickly and inserts it into the upper part of the MAR. If the page number is not there, however, the MMU must go out to the page table that is kept in main memory and find it. The MMU copies that entry into the TLB, hoping that addresses out of that same page will need to be translated again soon.
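The look-in-the-TLB-first, fall-back-to-the-page-table behavior just described can be sketched in software. This is a minimal illustrative model, not a description of the MMU's actual circuitry: the page table contents, the page size, the TLB capacity, and the eviction policy (throwing out an arbitrary entry when the TLB fills up) are all assumptions made for the sketch.

```python
# Minimal software model of MMU address translation with a TLB.
# All sizes and table contents below are illustrative assumptions.

PAGE_SIZE = 4096   # bytes per page/frame (assumed)
TLB_CAPACITY = 8   # real TLBs are similarly tiny

page_table = {0: 5, 1: 9, 2: 3, 3: 7}   # page number -> frame number
tlb = {}                                 # recently used translations

def translate(virtual_address):
    """Split the address, consult the TLB first, then the page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page in tlb:                      # TLB hit: the fast path
        frame = tlb[page]
    else:                                # TLB miss: read the page table in main memory
        frame = page_table[page]
        if len(tlb) >= TLB_CAPACITY:     # make room by evicting some entry
            tlb.pop(next(iter(tlb)))
        tlb[page] = frame                # cache it, hoping for reuse soon
    return frame * PAGE_SIZE + offset    # physical address (goes into the MAR)
```

For instance, translating virtual address 4100 (page 1, offset 4) misses the TLB the first time, reads frame 9 from the page table, and yields physical address 9 × 4096 + 4; a second translation of any address in page 1 would then hit the TLB.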
In a way, the TLB acts like real memory while the main memory acts like virtual memory! The TLB is tiny while the main memory is huge, so the spill-over is kept in main memory. In the larger sense, real memory is where we store stuff that we are only working on right now, while disk is the much more gigantic memory where we keep all the items we ever will need.
Access speed and storage capacity differ among these various systems: very fast memories tend to be very small, while very large memories (such as disk drives) tend to be very slow, though able to hold billions of bytes of information.
A good analogy is an office. There is usually a desk to work at, to spread out one's papers while working. One or more filing cabinets are nearby where a vaster amount of papers is stored. If one needs a paper that is not on the desktop, he or she goes to the filing cabinet, finds it, and puts it onto the desktop. Of course, he or she had better put some papers back into the filing cabinet sooner or later, or the desk will begin to look like a mountain range.
Fig. 12.7.1 shows the TLB inserted into circuitry that does dynamic address translation.
The TLB should be extremely fast, faster than main memory and as fast as registers. To achieve the necessary speeds, TLBs are associative memories, which means that items are looked up by their content rather than their address. We humans are very good at associative memory since this seems to be how our minds work. For example, you might vainly be trying to remember the name of the famous actress who starred as the evil sister in the movie "Whatever Happened to Baby Jane?" You try all day to remember it but can't. All of a sudden, you get a call from your friend Betty and it pops into your head that the evil sister was played by Bette Davis. The rest of the name, Davis, is associated with "Bette" so that when you hear part of her name, your mind automatically fills in the rest. (Bette Davis deliberately spelled her name "Bette" rather than "Bettie" or "Betty.")
In the computer, associative memory is a set of flip-flops along with extra circuitry that enables every word to be compared simultaneously. Instead of an MAR, an associative memory has only an MBR and a mask register. When the read signal is given, the associative memory compares the MBR to every word in the memory all at once. If the contents of the MBR are found, a signal is emitted saying so.
Usually, only part of the MBR is compared, which is the purpose of the mask register. If the part of the MBR corresponding to 1's in the mask register is found anywhere in the memory, the entire word is copied into the MBR. To write to the associative memory, a value is put into the MBR and a write signal is given. The associative memory selects a word of its memory at random and copies the contents of the MBR into it.
The next two figures illustrate the reading process for an associative memory. First, the mask and MBR registers are loaded. The mask tells which bits of the MBR must match with some word in memory. Wherever there is a 1 in the mask, the corresponding bit of the MBR must match a word in memory. If there is a 0, no match is done. Thus, the mask tells which part of the MBR is the "key" and which part is the "value". Fig. 12.7.2 shows this.
Fig. 12.7.2: Setup for associative memory search for 11011100
Fig. 12.7.3 shows that the fourth word of memory matched, so that word is copied into the MBR.
What this figure does not show is that if 11011100 were not found in any word, a failure bit would be set so that the computer would not erroneously interpret whatever is left in the MBR as the searched-for contents.
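The masked search in the figures, the failure bit, and the random-word write can all be mimicked in software. In hardware every word is compared at once; the loop below produces the same externally visible behavior. The class and method names are invented for this sketch, and the 8-word size and sample contents are modeled on the figures.

```python
import random

class AssociativeMemory:
    """Toy model of an associative (content-addressable) memory."""

    def __init__(self, num_words):
        self.words = [0] * num_words

    def read(self, mbr, mask):
        """Return (word, failure_bit). Bits under 1s in the mask must match."""
        for word in self.words:
            if (word & mask) == (mbr & mask):
                return word, 0       # match: the whole word is copied into the MBR
        return mbr, 1                # failure bit set: MBR contents are not a real match

    def write(self, mbr):
        """Copy the MBR into a word of the memory chosen at random."""
        self.words[random.randrange(len(self.words))] = mbr

am = AssociativeMemory(8)
am.words[3] = 0b11011100                             # the "fourth word of memory"
value, fail = am.read(0b11011100, mask=0b11111100)   # upper 6 bits are the key
```

Here the read succeeds (`fail` is 0) and `value` holds the full matched word; searching for a pattern stored in no word would instead return the failure bit set to 1.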
Associative memories are great for finding things quickly because the gates are set up to compare every word with the MBR all at once and to tell if the value is found. The alternative is to compare the MBR against every word sequentially, which would take time proportional to the size of the memory.
In the context of virtual memory, the page number is what is being searched for in the TLB and the corresponding frame number is what is found and returned if that page number is there.
There is always a cost, some sort of trade-off or dark side, and for associative memories the cost lies in the complexity of the gates and circuits. Associative memories are larger and more complex than regular memories so TLBs in real computers tend to be tiny, usually 8 or 16 words. Studies have shown that such small TLBs are effective, nevertheless. Approximately 90% of the addresses translated can be done by getting values out of the TLB instead of out of the page table. This percentage is often called the hit ratio, and a positive match of an item with a value in the TLB is called a hit.
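The payoff of that 90% hit ratio can be quantified with a standard effective-access-time calculation. The specific timings below (a 1 ns TLB search and a 100 ns main-memory access) are assumed purely for illustration.

```python
# Effective memory access time with a TLB, using assumed example timings.
TLB_TIME = 1      # ns to search the TLB (assumed)
MEM_TIME = 100    # ns for one main-memory access (assumed)
HIT_RATIO = 0.90  # fraction of translations found in the TLB

# Hit:  TLB search + the actual memory access.
# Miss: TLB search + a page-table access + the actual memory access.
effective = (HIT_RATIO * (TLB_TIME + MEM_TIME)
             + (1 - HIT_RATIO) * (TLB_TIME + 2 * MEM_TIME))
print(effective)  # 111.0 ns, versus 200 ns with no TLB at all
```

So even a tiny TLB recovers most of the factor-of-two slowdown predicted earlier for translating every address through the page table in main memory.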