Section 8.4
An instruction in micro-detail

Now we are in a position to see the actual sequence of micro-events during a CSC-1 machine instruction. These micro-events are nothing more than control wires getting turned on and off by the hardware DFA, yet they direct the flow of data and the selection of processes to be done, thereby implementing what seems to us to be a single but sometimes complex instruction. First, we will examine ADD.

The ADD instruction has the following operand structure:

ADD     x

or in the machine language of 1's and 0's:

0011 xxxxxxxxxxxx

The x value is treated as the address of the second operand of the addition operation, which will come from memory. If x=000000001101, which is 13 in binary, then memory word 13 (all 16 bits) will be retrieved from memory and put into the TMP before the adder is allowed to get started. Since no shifting is desired, the shifter is told to pass through the output of the adder unchanged, which is then redirected back into the A register. This is expressed much more succinctly in the RTL for ADD:

A <- A + m[x]
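The encoding and the RTL can be sketched in a few lines of Python. This is only an illustration: the function names `decode` and `execute_add` are hypothetical, and wrapping the sum to 16 bits is an assumption based on the 16-bit word size mentioned above.

```python
OPCODE_ADD = 0b0011  # opcode bits for ADD, as given in the text


def decode(word):
    """Split a 16-bit instruction word into (opcode, address)."""
    opcode = (word >> 12) & 0xF   # top 4 bits
    address = word & 0x0FFF       # low 12 bits: the x field
    return opcode, address


def execute_add(a, memory, x):
    """RTL: A <- A + m[x]. Wrapping to 16 bits is an assumption."""
    return (a + memory[x]) & 0xFFFF


word = 0b0011_000000001101        # ADD 13
opcode, x = decode(word)
memory = {13: 25}                 # hypothetical contents of memory word 13
new_a = execute_add(7, memory, x) # A holds 7 before the instruction
```

With A=7 and m[13]=25, `new_a` is 32.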

Following is the sequence of control point assignments that accomplishes this. Each line represents a distinct time step, and all the control points on the same line are turned on (or off) simultaneously. Where a group of wires shares a name, such as the MAR-MUX wires (both 1 and 0), the individual names have been elided to save space. Thus, MAR-MUX=10 is shorthand for MAR-MUX1=1 and MAR-MUX0=0. Assume that all control wires contain 0 before step 1.

      1.     MAR-MUX=01
      2.     MAR-LD=1
      3.     MAR-LD=0;  MA=1;  WR=0;  MBR-MUX=10
      4.     MBR-LD=1
      5.     MBR-LD=0;  MA=0;  PC-INCR=1;  IR-LD=1
      6.     IR-LD=0;  PC-LD=1
      7.     PC-INCR=0;  PC-LD=0
      8.     MAR-MUX=00
      9.     MAR-LD=1
     10.     MAR-LD=0;  MA=1;  WR=0;  MBR-MUX=10
     11.     MBR-LD=1
     12.     MBR-LD=0;  MA=0
     13.     TMP-LD=1
     14.     TMP-LD=0;  F=101;  SH=00
     15.     A-MUX=00
     16.     A-LD=1
     17.     A-LD=0
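To make the notation concrete, here is a small sketch that expands the shorthand into individual wires and tracks which values persist from one time step to the next. The helper names `expand` and `apply_steps` are hypothetical; only the first three steps of the listing are used to keep the example short.

```python
def expand(assignment):
    """Expand 'MAR-MUX=10' into {'MAR-MUX1': 1, 'MAR-MUX0': 0}.

    Single-bit assignments such as 'MA=1' pass through unchanged.
    """
    name, bits = assignment.strip().split("=")
    if len(bits) == 1:
        return {name: int(bits)}
    top = len(bits) - 1
    return {f"{name}{top - i}": int(bit) for i, bit in enumerate(bits)}


def apply_steps(steps):
    """Return a snapshot of every assigned wire after each time step."""
    wires = {}
    snapshots = []
    for step in steps:
        for assignment in step.split(";"):
            wires.update(expand(assignment))
        snapshots.append(dict(wires))  # wires keep their value until reassigned
    return snapshots


FETCH_START = [
    "MAR-MUX=01",
    "MAR-LD=1",
    "MAR-LD=0;  MA=1;  WR=0;  MBR-MUX=10",
]
snapshots = apply_steps(FETCH_START)
```

After step 3, `snapshots[2]` still shows MAR-MUX1=0 and MAR-MUX0=1 from step 1, because a control wire holds its value until a later step changes it.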

Here is how the four stages of the instruction cycle line up against the above sequence:

Lines 1 through 7 implement the instruction fetch stage.
The instruction decode stage is implicitly done after line 6.
Lines 8 through 13 implement the operand fetch stage.
Line 14 implements the execute stage, telling the adder which of its several functions to perform.
Some architectures make explicit a storeback stage where the result of the computation is put into its destination. In the CSC-1, the values often go back into the A register, so lines 15 through 17 could be thought of as the storeback stage.

Now we will go through the 17-line sequence in painful detail. Take a deep breath before continuing to read. It might be handy to have the block diagram of the CSC-1 (Fig. 8.1.1) in front of you so you can identify the components easily as we progress. (This material also closely follows the detailed look at the ADD instruction in Section 7.6.)

In steps 1-5, the computer is fetching the next instruction from memory and putting it into the MBR. It does this by copying the address from the PC register into the MAR register by setting the appropriate code for MAR-MUX (01₂=1) which routes the lower 12 bits of PC into the MAR.

These actions are accomplished in minute steps as follows. Step 1 puts 01 onto the MAR-MUX wires so that the output of the PC register can be copied into the MAR. The copy happens when MAR-LD flips from 0 to 1 and back to 0, in steps 2 and 3. Then the memory is "turned on" by setting MA=1, and is given the "read" command by setting WR=0. Also in step 3, the computer sets up the MBR's MUX to accept the output of memory by selecting port 2 (10₂=2) of MBR-MUX; this can happen at the same time as the other settings. Step 3 may take some time because memory accesses might be slow. If the memory is too slow, time step 3 is repeated, keeping all the wires at the same values.

When the new word is ready from the memory, it is latched into the MBR register by setting MBR-LD to 1, and then back to 0, as shown in steps 4 and 5. In step 5 memory is "turned off" by setting MA back to 0.

Also in step 5, we add 1 to the binary number in the PC register by setting PC-INCR to 1. This causes the PC to point to the next memory address for its next fetch, thereby implementing sequential execution of a list of commands kept in memory. If the instruction that is about to be executed is a jump, it will simply overwrite PC, so adding 1 to PC beforehand does no harm.

Step 5 also begins the loading of the IR register, whose input is hooked to MBR's output. Step 6 turns off IR-LD, and also strobes PC-LD, allowing the new incremented value to be put back into the PC. The incrementer circuit of the PC register merely computes 1 plus the current value of the PC; it doesn't actually store it back into PC until the PC-LD wire goes high temporarily. Otherwise, the value of PC might be incremented several times.
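The reason for separating PC-INCR from PC-LD can be seen in a toy model. The sketch below is an assumption-laden illustration, not the CSC-1 circuit: the class name `ProgramCounter` is hypothetical, the PC is assumed to be 16 bits wide, and the load is assumed to happen on the 0-to-1 transition of PC-LD.

```python
class ProgramCounter:
    """Toy PC: a combinational incrementer plus a load strobe."""

    def __init__(self, value):
        self.value = value
        self.prev_ld = 0

    def step(self, pc_incr, pc_ld):
        # The incrementer always computes PC + 1 (16-bit wrap is an assumption).
        incremented = (self.value + 1) & 0xFFFF
        # The result is stored only when PC-LD goes from 0 to 1.
        if pc_incr and pc_ld and not self.prev_ld:
            self.value = incremented
        self.prev_ld = pc_ld


# (PC-INCR, PC-LD) as they stand during steps 5, 6 and 7 of the listing.
STROBE = [(1, 0), (1, 1), (0, 0)]

pc = ProgramCounter(13)
for pc_incr, pc_ld in STROBE:
    pc.step(pc_incr, pc_ld)

# A naive counter that adds 1 whenever PC-INCR is high, with no load strobe.
naive = 13
for pc_incr, _ in STROBE:
    if pc_incr:
        naive = (naive + 1) & 0xFFFF
```

Because PC-INCR is high during both steps 5 and 6, the naive counter ends at 15, while the strobed version ends at 14 as intended.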

Step 7 turns off PC-INCR and PC-LD, thereby ending the process of fetching the next instruction and pointing PC to the instruction after that.

Step 8 begins the operand fetch stage. First MAR-MUX is set to 0 so that the output of the IR register is copied into the MAR. Remember that the low 12 bits of the IR register are the operand address. Next MAR-LD is strobed to latch this new value in steps 9 and 10. Memory is turned back on in Step 10 and the MBR is set to accept the memory's output, which is latched into the MBR's flip-flops by Steps 11 and 12.

Once the operand is in the MBR, it must be copied into the TMP register, which is done by strobing TMP-LD in steps 13 and 14. TMP's input comes only from the MBR, reinforcing the notion that TMP might be omitted entirely and the MBR could be used as the second operand. If this were so, Step 13 could be omitted entirely and F=101 and SH=00 could be added to Step 12 thereby saving some time. Nevertheless, we will stay with the diagram as we have it.

In Step 14, the ALU operation is set to add (F=101 does this) and the shifter is told to pass through the result of the ALU unchanged (SH=00 does this).

Step 15 tells the A register that its next input will come from port 0 of the A-MUX, which is connected to the output of the shifter. Steps 16 and 17 strobe this new value back into the A register, thereby effecting the storeback stage.
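The whole walkthrough can be replayed with a small register-level simulation. This is a simplified sketch under stated assumptions: each listed step is treated as one tick in which a register whose LD wire is 1 captures its input once, only the MUX ports and the F and SH codes used by ADD are modeled, and the function names `parse` and `run_add` are hypothetical.

```python
ADD_STEPS = """\
MAR-MUX=01
MAR-LD=1
MAR-LD=0;  MA=1;  WR=0;  MBR-MUX=10
MBR-LD=1
MBR-LD=0;  MA=0;  PC-INCR=1;  IR-LD=1
IR-LD=0;  PC-LD=1
PC-INCR=0;  PC-LD=0
MAR-MUX=00
MAR-LD=1
MAR-LD=0;  MA=1;  WR=0;  MBR-MUX=10
MBR-LD=1
MBR-LD=0;  MA=0
TMP-LD=1
TMP-LD=0;  F=101;  SH=00
A-MUX=00
A-LD=1
A-LD=0
""".splitlines()


def parse(step):
    """Turn 'MA=1;  WR=0' into {'MA': '1', 'WR': '0'} (values stay bit strings)."""
    return dict(item.strip().split("=") for item in step.split(";"))


def run_add(memory, pc, a):
    reg = {"PC": pc, "MAR": 0, "MBR": 0, "IR": 0, "TMP": 0, "A": a}
    wires = {}
    for step in ADD_STEPS:
        wires.update(parse(step))
        w = lambda name: wires.get(name, "0")   # unassigned wires read as 0
        if w("MAR-LD") == "1":
            source = reg["PC"] if w("MAR-MUX") == "01" else reg["IR"]
            reg["MAR"] = source & 0xFFF         # only the low 12 bits reach the MAR
        if w("MBR-LD") == "1" and w("MA") == "1" and w("WR") == "0" and w("MBR-MUX") == "10":
            reg["MBR"] = memory[reg["MAR"]]     # memory read into the MBR
        if w("IR-LD") == "1":
            reg["IR"] = reg["MBR"]
        if w("PC-LD") == "1" and w("PC-INCR") == "1":
            reg["PC"] = (reg["PC"] + 1) & 0xFFFF
        if w("TMP-LD") == "1":
            reg["TMP"] = reg["MBR"]
        if w("A-LD") == "1" and w("A-MUX") == "00" and w("F") == "101" and w("SH") == "00":
            reg["A"] = (reg["A"] + reg["TMP"]) & 0xFFFF  # add, pass through shifter
    return reg


# Hypothetical program: word 5 holds ADD 13, word 13 holds the operand 25.
memory = {5: 0b0011_000000001101, 13: 25}
result = run_add(memory, pc=5, a=7)
```

Starting with PC=5 and A=7, the run ends with IR holding the ADD instruction, PC=6, TMP=25 and A=32.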

How did the computer know that this sequence of 17 steps was what it was supposed to do? Every CSC-1 instruction has at least the instruction fetch stage, so steps 1-7 are fixed. But once the instruction has been decoded (after Step 6), the hardware DFA ANDs the instruction wire for ADD with the other time steps and sets the control points appropriately. In short, the fact that the instruction in the IR register as of the end of Step 5 is an ADD instruction determines that Steps 8 through 17 will be done.
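The idea of ANDing an instruction wire with time-step wires can be sketched as follows. The gate equations here are read off the 17-step listing for a handful of control points and are only illustrative; the actual CSC-1 logic is not given in this section, and the function name `control_points` is hypothetical.

```python
def control_points(add_wire, t):
    """Return a few control points for time step t (1..17).

    add_wire is the decoded "this is an ADD" wire; T[k] is a one-hot
    wire that is true only during time step k.
    """
    T = [False] + [t == k for k in range(1, 18)]
    return {
        # Fetch-stage strobes depend on the time step alone.
        "MAR-LD": int(T[2] or (add_wire and T[9])),
        "MBR-LD": int(T[4] or (add_wire and T[11])),
        # Later strobes fire only when the ADD wire is also on.
        "TMP-LD": int(add_wire and T[13]),
        "A-LD": int(add_wire and T[16]),
    }


during_fetch = control_points(add_wire=False, t=2)
add_step_16 = control_points(add_wire=True, t=16)
other_step_16 = control_points(add_wire=False, t=16)
```

In this sketch MAR-LD fires at step 2 regardless of the instruction, while A-LD fires at step 16 only when the ADD wire is on.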

There is a lot of parallelism at the lowest hardware level, and signals can flow between different components at the same time as long as they do not interfere with each other. Thus, we could collapse the above 17 steps into the following 13 for the ADD instruction if we utilized more parallelism. The one tricky thing is that the MAR-MUX is set up to receive the output of PC at the end of the previous instruction. Thus, when the computer first fires up, we must assume that MAR-MUX=01 before step 1.

      1.     MAR-LD=1
      2.     MAR-LD=0;  MA=1;  WR=0;  MBR-MUX=10
      3.     MBR-LD=1
      4.     MBR-LD=0;  MA=0;  PC-INCR=1;  IR-LD=1
      5.     IR-LD=0;  PC-LD=1
      6.     PC-INCR=0;  PC-LD=0;  MAR-MUX=00
      7.     MAR-LD=1
      8.     MAR-LD=0;  MA=1;  WR=0;  MBR-MUX=10
      9.     MBR-LD=1
     10.     MBR-LD=0;  MA=0;  TMP-LD=1
     11.     TMP-LD=0;  F=101;  SH=00;  A-MUX=00
     12.     A-LD=1
     13.     A-LD=0;  MAR-MUX=01
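One way to check that the shorter listing is the same instruction, just packed more tightly, is to compare the order of values each wire receives in the two listings. The sketch below does only that; it does not model timing or the datapath, and the helper name `transitions` is hypothetical.

```python
from collections import defaultdict

SEVENTEEN = """\
MAR-MUX=01
MAR-LD=1
MAR-LD=0;  MA=1;  WR=0;  MBR-MUX=10
MBR-LD=1
MBR-LD=0;  MA=0;  PC-INCR=1;  IR-LD=1
IR-LD=0;  PC-LD=1
PC-INCR=0;  PC-LD=0
MAR-MUX=00
MAR-LD=1
MAR-LD=0;  MA=1;  WR=0;  MBR-MUX=10
MBR-LD=1
MBR-LD=0;  MA=0
TMP-LD=1
TMP-LD=0;  F=101;  SH=00
A-MUX=00
A-LD=1
A-LD=0
""".splitlines()

THIRTEEN = """\
MAR-LD=1
MAR-LD=0;  MA=1;  WR=0;  MBR-MUX=10
MBR-LD=1
MBR-LD=0;  MA=0;  PC-INCR=1;  IR-LD=1
IR-LD=0;  PC-LD=1
PC-INCR=0;  PC-LD=0;  MAR-MUX=00
MAR-LD=1
MAR-LD=0;  MA=1;  WR=0;  MBR-MUX=10
MBR-LD=1
MBR-LD=0;  MA=0;  TMP-LD=1
TMP-LD=0;  F=101;  SH=00;  A-MUX=00
A-LD=1
A-LD=0;  MAR-MUX=01
""".splitlines()


def transitions(steps, preset=None):
    """For each wire, list the values assigned to it, in order."""
    seq = defaultdict(list)
    for name, value in (preset or {}).items():
        seq[name].append(value)
    for step in steps:
        for item in step.split(";"):
            name, value = item.strip().split("=")
            seq[name].append(value)
    return dict(seq)


long_form = transitions(SEVENTEEN)
# The 13-step form assumes MAR-MUX=01 was set before step 1.
short_form = transitions(THIRTEEN, preset={"MAR-MUX": "01"})
# Its last step sets MAR-MUX=01 again for the next fetch; set that aside.
short_form["MAR-MUX"] = short_form["MAR-MUX"][:-1]
same_order = long_form == short_form
```

Every wire receives the same values in the same order in both listings; the only difference is the trailing MAR-MUX=01 that prepares the next instruction fetch.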

Again, if the TMP register were deleted and the MBR used for the second operand, another step could be saved in the following way:

      9.     MBR-LD=1
     10.     MBR-LD=0;  MA=0; F=101;  SH=00;  A-MUX=00
     11.     A-LD=1
     12.     A-LD=0;  MAR-MUX=01

It is precisely this kind of intricate thinking that chip designers go through when they strive to build the fastest possible computer.