Section 9.5
Subroutines

Subroutines are one of the major structuring ideas in computer science. The ability to package up a series of instructions and give the package a single label or name ties in with how the human mind manages complexity in the world. Psychologists call this "chunking": a single name stands in for an ensemble of related ideas or parts, which taxes our limited-capacity short-term memory much less than keeping many different things in mind at once.

It is assumed that the reader knows enough about high-level programming to be familiar with the idea of a subroutine and how it works. As a matter of terminology, subroutines masquerade under many different names: procedures, functions, methods, subs, among others. Some languages use the term procedure, while others, including C and C++, use function; Java calls them methods. The term subroutine is more common in assembler language. These names are generally equivalent, denoting a chunk of program code that can be used over and over again whenever a main program (or another subprogram) calls it.

More precise usage reserves function for a subroutine that returns a value, by analogy with mathematical functions: inputs go into a function, an output comes out, and nothing else is changed. Procedure denotes a subroutine that does not return a value but affects the outside world in some other way, such as doing input or output, changing global variables, or even changing the state of the computer. The following C program has a main section and one function, called monus, which takes two parameters. Monus is a simplified form of subtraction that never returns a negative number: if the first argument is smaller than the second, it returns 0.

#include <stdio.h>

int monus(int a, int b)
{
     int result;
     if (a >= b)
          result = a-b;
     else
          result = 0;
     return result;
}

int main()
{
     int x = 8;
     int y = 5;
     int answer = monus(x, y);
     printf("%d\n", answer);     /* prints 3 */
     return 0;
}

Here's the equivalent CSC-1 assembler code, which is a complete program ready to run in CASM:

MAIN:
;  x and y are already declared and set at the end
;  call the subroutine and leave the result in A register
             LOD   X
             STD   MONUS_PARM1
             LOD   Y
             STD   MONUS_PARM2
             CAL   MONUS
             LOD   MONUS_RETVAL     ; leave it in Acc
             STD   ANSWER           ; answer = monus(x, y)
             STD   4095             ; print answer
             HLT

; variables for main
X:           NUM   8
Y:           NUM   5
ANSWER:      NUM   0

MONUS:
MONUS_IF1:   LOD   MONUS_PARM1
             SUB   MONUS_PARM2
             JN    MONUS_ELSE1
             STD   MONUS_RETVAL
             JMP   MONUS_ENDIF1
MONUS_ELSE1: LDI   0
             STD   MONUS_RETVAL
MONUS_ENDIF1:NOP
             RET

; variables for monus
MONUS_PARM1: NUM   0
MONUS_PARM2: NUM   0
MONUS_RETVAL:NUM   0

The CAL and RET instructions implement the actual transfer of control to and from the subroutine. There is a fundamental asymmetry here: CAL specifies a target address, but RET doesn't. The caller must name the subroutine it wants, whereas the subroutine must return to wherever it was called from, which it cannot know in advance. The same subroutine may be called many times from many different parts of the program, so no single address written into the RET instruction could serve all of those callers.

The CSC-1 saves the return address, which is the address of the instruction directly following the CAL instruction, in the S register. (Refer back to the RTL for CAL in Chapter 8.) When the RET instruction is executed, the computer merely copies the value in S into the PC register, thereby effecting a jump to the instruction that followed the CAL. This works correctly because in the fetch/decode/execute cycle, PC is incremented after fetching the CAL instruction, so by the time CAL gets around to copying PC into S, the value in PC is not the address of the CAL instruction, but rather the one following.

The return statement of C does two things: it specifies the value to be returned from the function, and it causes control to jump back to the caller. In assembler, the former is accomplished by storing a value into the return-value slot (MONUS_RETVAL in our example), and the latter by executing the RET instruction.

There are three parts to a subroutine invocation in assembler if it is a function:

1.  setup of parameters and local variables
2.  actual CAL instruction
3.  handling the return value

Parameters are filled in with values from the caller. The subroutine's parameters and temporary variables occupy their own distinct places in memory, labeled with the prefix MONUS_ in this code, for example MONUS_PARM1. The C program uses more natural names like a and b, and these can't be confused with an a or b in another block of code because of C's scoping rules. In assembler, however, there is only one namespace: every label and name is visible everywhere, so some method of distinguishing them has to be used. Compilers for modern languages typically use the run-time stack for this, but in our simple example we take another approach and prefix each name with the name of the subroutine it belongs to. This notation, MONUS_PARM1, is similar to the dot notation familiar from higher-level languages such as Java: MONUS.PARM1.

The CASM assembler language permits you to give more than one label to the same word of memory. Hence, you could use a generic label such as MONUS_PARM1 as well as the more naturalistic MONUS_X. Here's how that would look:

MONUS_PARM1:
MONUS_X:      NUM   0      ; both labels refer to the same word

But of course, you shouldn't label it just X:, because a program is likely to contain several X's and the assembler has no way of telling them apart. You must do all the disambiguation yourself by prefixing each name with the name of its subroutine.

If the subroutine is a function, then it has a return value. Many language implementations leave the return value in a prominent place, typically a register, so that it can take part in an ongoing expression computation. In the example above we put the return value in an explicit variable, MONUS_RETVAL. However, it could instead be left directly in the A register so that the caller's next instruction can compute with it. For example:

   x = min(a,b) + y;

In this statement, min(a,b) is computed and the result left in the A register, so that the next step is to add y's value to it, before storing the final expression's value into x.

If the subroutine has local variables (our example doesn't), they have to be reset to their initial values on every call to avoid surprises. Programmers are always admonished to assign initial values to their variables; older compilers didn't do this automatically, so it was possible to find "garbage values" in assembler or FORTRAN variables at startup. Newer languages either initialize variables automatically or reject code that uses a variable before assigning it (as Java does for local variables). In assembler, we have to handle every low-level detail ourselves! So the following subroutine fragment must be coded carefully:

int somefunc (int x)
{
     int sum = 0;
     int count = 0;
...
}

In this example, sum and count are local variables and must be set to 0 every time somefunc is called. Here's the wrong way to encode the C:

SOMEFUNC:
             ; start computing!
             RET
SUM:         NUM  0
COUNT:       NUM  0

This is wrong because SUM and COUNT will be changed by every call to SOMEFUNC; they hold the correct value of 0 only when the program is first loaded. They must be set to 0 every time SOMEFUNC is called. Here's the right way to do it:

SOMEFUNC:
             LDI   0
             STD  SUM              ; SUM = 0
             STD  COUNT            ; COUNT = 0
             ; start computing!
             RET
SUM:         NUM  0
COUNT:       NUM  0

There are no safeguards in place in assembler for global variables. Every variable is essentially global! This is why assembler programmers have to be very careful and follow accepted practices to avoid common mistakes.

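There are many other issues, such as whether functions can be nested and whether they can be recursive. Even one subroutine calling another is a problem on the CSC-1: the single S register holds only one return address, so a subroutine that executes its own CAL overwrites the return address it was given unless that value is first saved somewhere. Over the decades, many different hardware architectures have been tried to implement recursive subprograms elegantly, but one approach has prevailed: the stack.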

A region of memory called the stack is set aside to hold the parameters and return addresses of functions. Each time a function is called, a new region, called an activation record or frame, is allocated on the top of this stack, and when the function returns, its frame is popped off. Two new registers keep track of which region is being used:

SP	stack pointer	points to the current top of the stack
FP	frame pointer	points to the current activation record on the top of the stack

Further discussion of these strategies is best left to a programming languages course. Suffice it to note that compilers, again, come to the rescue, hiding so many details of the underlying machine and its rude, crude machine language system from the programmer. A procedure subroutine in CASM is identical to a function except that we do not worry about the return value.