Section 11.3
Approaches to Relocation

Fig. 11.3.1 shows a small CSC-1 program that evaluates the expression a = a+b-c+d*4. This version uses mnemonic opcodes but no symbolic labels. Instead, absolute addresses are given. The numbers to the left are only for reference, showing where in memory these instructions would be placed:

0:    LOD   1000
1:    ADD   1001
2:    SUB   1002
3:    STD   1003
4:    LOD   2000
5:    SHL
6:    SHL
7:    ADD   1003
8:    STD   1000
9:    (gap)
1000: NUM   47
1001: NUM   3
1002: NUM   149
1003: NUM   0
      (gap)
2000: NUM   99

Fig. 11.3.1: CSC-1 program to compute a = a+b-c+d*4
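
To check what the program in Fig. 11.3.1 computes, it can be traced with a few lines of Python. This is only a sketch under stated assumptions: a single accumulator, LOD loading a word into it, ADD and SUB combining it with a memory word, STD storing it, and SHL shifting it left one bit (doubling it). Word width and overflow are ignored, and the function name `run` and the tuple representation of instructions are illustrative, not part of the CSC-1.

```python
# Hypothetical trace of Fig. 11.3.1; the instruction semantics are assumed.
PROGRAM = [
    ("LOD", 1000), ("ADD", 1001), ("SUB", 1002), ("STD", 1003),
    ("LOD", 2000), ("SHL", None), ("SHL", None),
    ("ADD", 1003), ("STD", 1000),
]
DATA = {1000: 47, 1001: 3, 1002: 149, 1003: 0, 2000: 99}

def run(program, data):
    """Execute the instructions in order and return the final data memory."""
    mem = dict(data)   # copy so the caller's data is untouched
    acc = 0            # the single accumulator (assumed)
    for op, addr in program:
        if op == "LOD":
            acc = mem[addr]
        elif op == "ADD":
            acc += mem[addr]
        elif op == "SUB":
            acc -= mem[addr]
        elif op == "STD":
            mem[addr] = acc
        elif op == "SHL":
            acc *= 2   # shifting left one bit multiplies by 2
        else:
            raise ValueError(f"unknown opcode {op}")
    return mem

final = run(PROGRAM, DATA)
print(final[1000])  # 47 + 3 - 149 + 99*4 = 297
```

Every operand in `PROGRAM` is an absolute address, which is exactly what makes the program awkward to move.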

This program begins at address 0 and references memory locations by giving their absolute addresses. If this program had to be moved to another place in memory, every instruction that contains an address would have to be modified. This may seem like an easy task for a computer, which never gets bored by mindless, repetitive tasks. However, the computer does not really see STD or ADD or NUM in memory; all it sees are binary numbers. Stored-program computers cannot distinguish between instructions and data in memory. The only reason the computer treats a 4 as a SUB instruction is that the PC happened to contain the address of the word with 4 in it, so the instruction decoder saw a 4 in the opcode field. When programs go awry and branch into data sections, as can easily happen, there is nothing to keep the computer from interpreting a sequence of data values as a program.
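
The ambiguity can be made concrete with a small sketch. It assumes, purely for illustration, a 16-bit word with a 4-bit opcode field and a 12-bit address field, and takes 4 as the SUB opcode as in the paragraph above; the real CSC-1 layout may differ.

```python
# Assumed word layout (not confirmed by the text): 4-bit opcode, 12-bit address.
ADDRESS_BITS = 12

def encode(opcode, address):
    """Pack an opcode and an address into one memory word."""
    return (opcode << ADDRESS_BITS) | address

def decode(word):
    """Split a memory word into (opcode, address), as a decoder would."""
    return word >> ADDRESS_BITS, word & ((1 << ADDRESS_BITS) - 1)

sub_word = encode(4, 1002)   # "SUB 1002" under the assumed layout
data_word = 17386            # an ordinary number stored by NUM

print(sub_word == data_word)  # True: the two words are bit-for-bit identical
print(decode(data_word))      # (4, 1002): the data decodes as SUB 1002
```

Nothing in the word itself says which reading is intended, so a routine that scanned memory and adjusted "addresses" could just as easily corrupt a number.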

This blurring of data and instructions is not a mistake or a logical failure of computer designers. Rather, it is just the opposite: a brilliant insight that instructions are just a form of data. John von Neumann is credited with inventing the stored-program concept, the name given to this method of storing instructions in memory alongside regular data, although some now argue that he only contributed to an idea that several people conceived simultaneously. Nevertheless, he saw that a program could write a new program if instructions were but data, and that perhaps programs could even learn by rewriting themselves while they were running.

Things didn't work out quite this smoothly. Today instructions and pure data are usually segregated in different parts of memory to ensure that an errant program does not jump into the middle of a number table and begin running the "program" there. However, artificial intelligence investigators still work on programs that learn by modifying themselves "on the fly."

Because of these two facts, namely that programs need to be moved around in memory and that computers cannot logically distinguish between instruction codes and pure data numbers, other relocation schemes had to be developed. They rely upon rewriting each memory address at run time as it is used, usually by adding a new base address to it to get the real address. The original address is sometimes called the virtual address while the calculated address is the real address. Another set of terms is logical address and physical address.

Surprisingly, an early approach to relocation in the nascent years of multiprogramming was to actually rewrite the addresses so the code could use absolute addresses. However, each program had to be rewritten for the slot or region of memory in which it would be placed, which meant that these regions had to be fixed and all the same size. IBM went through a series of operating systems based on these different methods.

Today's approach is to add every address that the user program generates to the contents of a register. This base address register contains the address of the beginning of the user program in real memory. In Fig. 11.2.1, program A starts at location 2500, so the base address register would contain 2500 when A is running. B starts at 3600, so any address it generates has to be added to 3600, which will be the contents of the base address register. The hardware has only one base address register, so every time a user program is restarted the register must be reloaded with the base address of the program that is now active. In this way, when program A starts running instructions at logical word 0, the computer really retrieves the instruction from physical word 2500. Compilers and linkers can generate code starting at address 0, oblivious to where it will actually be at run time.
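
The base register scheme can be sketched as follows, using the starting addresses 2500 and 3600 given for programs A and B. The class `Machine` and its method names are hypothetical, and the strings stored in memory are stand-ins for real instruction words.

```python
# Sketch of run-time relocation with one base address register.
# Class and method names are hypothetical.
class Machine:
    def __init__(self, size):
        self.memory = [0] * size   # physical memory, one entry per word
        self.base = 0              # the single base address register

    def dispatch(self, base_address):
        """Reload the base register when a program is (re)started."""
        self.base = base_address

    def physical(self, logical):
        """Translate a logical address by adding the base register."""
        return self.base + logical

    def fetch(self, logical):
        """Read the word a program sees at the given logical address."""
        return self.memory[self.physical(logical)]

machine = Machine(5000)
machine.memory[2500] = "A0"   # stand-in for the first word of program A
machine.memory[3600] = "B0"   # stand-in for the first word of program B

machine.dispatch(2500)        # program A is running
print(machine.physical(0), machine.fetch(0))   # 2500 A0

machine.dispatch(3600)        # switch to program B
print(machine.physical(0), machine.fetch(0))   # 3600 B0
```

Both programs ask for logical word 0; only the value reloaded into the base register decides which physical word is actually read.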

The method of computing physical addresses described above, i.e., adding logical addresses to the contents of a base address register, is the standard one for certain types of systems, mostly older and simpler ones, in which all of a user's program and data can reside in main memory and are stored contiguously, i.e., all together. Later systems break up a user's program into chunks called pages, and a much more complex method of translating addresses must be employed. We will look at that in the next chapter.