Section 9.1
The history of assembler programs

Real computers do not run C programs, or C++ or even Java programs, directly. They run machine programs consisting of millions of simple machine instructions like the ones in the CSC-1 that were discussed in the last two chapters. The reason is that machine instructions correspond directly to what the hardware does: which registers are copied into which memory slots, which control wires to the ALU and shifter are turned on, and so forth.

Programmers suffered writing machine language programs for about three years (from 1949, the birth of the Binac, to 1952) before they came up with a better system. In 1952, Captain Grace Murray Hopper of the U.S. Navy published a paper describing a program she worked on called a compiler. It was what we today call an assembler and a linker. Later, in 1957, the first modern compiler was written to translate FORTRAN programs to machine language.

An assembler is a program that accepts ASCII text as input and produces a machine language program. Instructions of the computer are given short, mnemonic (memory-helping) codes like ADD, SUB, and JMP so that programmers do not have to remember their numeric equivalents. Even better, sections of memory can be given labels, which are easier to remember and more descriptive than numeric addresses, and which can "move around" in memory as the program grows or shrinks: the assembler works out the address each label stands for every time the program is assembled.
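The way labels "move around" can be sketched in a few lines of Python. This is only an illustration, not the CSC-1 assembler itself. It assumes that every source line occupies exactly one memory word, that addresses start at 0, and that a label is written with a trailing colon. The function name label_addresses and the two tiny sample programs are made up for the sketch.

```python
def label_addresses(lines):
    """Map each label to the address of the line it is attached to.

    Assumption: one memory word per source line, program loaded at address 0.
    """
    table = {}
    for address, line in enumerate(lines):
        first = line.split()[0]
        if first.endswith(":"):          # a label such as "X:"
            table[first[:-1]] = address
    return table


# Hypothetical programs: the second has one extra instruction.
short_program = ["LOD X", "HLT", "X: NUM 5"]
longer_program = ["LOD X", "ADD X", "HLT", "X: NUM 5"]

print(label_addresses(short_program))   # {'X': 2}
print(label_addresses(longer_program))  # {'X': 3}
```

Adding one instruction pushed X from word 2 to word 3, yet the programmer never had to change the name X anywhere in the source.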

We call this programming language assembly language, assembler, or assembler language. All terms mean the same thing.

Here's an example of an extremely simple assembler program that just adds two integers together and stores the result in another word of memory:

        LOD    A
        ADD    B
        STD    C
        HLT
A:      NUM    5
B:      NUM    7
C:      NUM    0

The computer of course cannot read this directly as characters. Actually, if the computer ran into a sequence of ASCII characters, it would try executing them as instructions because it would not "know" that the values were supposed to be printable characters instead of executable instructions. It's all just bits inside the computer memory!
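To make the "it's all just bits" point concrete, here is a small Python sketch. It rests on assumptions that are not spelled out in this section: two 8-bit ASCII characters are packed into one 16-bit word with the first character in the high byte, and a word is split into a 4-bit opcode followed by a 12-bit operand. The function name as_instruction is invented for the illustration.

```python
def as_instruction(two_chars):
    """Pack two ASCII characters into a 16-bit word, then split the word
    the way an instruction would be split.

    Assumed layout: 4-bit opcode in the top bits, 12-bit operand below it.
    """
    high, low = (ord(c) for c in two_chars)
    word = (high << 8) | low             # first character in the high byte
    opcode = word >> 12                  # top 4 bits
    operand = word & 0x0FFF              # bottom 12 bits
    return word, opcode, operand


print(as_instruction("1A"))  # (12609, 3, 321)
```

Under these assumptions the characters "1A" form a word whose opcode field is 3. Since ADD has opcode 3 on the CSC-1, a computer that stumbled onto this text would happily treat it as an ADD instruction.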

Since the computer cannot execute these programs directly, the assembler must translate the program into a file containing executable instructions. Such a file is called an object file or object program, and the original program is called the source file or source program. Here is the object file corresponding to the assembler program above as it would appear in the CSC-1's memory, both in binary and as decimal numbers:

    as binary...        as decimal...

0000 0000 0000 0100            4
0011 0000 0000 0101        12293
0001 0000 0000 0110         4102
1111 1001 0000 0000        63744
0000 0000 0000 0101            5
0000 0000 0000 0111            7
0000 0000 0000 0000            0

These values are found in words 0 through 6. Addresses are not shown. Word 0 is first and word 6 is last.
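The two columns can be checked against each other with a short Python sketch. The list of decimal values is copied from the listing. The helper name nibbles is made up here, and grouping the bits in fours simply imitates the layout of the listing.

```python
# Decimal values of words 0 through 6, copied from the listing.
words = [4, 12293, 4102, 63744, 5, 7, 0]


def nibbles(value):
    """Format a value as 16 binary digits in groups of four."""
    bits = format(value, "016b")
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))


for address, value in enumerate(words):
    print(address, nibbles(value), value)

# Going the other way: binary digits back to a decimal value.
print(int("0011000000000101", 2))  # 12293
```

Each printed line shows the word's address, which the listing leaves out, followed by the same binary and decimal values.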

Looking only at the decimal values is misleading since they do not show the breakdown in binary. Let's cut and paste the original program so that it is next to the binary:

0000 0000 0000 0100  |      4  |          LOD    A
0011 0000 0000 0101  |  12293  |          ADD    B
0001 0000 0000 0110  |   4102  |          STD    C
1111 1001 0000 0000  |  63744  |          HLT
0000 0000 0000 0101  |      5  |  A:      NUM    5
0000 0000 0000 0111  |      7  |  B:      NUM    7
0000 0000 0000 0000  |      0  |  C:      NUM    0

The leading bits of each of the four instructions represent the opcode. LOD has opcode 0 (see Chapter 7), ADD has opcode 3, and STD has opcode 1; each of these fills the first four bits of its word. HLT takes no operand, and its opcode fills the first eight bits of its word: 1111 1001, or 249. Following the opcodes in the first three lines are the operands, which refer to memory cells. Since the instructions occupy only four words (locations 0 through 3), the assembler puts the memory word assigned to variable A in location 4, right after the HLT instruction. Since the programmer gave A the initial value 5, 5 (binary 101) is put into this memory word. Similarly, B is assigned to location 5 and has an initial value of 7, whereas C, assigned to word 6, just has 0. In the LOD, ADD and STD instructions of the program, all references to A are replaced by 4, B is replaced by 5 and C by 6.
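The translation just described can be sketched as a tiny two-pass assembler in Python. This is a toy for this one example, not the real CSC-1 assembler. The opcodes for LOD, STD and ADD are the ones named in the text. The rest is assumption: the opcode sits in the top 4 bits with a 12-bit address below it, every line takes one word, and HLT is emitted as the complete word 63744 shown in the listing. The names assemble, OPCODES and HLT_WORD are invented for the sketch.

```python
# Opcodes named in the text; assumed to occupy the top 4 bits of a word.
OPCODES = {"LOD": 0, "STD": 1, "ADD": 3}
# Assumption: HLT has no operand and is the whole word from the listing.
HLT_WORD = 0b1111_1001_0000_0000        # 63744


def assemble(source):
    """Translate the example program into a list of 16-bit words."""
    lines = [ln.split() for ln in source.strip().splitlines()]

    # Pass 1: give every label the address of its line (one word per line).
    symbols = {}
    for address, fields in enumerate(lines):
        if fields[0].endswith(":"):
            symbols[fields[0][:-1]] = address

    # Pass 2: encode each line, replacing labels by their addresses.
    words = []
    for fields in lines:
        if fields[0].endswith(":"):
            fields = fields[1:]          # drop the label itself
        mnemonic, *operand = fields
        if mnemonic == "NUM":            # a data word with an initial value
            words.append(int(operand[0]) & 0xFFFF)
        elif mnemonic == "HLT":
            words.append(HLT_WORD)
        else:
            words.append((OPCODES[mnemonic] << 12) | symbols[operand[0]])
    return words


PROGRAM = """
        LOD    A
        ADD    B
        STD    C
        HLT
A:      NUM    5
B:      NUM    7
C:      NUM    0
"""

print(assemble(PROGRAM))  # [4, 12293, 4102, 63744, 5, 7, 0]
```

The first pass is where A, B and C receive the addresses 4, 5 and 6, and the second pass is where those addresses are combined with the opcodes. Two passes are used because an instruction such as LOD A refers to a label that is not defined until later in the source.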

Since different computers have different sets of instructions and different ways of encoding operands, their assembler languages differ, too. In fact, there is little or no uniformity among the various vendors' computers with regard to assembly language, although there are many similarities. All assemblers have a way to load and store values from memory and all of them have ways of accessing the arithmetic circuits of the ALU, such as ADD, SUB, and so forth. But there are many differences, too, and the assemblers reflect this. Thus, if you know one assembler language for one computer, you may have to invest a lot of time to learn the assembler language of a different computer, although most of the concepts will transfer.

Within a certain vendor's line of computers, the machine code and the assembler languages tend to be identical. For example, the IBM 360 series, which was the world's first line of compatible computers, announced in 1964 and first delivered in 1965, used a common machine language and assembler language. Since the models were upwardly compatible, if a company bought a cheap machine like an IBM 360/30 and wrote programs for it, those same programs would run without any modifications on a bigger model like an IBM 360/60, although the converse was not true. Some of the larger computers added extra machine instructions to improve performance or handle special tasks, which allowed IBM to charge more for them. Most computer manufacturers do much the same thing even today.

Have no illusions about assembler! Though the elements of this kind of programming look easy, almost simplistic, programming in assembler is very hard and often not much fun. Everything is done at such a low level that the amount of detail can be overwhelming. Programming in C or C++ is hard enough, but expanding the amount that has to be remembered by a factor of ten makes assembler almost impossible, especially when the programs get large. This is of course why high-level languages like C and FORTRAN were invented.

The trend today is away from assembler. Java, a programming language that started out as Oak around 1991 and was meant only for controlling toasters and cellular phones, quickly found a niche on the World Wide Web. Touting the "write once, run anywhere" philosophy, Java's developers claim that it may actually replace operating systems such as Windows 95.

But there is still a need today for assembler programmers because certain time-critical parts of programs, like operating systems and embedded control programs, need to be written as carefully as possible and made to run as fast as possible, and many compilers simply cannot do a good enough job. The Unix (or Linux) kernel has roughly one thousand lines of assembler to do a few tasks that cannot be expressed in C. Those lucky few who can and do enjoy writing assembler code find themselves very employable.

In this course, we will look at a few assembler programs simply to get the flavor of this style of interacting with the computer and to better understand how complex programs that do the fancy tasks we want done today can be written for a glob of wires and registers.