Section 21.4
Real Addition

Once we know how to represent very large, very small and fractional numbers, we need to know how to manipulate them. Obviously, the standard operations, addition, subtraction, multiplication and division, must be accommodated. Some chips also include reciprocal (1/x) and square root instructions.

A CPU that can perform floating point operations must have special hardware circuits, since the standard adder cannot do the job, at least not without some help. Also, new kinds of errors crop up. We will consider only addition and multiplication herein since subtraction and division are very similar.

Two floating point numbers can be added only when they have the same exponent. If they don't, one of them has to be modified so that they do. This is done by using the larger of the two exponents as the exponent of the result, and adjusting the operand which has the smaller exponent. For example, suppose we wanted to add:

           0.56740 x 10⁵
     +     0.38400 x 10²
     --------------------

If we were to force the top number's exponent to match the lower's, we would end up with a mantissa that is greater than 1, which is not representable in our system. That is, we would have to replace 0.56740×10⁵ with 567.40×10². Instead we can change the bottom number by shifting its mantissa right one place at a time, adding 1 to its exponent with each shift, until its exponent is equal to 5, ending up with 0.00038×10⁵. Note that we lose the 4 because it is impossible in our scheme to store more than 5 digits of mantissa. Even if we increase the number of digits, we will always have some limit.

This process of shifting the mantissa and adding to the exponent is called exponent adjustment, or decimal point alignment.
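The alignment step can be sketched in a few lines of Python. This is only an illustration of the chapter's 5-digit decimal scheme, not how any particular chip does it. The function name `align` and the choice to hold the mantissa as an integer (38400 standing for 0.38400) are assumptions made for the sketch.

```python
def align(mantissa, exponent, target_exponent):
    """Shift a mantissa right until its exponent equals target_exponent.

    The mantissa is held as a 5-digit integer: 38400 stands for 0.38400.
    Digits shifted off the right end are simply dropped.
    """
    while exponent < target_exponent:
        mantissa //= 10   # shift the mantissa right by one decimal digit
        exponent += 1     # compensate by adding 1 to the exponent
    return mantissa, exponent

# 0.38400 x 10^2 aligned to exponent 5 becomes 0.00038 x 10^5
print(align(38400, 2, 5))  # (38, 5)
```

The integer 38 corresponds to the mantissa 0.00038. The trailing 4 has been shifted out and lost.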

Our addition problem becomes:

           0.56740 x 10⁵
     +     0.00038 x 10⁵
     -------------------
           0.56778 x 10⁵

What we did was to line up the decimal points. Then we go ahead and add.

Sometimes we get a result whose mantissa is greater than 1.0:

           0.56740 x 10⁵
     +     0.48293 x 10⁵
     -------------------
           1.05033 x 10⁵

When this happens, it might appear that we have overflow, but the sum's mantissa can be shifted to the right and 1 added to the exponent:

          0.10503 x 10⁶

When we do this, we will lose some precision because the rightmost (least significant) digit of the mantissa will be lost. As always with floating point numbers, there is a trade-off between representation of magnitude and precision, since there is only a fixed number of digits (bits, in a real machine) to work with.
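As a rough sketch, adding two already-aligned mantissas and then correcting a sum that reaches 1.0 might look like this in Python. The name `add_aligned` and the integer mantissa encoding (56740 standing for 0.56740) are assumptions made for illustration only.

```python
LIMIT = 100000  # a 5-digit integer mantissa this large stands for 1.0 or more

def add_aligned(m1, m2, exponent):
    """Add two mantissas that already share the same exponent."""
    total = m1 + m2
    if total >= LIMIT:
        total //= 10    # shift right; the least significant digit is lost
        exponent += 1   # add 1 to the exponent to compensate
    return total, exponent

print(add_aligned(56740, 38, 5))     # (56778, 5)  i.e. 0.56778 x 10^5
print(add_aligned(56740, 48293, 5))  # (10503, 6)  i.e. 0.10503 x 10^6
```

In the second call the exact sum is 1.05033 x 10^5, and the final 3 is the digit that gets dropped.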

What if the exponent is already very large, like 99? Then we are in trouble since we cannot add 1 to the exponent without overflowing it. When this happens we signal overflow and stop the computation:

           0.56740 x 10⁹⁹                                OVERFLOW!
     +     0.48293 x 10⁹⁹
     --------------------
           1.05033 x 10⁹⁹            --->????          0.10503 x 10¹⁰⁰

In the notation we are using in this chapter, we can only store 2-digit exponents, so we have to signal an overflow if the exponent runs into 3 digits. The same sort of thing happens with underflow, when the two exponents are negative and near the smallest allowed exponent.

Of course, in the setup we described earlier, we were using excess 50 notation, so 10⁹⁹ isn't even representable. A stored exponent of 50 actually means 10⁰, and a stored exponent of 99 actually means 10⁴⁹. Thus overflow occurs sooner.
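The overflow check can be sketched as follows. The function `normalize` and its `max_exponent` parameter are hypothetical names chosen for this example. The default limit of 99 matches the plain 2-digit exponent, and passing 49 models the excess 50 case.

```python
def normalize(total, exponent, max_exponent=99):
    """Fix up a mantissa sum that has reached 1.0, or signal overflow.

    total is an integer mantissa sum; 105033 stands for 1.05033.
    """
    if total >= 100000:
        if exponent + 1 > max_exponent:
            # The exponent cannot grow any further, so stop the computation.
            raise OverflowError(f"exponent would exceed {max_exponent}")
        total //= 10
        exponent += 1
    return total, exponent

print(normalize(105033, 5))  # (10503, 6)

try:
    normalize(105033, 99)    # 0.56740 x 10^99 + 0.48293 x 10^99
except OverflowError as err:
    print("OVERFLOW:", err)
```

With `max_exponent=49`, the same sum overflows as soon as the exponent is 49.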

What happens if the exponents are far apart? Then, when the adjustment is made prior to addition, the smaller will get turned into 0. For example:

           0.56740 x 10¹⁰
     +     0.48293 x 10²
     -------------------

0.48293×10² becomes 0.04829×10³, 0.00482×10⁴, 0.00048×10⁵, 0.00004×10⁶ and finally 0.00000×10⁷, but we need to shift until the exponent is 10. Clearly, the first number is so much larger than the second that it is almost like adding 0 to it in real life, and in the world of representable floating point numbers, it is exactly like adding 0 to it.
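The shifting sequence above can be reproduced with a short Python sketch. The name `shift_steps` and the integer mantissa encoding (48293 standing for 0.48293) are assumptions made for illustration.

```python
def shift_steps(mantissa, exponent, target_exponent):
    """Return every intermediate (mantissa, exponent) pair during alignment."""
    steps = []
    while exponent < target_exponent:
        mantissa //= 10   # shift right, dropping the last digit
        exponent += 1
        steps.append((mantissa, exponent))
    return steps

steps = shift_steps(48293, 2, 10)
for m, e in steps:
    print(f"0.{m:05d} x 10^{e}")

# By the time the exponent reaches 10 the mantissa is 0,
# so the sum is just the larger operand.
aligned_mantissa, _ = steps[-1]
print(56740 + aligned_mantissa)  # 56740
```

The mantissa is already zero when the exponent reaches 7, and the remaining shifts up to 10 leave it at zero.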

With subtraction, the exponents are brought into alignment in the same way and then the mantissas are subtracted. If the first number's mantissa is smaller than the second's, the result is negative. Floating point numbers use sign-magnitude form, so the sign of the entire number is stored separately from the magnitude of the mantissa.
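A minimal sketch of subtracting aligned mantissas and recording the result in sign-magnitude form is shown below. The name `subtract_aligned` and the convention that a sign of 1 means negative are assumptions for this example. The sketch also leaves any leading zeros in the result's mantissa as they are.

```python
def subtract_aligned(m1, m2, exponent):
    """Subtract m2 from m1, where both mantissas share the same exponent.

    Returns (sign, magnitude, exponent); sign is 0 for positive, 1 for negative.
    Mantissas are 5-digit integers: 38400 stands for 0.38400.
    """
    diff = m1 - m2
    sign = 1 if diff < 0 else 0   # the sign is kept apart from the magnitude
    return sign, abs(diff), exponent

# 0.38400 x 10^5 - 0.56740 x 10^5 = -0.18340 x 10^5
print(subtract_aligned(38400, 56740, 5))  # (1, 18340, 5)
```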