Floating Point Arithmetic

Floating point numbers are represented with a sign, mantissa, and exponent.
Arithmetic operations must take into account these three components.

Figure: Sections of a floating point number (sign, mantissa, exponent)
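To make the three components concrete, here is a minimal sketch that splits a Python float into a sign, a mantissa, and an exponent. The helper name `decompose` is an assumption made for illustration. It relies on the standard `math.frexp`, which returns a mantissa in the range 0.5 to 1, so the sketch rescales it to the 1.x form used in these notes.

```python
import math

def decompose(x):
    """Split a non-zero float into (sign, mantissa, exponent) with 1 <= mantissa < 2.

    Hypothetical helper for illustration; sign is 0 for positive, 1 for negative.
    """
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    m, e = math.frexp(abs(x))    # abs(x) == m * 2**e, with 0.5 <= m < 1
    return sign, m * 2, e - 1    # rescale so the mantissa lies in [1, 2)

print(decompose(13.0))   # (0, 1.625, 3)  i.e. +1.101 (binary) x 2^3
print(decompose(-5.0))   # (1, 1.25, 2)   i.e. -1.010 (binary) x 2^2
```

The values 13 and 5 are the decimal equivalents of Number A and Number B in the worked example that follows.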

Ensure the numbers have the same exponent before performing arithmetic. This might involve shifting the binary point of one number and adjusting its exponent until both numbers have matching exponents. Example:
- Number A: $1.101 \times 2^{3}$
- Number B: $1.010 \times 2^{2}$
- Number A has an exponent of 3 and Number B has an exponent of 2, so we need to adjust B to have the same exponent as A
- This is achieved by moving the binary point one place to the left in Number B and increasing its exponent by 1
- Number B becomes: $0.101 \times 2^{3}$
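The alignment step can be sketched as follows. This is an illustrative sketch rather than how hardware does it: the function name `align` is an assumption, and `fractions.Fraction` stands in for a fixed-width mantissa so that no bits are lost when shifting.

```python
from fractions import Fraction

def align(m_a, e_a, m_b, e_b):
    """Rewrite the operand with the smaller exponent so both share the larger one.

    Hypothetical helper: mantissas are Fractions, exponents are ints.
    Dividing by 2**shift moves the binary point `shift` places to the left.
    """
    if e_a >= e_b:
        shift = e_a - e_b
        return m_a, m_b / 2**shift, e_a
    shift = e_b - e_a
    return m_a / 2**shift, m_b, e_b

# Number A = 1.101 (binary) = 13/8, Number B = 1.010 (binary) = 5/4
m_a, m_b, exp = align(Fraction(13, 8), 3, Fraction(5, 4), 2)
print(m_a, m_b, exp)   # 13/8 5/8 3   (5/8 is 0.101 in binary)
```

Real hardware has a limited number of mantissa bits, so bits shifted off the right-hand end during alignment may be lost; exact fractions avoid that here.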
Perform the binary addition or subtraction on the mantissa
- $1.101_{2} + 0.101_{2} = 10.010_{2}$
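One way to check the mantissa addition is to treat each aligned mantissa as an integer scaled by $2^{3}$ (three bits after the binary point) and add the integers. The constant `FRAC_BITS` and the helper `to_binary` are assumptions introduced for this sketch.

```python
FRAC_BITS = 3  # assumed number of bits after the binary point

def to_binary(value, frac_bits=FRAC_BITS):
    """Format a non-negative scaled integer as a binary string with a binary point."""
    bits = format(value, "b").zfill(frac_bits + 1)
    return bits[:-frac_bits] + "." + bits[-frac_bits:]

a = 0b1101   # 1.101, Number A's mantissa scaled by 2**3
b = 0b0101   # 0.101, Number B's mantissa after alignment, scaled by 2**3
total = a + b

print(to_binary(a), "+", to_binary(b), "=", to_binary(total))
# 1.101 + 0.101 = 10.010
```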
Ensure the result is in a normalised form
- The sum 10.010 exceeds the normal range for mantissa (1.0 to 1.111… in binary)
- To normalise it, we shift the mantissa one position to the right and increment the exponent by 1
- New Mantissa: 1.0010
- New Exponent: incremented from 3 to 4 (the scale factor goes from $2^{3}$ to $2^{4}$)
- The final result would be $1.0010 \times 2^{4}$
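The normalisation step can be sketched as a loop that shifts the mantissa until it lies between 1 and 2. The function name `normalise` is an assumption, and exact fractions are used in place of a fixed-width mantissa. The sketch also handles mantissas below 1, which can arise after subtraction.

```python
from fractions import Fraction

def normalise(mantissa, exponent):
    """Shift a positive mantissa until 1 <= mantissa < 2, adjusting the exponent.

    Hypothetical helper: each right shift (divide by 2) adds 1 to the exponent,
    each left shift (multiply by 2) subtracts 1, so the value is unchanged.
    """
    if mantissa <= 0:
        raise ValueError("mantissa must be positive")
    while mantissa >= 2:
        mantissa /= 2
        exponent += 1
    while mantissa < 1:
        mantissa *= 2
        exponent -= 1
    return mantissa, exponent

# 10.010 (binary) = 9/4 with exponent 3
print(normalise(Fraction(9, 4), 3))   # (Fraction(9, 8), 4), i.e. 1.001 x 2^4
```

As a sanity check, $1.001_{2} \times 2^{4} = 1.125 \times 16 = 18$, which matches $13 + 5$ for the two inputs.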
Determine Sign
- For addition: If both numbers have the same sign, the result takes that sign
- If they have different signs, the smaller magnitude is subtracted from the larger, and the result takes the sign of the number with the larger absolute value
- For subtraction: Flip the sign of the number being subtracted, then apply the addition rules above
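The sign rules can be sketched with sign-magnitude values, where the sign is stored separately from the magnitude. The names `add_signed` and `sub_signed` are assumptions, the magnitudes are assumed to be already aligned to the same exponent, and treating an exact zero result as positive is a choice made for this sketch.

```python
def add_signed(sign_a, mag_a, sign_b, mag_b):
    """Add two sign-magnitude values; a sign is 0 (positive) or 1 (negative)."""
    if sign_a == sign_b:
        # Same sign: add the magnitudes and keep the common sign.
        return sign_a, mag_a + mag_b
    if mag_a == mag_b:
        # Opposite signs, equal magnitudes: zero (taken as positive here).
        return 0, 0
    if mag_a > mag_b:
        return sign_a, mag_a - mag_b
    return sign_b, mag_b - mag_a

def sub_signed(sign_a, mag_a, sign_b, mag_b):
    """Compute A - B as A + (-B) by flipping the sign bit of B."""
    return add_signed(sign_a, mag_a, sign_b ^ 1, mag_b)

print(add_signed(0, 13, 1, 5))   # (0, 8)   13 + (-5) = 8
print(add_signed(0, 5, 1, 13))   # (1, 8)   5 + (-13) = -8
print(sub_signed(0, 5, 0, 13))   # (1, 8)   5 - 13 = -8
```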
