Floating Point Arithmetic
How do you represent floating point numbers?
- Floating point numbers are represented with a sign, a mantissa and an exponent, which together give the value $\pm\,\text{mantissa} \times 2^{\text{exponent}}$.
- Arithmetic operations must take into account these three components.
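As a quick check of the sign/mantissa/exponent idea, the minimal sketch below splits a Python float into the three parts using the standard library's `math.frexp`. The helper names `decompose` and `recompose` are hypothetical. `frexp` returns a mantissa in the range [0.5, 1), so the code rescales it to the 1.xxx form used in these notes; real formats such as IEEE 754 also store a biased exponent and omit the leading 1, which this sketch ignores.

```python
import math


def decompose(x: float) -> tuple[int, float, int]:
    """Split x into (sign, mantissa, exponent) with 1 <= mantissa < 2.

    Assumes x is finite and non-zero; sign is 0 for positive, 1 for negative.
    """
    if x == 0 or not math.isfinite(x):
        raise ValueError("x must be finite and non-zero")
    sign = 1 if x < 0 else 0
    # frexp gives m, e with 0.5 <= m < 1 and abs(x) == m * 2**e
    m, e = math.frexp(abs(x))
    # Rescale to the 1.xxx form: double the mantissa, reduce the exponent by 1
    return sign, m * 2, e - 1


def recompose(sign: int, mantissa: float, exponent: int) -> float:
    """Rebuild the value as (+/-) mantissa * 2**exponent."""
    return (-1) ** sign * mantissa * 2 ** exponent
```

For example, `decompose(18.0)` returns `(0, 1.125, 4)`: 18 is 10010 in binary, i.e. 1.0010 × 2^4, and 1.0010 in binary is 1.125.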
Sections of a floating point number
Steps for adding or subtracting floating point numbers
- Align the exponents. The numbers must have the same exponent before the arithmetic is performed. This usually means shifting the binary point of the number with the smaller exponent to the left and increasing its exponent until the exponents match. Example:
  - Number A:
  - Number B:
  - Number A’s exponent is one greater than Number B’s, so B must be adjusted to have the same exponent as A
  - This is done by moving the binary point in Number B one place to the left and increasing its exponent by 1
  - Resulting in:
- Perform the binary addition or subtraction on the mantissas.
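- Normalise the result.
  - In the example, the mantissas sum to 10.010, which is outside the range of a normalised mantissa (1.0 to 1.111… in binary)
  - To normalise it, shift the mantissa one place to the right (moving the binary point one place to the left) and increase the exponent by 1
  - New mantissa: 1.0010
  - New exponent: one more than the shared exponent from the alignment step
  - The final result is therefore $1.0010 \times 2^{e+1}$, where $e$ is that shared exponent
  - Conversely, if a result is below 1.0 (as can happen after subtraction), shift the mantissa left and decrease the exponent by 1 for each place until the digit before the binary point is 1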
- Determine the sign.
  - For addition: if both numbers are positive, or both are negative, the result takes their common sign
  - If they have different signs, the result takes the sign of the number with the larger absolute value
  - For subtraction: A − B is the same as A + (−B), so flip the sign of B and apply the addition rules; for example, if A and B are both positive and B has the larger absolute value, the result is negative
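The four steps above can be sketched in Python as follows. This is a minimal model under stated assumptions, not how hardware does it: each number is a hypothetical `(sign, mantissa, exponent)` tuple with sign 0 for positive and 1 for negative, and mantissas are exact `fractions.Fraction` values, so shifting during alignment never loses bits (a real format with a fixed number of mantissa bits would drop the bits shifted off the end). The names `from_bin`, `normalise`, `fp_add` and `fp_sub` are illustrative only.

```python
from fractions import Fraction

# Hypothetical representation: (sign, mantissa, exponent); sign 0 = positive, 1 = negative
FP = tuple[int, Fraction, int]


def from_bin(text: str) -> Fraction:
    """Parse a binary string such as '1.101' into an exact Fraction."""
    whole, _, frac = text.partition(".")
    return Fraction(int(whole + frac, 2), 2 ** len(frac))


def normalise(mantissa: Fraction, exponent: int) -> tuple[Fraction, int]:
    """Scale a non-negative mantissa into [1, 2) without changing the value."""
    if mantissa == 0:
        return mantissa, 0  # zero has no normalised form; use exponent 0
    while mantissa >= 2:  # e.g. 10.010 -> 1.0010: shift right, exponent + 1
        mantissa /= 2
        exponent += 1
    while mantissa < 1:  # e.g. 0.1110 -> 1.110: shift left, exponent - 1
        mantissa *= 2
        exponent -= 1
    return mantissa, exponent


def fp_add(a: FP, b: FP) -> FP:
    (sa, ma, ea), (sb, mb, eb) = a, b
    # Step 1: align exponents by shifting the smaller-exponent mantissa right
    while ea < eb:
        ma /= 2
        ea += 1
    while eb < ea:
        mb /= 2
        eb += 1
    # Steps 2 and 4: add magnitudes if the signs match; otherwise subtract the
    # smaller magnitude from the larger and keep the larger operand's sign
    if sa == sb:
        sign, m = sa, ma + mb
    elif ma >= mb:
        sign, m = sa, ma - mb
    else:
        sign, m = sb, mb - ma
    # Step 3: normalise the result (a zero result is reported as positive)
    m, e = normalise(m, ea)
    return (0 if m == 0 else sign), m, e


def fp_sub(a: FP, b: FP) -> FP:
    # A - B is computed as A + (-B)
    sb, mb, eb = b
    return fp_add(a, (1 - sb, mb, eb))
```

With hypothetical operands 1.101 × 2^3 (13) and 1.010 × 2^2 (5), `fp_add` shifts B to 0.1010 × 2^3, adds the mantissas to get 10.010 and normalises to 1.0010 × 2^4, returning `(0, Fraction(9, 8), 4)`, which is 18.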
Example addition
- Align exponents:
- Add mantissas:
- Normalise (if required) and determine the sign.
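The operands of this example were lost, so the worked example below uses hypothetical operands $A = 1.101_2 \times 2^{3}$ (13) and $B = 1.010_2 \times 2^{2}$ (5), chosen so that the mantissa sum is the 10.010 used in the normalisation step:

```latex
\begin{aligned}
&\text{Operands: } 1.101_2 \times 2^{3} + 1.010_2 \times 2^{2} \quad (13 + 5)\\
&\text{Align exponents: } 1.101_2 \times 2^{3} + 0.1010_2 \times 2^{3}\\
&\text{Add mantissas: } 1.1010_2 + 0.1010_2 = 10.0100_2\\
&\text{Normalise: } 10.0100_2 \times 2^{3} = 1.00100_2 \times 2^{4} = 18\\
&\text{Sign: both operands are positive, so the result is positive}
\end{aligned}
```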
Example subtraction
- Align exponents:
- Subtract mantissas:
- Normalise (if required) and determine the sign.
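Here too the original operands were lost. The worked example below uses hypothetical operands $A = 1.100_2 \times 2^{3}$ (12) and $B = 1.010_2 \times 2^{2}$ (5), chosen so that the difference falls below 1.0 and must be normalised by shifting left:

```latex
\begin{aligned}
&\text{Operands: } 1.100_2 \times 2^{3} - 1.010_2 \times 2^{2} \quad (12 - 5)\\
&\text{Align exponents: } 1.1000_2 \times 2^{3} - 0.1010_2 \times 2^{3}\\
&\text{Subtract mantissas: } 1.1000_2 - 0.1010_2 = 0.1110_2\\
&\text{Normalise: } 0.1110_2 \times 2^{3} = 1.110_2 \times 2^{2} = 7\\
&\text{Sign: } A \text{ has the larger absolute value, so the result is positive}
\end{aligned}
```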