Floating Point Numbers - Yr 2 Only
Floating point numbers are a method of dynamic binary numerical representation, allowing for a customizable range and accuracy using the same number of digits. Floating point consists of 2 parts, a mantissa which contains the binary value of the represented number, and the exponent which shifts the decimal point according to the size of the number. For a floating point number to be normalized and make the best use of available memory, it must begin with "0.1" for a positive number and "1.0" for a negative number. Any deviation with this could be a waste of bits, as the same number could be represented with a smaller mantissa.
For example, the number 32 could be represented by a floating point number with an 8 bit mantissa and a 5 bit exponent.
The mantissa would be as follows: 0.1000000
The exponent must shift the decimal point to shift 1 into the value of 32, it must therefore have a value of 6: 00110
Converting from Binary to Denary
- . Write down the mantissa, with the point inserted after the sign bit. (Miss off trailing 0’s)
- . If the mantissa is negative (sign bit = 1) then
- .find the twos complement of the mantissa
- . If the exponent is negative (sign bit = 1) then
- .find the twos complement of the exponent
- . Calculate the value of the exponent in denary
- . If the exponent is positive then
- .move the point in the mantissa to the right the number of places given by the exponent
- .else {if the exponent is negative}
- .move the point in the mantissa to the left the number of places given by the exponent
- . Convert the mantissa to denary to obtain the answer
Converting from Denary to Binary
- Convert the denary number to an unsigned binary number (the mantissa)
- Normalise this (move the point to in front of the leading 1)
- If the number is negative then
- represent it as its twos complement equivalent
- Count the number of places the point has been moved to give exponent
- If point moved left then
- exponent is positive
- else {if point moved right}
- exponent is negative
- Convert exponent to twos complement binary (6-bits in this case)
- Add 0’s to the mantissa if necessary (to give 10 bits in this case)