
Real Number

Tags: A2, C16, Notes

Floating-point number: the number of digits is not fixed

Fixed-point representation

  1. an overall number of bits is chosen
  1. with a defined number of bits for the whole number part
  1. and the remainder for the fractional part
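As a minimal sketch of the idea, assume an unsigned 8-bit format with 4 bits for the whole-number part and 4 bits for the fractional part. The split and the function name `fixed_to_denary` are illustrative choices, not part of the notes.

```python
def fixed_to_denary(bits: str, whole_bits: int) -> float:
    """Interpret an unsigned fixed-point bit string.

    whole_bits is the number of bits before the binary point;
    the remaining bits form the fractional part.
    """
    value = 0.0
    for i, bit in enumerate(bits):
        if bit == "1":
            # Bit i (counting from the left) has place value 2^(whole_bits - 1 - i).
            value += 2.0 ** (whole_bits - 1 - i)
    return value


print(fixed_to_denary("01101100", 4))  # 0110.1100 -> 6.75
```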

Floating point representation

  1. The form:

    Mantissa x 2^exponent

  1. examples

binary number to denary number:

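Step 1: Convert the binary number in the exponent part to denary. The result, n, is the number of places the binary point must move.

Step 2: By default the mantissa's binary point lies between the first and second bits from the left. Move it n places to the right (or to the left if n is negative).

Step 3: Bits to the left of the point form the whole-number part, with place values 1, 2, 4, 8, …; bits to the right form the fractional part, with place values 1/2, 1/4, …. Because the mantissa is in two's complement, its leftmost bit carries a negative place value.

Step 4: Add up the place values of all the 1 bits.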
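The four steps can be sketched in Python as follows. This assumes both the mantissa and the exponent are two's complement bit strings with the binary point after the first mantissa bit; the function names are hypothetical.

```python
def twos_complement(bits: str) -> int:
    """Signed integer value of a two's complement bit string."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)  # the leftmost bit has negative weight
    return value


def float_to_denary(mantissa: str, exponent: str) -> float:
    # Step 1: the exponent gives n, the number of places to move the point.
    n = twos_complement(exponent)
    # Step 2: with the point after the first bit, the mantissa is a
    # fraction: its integer value divided by 2^(number of bits - 1).
    m = twos_complement(mantissa) / (1 << (len(mantissa) - 1))
    # Steps 3 and 4: moving the point n places multiplies by 2^n.
    return m * 2.0 ** n


print(float_to_denary("01011000", "0011"))  # 0.1011 x 2^3 = 101.1 -> 5.5
print(float_to_denary("10110000", "0010"))  # -0.625 x 2^2 -> -2.5
```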

Precision

Remember we have to decide:

  1. the total number of bits used.
  1. the number of bits used for mantissa
  1. the number of bits used for exponent.

Note:

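  1. Both parts are stored in two's complement: the first part is the mantissa and the second part is the exponent.
  1. The mantissa determines precision and the exponent determines range.
    The more bits given to the mantissa, the more precisely a number can be represented; the more bits given to the exponent, the larger the range of numbers that can be represented.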

  1. A binary representation is only an approximation of a real number.
  1. Increasing the number of bits for the mantissa gives better precision,
  1. but leaves fewer bits for the exponent, which reduces the range of numbers that can be represented.
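To make the trade-off concrete, the sketch below compares three ways of splitting 8 bits. It assumes two's complement for both parts and a binary point after the first mantissa bit; `describe` is a hypothetical helper, and "step" here means the place value of the last mantissa bit before scaling by the exponent.

```python
def describe(mantissa_bits: int, exponent_bits: int):
    """Return (largest positive value, mantissa step) for a given split."""
    max_exponent = 2 ** (exponent_bits - 1) - 1      # e.g. 0111 = 7 for 4 bits
    step = 2.0 ** -(mantissa_bits - 1)               # weight of the last mantissa bit
    max_mantissa = 1 - step                          # 0.111...1
    return max_mantissa * 2.0 ** max_exponent, step


for m_bits, e_bits in [(4, 4), (5, 3), (6, 2)]:
    largest, step = describe(m_bits, e_bits)
    print(f"{m_bits}-bit mantissa, {e_bits}-bit exponent: "
          f"largest = {largest}, mantissa step = {step}")
# 4/4 -> largest 112.0,  step 0.125
# 5/3 -> largest 7.5,    step 0.0625
# 6/2 -> largest 1.9375, step 0.03125
```

Each bit moved from the exponent to the mantissa halves the step but shrinks the largest representable value sharply.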

Where should the point be?

When the mantissa has the implied binary point immediately following the sign bit, a smaller spacing is produced between the values that can be represented.

Normalization:


What if 8 bits are used, 4 for the mantissa and 4 for the exponent?
Normalization achieves maximum precision by making full use of all the available bits.

For a positive number, the bits in the mantissa are shifted left until the most significant bits are 0 followed by 1.
For a negative number, the bits in the mantissa are shifted left until the most significant bits are 1 followed by 0.

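Moving to normalized form: the mantissa must not start with a run of repeated 0s. The whole bit pattern can be shifted:

  1. Move the binary point n places to the right.
  1. Subtract n from the binary number in the exponent part.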
Conversion of representations

Binary (base 2) → denary (base 10):

Denary (base 10) → binary (base 2):

10 bits for mantissa and 4 bits for exponent
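For the denary-to-binary direction, here is a hedged sketch using a 10-bit mantissa and a 4-bit exponent, both in two's complement. It relies on Python's `math.frexp` to find the exponent; the function name `denary_to_float` and the choice to drop (rather than round) bits that do not fit are assumptions made for illustration.

```python
import math


def denary_to_float(x: float, mantissa_bits: int = 10, exponent_bits: int = 4):
    """Return (mantissa, exponent) bit strings in normalized form."""
    if x == 0:
        return "0" * mantissa_bits, "0" * exponent_bits
    m, e = math.frexp(x)              # x == m * 2**e with 0.5 <= |m| < 1
    if m == -0.5:
        # A normalized negative mantissa starts with 10, so -0.5 x 2^e
        # is stored as -1.0 x 2^(e - 1).
        m, e = -1.0, e - 1
    low = -(1 << (exponent_bits - 1))
    high = (1 << (exponent_bits - 1)) - 1
    if not low <= e <= high:
        raise ValueError("exponent out of range for this format")
    # Scale the mantissa to an integer; floor drops bits that do not fit.
    scaled = math.floor(m * (1 << (mantissa_bits - 1)))
    mantissa = format(scaled & ((1 << mantissa_bits) - 1), f"0{mantissa_bits}b")
    exponent = format(e & ((1 << exponent_bits) - 1), f"0{exponent_bits}b")
    return mantissa, exponent


print(denary_to_float(5.5))    # ('0101100000', '0011')
print(denary_to_float(-2.5))   # ('1011000000', '0010')
```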


Problems with using floating-point numbers

Most decimal fractions cannot be represented exactly as binary fractions, so floating-point numbers are usually only approximations of decimal fractions.
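A quick way to see this is with Python's built-in floats (assumed here to be binary floating-point, as in standard CPython) and the `fractions` module, which exposes the exact value a float stores:

```python
from fractions import Fraction

# 0.1, 0.2 and 0.3 are each stored as the nearest binary fraction,
# so the rounding errors do not cancel out.
sum_matches = (0.1 + 0.2 == 0.3)
print(sum_matches)                          # False

# The stored value of 0.1 is close to, but not exactly, 1/10.
print(Fraction(0.1) == Fraction(1, 10))     # False

# 0.5 is an exact binary fraction (0.1 in binary), so it is stored exactly.
print(Fraction(0.5) == Fraction(1, 2))      # True
```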

  1. Overflow is a condition that occurs when a calculation produces a result that is too large to be stored or represented in the given number of bits.
  1. Underflow is a condition that occurs when a calculation produces a result that is too small (too close to zero) to be stored or represented in the given number of bits.
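Both conditions can be observed with Python floats. This sketch assumes the usual 64-bit double-precision floats used by CPython on common platforms; other formats have different limits but behave analogously.

```python
import math
import sys

# Overflow: the result is larger than the largest representable float.
largest = sys.float_info.max
overflowed = largest * 2
print(overflowed)            # inf

# Underflow: the result is closer to zero than the smallest
# representable positive float, so it is stored as 0.0.
smallest = 5e-324            # smallest positive double (assumed 64-bit format)
underflowed = smallest / 4
print(underflowed)           # 0.0
```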