ABSTRACT

We live in a continuous world with infinitely many real numbers, but a computer has only a finite number of bits, so real numbers must be represented approximately. Several representations of real numbers have been proposed in the past, but by far the most widely used today is the floating-point representation. Each floating-point representation has a base $\beta$ (always assumed to be even), typically 2 (binary), 8 (octal), 10 (decimal), or 16 (hexadecimal), and a precision $p$, the number of base-$\beta$ digits held in a floating-point number. For example, if $\beta = 10$ and $p = 5$, the number 0.1 is represented as $1.0000 \times 10^{-1}$. On the other hand, if $\beta = 2$ and $p = 20$, the decimal number 0.1 cannot be represented exactly but is approximately $1.1001100110011001100 \times 2^{-4}$. In general, we write the representation as $\pm d_0.d_1 \cdots d_{p-1} \times \beta^{e}$, where $d_0.d_1 \cdots d_{p-1}$ is called the significand (or mantissa) and has $p$ digits, and $e$ is the exponent. If the leading digit $d_0$ is nonzero, the number is said to be normalized. More precisely, $\pm d_0.d_1 \cdots d_{p-1} \times \beta^{e}$ denotes the number

$$\pm \left( d_0 + d_1 \beta^{-1} + d_2 \beta^{-2} + \cdots + d_{p-1} \beta^{-(p-1)} \right) \beta^{e}, \qquad 0 \le d_i < \beta.$$
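The digit-expansion definition can be checked numerically. The following is a minimal sketch, not a production routine: the function names `to_digits` and `from_digits` are hypothetical, exact rational arithmetic from Python's `fractions` module is used to avoid contaminating the result with binary rounding, and the significand is truncated (not rounded) to $p$ digits, an assumption chosen because it reproduces the digits quoted in the text.

```python
from fractions import Fraction


def to_digits(x, beta, p):
    """Return (sign, digits, e) such that
    x ~= sign * d0.d1...d_{p-1} * beta**e, normalized (d0 != 0 for x != 0).

    Assumption: digits beyond the p-th are truncated, not rounded.
    """
    x = Fraction(x)
    if x == 0:
        return 1, [0] * p, 0
    sign = -1 if x < 0 else 1
    m = abs(x)
    e = 0
    # Scale m into the interval [1, beta) so that the leading digit is nonzero.
    while m >= beta:
        m /= beta
        e += 1
    while m < 1:
        m *= beta
        e -= 1
    digits = []
    for _ in range(p):
        d = m.numerator // m.denominator  # integer part is the next digit
        digits.append(d)
        m = (m - d) * beta                # shift the remainder one place left
    return sign, digits, e


def from_digits(sign, digits, beta, e):
    """Evaluate sign * (d0 + d1*beta**-1 + ... + d_{p-1}*beta**-(p-1)) * beta**e."""
    b = Fraction(beta)
    total = sum(Fraction(d) * b ** (-i) for i, d in enumerate(digits))
    return sign * total * b ** e


tenth = Fraction(1, 10)  # exact 1/10; a float literal 0.1 is already a binary approximation

sign10, digits10, e10 = to_digits(tenth, beta=10, p=5)
sign2, digits2, e2 = to_digits(tenth, beta=2, p=20)

print(digits10, e10)                          # decimal digits and exponent
print("".join(map(str, digits2)), e2)         # binary digits and exponent
print(tenth - from_digits(sign2, digits2, 2, e2))  # exact truncation error
```

With $\beta = 10$ and $p = 5$ the reconstruction is exact, while with $\beta = 2$ and $p = 20$ the reconstructed value falls slightly below 0.1, with an error smaller than $2^{-4} \cdot 2^{-19}$, that is, one unit in the last place of the significand.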