IEEE 754-1985

<h2 id="representation-of-numbers">Representation of numbers</h2>

<p>Floating-point numbers in IEEE 754 format consist of three fields: a <a href="/facts/Sign_bit/hLvapzSs">sign bit</a>, a <a href="/facts/Exponent_bias/u5MIKgKP">biased exponent</a>, and a fraction. The following example illustrates the meaning of each.
</p><p>The decimal number 0.15625<sub>10</sub> represented in binary is 0.00101<sub>2</sub> (that is, 1/8 + 1/32). (Subscripts indicate the number <a href="/facts/Radix/J5RGWRJF">base</a>.) Analogous to <a href="/facts/Scientific_notation/YgyYEmq8">scientific notation</a>, where numbers are written to have a single non-zero digit to the left of the decimal point, we rewrite this number so it has a single 1 bit to the left of the "binary point". We simply multiply by the appropriate power of 2 to compensate for shifting the bits left by three positions:
</p>

<p>$$0.00101_{2} = 1.01_{2} \times 2^{-3}$$</p>

<p>Now we can read off the fraction and the exponent: the fraction is .01<sub>2</sub> and the exponent is −3.
</p><p>The three fields in the IEEE 754 representation of this number are therefore:
</p>
<ul><li><i>sign</i> = 0, because the number is positive. (1 indicates negative.)</li>
<li><i>biased exponent</i> = −3 + the "bias". In single precision, the bias is 127, so in this example the biased exponent is 124; in double precision, the bias is 1023, so the biased exponent in this example is 1020.</li>
<li><i>fraction</i> = .01000…<sub>2</sub>.</li></ul>
<p>IEEE 754 adds a <a href="/facts/Offset_binary/4B81wxxe">bias</a> to the exponent so that numbers can in many cases be compared conveniently by the same hardware that compares signed <a href="/facts/2%2527s-complement/xhjYpnsc">2's-complement</a> integers. Using a biased exponent, the lesser of two positive floating-point numbers will come out "less than" the greater following the same ordering as for <a href="/facts/Sign_and_magnitude/cE0sEMau">sign and magnitude</a> integers. If two floating-point numbers have different signs, the sign-and-magnitude comparison also works with biased exponents. However, if both biased-exponent floating-point numbers are negative, then the ordering must be reversed. If the exponent were represented as, say, a 2's-complement number, comparison to see which of two numbers is greater would not be as convenient.
</p><p>The leading 1 bit is omitted: every normalized number (that is, every number except zero and the denormalized numbers described below) starts with a leading 1, so this bit is implicit and need not be stored, which gives an extra bit of precision for "free."
</p>
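<p>The field layout of the 0.15625 example can be checked with a short sketch. It assumes Python's standard <code>struct</code> module, which packs a value into the 32-bit single-precision format; <code>single_fields</code> and <code>normalized_value</code> are hypothetical helper names used only for illustration, and <code>normalized_value</code> handles normalized numbers only, restoring the implicit leading 1.</p>

```python
import struct

def single_fields(x: float) -> tuple:
    """Return (sign, biased exponent, fraction) of x after rounding it to single precision."""
    # '>f' packs a big-endian 32-bit float; '>I' reads the same 4 bytes back as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1-bit sign field
    biased_exp = (bits >> 23) & 0xFF   # 8-bit biased exponent field
    fraction = bits & 0x7FFFFF         # 23-bit fraction field
    return sign, biased_exp, fraction

def normalized_value(sign: int, biased_exp: int, fraction: int) -> float:
    """Rebuild a normalized single-precision value, restoring the implicit leading 1."""
    significand = 1 + fraction / 2**23          # hidden 1 bit plus the stored fraction
    return (-1) ** sign * significand * 2.0 ** (biased_exp - 127)

s, e, f = single_fields(0.15625)
print(s, e, format(f, "023b"))    # 0 124 01000000000000000000000
print(normalized_value(s, e, f))  # 0.15625
```

<p>Only the 23 bits after the binary point are stored; the leading 1 of 1.01<sub>2</sub> is added back when the value is decoded.</p>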
<h3>Zero</h3>
<p>The number zero is represented specially:
</p>
<ul><li><i>sign</i> = 0 for <a href="/facts/Signed_zero/uX4d8zzV">positive zero</a>, 1 for <a href="/facts/Signed_zero/uX4d8zzV">negative zero</a>.</li>
<li><i>biased exponent</i> = 0.</li>
<li><i>fraction</i> = 0.</li></ul>
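<p>A small sketch of the two zero encodings, assuming Python's <code>struct</code> module for the single-precision packing; <code>bits32</code> is a hypothetical helper name. It shows that the zeros differ only in the sign bit, compare equal, and still carry an observable sign.</p>

```python
import math
import struct

def bits32(x: float) -> int:
    """Return the 32-bit pattern of x stored as a single-precision float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

print(hex(bits32(0.0)))           # 0x0: sign 0, exponent 0, fraction 0
print(hex(bits32(-0.0)))          # 0x80000000: only the sign bit is set
print(0.0 == -0.0)                # True: the two zeros compare equal...
print(math.copysign(1.0, -0.0))   # -1.0: ...but the sign bit is still observable
```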
<h3>Denormalized numbers</h3>
<p>The number representations described above are called <i>normalized,</i> meaning that the implicit leading binary digit is a 1. To reduce the loss of precision when an <a href="/facts/Arithmetic_underflow/7FM8yHvs">underflow</a> occurs, IEEE 754 can also represent numbers smaller in magnitude than is possible in the normalized representation, by making the implicit leading digit a 0. Such numbers are called <a href="/facts/Denormal_numbers/6qB24xrS">denormal</a>. They don't include as many <a href="/facts/Significant_digits/RUfYtpgo">significant digits</a> as a normalized number, but they enable a gradual loss of precision when the result of an <a href="/facts/Floating-point_arithmetic/eIckahxe">operation</a> is not exactly zero but is too close to zero to be represented by a normalized number.
</p><p>A denormal number is represented with a biased exponent of all 0 bits, which represents an exponent of −126 in single precision (not −127), or −1022 in double precision (not −1023).<a class="footnote-ref" id="fnref:4" href="#fn:4"><sup>4</sup></a> In contrast, the smallest biased exponent representing a normal number is 1 (see examples below).
</p>
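<p>The decoding rule for denormals can be sketched as follows. This is an illustrative single-precision decoder, not a reference implementation: <code>decode_single</code> is a hypothetical name, it rejects the all-1s exponent (infinity and NaN), and it assumes Python's <code>struct</code> module only for cross-checking. The key detail is that a zero exponent field means an exponent of −126 with an implicit leading 0, rather than −127 with a leading 1.</p>

```python
import struct

def decode_single(bits: int) -> float:
    """Decode a finite 32-bit single-precision pattern (infinity/NaN not handled)."""
    sign = -1.0 if bits >> 31 else 1.0
    biased_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if biased_exp == 0:
        # Zero or denormal: implicit leading digit 0, exponent fixed at -126 (not -127).
        return sign * (fraction / 2**23) * 2.0**-126
    if biased_exp == 0xFF:
        raise ValueError("exponent field is all 1s: infinity or NaN")
    # Normalized: implicit leading 1, exponent = biased exponent - 127.
    return sign * (1 + fraction / 2**23) * 2.0 ** (biased_exp - 127)

# Cross-check against the platform's own single-precision decoding.
reference = struct.unpack(">f", (0x00000001).to_bytes(4, "big"))[0]
print(decode_single(0x00000001) == reference == 2.0**-149)   # True
print(decode_single(0x007FFFFF) < decode_single(0x00800000))  # True
```

<p>The second check shows the smooth transition: the largest denormal sits just below the smallest normalized number, with no gap between them.</p>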
<h2 id="representation-of-non-numbers">Representation of non-numbers</h2>
<p>The biased-exponent field is filled with all 1 bits to indicate either infinity or an invalid result of a computation.
</p>
<h3>Positive and negative infinity</h3>
<p><a href="/facts/Extended_real_line/65Gs7QS0">Positive and negative infinity</a> are represented thus:
</p>
<ul><li><i>sign</i> = 0 for positive infinity, 1 for negative infinity.</li>
<li><i>biased exponent</i> = all 1 bits.</li>
<li><i>fraction</i> = all 0 bits.</li></ul>
<h3>NaN</h3>
<p>Some operations of <a href="/facts/Floating-point_arithmetic/eIckahxe">floating-point arithmetic</a> are invalid, such as taking the square root of a negative number. The act of reaching an invalid result is called a floating-point <i>exception.</i> An exceptional result is represented by a special code called a NaN, for "<a href="/facts/Not_a_Number/DNnYJ4FU">Not a Number</a>". All NaNs in IEEE 754-1985 have this format:
</p>
<ul><li><i>sign</i> = either 0 or 1.</li>
<li><i>biased exponent</i> = all 1 bits.</li>
<li><i>fraction</i> = anything except all 0 bits (since all 0 bits represent infinity).</li></ul>
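<p>The special encodings can be told apart from the exponent and fraction fields alone. The sketch below classifies a raw 32-bit single-precision pattern; <code>classify_single</code> and <code>from_bits</code> are hypothetical names, and Python's <code>struct</code> module is assumed for reinterpreting the bits as a value.</p>

```python
import math
import struct

def classify_single(bits: int) -> str:
    """Classify a 32-bit single-precision pattern using only its exponent and fraction fields."""
    sign = "-" if bits >> 31 else "+"
    biased_exp = (bits >> 23) & 0xFF      # 8-bit exponent field
    fraction = bits & 0x7FFFFF            # 23-bit fraction field
    if biased_exp == 0xFF:
        # All-1s exponent: a zero fraction is an infinity, anything else is a NaN.
        return (sign + "infinity") if fraction == 0 else "NaN"
    if biased_exp == 0:
        return sign + ("zero" if fraction == 0 else "denormal")
    return sign + "normal"

def from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value (returned as a Python float)."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

print(classify_single(0xFF800000), from_bits(0xFF800000))              # -infinity -inf
print(classify_single(0x7FC00000), math.isnan(from_bits(0x7FC00000)))  # NaN True
```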
<h2 id="range-and-precision">Range and precision</h2>

<p>Precision is defined as the minimum difference between two successive mantissa representations; thus it depends only on the mantissa. The gap, by contrast, is defined as the difference between two successive representable numbers.<a class="footnote-ref" id="fnref:5" href="#fn:5"><sup>5</sup></a>
</p>
<h3>Single precision</h3>
<p><a href="/facts/Single-precision/e6Fez7BM">Single-precision</a> numbers occupy 32 bits. In single precision:
</p>
<ul><li>The positive and negative numbers closest to zero (represented by the denormalized value with all 0s in the exponent field and the binary value 1 in the fraction field) are
±2<sup>−23</sup> × 2<sup>−126</sup> ≈ ±1.40130×10<sup>−45</sup></li>
<li>The positive and negative normalized numbers closest to zero (represented with the binary value 1 in the exponent field and 0 in the fraction field) are
±1 × 2<sup>−126</sup> ≈ ±1.17549×10<sup>−38</sup></li>
<li>The finite positive and finite negative numbers furthest from zero (represented by the value with 254 in the exponent field and all 1s in the fraction field) are
±(2−2<sup>−23</sup>) × 2<sup>127</sup><a class="footnote-ref" id="fnref:6" href="#fn:6"><sup>6</sup></a> ≈ ±3.40282×10<sup>38</sup></li></ul>
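<p>These three limits can be confirmed by decoding the corresponding bit patterns. A minimal sketch, assuming Python's <code>struct</code> module performs the single-precision reinterpretation; <code>from_bits</code> is a hypothetical helper name.</p>

```python
import struct

def from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as a single-precision float (returned as a Python float)."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

smallest_denormal = from_bits(0x00000001)   # exponent field 0, fraction 1
smallest_normal = from_bits(0x00800000)     # exponent field 1, fraction 0
largest_finite = from_bits(0x7F7FFFFF)      # exponent field 254, fraction all 1s

# Each value matches its closed-form expression.
assert smallest_denormal == 2.0**-23 * 2.0**-126
assert smallest_normal == 2.0**-126
assert largest_finite == (2 - 2.0**-23) * 2.0**127
print(f"{smallest_denormal:.5e} {smallest_normal:.5e} {largest_finite:.5e}")
# 1.40130e-45 1.17549e-38 3.40282e+38
```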
<p>Some example range and gap values for given exponents in single precision:
</p>
<table><tbody><tr><th>Actual Exponent (unbiased)</th><th>Exp (biased)</th><th>Minimum</th><th>Maximum</th><th>Gap</th></tr><tr><td>−1</td><td>126</td><td>0.5</td><td>≈ 0.999999940395</td><td>≈ 5.96046e-8</td></tr><tr><td>0</td><td>127</td><td>1</td><td>≈ 1.999999880791</td><td>≈ 1.19209e-7</td></tr><tr><td>1</td><td>128</td><td>2</td><td>≈ 3.999999761581</td><td>≈ 2.38419e-7</td></tr><tr><td>2</td><td>129</td><td>4</td><td>≈ 7.999999523163</td><td>≈ 4.76837e-7</td></tr><tr><td>10</td><td>137</td><td>1024</td><td>≈ 2047.999877930</td><td>≈ 1.22070e-4</td></tr><tr><td>11</td><td>138</td><td>2048</td><td>≈ 4095.999755859</td><td>≈ 2.44141e-4</td></tr><tr><td>23</td><td>150</td><td>8388608</td><td>16777215</td><td>1</td></tr><tr><td>24</td><td>151</td><td>16777216</td><td>33554430</td><td>2</td></tr><tr><td>127</td><td>254</td><td>≈ 1.70141e38</td><td>≈ 3.40282e38</td><td>≈ 2.02824e31</td></tr></tbody></table>
<p>As an example, 16,777,217 cannot be encoded as a 32-bit float as it will be rounded to 16,777,216. However, all integers within the representable range that are a power of 2 can be stored in a 32-bit float without rounding.
</p>
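<p>The gap column and the 16,777,217 example follow from the 23-bit fraction: within one binade the spacing is 2<sup>e−23</sup> for unbiased exponent e. The sketch below assumes that packing a Python float with <code>struct</code>'s <code>f</code> format rounds to the nearest single (ties to even) under the default rounding mode; <code>to_single</code> and <code>single_gap</code> are hypothetical helper names.</p>

```python
import struct

def to_single(x: float) -> float:
    """Round x to single precision (nearest, ties to even) and return it as a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def single_gap(unbiased_exp: int) -> float:
    """Gap between consecutive normalized singles whose unbiased exponent is unbiased_exp."""
    return 2.0 ** (unbiased_exp - 23)   # value of the last (23rd) fraction bit

print(single_gap(23), single_gap(24))   # 1.0 2.0
print(to_single(16777217.0))            # 16777216.0: 2**24 + 1 falls between singles
print(to_single(16777218.0))            # 16777218.0: exactly representable (gap is 2)
print(to_single(2.0**100) == 2.0**100)  # True: powers of 2 in range are exact
```

<p>16,777,217 lies exactly halfway between 16,777,216 and 16,777,218; round-to-nearest-even picks 16,777,216 because its last significand bit is 0.</p>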
<h3>Double precision</h3>
<p><a href="/facts/Double-precision/JYyXXYFM">Double-precision</a> numbers occupy 64 bits. In double precision:
</p>
<ul><li>The positive and negative numbers closest to zero (represented by the denormalized value with all 0s in the exponent field and the binary value 1 in the fraction field) are
±2<sup>−52</sup> × 2<sup>−1022</sup> ≈ ±4.94066×10<sup>−324</sup></li>
<li>The positive and negative normalized numbers closest to zero (represented with the binary value 1 in the exponent field and 0 in the fraction field) are
±1 × 2<sup>−1022</sup> ≈ ±2.22507×10<sup>−308</sup></li>
<li>The finite positive and finite negative numbers furthest from zero (represented by the value with 2046 in the exponent field and all 1s in the fraction field) are
±(2−2<sup>−52</sup>) × 2<sup>1023</sup><a class="footnote-ref" id="fnref:7" href="#fn:7"><sup>7</sup></a> ≈ ±1.79769×10<sup>308</sup></li></ul>
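<p>Python's <code>float</code> is an IEEE 754 double on common platforms (an assumption worth checking on unusual hardware), so the double-precision limits can be read directly from the standard library. A minimal sketch, assuming Python 3.9+ for <code>math.ulp</code>.</p>

```python
import math
import sys

smallest_denormal = math.ulp(0.0)      # smallest positive denormal, 2**-1074
smallest_normal = sys.float_info.min   # smallest positive normalized double
largest_finite = sys.float_info.max    # largest finite double

assert smallest_denormal == 2.0**-52 * 2.0**-1022
assert smallest_normal == 2.0**-1022
assert largest_finite == (2 - 2.0**-52) * 2.0**1023
print(f"{smallest_denormal:.5e} {smallest_normal:.5e} {largest_finite:.5e}")
# 4.94066e-324 2.22507e-308 1.79769e+308
```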
<p>Some example range and gap values for given exponents in double precision:
</p>
<table><tbody><tr><th>Actual Exponent (unbiased)</th><th>Exp (biased)</th><th>Minimum</th><th>Maximum</th><th>Gap</th></tr><tr><td>−1</td><td>1022</td><td>0.5</td><td>≈ 0.999999999999999888978</td><td>≈ 1.11022e-16</td></tr><tr><td>0</td><td>1023</td><td>1</td><td>≈ 1.999999999999999777955</td><td>≈ 2.22045e-16</td></tr><tr><td>1</td><td>1024</td><td>2</td><td>≈ 3.999999999999999555911</td><td>≈ 4.44089e-16</td></tr><tr><td>2</td><td>1025</td><td>4</td><td>≈ 7.999999999999999111822</td><td>≈ 8.88178e-16</td></tr><tr><td>10</td><td>1033</td><td>1024</td><td>≈ 2047.999999999999772626</td><td>≈ 2.27374e-13</td></tr><tr><td>11</td><td>1034</td><td>2048</td><td>≈ 4095.999999999999545253</td><td>≈ 4.54747e-13</td></tr><tr><td>52</td><td>1075</td><td>4503599627370496</td><td>9007199254740991</td><td>1</td></tr><tr><td>53</td><td>1076</td><td>9007199254740992</td><td>18014398509481982</td><td>2</td></tr><tr><td>1023</td><td>2046</td><td>≈ 8.98847e307</td><td>≈ 1.79769e308</td><td>≈ 1.99584e292</td></tr></tbody></table>
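<p>The gap column of the double-precision table is 2<sup>e−52</sup>, the value of the last fraction bit. A short sketch, assuming Python 3.9+ where <code>math.ulp(x)</code> returns the value of the least significant bit of <code>x</code>.</p>

```python
import math

for e in (-1, 0, 10, 52, 53, 1023):
    x = 2.0**e                          # minimum value for this unbiased exponent
    assert math.ulp(x) == 2.0 ** (e - 52)
    print(e, x, math.ulp(x))

# From 2**53 upward the gap is 2, so odd integers are no longer representable.
print(float(2**53 + 1) == 2.0**53)      # True
```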
<h3>Extended formats</h3>
<p>The standard also recommends extended formats for internal computations carried out at higher precision than required for the final result, to minimise round-off errors; it specifies only minimum precision and exponent requirements for such formats. The <a href="/facts/X87/y8X9HnLL">x87</a> <a href="/facts/Extended_precision/CVpdQYeC">80-bit extended format</a> is the most commonly implemented extended format that meets these requirements.
</p>
<h2 id="examples">Examples</h2>
<p>Here are some examples of single-precision IEEE 754 representations:
</p>
<table><tbody><tr><th>Type</th><th>Sign</th><th>Actual Exponent</th><th>Exp (biased)</th><th>Exponent field</th><th>Fraction field</th><th>Value</th></tr><tr><td>Zero</td><td>0</td><td>−126</td><td>0</td><td>0000 0000</td><td>000 0000 0000 0000 0000 0000</td><td>0.0</td></tr><tr><td><a href="/facts/Negative_zero/uX4d8zzV">Negative zero</a></td><td>1</td><td>−126</td><td>0</td><td>0000 0000</td><td>000 0000 0000 0000 0000 0000</td><td>−0.0</td></tr><tr><td>One</td><td>0</td><td>0</td><td>127</td><td>0111 1111</td><td>000 0000 0000 0000 0000 0000</td><td>1.0</td></tr><tr><td>Minus One</td><td>1</td><td>0</td><td>127</td><td>0111 1111</td><td>000 0000 0000 0000 0000 0000</td><td>−1.0</td></tr><tr><td>Smallest <a href="/facts/Denormal_number/6qB24xrS">denormalized number</a></td><td>*</td><td>−126</td><td>0</td><td>0000 0000</td><td>000 0000 0000 0000 0000 0001</td><td>±2<sup>−23</sup> × 2<sup>−126</sup> = ±2<sup>−149</sup> ≈ ±1.4×10<sup>−45</sup></td></tr><tr><td>"Middle" denormalized number</td><td>*</td><td>−126</td><td>0</td><td>0000 0000</td><td>100 0000 0000 0000 0000 0000</td><td>±2<sup>−1</sup> × 2<sup>−126</sup> = ±2<sup>−127</sup> ≈ ±5.88×10<sup>−39</sup></td></tr><tr><td>Largest denormalized number</td><td>*</td><td>−126</td><td>0</td><td>0000 0000</td><td>111 1111 1111 1111 1111 1111</td><td>±(1−2<sup>−23</sup>) × 2<sup>−126</sup> ≈ ±1.18×10<sup>−38</sup></td></tr><tr><td>Smallest normalized number</td><td>*</td><td>−126</td><td>1</td><td>0000 0001</td><td>000 0000 0000 0000 0000 0000</td><td>±2<sup>−126</sup> ≈ ±1.18×10<sup>−38</sup></td></tr><tr><td>Largest normalized number</td><td>*</td><td>127</td><td>254</td><td>1111 1110</td><td>111 1111 1111 1111 1111 1111</td><td>±(2−2<sup>−23</sup>) × 2<sup>127</sup> ≈ ±3.4×10<sup>38</sup></td></tr><tr><td>Positive infinity</td><td>0</td><td>128</td><td>255</td><td>1111 1111</td><td>000 0000 0000 0000 0000 0000</td><td>+∞</td></tr><tr><td>Negative infinity</td><td>1</td><td>128</td><td>255</td><td>1111 1111</td><td>000 0000 0000 0000 0000 0000</td><td>−∞</td></tr><tr><td><a href="/facts/Not_a_number/DNnYJ4FU">Not a number</a></td><td>*</td><td>128</td><td>255</td><td>1111 1111</td><td>non-zero</td><td>NaN</td></tr><tr><td colspan="7">* Sign bit can be either 0 or 1.</td></tr></tbody></table>
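<p>Several rows of the table can be reproduced by printing the three fields of a value's single-precision encoding. A minimal sketch, assuming Python's <code>struct</code> module; <code>single_bits</code> is a hypothetical helper name, and NaN is left out of the loop because the exact NaN bit pattern produced may vary by platform.</p>

```python
import struct

def single_bits(x: float) -> str:
    """Format x's single-precision encoding as 'sign exponent fraction' bit strings."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = format(bits, "032b")
    return f"{s[0]} {s[1:9]} {s[9:]}"   # 1 sign bit, 8 exponent bits, 23 fraction bits

for value in (0.0, -0.0, 1.0, -1.0, 2.0**-127, float("inf"), float("-inf")):
    print(value, single_bits(value))
```

<p>For example, 1.0 prints as <code>0 01111111 00000000000000000000000</code>, matching the "One" row.</p>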
<h2 id="comparing-floating-point-numbers">Comparing floating-point numbers</h2>
<p>Every possible bit combination is either a NaN or a number with a unique value in the <a href="/facts/Affinely_extended_real_number_system/65Gs7QS0">affinely extended real number system</a> with its associated order, except for the two combinations of bits for negative zero and positive zero, which sometimes require special attention (see below). The binary representation has the special property that, excluding NaNs, any two numbers can be compared as <a href="/facts/Sign_and_magnitude/cE0sEMau">sign and magnitude</a> integers (<a href="/facts/Endianness/wbrQnyu3">endianness</a> issues apply). When comparing as <a href="/facts/2%2527s-complement/xhjYpnsc">2's-complement</a> integers: if the sign bits differ, the negative number precedes the positive number, so 2's complement gives the correct result (except that negative zero and positive zero should be considered equal). If both values are positive, the 2's complement comparison again gives the correct result. Otherwise (two negative numbers), the correct FP ordering is the opposite of the 2's complement ordering.
</p><p>Rounding errors inherent to floating-point calculations may limit the use of comparisons for checking the exact equality of results. Choosing an acceptable tolerance is a complex topic. A common technique is to use a comparison epsilon value to perform approximate comparisons.<a class="footnote-ref" id="fnref:8" href="#fn:8"><sup>8</sup></a> Depending on how lenient the comparisons are, common values include 1e-6 or 1e-5 for single precision, and 1e-14 for double precision.<a class="footnote-ref" id="fnref:9" href="#fn:9"><sup>9</sup></a><a class="footnote-ref" id="fnref:10" href="#fn:10"><sup>10</sup></a> Another common technique is ULP (unit in the last place) comparison, which measures the difference between two values in units of the last place, effectively counting how many representable steps apart they are.<a class="footnote-ref" id="fnref:11" href="#fn:11"><sup>11</sup></a>
</p><p>Although negative zero and positive zero are generally considered equal for comparison purposes, some <a href="/facts/Programming_language/cWTYbgWa">programming language</a> <a href="/facts/Relational_operator/nTXrhxN2">relational operators</a> and similar constructs treat them as distinct. According to the <a href="/facts/Java_(programming_language)/9ScgFyAL">Java</a> Language Specification,<a class="footnote-ref" id="fnref:12" href="#fn:12"><sup>12</sup></a> comparison and equality operators treat them as equal, but Math.min() and Math.max() distinguish them (officially starting with Java version 1.1 but actually with 1.1.1), as do the comparison methods equals(), compareTo() and even compare() of classes Float and Double.
</p>
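<p>The integer-comparison property and the two approximate-comparison techniques described above can be combined in one sketch. It is one possible approach, not the method of any cited source: <code>ordered_key</code>, <code>ulp_distance</code> and <code>nearly_equal</code> are hypothetical names, values are rounded to single precision with Python's <code>struct</code> module, and NaN inputs are not handled. Negating the magnitude of negative numbers reverses their order, as the comparison rule requires, and maps −0 and +0 to the same key.</p>

```python
import struct

def ordered_key(x: float) -> int:
    """Map a non-NaN value, rounded to single precision, to an int with the same ordering."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    if bits >> 31:
        # Two negative numbers order opposite to their magnitudes, so negate the magnitude.
        # This also maps -0.0 to 0, the same key as +0.0.
        return -(bits & 0x7FFFFFFF)
    return bits

def ulp_distance(a: float, b: float) -> int:
    """Number of representable single-precision steps between a and b (non-NaN inputs)."""
    return abs(ordered_key(a) - ordered_key(b))

def nearly_equal(a: float, b: float, eps: float = 1e-6) -> bool:
    """Absolute-epsilon comparison; 1e-6 is one of the single-precision values cited above."""
    return abs(a - b) <= eps

print(ulp_distance(1.0, 1.0 + 2.0**-23))   # 1: adjacent single-precision values
print(ulp_distance(-0.0, 0.0))              # 0: the two zeros share one key
print(0.1 + 0.2 == 0.3, nearly_equal(0.1 + 0.2, 0.3))   # False True
```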
<h2 id="rounding-floating-point-numbers">Rounding floating-point numbers</h2>
<p>The IEEE standard has four rounding modes. The first is the default; the others are called <i><a href="/facts/Directed_rounding/rCLYXN00">directed roundings</a></i>.
</p>
<ul><li>Round to Nearest – rounds to the nearest value; if the number falls midway it is rounded to the nearest value with an even (zero) least significant bit, which means it is rounded up 50% of the time (in <a href="/facts/IEEE_754-2008/gqqJPoYC">IEEE 754-2008</a> this mode is called <i>roundTiesToEven</i> to distinguish it from another round-to-nearest mode)</li>
<li>Round toward 0 – directed rounding towards zero</li>
<li>Round toward +∞ – directed rounding towards positive infinity</li>
<li>Round toward −∞ – directed rounding towards negative infinity.</li></ul>
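<p>The four modes can be illustrated on a simplified grid in which the representable values are the integers. This is an assumption made for illustration: per the single-precision table, between 2<sup>23</sup> and 2<sup>24</sup> the gap is 1, so there the representable values are exactly the integers. The exact result is modelled with Python's <code>fractions.Fraction</code>; <code>ROUNDING_MODES</code> and <code>round_all_modes</code> are hypothetical names.</p>

```python
import math
from fractions import Fraction

# Integers stand in for the representable values (see the assumption above).
ROUNDING_MODES = {
    "nearest-even": round,     # round() on a Fraction sends exact ties to the even integer
    "toward 0": math.trunc,
    "toward +inf": math.ceil,
    "toward -inf": math.floor,
}

def round_all_modes(x: Fraction) -> dict:
    """Round an exact rational x to an integer under each of the four rounding modes."""
    return {name: int(mode(x)) for name, mode in ROUNDING_MODES.items()}

print(round_all_modes(Fraction(5, 2)))
# {'nearest-even': 2, 'toward 0': 2, 'toward +inf': 3, 'toward -inf': 2}
print(round_all_modes(Fraction(-5, 2)))
# {'nearest-even': -2, 'toward 0': -2, 'toward +inf': -2, 'toward -inf': -3}
```

<p>Under round-to-nearest-even, 5/2 rounds down to 2 while 7/2 rounds up to 4, which is why ties are rounded up only half of the time.</p>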
<h2 id="extending-the-real-numbers">Extending the real numbers</h2>
<p>The IEEE standard employs (and extends) the <a href="/facts/Affinely_extended_real_number_system/65Gs7QS0">affinely extended real number system</a>, with separate positive and negative infinities. During drafting, there was a proposal for the standard to incorporate the <a href="/facts/Projectively_extended_real_number_system/9epfpLw2">projectively extended real number system</a>, with a single unsigned infinity, by providing programmers with a mode selection option. However, in the interest of reducing the complexity of the final standard, the projective mode was dropped. The <a href="/facts/Intel_8087/YGDyGKAS">Intel 8087</a> and <a href="/facts/Intel_80287/y8X9HnLL">Intel 80287</a> floating point co-processors both support this projective mode.<a class="footnote-ref" id="fnref:13" href="#fn:13"><sup>13</sup></a><a class="footnote-ref" id="fnref:14" href="#fn:14"><sup>14</sup></a><a class="footnote-ref" id="fnref:15" href="#fn:15"><sup>15</sup></a>
</p>
<h2 id="functions-and-predicates">Functions and predicates</h2>
<h3>Standard operations</h3>
<p>The following functions must be provided:
</p>
<ul><li><a href="/facts/Arithmetic_operations/hvHwkbMV">Add, subtract, multiply, divide</a></li>
<li><a href="/facts/Square_root/AfrzfBdQ">Square root</a></li>
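<li>Floating point remainder. This is not like a normal <a href="/facts/Modulo_operation/V5xPUluD">modulo operation</a>: it can be negative for two positive numbers. It returns the exact value of x − round(x/y)·y, where x/y is the exact quotient and round chooses the nearest integer, taking the even one in case of a tie.</li>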
<li><a href="/facts/Rounding_to_integer/rCLYXN00">Round to nearest integer</a>. For undirected rounding when halfway between two integers the even integer is chosen.</li>
<li>Comparison operations. Besides the more obvious results, IEEE 754 defines that −∞ = −∞, +∞ = +∞ and x ≠ NaN for any x (including NaN).</li></ul>
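<p>Several of these required operations are exposed by Python's standard library, which is used here only as an illustration. The sketch assumes Python 3.7+, where <code>math.remainder</code> is documented as the IEEE 754-style remainder; the built-in <code>%</code> operator is shown for contrast because it is a floor-based modulo instead.</p>

```python
import math

# IEEE remainder: x - n*y where n is the integer nearest x/y (ties to even).
print(math.remainder(5.0, 3.0))   # -1.0: 5/3 rounds to 2, so 5 - 2*3
print(5.0 % 3.0)                  # 2.0: Python's % is a floor-based modulo instead
print(math.remainder(7.0, 2.0))   # -1.0: 7/2 = 3.5 is a tie, so n is the even 4

# Round to nearest integer, ties to even.
print(round(2.5), round(3.5))     # 2 4

# Comparisons involving infinities and NaN.
inf, nan = math.inf, math.nan
print(inf == inf, -inf == -inf)   # True True
print(nan == nan, nan != nan)     # False True
```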
<h3>Recommended functions and predicates</h3>
<ul><li>copysign(x,y) returns x with the sign of y, so abs(x) equals copysign(x,1.0). This is one of the few operations which operates on a NaN in a way resembling arithmetic. The function copysign is new in the C99 standard.</li>
<li>−x returns x with the sign reversed. This is different from 0−x in some cases, notably when x is 0. So −(0) is −0, but the sign of 0−0 depends on the rounding mode.</li>
<li>scalb(y, N) returns y × 2<sup>N</sup> for an integer N</li>
<li>logb(x) returns the unbiased exponent of x</li>
<li>finite(x) a <a href="/facts/Predicate_(mathematics)/5WazAvKF">predicate</a> for "x is a finite value", equivalent to −Inf < x < Inf</li>
<li>isnan(x) a predicate for "x is a NaN", equivalent to "x ≠ x"</li>
<li>x <> y (x .LG. y in <a href="/facts/Fortran/m1KqjcMU">Fortran</a>), which behaves differently from NOT(x = y) (x .NE. y in Fortran, x != y in <a href="/facts/C_(programming_language)/Ky2No763">C</a>)<a class="footnote-ref" id="fnref:16" href="#fn:16"><sup>16</sup></a> when either operand is a NaN.</li>
<li>unordered(x, y) is true when "x is unordered with y", i.e., either x or y is a NaN.</li>
<li>class(x)</li>
<li>nextafter(x,y) returns the next representable value from x in the direction towards y</li></ul>
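<p>Most of these recommended functions have close Python counterparts. The sketch below is illustrative and assumes Python 3.9+ (for <code>math.nextafter</code>). <code>scalb</code>, <code>logb</code>, <code>unordered</code> and <code>lessgreater</code> are hypothetical wrappers: <code>scalb</code> is built on <code>math.ldexp</code>, and <code>logb</code> on <code>math.frexp</code>, valid only for finite, non-zero, normalized x (the standard's logb treats denormals, zero and infinity specially).</p>

```python
import math

def scalb(y: float, n: int) -> float:
    """y * 2**n, computed by adjusting the exponent (math.ldexp)."""
    return math.ldexp(y, n)

def logb(x: float) -> int:
    """Unbiased exponent of a finite, non-zero, normalized x (frexp mantissa is in [0.5, 1))."""
    return math.frexp(x)[1] - 1

def unordered(x: float, y: float) -> bool:
    """True when x and y cannot be ordered, i.e. at least one of them is a NaN."""
    return math.isnan(x) or math.isnan(y)

def lessgreater(x: float, y: float) -> bool:
    """The x <> y predicate: true only when x < y or x > y, so false if either is a NaN."""
    return x < y or x > y

nan = math.nan
print(math.copysign(3.0, -0.0))                                    # -3.0
print(math.copysign(1.0, -(0.0)), math.copysign(1.0, 0.0 - 0.0))   # -1.0 1.0
print(scalb(3.0, 4), logb(40.0))                                   # 48.0 5
print(math.isfinite(1e308), math.isnan(nan))                       # True True
print(lessgreater(nan, 1.0), not (nan == 1.0))                     # False True
print(math.nextafter(1.0, 2.0) - 1.0 == 2.0**-52)                  # True
```

<p>The second line shows that −(0) keeps the negated sign, while 0 − 0 yields +0 under the default round-to-nearest mode.</p>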
<h2 id="history">History</h2>
<p>In 1976, <a href="/facts/Intel/SMF0gJJX">Intel</a> began developing a floating-point <a href="/facts/Coprocessor/WsVXzAGo">coprocessor</a>.<a class="footnote-ref" id="fnref:17" href="#fn:17"><sup>17</sup></a><a class="footnote-ref" id="fnref:18" href="#fn:18"><sup>18</sup></a> Intel hoped to be able to sell a chip containing good implementations of all the operations found in the widely varying maths software libraries.<a class="footnote-ref" id="fnref:19" href="#fn:19"><sup>19</sup></a><a class="footnote-ref" id="fnref:20" href="#fn:20"><sup>20</sup></a>
</p><p>John Palmer, who managed the project, believed the effort should be backed by a standard unifying floating point operations across disparate processors. He contacted <a href="/facts/William_Kahan/J1GNQDyN">William Kahan</a> of the <a href="/facts/University_of_California/M9AWowUd">University of California</a>, who had helped improve the accuracy of <a href="/facts/Hewlett-Packard/abuNSYNu">Hewlett-Packard</a>'s calculators. Kahan suggested that Intel use the floating point of <a href="/facts/Digital_Equipment_Corporation/ztxs6keA">Digital Equipment Corporation</a>'s (DEC) VAX. The first VAX, the <a href="/facts/VAX-11%2f780/prkm7jvA">VAX-11/780</a>, had just come out in late 1977, and its floating point was highly regarded. However, seeking to market their chip as broadly as possible, Intel wanted the best floating point possible, and Kahan went on to draw up specifications.<a class="footnote-ref" id="fnref:21" href="#fn:21"><sup>21</sup></a> Kahan initially recommended that the floating point base be decimal,<a class="footnote-ref" id="fnref:22" href="#fn:22"><sup>22</sup></a>[<i>unreliable source?</i>] but the hardware design of the coprocessor was too far along to make that change.
</p><p>The work within Intel worried other vendors, who set up a standardization effort to ensure a "level playing field". Kahan attended the second IEEE 754 standards working group meeting, held in November 1977. He subsequently received permission from Intel to put forward a draft proposal based on his work for their coprocessor; he was allowed to explain details of the format and its rationale, but not anything related to Intel's implementation architecture. The draft was co-written with Jerome Coonen and <a href="/facts/Harold_S._Stone/0gWpQDn3">Harold Stone</a>, and was initially known as the "Kahan-Coonen-Stone proposal" or "K-C-S format".<a class="footnote-ref" id="fnref:23" href="#fn:23"><sup>23</sup></a><a class="footnote-ref" id="fnref:24" href="#fn:24"><sup>24</sup></a><a class="footnote-ref" id="fnref:25" href="#fn:25"><sup>25</sup></a><a class="footnote-ref" id="fnref:26" href="#fn:26"><sup>26</sup></a>
</p><p>As an 8-bit exponent was not wide enough for some operations desired for double-precision numbers, e.g. to store the product of two 32-bit numbers,<a class="footnote-ref" id="fnref:27" href="#fn:27"><sup>27</sup></a> both Kahan's proposal and a counter-proposal by DEC used 11 bits, like the time-tested <a href="/facts/CDC_6600/QpnT1Ltl">60-bit floating-point format</a> of the <a href="/facts/CDC_6600/QpnT1Ltl">CDC 6600</a> from 1965.<a class="footnote-ref" id="fnref:28" href="#fn:28"><sup>28</sup></a><a class="footnote-ref" id="fnref:29" href="#fn:29"><sup>29</sup></a><a class="footnote-ref" id="fnref:30" href="#fn:30"><sup>30</sup></a> Kahan's proposal also provided for infinities, which are useful when dealing with division-by-zero conditions; not-a-number values, which are useful when dealing with invalid operations; <a href="/facts/Denormal_number/6qB24xrS">denormal numbers</a>, which help mitigate problems caused by underflow;<a class="footnote-ref" id="fnref:31" href="#fn:31"><sup>31</sup></a><a class="footnote-ref" id="fnref:32" href="#fn:32"><sup>32</sup></a><a class="footnote-ref" id="fnref:33" href="#fn:33"><sup>33</sup></a> and a better balanced <a href="/facts/Exponent_bias/u5MIKgKP">exponent bias</a>, which can help avoid overflow and underflow when taking the reciprocal of a number.<a class="footnote-ref" id="fnref:34" href="#fn:34"><sup>34</sup></a><a class="footnote-ref" id="fnref:35" href="#fn:35"><sup>35</sup></a>
</p><p>Even before it was approved, the draft standard had been implemented by a number of manufacturers.<a class="footnote-ref" id="fnref:36" href="#fn:36"><sup>36</sup></a><a class="footnote-ref" id="fnref:37" href="#fn:37"><sup>37</sup></a> The Intel 8087, which was announced in 1980, was the first chip to implement the draft standard.
</p>

<p>In 1980, the <a href="/facts/Intel_8087/YGDyGKAS">Intel 8087</a> chip had already been released,<a class="footnote-ref" id="fnref:38" href="#fn:38"><sup>38</sup></a> but DEC remained opposed, to denormal numbers in particular, because of performance concerns and because standardising on DEC's own format would give DEC a competitive advantage.
</p><p>The arguments over <a href="/facts/Gradual_underflow/6qB24xrS">gradual underflow</a> lasted until 1981 when an expert hired by <a href="/facts/Digital_Equipment_Corporation/ztxs6keA">DEC</a> to assess it sided against the dissenters. DEC had the study done in order to demonstrate that gradual underflow was a bad idea, but the study concluded the opposite, and DEC gave in. In 1985, the standard was ratified, but it had already become the de facto standard a year earlier, implemented by many manufacturers.<a class="footnote-ref" id="fnref:39" href="#fn:39"><sup>39</sup></a><a class="footnote-ref" id="fnref:40" href="#fn:40"><sup>40</sup></a><a class="footnote-ref" id="fnref:41" href="#fn:41"><sup>41</sup></a>
</p>

<h2 id="see-also">See also</h2>
<ul><li><a href="/facts/IEEE_754/dD3e7zAl">IEEE 754</a></li>
<li><a href="/facts/Minifloat/jpegMrQx">Minifloat</a> for simple examples of properties of IEEE 754 floating point numbers</li>
<li><a href="/facts/Fixed-point_arithmetic/pd3RU72P">Fixed-point arithmetic</a></li></ul>
<h2 id="notes">Notes</h2>

<h2 id="further-reading">Further reading</h2>
<ul><li><a href="/facts/Charles_Severance_(computer_scientist)/drJssmsu">Charles Severance</a> (March 1998). <a href="https://web.archive.org/web/20090823204921/http://www.freecollab.com/dr-chuck/papers/columns/r3114.pdf">"IEEE 754: An Interview with William Kahan"</a> (PDF). <i><a href="/facts/IEEE_Computer/WyatS2sN">IEEE Computer</a></i>. 31 (3): 114–115. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1109%2FMC.1998.660194">10.1109/MC.1998.660194</a>. <a href="/facts/S2CID_(identifier)/ldJsHa2Y">S2CID</a> <a href="https://api.semanticscholar.org/CorpusID:33291145">33291145</a>. Archived from <a href="http://www.freecollab.com/dr-chuck/papers/columns/r3114.pdf">the original</a> (PDF) on 2009-08-23. Retrieved 2008-04-28.</li>
<li>David Goldberg (March 1991). <a href="http://www.validlab.com/goldberg/paper.pdf">"What Every Computer Scientist Should Know About Floating-Point Arithmetic"</a> (PDF). <i><a href="/facts/ACM_Computing_Surveys/18Itftvp">ACM Computing Surveys</a></i>. 23 (1): 5–48. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1145%2F103162.103163">10.1145/103162.103163</a>. <a href="/facts/S2CID_(identifier)/ldJsHa2Y">S2CID</a> <a href="https://api.semanticscholar.org/CorpusID:222008826">222008826</a>. Retrieved 2008-04-28.</li>
<li>Chris Hecker (February 1996). <a href="https://web.archive.org/web/20070203082451/http://www.d6.com/users/checker/pdfs/gdmfp.pdf">"Let's Get To The (Floating) Point"</a> (PDF). <i>Game Developer Magazine</i>: 19–24. <a href="/facts/ISSN_(identifier)/DPAflDvU">ISSN</a> <a href="https://search.worldcat.org/issn/1073-922X">1073-922X</a>. Archived from <a href="http://www.d6.com/users/checker/pdfs/gdmfp.pdf">the original</a> (PDF) on 2007-02-03.</li>
<li>David Monniaux (May 2008). <a href="http://hal.archives-ouvertes.fr/hal-00128124/en/">"The pitfalls of verifying floating-point computations"</a>. <i><a href="/facts/ACM_Transactions_on_Programming_Languages_and_Systems/nk6hTKJI">ACM Transactions on Programming Languages and Systems</a></i>. 30 (3): 1–41. <a href="/facts/ArXiv_(identifier)/H6EtgnBe">arXiv</a>:<a href="https://arxiv.org/abs/cs/0701192">cs/0701192</a>. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1145%2F1353445.1353446">10.1145/1353445.1353446</a>. <a href="/facts/ISSN_(identifier)/DPAflDvU">ISSN</a> <a href="https://search.worldcat.org/issn/0164-0925">0164-0925</a>. <a href="/facts/S2CID_(identifier)/ldJsHa2Y">S2CID</a> <a href="https://api.semanticscholar.org/CorpusID:218578808">218578808</a>. A compendium of non-intuitive behaviours of floating-point on popular architectures, with implications for program verification and testing.</li></ul>
<h2 id="external-links">External links</h2>
<ul><li><a href="http://www.cygnus-software.com/papers/comparingfloats/Obsolete%20comparing%20floating%20point%20numbers.htm">Comparing floats</a></li>
<li><a href="https://web.archive.org/web/20070314154031/http://www.coprocessor.info/">Coprocessor.info: x87 FPU pictures, development and manufacturer information</a></li>
<li><a href="http://speleotrove.com/decimal/854mins.html">IEEE 854-1987</a> — History and minutes</li>
<li><a href="http://www.binaryconvert.com/convert_float.html">IEEE754 (Single and Double precision) Online Converter</a></li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1"><p>IEEE Standard for Binary Floating-Point Arithmetic. 1985. doi:10.1109/IEEESTD.1985.82928. ISBN 0-7381-1165-1. <a href="#fnref:1" class="footnote-back-ref">↩</a></p></li>
<li id="fn:2"><p>"ANSI/IEEE Std 754-2019". 754r.ucbtest.org. Retrieved 2019-08-06. <a href="http://754r.ucbtest.org/background/" target="_blank">http://754r.ucbtest.org/background/</a> <a href="#fnref:2" class="footnote-back-ref">↩</a></p></li>
<li id="fn:3"><p>Precision: The number of decimal digits precision is calculated via number_of_mantissa_bits * Log10(2). Thus ~7.2 and ~15.9 for single and double precision respectively. <a href="#fnref:3" class="footnote-back-ref">↩</a></p></li>
<li id="fn:4"><p>Hennessy (2009). Computer Organization and Design. Morgan Kaufmann. p. 270. ISBN 9780123744937. <a href="#fnref:4" class="footnote-back-ref">↩</a></p></li>
<li id="fn:5"><p>Hossam A. H. Fahmy; Shlomo Waser; Michael J. Flynn, Computer Arithmetic (PDF), archived from the original (PDF) on 2010-10-08, retrieved 2011-01-02 <a href="https://web.archive.org/web/20101008203307/http://arith.stanford.edu/~hfahmy/webpages/arith_class/arith.pdf" target="_blank">https://web.archive.org/web/20101008203307/http://arith.stanford.edu/~hfahmy/webpages/arith_class/arith.pdf</a> <a href="#fnref:5" class="footnote-back-ref">↩</a></p></li>
<li id="fn:6"><p>William Kahan (October 1, 1997). "Lecture Notes on the Status of IEEE 754" (PDF). University of California, Berkeley. Retrieved 2007-04-12. <a href="#fnref:6" class="footnote-back-ref">↩</a></p></li>
<li id="fn:7"><p>William Kahan (October 1, 1997). "Lecture Notes on the Status of IEEE 754" (PDF). University of California, Berkeley. Retrieved 2007-04-12. <a href="#fnref:7" class="footnote-back-ref">↩</a></p></li>
<li id="fn:8"><p>"Godot math_funcs.h". GitHub.com. 30 July 2022. <a href="https://github.com/godotengine/godot/blob/master/core/math/math_funcs.h#L302" target="_blank">https://github.com/godotengine/godot/blob/master/core/math/math_funcs.h#L302</a> <a href="#fnref:8" class="footnote-back-ref">↩</a></p></li>
<li id="fn:9"><p>"Godot math_defs.h". GitHub.com. 30 July 2022. <a href="https://github.com/godotengine/godot/blob/master/core/math/math_defs.h#L34" target="_blank">https://github.com/godotengine/godot/blob/master/core/math/math_defs.h#L34</a> <a href="#fnref:9" class="footnote-back-ref">↩</a></p></li>
<li id="fn:10"><p>"Godot MathfEx.cs". GitHub.com. <a href="https://github.com/godotengine/godot/blob/master/modules/mono/glue/Managed/Files/MathfEx.cs#L18" target="_blank">https://github.com/godotengine/godot/blob/master/modules/mono/glue/Managed/Files/MathfEx.cs#L18</a> <a href="#fnref:10" class="footnote-back-ref">↩</a></p></li>
<li id="fn:11"><p>"Comparing Floating Point Numbers, 2012 Edition". randomascii.wordpress.com. 26 February 2012. <a href="https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/" target="_blank">https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/</a> <a href="#fnref:11" class="footnote-back-ref">↩</a></p></li>
<li id="fn:12"><p>"Java Language and Virtual Machine Specifications". Java Documentation. <a href="http://java.sun.com/docs/books/jls/" target="_blank">http://java.sun.com/docs/books/jls/</a> <a href="#fnref:12" class="footnote-back-ref">↩</a></p></li>
<li id="fn:13"><p>John R. Hauser (March 1996). "Handling Floating-Point Exceptions in Numeric Programs" (PDF). ACM Transactions on Programming Languages and Systems. 18 (2): 139–174. doi:10.1145/227699.227701. S2CID 9820157. <a href="http://www.jhauser.us/publications/1996_Hauser_FloatingPointExceptions.html" target="_blank">http://www.jhauser.us/publications/1996_Hauser_FloatingPointExceptions.html</a> <a href="#fnref:13" class="footnote-back-ref">↩</a></p></li>
<li id="fn:14"><p>David Stevenson (March 1981). "IEEE Task P754: A proposed standard for binary floating-point arithmetic". IEEE Computer. 14 (3): 51–62. doi:10.1109/C-M.1981.220377. S2CID 15523399. <a href="#fnref:14" class="footnote-back-ref">↩</a></p></li>
<li id="fn:15"><p>William Kahan and John Palmer (1979). "On a proposed floating-point standard". SIGNUM Newsletter. 14 (Special): 13–21. doi:10.1145/1057520.1057522. S2CID 16981715. <a href="#fnref:15" class="footnote-back-ref">↩</a></p></li>
<li id="fn:16"><p>ISO/IEC 9899:1999 - Programming languages - C. Iso.org. §7.12.14. <a href="#fnref:16" class="footnote-back-ref">↩</a></p></li>
<li id="fn:17"><p>"Intel and Floating-Point - Updating One of the Industry's Most Successful Standards - The Technology Vision for the Floating-Point Standard" (PDF). Intel. 2016. Archived from the original (PDF) on 2016-03-04. Retrieved 2016-05-30. (11 pages) <a href="https://web.archive.org/web/20160304114859/http://www.intel.com/content/dam/www/public/us/en/documents/case-studies/floating-point-case-study.pdf" target="_blank">https://web.archive.org/web/20160304114859/http://www.intel.com/content/dam/www/public/us/en/documents/case-studies/floating-point-case-study.pdf</a> <a href="#fnref:17" class="footnote-back-ref">↩</a></p></li>
<li id="fn:18"><p>"An Interview with the Old Man of Floating-Point". cs.berkeley.edu. 1998-02-20. Retrieved 2016-05-30. <a href="https://www.cs.berkeley.edu/~wkahan/ieee754status/754story.html" target="_blank">https://www.cs.berkeley.edu/~wkahan/ieee754status/754story.html</a> <a href="#fnref:18" class="footnote-back-ref">↩</a></p></li>
<li id="fn:19"><p>"Intel and Floating-Point - Updating One of the Industry's Most Successful Standards - The Technology Vision for the Floating-Point Standard" (PDF). Intel. 2016. Archived from the original (PDF) on 2016-03-04. Retrieved 2016-05-30. (11 pages) <a href="https://web.archive.org/web/20160304114859/http://www.intel.com/content/dam/www/public/us/en/documents/case-studies/floating-point-case-study.pdf" target="_blank">https://web.archive.org/web/20160304114859/http://www.intel.com/content/dam/www/public/us/en/documents/case-studies/floating-point-case-study.pdf</a> <a href="#fnref:19" class="footnote-back-ref">↩</a></p></li>
<li id="fn:20"><p>Woehr, Jack, ed. (1997-11-01). "A Conversation with William Kahan". Dr. Dobb's. drdobbs.com. Retrieved 2016-05-30. <a href="http://www.drdobbs.com/architecture-and-design/a-conversation-with-william-kahan/184410314" target="_blank">http://www.drdobbs.com/architecture-and-design/a-conversation-with-william-kahan/184410314</a> <a href="#fnref:20" class="footnote-back-ref">↩</a></p></li>
<li id="fn:21"><p>"Intel and Floating-Point - Updating One of the Industry's Most Successful Standards - The Technology Vision for the Floating-Point Standard" (PDF). Intel. 2016. Archived from the original (PDF) on 2016-03-04. Retrieved 2016-05-30. (11 pages) <a href="https://web.archive.org/web/20160304114859/http://www.intel.com/content/dam/www/public/us/en/documents/case-studies/floating-point-case-study.pdf" target="_blank">https://web.archive.org/web/20160304114859/http://www.intel.com/content/dam/www/public/us/en/documents/case-studies/floating-point-case-study.pdf</a> <a href="#fnref:21" class="footnote-back-ref">↩</a></p></li>
<li id="fn:22"><p>W. Kahan 2003, pers. comm. to Mike Cowlishaw and others after an IEEE 754 meeting. <a href="#fnref:22" class="footnote-back-ref">↩</a></p></li>
<li id="fn:23"><p>"Intel and Floating-Point - Updating One of the Industry's Most Successful Standards - The Technology Vision for the Floating-Point Standard" (PDF). Intel. 2016. Archived from the original (PDF) on 2016-03-04. Retrieved 2016-05-30. (11 pages) <a href="https://web.archive.org/web/20160304114859/http://www.intel.com/content/dam/www/public/us/en/documents/case-studies/floating-point-case-study.pdf" target="_blank">https://web.archive.org/web/20160304114859/http://www.intel.com/content/dam/www/public/us/en/documents/case-studies/floating-point-case-study.pdf</a> <a href="#fnref:23" class="footnote-back-ref">↩</a></p></li>
<li id="fn:24"><p>"An Interview with the Old Man of Floating-Point". cs.berkeley.edu. 1998-02-20. Retrieved 2016-05-30. <a href="https://www.cs.berkeley.edu/~wkahan/ieee754status/754story.html" target="_blank">https://www.cs.berkeley.edu/~wkahan/ieee754status/754story.html</a> <a href="#fnref:24" class="footnote-back-ref">↩</a></p></li>
<li id="fn:25"><p>Woehr, Jack, ed. (1997-11-01). "A Conversation with William Kahan". Dr. Dobb's. drdobbs.com. Retrieved 2016-05-30. <a href="http://www.drdobbs.com/architecture-and-design/a-conversation-with-william-kahan/184410314" target="_blank">http://www.drdobbs.com/architecture-and-design/a-conversation-with-william-kahan/184410314</a> <a href="#fnref:25" class="footnote-back-ref">↩</a></p></li>
<li id="fn:26"><p>"IEEE 754: An Interview with William Kahan" (PDF). dr-chuck.com. Retrieved 2016-06-02. <a href="http://www.dr-chuck.com/dr-chuck/papers/columns/r3114.pdf" target="_blank">http://www.dr-chuck.com/dr-chuck/papers/columns/r3114.pdf</a> <a href="#fnref:26" class="footnote-back-ref">↩</a></p></li>
<li id="fn:27"><p>"IEEE vs. Microsoft Binary Format; Rounding Issues (Complete)". Microsoft Support. Microsoft. 2006-11-21. Article ID KB35826, Q35826. Archived from the original on 2020-08-28. Retrieved 2010-02-24. <a href="https://www.betaarchive.com/wiki/index.php/Microsoft_KB_Archive/35826#IEEE_vs._Microsoft_Binary_Format.3B_Rounding_Issues_.28Complete.29" target="_blank">https://www.betaarchive.com/wiki/index.php/Microsoft_KB_Archive/35826#IEEE_vs._Microsoft_Binary_Format.3B_Rounding_Issues_.28Complete.29</a> <a href="#fnref:27" class="footnote-back-ref">↩</a></p></li>
<li id="fn:28"><p>"An Interview with the Old Man of Floating-Point". cs.berkeley.edu. 1998-02-20. Retrieved 2016-05-30. <a href="https://www.cs.berkeley.edu/~wkahan/ieee754status/754story.html" target="_blank">https://www.cs.berkeley.edu/~wkahan/ieee754status/754story.html</a> <a href="#fnref:28" class="footnote-back-ref">↩</a></p></li>
<li id="fn:29"><p>"IEEE 754: An Interview with William Kahan" (PDF). dr-chuck.com. Retrieved 2016-06-02. <a href="http://www.dr-chuck.com/dr-chuck/papers/columns/r3114.pdf" target="_blank">http://www.dr-chuck.com/dr-chuck/papers/columns/r3114.pdf</a> <a href="#fnref:29" class="footnote-back-ref">↩</a></p></li>
<li id="fn:30"><p>Thornton, James E. (1970). Written at Advanced Design Laboratory, Control Data Corporation. Design of a Computer: The Control Data 6600 (PDF) (1 ed.). Glenview, Illinois, USA: Scott, Foresman and Company. LCCN 74-96462. Archived (PDF) from the original on 2020-08-28. Retrieved 2016-06-02. (1+13+181+2+2 pages) <a href="http://ygdes.com/CDC/DesignOfAComputer_CDC6600.pdf" target="_blank">http://ygdes.com/CDC/DesignOfAComputer_CDC6600.pdf</a> <a href="#fnref:30" class="footnote-back-ref">↩</a></p></li>
<li id="fn:31"><p>"IEEE 754: An Interview with William Kahan" (PDF). dr-chuck.com. Retrieved 2016-06-02. <a href="http://www.dr-chuck.com/dr-chuck/papers/columns/r3114.pdf" target="_blank">http://www.dr-chuck.com/dr-chuck/papers/columns/r3114.pdf</a> <a href="#fnref:31" class="footnote-back-ref">↩</a></p></li>
<li id="fn:32"><p>Kahan, William Morton. "Why do we need a floating-point arithmetic standard?" (PDF). cs.berkeley.edu. Retrieved 2016-06-02. <a href="#fnref:32" class="footnote-back-ref">↩</a></p></li>
<li id="fn:33"><p>Kahan, William Morton; Darcy, Joseph D. "How Java's Floating-Point Hurts Everyone Everywhere" (PDF). cs.berkeley.edu. Retrieved 2016-06-02. <a href="#fnref:33" class="footnote-back-ref">↩</a></p></li>
<li id="fn:34"><p>Turner, Peter R. (2013-12-21). Numerical Analysis and Parallel Processing: Lectures given at The Lancaster …. Springer. ISBN 978-3-662-39812-8. Retrieved 2016-05-30. <a href="#fnref:34" class="footnote-back-ref">↩</a></p></li>
<li id="fn:35"><p>"Names for Standardized Floating-Point Formats" (PDF). cs.berkeley.edu. Retrieved 2016-06-02. <a href="https://www.cs.berkeley.edu/~wkahan/ieee754status/Names.pdf" target="_blank">https://www.cs.berkeley.edu/~wkahan/ieee754status/Names.pdf</a> <a href="#fnref:35" class="footnote-back-ref">↩</a></p></li>
<li id="fn:36"><p>Charles Severance (20 February 1998). "An Interview with the Old Man of Floating-Point". <a href="#fnref:36" class="footnote-back-ref">↩</a></p></li>
<li id="fn:37"><p>Charles Severance. "History of IEEE Floating-Point Format". Connexions. Archived from the original on 2009-11-20. <a href="#fnref:37" class="footnote-back-ref">↩</a></p></li>
<li id="fn:38"><p>"Molecular Expressions: Science, Optics & You - Olympus MIC-D: Integrated Circuit Gallery - Intel 8087 Math Coprocessor". micro.magnet.fsu.edu. Retrieved 2016-05-30. <a href="http://micro.magnet.fsu.edu/optics/olympusmicd/galleries/chips/intel8087.html" target="_blank">http://micro.magnet.fsu.edu/optics/olympusmicd/galleries/chips/intel8087.html</a> <a href="#fnref:38" class="footnote-back-ref">↩</a></p></li>
<li id="fn:39"><p>"An Interview with the Old Man of Floating-Point". cs.berkeley.edu. 1998-02-20. Retrieved 2016-05-30. <a href="https://www.cs.berkeley.edu/~wkahan/ieee754status/754story.html" target="_blank">https://www.cs.berkeley.edu/~wkahan/ieee754status/754story.html</a> <a href="#fnref:39" class="footnote-back-ref">↩</a></p></li>
<li id="fn:40"><p>"IEEE 754: An Interview with William Kahan" (PDF). dr-chuck.com. Retrieved 2016-06-02. <a href="http://www.dr-chuck.com/dr-chuck/papers/columns/r3114.pdf" target="_blank">http://www.dr-chuck.com/dr-chuck/papers/columns/r3114.pdf</a> <a href="#fnref:40" class="footnote-back-ref">↩</a></p></li>
<li id="fn:41"><p>William Kahan (October 1, 1997). "Lecture Notes on the Status of IEEE 754" (PDF). University of California, Berkeley. Retrieved 2007-04-12. <a href="#fnref:41" class="footnote-back-ref">↩</a></p></li>
</ol>
