The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is the most widely used standard for floating-point computation, and is followed by many hardware (CPU and FPU) and software implementations. Many computer languages allow or require that some or all arithmetic be carried out using IEEE 754 formats and operations. The current version is IEEE 754-2008, which was published in August 2008; it includes nearly all of the original IEEE 754-1985 (which was published in 1985) and the IEEE Standard for Radix-Independent Floating-Point Arithmetic (IEEE 854-1987).
The standard defines
The standard also includes extensive recommendations for advanced exception handling, additional operations (such as trigonometric functions), expression evaluation, and for achieving reproducible results.
The standard is derived from and replaces IEEE 754-1985, the previous version, following a seven-year revision process, chaired by Dan Zuras and edited by Mike Cowlishaw. The binary formats in the original standard are included in the new standard along with three new basic formats (one binary and two decimal). To conform to the current standard, an implementation must implement at least one of the basic formats as both an arithmetic format and an interchange format.
Formats in IEEE 754 describe sets of floating-point data and encodings for interchanging them.
A given format comprises:
The possible finite values that can be represented in a given format are determined by the base (b), the number of digits in the significand (precision, p), and the exponent parameter emax:
Hence (for the example parameters) the smallest nonzero positive number that can be represented is 1×10^{−101} and the largest is 9999999×10^{90} (9.999999×10^{96}), and the full range of numbers is −9.999999×10^{96} through 9.999999×10^{96}. The numbers closest to the inverse of these bounds (−1×10^{−95} and 1×10^{−95}) are considered to be the smallest (in magnitude) normal numbers; nonzero numbers between these smallest numbers are called subnormal numbers.
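The bounds quoted above can be reproduced with a short Python sketch. It assumes the example parameters are b = 10, p = 7 and emax = 96 (the decimal32 values) and that emin = 1 − emax, as in the standard; the function name `format_bounds` is purely illustrative.

```python
from fractions import Fraction
import sys

def format_bounds(b, p, emax):
    """Return (smallest subnormal, smallest normal, largest finite) as exact fractions.

    Assumes emin = 1 - emax, which holds for the IEEE 754-2008 formats.
    """
    emin = 1 - emax
    smallest_subnormal = Fraction(b) ** (emin - p + 1)      # one unit in the last place at emin
    smallest_normal = Fraction(b) ** emin                   # significand 1.000...0 at emin
    largest = (b ** p - 1) * Fraction(b) ** (emax - p + 1)  # all digits at their maximum, at emax
    return smallest_subnormal, smallest_normal, largest

# Example parameters from the text (decimal32): b = 10, p = 7, emax = 96
tiny, normal, huge = format_bounds(10, 7, 96)
print(tiny == Fraction(1, 10**101))        # True
print(normal == Fraction(1, 10**95))       # True
print(huge == 9999999 * 10**90)            # True

# The same formula applied to binary64 matches the host's float limits
# (assuming Python floats are binary64, as on common platforms).
_, normal64, huge64 = format_bounds(2, 53, 1023)
print(normal64 == Fraction(sys.float_info.min), huge64 == Fraction(sys.float_info.max))
```

Exact rational arithmetic is used here so that the results are not themselves subject to floating-point rounding.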
Zero values are finite values with significand 0. These are signed zeros: the sign bit specifies whether a zero is +0 (positive zero) or −0 (negative zero).
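Signed zeros can be observed from most languages. The following Python sketch assumes that Python floats are binary64 values (true on common platforms); `sign_bit` is a hypothetical helper written for this illustration.

```python
import math
import struct

def sign_bit(x):
    """Return the top bit of the binary64 encoding of x."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63

pos_zero, neg_zero = 0.0, -0.0

print(pos_zero == neg_zero)                # True: the two zeros compare equal
print(sign_bit(pos_zero), sign_bit(neg_zero))  # 0 1: only the sign bit differs
print(math.copysign(1.0, neg_zero))        # -1.0: the sign is still observable
```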
The standard defines five basic formats, named using their base and the number of bits used to encode them. A conforming implementation must fully implement at least one of the basic formats. There are three binary floating-point basic formats (which can be encoded using 32, 64 or 128 bits) and two decimal floating-point basic formats (which can be encoded using 64 or 128 bits). The binary32 and binary64 formats are the single and double formats of IEEE 754-1985.
The precision of each binary format is one greater than the width of its stored significand field, because there is an implied (hidden) leading 1 bit.
Name        Common name          Base  Digits  Emin    Emax    Notes
binary16    Half precision       2     10+1    −14     +15     storage, not basic
binary32    Single precision     2     23+1    −126    +127
binary64    Double precision     2     52+1    −1022   +1023
binary128   Quadruple precision  2     112+1   −16382  +16383
decimal32                        10    7       −95     +96     storage, not basic
decimal64                        10    16      −383    +384
decimal128                       10    34      −6143   +6144
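To illustrate the "23+1" entry for binary32, the sketch below splits a value into its sign, biased exponent and 23 stored significand bits, then rebuilds a normal number by restoring the hidden leading 1. The helper names are hypothetical, and the bias of 127 (equal to Emax for binary32) is taken from the standard rather than from the table itself.

```python
import struct

def decode_binary32(x):
    """Split the binary32 encoding of x into (sign, biased exponent, stored fraction bits)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    biased_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF          # the 23 explicitly stored significand bits
    return sign, biased_exp, fraction

def normal_value(sign, biased_exp, fraction):
    """Rebuild a normal number; only valid for 1 <= biased_exp <= 254."""
    significand = 1 + fraction / 2**23  # the hidden bit supplies the leading 1
    return (-1) ** sign * significand * 2.0 ** (biased_exp - 127)

fields = decode_binary32(-0.15625)      # -0.15625 = -1.25 * 2**-3
print(fields)                           # (1, 124, 2097152)
print(normal_value(*fields))            # -0.15625
```

Zeros, subnormal numbers, infinities and NaNs use the reserved exponent values 0 and 255 and are deliberately not handled by `normal_value`.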
All the basic formats are available in both hardware and software implementations.
A format that is just to be used for arithmetic and other operations need not have an encoding associated with it (that is, an implementation can use whatever internal representation it chooses); all that needs to be defined are its parameters (b, p, and emax). These parameters uniquely describe the set of finite numbers (combinations of sign, significand, and exponent) that it can represent.
Interchange formats are intended for the exchange of floating-point data using a fixed-length bit string for a given format.
For the exchange of binary floating-point numbers, interchange formats of length 16 bits, 32 bits, 64 bits, and any multiple of 32 bits ≥128 are defined. The 16-bit format is intended for the exchange or storage of small numbers (e.g., for graphics).
The encoding scheme for these binary interchange formats is the same as that of IEEE 754-1985: a sign bit, followed by w exponent bits that describe the exponent offset by a bias, and p−1 bits that describe the significand. The width of the exponent field for a k-bit format is computed as w = round(4×log2(k))−13. The existing 64- and 128-bit formats follow this rule, but the 16- and 32-bit formats have more exponent bits (5 and 8) than this formula would provide (3 and 7, respectively).
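The exponent-width rule is easy to check numerically. This Python sketch evaluates w = round(4×log2(k))−13 and derives the remaining field widths from the layout described above (1 sign bit, w exponent bits, p−1 significand bits); the function names are illustrative only.

```python
import math

def exponent_width(k):
    """Exponent field width given by the general rule for a k-bit binary format."""
    return round(4 * math.log2(k)) - 13

def field_layout(k):
    """Field widths (sign, exponent, stored significand) implied by the rule."""
    w = exponent_width(k)
    return {"sign": 1, "exponent": w, "significand": k - 1 - w}

for k in (16, 32, 64, 128, 256):
    print(k, field_layout(k))

# The rule matches binary64 and binary128 ...
print(exponent_width(64), exponent_width(128))   # 11 15
# ... but not binary16 and binary32, which actually use 5 and 8 exponent bits.
print(exponent_width(16), exponent_width(32))    # 3 7
```

For k = 128 the layout leaves 112 stored significand bits, which agrees with the "112+1" precision of binary128.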
As with IEEE 754-1985, there is some flexibility in the encoding of signaling NaNs.
For the exchange of decimal floating-point numbers, interchange formats of any multiple of 32 bits are defined.
The encoding scheme for the decimal interchange formats similarly encodes the sign, exponent, and significand, but uses a more complex approach to allow the significand to be encoded as a compressed sequence of decimal digits (using Densely Packed Decimal) or as a binary integer. In either case the set of numbers (combinations of sign, significand, and exponent) that may be encoded is identical, and signaling NaNs have a unique encoding (and the same set of possible payloads).
The standard defines five rounding algorithms. The first two round to a nearest value; the others are called directed roundings:
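As a rough illustration, the five rounding directions can be compared using Python's `decimal` module. The names on the left are the attribute names used by IEEE 754-2008; the pairing with Python's rounding constants is an assumption made for this sketch, and `round_to_integer` is a hypothetical helper.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

# Assumed correspondence between IEEE 754-2008 names and decimal-module constants.
MODES = {
    "roundTiesToEven": ROUND_HALF_EVEN,     # nearest, ties to the even digit
    "roundTiesToAway": ROUND_HALF_UP,       # nearest, ties away from zero
    "roundTowardZero": ROUND_DOWN,          # directed: truncate
    "roundTowardPositive": ROUND_CEILING,   # directed: toward +infinity
    "roundTowardNegative": ROUND_FLOOR,     # directed: toward -infinity
}

def round_to_integer(text, mode):
    """Round a decimal string to an integer under the given rounding mode."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=mode))

for name, mode in MODES.items():
    print(f"{name:20} {round_to_integer('2.5', mode):3} {round_to_integer('-2.5', mode):3}")
```

The tie cases 2.5 and −2.5 are used because they are the inputs on which the two round-to-nearest modes differ from each other.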
Required operations for a supported arithmetic format (including the basic formats) include:
The standard defines five exceptions, each of which has a corresponding status flag that (except in certain cases of underflow) is raised when the exception occurs. No other action is required, but alternatives are recommended (see below).
The five possible exceptions are:
These are the same five exceptions as were defined in IEEE 754-1985.
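The status-flag model can be sketched with Python's `decimal` module, whose contexts keep a flag per signal and continue with a default result when traps are disabled. This is only an approximation of the standard's behaviour: the module follows the General Decimal Arithmetic specification, defines additional signals beyond the five shown here, and the context parameters below (the decimal32 values) and the helper `flags_after` are choices made for this example.

```python
from decimal import (Context, Decimal, DivisionByZero, Inexact,
                     InvalidOperation, Overflow, Underflow)

FIVE = (InvalidOperation, DivisionByZero, Overflow, Underflow, Inexact)

def flags_after(operation):
    """Run operation in a fresh non-trapping context and report which flags were raised."""
    ctx = Context(prec=7, Emax=96, Emin=-95, traps=[])  # no traps: only flags are set
    operation(ctx)
    return {sig.__name__ for sig in FIVE if ctx.flags[sig]}

print(flags_after(lambda c: c.divide(Decimal(0), Decimal(0))))    # invalid operation
print(flags_after(lambda c: c.divide(Decimal(1), Decimal(0))))    # division by zero
print(flags_after(lambda c: c.multiply(Decimal("9E96"), Decimal(10))))            # overflow (and inexact)
print(flags_after(lambda c: c.multiply(Decimal("1E-95"), Decimal("1.234567E-5"))))  # underflow (and inexact)
print(flags_after(lambda c: c.divide(Decimal(1), Decimal(3))))    # inexact
```

In each case the computation returns a default result (a NaN, an infinity, or a rounded value) and execution continues; the flag is the only record that the exception occurred.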
The standard recommends optional exception handling in various forms, including traps (exceptions that change the flow of control in some way) and other exception handling models which interrupt the flow, such as try/catch. The traps and other exception mechanisms remain optional, as they were in IEEE 754-1985.
A new clause in the standard recommends fifty operations, including log, power, and trigonometric functions, that language standards should define. These are all optional (none are required in order to conform to the standard). The operations include some on dynamic modes for attributes, and also a set of reduction operations (sum, scaled product, etc.). All are required to supply a correctly rounded result, but they do not have to detect or report inexactness.
The standard recommends how language standards should specify the semantics of sequences of operations, and points out the subtleties of literal meanings and optimizations that change the value of a result.
IEEE 754-1985 allowed many variations in implementations (such as the encoding of some values and the detection of certain exceptions). IEEE 754-2008 has tightened up many of these, but a few variations still remain (especially for binary formats). The reproducibility clause recommends that language standards should provide a means to write reproducible programs (i.e., programs that will produce the same result in all implementations of a language), and describes what needs to be done to achieve reproducible results.
The standard requires operations to convert between basic formats and external character sequence formats. Conversions to and from a decimal character format are required for all formats. Conversion to an external character sequence must be such that conversion back using round to even will recover the original number. There is no requirement to preserve the payload of a NaN or signaling NaN, and conversion from the external character sequence may turn a signaling NaN into a quiet NaN.
Correctly rounded results can be obtained by converting to decimal and back again to the binary format using:
For other binary formats the required number of decimal digits is 1 + ceil(p × log10(2)), where p is the number of significant bits in the binary format, e.g. 24 bits for binary32.
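The digit counts can be checked with a small Python sketch. It assumes the usual formula 1 + ceil(p × log10(2)) for the number of decimal digits needed to round-trip a binary format with p significant bits, and that Python floats are binary64; `round_trip_digits` is an illustrative name.

```python
import math

def round_trip_digits(p):
    """Decimal digits sufficient to recover any binary value with p significant bits."""
    return 1 + math.ceil(p * math.log10(2))

for name, p in [("binary16", 11), ("binary32", 24), ("binary64", 53), ("binary128", 113)]:
    print(name, round_trip_digits(p))   # 5, 9, 17 and 36 digits respectively

# Round trip through a decimal string for a binary64 value.
x = 0.1 + 0.2
print(float(format(x, ".17g")) == x)    # True: 17 digits recover the value
print(float(format(x, ".16g")) == x)    # False: 16 digits are not enough for this value
```

Many values survive with fewer digits; the formula gives the count that is sufficient for every value of the format.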
The decimal representation will be preserved using:
Correct rounding is only guaranteed for these numbers of decimal digits plus 3. For instance, a conversion from a decimal external character sequence with 8 decimal digits is guaranteed to be correctly rounded when converted to binary16, but conversion of a sequence of 9 decimal digits is not.

