Concrete binary floating-point types, part 1
Introduction
The Swift standard library provides, depending on the platform, either two or three concrete floating-point types. Again, these are all defined as value types wrapping LLVM primitive types from the `Builtin` module:
    @_fixed_layout
    public struct Float {
      public // @testable
      var _value: Builtin.FPIEEE32
      /* ... */
    }
Two floating-point types, `Float` and `Double`, are available on all platforms supported by Swift. They are 32-bit and 64-bit types, respectively, and type aliases are provided so that users can instead refer to these types as `Float32` and `Float64`.
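As a quick check of the aliasing described above, the following minimal sketch compares the metatypes and prints the storage sizes. The constant names are illustrative only.

```swift
// Float32 and Float64 are type aliases, not distinct types,
// so their metatypes compare equal to those of Float and Double.
let sameSingle = (Float32.self == Float.self)
let sameDouble = (Float64.self == Double.self)

// Storage sizes in bytes of the two always-available types.
let floatSize = MemoryLayout<Float>.size
let doubleSize = MemoryLayout<Double>.size

print(sameSingle, sameDouble, floatSize, doubleSize)
```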
For the `i386` and `x86_64` architectures, the extended-precision floating-point type `Float80` is also supported, except on Windows. (In C/C++ programming with the Win32 API, the `long double` data type maps to `double`.)
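Code that wants to use `Float80` where it exists can guard the use with conditional compilation. The sketch below is one possible arrangement: `WidestFloat` is a hypothetical alias, and the additional Android exclusion is an assumption not stated above.

```swift
// Hypothetical alias for the widest binary floating-point type available.
// The os(Android) exclusion is an assumption; the text above only names Windows.
#if (arch(i386) || arch(x86_64)) && !(os(Windows) || os(Android))
typealias WidestFloat = Float80
#else
typealias WidestFloat = Double
#endif

// Number of explicitly stored fraction bits in the significand.
let widestBits = WidestFloat.significandBitCount
print(widestBits)
```

On platforms without `Float80`, the alias simply falls back to `Double`.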
IEEE 754
Swift, like many other languages, attempts to provide a floating-point implementation faithful to IEEE 754.
Background:
For floating-point types, the IEEE 754 technical standard defines basic and interchange formats, rounding rules, required operations, and exception handling that are meant to enable reliability and portability.
A full overview of IEEE 754 is well beyond the scope of this article; some key aspects of the standard are as follows:
- Data types are able to represent NaN (“not a number”), positive and negative infinity, and subnormal numbers that are very close to zero.
- Addition, subtraction, multiplication, division, and square root are required operations that must be correctly rounded; that is, the result must be the exact mathematical answer rounded to a representable value according to the chosen rounding mode.
- There are five types of exceptions (invalid, division by zero, overflow, underflow, and inexact) which, controversially, are to be logged using global flags.
- A large set of functions (such as sine and cosine) are recommended but not required.
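The special values mentioned in the first point can be inspected directly in Swift. This is a minimal sketch with illustrative constant names; the subnormal checks assume a platform that supports subnormal values.

```swift
let nan = Double.nan
let inf = Double.infinity
let tiny = Double.leastNonzeroMagnitude

// NaN compares unequal to everything, including itself.
let nanIsUnordered = (nan != nan)

// Infinity exceeds every finite value.
let infExceedsFinite = (inf > Double.greatestFiniteMagnitude)

// On platforms with subnormal support, the least nonzero value is
// subnormal and lies below the least normal value.
let tinyIsSubnormal = tiny.isSubnormal
let tinyBelowNormal = (tiny < Double.leastNormalMagnitude)

print(nanIsUnordered, infExceedsFinite, tinyIsSubnormal, tinyBelowNormal)
```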
Until recently, LLVM lacked constrained floating-point intrinsics to support the use of dynamic rounding modes or floating-point exception behavior. By default, the rounding mode is assumed to be round-to-nearest and floating-point exceptions are ignored. Swift does not expose any APIs to change the rounding mode or floating-point exception behavior, nor is it possible to interrogate floating-point status flags. (Such limitations are also found in Rust.)
Note that the rounding mode, or the rounding rule used to fit a result to the precision of a given floating-point format (IEEE 754-2008 §4.3), is to be distinguished from the rounding rule used to round a value to the nearest integer (IEEE 754-2008 §5.9). In Swift, it is not possible to change the former, but it is possible to choose any rule for the latter.
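The second kind of rounding rule, used when rounding to an integral value, is selected per call with a `FloatingPointRoundingRule` argument. A short sketch, with illustrative constant names:

```swift
let x = 2.5

let schoolbook = x.rounded()                  // default: .toNearestOrAwayFromZero
let bankers = x.rounded(.toNearestOrEven)     // ties go to the even neighbor
let truncated = x.rounded(.towardZero)
let ceiling = x.rounded(.up)
let floored = x.rounded(.down)

// For negative values, .towardZero and .down differ.
let negTruncated = (-2.5).rounded(.towardZero)
let negFloored = (-2.5).rounded(.down)

print(schoolbook, bankers, truncated, ceiling, floored, negTruncated, negFloored)
```

None of these calls changes how arithmetic results are fitted to the format; that rule remains round-to-nearest throughout.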
Note that floating-point exceptions are to be distinguished from Swift errors and from runtime traps.
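To illustrate the distinction, the sketch below performs operations that raise floating-point exceptions in IEEE 754 terms. In Swift they neither throw nor trap; they produce the default results. The constant names are illustrative.

```swift
let zero = 0.0

// Division by zero: the default result is infinity.
let quotient = 1.0 / zero

// Invalid operation: the default result is NaN.
let invalid = zero / zero

// Overflow: the default result is infinity.
let overflowed = Double.greatestFiniteMagnitude * 2

// By contrast, converting a non-finite value with Int(_:) would be a
// runtime trap. The failable initializer reports failure as nil instead.
let converted = Int(exactly: quotient)

print(quotient, invalid, overflowed, converted as Any)
```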
C mathematical functions
IEEE 754 recommends, but does not require, implementations to provide elementary functions such as sine, arctangent, and binary logarithm. The Swift standard library does not provide native implementations of such functions.
Nonetheless, users have access to these operations through the C standard library, which can be imported on macOS as part of the `Darwin` module and on Linux as part of the `Glibc` module; alternatively, users can choose to import the `Foundation` module instead. Swift provides an “overlay” that makes some changes to improve the user experience of working with C mathematical functions and disables certain incompatible functions.
Note that not all functions are implemented with identical precision in `Darwin` and `Glibc`.
When imported, the C standard library provides implementations for required operations in IEEE 754 that are duplicative of those provided by the Swift standard library, often with distinct names: for example, `round(x)` and `x.rounded()`. Since C library functions may be more familiar to many users, the Swift overlay allows those who import the C standard library to call such functions using their C names.
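The duplication can be seen by calling both spellings side by side. This sketch imports `Foundation` to reach the C names; the constant names are illustrative.

```swift
import Foundation

let x = 2.5

// Each C name is paired with its Swift standard library counterpart.
let sameRound = (round(x) == x.rounded())
let sameTrunc = (trunc(x) == x.rounded(.towardZero))
let sameCeil = (ceil(x) == x.rounded(.up))
let sameFloor = (floor(x) == x.rounded(.down))
let sameSqrt = (sqrt(x) == x.squareRoot())

print(sameRound, sameTrunc, sameCeil, sameFloor, sameSqrt)
```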
LLVM provides intrinsics that are equivalent to some C mathematical functions, including sine and cosine (but not tangent). The Swift standard library does expose those functions, but using names prefixed with an underscore to indicate that they are not intended for public use. The Swift overlay for C mathematical functions actually substitutes the LLVM intrinsic for the corresponding C library function where possible.
A comparison of IEEE 754 required operations, their Swift standard library names, and their C standard library overlay names is presented below.
Not shown: conversion and comparison operations. Not available in Swift: conformance predicates and operations on subsets of flags.

IEEE 754 | Swift standard library | C standard library overlay |
---|---|---|
Homogeneous general computational operations | | |
roundToIntegralTiesToEven(x) | x.rounded(.toNearestOrEven) | |
roundToIntegralTiesToAway(x) | x.rounded() or x.rounded(.toNearestOrAwayFromZero) | round(x) |
roundToIntegralTowardZero(x) | x.rounded(.towardZero) | trunc(x) |
roundToIntegralTowardPositive(x) | x.rounded(.up) | ceil(x) |
roundToIntegralTowardNegative(x) | x.rounded(.down) | floor(x) |
roundToIntegralExact(x) | | |
nextUp(x) | x.nextUp | nextafter(x, .infinity) |
nextDown(x) | x.nextDown | nextafter(x, -.infinity) |
remainder(x, y) | x.remainder(dividingBy: y) | remainder(x, y) |
minNum(x, y) | T.minimum(x, y) | fmin(x, y) |
maxNum(x, y) | T.maximum(x, y) | fmax(x, y) |
minNumMag(x, y) | T.minimumMagnitude(x, y) | |
maxNumMag(x, y) | T.maximumMagnitude(x, y) | |
Scaling operations | | |
scaleB(x, n) | [*] | scalbn(x, n) |
logB(x) | x.exponent | ilogb(x) |
Arithmetic operations (excluding conversion operations) | | |
addition(x, y) | x + y | |
subtraction(x, y) | x - y | |
multiplication(x, y) | x * y | |
division(x, y) | x / y | |
squareRoot(x) | x.squareRoot() | sqrt(x) |
fusedMultiplyAdd(x, y, z) | z.addingProduct(x, y) | fma(x, y, z) |
Sign bit operations | | |
copy(x) | x | |
negate(x) | -x | |
abs(x) | abs(x) or x.magnitude | fabs(x) |
copySign(x, y) | T(signOf: y, magnitudeOf: x) | copysign(x, y) |
General non-computational operations | | |
class(x) | x.floatingPointClass | Unavailable: fpclassify(x) |
isSignMinus(x) | x.sign == .minus | Unavailable: signbit(x) |
isNormal(x) | x.isNormal | Unavailable: isnormal(x) |
isFinite(x) | x.isFinite | Unavailable: isfinite(x) |
isZero(x) | x.isZero | |
isSubnormal(x) | x.isSubnormal | |
isInfinite(x) | x.isInfinite | Unavailable: isinf(x) |
isNaN(x) | x.isNaN | Unavailable: isnan(x) |
isSignaling(x) | x.isSignalingNaN | |
isCanonical(x) | x.isCanonical | |
radix(x) | T.radix | |
totalOrder(x, y) | x.isTotallyOrdered(belowOrEqualTo: y) | |
totalOrderMag(x, y) | | |

[*]: T(sign: x.sign, exponent: x.exponent + n, significand: x.significand)
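The footnoted expression for `scaleB(x, n)` can be wrapped in a small generic function. In the sketch below, `scaled(_:by:)` is a hypothetical helper, not a standard library API, and the other constant names are illustrative. A few other rows of the table are exercised alongside it.

```swift
// Hypothetical helper implementing the footnoted expression for scaleB(x, n).
func scaled<T: FloatingPoint>(_ x: T, by n: T.Exponent) -> T {
    return T(sign: x.sign, exponent: x.exponent + n, significand: x.significand)
}

let a = scaled(3.0, by: 4)              // 3 * 2^4
let b = scaled(Float(-0.75), by: -2)    // -0.75 * 2^-2

// fusedMultiplyAdd(x, y, z) is spelled z.addingProduct(x, y).
let fused = (2.0).addingProduct(3.0, 4.0)

// minNum(x, y) returns the number when the other operand is a quiet NaN.
let least = Double.minimum(1.0, .nan)

// nextUp(x) is the next representable value above x.
let above = (1.0).nextUp

print(a, b, fused, least, above)
```

Note that `x.exponent` is `Int.min` for zero and `Int.max` for infinity and NaN, so the footnoted expression is best suited to finite, nonzero `x`; adding `n` to those extreme exponents can overflow.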
Finite constants
Similarly, some finite constants defined in the C standard library have equivalent static properties in the Swift standard library with clarified names.
Swift | C (float) | C (double) |
---|---|---|
greatestFiniteMagnitude | FLT_MAX | DBL_MAX |
leastNormalMagnitude | FLT_MIN | DBL_MIN |
leastNonzeroMagnitude | FLT_TRUE_MIN | DBL_TRUE_MIN |
pi | M_PI | |
ulpOfOne | FLT_EPSILON | DBL_EPSILON |
The use of “max” and “min” can be misleading. Even within the Swift project itself, users have mistaken `FLT_MIN` for the minimum representable value (by analogy with `Int.min`). However, `FLT_MIN` is not even negative. Nor is it the least representable positive value if the platform supports subnormal values: in C, that value is known as `FLT_TRUE_MIN`.
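The Swift names make the distinction explicit, as the following sketch shows. The constant names are illustrative, and the second check assumes a platform that supports subnormal values.

```swift
let leastNormal = Float.leastNormalMagnitude        // C: FLT_MIN
let leastNonzero = Float.leastNonzeroMagnitude      // C: FLT_TRUE_MIN
let mostNegative = -Float.greatestFiniteMagnitude   // lowest finite value

// FLT_MIN is positive, not the lowest value.
let leastNormalIsPositive = (leastNormal > 0)

// With subnormal support, a smaller positive value exists.
let smallerPositiveExists = (leastNonzero > 0 && leastNonzero < leastNormal)

// The lowest finite value is the negated greatest finite magnitude.
let lowestIsNegative = (mostNegative < 0 && mostNegative.isFinite)

print(leastNormalIsPositive, smallerPositiveExists, lowestIsNegative)
```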
Note that `.pi` is rounded toward zero for reasons discussed later. Consequently, `Float(M_PI) != .pi`.
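The inequality can be checked without importing the C library. The sketch below uses `Double.pi` in place of `M_PI`, on the assumption that both denote the same `Double` value; the constant names are illustrative.

```swift
// Converting the Double value rounds to the nearest Float,
// which happens to lie above the true value of pi.
let narrowed = Float(Double.pi)

// Float.pi is rounded toward zero, so it lies below the true value.
let floatPi = Float.pi

let differ = (narrowed != floatPi)
let adjacent = (narrowed == floatPi.nextUp)
let floatPiIsBelow = (Double(floatPi) < Double.pi)

print(differ, adjacent, floatPiIsBelow)
```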
The use of “epsilon” was avoided because that term has varying definitions among other programming languages and suggests that it might be appropriate for use as a tolerance for floating-point comparisons, which is generally inadvisable.
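A small sketch of why `ulpOfOne` makes a poor general-purpose tolerance: the spacing between adjacent values scales with magnitude, so a fixed threshold is too strict for large values and too lax for small ones. The constant names are illustrative.

```swift
let eps = Double.ulpOfOne

// At this magnitude, adjacent Double values are 2.0 apart.
let big = 1e16
let bigNeighbor = big.nextUp
let gap = bigNeighbor - big

// Adjacent values fail a fixed-epsilon comparison...
let neighborsWithinEps = (abs(bigNeighbor - big) <= eps)

// ...while two small values that differ by a factor of two pass it.
let small1 = 1e-20
let small2 = 2e-20
let smallWithinEps = (abs(small1 - small2) <= eps)

print(gap, neighborsWithinEps, smallWithinEps)
```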
For more information on the rationale for names chosen in Swift, see the Swift Evolution proposal SE-0067: Enhanced floating-point protocols.
27 February–3 March 2018