Notes on numerics in Swift

Concrete binary floating-point types, part 1

Introduction

The Swift standard library provides, depending on the platform, either two or three concrete floating-point types. Again, these are all defined as value types wrapping LLVM primitive types from the Builtin module:

@frozen
public struct Float {
  public // @testable
  var _value: Builtin.FPIEEE32
  /* ... */
}

Two floating-point types, Float and Double, are available on all platforms supported by Swift. They are 32-bit and 64-bit types, respectively, and type aliases are provided so that users can instead refer to these types as Float32 and Float64.
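The following is a small illustrative check that the aliases name the same types and that storage sizes match the stated bit widths. It uses the standard library's `MemoryLayout`; the variable names are arbitrary.

```swift
// Float32 and Float64 are type aliases, so values move between the
// spellings with no conversion at all.
let single: Float32 = 1.5
let double: Float64 = 2.25
let asFloat: Float = single      // Float32 is Float
let asDouble: Double = double    // Float64 is Double

// Storage size in bytes: 4 for the 32-bit type, 8 for the 64-bit type.
let floatBytes = MemoryLayout<Float>.size
let doubleBytes = MemoryLayout<Double>.size

print(asFloat, asDouble, floatBytes, doubleBytes)
```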

For the i386 and x86_64 architectures, the extended-precision floating-point type Float80 is also supported, except on Windows. On supported platforms, C’s long double data type is mapped to Float80 in Swift 4.2+, which makes it possible to use the full set of math functions for Float80 that are available on the platform. (In C/C++ programming with the Win32 API, the long double data type maps to double.)
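Because `Float80` exists only on some targets, code that uses it has to be conditionally compiled. The sketch below is a minimal version of that pattern. `widestSignificandBitCount` is a hypothetical helper, and excluding Android in addition to Windows is an assumption (on x86 Android, `long double` is not the 80-bit format).

```swift
/// Hypothetical helper: the number of fraction bits in the widest binary
/// floating-point type available on the current target.
func widestSignificandBitCount() -> Int {
#if (arch(i386) || arch(x86_64)) && !(os(Windows) || os(Android))
    // Float80 is available here (assumed platform condition; see above).
    return Float80.significandBitCount
#else
    // Otherwise Double is the widest concrete binary floating-point type.
    return Double.significandBitCount
#endif
}

print("widest significand:", widestSignificandBitCount(), "fraction bits")
```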

LLVM does support half-precision (16-bit) and quadruple-precision (128-bit) binary floating-point types, but this support is not surfaced by the Swift standard library. Some platforms offer limited native support for arithmetic using these formats.

A future version of Swift may include support for a 16-bit IEEE 754 binary floating-point type, which would likely be named Float16.

IEEE 754

Swift, like many other languages, attempts to provide a floating-point implementation faithful to the IEEE 754 technical standard.

Background:

For floating-point types, IEEE 754 defines basic and interchange formats, rounding rules, required operations, and exception handling that are meant to enable reliability and portability.

A full overview of IEEE 754 is well beyond the scope of this article; some key aspects of the standard are as follows:

  • Data types are able to represent NaN (“not a number”), positive and negative infinity, and subnormal numbers that are very close to zero.

  • Addition, subtraction, multiplication, division, and square root are required operations that must be correctly rounded; that is, the result must be the representable value closest to the exact mathematical answer, rounded according to the chosen rounding mode.

  • There are five types of floating-point exceptions (invalid operation, division by zero, overflow, underflow, and inexact), which (controversially) are to be recorded using global status flags.

  • A large set of functions (such as sine and cosine) are recommended but not required.
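The sketch below illustrates two items from this list in Swift: the required special values, and the fact that each correctly rounded operation rounds exactly once. It relies on `addingProduct(_:_:)`, which is IEEE 754 fusedMultiplyAdd. The operands are chosen so that the exact product lies between two representable values.

```swift
// Special values required by IEEE 754.
let nan = Double.nan
let inf = Double.infinity
let tiny = Double.leastNonzeroMagnitude   // a subnormal value
print(nan.isNaN, inf.isInfinite, (-inf).sign == .minus, tiny.isSubnormal)
print(nan == nan)                         // NaN is unequal even to itself

// x * x is exactly 1 + 2^-26 + 2^-54. Rounding that product to Double
// drops the 2^-54 term before z is added, whereas a fused multiply-add
// computes x * x + z exactly and rounds only the final result.
let x = 1.0 + 0x1p-27                     // exactly representable
let z = -(1.0 + 0x1p-26)
let twoRoundings = x * x + z              // 0.0
let oneRounding = z.addingProduct(x, x)   // 2^-54
print(twoRoundings, oneRounding)
```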

Until recently, LLVM lacked constrained floating-point intrinsics to support the use of dynamic rounding modes or floating-point exception behavior. By default, the rounding mode is assumed to be round-to-nearest and floating-point exceptions are ignored.

Swift does not expose any APIs to change the rounding mode or floating-point exception behavior, nor is it possible to interrogate floating-point status flags. (Such limitations are also found in Rust.)

Note that the rounding mode, or the rounding rule used to fit a result to the precision of a given floating-point format (IEEE 754-2008 §4.3), is not necessarily the same as the rounding rule used to round a value to the nearest integer (IEEE 754-2008 §5.9). In Swift, it is not possible to change the former, but it is possible to choose any rule for the latter.
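For the second kind of rounding, Swift takes a `FloatingPointRoundingRule` on each call to `rounded(_:)`. The loop below shows all six rules applied to a tie in both directions. The arithmetic rounding mode stays round-to-nearest, ties-to-even throughout.

```swift
// Rounding to an integral value: the rule is chosen per call.
let rules: [(String, FloatingPointRoundingRule)] = [
    ("toNearestOrAwayFromZero", .toNearestOrAwayFromZero), // rounded() default
    ("toNearestOrEven", .toNearestOrEven),
    ("towardZero", .towardZero),
    ("awayFromZero", .awayFromZero),
    ("up", .up),
    ("down", .down),
]

for (name, rule) in rules {
    print(name, (2.5).rounded(rule), (-2.5).rounded(rule))
}
```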

Note that floating-point exceptions are to be distinguished from Swift errors and from runtime traps.
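The following sketch shows the difference in practice. Each IEEE exception below simply produces its default result: there is no thrown error, no trap, and no flag that Swift lets you inspect. `divide(_:by:)` is a hypothetical helper whose only purpose is to keep the operands out of compile-time evaluation.

```swift
// Hypothetical helper: forces the divisions to happen at run time.
func divide(_ a: Double, by b: Double) -> Double { a / b }

let divisionByZero = divide(1, by: 0)               // +infinity
let invalidOperation = divide(0, by: 0)             // NaN
let overflow = Double.greatestFiniteMagnitude * 2   // +infinity
let underflow = Double.leastNonzeroMagnitude / 2    // exact tie, rounds to +0.0
print(divisionByZero, invalidOperation, overflow, underflow)

// By contrast, dividing an Int by zero at run time is a trap that
// terminates the program, not a flag that can be cleared.
```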

C mathematical functions

The Swift standard library provides an “overlay” that makes some changes to improve the user experience of working with C mathematical functions and disables certain incompatible functions.

IEEE 754 recommends, but does not require, implementations to provide elementary functions such as sine, arctangent, and binary logarithm. The Swift standard library will offer APIs for such operations in a future version of Swift. For now, the Swift standard library provides only IEEE 754 required operations such as square root. For other functions, users need to use the C standard library, which can be imported on macOS as part of the Darwin module and on Linux as part of the Glibc module (alternatively, users can import the Foundation module).
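A minimal cross-platform import is sketched below. It covers only the two modules named above, so other platforms would need a different module name. Note that `squareRoot()` needs no import.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

let angle = Double.pi / 6
let s = sin(angle)            // C library, via the Darwin/Glibc module
let c = cos(angle)            // C library
let r = (2.0).squareRoot()    // Swift standard library (IEEE 754 squareRoot)
let rFromC = sqrt(2.0)        // C library spelling of the same operation
print(s, c, r, rFromC)
```

Both square roots are correctly rounded, so they should agree exactly. Results for `sin` and `cos` can differ in the last place between platforms.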

LLVM provides intrinsics that are equivalent to some C mathematical functions, including sine and cosine. The Swift overlay substitutes the LLVM intrinsic for the corresponding C library function where possible.

Note that not all functions are implemented with identical precision in Darwin and Glibc, and the same discrepancies among platforms are applicable to functions provided by the Swift standard library.

A comparison of IEEE 754 required and recommended operations, their Swift standard library names, and their C standard library overlay names is presented below (where x, y, z are values of floating-point type T, n is a value of type Int, and “Swift x” is a future version of Swift).

IEEE 754 | Swift standard library | C standard library overlay
Not shown: conversion and comparison operations.
Not available in Swift: conformance predicates and operations on subsets of flags.

Homogeneous general computational operations
roundToIntegralTiesToEven(x) | x.rounded(.toNearestOrEven) | (none)
roundToIntegralTiesToAway(x) | x.rounded() or x.rounded(.toNearestOrAwayFromZero) | round(x)
roundToIntegralTowardZero(x) | x.rounded(.towardZero) | trunc(x)
roundToIntegralTowardPositive(x) | x.rounded(.up) | ceil(x)
roundToIntegralTowardNegative(x) | x.rounded(.down) | floor(x)
roundToIntegralExact(x) | (none) | (none)
nextUp(x) | x.nextUp | Unavailable (Swift x): nextafter(x, .infinity)
nextDown(x) | x.nextDown | Unavailable (Swift x): nextafter(x, -.infinity)
remainder(x, y) | x.remainder(dividingBy: y) | remainder(x, y)
(none) | x.truncatingRemainder(dividingBy: y) | fmod(x, y)
minNum(x, y) | T.minimum(x, y) | Unavailable (Swift x): fmin(x, y)
maxNum(x, y) | T.maximum(x, y) | Unavailable (Swift x): fmax(x, y)
(none) | Swift.max(x - y, 0) | fdim(x, y)
minNumMag(x, y) | T.minimumMagnitude(x, y) | (none)
maxNumMag(x, y) | T.maximumMagnitude(x, y) | (none)

Scaling operations
scaleB(x, n) | T(sign: .plus, exponent: n, significand: x) | scalbn(x, n)
logB(x) | x.exponent | Unavailable (Swift 4.2): ilogb(x)

Arithmetic operations (excluding conversion operations)
addition(x, y) | x + y | (none)
subtraction(x, y) | x - y | (none)
multiplication(x, y) | x * y | (none)
division(x, y) | x / y | (none)
squareRoot(x) | x.squareRoot() or (Swift x) T.sqrt(x) | sqrt(x)
fusedMultiplyAdd(x, y, z) | z.addingProduct(x, y) | fma(x, y, z)

Sign bit operations
copy(x) | x | (none)
negate(x) | -x | (none)
abs(x) | abs(x) or x.magnitude | Unavailable (Swift 4.2): fabs(x)
copySign(x, y) | T(signOf: y, magnitudeOf: x) | copysign(x, y)

General non-computational operations
class(x) | x.floatingPointClass | Unavailable: fpclassify(x)
isSignMinus(x) | x.sign == .minus | Unavailable: signbit(x)
isNormal(x) | x.isNormal | Unavailable: isnormal(x)
isFinite(x) | x.isFinite | Unavailable: isfinite(x)
isZero(x) | x.isZero | (none)
isSubnormal(x) | x.isSubnormal | (none)
isInfinite(x) | x.isInfinite | Unavailable: isinf(x)
isNaN(x) | x.isNaN | Unavailable: isnan(x)
isSignaling(x) | x.isSignalingNaN | (none)
isCanonical(x) | x.isCanonical | (none)
radix(x) | T.radix | (none)
totalOrder(x, y) | x.isTotallyOrdered(belowOrEqualTo: y) | (none)
totalOrderMag(x, y) | (none) | (none)

Additional elementary functions (Swift x)
sin | T.sin(x) | sin(x)
cos | T.cos(x) | cos(x)
tan | T.tan(x) | tan(x)
sinPi | (none) | (none)
cosPi | (none) | (none)
asin | T.asin(x) | asin(x)
acos | T.acos(x) | acos(x)
atan | T.atan(x) | atan(x)
atanPi | (none) | (none)
sinh | T.sinh(x) | sinh(x)
cosh | T.cosh(x) | cosh(x)
tanh | T.tanh(x) | tanh(x)
asinh | T.asinh(x) | asinh(x)
acosh | T.acosh(x) | acosh(x)
atanh | T.atanh(x) | atanh(x)
exp | T.exp(x) | exp(x)
exp2 | T.exp2(x) | exp2(x)
exp10 | T.exp10(x) or exp10(x) | (none)
expm1 | T.expm1(x) | expm1(x)
exp2m1 | (none) | (none)
exp10m1 | (none) | (none)
log | T.log(x) | log(x)
log2 | T.log2(x) | log2(x)
(none) | T.log2(x).rounded(.down) | Unavailable (Swift x): logb(x)
log10 | T.log10(x) | log10(x)
logp1 | T.log1p(x) | log1p(x)
log2p1 | (none) | (none)
log10p1 | (none) | (none)
compound(x, n) | (none) | (none)
pow(x, y) | T.pow(x, y) | pow(x, y)
powr(x, y) | (none) | (none)
pown(x, n) | T.pow(x, n) or pow(x, n) | (none)
rootn(x, n) | T.root(x, n) or root(x, n) | (none)
(none) | T.root(x, 3) or root(x, 3) | cbrt(x)
rSqrt | (none) | (none)

Additional real operations (Swift x)
atan2(y, x) | T.atan2(y: y, x: x) or atan2(y: y, x: x) | atan2(y, x)
atan2Pi(y, x) | (none) | (none)
hypot(x, y) | T.hypot(x, y) | hypot(x, y)
(none) | T.erf(x) | erf(x)
(none) | T.erfc(x) | erfc(x)
(none) | T.gamma(x) or gamma(x) | tgamma(x)
(none) | (T.logGamma(x), T.signGamma(x) == .plus ? 1 : -1) or (logGamma(x), signGamma(x) == .plus ? 1 : -1) | lgamma(x)
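The sketch below checks a few rows of the table side by side. It assumes the C library is imported through Darwin or Glibc as described earlier. The values of `x`, `y`, and `z` are arbitrary illustrative choices, and only C functions that the table does not mark unavailable are used.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

let x = 7.0, y = 2.0, z = -3.0
let q = x / y  // 3.5, a tie for the "nearest" rules

// Rounding to integral values: Swift method vs. C function.
print(q.rounded(.towardZero), trunc(q))   // roundToIntegralTowardZero
print(q.rounded(.up), ceil(q))            // roundToIntegralTowardPositive
print(q.rounded(.down), floor(q))         // roundToIntegralTowardNegative
print(q.rounded(), round(q))              // roundToIntegralTiesToAway

// IEEE remainder rounds the quotient 3.5 to the nearest even integer (4),
// giving 7 - 4 * 2 = -1; truncatingRemainder/fmod truncate it to 3,
// giving 7 - 3 * 2 = 1.
let ieeeRemainder = x.remainder(dividingBy: y)
let truncRemainder = x.truncatingRemainder(dividingBy: y)
print(ieeeRemainder, remainder(x, y))
print(truncRemainder, fmod(x, y))

// Fused multiply-add and copySign.
let fused = z.addingProduct(x, y)                   // 7 * 2 + (-3) = 11
let signCopied = Double(signOf: z, magnitudeOf: x)  // -7
print(fused, fma(x, y, z))
print(signCopied, copysign(x, z))

// minNum semantics: a quiet NaN operand is ignored if the other is a number.
let minWithNaN = Double.minimum(.nan, x)

// totalOrder distinguishes -0.0 from +0.0 even though they compare equal.
let negZeroFirst = (-0.0).isTotallyOrdered(belowOrEqualTo: 0.0)
let posZeroFirst = (0.0).isTotallyOrdered(belowOrEqualTo: -0.0)
print(minWithNaN, negZeroFirst, posZeroFirst, -0.0 == 0.0)
```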

Current implementations of T.pow(x, n) and T.root(x, n) give inaccurate results when n is so large that converting it to T would round.
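The sketch below does not reproduce `T.pow` or `T.root`. It only shows where the conversion of `n` starts to round. `Float` has a 24-bit significand and `Double` a 53-bit one, so the first integers that cannot be converted exactly are 2^24 + 1 and 2^53 + 1, respectively.

```swift
// 2^24 + 1 lies exactly halfway between two Floats; the tie rounds to even.
let n = 16_777_217
let converted = Float(n)          // 16777216.0
let exact = Float(exactly: n)     // nil: no Float equals n
print(converted, exact as Any)

// The same thing happens for Double just above 2^53.
let m: Int64 = 9_007_199_254_740_993
print(Double(m), Double(exactly: m) as Any)
```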

For more information on the additional elementary functions and real operations to be added in a future version of Swift, see the Swift Evolution proposal SE-0246: Generic math(s) functions.

Finite constants

Similarly, some finite constants defined in the C standard library have equivalent static properties in the Swift standard library with clarified names.

Swift | C (float) | C (double)
greatestFiniteMagnitude | FLT_MAX | DBL_MAX
leastNormalMagnitude | FLT_MIN | DBL_MIN
leastNonzeroMagnitude | FLT_TRUE_MIN | DBL_TRUE_MIN
pi | (none) | M_PI
ulpOfOne | FLT_EPSILON | DBL_EPSILON

The use of “max” and “min” can be misleading. Even within the Swift project itself, users have mistaken FLT_MIN for the minimum representable value (by analogy with Int.min). However, FLT_MIN is not even negative. Nor is it the least representable positive value if the platform supports subnormal values: in C, that value is known as FLT_TRUE_MIN.
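The sketch below shows the values behind these names for `Float`. The exact value of `leastNonzeroMagnitude` assumes a platform with subnormal support, as discussed above.

```swift
// The most negative finite Float is the negation of greatestFiniteMagnitude.
let mostNegative = -Float.greatestFiniteMagnitude
// C's FLT_MIN: the smallest *normal* positive value, 2^-126.
let smallestNormal = Float.leastNormalMagnitude
// C's FLT_TRUE_MIN: the smallest positive value, 2^-149 with subnormals.
let smallestPositive = Float.leastNonzeroMagnitude

print(mostNegative, smallestNormal, smallestPositive)
print(smallestNormal > 0, smallestPositive < smallestNormal, smallestPositive.isSubnormal)
```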

Note that .pi is rounded toward zero for reasons discussed later. Consequently, Float(M_PI) != .pi.
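The check below uses `Double.pi` as a stand-in for C's `M_PI` so that no C import is needed. This rests on the assumption that the two are equal. That should hold because the Double nearest to pi already lies below pi, so rounding toward zero and rounding to nearest agree for `Double`.

```swift
// Converting Double.pi to Float rounds to nearest, which lands above pi;
// Float.pi is instead rounded toward zero, one ulp lower.
let piRoundedToNearest = Float(Double.pi)
print(Float.pi, piRoundedToNearest)
print(Float.pi == piRoundedToNearest)          // false
print(Float.pi.nextUp == piRoundedToNearest)   // true
```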

The use of “epsilon” was avoided because that term has varying definitions among other programming languages and suggests that it might be appropriate for use as a measure of tolerance for floating-point comparisons, which is generally inadvisable.
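The sketch below illustrates what `ulpOfOne` measures and why it makes a poor general tolerance. The spacing between adjacent values grows with magnitude, so even two neighboring values near one million differ by far more than `ulpOfOne`. The value `1_000_000.0` is an arbitrary choice.

```swift
// ulpOfOne is the gap between 1.0 and the next representable value.
let unitGap = Double.ulpOfOne
print(unitGap == (1.0).ulp, unitGap == 0x1p-52)

// Adding half of that gap to 1.0 is an exact tie, which rounds to even (1.0).
print(1.0 + unitGap / 2 == 1.0)

// Adjacent Doubles near one million are 2^-33 apart, far more than ulpOfOne.
let big = 1_000_000.0
let neighbor = big.nextUp
print(neighbor - big, abs(neighbor - big) <= unitGap)
```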

For more information on the rationale for names chosen in Swift, see the Swift Evolution proposal SE-0067: Enhanced floating-point protocols.


27 February–3 March 2018
Updated 28 July 2019