Notes on numerics in Swift

Concrete binary floating-point types, part 3

Integer literals (redux)

All floating-point types in Swift conform to the protocol ExpressibleByIntegerLiteral. Therefore, it’s possible to create a new floating-point value using an integer literal.

The nuances of integer literals discussed earlier apply equally when integer literals are used to express floating-point values. One issue is especially salient for our purposes here:

Recall that integer literals do not support signed zero (in other words, -0 as Float evaluates to positive zero). Either use parentheses, as in -(0 as Float), or use a float literal as discussed below, to obtain the desired value.
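The three spellings mentioned above can be compared side by side. This is only a minimal sketch; the constant names are illustrative and do not come from the standard library.

```swift
// The integer literal -0 denotes plain zero, so the sign is lost.
let fromIntegerLiteral = -0 as Float
// Negating an existing Float value flips its sign bit.
let negated = -(0 as Float)
// A float literal preserves the sign of zero.
let fromFloatLiteral = -0.0 as Float

print(fromIntegerLiteral.sign)  // plus
print(negated.sign)             // minus
print(fromFloatLiteral.sign)    // minus
```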

Float literals

A float literal in Swift is similar to analogous expressions in other “C family” languages. It can be written in either base 10 or base 16 (hexadecimal).

Background:

In Swift, as in other “C family” languages, the whole part of a base 10 float literal can be followed by a fractional part beginning with a decimal separator dot (.), a decimal exponent beginning with e or E, or both (in that order). The digits of the exponent can optionally be preceded by - or +.

As with integer literals, float literals can be prepended with the hyphen-minus character (-) to indicate a negative value.
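As a brief illustration of the forms just described, the following sketch spells out each combination of fractional part and exponent. The constant names are illustrative only.

```swift
let fractionOnly = 1.25   // whole part followed by a fractional part
let exponentOnly = 5e3    // whole part followed by a decimal exponent: 5 * 10^3
let both = 1.5e-2         // fractional part, then exponent: 1.5 * 10^-2
let negative = -2.5E+1    // leading hyphen-minus, uppercase E, explicit +

print(fractionOnly, exponentOnly, both, negative)
// 1.25 5000.0 0.015 -25.0
```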

In Swift, a float literal cannot begin or end with a decimal separator dot. For example:

let x = .5
// error: '.5' is not a valid floating point literal; it must be written '0.5'
let y = 5.
// error: expected member name following '.'

(Instead, Swift uses leading dot syntax for implicit member lookup.)
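For contrast, here is a small sketch of what a leading dot does mean in Swift when the type is known from context. The constant names are illustrative only.

```swift
let half: Double = 0.5            // a literal needs a digit before the dot
let circle: Double = .pi          // leading dot: shorthand for Double.pi
let unbounded: Float = .infinity  // leading dot: shorthand for Float.infinity

print(circle == Double.pi)        // true
print(unbounded.isInfinite)       // true
```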

The same requirement does not apply to conversions from String. For example, Double(".5")! == 0.5.

Hexadecimal float literals

Unfamiliar to some users, hexadecimal float literals (also specified in C99 and C++17) are supported in Swift. They can be useful when you want to represent the intended binary floating-point value exactly and a decimal literal is impractical or impossible for the purpose.

Hexadecimal float literals use the base prefix 0x. Then, the whole part (in base 16) can optionally be followed by a fractional part (also in base 16) beginning with the separator dot (.). Finally, hexadecimal float literals must end with a binary exponent beginning with p or P. The digits of the exponent can optionally be preceded by - or +. For example:

let x = 0x1p2
// 1.0 * (2 ** 2) == 4
// Here, we use `**` to represent exponentiation.

let y = 0x1p-2
// 1.0 * (2 ** -2) == 0.25 

let z = 0x1.8p-1
// (1.0 + 8/16) * (2 ** -1) == 0.75

let a = 0xf.fffp-3
// (15.0 + 15/16 + 15/256 + 15/4096) * (2 ** -3)
//   == 1.999969482421875 

In C, the binary exponent is required in order to avoid ambiguity between the hexadecimal digit f and a suffix f indicating that the constant has type float. In Swift, the binary exponent is likewise required, even though there is no possibility of ambiguity.

As with integer literals, hexadecimal float literals can be prepended with the hyphen-minus character (-) to indicate a negative value.
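To illustrate the point about exactness, the sketch below writes the Double nearest to 0.1 as a hexadecimal float literal and also shows a negative hexadecimal literal. The constant names are illustrative only.

```swift
// The Double nearest to 0.1, spelled out bit for bit.
let tenth = 0x1.999999999999ap-4
// -(1.0 * 2^-1)
let negativeHalf = -0x1p-1

print(tenth == 0.1)                         // true
print(String(tenth.bitPattern, radix: 16))  // 3fb999999999999a
print(negativeHalf)                         // -0.5
```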

In Swift, the portion of a hexadecimal float literal between the required base prefix and the required binary exponent cannot begin or end with the separator dot (.). At the time of writing, error messages are not particularly helpful in diagnosing the issue:

let x = 0x1.p2
// error: value of type 'Int' has no member 'p2'
let y = 0x.1p2
// error: '.' is not a valid hexadecimal digit (0-9, A-F) in integer literal
// error: 'p' is not a valid digit in integer literal
// error: consecutive statements on a line must be separated by ';'
// error: expected identifier after '.' expression

Again, the same requirement does not apply to conversions from String.

Type inference

As previously discussed, literals have no type of their own in Swift. Instead, the type checker attempts to infer the type of a literal expression based on other available information such as explicit type annotations.

Besides using an explicit type annotation, the type coercion operator as (which is to be distinguished from dynamic cast operators as?, as!, and is) can be used to provide information for type inference:

let x = 42.0 as Float

In the absence of other available information, the inferred type of a float literal expression defaults to FloatLiteralType, which is a type alias for Double unless it is shadowed by the user.
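The inference rules above can be checked with type(of:). The following is a minimal sketch that assumes FloatLiteralType has not been shadowed; the constant names are illustrative only.

```swift
let inferred = 42.0           // no other information: FloatLiteralType (Double)
let annotated: Float = 42.0   // explicit type annotation
let coerced = 42.0 as Float   // type coercion operator
let fromInteger: Float = 42   // an integer literal expressing a Float value

print(type(of: inferred))     // Double
print(type(of: annotated))    // Float
print(type(of: coerced))      // Float
print(type(of: fromInteger))  // Float
```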

The following caveat no longer applies: the changes described in SE-0213: Literal initialization via coercion were implemented in July 2018 and have since shipped.

A frequent misunderstanding found even in the Swift project itself concerns the use of a type conversion initializer to indicate the desired type of a literal expression. For example:

// Avoid writing such code.
let x = Float(42.0)

This usage frequently gives the intended result, but the function call does not provide information for type inference. Instead, this statement creates an instance of type FloatLiteralType (which again, by default, is a type alias for Double) with the value 42.0, then converts that value to Float.

Since Float has less precision than Double, a literal value is rounded twice when that statement is evaluated, which can lead to double-rounding error:

let correct = 8388608.5000000001 as Float
// 8388609
let incorrect = Float(8388608.5000000001)
// 8388608

Since Float80 has more precision than Double, the same misunderstanding causes loss of precision in floating-point values analogous to omission of the suffix l in C/C++ (which must be used to indicate that a constant should have long double type):

let precise = 3.14159265358979323846 as Float80
// 3.14159265358979323851
let imprecise = Float80(3.14159265358979323846)
// 3.141592653589793116
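On a toolchain that includes the SE-0213 changes, the initializer spelling with a literal argument is expected to behave like the coercion, so the first two constants in the sketch below should agree. Their printed value depends on whether Float80 is available on the platform, so it isn't shown. The explicit as Double step is an illustrative way to reproduce the old double-rounding path.

```swift
let coerced = 8388608.5000000001 as Float
let viaInitializer = Float(8388608.5000000001)
// Forcing the literal to be a Double first means the value is rounded twice:
// once to 8388608.5 as a Double, then (ties to even) to 8388608 as a Float.
let viaDouble = Float(8388608.5000000001 as Double)

print(coerced == viaInitializer)  // true
print(viaDouble)                  // 8388608.0
```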

Float literal precision

Notionally, a numeric literal isn’t limited by the precision of any type because it has no type.

Under the hood, however, a float literal is first used to create an internal value of type _MaxBuiltinFloatType, which is then converted to the intended type. As of the time of writing, float literals may be incorrectly rounded because _MaxBuiltinFloatType is a type alias for Float80 if supported and Double otherwise. Consequently, float literals that cannot be represented exactly as a value of type _MaxBuiltinFloatType are subject to double-rounding error just as though the value were created using a converting initializer.

Hexadecimal float literals of no more than the maximum supported precision can be used to avoid this double-rounding error for binary floating-point types.
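As a sketch of that workaround: the literal below needs only 24 significant bits, so it is exactly representable in Float and in every wider binary floating-point type, and no rounding takes place at any step. The constant name is illustrative only.

```swift
// (1 + 2^-23) * 2^23 == 8388609, which fits in Float's 24-bit significand.
let exact = 0x1.000002p23 as Float

print(exact)                            // 8388609.0
print(exact.bitPattern == 0x4b00_0001)  // true
```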

Previously, an integer literal was likewise first used to create an internal 2048-bit value (of type _MaxBuiltinIntegerType), which was then converted to the intended type. In November 2018, the integer literal type was changed from a fixed 2048-bit type to an arbitrary width type.

Double rounding of float literals is tracked by Swift bug SR-7124: Double rounding in floating-point literal conversion.

Since _MaxBuiltinFloatType is a binary floating-point type, a decimal floating-point type that conforms to the protocol ExpressibleByFloatLiteral cannot distinguish between two values that have the same binary floating-point representation when rounded to fit _MaxBuiltinFloatType:

import Foundation

(0.1 as Decimal).description
// "0.1"
(0.10000000000000001 as Decimal).description
// "0.1"

Decimal(string: "0.1")!.description
// "0.1"
Decimal(string: "0.10000000000000001")!.description
// "0.10000000000000001"

Conversions among floating-point types

Two different initializers are provided for conversions between standard library binary floating-point types. A value source of type T can be converted to a value of type U as follows:

  1. U(source)
    Converts the given value to a representable value of type U.
    The result of an inexact conversion is rounded to the nearest representable value.
    The result of an overflowing conversion is infinite.
    The result of an underflowing conversion is zero.
    The result of converting NaN is some encoding of NaN that varies based on the underlying architecture; any signaling NaN is always converted to a quiet NaN.

  2. U(exactly: source)
    Failable initializer.
    Converts the given value if it can be represented “exactly” as a value of type U; any result that is not nil can be converted back to a value of type T that compares equal to source.
    The result of an inexact conversion is nil.
    The result of an overflowing conversion is nil.
    The result of an underflowing conversion is nil.
    The result of converting NaN (however encoded) is nil, since NaN never compares equal to NaN.
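The two initializers can be compared on a few representative inputs. The following is a minimal sketch converting from Double to Float; the constant names are illustrative only.

```swift
let huge = Double.greatestFiniteMagnitude  // overflows Float
let tiny = Double.leastNonzeroMagnitude    // underflows Float
let tenth = 0.1 as Double                  // not exactly representable as Float
let half = 0.5 as Double                   // exactly representable as Float

// Rounding initializer: always produces a value.
print(Float(huge))                   // inf
print(Float(tiny))                   // 0.0
print(Double(Float(tenth)) == tenth) // false
print(Float(Double.nan).isNaN)       // true

// Exact initializer: produces nil unless the value round-trips.
print(Float(exactly: huge) == nil)        // true
print(Float(exactly: tiny) == nil)        // true
print(Float(exactly: tenth) == nil)       // true
print(Float(exactly: Double.nan) == nil)  // true
print(Float(exactly: half) == 0.5)        // true
```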

Other initializers

Conversions between integer types and floating-point types

Incomplete

Creating from a string

Standard library binary floating-point types provide an unlabeled failable initializer that creates a binary floating-point value based on a given string:

let pi = Double("3.14159265358979323846")!
// 3.1415926535897931

Any spelling that’s valid as an integer or float literal is valid as a string for conversion to a binary floating-point type. Likewise, any value obtained from the description or debugDescription property of a binary floating-point value is valid for conversion. Specifically:

Until the behavior is changed in a future version of Swift, any string that would cause a range error when it is used as the argument of the C function strtof or strtod causes Float.init?(_: String) or Double.init?(_: String) (respectively) to return nil. Therefore:

Any invalid character, even if whitespace, causes the entire string to be invalid for conversion.
The result of an inexact conversion is rounded to the nearest representable value.
The result of an overflowing conversion is nil (in the future, it will be infinite).
The result of an underflowing conversion is nil (in the future, it will be zero).
The result of converting NaN is encoded with the NaN payload (truncated if needed) if such a payload is specified.
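A few of these rules can be sketched as follows. Overflowing and underflowing inputs are left out on purpose, since the text above notes that their behavior is expected to change. The constant names are illustrative only.

```swift
let fromDecimal = Double("2.5e-1")                // decimal literal spelling
let fromHex = Double("0x1p-2")                    // hexadecimal literal spelling
let roundTrip = Double((0.25 as Double).description)  // output of description
let leadingSpace = Double(" 2.5")                 // whitespace is invalid
let trailingSpace = Double("2.5 ")                // whitespace is invalid
let notANumber = Double("nan")                    // NaN without a payload

print(fromDecimal == 0.25)        // true
print(fromHex == 0.25)            // true
print(roundTrip == 0.25)          // true
print(leadingSpace == nil)        // true
print(trailingSpace == nil)       // true
print(notANumber?.isNaN == true)  // true
```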

Although an integer literal beginning with a leading zero isn’t considered to be written in base 8 (octal), a NaN payload beginning with a leading zero is considered to be written in base 8:

let x = Double("nan(123)")!
let y = Double("nan(0123)")!

String(x.bitPattern, radix: 16) // "7ff800000000007b"
String(y.bitPattern, radix: 16) // "7ff8000000000053"

String(123, radix: 16)          // "7b"
String(0o123, radix: 16)        // "53"

Some rules are more relaxed for string conversion than for float literals: a digit is not required to precede or follow the separator dot, and a binary exponent is not required to end a hexadecimal value:

let x = Double(".5")!
// 0.5
let y = Double("5.")!
// 5
let z = Double("0x1.p2")!
// 4
let a = Double("0x.1p2")!
// 0.25
let b = Double("0x1.")!
// 1

Creating from a sign, exponent, and significand

As the name suggests, the initializer init(sign:exponent:significand:) creates a new floating-point value from the given sign, exponent, and significand. Therefore, you can use this initializer to recover a floating-point value that’s been decomposed into its sign, exponent, and significand:

let x = 42.0
Double(
  sign: x.sign,
  exponent: x.exponent,
  significand: x.significand)
// 42.0

However, significand doesn’t have to be limited to the subset of floating-point values that have positive sign and zero exponent; consequently, this initializer doesn’t always create values that have the same sign as sign or the same exponent as exponent. Rather, the result is notionally equivalent to multiplying three terms derived from the given arguments in a single operation without intermediate rounding:

let sign = FloatingPointSign.minus
let exponent = 2
let significand = -42.0

Double(
  sign: sign,
  exponent: exponent,
  significand: significand)
// 168.0

(sign == .minus ? -1 : 1)
  * exp2(Double(exponent))
  * significand
// 168.0

As shown above, the sign of the result depends on both sign and significand.sign, and the exponent of the result depends on both exponent and significand.exponent. In other words, the initializer scales the value given as significand; indeed, this initializer is Swift’s implementation of the IEEE 754 required operation scaleB, known in C/C++ as scalbn.
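The scaling behavior can be seen directly by passing a significand outside the range a decomposed value would have. This is a minimal sketch; the constant names are illustrative only.

```swift
// 10.0 * 2^3: the argument labeled significand is scaled, not stored as is.
let scaled = Double(sign: .plus, exponent: 3, significand: 10.0)
print(scaled)              // 80.0
print(scaled.exponent)     // 6
print(scaled.significand)  // 1.25

// The minus sign combines with the negative significand to give a positive result.
let flipped = Double(sign: .minus, exponent: 0, significand: -1.5)
print(flipped)             // 1.5
print(flipped.sign)        // plus
```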


Previous:
Concrete binary floating-point types, part 2

Next:
Concrete binary floating-point types, part 4

Draft: 27 February–14 March 2018
Updated 29 July 2019