fp-ieee-0.1.0.5: IEEE 754-2019 compliant operations
Safe Haskell: Safe-Inferred
Language: Haskell2010

Numeric.Floating.IEEE

Description

This module provides IEEE 754-compliant operations for floating-point numbers.

The functions in this module assume that the given floating-point type conforms to the IEEE 754 format.

Since the RealFloat constraint is insufficient to query the properties of a NaN, the functions here assume that every NaN is positive and quiet. If you want better treatment of NaNs, use the module Numeric.Floating.IEEE.NaN.

Since floating-point exceptions cannot be accessed from Haskell, the operations provided by this module ignore exceptional behavior. This library assumes the default exception handling is in use.

If you are using GHC <= 8.8 on an i386 target, you may need to set the -msse2 option to get correct floating-point behavior.

Standard Haskell classes

This library assumes that some of the standard numeric functions correspond to the operations specified by IEEE. The rounding attribute should be roundTiesToEven and the exceptional behavior should be the default one.

Num

  • (+), (-), and (*) should be correctly-rounding.
  • negate should comply with IEEE semantics.
  • abs should comply with IEEE semantics (GHC <= 9.4 did not handle the sign bit of NaN for Float and Double on the via-C backend and the SPARC NCG backend; see GHC's #21043).
  • fromInteger should be correctly-rounding, but some third-party floating-point types may fail to do so (also, GHC <= 9.0 failed to do so for Float and Double. See GHC's #17231). This module provides an always-correctly-rounding alternative: fromIntegerTiesToEven.

Fractional

  • (/) should be correctly-rounding.
  • fromRational should be correctly-rounding, but some third-party floating-point types fail to do so.

Floating

  • sqrt should be correctly-rounding.

RealFrac

  • truncate: IEEE 754 convertToIntegerTowardZero operation.
  • round: IEEE 754 convertToIntegerTiesToEven operation; the Language Report says that this should choose the even integer if the argument is the midpoint of two successive integers.
  • ceiling: IEEE 754 convertToIntegerTowardPositive operation.
  • floor: IEEE 754 convertToIntegerTowardNegative operation.

To complete these, roundAway is provided by this library. Note that Haskell's round is specified to be ties-to-even, whereas C's round is ties-to-away.
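
The two tie-breaking rules can be contrasted with functions from base alone. In the sketch below, roundAwaySketch is a hypothetical helper written for this illustration (it is not this library's roundAway), and it assumes a finite argument.

```haskell
-- Ties-to-even (Prelude's round) versus ties-to-away, using only base.
-- roundAwaySketch is a hypothetical helper for illustration; it is not
-- the implementation of roundAway in this library. Finite inputs only.
roundAwaySketch :: Double -> Integer
roundAwaySketch x =
  let (n, f) = properFraction x :: (Integer, Double)  -- n truncates toward zero; f keeps the sign of x
  in if abs f >= 0.5
       then n + (if x < 0 then -1 else 1)  -- step away from zero on ties and beyond
       else n

tiesToEven :: [Integer]
tiesToEven = map round [0.5, 1.5, 2.5, -0.5 :: Double]   -- [0,2,2,0]

tiesToAway :: [Integer]
tiesToAway = map roundAwaySketch [0.5, 1.5, 2.5, -0.5]   -- [1,2,3,-1]
```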

RealFloat

This class provides information on the IEEE-compliant format.

  • floatRadix: The base \(b\). IEEE 754 radix operation.
  • floatDigits: The precision \(p\).
  • floatRange: The exponent range offset by 1: \((\mathit{emin}+1,\mathit{emax}+1)\)
  • decodeFloat x: The exponent part returned is in the range \([\mathit{emin}+1-p,\mathit{emax}+1-p]\) if x is normal, or in \([\mathit{emin}-2p+2,\mathit{emin}-p]\) if x is subnormal.
  • encodeFloat should accept the significand in the range [0, floatRadix x ^ floatDigits x]. This library does not assume a particular rounding behavior when the result cannot be expressed in the target type.
  • exponent x: The exponent offset by 1: \(\mathrm{logB}(x)+1\). Returns an integer in \([\mathit{emin}+1,\mathit{emax}+1]\) if x is normal, or in \([\mathit{emin}-p+2,\mathit{emin}]\) if x is subnormal.
  • significand x: Returns the significand of x as a value in \([1/b,1)\).
  • scaleFloat: Rounding may occur when the result is subnormal. This library does not assume a particular rounding behavior when the result is subnormal.
  • isNaN
  • isInfinite
  • isDenormalized
  • isNegativeZero
  • isIEEE should return True if you are using the type with this library.
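
As a concrete reading of the list above, the following sketch queries Double with the RealFloat methods from base. The names formatOfDouble and partsOfOne are made up for this example; the values in the comments follow from binary64 having \(p=53\), \(\mathit{emin}=-1022\), and \(\mathit{emax}=1023\).

```haskell
-- Querying the format parameters of Double through RealFloat (base only).
formatOfDouble :: (Integer, Int, (Int, Int))
formatOfDouble =
  ( floatRadix  (1 :: Double)   -- b = 2
  , floatDigits (1 :: Double)   -- p = 53
  , floatRange  (1 :: Double) ) -- (emin + 1, emax + 1) = (-1021, 1024)

-- 1.0 decodes to 2^52 * 2^(-52), so exponent 1.0 = -52 + 53 = 1
-- and significand 1.0 = 0.5, which lies in [1/b, 1).
partsOfOne :: ((Integer, Int), Int, Double)
partsOfOne = (decodeFloat one, exponent one, significand one)
  where
    one = 1 :: Double
```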

Other functions

Here is a list of known issues with other floating-point functions on GHC 8.6 or later:

5.3 Homogeneous general-computational operations

5.3.1 General operations

round' :: RealFloat a => a -> a Source #

round' x returns the nearest integral value to x; the even integer if x is equidistant between two integers.

IEEE 754 roundToIntegralTiesToEven operation.

\(x :: Double) -> isFinite x ==> (round' x == fromInteger (round x))
>>> round' (-0.5)
-0.0

roundAway' :: RealFloat a => a -> a Source #

roundAway' x returns the nearest integral value to x; the one with larger magnitude is returned if x is equidistant between two integers.

IEEE 754 roundToIntegralTiesToAway operation.

\(x :: Double) -> isFinite x ==> roundAway' x == fromInteger (roundAway x)
>>> roundAway' (-0.5)
-1.0
>>> roundAway' (-0.4)
-0.0

truncate' :: RealFloat a => a -> a Source #

truncate' x returns the integral value nearest to x, and whose magnitude is not greater than that of x.

IEEE 754 roundToIntegralTowardZero operation.

\(x :: Double) -> isFinite x ==> truncate' x == fromInteger (truncate x)
>>> truncate' (-0.5)
-0.0

ceiling' :: RealFloat a => a -> a Source #

ceiling' x returns the least integral value that is not less than x.

IEEE 754 roundToIntegralTowardPositive operation.

\(x :: Double) -> isFinite x ==> ceiling' x == fromInteger (ceiling x)
>>> ceiling' (-0.8)
-0.0
>>> ceiling' (-0.5)
-0.0

floor' :: RealFloat a => a -> a Source #

floor' x returns the greatest integral value that is not greater than x.

IEEE 754 roundToIntegralTowardNegative operation.

\(x :: Double) -> isFinite x ==> floor' x == fromInteger (floor x)
>>> floor' (-0.1)
-1.0
>>> floor' (-0)
-0.0

nextUp :: RealFloat a => a -> a Source #

Returns the smallest value that is larger than the argument.

IEEE 754 nextUp operation.

>>> nextUp 1 == (0x1.000002p0 :: Float)
True
>>> nextUp 1 == (0x1.0000_0000_0000_1p0 :: Double)
True
>>> nextUp (1/0) == (1/0 :: Double)
True
>>> nextUp (-1/0) == (- maxFinite :: Double)
True
>>> nextUp 0 == (0x1p-1074 :: Double)
True
>>> nextUp (-0) == (0x1p-1074 :: Double)
True
>>> nextUp (-0x1p-1074) :: Double -- returns negative zero
-0.0
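
For intuition, the behaviour in these examples can be modelled for Double by stepping the bit pattern. This is a sketch under the assumption of the binary64 layout; nextUpSketch is a hypothetical name and is not how this library implements nextUp. castDoubleToWord64 and castWord64ToDouble come from GHC.Float in base.

```haskell
import Data.Word (Word64)
import GHC.Float (castDoubleToWord64, castWord64ToDouble)

-- A hypothetical Double-only model of nextUp based on the bit pattern.
-- It illustrates the operation; it is not this library's implementation.
nextUpSketch :: Double -> Double
nextUpSketch x
  | isNaN x    = x                              -- NaN stays NaN
  | x == 1 / 0 = x                              -- +Infinity has no successor
  | x == 0     = castWord64ToDouble 1           -- either zero: smallest positive subnormal
  | x > 0      = castWord64ToDouble (bits + 1)  -- positive: next larger bit pattern
  | otherwise  = castWord64ToDouble (bits - 1)  -- negative: the magnitude shrinks
  where
    bits :: Word64
    bits = castDoubleToWord64 x
```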

nextDown :: RealFloat a => a -> a Source #

Returns the largest value that is smaller than the argument.

IEEE 754 nextDown operation.

>>> nextDown 1 == (0x1.ffff_ffff_ffff_fp-1 :: Double)
True
>>> nextDown 1 == (0x1.fffffep-1 :: Float)
True
>>> nextDown (1/0) == (maxFinite :: Double)
True
>>> nextDown (-1/0) == (-1/0 :: Double)
True
>>> nextDown 0 == (-0x1p-1074 :: Double)
True
>>> nextDown (-0) == (-0x1p-1074 :: Double)
True
>>> nextDown 0x1p-1074 -- returns positive zero
0.0
>>> nextDown 0x1p-1022 == (0x0.ffff_ffff_ffff_fp-1022 :: Double)
True

nextTowardZero :: RealFloat a => a -> a Source #

Returns the value whose magnitude is smaller than that of the argument, and is closest to the argument.

This operation is not in IEEE, but may be useful to some.

>>> nextTowardZero 1 == (0x1.ffff_ffff_ffff_fp-1 :: Double)
True
>>> nextTowardZero 1 == (0x1.fffffep-1 :: Float)
True
>>> nextTowardZero (1/0) == (maxFinite :: Double)
True
>>> nextTowardZero (-1/0) == (-maxFinite :: Double)
True
>>> nextTowardZero 0 :: Double -- returns positive zero
0.0
>>> nextTowardZero (-0 :: Double) -- returns negative zero
-0.0
>>> nextTowardZero 0x1p-1074 :: Double
0.0

remainder :: RealFloat a => a -> a -> a Source #

remainder x y returns \(r=x-yn\), where \(n\) is the integer nearest the exact number \(x/y\); i.e. \(n=\mathrm{round}(x/y)\).

IEEE 754 remainder operation.
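
The formula can be checked with exact arithmetic in Rational. The sketch below is a hypothetical model, not this library's remainder: it assumes finite x and finite nonzero y, and it does not reproduce the sign of a zero result or the handling of NaNs and infinities.

```haskell
-- A hypothetical exact model of the IEEE remainder for finite x and
-- finite nonzero y, computed in Rational. Not the library's implementation.
remainderSketch :: Double -> Double -> Double
remainderSketch x y = fromRational (rx - ry * fromInteger n)
  where
    rx = toRational x
    ry = toRational y
    n :: Integer
    n = round (rx / ry)  -- round on Rational breaks ties toward the even integer
```

For instance, remainderSketch 5 3 is -1.0 because 5/3 rounds to 2, and remainderSketch 5 2 is 1.0 because the tie 2.5 rounds to the even integer 2.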

5.3.2 Decimal operations (not supported)

Not supported.

5.3.3 logBFormat operations

scaleFloatTiesToEven :: RealFloat a => Int -> a -> a Source #

IEEE 754 scaleB operation, with the rounding attribute given by the function name.

scaleFloatTiesToAway :: RealFloat a => Int -> a -> a Source #

IEEE 754 scaleB operation, with the rounding attribute given by the function name.

scaleFloatTowardPositive :: RealFloat a => Int -> a -> a Source #

IEEE 754 scaleB operation, with the rounding attribute given by the function name.

scaleFloatTowardNegative :: RealFloat a => Int -> a -> a Source #

IEEE 754 scaleB operation, with the rounding attribute given by the function name.

scaleFloatTowardZero :: RealFloat a => Int -> a -> a Source #

IEEE 754 scaleB operation, with the rounding attribute given by the function name.

The Haskell counterpart of the IEEE 754 logB operation is exponent. Note that logB and exponent differ by one: logB x = exponent x - 1.

exponent :: RealFloat a => a -> Int #

exponent corresponds to the second component of decodeFloat. exponent 0 = 0 and for finite nonzero x, exponent x = snd (decodeFloat x) + floatDigits x. If x is a finite floating-point number, it is equal in value to significand x * b ^^ exponent x, where b is the floating-point radix. The behaviour is unspecified on infinite or NaN values.
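
The off-by-one relation between logB and exponent can be written down directly. In this sketch, logBSketch is a hypothetical name used only for illustration, and the argument is assumed to be finite and nonzero.

```haskell
-- logB expressed through Prelude's exponent, for finite nonzero inputs.
-- logBSketch is a hypothetical name used only for this illustration.
logBSketch :: Double -> Int
logBSketch x = exponent x - 1

-- 8    = 0.5  * 2^4, so exponent 8    = 4 and logB 8    = 3.
-- 1    = 0.5  * 2^1, so exponent 1    = 1 and logB 1    = 0.
-- 0.75 = 0.75 * 2^0, so exponent 0.75 = 0 and logB 0.75 = -1.
examples :: [Int]
examples = map logBSketch [8, 1, 0.75]
```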

5.4 formatOf general-computational operations

5.4.1 Arithmetic operations

For IEEE-compliant floating-point types, (+), (-), (*), (/), and sqrt from Prelude should be correctly-rounding. fusedMultiplyAdd is provided by this library. This library also provides "generic" versions of the arithmetic operations, which can be useful if the target type is narrower than the source type.

(+) :: Num a => a -> a -> a infixl 6 #

(-) :: Num a => a -> a -> a infixl 6 #

(*) :: Num a => a -> a -> a infixl 7 #

(/) :: Fractional a => a -> a -> a infixl 7 #

Fractional division.

sqrt :: Floating a => a -> a #

fusedMultiplyAdd :: RealFloat a => a -> a -> a -> a Source #

fusedMultiplyAdd a b c computes a * b + c as a single, ternary operation. Rounding is done only once.

May make use of hardware FMA instructions if the target architecture has them; set the fma3 package flag on x86 systems.

IEEE 754 fusedMultiplyAdd operation.

\(a :: Double) (b :: Double) (c :: Double) -> fusedMultiplyAdd a b c == fromRational (toRational a * toRational b + toRational c)
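
To see why a single rounding matters, the property above can be used as a reference model with base alone. In this sketch, fmaModel is a hypothetical name (it is not the library's fusedMultiplyAdd) and the chosen inputs are just one convenient case in which the intermediate product falls exactly halfway between two Doubles.

```haskell
-- Reference model taken from the property above: evaluate a * b + c exactly
-- in Rational and round once. fmaModel is a hypothetical name, not the
-- library's fusedMultiplyAdd; it is only meaningful for finite arguments.
fmaModel :: Double -> Double -> Double -> Double
fmaModel x y z = fromRational (toRational x * toRational y + toRational z)

-- (1 + 2^-27) * (1 - 2^-27) = 1 - 2^-54 exactly, which is halfway between
-- two adjacent Doubles and rounds to 1 under ties-to-even.
a, b :: Double
a = 1 + 2 ^^ (-27 :: Int)
b = 1 - 2 ^^ (-27 :: Int)

twoRoundings, oneRounding :: Double
twoRoundings = a * b - 1          -- the product is rounded to 1 first, giving 0
oneRounding  = fmaModel a b (-1)  -- the exact -2^-54 survives the single rounding
```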

genericAdd :: (RealFloat a, RealFloat b) => a -> a -> b infixl 6 Source #

IEEE 754 addition operation.

genericSub :: (RealFloat a, RealFloat b) => a -> a -> b infixl 6 Source #

IEEE 754 subtraction operation.

genericMul :: (RealFloat a, RealFloat b) => a -> a -> b infixl 7 Source #

IEEE 754 multiplication operation.

genericDiv :: (RealFloat a, RealFloat b) => a -> a -> b infixl 7 Source #

IEEE 754 division operation.

genericSqrt is not implemented yet.

genericFusedMultiplyAdd :: (RealFloat a, RealFloat b) => a -> a -> a -> b Source #

IEEE 754 fusedMultiplyAdd operation.

fromIntegerTiesToEven :: RealFloat a => Integer -> a Source #

IEEE 754 convertFromInt operation, with the rounding attribute given by the function name.

fromIntegerTiesToAway :: RealFloat a => Integer -> a Source #

IEEE 754 convertFromInt operation, with the rounding attribute given by the function name.

fromIntegerTowardPositive :: RealFloat a => Integer -> a Source #

IEEE 754 convertFromInt operation, with the rounding attribute given by the function name.

fromIntegerTowardNegative :: RealFloat a => Integer -> a Source #

IEEE 754 convertFromInt operation, with the rounding attribute given by the function name.

fromIntegerTowardZero :: RealFloat a => Integer -> a Source #

IEEE 754 convertFromInt operation, with the rounding attribute given by the function name.

fromIntegralTiesToEven :: (Integral i, RealFloat a) => i -> a Source #

IEEE 754 convertFromInt operation, with the rounding attribute given by the function name.

fromIntegralTiesToAway :: (Integral i, RealFloat a) => i -> a Source #

IEEE 754 convertFromInt operation, with the rounding attribute given by the function name.

fromIntegralTowardPositive :: (Integral i, RealFloat a) => i -> a Source #

IEEE 754 convertFromInt operation, with the rounding attribute given by the function name.

fromIntegralTowardNegative :: (Integral i, RealFloat a) => i -> a Source #

IEEE 754 convertFromInt operation, with the rounding attribute given by the function name.

fromIntegralTowardZero :: (Integral i, RealFloat a) => i -> a Source #

IEEE 754 convertFromInt operation, with the rounding attribute given by the function name.

fromRationalTiesToEven :: RealFloat a => Rational -> a Source #

Conversion from a rational number to a floating-point value, with the rounding attribute given by the function name.

fromRationalTiesToAway :: RealFloat a => Rational -> a Source #

Conversion from a rational number to a floating-point value, with the rounding attribute given by the function name.

fromRationalTowardPositive :: RealFloat a => Rational -> a Source #

Conversion from a rational number to a floating-point value, with the rounding attribute given by the function name.

fromRationalTowardNegative :: RealFloat a => Rational -> a Source #

Conversion from a rational number to a floating-point value, with the rounding attribute given by the function name.

fromRationalTowardZero :: RealFloat a => Rational -> a Source #

Conversion from a rational number to a floating-point value, with the rounding attribute given by the function name.

round :: (RealFrac a, Integral b) => a -> b #

round x returns the nearest integer to x; the even integer if x is equidistant between two integers

roundAway :: (RealFrac a, Integral b) => a -> b Source #

roundAway x returns the nearest integer to x; the integer with larger magnitude is returned if x is equidistant between two integers.

IEEE 754 convertToIntegerTiesToAway operation.

>>> roundAway 4.5
5

truncate :: (RealFrac a, Integral b) => a -> b #

truncate x returns the integer nearest x between zero and x

ceiling :: (RealFrac a, Integral b) => a -> b #

ceiling x returns the least integer not less than x

floor :: (RealFrac a, Integral b) => a -> b #

floor x returns the greatest integer not greater than x

5.4.2 Conversion operations for floating-point formats and decimal character sequences

Unfortunately, realToFrac does not have well-defined semantics: its result can depend on whether rewrite rules fire (consider realToFrac (0/0 :: Float) :: Double). As an alternative, this library provides realFloatToFrac, with well-defined semantics on signed zeroes, infinities, and NaNs. Like realToFrac, realFloatToFrac comes with some rewrite rules for particular types, but they should not change behavior.

realFloatToFrac :: (RealFloat a, Fractional b) => a -> b Source #

Converts a floating-point value into another type.

Similar to realToFrac, but handles NaNs, infinities, and negative zero correctly even if the rewrite rules are off.

IEEE 754 convertFormat operation.
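
The problem with the unoptimized path of realToFrac, and the kind of case analysis that avoids it, can be sketched with base alone. Both lostNaN and convertSketch are hypothetical names for this illustration; convertSketch only conveys the idea and is not the implementation of realFloatToFrac.

```haskell
-- Without rewrite rules, realToFrac is fromRational . toRational, and a
-- Rational has no way to represent NaN, so the NaN is silently lost.
lostNaN :: Double
lostNaN = fromRational (toRational (0 / 0 :: Float))

-- A hypothetical conversion that checks the special values first.
-- It sketches the idea only; it is not realFloatToFrac itself.
convertSketch :: (RealFloat a, Fractional b) => a -> b
convertSketch x
  | isNaN x          = 0 / 0
  | isInfinite x     = if x > 0 then 1 / 0 else -1 / 0
  | isNegativeZero x = -0
  | otherwise        = fromRational (toRational x)
```

Here isNaN lostNaN is False, whereas isNaN (convertSketch (0/0 :: Float) :: Double) is True.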

canonicalize :: RealFloat a => a -> a Source #

A specialized version of realFloatToFrac.

The resulting value will be canonical and non-signaling.

convertFromDecimalCharacter: not implemented.

convertToDecimalCharacter: not implemented.

5.4.3 Conversion operations for binary formats

convertFromHexCharacter: not implemented.

convertToHexCharacter: showHFloat from Numeric can be used.

5.5 Quiet-computational operations

5.5.1 Sign bit operations

For IEEE-compliant floating-point types, negate from Prelude should comply with IEEE semantics. On GHC 9.6 or later, abs should also comply with IEEE semantics (GHC <= 9.4 did not handle the sign bit of NaN on the via-C backend and the SPARC NCG backend).

negate :: Num a => a -> a #

Unary negation.

abs :: Num a => a -> a #

Absolute value.

See Numeric.Floating.IEEE.NaN for copySign.

5.5.2 Decimal re-encoding operations (not supported)

Not supported.

5.6 Signaling-computational operations

5.6.1 Comparisons (not supported)

This library does not support floating-point exceptions.

5.7 Non-computational operations

5.7.1 Conformance predicates (not supported)

Not supported.

5.7.2 General operations

Functions in this module disregard the content of NaNs: sign bit, signaling-or-quiet status, and payload. All NaNs are treated as quiet and positive. To properly handle NaNs, use the typeclass and functions from Numeric.Floating.IEEE.NaN.

data Class Source #

The classification of floating-point values.

classify :: RealFloat a => a -> Class Source #

Classifies a floating-point value.

Since the RealFloat constraint is insufficient to query the signaling status of a NaN, this function treats all NaNs as quiet. See also Numeric.Floating.IEEE.NaN.

isSignMinus :: RealFloat a => a -> Bool Source #

Returns True if the argument is negative (including negative zero).

Since the RealFloat constraint is insufficient to query the sign of NaNs, this function treats all NaNs as positive. See also Numeric.Floating.IEEE.NaN.

IEEE 754 isSignMinus operation.

isNormal :: RealFloat a => a -> Bool Source #

IEEE 754 isNormal operation.

isFinite :: RealFloat a => a -> Bool Source #

Returns True if the argument is normal, subnormal, or zero.

IEEE 754 isFinite operation.

isZero :: RealFloat a => a -> Bool Source #

Returns True if the argument is zero.

IEEE 754 isZero operation.

isDenormalized :: RealFloat a => a -> Bool #

True if the argument is too small to be represented in normalized format

isInfinite :: RealFloat a => a -> Bool #

True if the argument is an IEEE infinity or negative infinity

isNaN :: RealFloat a => a -> Bool #

True if the argument is an IEEE "not-a-number" (NaN) value

See Numeric.Floating.IEEE.NaN for isSignaling.

isCanonical: not supported.

floatRadix :: RealFloat a => a -> Integer #

a constant function, returning the radix of the representation (often 2)

compareByTotalOrder :: RealFloat a => a -> a -> Ordering Source #

Comparison with IEEE 754 totalOrder predicate.

Floating-point numbers are ordered as follows: \(-\infty < \text{negative reals} < -0 < +0 < \text{positive reals} < +\infty < \mathrm{NaN}\).

Since the RealFloat constraint is insufficient to query the sign and payload of NaNs, this function treats all NaNs as positive and does not distinguish between them. See also Numeric.Floating.IEEE.NaN.

Also, for the same reason, this function cannot distinguish the members of a cohort.
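
The ordering described above, including the simplification that every NaN is positive and indistinguishable from other NaNs, can be modelled with the RealFloat predicates from base. totalOrderSketch is a hypothetical name; this is an illustration of the ordering and not the library's compareByTotalOrder.

```haskell
import Data.List (sortBy)

-- A hypothetical model of the ordering described above, with every NaN
-- treated as positive and equal to every other NaN. Not the library's code.
totalOrderSketch :: RealFloat a => a -> a -> Ordering
totalOrderSketch x y
  | isNaN x   = if isNaN y then EQ else GT  -- NaN sorts after everything else
  | isNaN y   = LT
  | x < y     = LT
  | x > y     = GT
  | otherwise = compare (isNegativeZero y) (isNegativeZero x)  -- -0 before +0

sorted :: [Double]
sorted = sortBy totalOrderSketch [0, 0 / 0, -0, 1 / 0, 1, -1 / 0, -1]
```

Sorting with this comparison places -0.0 before 0.0 and the NaN last, which (==) and compare on Double cannot express.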

compareByTotalOrderMag :: RealFloat a => a -> a -> Ordering Source #

Comparison with IEEE 754 totalOrderMag predicate.

Equivalent to compareByTotalOrder (abs x) (abs y).

5.7.3 Decimal operation (not supported)

Not supported.

5.7.4 Operations on subsets of flags (not supported)

Not supported.

9. Recommended operations

9.5 Augmented arithmetic operations

augmentedAddition :: RealFloat a => a -> a -> (a, a) Source #

IEEE 754 augmentedAddition operation.

The first return value is the approximation of the sum, and the second return value is the error.

fst (augmentedAddition x y) == roundTiesTowardZero (fromRationalR (toRational x + toRational y)) `const` (x :: Double)
let (u, v) = augmentedAddition x y in toRational u + toRational v == toRational x + toRational y `const` (x :: Double)
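
The "sum plus error" idea behind the second property can be illustrated with Knuth's TwoSum, a different and older technique that works under the default roundTiesToEven rounding. It is not augmentedAddition: the IEEE operation rounds the sum with roundTiesTowardZero, so the two can differ in tie cases. The names twoSum and isExact are chosen for this sketch.

```haskell
-- Knuth's TwoSum: a classic error-free transformation under the default
-- roundTiesToEven rounding. It is not augmentedAddition, which rounds the
-- sum with roundTiesTowardZero, but it shows the "sum plus error" idea.
twoSum :: Double -> Double -> (Double, Double)
twoSum x y = (s, e)
  where
    s  = x + y                -- rounded sum
    y' = s - x                -- the part of y that made it into s
    x' = s - y'               -- the part of x that made it into s
    e  = (x - x') + (y - y')  -- what rounding discarded

-- Exactness check in Rational; expected to hold when x + y does not overflow.
isExact :: Double -> Double -> Bool
isExact x y =
  let (s, e) = twoSum x y
  in toRational s + toRational e == toRational x + toRational y
```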

augmentedSubtraction :: RealFloat a => a -> a -> (a, a) Source #

IEEE 754 augmentedSubtraction operation.

The first return value is the approximation of the difference, and the second return value is the error.

fst (augmentedSubtraction x y) == roundTiesTowardZero (fromRationalR (toRational x - toRational y)) `const` (x :: Double)
let (u, v) = augmentedSubtraction x y in toRational u + toRational v == toRational x - toRational y `const` (x :: Double)

augmentedMultiplication :: RealFloat a => a -> a -> (a, a) Source #

IEEE 754 augmentedMultiplication operation.

The first return value is the approximation of the product, and the second return value is the error.

fst (augmentedMultiplication x y) == roundTiesTowardZero (fromRationalR (toRational x * toRational y)) `const` (x :: Double)
let (u, v) = augmentedMultiplication x y in toRational u + toRational v == toRational x * toRational y `const` (x :: Double)

9.6 Minimum and maximum operations

minimum' :: RealFloat a => a -> a -> a Source #

IEEE 754 minimum operation. -0 is smaller than +0. Propagates NaNs.

minimumNumber :: RealFloat a => a -> a -> a Source #

IEEE 754 minimumNumber operation. -0 is smaller than +0. Treats NaNs as missing data.

maximum' :: RealFloat a => a -> a -> a Source #

IEEE 754 maximum operation. -0 is smaller than +0. Propagates NaNs.

maximumNumber :: RealFloat a => a -> a -> a Source #

IEEE 754 maximumNumber operation. -0 is smaller than +0. Treats NaNs as missing data.
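
The two NaN policies and the ordering of signed zeros can be modelled with base alone. minimumSketch and minimumNumberSketch below are hypothetical names for illustration, not this library's implementations; the maximum variants would mirror them with the comparisons reversed.

```haskell
-- Hypothetical models of the two NaN policies, written with base only.
-- They illustrate the documented behaviour and are not the library's code.
minimumSketch :: RealFloat a => a -> a -> a
minimumSketch x y
  | isNaN x          = x   -- propagate NaN
  | isNaN y          = y
  | x < y            = x
  | y < x            = y
  | isNegativeZero x = x   -- x == y here; prefer -0 over +0
  | otherwise        = y

minimumNumberSketch :: RealFloat a => a -> a -> a
minimumNumberSketch x y
  | isNaN x   = y          -- a NaN operand is treated as missing
  | isNaN y   = x
  | otherwise = minimumSketch x y
```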

minimumMagnitude :: RealFloat a => a -> a -> a Source #

IEEE 754 minimumMagnitude operation.

minimumMagnitudeNumber :: RealFloat a => a -> a -> a Source #

IEEE 754 minimumMagnitudeNumber operation.

maximumMagnitude :: RealFloat a => a -> a -> a Source #

IEEE 754 maximumMagnitude operation.

maximumMagnitudeNumber :: RealFloat a => a -> a -> a Source #

IEEE 754 maximumMagnitudeNumber operation.

Floating-point constants

minPositive :: RealFloat a => a Source #

The smallest positive value expressible in an IEEE floating-point format. This value is subnormal.

>>> (minPositive :: Float) == 0x1p-149
True
>>> (minPositive :: Double) == 0x1p-1074
True
>>> nextDown (minPositive :: Float)
0.0
>>> nextDown (minPositive :: Double)
0.0

minPositiveNormal :: RealFloat a => a Source #

The smallest positive normal value expressible in an IEEE floating-point format.

>>> (minPositiveNormal :: Float) == 0x1p-126
True
>>> (minPositiveNormal :: Double) == 0x1p-1022
True
>>> isDenormalized (minPositiveNormal :: Float)
False
>>> isDenormalized (minPositiveNormal :: Double)
False
>>> isDenormalized (nextDown (minPositiveNormal :: Float))
True
>>> isDenormalized (nextDown (minPositiveNormal :: Double))
True

maxFinite :: RealFloat a => a Source #

The largest finite value expressible in an IEEE floating-point format.

>>> (maxFinite :: Float) == 0x1.fffffep+127
True
>>> (maxFinite :: Double) == 0x1.ffff_ffff_ffff_fp+1023
True