Copyright | Copyright (c) 2007--2015 wren gayle romano |
---|---|

License | BSD3 |

Maintainer | wren@community.haskell.org |

Stability | stable |

Portability | portable (with CPP, FFI) |

Safe Haskell | None |

Language | Haskell98 |

This module presents a type for storing numbers in the log-domain.
The main reason for doing this is to prevent underflow when
multiplying many small probabilities as is done in Hidden Markov
Models and other statistical models often used for natural
language processing. The log-domain also helps prevent overflow
when multiplying many large numbers. In rare cases it can speed
up numerical computation (since addition is faster than
multiplication, though logarithms are exceptionally slow), but
the primary goal is to improve accuracy of results. A secondary
goal has been to maximize efficiency since these computations
are frequently done within a *O(n^3)* loop.

The `LogFloat`

of this module is restricted to non-negative
numbers for efficiency's sake, see Data.Number.LogFloat.Signed
for doing signed log-domain calculations.

- module Data.Number.Transfinite
- data LogFloat
- logFloat :: Double -> LogFloat
- fromLogFloat :: LogFloat -> Double
- logToLogFloat :: Double -> LogFloat
- logFromLogFloat :: LogFloat -> Double
- sum :: [LogFloat] -> LogFloat
- product :: [LogFloat] -> LogFloat
- pow :: LogFloat -> Double -> LogFloat
- log1p :: Double -> Double
- expm1 :: Double -> Double

# Exceptional numeric values

module Data.Number.Transfinite

`LogFloat`

data type

A `LogFloat`

is just a `Double`

with a special interpretation.
The `logFloat`

function is presented instead of the constructor,
in order to ensure semantic conversion. At present the `Show`

instance will convert back to the normal-domain, and hence will
underflow at that point. This behavior may change in the future.

Because `logFloat`

performs the semantic conversion, we can use
operators which say what we *mean* rather than saying what we're
actually doing to the underlying representation. That is,
equivalences like the following are true[1] thanks to type-class
overloading:

logFloat (p + q) == logFloat p + logFloat q logFloat (p * q) == logFloat p * logFloat q

(Do note, however, that subtraction can and negation will throw
errors: since `LogFloat`

can only represent the positive half of
`Double`

. `Num`

is the wrong abstraction to put at the bottom
of the numeric type-class hierarchy; but alas, we're stuck with
it.)

Performing operations in the log-domain is cheap, prevents
underflow, and is otherwise very nice for dealing with miniscule
probabilities. However, crossing into and out of the log-domain
is expensive and should be avoided as much as possible. In
particular, if you're doing a series of multiplications as in
`lp * logFloat q * logFloat r`

it's faster to do ```
lp * logFloat
(q * r)
```

if you're reasonably sure the normal-domain multiplication
won't underflow; because that way you enter the log-domain only
once, instead of twice. Also note that, for precision, if you're
doing more than a few multiplications in the log-domain, you
should use `product`

rather than using '(*)' repeatedly.

Even more particularly, you should *avoid addition* whenever
possible. Addition is provided because sometimes we need it, and
the proper implementation is not immediately apparent. However,
between two `LogFloat`

s addition requires crossing the exp/log
boundary twice; with a `LogFloat`

and a `Double`

it's three
times, since the regular number needs to enter the log-domain
first. This makes addition incredibly slow. Again, if you can
parenthesize to do normal-domain operations first, do it!

- 1
- That is, true up-to underflow and floating point fuzziness. Which is, of course, the whole point of this module.

## Isomorphism to normal-domain

logFloat :: Double -> LogFloat Source

Constructor which does semantic conversion from normal-domain
to log-domain. Throws errors on negative and NaN inputs. If `p`

is non-negative, then following equivalence holds:

logFloat p == logToLogFloat (log p)

If `p`

is NaN or negative, then the two sides differ only in
which error is thrown.

fromLogFloat :: LogFloat -> Double Source

Semantically convert our log-domain value back into the normal-domain. Beware of overflow/underflow. The following equivalence holds (without qualification):

fromLogFloat == exp . logFromLogFloat

## Isomorphism to log-domain

logToLogFloat :: Double -> LogFloat Source

Constructor which assumes the argument is already in the
log-domain. Throws errors on `notANumber`

inputs.

logFromLogFloat :: LogFloat -> Double Source

Return the log-domain value itself without conversion.

## Additional operations

sum :: [LogFloat] -> LogFloat Source

*O(n)*. Compute the sum of a finite list of `LogFloat`

s, being
careful to avoid underflow issues. That is, the following
equivalence holds (modulo underflow and all that):

logFloat . sum == sum . map logFloat

*N.B.*, this function requires two passes over the input. Thus,
it is not amenable to list fusion, and hence will use a lot of
memory when summing long lists.

*Since: 0.13*

product :: [LogFloat] -> LogFloat Source

*O(n)*. Compute the product of a finite list of `LogFloat`

s,
being careful to avoid numerical error due to loss of precision.
That is, the following equivalence holds (modulo underflow and
all that):

logFloat . product == product . map logFloat

*Since: 0.13*

pow :: LogFloat -> Double -> LogFloat infixr 8 Source

*O(1)*. Compute powers in the log-domain; that is, the following
equivalence holds (modulo underflow and all that):

logFloat (p ** m) == logFloat p `pow` m

*Since: 0.13*

# Accurate versions of logarithm/exponentiation

log1p :: Double -> Double Source

Definition: `log1p == log . (1+)`

. Standard C libraries provide
a special definition for `log1p`

which is more accurate than
doing the naive thing, especially for very small arguments. For
example, the naive version underflows around `2 ** -53`

, whereas
the specialized version underflows around `2 ** -1074`

. This
function is used by (`+`

) and (`-`

) on `LogFloat`

.

N.B. The `statistics:Statistics.Math`

module provides a pure
Haskell implementation of `log1p`

for those who are interested.
We do not copy it here because it relies on the `vector`

package
which is non-portable. If there is sufficient interest, a portable
variant of that implementation could be made. Contact the
maintainer if the FFI and naive implementations are insufficient
for your needs.

*This installation was compiled to use the FFI version.*

expm1 :: Double -> Double Source

Definition: `expm1 == subtract 1 . exp`

. Standard C libraries
provide a special definition for `expm1`

which is more accurate
than doing the naive thing, especially for very small arguments.
This function isn't needed internally, but is provided for
symmetry with `log1p`

.

*This installation was compiled to use the FFI version.*