Portability GHC only experimental ekmett@gmail.com None

Description

Mixed-Mode Automatic Differentiation.

Each combinator exported from this module chooses an appropriate AD mode. The following basic operations are supported, modified as appropriate by the suffixes below:

• `grad` computes the gradient (partial derivatives) of a function at a point
• `jacobian` computes the Jacobian matrix of a function at a point
• `diff` computes the derivative of a function at a point
• `du` computes a directional derivative of a function at a point
• `hessian` compute the Hessian matrix (matrix of second partial derivatives) of a function at a point

The suffixes have the following meanings:

• `'` -- also return the answer
• `With` lets the user supply a function to blend the input with the output
• `F` is a version of the base function lifted to return a `Traversable` (or `Functor`) result
• `s` means the function returns all higher derivatives in a list or f-branching `Stream`
• `T` means the result is transposed with respect to the traditional formulation.
• `0` means that the resulting derivative list is padded with 0s at the end.

Synopsis

class (Num t, Num (Scalar t)) => Mode t whereSource

Methods

auto :: Scalar t -> tSource

Embed a constant

Instances

 Mode (ForwardDouble s) (Mode t, Mode (Scalar t)) => Mode (On t) Num a => Mode (Tower a s) Num a => Mode (Id a s) Num a => Mode (Sparse a s) (Num a, Reifies * s Tape) => Mode (Reverse a s) Num a => Mode (Kahn a s) Num a => Mode (Forward a s) (Num a, Traversable f) => Mode (Dense f a s)

type family Scalar t :: *Source

grad :: (Traversable f, Num a) => (forall s. Reifies s Tape => f (Reverse a s) -> Reverse a s) -> f a -> f aSource

The `grad` function calculates the gradient of a non-scalar-to-scalar function with reverse-mode AD in a single pass.

````>>> ````grad (\[x,y,z] -> x*y+z) [1,2,3]
```[2,1,1]
```

grad' :: (Traversable f, Num a) => (forall s. Reifies s Tape => f (Reverse a s) -> Reverse a s) -> f a -> (a, f a)Source

The `grad'` function calculates the result and gradient of a non-scalar-to-scalar function with reverse-mode AD in a single pass.

````>>> ````grad' (\[x,y,z] -> x*y+z) [1,2,3]
```(5,[2,1,1])
```

gradWith :: (Traversable f, Num a) => (a -> a -> b) -> (forall s. Reifies s Tape => f (Reverse a s) -> Reverse a s) -> f a -> f bSource

`grad g f` function calculates the gradient of a non-scalar-to-scalar function `f` with reverse-mode AD in a single pass. The gradient is combined element-wise with the argument using the function `g`.

``` `grad` == `gradWith` (_ dx -> dx)
`id` == `gradWith` `const`
```

gradWith' :: (Traversable f, Num a) => (a -> a -> b) -> (forall s. Reifies s Tape => f (Reverse a s) -> Reverse a s) -> f a -> (a, f b)Source

`grad' g f` calculates the result and gradient of a non-scalar-to-scalar function `f` with reverse-mode AD in a single pass the gradient is combined element-wise with the argument using the function `g`.

``` `grad'` == `gradWith'` (_ dx -> dx)
```

grads :: (Traversable f, Num a) => (forall s. f (Sparse a s) -> Sparse a s) -> f a -> Cofree f aSource

Unfortunately, variadicity comes at the expense of being able to use quantification to avoid sensitivity confusion, so be careful when counting the number of `auto` calls you use when taking the gradient of a function that takes gradients!

class Num a => Grad i o o' a | i -> a o o', o -> a i o', o' -> a i oSource

Instances

 Num a => Grad (Kahn a ()) [a] (a, [a]) a Grad i o o' a => Grad (Kahn a () -> i) (a -> o) (a -> o') a

class Num a => Grads i o a | i -> a o, o -> a iSource

Instances

 Grads i o a => Grads (Sparse a () -> i) (a -> o) a Num a => Grads (Sparse a ()) (Cofree [] a) a

# Jacobians (Sparse or Reverse)

jacobian :: (Traversable f, Functor g, Num a) => (forall s. Reifies s Tape => f (Reverse a s) -> g (Reverse a s)) -> f a -> g (f a)Source

Calculate the Jacobian of a non-scalar-to-non-scalar function, automatically choosing between sparse and Reverse mode AD.

If you know that you have relatively many outputs per input, consider using `jacobian`.

````>>> ````jacobian (\[x,y] -> [y,x,x+y,x*y,exp x * sin y]) [pi,1]
```[[0.0,1.0],[1.0,0.0],[1.0,1.0],[1.0,3.141592653589793],[19.472221418841606,12.502969588876512]]
```

jacobian' :: (Traversable f, Functor g, Num a) => (forall s. Reifies s Tape => f (Reverse a s) -> g (Reverse a s)) -> f a -> g (a, f a)Source

Calculate both the answer and Jacobian of a non-scalar-to-non-scalar function, using reverse-mode AD.

If you have relatively many outputs per input, consider using `jacobian'`.

jacobianWith :: (Traversable f, Functor g, Num a) => (a -> a -> b) -> (forall s. Reifies s Tape => f (Reverse a s) -> g (Reverse a s)) -> f a -> g (f b)Source

`jacobianWith g f` calculates the Jacobian of a non-scalar-to-non-scalar function, using Reverse mode AD.

The resulting Jacobian matrix is then recombined element-wise with the input using `g`.

If you know that you have relatively many outputs per input, consider using `jacobianWith`.

jacobianWith' :: (Traversable f, Functor g, Num a) => (a -> a -> b) -> (forall s. Reifies s Tape => f (Reverse a s) -> g (Reverse a s)) -> f a -> g (a, f b)Source

`jacobianWith' g f` calculates the answer and Jacobian of a non-scalar-to-non-scalar function, using Reverse mode AD.

The resulting Jacobian matrix is then recombined element-wise with the input using `g`.

If you know that you have relatively many outputs per input, consider using `jacobianWith'`.

# Higher Order Jacobian (Sparse-on-Reverse)

jacobians :: (Traversable f, Functor g, Num a) => (forall s. f (Sparse a s) -> g (Sparse a s)) -> f a -> g (Cofree f a)Source

# Transposed Jacobians (Forward Mode)

jacobianT :: (Traversable f, Functor g, Num a) => (forall s. f (Forward a s) -> g (Forward a s)) -> f a -> f (g a)Source

A fast, simple, transposed Jacobian computed with forward-mode AD.

jacobianWithT :: (Traversable f, Functor g, Num a) => (a -> a -> b) -> (forall s. f (Forward a s) -> g (Forward a s)) -> f a -> f (g b)Source

A fast, simple, transposed Jacobian computed with `Forward` mode `AD` that combines the output with the input.

# Hessian (Sparse-On-Reverse)

hessian :: (Traversable f, Num a) => (forall s s'. Reifies s Tape => f (On (Reverse (Sparse a s') s)) -> On (Reverse (Sparse a s') s)) -> f a -> f (f a)Source

Compute the Hessian via the Jacobian of the gradient. gradient is computed in reverse mode and then the Jacobian is computed in sparse (forward) mode.

````>>> ````hessian (\[x,y] -> x*y) [1,2]
```[[0,1],[1,0]]
```

hessian' :: (Traversable f, Num a) => (forall s. f (Sparse a s) -> Sparse a s) -> f a -> (a, f (a, f a))Source

# Hessian Tensors (Sparse or Sparse-On-Reverse)

hessianF :: (Traversable f, Functor g, Num a) => (forall s s'. Reifies s Tape => f (On (Reverse (Sparse a s') s)) -> g (On (Reverse (Sparse a s') s))) -> f a -> g (f (f a))Source

Compute the order 3 Hessian tensor on a non-scalar-to-non-scalar function using 'Sparse'-on-'Reverse'

````>>> ````hessianF (\[x,y] -> [x*y,x+y,exp x*cos y]) [1,2]
```[[[0.0,1.0],[1.0,0.0]],[[0.0,0.0],[0.0,0.0]],[[-1.1312043837568135,-2.4717266720048188],[-2.4717266720048188,1.1312043837568135]]]
```

# Hessian Tensors (Sparse)

hessianF' :: (Traversable f, Functor g, Num a) => (forall s. f (Sparse a s) -> g (Sparse a s)) -> f a -> g (a, f (a, f a))Source

# Hessian Vector Products (Forward-On-Reverse)

hessianProduct :: (Traversable f, Num a) => (forall s s'. Reifies s Tape => f (On (Reverse (Forward a s') s)) -> On (Reverse (Forward a s') s)) -> f (a, a) -> f aSource

`hessianProduct f wv` computes the product of the hessian `H` of a non-scalar-to-scalar function `f` at `w = fst \$ wv` with a vector `v = snd \$ wv` using "Pearlmutter's method" from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.6143, which states:

``` H v = (d/dr) grad_w (w + r v) | r = 0
```

Or in other words, we take the directional derivative of the gradient. The gradient is calculated in reverse mode, then the directional derivative is calculated in forward mode.

hessianProduct' :: (Traversable f, Num a) => (forall s s'. Reifies s Tape => f (On (Reverse (Forward a s') s)) -> On (Reverse (Forward a s') s)) -> f (a, a) -> f (a, a)Source

`hessianProduct' f wv` computes both the gradient of a non-scalar-to-scalar `f` at `w = fst \$ wv` and the product of the hessian `H` at `w` with a vector `v = snd \$ wv` using "Pearlmutter's method". The outputs are returned wrapped in the same functor.

``` H v = (d/dr) grad_w (w + r v) | r = 0
```

Or in other words, we return the gradient and the directional derivative of the gradient. The gradient is calculated in reverse mode, then the directional derivative is calculated in forward mode.

# Derivatives (Forward Mode)

diff :: Num a => (forall s. Forward a s -> Forward a s) -> a -> aSource

The `diff` function calculates the first derivative of a scalar-to-scalar function by forward-mode `AD`

````>>> ````diff sin 0
```1.0
```

diffF :: (Functor f, Num a) => (forall s. Forward a s -> f (Forward a s)) -> a -> f aSource

The `diffF` function calculates the first derivatives of scalar-to-nonscalar function by `Forward` mode `AD`

````>>> ````diffF (\a -> [sin a, cos a]) 0
```[1.0,-0.0]
```

diff' :: Num a => (forall s. Forward a s -> Forward a s) -> a -> (a, a)Source

The `diff'` function calculates the result and first derivative of scalar-to-scalar function by `Forward` mode `AD`

``` `diff'` `sin` == `sin` `&&&` `cos`
`diff'` f = f `&&&` d f
```
````>>> ````diff' sin 0
```(0.0,1.0)
```
````>>> ````diff' exp 0
```(1.0,1.0)
```

diffF' :: (Functor f, Num a) => (forall s. Forward a s -> f (Forward a s)) -> a -> f (a, a)Source

The `diffF'` function calculates the result and first derivatives of a scalar-to-non-scalar function by `Forward` mode `AD`

````>>> ````diffF' (\a -> [sin a, cos a]) 0
```[(0.0,1.0),(1.0,-0.0)]
```

# Derivatives (Tower)

diffs :: Num a => (forall s. Tower a s -> Tower a s) -> a -> [a]Source

diffsF :: (Functor f, Num a) => (forall s. Tower a s -> f (Tower a s)) -> a -> f [a]Source

diffs0 :: Num a => (forall s. Tower a s -> Tower a s) -> a -> [a]Source

diffs0F :: (Functor f, Num a) => (forall s. Tower a s -> f (Tower a s)) -> a -> f [a]Source

# Directional Derivatives (Forward Mode)

du :: (Functor f, Num a) => (forall s. f (Forward a s) -> Forward a s) -> f (a, a) -> aSource

Compute the directional derivative of a function given a zipped up `Functor` of the input values and their derivatives

du' :: (Functor f, Num a) => (forall s. f (Forward a s) -> Forward a s) -> f (a, a) -> (a, a)Source

Compute the answer and directional derivative of a function given a zipped up `Functor` of the input values and their derivatives

duF :: (Functor f, Functor g, Num a) => (forall s. f (Forward a s) -> g (Forward a s)) -> f (a, a) -> g aSource

Compute a vector of directional derivatives for a function given a zipped up `Functor` of the input values and their derivatives.

duF' :: (Functor f, Functor g, Num a) => (forall s. f (Forward a s) -> g (Forward a s)) -> f (a, a) -> g (a, a)Source

Compute a vector of answers and directional derivatives for a function given a zipped up `Functor` of the input values and their derivatives.

# Directional Derivatives (Tower)

dus :: (Functor f, Num a) => (forall s. f (Tower a s) -> Tower a s) -> f [a] -> [a]Source

dus0 :: (Functor f, Num a) => (forall s. f (Tower a s) -> Tower a s) -> f [a] -> [a]Source

dusF :: (Functor f, Functor g, Num a) => (forall s. f (Tower a s) -> g (Tower a s)) -> f [a] -> g [a]Source

dus0F :: (Functor f, Functor g, Num a) => (forall s. f (Tower a s) -> g (Tower a s)) -> f [a] -> g [a]Source

# Taylor Series (Tower)

taylor :: Fractional a => (forall s. Tower a s -> Tower a s) -> a -> a -> [a]Source

taylor0 :: Fractional a => (forall s. Tower a s -> Tower a s) -> a -> a -> [a]Source

# Maclaurin Series (Tower)

maclaurin :: Fractional a => (forall s. Tower a s -> Tower a s) -> a -> [a]Source

maclaurin0 :: Fractional a => (forall s. Tower a s -> Tower a s) -> a -> [a]Source

gradientDescent :: (Traversable f, Fractional a, Ord a) => (forall s. Reifies s Tape => f (Reverse a s) -> Reverse a s) -> f a -> [f a]Source

The `gradientDescent` function performs a multivariate optimization, based on the naive-gradient-descent in the file `stalingrad/examples/flow-tests/pre-saddle-1a.vlad` from the VLAD compiler Stalingrad sources. Its output is a stream of increasingly accurate results. (Modulo the usual caveats.)

It uses reverse mode automatic differentiation to compute the gradient.

gradientAscent :: (Traversable f, Fractional a, Ord a) => (forall s. Reifies s Tape => f (Reverse a s) -> Reverse a s) -> f a -> [f a]Source

Perform a gradient descent using reverse mode automatic differentiation to compute the gradient.

conjugateGradientDescent :: (Traversable f, Ord a, Fractional a) => (forall t. (Mode t, a ~ Scalar t, Num t) => f t -> t) -> f a -> [f a]Source

Perform a conjugate gradient descent using reverse mode automatic differentiation to compute the gradient, and using forward-on-forward mode for computing extrema.

````>>> ````let sq x = x * x
````>>> ````let rosenbrock [x,y] = sq (1 - x) + 100 * sq (y - sq x)
````>>> ````rosenbrock [0,0]
```1
`>>> ````rosenbrock (conjugateGradientDescent rosenbrock [0, 0] !! 5) < 0.1
```True
```

conjugateGradientAscent :: (Traversable f, Ord a, Fractional a) => (forall t. (Mode t, a ~ Scalar t, Num t) => f t -> t) -> f a -> [f a]Source

Perform a conjugate gradient ascent using reverse mode automatic differentiation to compute the gradient.