Portability: GHC only
Stability: experimental
Maintainer: ekmett@gmail.com
Safe Haskell: None

Description

Mixed-Mode Automatic Differentiation.

Each combinator exported from this module chooses an appropriate AD mode. The following basic operations are supported, modified as appropriate by the suffixes below:

• `grad` computes the gradient (partial derivatives) of a function at a point
• `jacobian` computes the Jacobian matrix of a function at a point
• `diff` computes the derivative of a function at a point
• `du` computes a directional derivative of a function at a point
• `hessian` computes the Hessian matrix (matrix of second partial derivatives) of a function at a point

The suffixes have the following meanings:

• `'` also returns the answer
• `With` lets the user supply a function to blend the input with the output
• `F` is a version of the base function lifted to return a `Traversable` (or `Functor`) result
• `s` means the function returns all higher derivatives in a list or f-branching `Stream`
• `T` means the result is transposed with respect to the traditional formulation
• `0` means that the resulting derivative list is padded with 0s at the end

Synopsis

class (Num t, Num (Scalar t)) => Mode t where

Methods

auto :: Scalar t -> t

Embed a constant

Instances

• Mode (ForwardDouble s)
• (Mode t, Mode (Scalar t)) => Mode (On t)
• Num a => Mode (Tower a s)
• Num a => Mode (Id a s)
• Num a => Mode (Sparse a s)
• (Num a, Reifies * s Tape) => Mode (Reverse a s)
• Num a => Mode (Kahn a s)
• Num a => Mode (Forward a s)
• (Num a, Traversable f) => Mode (Dense f a s)

type family Scalar t :: *

grad :: (Traversable f, Num a) => (forall s. Reifies s Tape => f (Reverse a s) -> Reverse a s) -> f a -> f a

The `grad` function calculates the gradient of a non-scalar-to-scalar function with reverse-mode AD in a single pass.

```
>>> grad (\[x,y,z] -> x*y+z) [1,2,3]
[2,1,1]
```

grad' :: (Traversable f, Num a) => (forall s. Reifies s Tape => f (Reverse a s) -> Reverse a s) -> f a -> (a, f a)

The `grad'` function calculates the result and gradient of a non-scalar-to-scalar function with reverse-mode AD in a single pass.

```
>>> grad' (\[x,y,z] -> x*y+z) [1,2,3]
(5,[2,1,1])
```

gradWith :: (Traversable f, Num a) => (a -> a -> b) -> (forall s. Reifies s Tape => f (Reverse a s) -> Reverse a s) -> f a -> f b

`gradWith g f` calculates the gradient of a non-scalar-to-scalar function `f` with reverse-mode AD in a single pass. The gradient is combined element-wise with the argument using the function `g`.

``` grad == gradWith (\_ dx -> dx)
id == gradWith const
```
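A concrete illustration of these identities (a sketch, assuming the `ad` package's `Numeric.AD` module is available):

```haskell
import Numeric.AD (grad, gradWith)

main :: IO ()
main = do
  -- Pair each input with its partial derivative via g = (,)
  print (gradWith (,) (\[x, y] -> x * y) [1, 2 :: Double])
  -- gradWith (\_ dx -> dx) discards the input and recovers plain grad
  print (gradWith (\_ dx -> dx) (\[x, y] -> x * y) [1, 2 :: Double]
           == grad (\[x, y] -> x * y) [1, 2 :: Double])
```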

gradWith' :: (Traversable f, Num a) => (a -> a -> b) -> (forall s. Reifies s Tape => f (Reverse a s) -> Reverse a s) -> f a -> (a, f b)

`gradWith' g f` calculates the result and gradient of a non-scalar-to-scalar function `f` with reverse-mode AD in a single pass. The gradient is combined element-wise with the argument using the function `g`.

``` grad' == gradWith' (\_ dx -> dx)

grads :: (Traversable f, Num a) => (forall s. f (Sparse a s) -> Sparse a s) -> f a -> Cofree f a

Unfortunately, variadicity comes at the expense of being able to use quantification to avoid sensitivity confusion, so be careful when counting the number of `auto` calls you use when taking the gradient of a function that takes gradients!

class Num a => Grad i o o' a | i -> a o o', o -> a i o', o' -> a i o

Instances

• Num a => Grad (Kahn a ()) [a] (a, [a]) a
• Grad i o o' a => Grad (Kahn a () -> i) (a -> o) (a -> o') a

class Num a => Grads i o a | i -> a o, o -> a i

Instances

• Grads i o a => Grads (Sparse a () -> i) (a -> o) a
• Num a => Grads (Sparse a ()) (Cofree [] a) a

Jacobians (Sparse or Reverse)

jacobian :: (Traversable f, Functor g, Num a) => (forall s. Reifies s Tape => f (Reverse a s) -> g (Reverse a s)) -> f a -> g (f a)

Calculate the Jacobian of a non-scalar-to-non-scalar function using reverse-mode AD.

Reverse mode performs work proportional to the number of outputs, so this is most efficient when there are relatively few outputs per input; with many outputs per input, a forward-mode variant such as `jacobianT` may be cheaper.

```
>>> jacobian (\[x,y] -> [y,x,x+y,x*y,exp x * sin y]) [pi,1]
[[0.0,1.0],[1.0,0.0],[1.0,1.0],[1.0,3.141592653589793],[19.472221418841606,12.502969588876512]]
```

jacobian' :: (Traversable f, Functor g, Num a) => (forall s. Reifies s Tape => f (Reverse a s) -> g (Reverse a s)) -> f a -> g (a, f a)

Calculate both the answer and Jacobian of a non-scalar-to-non-scalar function, using reverse-mode AD.

Reverse mode performs work proportional to the number of outputs, so this is most efficient when there are relatively few outputs per input.

jacobianWith :: (Traversable f, Functor g, Num a) => (a -> a -> b) -> (forall s. Reifies s Tape => f (Reverse a s) -> g (Reverse a s)) -> f a -> g (f b)

`jacobianWith g f` calculates the Jacobian of a non-scalar-to-non-scalar function, using reverse-mode AD.

The resulting Jacobian matrix is then recombined element-wise with the input using `g`.

Reverse mode performs work proportional to the number of outputs, so this is most efficient when there are relatively few outputs per input.

jacobianWith' :: (Traversable f, Functor g, Num a) => (a -> a -> b) -> (forall s. Reifies s Tape => f (Reverse a s) -> g (Reverse a s)) -> f a -> g (a, f b)

`jacobianWith' g f` calculates the answer and Jacobian of a non-scalar-to-non-scalar function, using reverse-mode AD.

The resulting Jacobian matrix is then recombined element-wise with the input using `g`.

Reverse mode performs work proportional to the number of outputs, so this is most efficient when there are relatively few outputs per input.

Higher Order Jacobian (Sparse-on-Reverse)

jacobians :: (Traversable f, Functor g, Num a) => (forall s. f (Sparse a s) -> g (Sparse a s)) -> f a -> g (Cofree f a)

Transposed Jacobians (Forward Mode)

jacobianT :: (Traversable f, Functor g, Num a) => (forall s. f (Forward a s) -> g (Forward a s)) -> f a -> f (g a)

A fast, simple, transposed Jacobian computed with forward-mode AD.
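For instance (a sketch, assuming the `ad` package), the layout is one row per input rather than per output:

```haskell
import Numeric.AD (jacobianT)

main :: IO ()
main =
  -- Outputs are [y, x, x*y]; at [2,3] the traditional Jacobian is
  -- [[0,1],[1,0],[3,2]], so the transposed form is [[0,1,3],[1,0,2]].
  print (jacobianT (\[x, y] -> [y, x, x * y]) [2, 3 :: Double])
```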

jacobianWithT :: (Traversable f, Functor g, Num a) => (a -> a -> b) -> (forall s. f (Forward a s) -> g (Forward a s)) -> f a -> f (g b)

A fast, simple, transposed Jacobian computed with forward-mode AD that combines the output with the input.

Hessian (Sparse-On-Reverse)

hessian :: (Traversable f, Num a) => (forall s s'. Reifies s Tape => f (On (Reverse (Sparse a s') s)) -> On (Reverse (Sparse a s') s)) -> f a -> f (f a)

Compute the Hessian via the Jacobian of the gradient: the gradient is computed in reverse mode, and then the Jacobian is computed in sparse (forward) mode.

```
>>> hessian (\[x,y] -> x*y) [1,2]
[[0,1],[1,0]]
```

hessian' :: (Traversable f, Num a) => (forall s. f (Sparse a s) -> Sparse a s) -> f a -> (a, f (a, f a))

Hessian Tensors (Sparse or Sparse-On-Reverse)

hessianF :: (Traversable f, Functor g, Num a) => (forall s s'. Reifies s Tape => f (On (Reverse (Sparse a s') s)) -> g (On (Reverse (Sparse a s') s))) -> f a -> g (f (f a))

Compute the order-3 Hessian tensor of a non-scalar-to-non-scalar function using `Sparse`-on-`Reverse` AD.

```
>>> hessianF (\[x,y] -> [x*y,x+y,exp x*cos y]) [1,2]
[[[0.0,1.0],[1.0,0.0]],[[0.0,0.0],[0.0,0.0]],[[-1.1312043837568135,-2.4717266720048188],[-2.4717266720048188,1.1312043837568135]]]
```

Hessian Tensors (Sparse)

hessianF' :: (Traversable f, Functor g, Num a) => (forall s. f (Sparse a s) -> g (Sparse a s)) -> f a -> g (a, f (a, f a))

Hessian Vector Products (Forward-On-Reverse)

hessianProduct :: (Traversable f, Num a) => (forall s s'. Reifies s Tape => f (On (Reverse (Forward a s') s)) -> On (Reverse (Forward a s') s)) -> f (a, a) -> f a

`hessianProduct f wv` computes the product of the Hessian `H` of a non-scalar-to-scalar function `f` at `w = fst $ wv` with a vector `v = snd $ wv` using Pearlmutter's method (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.6143), which states:

``` H v = (d/dr) grad_w (w + r v) | r = 0
```

Or in other words, we take the directional derivative of the gradient. The gradient is calculated in reverse mode, then the directional derivative is calculated in forward mode.
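A small worked example (a sketch, assuming the `ad` package): for `\[x,y] -> x*y` the Hessian is `[[0,1],[1,0]]`, so with `w = [1,2]` and `v = [1,0]` the product `H v` is `[0,1]`.

```haskell
import Numeric.AD (hessianProduct)

main :: IO ()
main =
  -- The input functor pairs each coordinate of w with the matching
  -- coordinate of v: [(w1,v1),(w2,v2)].
  print (hessianProduct (\[x, y] -> x * y) [(1, 1), (2, 0 :: Double)])
```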

hessianProduct' :: (Traversable f, Num a) => (forall s s'. Reifies s Tape => f (On (Reverse (Forward a s') s)) -> On (Reverse (Forward a s') s)) -> f (a, a) -> f (a, a)

`hessianProduct' f wv` computes both the gradient of a non-scalar-to-scalar function `f` at `w = fst $ wv` and the product of the Hessian `H` at `w` with a vector `v = snd $ wv` using Pearlmutter's method. The outputs are returned wrapped in the same functor.

``` H v = (d/dr) grad_w (w + r v) | r = 0
```

Or in other words, we return the gradient and the directional derivative of the gradient. The gradient is calculated in reverse mode, then the directional derivative is calculated in forward mode.

Derivatives (Forward Mode)

diff :: Num a => (forall s. Forward a s -> Forward a s) -> a -> a

The `diff` function calculates the first derivative of a scalar-to-scalar function by forward-mode AD.

```
>>> diff sin 0
1.0
```

diffF :: (Functor f, Num a) => (forall s. Forward a s -> f (Forward a s)) -> a -> f a

The `diffF` function calculates the first derivatives of a scalar-to-non-scalar function by forward-mode AD.

```
>>> diffF (\a -> [sin a, cos a]) 0
[1.0,-0.0]
```

diff' :: Num a => (forall s. Forward a s -> Forward a s) -> a -> (a, a)

The `diff'` function calculates the result and first derivative of a scalar-to-scalar function by forward-mode AD.

``` diff' sin == sin &&& cos
diff' f == f &&& diff f
```
```
>>> diff' sin 0
(0.0,1.0)
>>> diff' exp 0
(1.0,1.0)
```

diffF' :: (Functor f, Num a) => (forall s. Forward a s -> f (Forward a s)) -> a -> f (a, a)

The `diffF'` function calculates the result and first derivatives of a scalar-to-non-scalar function by forward-mode AD.

```
>>> diffF' (\a -> [sin a, cos a]) 0
[(0.0,1.0),(1.0,-0.0)]
```

Derivatives (Tower)

diffs :: Num a => (forall s. Tower a s -> Tower a s) -> a -> [a]
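No prose accompanies `diffs` here; on my reading of the Tower-based API, `diffs f x` returns `f x` followed by each successive derivative at `x`. A sketch, assuming the `ad` package:

```haskell
import Numeric.AD (diffs)

main :: IO ()
main = do
  -- x^3 at 2: value 8, then derivatives 12, 12, 6; the list ends once
  -- all higher derivatives vanish.
  print (diffs (\x -> x ^ 3) (2 :: Double))
  -- For non-polynomial functions the list is infinite, so truncate it:
  print (take 4 (diffs sin (0 :: Double)))
```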

diffsF :: (Functor f, Num a) => (forall s. Tower a s -> f (Tower a s)) -> a -> f [a]

diffs0 :: Num a => (forall s. Tower a s -> Tower a s) -> a -> [a]

diffs0F :: (Functor f, Num a) => (forall s. Tower a s -> f (Tower a s)) -> a -> f [a]

Directional Derivatives (Forward Mode)

du :: (Functor f, Num a) => (forall s. f (Forward a s) -> Forward a s) -> f (a, a) -> a

Compute the directional derivative of a function given a zipped up `Functor` of the input values and their derivatives.
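For example (a sketch, assuming the `ad` package): the gradient of `\[x,y] -> x*y` at `[1,2]` is `[2,1]`, so the derivative in direction `[1,0]` is `2`.

```haskell
import Numeric.AD (du)

main :: IO ()
main =
  -- Each element of the input functor is (value, direction component).
  print (du (\[x, y] -> x * y) [(1, 1), (2, 0 :: Double)])
```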

du' :: (Functor f, Num a) => (forall s. f (Forward a s) -> Forward a s) -> f (a, a) -> (a, a)

Compute the answer and directional derivative of a function given a zipped up `Functor` of the input values and their derivatives.

duF :: (Functor f, Functor g, Num a) => (forall s. f (Forward a s) -> g (Forward a s)) -> f (a, a) -> g a

Compute a vector of directional derivatives for a function given a zipped up `Functor` of the input values and their derivatives.

duF' :: (Functor f, Functor g, Num a) => (forall s. f (Forward a s) -> g (Forward a s)) -> f (a, a) -> g (a, a)

Compute a vector of answers and directional derivatives for a function given a zipped up `Functor` of the input values and their derivatives.

Directional Derivatives (Tower)

dus :: (Functor f, Num a) => (forall s. f (Tower a s) -> Tower a s) -> f [a] -> [a]

dus0 :: (Functor f, Num a) => (forall s. f (Tower a s) -> Tower a s) -> f [a] -> [a]

dusF :: (Functor f, Functor g, Num a) => (forall s. f (Tower a s) -> g (Tower a s)) -> f [a] -> g [a]

dus0F :: (Functor f, Functor g, Num a) => (forall s. f (Tower a s) -> g (Tower a s)) -> f [a] -> g [a]

Taylor Series (Tower)

taylor :: Fractional a => (forall s. Tower a s -> Tower a s) -> a -> a -> [a]
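On my reading, `taylor f x dx` yields the Taylor terms `f x`, `f' x * dx`, `f'' x * dx^2 / 2!`, ..., whose sum converges to `f (x + dx)` when the series does. A sketch, assuming the `ad` package:

```haskell
import Numeric.AD (taylor)

main :: IO ()
main =
  -- Summing enough Taylor terms of exp around 0 with dx = 1
  -- should approximate exp 1.
  print (abs (sum (take 20 (taylor exp 0 (1 :: Double))) - exp 1) < 1e-9)
```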

taylor0 :: Fractional a => (forall s. Tower a s -> Tower a s) -> a -> a -> [a]

Maclaurin Series (Tower)

maclaurin :: Fractional a => (forall s. Tower a s -> Tower a s) -> a -> [a]

maclaurin0 :: Fractional a => (forall s. Tower a s -> Tower a s) -> a -> [a]

gradientDescent :: (Traversable f, Fractional a, Ord a) => (forall s. Reifies s Tape => f (Reverse a s) -> Reverse a s) -> f a -> [f a]

The `gradientDescent` function performs a multivariate optimization, based on the naive-gradient-descent in the file `stalingrad/examples/flow-tests/pre-saddle-1a.vlad` from the VLAD compiler Stalingrad sources. Its output is a stream of increasingly accurate results. (Modulo the usual caveats.)

It uses reverse mode automatic differentiation to compute the gradient.
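A minimal sketch (assuming the `ad` package), minimizing a simple quadratic bowl; the iteration count and convergence threshold here are arbitrary:

```haskell
import Numeric.AD (gradientDescent)

main :: IO ()
main = do
  let steps = gradientDescent (\[x, y] -> (x - 1) ^ 2 + (y - 2) ^ 2)
                              [0, 0 :: Double]
      -- use take/last, not (!!): the stream may stop once progress stalls
      [x, y] = last (take 100 steps)
  print (abs (x - 1) < 0.5 && abs (y - 2) < 0.5)
```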

gradientAscent :: (Traversable f, Fractional a, Ord a) => (forall s. Reifies s Tape => f (Reverse a s) -> Reverse a s) -> f a -> [f a]

Perform a gradient ascent using reverse-mode automatic differentiation to compute the gradient.

conjugateGradientDescent :: (Traversable f, Ord a, Fractional a) => (forall t. (Mode t, a ~ Scalar t, Num t) => f t -> t) -> f a -> [f a]

Perform a conjugate gradient descent using reverse mode automatic differentiation to compute the gradient, and using forward-on-forward mode for computing extrema.

```
>>> let sq x = x * x
>>> let rosenbrock [x,y] = sq (1 - x) + 100 * sq (y - sq x)
>>> rosenbrock [0,0]
1
>>> rosenbrock (conjugateGradientDescent rosenbrock [0, 0] !! 5) < 0.1
True
```

conjugateGradientAscent :: (Traversable f, Ord a, Fractional a) => (forall t. (Mode t, a ~ Scalar t, Num t) => f t -> t) -> f a -> [f a]

Perform a conjugate gradient ascent using reverse mode automatic differentiation to compute the gradient.