Copyright | (c) Justin Le 2018 |
---|---|
License | BSD3 |
Maintainer | justin@jle.im |
Stability | experimental |
Portability | non-portable |
Safe Haskell | None |
Language | Haskell2010 |
Provides the Op
type and combinators, which represent differentiable
functions/operations on values, and are used internally by the library
to perform back-propagation.
Users of the library can ignore this module for the most part. Library
authors defining backpropagatable primitives for their functions are
recommend to simply use op0
, op1
, op2
, op3
, which are
re-exported in Numeric.Backprop. However, authors who want more
options in defining their primtive functions might find some of these
functions useful.
Note that if your entire function is a single non-branching composition
of functions, Op
and its utility functions alone are sufficient to
differentiate/backprop. However, this happens rarely in practice.
To use these Op
s with the backprop library, they can be made to work
with BVar
s using liftOp
, liftOp1
, liftOp2
, and liftOp3
.
If you are writing a library, see
https://backprop.jle.im/06-equipping-your-library.html for a guide for
equipping your library with backpropatable operations using Op
s.
See also this guide for writing Ops manually on your own numerical functions.
- newtype Op as a = Op {}
- data Prod k (f :: k -> *) (a :: [k]) :: forall k. (k -> *) -> [k] -> * where
- type Tuple = Prod * I
- newtype I a :: * -> * = I {
- getI :: a
- runOp :: Num a => Op as a -> Tuple as -> (a, Tuple as)
- evalOp :: Op as a -> Tuple as -> a
- gradOp :: Num a => Op as a -> Tuple as -> Tuple as
- gradOpWith :: Op as a -> Tuple as -> a -> Tuple as
- op0 :: a -> Op '[] a
- opConst :: (Every Num as, Known Length as) => a -> Op as a
- idOp :: Op '[a] a
- opConst' :: Every Num as => Length as -> a -> Op as a
- opLens :: Num a => Lens' a b -> Op '[a] b
- op1 :: (a -> (b, b -> a)) -> Op '[a] b
- op2 :: (a -> b -> (c, c -> (a, b))) -> Op '[a, b] c
- op3 :: (a -> b -> c -> (d, d -> (a, b, c))) -> Op '[a, b, c] d
- opCoerce :: Coercible a b => Op '[a] b
- opTup :: Op as (Tuple as)
- opIso :: (a -> b) -> (b -> a) -> Op '[a] b
- opIso2 :: (a -> b -> c) -> (c -> (a, b)) -> Op '[a, b] c
- opIso3 :: (a -> b -> c -> d) -> (d -> (a, b, c)) -> Op '[a, b, c] d
- opIsoN :: (Tuple as -> b) -> (b -> Tuple as) -> Op as b
- noGrad1 :: (a -> b) -> Op '[a] b
- noGrad :: (Tuple as -> b) -> Op as b
- composeOp :: (Every Num as, Known Length as) => Prod (Op as) bs -> Op bs c -> Op as c
- composeOp1 :: (Every Num as, Known Length as) => Op as b -> Op '[b] c -> Op as c
- (~.) :: (Known Length as, Every Num as) => Op '[b] c -> Op as b -> Op as c
- composeOp' :: Every Num as => Length as -> Prod (Op as) bs -> Op bs c -> Op as c
- composeOp1' :: Every Num as => Length as -> Op as b -> Op '[b] c -> Op as c
- pattern (:>) :: forall k (f :: k -> *) (a :: k) (b :: k). f a -> f b -> Prod k f ((:) k a ((:) k b ([] k)))
- only :: f a -> Prod k f ((:) k a ([] k))
- head' :: Prod k f ((:<) k a as) -> f a
- pattern (::<) :: forall a (as :: [*]). a -> Tuple as -> Tuple ((:<) * a as)
- only_ :: a -> Tuple ((:) * a ([] *))
- (+.) :: Num a => Op '[a, a] a
- (-.) :: Num a => Op '[a, a] a
- (*.) :: Num a => Op '[a, a] a
- negateOp :: Num a => Op '[a] a
- absOp :: Num a => Op '[a] a
- signumOp :: Num a => Op '[a] a
- (/.) :: Fractional a => Op '[a, a] a
- recipOp :: Fractional a => Op '[a] a
- expOp :: Floating a => Op '[a] a
- logOp :: Floating a => Op '[a] a
- sqrtOp :: Floating a => Op '[a] a
- (**.) :: Floating a => Op '[a, a] a
- logBaseOp :: Floating a => Op '[a, a] a
- sinOp :: Floating a => Op '[a] a
- cosOp :: Floating a => Op '[a] a
- tanOp :: Floating a => Op '[a] a
- asinOp :: Floating a => Op '[a] a
- acosOp :: Floating a => Op '[a] a
- atanOp :: Floating a => Op '[a] a
- sinhOp :: Floating a => Op '[a] a
- coshOp :: Floating a => Op '[a] a
- tanhOp :: Floating a => Op '[a] a
- asinhOp :: Floating a => Op '[a] a
- acoshOp :: Floating a => Op '[a] a
- atanhOp :: Floating a => Op '[a] a
Implementation
Op
s contain information on a function as well as its gradient, but
provides that information in a way that allows them to be "chained".
For example, for a function
\[ f : \mathbb{R}^n \rightarrow \mathbb{R} \]
We might want to apply a function \(g\) to the result we get, to get our "final" result:
\[ \eqalign{ y &= f(\mathbf{x})\cr z &= g(y) } \]
Now, we might want the gradient \(\nabla z\) with respect to \(\mathbf{x}\), or \(\nabla_\mathbf{x} z\). Explicitly, this is:
\[ \nabla_\mathbf{x} z = \left< \frac{\partial z}{\partial x_1}, \frac{\partial z}{\partial x_2}, \ldots \right> \]
We can compute that by multiplying the total derivative of \(z\) with respect to \(y\) (that is, \(\frac{dz}{dy}\)) with the gradient of \(f\)) itself:
\[ \eqalign{ \nabla_\mathbf{x} z &= \frac{dz}{dy} \left< \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \ldots \right>\cr \nabla_\mathbf{x} z &= \frac{dz}{dy} \nabla_\mathbf{x} y } \]
So, to create an
with the Op
as aOp
constructor, you give
a function that returns a tuple, containing:
- An
a
: The result of the function - An
a -> Tuple as
: A function that, when given \(\frac{dz}{dy}\), returns the total gradient \(\nabla_z \mathbf{x}\).
This is done so that Op
s can easily be "chained" together, one after
the other. If you have an Op
for \(f\) and an Op
for \(g\), you can
compute the gradient of \(f\) knowing that the result target is
\(g \circ f\).
See this guide for a detailed look on writing ops manually on your own numerical functions.
Note that end users should probably never be required to construct an
Op
explicitly this way. Instead, libraries should provide
carefuly pre-constructed ones, or provide ways to generate them
automatically (like op1
, op2
, and op3
here).
For examples of Op
s implemented from scratch, see the implementations
of +.
, -.
, recipOp
, sinOp
, etc.
See Numeric.Backprop.Op for a mini-tutorial on using Prod
and
Tuple
.
An
describes a differentiable function from Op
as aas
to a
.
For example, a value of type
Op
'[Int, Bool] Double
is a function from an Int
and a Bool
, returning a Double
. It can
be differentiated to give a gradient of an Int
and a Bool
if given
a total derivative for the Double
. If we call Bool
\(2\), then,
mathematically, it is akin to a:
\[ f : \mathbb{Z} \times 2 \rightarrow \mathbb{R} \]
See runOp
, gradOp
, and gradOpWith
for examples on how to run it,
and Op
for instructions on creating it.
It is simpler to not use this type constructor directly, and instead use
the op2
, op1
, op2
, and op3
helper smart constructors.
See Numeric.Backprop.Op for a mini-tutorial on using Prod
and
Tuple
.
To use an Op
with the backprop library, see liftOp
, liftOp1
,
liftOp2
, and liftOp3
.
Op | Construct an See the module documentation for Numeric.Backprop.Op for more
details on the function that this constructor and |
|
(Known [*] (Length *) as, Every * Floating as, Every * Fractional as, Every * Num as, Floating a) => Floating (Op as a) Source # | |
(Known [*] (Length *) as, Every * Fractional as, Every * Num as, Fractional a) => Fractional (Op as a) Source # | |
(Known [*] (Length *) as, Every * Num as, Num a) => Num (Op as a) Source # | |
Tuple Types
Prod
, from the <http://hackage.haskell.org/package/type-combinators
type-combinators> library (in Data.Type.Product) is a heterogeneous
list/tuple type, which allows you to tuple together multiple values of
different types and operate on them generically.
A
contains an Prod
f '[a, b, c]f a
, an f b
, and an f c
, and
is constructed by consing them together with :<
(using 'Ø' as nil):
I
"hello":<
I True :< I 7.8 :< Ø ::Prod
I
'[String, Bool, Double]C
"hello" :< C "world" :< C "ok" :< Ø ::Prod
(C
String) '[a, b, c]Proxy
:< Proxy :< Proxy :< Ø ::Prod
Proxy
'[a, b, c]
(I
is the identity functor, and C
is the constant functor)
So, in general:
x :: f a y :: f b z :: f c x :< y :< z :< Ø :: Prod f '[a, b, c]
If you're having problems typing 'Ø', you can use only
:
only z :: Prod f '[c] x :< y :< only z :: Prod f '[a, b, c]
Tuple
is provided as a convenient type synonym for Prod
I
, and has
a convenient pattern synonym ::<
(and only_
), which can also be used
for pattern matching:
x :: a y :: b z :: conly_
z ::Tuple
'[c] x::<
y ::< z ::< Ø ::Tuple
'[a, b, c] x ::< y ::< only_ z ::Tuple
'[a, b, c]
data Prod k (f :: k -> *) (a :: [k]) :: forall k. (k -> *) -> [k] -> * where #
Witness ØC ØC (Prod k f (Ø k)) | |
Functor1 k [k] (Prod k) | |
Foldable1 k [k] (Prod k) | |
Traversable1 k [k] (Prod k) | |
IxFunctor1 k [k] (Index k) (Prod k) | |
IxFoldable1 k [k] (Index k) (Prod k) | |
IxTraversable1 k [k] (Index k) (Prod k) | |
TestEquality k f => TestEquality [k] (Prod k f) | |
BoolEquality k f => BoolEquality [k] (Prod k f) | |
Eq1 k f => Eq1 [k] (Prod k f) | |
Ord1 k f => Ord1 [k] (Prod k f) | |
Show1 k f => Show1 [k] (Prod k f) | |
Read1 k f => Read1 [k] (Prod k f) | |
(Known [k] (Length k) as, Every k (Known k f) as) => Known [k] (Prod k f) as | |
(Witness p q (f a2), Witness s t (Prod a1 f as)) => Witness (p, s) (q, t) (Prod a1 f ((:<) a1 a2 as)) | |
ListC ((<$>) * Constraint Eq ((<$>) k * f as)) => Eq (Prod k f as) | |
(ListC ((<$>) * Constraint Eq ((<$>) k * f as)), ListC ((<$>) * Constraint Ord ((<$>) k * f as))) => Ord (Prod k f as) | |
ListC ((<$>) * Constraint Show ((<$>) k * f as)) => Show (Prod k f as) | |
ListC ((<$>) * Constraint Backprop ((<$>) * * f as)) => Backprop (Prod * f as) Source # | |
type WitnessC ØC ØC (Prod k f (Ø k)) | |
type KnownC [k] (Prod k f) as | |
type WitnessC (p, s) (q, t) (Prod a1 f ((:<) a1 a2 as)) | |
Running
Pure
runOp :: Num a => Op as a -> Tuple as -> (a, Tuple as) Source #
Run the function that an Op
encodes, to get the resulting output and
also its gradient with respect to the inputs.
>>>
gradOp' (op2 (*)) (3 ::< 5 ::< Ø)
(15, 5 ::< 3 ::< Ø)
evalOp :: Op as a -> Tuple as -> a Source #
Run the function that an Op
encodes, to get the result.
>>>
runOp (op2 (*)) (3 ::< 5 ::< Ø)
15
gradOp :: Num a => Op as a -> Tuple as -> Tuple as Source #
Run the function that an Op
encodes, and get the gradient of the
output with respect to the inputs.
>>>
gradOp (op2 (*)) (3 ::< 5 ::< Ø)
5 ::< 3 ::< Ø -- the gradient of x*y is (y, x)
gradOp
o xs =gradOpWith
o xs 1
:: Op as a |
|
-> Tuple as | Inputs to run it with |
-> a | The total derivative of the result. |
-> Tuple as | The gradient |
Get the gradient function that an Op
encodes, with a third argument
expecting the total derivative of the result.
See the module documentaiton for Numeric.Backprop.Op for more information.
Creation
opConst :: (Every Num as, Known Length as) => a -> Op as a Source #
An Op
that ignores all of its inputs and returns a given constant
value.
>>>
gradOp' (opConst 10) (1 ::< 2 ::< 3 ::< Ø)
(10, 0 ::< 0 ::< 0 ::< Ø)
opConst' :: Every Num as => Length as -> a -> Op as a Source #
A version of opConst
taking explicit Length
, indicating the
number of inputs and their types.
Requiring an explicit Length
is mostly useful for rare "extremely
polymorphic" situations, where GHC can't infer the type and length of
the the expected input tuple. If you ever actually explicitly write
down as
as a list of types, you should be able to just use
opConst
.
Giving gradients directly
op1 :: (a -> (b, b -> a)) -> Op '[a] b Source #
Create an Op
of a function taking one input, by giving its explicit
derivative. The function should return a tuple containing the result of
the function, and also a function taking the derivative of the result
and return the derivative of the input.
If we have
\[ \eqalign{ f &: \mathbb{R} \rightarrow \mathbb{R}\cr y &= f(x)\cr z &= g(y) } \]
Then the derivative \( \frac{dz}{dx} \), it would be:
\[ \frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx} \]
If our Op
represents \(f\), then the second item in the resulting
tuple should be a function that takes \(\frac{dz}{dy}\) and returns
\(\frac{dz}{dx}\).
As an example, here is an Op
that squares its input:
square :: Num a =>Op
'[a] a square =op1
$ \x -> (x*x, \d -> 2 * d * x )
Remember that, generally, end users shouldn't directly construct Op
s;
they should be provided by libraries or generated automatically.
op2 :: (a -> b -> (c, c -> (a, b))) -> Op '[a, b] c Source #
Create an Op
of a function taking two inputs, by giving its explicit
gradient. The function should return a tuple containing the result of
the function, and also a function taking the derivative of the result
and return the derivative of the input.
If we have
\[ \eqalign{ f &: \mathbb{R}^2 \rightarrow \mathbb{R}\cr z &= f(x, y)\cr k &= g(z) } \]
Then the gradient \( \left< \frac{\partial k}{\partial x}, \frac{\partial k}{\partial y} \right> \) would be:
\[ \left< \frac{\partial k}{\partial x}, \frac{\partial k}{\partial y} \right> = \left< \frac{dk}{dz} \frac{\partial z}{dx}, \frac{dk}{dz} \frac{\partial z}{dy} \right> \]
If our Op
represents \(f\), then the second item in the resulting
tuple should be a function that takes \(\frac{dk}{dz}\) and returns
\( \left< \frac{\partial k}{dx}, \frac{\partial k}{dx} \right> \).
As an example, here is an Op
that multiplies its inputs:
mul :: Num a =>Op
'[a, a] a mul =op2'
$ \x y -> (x*y, \d -> (d*y, x*d) )
Remember that, generally, end users shouldn't directly construct Op
s;
they should be provided by libraries or generated automatically.
From Isomorphisms
opTup :: Op as (Tuple as) Source #
An Op
that takes as
and returns exactly the input tuple.
>>>
gradOp' opTup (1 ::< 2 ::< 3 ::< Ø)
(1 ::< 2 ::< 3 ::< Ø, 1 ::< 1 ::< 1 ::< Ø)
opIsoN :: (Tuple as -> b) -> (b -> Tuple as) -> Op as b Source #
An Op
that runs the input value through an isomorphism between
a tuple of values and a value. See opIso
for caveats.
In Numeric.Backprop.Op since version 0.1.2.0, but only exported from Numeric.Backprop since version 0.1.3.0.
Since: 0.1.2.0
No gradient
noGrad1 :: (a -> b) -> Op '[a] b Source #
Create an Op
with no gradient. Can be evaluated with evalOp
, but
will throw a runtime exception when asked for the gradient.
Can be used with BVar
with liftOp1
, and evalBP
will work fine.
gradBP
and backprop
will also work fine if the result is never used
in the final answer, but will throw a runtime exception if the final
answer depends on the result of this operation.
Useful if your only API is exposed through backprop. Just be sure to tell your users that this will explode when finding the gradient if the result is used in the final result.
Since: 0.1.3.0
noGrad :: (Tuple as -> b) -> Op as b Source #
Create an Op
with no gradient. Can be evaluated with evalOp
, but
will throw a runtime exception when asked for the gradient.
Can be used with BVar
with liftOp
, and evalBP
will work fine.
gradBP
and backprop
will also work fine if the result is never used
in the final answer, but will throw a runtime exception if the final
answer depends on the result of this operation.
Useful if your only API is exposed through backprop. Just be sure to tell your users that this will explode when finding the gradient if the result is used in the final result.
Since: 0.1.3.0
Manipulation
(~.) :: (Known Length as, Every Num as) => Op '[b] c -> Op as b -> Op as c infixr 9 Source #
Convenient infix synonym for (flipped) composeOp1
. Meant to be used
just like .
:
f ::Op
'[b] c g ::Op
'[a,a] b f~.
g :: Op '[a, a] c
:: Every Num as | |
=> Length as | |
-> Prod (Op as) bs | |
-> Op bs c |
|
-> Op as c | Composed |
A version of composeOp
taking explicit Length
, indicating the
number of inputs expected and their types.
Requiring an explicit Length
is mostly useful for rare "extremely
polymorphic" situations, where GHC can't infer the type and length of
the the expected input tuple. If you ever actually explicitly write
down as
as a list of types, you should be able to just use
composeOp
.
composeOp1' :: Every Num as => Length as -> Op as b -> Op '[b] c -> Op as c Source #
A version of composeOp1
taking explicit Length
, indicating the
number of inputs expected and their types.
Requiring an explicit Length
is mostly useful for rare "extremely
polymorphic" situations, where GHC can't infer the type and length of
the the expected input tuple. If you ever actually explicitly write
down as
as a list of types, you should be able to just use
composeOp1
.
Utility
pattern (:>) :: forall k (f :: k -> *) (a :: k) (b :: k). f a -> f b -> Prod k f ((:) k a ((:) k b ([] k))) infix 6 #
Construct a two element Prod. Since the precedence of (:>) is higher than (:<), we can conveniently write lists like:
>>>
a :< b :> c
Which is identical to:
>>>
a :< b :< c :< Ø
pattern (::<) :: forall a (as :: [*]). a -> Tuple as -> Tuple ((:<) * a as) infixr 5 #
Cons onto a Tuple.