| Copyright | (c) Justin Le 2018 |
|---|---|
| License | BSD3 |
| Maintainer | justin@jle.im |
| Stability | experimental |
| Portability | non-portable |
| Safe Haskell | None |
| Language | Haskell2010 |
Numeric.Backprop.Op
Contents
Description
Provides the Op type and combinators, which represent differentiable
functions/operations on values, and are used internally by the library
to perform back-propagation.
Users of the library can ignore this module for the most part. Library
authors defining backpropagatable primitives for their functions are
recommended to simply use op0, op1, op2, and op3, which are
re-exported in Numeric.Backprop. However, authors who want more
options in defining their primitive functions might find some of these
functions useful.
Note that if your entire function is a single non-branching composition
of functions, Op and its utility functions alone are sufficient to
differentiate/backprop. However, this happens rarely in practice.
To use these Ops with the backprop library, they can be made to work
with BVars using liftOp, liftOp1, liftOp2, and liftOp3.
If you are writing a library, see
https://backprop.jle.im/06-equipping-your-library.html for a guide for
equipping your library with backpropagatable operations using Ops.
See also this guide for writing Ops manually on your own numerical functions.
- newtype Op as a = Op {}
- data Prod k (f :: k -> *) (a :: [k]) :: forall k. (k -> *) -> [k] -> * where
- type Tuple = Prod * I
- newtype I a :: * -> * = I { getI :: a }
- runOp :: Num a => Op as a -> Tuple as -> (a, Tuple as)
- evalOp :: Op as a -> Tuple as -> a
- gradOp :: Num a => Op as a -> Tuple as -> Tuple as
- gradOpWith :: Op as a -> Tuple as -> a -> Tuple as
- op0 :: a -> Op '[] a
- opConst :: (Every Num as, Known Length as) => a -> Op as a
- idOp :: Op '[a] a
- opConst' :: Every Num as => Length as -> a -> Op as a
- opLens :: Num a => Lens' a b -> Op '[a] b
- op1 :: (a -> (b, b -> a)) -> Op '[a] b
- op2 :: (a -> b -> (c, c -> (a, b))) -> Op '[a, b] c
- op3 :: (a -> b -> c -> (d, d -> (a, b, c))) -> Op '[a, b, c] d
- opCoerce :: Coercible a b => Op '[a] b
- opTup :: Op as (Tuple as)
- opIso :: (a -> b) -> (b -> a) -> Op '[a] b
- opIso2 :: (a -> b -> c) -> (c -> (a, b)) -> Op '[a, b] c
- opIso3 :: (a -> b -> c -> d) -> (d -> (a, b, c)) -> Op '[a, b, c] d
- opIsoN :: (Tuple as -> b) -> (b -> Tuple as) -> Op as b
- noGrad1 :: (a -> b) -> Op '[a] b
- noGrad :: (Tuple as -> b) -> Op as b
- composeOp :: (Every Num as, Known Length as) => Prod (Op as) bs -> Op bs c -> Op as c
- composeOp1 :: (Every Num as, Known Length as) => Op as b -> Op '[b] c -> Op as c
- (~.) :: (Known Length as, Every Num as) => Op '[b] c -> Op as b -> Op as c
- composeOp' :: Every Num as => Length as -> Prod (Op as) bs -> Op bs c -> Op as c
- composeOp1' :: Every Num as => Length as -> Op as b -> Op '[b] c -> Op as c
- pattern (:>) :: forall k (f :: k -> *) (a :: k) (b :: k). f a -> f b -> Prod k f ((:) k a ((:) k b ([] k)))
- only :: f a -> Prod k f ((:) k a ([] k))
- head' :: Prod k f ((:<) k a as) -> f a
- pattern (::<) :: forall a (as :: [*]). a -> Tuple as -> Tuple ((:<) * a as)
- only_ :: a -> Tuple ((:) * a ([] *))
- (+.) :: Num a => Op '[a, a] a
- (-.) :: Num a => Op '[a, a] a
- (*.) :: Num a => Op '[a, a] a
- negateOp :: Num a => Op '[a] a
- absOp :: Num a => Op '[a] a
- signumOp :: Num a => Op '[a] a
- (/.) :: Fractional a => Op '[a, a] a
- recipOp :: Fractional a => Op '[a] a
- expOp :: Floating a => Op '[a] a
- logOp :: Floating a => Op '[a] a
- sqrtOp :: Floating a => Op '[a] a
- (**.) :: Floating a => Op '[a, a] a
- logBaseOp :: Floating a => Op '[a, a] a
- sinOp :: Floating a => Op '[a] a
- cosOp :: Floating a => Op '[a] a
- tanOp :: Floating a => Op '[a] a
- asinOp :: Floating a => Op '[a] a
- acosOp :: Floating a => Op '[a] a
- atanOp :: Floating a => Op '[a] a
- sinhOp :: Floating a => Op '[a] a
- coshOp :: Floating a => Op '[a] a
- tanhOp :: Floating a => Op '[a] a
- asinhOp :: Floating a => Op '[a] a
- acoshOp :: Floating a => Op '[a] a
- atanhOp :: Floating a => Op '[a] a
Implementation
Ops contain information on a function as well as its gradient, but
provide that information in a way that allows them to be "chained".
For example, for a function
\[ f : \mathbb{R}^n \rightarrow \mathbb{R} \]
We might want to apply a function \(g\) to the result we get, to get our "final" result:
\[ \eqalign{ y &= f(\mathbf{x})\cr z &= g(y) } \]
Now, we might want the gradient \(\nabla z\) with respect to \(\mathbf{x}\), or \(\nabla_\mathbf{x} z\). Explicitly, this is:
\[ \nabla_\mathbf{x} z = \left< \frac{\partial z}{\partial x_1}, \frac{\partial z}{\partial x_2}, \ldots \right> \]
We can compute that by multiplying the total derivative of \(z\) with respect to \(y\) (that is, \(\frac{dz}{dy}\)) with the gradient of \(y\) itself:
\[ \eqalign{ \nabla_\mathbf{x} z &= \frac{dz}{dy} \left< \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \ldots \right>\cr \nabla_\mathbf{x} z &= \frac{dz}{dy} \nabla_\mathbf{x} y } \]
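As a concrete (illustrative) instance of this formula: if \(y = x_1 x_2\) and \(z = \sin(y)\), then
\[ \nabla_\mathbf{x} z = \frac{dz}{dy} \nabla_\mathbf{x} y = \cos(x_1 x_2) \left< x_2, x_1 \right> \]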
So, to create an Op as a with the Op constructor, you give
a function that returns a tuple, containing:
- An a: the result of the function
- An a -> Tuple as: a function that, when given \(\frac{dz}{dy}\), returns the total gradient \(\nabla_\mathbf{x} z\).
This is done so that Ops can easily be "chained" together, one after
the other. If you have an Op for \(f\) and an Op for \(g\), you can
compute the gradient of \(f\) knowing that the result target is
\(g \circ f\).
See this guide for a detailed look at writing Ops manually on your own numerical functions.
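As a sketch of this chaining in code (all names here are illustrative; the composition uses ~. from the Manipulation section below):

```haskell
-- A hand-written Op for f(x) = x^2, built with op1 (illustrative name).
squareOp :: Num a => Op '[a] a
squareOp = op1 $ \x -> (x * x, \d -> 2 * d * x)

-- Chained with the library's sinOp via (~.): represents sin (x^2).
sinOfSquare :: Floating a => Op '[a] a
sinOfSquare = sinOp ~. squareOp

-- The chain rule is handled for us:
-- gradOp sinOfSquare (only_ 2)  ==  only_ (2 * 2 * cos (2 * 2))
```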
Note that end users should probably never be required to construct an
Op explicitly this way. Instead, libraries should provide
carefully pre-constructed ones, or provide ways to generate them
automatically (like op1, op2, and op3 here).
For examples of Ops implemented from scratch, see the implementations
of +., -., recipOp, sinOp, etc.
See Numeric.Backprop.Op for a mini-tutorial on using Prod and
Tuple.
An Op as a describes a differentiable function from as to a.
For example, a value of type
Op '[Int, Bool] Double
is a function from an Int and a Bool, returning a Double. It can
be differentiated to give a gradient of an Int and a Bool if given
a total derivative for the Double. If we call Bool \(2\), then,
mathematically, it is akin to a:
\[ f : \mathbb{Z} \times 2 \rightarrow \mathbb{R} \]
See runOp, gradOp, and gradOpWith for examples on how to run it,
and Op for instructions on creating it.
It is simpler to not use this type constructor directly, and instead use
the op0, op1, op2, and op3 helper smart constructors.
See Numeric.Backprop.Op for a mini-tutorial on using Prod and
Tuple.
To use an Op with the backprop library, see liftOp, liftOp1,
liftOp2, and liftOp3.
Constructors
| Op | Construct an Op by giving the function it encodes together with its gradient continuation. See the module documentation for Numeric.Backprop.Op for more details on the function that this constructor expects. |
Instances
| (Known [*] (Length *) as, Every * Floating as, Every * Fractional as, Every * Num as, Floating a) => Floating (Op as a) Source # | |
| (Known [*] (Length *) as, Every * Fractional as, Every * Num as, Fractional a) => Fractional (Op as a) Source # | |
| (Known [*] (Length *) as, Every * Num as, Num a) => Num (Op as a) Source # | |
Tuple Types
Prod, from the type-combinators library
(http://hackage.haskell.org/package/type-combinators), in Data.Type.Product, is a heterogeneous
list/tuple type, which allows you to tuple together multiple values of
different types and operate on them generically.
A Prod f '[a, b, c] contains an f a, an f b, and an f c, and
is constructed by consing them together with :< (using 'Ø' as nil):
I "hello" :< I True :< I 7.8 :< Ø     :: Prod I '[String, Bool, Double]
C "hello" :< C "world" :< C "ok" :< Ø :: Prod (C String) '[a, b, c]
Proxy :< Proxy :< Proxy :< Ø          :: Prod Proxy '[a, b, c]
(I is the identity functor, and C is the constant functor)
So, in general:
x :: f a
y :: f b
z :: f c
x :< y :< z :< Ø :: Prod f '[a, b, c]
If you're having problems typing 'Ø', you can use only:
only z           :: Prod f '[c]
x :< y :< only z :: Prod f '[a, b, c]
Tuple is provided as a convenient type synonym for Prod I, and has
a convenient pattern synonym ::< (and only_), which can also be used
for pattern matching:
x :: a
y :: b
z :: c
only_ z             :: Tuple '[c]
x ::< y ::< z ::< Ø :: Tuple '[a, b, c]
x ::< y ::< only_ z :: Tuple '[a, b, c]
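As a sketch of how these patterns are used in practice (the function name below is made up for illustration):

```haskell
-- Consuming a Tuple by pattern matching on ::< and Ø (illustrative):
norm3 :: Floating a => Tuple '[a, a, a] -> a
norm3 (x ::< y ::< z ::< Ø) = sqrt (x*x + y*y + z*z)

-- norm3 (3 ::< 4 ::< 0 ::< Ø)  ==  5.0
```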
data Prod k (f :: k -> *) (a :: [k]) :: forall k. (k -> *) -> [k] -> * where #
Instances
| Witness ØC ØC (Prod k f (Ø k)) | |
| Functor1 k [k] (Prod k) | |
| Foldable1 k [k] (Prod k) | |
| Traversable1 k [k] (Prod k) | |
| IxFunctor1 k [k] (Index k) (Prod k) | |
| IxFoldable1 k [k] (Index k) (Prod k) | |
| IxTraversable1 k [k] (Index k) (Prod k) | |
| TestEquality k f => TestEquality [k] (Prod k f) | |
| BoolEquality k f => BoolEquality [k] (Prod k f) | |
| Eq1 k f => Eq1 [k] (Prod k f) | |
| Ord1 k f => Ord1 [k] (Prod k f) | |
| Show1 k f => Show1 [k] (Prod k f) | |
| Read1 k f => Read1 [k] (Prod k f) | |
| (Known [k] (Length k) as, Every k (Known k f) as) => Known [k] (Prod k f) as | |
| (Witness p q (f a2), Witness s t (Prod a1 f as)) => Witness (p, s) (q, t) (Prod a1 f ((:<) a1 a2 as)) | |
| ListC ((<$>) * Constraint Eq ((<$>) k * f as)) => Eq (Prod k f as) | |
| (ListC ((<$>) * Constraint Eq ((<$>) k * f as)), ListC ((<$>) * Constraint Ord ((<$>) k * f as))) => Ord (Prod k f as) | |
| ListC ((<$>) * Constraint Show ((<$>) k * f as)) => Show (Prod k f as) | |
| ListC ((<$>) * Constraint Backprop ((<$>) * * f as)) => Backprop (Prod * f as) Source # | |
| type WitnessC ØC ØC (Prod k f (Ø k)) | |
| type KnownC [k] (Prod k f) as | |
| type WitnessC (p, s) (q, t) (Prod a1 f ((:<) a1 a2 as)) | |
Running
Pure
runOp :: Num a => Op as a -> Tuple as -> (a, Tuple as) Source #
Run the function that an Op encodes, to get the resulting output and
also its gradient with respect to the inputs.
>>> runOp (*.) (3 ::< 5 ::< Ø)
(15, 5 ::< 3 ::< Ø)
evalOp :: Op as a -> Tuple as -> a Source #
Run the function that an Op encodes, to get the result.
>>> evalOp (*.) (3 ::< 5 ::< Ø)
15
gradOp :: Num a => Op as a -> Tuple as -> Tuple as Source #
Run the function that an Op encodes, and get the gradient of the
output with respect to the inputs.
>>> gradOp (*.) (3 ::< 5 ::< Ø)
5 ::< 3 ::< Ø -- the gradient of x*y is (y, x)
gradOp o xs = gradOpWith o xs 1
gradOpWith Source #
Arguments
| :: Op as a | |
| -> Tuple as | Inputs to run it with |
| -> a | The total derivative of the result. |
| -> Tuple as | The gradient |
Get the gradient function that an Op encodes, with a third argument
expecting the total derivative of the result.
See the module documentation for Numeric.Backprop.Op for more information.
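For instance (a sketch with illustrative values and names), if the eventual result \(z\) depends on the product \(p = x y\) with \(\frac{dz}{dp} = 0.5\):

```haskell
-- Back-propagating a downstream derivative of 0.5 through (*.):
gradXY :: Tuple '[Double, Double]
gradXY = gradOpWith (*.) (3 ::< 5 ::< Ø) 0.5
-- gradXY  ==  2.5 ::< 1.5 ::< Ø    -- dz/dx = 0.5 * y, dz/dy = 0.5 * x
```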
Creation
opConst :: (Every Num as, Known Length as) => a -> Op as a Source #
An Op that ignores all of its inputs and returns a given constant
value.
>>> runOp (opConst 10) (1 ::< 2 ::< 3 ::< Ø)
(10, 0 ::< 0 ::< 0 ::< Ø)
opConst' :: Every Num as => Length as -> a -> Op as a Source #
A version of opConst taking explicit Length, indicating the
number of inputs and their types.
Requiring an explicit Length is mostly useful for rare "extremely
polymorphic" situations, where GHC can't infer the type and length of
the expected input tuple. If you ever actually explicitly write
down as as a list of types, you should be able to just use
opConst.
Giving gradients directly
op1 :: (a -> (b, b -> a)) -> Op '[a] b Source #
Create an Op of a function taking one input, by giving its explicit
derivative. The function should return a tuple containing the result of
the function, and also a function that takes the derivative of the result
and returns the derivative of the input.
If we have
\[ \eqalign{ f &: \mathbb{R} \rightarrow \mathbb{R}\cr y &= f(x)\cr z &= g(y) } \]
Then the derivative \( \frac{dz}{dx} \) would be:
\[ \frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx} \]
If our Op represents \(f\), then the second item in the resulting
tuple should be a function that takes \(\frac{dz}{dy}\) and returns
\(\frac{dz}{dx}\).
As an example, here is an Op that squares its input:
square :: Num a => Op '[a] a
square = op1 $ \x -> (x*x, \d -> 2 * d * x)
Remember that, generally, end users shouldn't directly construct Ops;
they should be provided by libraries or generated automatically.
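Running the square example above with the functions from the Running section (values are illustrative; squareAt3 is a made-up name):

```haskell
-- Result and gradient of x^2 at x = 3:
squareAt3 :: (Double, Tuple '[Double])
squareAt3 = runOp square (only_ 3)
-- squareAt3  ==  (9.0, 6.0 ::< Ø)    -- d(x^2)/dx at 3 is 6
```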
op2 :: (a -> b -> (c, c -> (a, b))) -> Op '[a, b] c Source #
Create an Op of a function taking two inputs, by giving its explicit
gradient. The function should return a tuple containing the result of
the function, and also a function that takes the derivative of the result
and returns the derivatives of the inputs.
If we have
\[ \eqalign{ f &: \mathbb{R}^2 \rightarrow \mathbb{R}\cr z &= f(x, y)\cr k &= g(z) } \]
Then the gradient \( \left< \frac{\partial k}{\partial x}, \frac{\partial k}{\partial y} \right> \) would be:
\[ \left< \frac{\partial k}{\partial x}, \frac{\partial k}{\partial y} \right> = \left< \frac{dk}{dz} \frac{\partial z}{dx}, \frac{dk}{dz} \frac{\partial z}{dy} \right> \]
If our Op represents \(f\), then the second item in the resulting
tuple should be a function that takes \(\frac{dk}{dz}\) and returns
\( \left< \frac{\partial k}{\partial x}, \frac{\partial k}{\partial y} \right> \).
As an example, here is an Op that multiplies its inputs:
mul :: Num a => Op '[a, a] a
mul = op2 $ \x y -> (x*y, \d -> (d*y, x*d))
Remember that, generally, end users shouldn't directly construct Ops;
they should be provided by libraries or generated automatically.
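Running the mul example above (values are illustrative; mulAt is a made-up name):

```haskell
-- Result and gradient of x*y at (2, 7):
mulAt :: (Double, Tuple '[Double, Double])
mulAt = runOp mul (2 ::< 7 ::< Ø)
-- mulAt  ==  (14.0, 7.0 ::< 2.0 ::< Ø)    -- gradient of x*y is (y, x)
```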
From Isomorphisms
opTup :: Op as (Tuple as) Source #
An Op that takes as and returns exactly the input tuple.
>>> gradOp' opTup (1 ::< 2 ::< 3 ::< Ø)
(1 ::< 2 ::< 3 ::< Ø, 1 ::< 1 ::< 1 ::< Ø)
opIsoN :: (Tuple as -> b) -> (b -> Tuple as) -> Op as b Source #
An Op that runs the input value through an isomorphism between
a tuple of values and a value. See opIso for caveats.
In Numeric.Backprop.Op since version 0.1.2.0, but only exported from Numeric.Backprop since version 0.1.3.0.
Since: 0.1.2.0
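For example, a sketch of using opIsoN to treat a two-element Tuple as an ordinary Haskell pair (pairOp is an illustrative name, not part of the library):

```haskell
-- An Op from a two-element input list to an ordinary pair, and back:
pairOp :: Op '[a, b] (a, b)
pairOp = opIsoN (\(x ::< y ::< Ø) -> (x, y))
                (\(x, y)          -> x ::< y ::< Ø)
```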
No gradient
noGrad1 :: (a -> b) -> Op '[a] b Source #
Create an Op with no gradient. Can be evaluated with evalOp, but
will throw a runtime exception when asked for the gradient.
Can be used with BVar with liftOp1, and evalBP will work fine.
gradBP and backprop will also work fine if the result is never used
in the final answer, but will throw a runtime exception if the final
answer depends on the result of this operation.
Useful if your only API is exposed through backprop. Just be sure to tell your users that this will explode when finding the gradient if the result is used in the final result.
Since: 0.1.3.0
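A sketch of a function with no meaningful gradient, wrapped with noGrad1 (roundOp is an illustrative name):

```haskell
-- Rounding has a zero/undefined gradient almost everywhere, so we only
-- provide the forward function:
roundOp :: (RealFrac a, Num b) => Op '[a] b
roundOp = noGrad1 (\x -> fromIntegral (round x :: Integer))

-- evalOp roundOp (only_ 3.7)  ==  4
-- gradOp roundOp (only_ 3.7)  ==  runtime exception
```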
noGrad :: (Tuple as -> b) -> Op as b Source #
Create an Op with no gradient. Can be evaluated with evalOp, but
will throw a runtime exception when asked for the gradient.
Can be used with BVar with liftOp, and evalBP will work fine.
gradBP and backprop will also work fine if the result is never used
in the final answer, but will throw a runtime exception if the final
answer depends on the result of this operation.
Useful if your only API is exposed through backprop. Just be sure to tell your users that this will explode when finding the gradient if the result is used in the final result.
Since: 0.1.3.0
Manipulation
(~.) :: (Known Length as, Every Num as) => Op '[b] c -> Op as b -> Op as c infixr 9 Source #
Convenient infix synonym for (flipped) composeOp1. Meant to be used
just like (.):
f :: Op '[b] c
g :: Op '[a, a] b
f ~. g :: Op '[a, a] c
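The more general composeOp (see the synopsis) composes a whole Prod of Ops sharing the same inputs. A sketch, reusing the squareOp example from the introduction (cubeOp and squareTimesCube are illustrative names):

```haskell
-- Run the input through two different Ops, then combine their results
-- with (*.):
cubeOp :: Num a => Op '[a] a
cubeOp = op1 $ \x -> (x * x * x, \d -> 3 * d * x * x)

squareTimesCube :: Num a => Op '[a] a
squareTimesCube = composeOp (squareOp :< cubeOp :< Ø) (*.)
-- represents x^2 * x^3 = x^5, so its gradient at x is 5 * x^4
```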
composeOp' Source #
Arguments
| :: Every Num as | |
| => Length as | |
| -> Prod (Op as) bs | |
| -> Op bs c | |
| -> Op as c | Composed Op |
A version of composeOp taking explicit Length, indicating the
number of inputs expected and their types.
Requiring an explicit Length is mostly useful for rare "extremely
polymorphic" situations, where GHC can't infer the type and length of
the expected input tuple. If you ever actually explicitly write
down as as a list of types, you should be able to just use
composeOp.
composeOp1' :: Every Num as => Length as -> Op as b -> Op '[b] c -> Op as c Source #
A version of composeOp1 taking explicit Length, indicating the
number of inputs expected and their types.
Requiring an explicit Length is mostly useful for rare "extremely
polymorphic" situations, where GHC can't infer the type and length of
the expected input tuple. If you ever actually explicitly write
down as as a list of types, you should be able to just use
composeOp1.
Utility
pattern (:>) :: forall k (f :: k -> *) (a :: k) (b :: k). f a -> f b -> Prod k f ((:) k a ((:) k b ([] k))) infix 6 #
Construct a two-element Prod. Since the precedence of (:>) is higher than (:<), we can conveniently write lists like:
>>> a :< b :> c
Which is identical to:
>>> a :< b :< c :< Ø
pattern (::<) :: forall a (as :: [*]). a -> Tuple as -> Tuple ((:<) * a as) infixr 5 #
Cons onto a Tuple.