% Neural networks with backprop library
% Justin Le

The *backprop* library performs back-propagation over a *heterogeneous*
system of relationships. It offers both an implicit ([ad][]-like) and an
explicit graph-building usage style. Let's use it to build neural networks!

[ad]: http://hackage.haskell.org/package/ad

Repository source is [on github][repo], and so are the [rendered unstable
docs][docs].

[repo]: https://github.com/mstksg/backprop
[docs]: https://mstksg.github.io/backprop

> {-# LANGUAGE DeriveGeneric       #-}
> {-# LANGUAGE GADTs               #-}
> {-# LANGUAGE LambdaCase          #-}
> {-# LANGUAGE RankNTypes          #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> {-# LANGUAGE StandaloneDeriving  #-}
> {-# LANGUAGE TypeApplications    #-}
> {-# LANGUAGE TypeInType          #-}
> {-# LANGUAGE TypeOperators       #-}
> {-# LANGUAGE ViewPatterns        #-}
> {-# OPTIONS_GHC -fno-warn-orphans #-}
> {-# OPTIONS_GHC -fno-warn-unused-top-binds #-}
>
> import Data.Functor
> import Data.Kind
> import Data.Maybe
> import Data.Singletons
> import Data.Singletons.Prelude
> import Data.Singletons.TypeLits
> import Data.Type.Combinator
> import Data.Type.Product
> import GHC.Generics (Generic)
> import Numeric.Backprop
> import Numeric.Backprop.Iso
> import Numeric.LinearAlgebra.Static hiding (dot)
> import System.Random.MWC
> import qualified Generics.SOP as SOP

Ops
===

First, we define values of `Op` for the operations we want to do. `Op`s
are bundles of functions packaged together with their heterogeneous
gradients. For simple numeric functions, *backprop* can derive `Op`s
automatically, but for matrix operations we have to write them ourselves.

The types help us match up the dimensions, but we still need to be careful
that our gradients are calculated correctly. `L` and `R` are the matrix
and vector types from the great *hmatrix* library.

First, matrix-vector multiplication:

> matVec
>     :: (KnownNat m, KnownNat n)
>     => Op '[ L m n, R n ] (R m)
> matVec = op2' $ \m v ->
>     ( m #> v
>     , \(fromMaybe 1 -> g) -> (g `outer` v, tr m #> g)
>     )

Now, dot products:

> dot :: KnownNat n
>     => Op '[ R n, R n ] Double
> dot = op2' $ \x y ->
>     ( x <.> y
>     , \case Nothing -> (y, x)
>             Just g  -> (konst g * y, x * konst g)
>     )

Polymorphic functions can easily be turned into `Op`s with `op1`/`op2`
etc., but they can also be run directly on graph nodes:

> logistic :: Floating a => a -> a
> logistic x = 1 / (1 + exp (-x))

A Simple Complete Example
=========================

At this point, we already have enough to train a simple
single-hidden-layer neural network:

> simpleOp
>     :: (KnownNat m, KnownNat n, KnownNat o)
>     => R m
>     -> BPOpI s '[ L n m, R n, L o n, R o ] (R o)
> simpleOp inp = \(w1 :< b1 :< w2 :< b2 :< Ø) ->
>     let z = logistic $ liftB2 matVec w1 x + b1
>     in  logistic $ liftB2 matVec w2 z + b2
>   where
>     x = constVar inp

Here, `simpleOp` is written in the implicit (non-monadic) style: it is
given a tuple of input variables and returns the output variable. Now
`simpleOp` can be "run" with an input vector and the parameters (an
`L n m`, `R n`, `L o n`, and `R o`) to calculate the output of the neural
net.

> runSimple
>     :: (KnownNat m, KnownNat n, KnownNat o)
>     => R m
>     -> Tuple '[ L n m, R n, L o n, R o ]
>     -> R o
> runSimple inp = evalBPOp (implicitly $ simpleOp inp)
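For example, with 4 inputs, 3 hidden units, and 1 output, we could run the
network on made-up parameters like this. This is only a rough sketch (not
part of the compiled literate source); `testRun` and the parameter values
are invented for illustration, and `konst` is hmatrix's constant-fill for
statically sized vectors and matrices:

    -- run the two-layer network on an all-ones input, with made-up parameters
    testRun :: R 1
    testRun = runSimple (konst 1 :: R 4) params
      where
        params =  (konst 0.1 :: L 3 4)   -- hidden layer weights
              ::< (konst 0   :: R 3)     -- hidden layer biases
              ::< (konst 0.1 :: L 1 3)   -- output layer weights
              ::< (konst 0   :: R 1)     -- output layer biases
              ::< Ø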
Alternatively, we can define `simpleOp` in explicit monadic style, where
we specify our graph nodes explicitly. The results should be the same.

> simpleOpExplicit
>     :: (KnownNat m, KnownNat n, KnownNat o)
>     => R m
>     -> BPOp s '[ L n m, R n, L o n, R o ] (R o)
> simpleOpExplicit inp = withInps $ \(w1 :< b1 :< w2 :< b2 :< Ø) -> do
>     -- First layer
>     y1 <- matVec ~$ (w1 :< x1 :< Ø)
>     let x2 = logistic (y1 + b1)
>     -- Second layer
>     y2 <- matVec ~$ (w2 :< x2 :< Ø)
>     return $ logistic (y2 + b2)
>   where
>     x1 = constVar inp

Now, for the magic of *backprop*: the library can take advantage of the
implicit (or explicit) graph and use it to do back-propagation, too!

> simpleGrad
>     :: forall m n o. (KnownNat m, KnownNat n, KnownNat o)
>     => R m
>     -> R o
>     -> Tuple '[ L n m, R n, L o n, R o ]
>     -> Tuple '[ L n m, R n, L o n, R o ]
> simpleGrad inp targ params = gradBPOp opError params
>   where
>     opError :: BPOp s '[ L n m, R n, L o n, R o ] Double
>     opError = do
>         res <- implicitly $ simpleOp inp
>         -- we explicitly bind err to prevent recomputation
>         err <- bindVar $ res - t
>         dot ~$ (err :< err :< Ø)
>       where
>         t = constVar targ

The result is the gradient of each of the input tuple's components with
respect to the `Double` result of `opError` (the squared error). We can
then use this gradient to do gradient descent.

With Parameter Containers
=========================

This method doesn't quite scale: we might want to build networks with many
layers, and to parameterize a network by its layers. Let's make some basic
container data types to help us organize our types, including a recursive
`Network` type that lets us chain multiple layers.

> data Layer :: Nat -> Nat -> Type where
>     Layer :: { _lWeights :: L m n
>              , _lBiases  :: R m
>              }
>           -> Layer n m
>   deriving (Show, Generic)
>
> data Network :: Nat -> [Nat] -> Nat -> Type where
>     NØ   :: !(Layer a b) -> Network a '[] b
>     (:&) :: !(Layer a b) -> Network b bs c -> Network a (b ': bs) c

A `Layer n m` is a layer taking an n-vector and returning an m-vector.
A `Network a '[b, c, d] e` is a network that takes in an a-vector and
outputs an e-vector, with hidden layers of sizes b, c, and d.
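As a quick illustration (a sketch only, not part of the compiled literate
source; `sampleNet` and its all-zero parameters are made up), a network
with four inputs, hidden layers of sizes three and two, and one output
could be built by hand:

    -- a 4-3-2-1 network with all-zero weights and biases
    sampleNet :: Network 4 '[3, 2] 1
    sampleNet =
        Layer (konst 0) (konst 0)              -- layer from 4 inputs to 3
          :& (Layer (konst 0) (konst 0)        -- layer from 3 to 2
          :& NØ (Layer (konst 0) (konst 0)))   -- final layer from 2 to 1 output

(The parentheses are written out explicitly because no fixity is declared
for `:&` in this file.)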
Isomorphisms
------------

The *backprop* library lets you apply operations to "parts" of data types
(like the weights and biases of a `Layer`) by using `Iso`s (isomorphisms),
like the ones from the *lens* library. The library doesn't depend on
*lens*, but it can use `Iso`s from that library as well as custom-defined
ones.

First, we can auto-generate isomorphisms using the *generics-sop* library:

> instance SOP.Generic (Layer n m)

And then we can create isomorphisms by hand for the two `Network`
constructors:

> netExternal :: Iso' (Network a '[] b) (Tuple '[ Layer a b ])
> netExternal = iso (\case NØ x -> x ::< Ø)
>                   (\case I x :< Ø -> NØ x)
>
> netInternal :: Iso' (Network a (b ': bs) c) (Tuple '[ Layer a b, Network b bs c ])
> netInternal = iso (\case x :& xs -> x ::< xs ::< Ø)
>                   (\case I x :< I xs :< Ø -> x :& xs)

An `Iso' a (Tuple as)` means that an `a` can really just be seen as a
tuple of the types in `as`.

Running a network
=================

Now we can write the `BPOp` that represents running the network and
getting a result. We pass in a `Sing bs` (a singleton list of the hidden
layer sizes) so that we can "pattern match" on the list and handle the
different network constructors differently.

> netOp
>     :: forall s a bs c. (KnownNat a, KnownNat c)
>     => Sing bs
>     -> BPOp s '[ R a, Network a bs c ] (R c)
> netOp sbs = go sbs
>   where
>     go :: forall d es. KnownNat d
>        => Sing es
>        -> BPOp s '[ R d, Network d es c ] (R c)
>     go = \case
>       SNil -> withInps $ \(x :< n :< Ø) -> do
>         -- peek into the NØ using the netExternal iso
>         l :< Ø <- netExternal #<~ n
>         -- run the 'layerOp' BP, with x and l as inputs
>         bpOp layerOp ~$ (x :< l :< Ø)
>       SNat `SCons` ses -> withInps $ \(x :< n :< Ø) -> withSingI ses $ do
>         -- peek into the (:&) using the netInternal iso
>         l :< n' :< Ø <- netInternal #<~ n
>         -- run the 'layerOp' BP, with x and l as inputs
>         z <- bpOp layerOp ~$ (x :< l :< Ø)
>         -- run the 'go ses' BP, with z and n' as inputs
>         bpOp (go ses) ~$ (z :< n' :< Ø)
>     layerOp
>         :: forall d e. (KnownNat d, KnownNat e)
>         => BPOp s '[ R d, Layer d e ] (R e)
>     layerOp = withInps $ \(x :< l :< Ø) -> do
>         -- peek into the layer using the gTuple iso, auto-generated from
>         -- the SOP.Generic instance
>         w :< b :< Ø <- gTuple #<~ l
>         y <- matVec ~$ (w :< x :< Ø)
>         return $ logistic (y + b)

There's some singletons work going on here, but it's fairly standard
singletons stuff. Most of the complexity comes from the static typing of
our neural network type, and *not* from *backprop*. The only pieces
specific to *backprop* are `#<~`, which lets you "split" an input ref with
the given iso, and `bpOp`, which converts a `BPOp` into an `Op` that you
can bind with `~$`.

Note that the library doesn't support true pattern matching on GADTs,
which is why we had to pass in `Sing bs` as a runtime description of the
structure of our networks.

Gradient Descent
----------------

Now we can do simple gradient descent. First, we define an error function:

> errOp
>     :: KnownNat m
>     => R m
>     -> BVar s rs (R m)
>     -> BPOp s rs Double
> errOp targ r = do
>     err <- bindVar $ r - t
>     dot ~$ (err :< err :< Ø)
>   where
>     t = constVar targ

And now we can use `backprop` to generate the gradient and shift the
`Network`! Things are made a bit cleaner by the fact that `Network a bs c`
has a `Num` instance, so we can use `(-)`, `(*)`, etc.

> train
>     :: (KnownNat a, SingI bs, KnownNat c)
>     => Double
>     -> R a
>     -> R c
>     -> Network a bs c
>     -> Network a bs c
> train r x t n = case backprop (errOp t =<< netOp sing) (x ::< n ::< Ø) of
>     (_, _ :< I g :< Ø) -> n - (realToFrac r * g)

(`(::<)` is cons and `Ø` is nil for tuples.)
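To actually train on data, we would fold `train` over a list of samples.
Here is a minimal sketch of such a loop (not part of the compiled literate
source; `trainList` is a hypothetical helper, and it would also need
`foldl'` from `Data.List`):

    -- one pass of gradient descent over a list of (input, target) samples,
    -- stepping the network once per sample
    trainList
        :: (KnownNat a, SingI bs, KnownNat c)
        => Double              -- learning rate
        -> [(R a, R c)]        -- (input, target) pairs
        -> Network a bs c
        -> Network a bs c
    trainList r samps n0 = foldl' (\n (x, t) -> train r x t n) n0 samps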
Main
====

`main`, which will train on sample data sets, is still in progress! Right
now it just generates a random network using the *mwc-random* library and
prints each internal layer.

> main :: IO ()
> main = withSystemRandom $ \g -> do
>     n <- uniform @(Network 4 '[3,2] 1) g
>     void $ traverseNetwork sing (\l -> l <$ print l) n

Appendix: Boilerplate
=====================

And now for some typeclass instances and boilerplate, unrelated to the
*backprop* library, that make our custom types easier to use.

> instance KnownNat n => Variate (R n) where
>     uniform g = randomVector <$> uniform g <*> pure Uniform
>     uniformR (l, h) g = (\x -> x * (h - l) + l) <$> uniform g
>
> instance (KnownNat m, KnownNat n) => Variate (L m n) where
>     uniform g = uniformSample <$> uniform g <*> pure 0 <*> pure 1
>     uniformR (l, h) g = (\x -> x * (h - l) + l) <$> uniform g
>
> instance (KnownNat n, KnownNat m) => Variate (Layer n m) where
>     uniform g = subtract 1 . (* 2) <$> (Layer <$> uniform g <*> uniform g)
>     uniformR (l, h) g = (\x -> x * (h - l) + l) <$> uniform g
>
> instance (KnownNat m, KnownNat n) => Num (Layer n m) where
>     Layer w1 b1 + Layer w2 b2 = Layer (w1 + w2) (b1 + b2)
>     Layer w1 b1 - Layer w2 b2 = Layer (w1 - w2) (b1 - b2)
>     Layer w1 b1 * Layer w2 b2 = Layer (w1 * w2) (b1 * b2)
>     abs    (Layer w b) = Layer (abs w) (abs b)
>     signum (Layer w b) = Layer (signum w) (signum b)
>     negate (Layer w b) = Layer (negate w) (negate b)
>     fromInteger x = Layer (fromInteger x) (fromInteger x)
>
> instance (KnownNat m, KnownNat n) => Fractional (Layer n m) where
>     Layer w1 b1 / Layer w2 b2 = Layer (w1 / w2) (b1 / b2)
>     recip (Layer w b) = Layer (recip w) (recip b)
>     fromRational x = Layer (fromRational x) (fromRational x)
>
> instance (KnownNat a, SingI bs, KnownNat c) => Variate (Network a bs c) where
>     uniform g = genNet sing (uniform g)
>     uniformR (l, h) g = (\x -> x * (h - l) + l) <$> uniform g
>
> genNet
>     :: forall f a bs c. (Applicative f, KnownNat a, KnownNat c)
>     => Sing bs
>     -> (forall d e. (KnownNat d, KnownNat e) => f (Layer d e))
>     -> f (Network a bs c)
> genNet sbs f = go sbs
>   where
>     go :: forall d es. KnownNat d => Sing es -> f (Network d es c)
>     go = \case
>       SNil             -> NØ <$> f
>       SNat `SCons` ses -> (:&) <$> f <*> go ses
>
> mapNetwork0
>     :: forall a bs c. (KnownNat a, KnownNat c)
>     => Sing bs
>     -> (forall d e. (KnownNat d, KnownNat e) => Layer d e)
>     -> Network a bs c
> mapNetwork0 sbs f = getI $ genNet sbs (I f)
>
> traverseNetwork
>     :: forall a bs c f. (KnownNat a, KnownNat c, Applicative f)
>     => Sing bs
>     -> (forall d e. (KnownNat d, KnownNat e) => Layer d e -> f (Layer d e))
>     -> Network a bs c
>     -> f (Network a bs c)
> traverseNetwork sbs f = go sbs
>   where
>     go :: forall d es. KnownNat d => Sing es -> Network d es c -> f (Network d es c)
>     go = \case
>       SNil -> \case
>         NØ x -> NØ <$> f x
>       SNat `SCons` ses -> \case
>         x :& xs -> (:&) <$> f x <*> go ses xs
>
> mapNetwork1
>     :: forall a bs c. (KnownNat a, KnownNat c)
>     => Sing bs
>     -> (forall d e. (KnownNat d, KnownNat e) => Layer d e -> Layer d e)
>     -> Network a bs c
>     -> Network a bs c
> mapNetwork1 sbs f = getI . traverseNetwork sbs (I . f)
>
> mapNetwork2
>     :: forall a bs c. (KnownNat a, KnownNat c)
>     => Sing bs
>     -> (forall d e. (KnownNat d, KnownNat e) => Layer d e -> Layer d e -> Layer d e)
>     -> Network a bs c
>     -> Network a bs c
>     -> Network a bs c
> mapNetwork2 sbs f = go sbs
>   where
>     go :: forall d es. KnownNat d => Sing es -> Network d es c -> Network d es c -> Network d es c
>     go = \case
>       SNil -> \case
>         NØ x -> \case
>           NØ y -> NØ (f x y)
>       SNat `SCons` ses -> \case
>         x :& xs -> \case
>           y :& ys -> f x y :& go ses xs ys
>
> instance (KnownNat a, SingI bs, KnownNat c) => Num (Network a bs c) where
>     (+)           = mapNetwork2 sing (+)
>     (-)           = mapNetwork2 sing (-)
>     (*)           = mapNetwork2 sing (*)
>     negate        = mapNetwork1 sing negate
>     abs           = mapNetwork1 sing abs
>     signum        = mapNetwork1 sing signum
>     fromInteger x = mapNetwork0 sing (fromInteger x)
>
> instance (KnownNat a, SingI bs, KnownNat c) => Fractional (Network a bs c) where
>     (/)            = mapNetwork2 sing (/)
>     recip          = mapNetwork1 sing recip
>     fromRational x = mapNetwork0 sing (fromRational x)
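These `Num` and `Fractional` instances are what let `train` express its
update step as ordinary arithmetic (`n - realToFrac r * g`). As one more
small illustration of the same idea (a sketch only, not part of the
compiled literate source; `avgNet` is a hypothetical helper), we could
average two networks of the same shape layer-wise:

    -- element-wise average of two networks of the same shape, using the
    -- Num and Fractional instances defined above
    avgNet
        :: (KnownNat a, SingI bs, KnownNat c)
        => Network a bs c
        -> Network a bs c
        -> Network a bs c
    avgNet n1 n2 = (n1 + n2) / 2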