# backprop

**Literate Haskell Tutorial/Demo on the MNIST data set** (also available as a
rendered PDF)

Automatic *heterogeneous* back-propagation that can be used either *implicitly*
(in the style of the *ad* library) or using *explicit* graphs built in
monadic style. Implements reverse-mode automatic differentiation. Differs
from *ad* by offering full heterogeneity -- each intermediate step and the
resulting value can have different types. Intended mostly for use with
tensor manipulation libraries, to implement automatic back-propagation for
gradient descent and other optimization techniques.
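
As a small taste of the heterogeneity, here is a minimal sketch assuming the
`BPOp`/`withInps`/`~$` interface and the `dot` op from the examples below:
the input is a vector, but the result is a plain `Double`.

```
-- Heterogeneous types in one graph: the input is an 'R n' vector,
-- but the result (the squared norm of the input) is a plain 'Double'.
squaredNorm
    :: KnownNat n
    => BPOp s '[ R n ] Double
squaredNorm = withInps $ \(x :< Ø) ->
    dot ~$ (x :< x :< Ø)
```

The gradient of the result with respect to the input vector then comes for
free, via `gradBPOp` (demonstrated below).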

Documentation is currently rendered on GitHub Pages!

## MNIST Digit Classifier Example

A tutorial and example of training on the MNIST data set is available as a
literate Haskell file, and is also rendered as a PDF.
**Read this first!**

## Brief example

The quick example below runs a neural network with one hidden layer and
computes its squared error with respect to a target `targ`. The network is
parameterized by two weight matrices and two bias vectors; vector/matrix
types are from the *hmatrix* package.

```
logistic :: Floating a => a -> a
logistic x = 1 / (1 + exp (-x))

-- an Op multiplying an m-by-n matrix with an n-vector
matVec
    :: (KnownNat m, KnownNat n)
    => Op '[ L m n, R n ] (R m)

-- the network in implicit style: just write the function on the
-- inputs, lifting ops with liftB2
neuralNetImplicit
    :: (KnownNat m, KnownNat n, KnownNat o)
    => R m
    -> BPOpI s '[ L n m, R n, L o n, R o ] (R o)
neuralNetImplicit inp = \(w1 :< b1 :< w2 :< b2 :< Ø) ->
    let z = logistic (liftB2 matVec w1 x + b1)
    in  logistic (liftB2 matVec w2 z + b2)
  where
    x = constRef inp

-- the same network in explicit style: build the graph in a monad,
-- binding intermediate results with ~$
neuralNetExplicit
    :: (KnownNat m, KnownNat n, KnownNat o)
    => R m
    -> BPOp s '[ L n m, R n, L o n, R o ] (R o)
neuralNetExplicit inp = withInps $ \(w1 :< b1 :< w2 :< b2 :< Ø) -> do
    y1 <- matVec ~$ (w1 :< x1 :< Ø)
    let x2 = logistic (y1 + b1)
    y2 <- matVec ~$ (w2 :< x2 :< Ø)
    return $ logistic (y2 + b2)
  where
    x1 = constVar inp
```

Now `neuralNetExplicit` and `neuralNetImplicit` can be "run" with an input
vector and the parameters (an `L n m`, an `R n`, an `L o n`, and an `R o`)
to calculate the output of the neural net.

```
runNet
    :: (KnownNat m, KnownNat n, KnownNat o)
    => R m
    -> Tuple '[ L n m, R n, L o n, R o ]
    -> R o
runNet inp = evalBPOp (neuralNetExplicit inp)
```
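
Here is a hypothetical invocation at concrete sizes (a 4-3-2 network). The
all-constant placeholder parameters are built with `konst` from *hmatrix*'s
static interface, and the tuple is assembled with the `I` identity-functor
wrapper that `Tuple` is assumed to be a product over; real code would
initialize the weights randomly.

```
testOut :: R 2
testOut = runNet (konst 0.5 :: R 4) params
  where
    -- all-constant placeholder parameters for a 4-3-2 network
    params :: Tuple '[ L 3 4, R 3, L 2 3, R 2 ]
    params = I (konst 0.1)
          :< I (konst 0)
          :< I (konst 0.1)
          :< I (konst 0)
          :< Ø
```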

But, in defining `neuralNetExplicit`, we also generated a graph that
*backprop* can use to do back-propagation, too!

```
-- an Op computing the dot product of two n-vectors
dot :: KnownNat n
    => Op '[ R n , R n ] Double

netGrad
    :: forall m n o. (KnownNat m, KnownNat n, KnownNat o)
    => R m
    -> R o
    -> Tuple '[ L n m, R n, L o n, R o ]
    -> Tuple '[ L n m, R n, L o n, R o ]
netGrad inp targ params = gradBPOp opError params
  where
    -- calculate squared error, in *explicit* style
    opError :: BPOp s '[ L n m, R n, L o n, R o ] Double
    opError = do
        res <- neuralNetExplicit inp
        err <- bindRef (res - t)
        dot ~$ (err :< err :< Ø)
      where
        t = constRef targ
```

The result is the gradient of the input tuple's components with respect to
the `Double` result of `opError` (the squared error). We can then use this
gradient to do gradient descent.
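
For example, a single step of gradient descent could look like the
hypothetical helper below. `stepNet` is not part of the library: it assumes
`Tuple` is a product over the identity functor `I`, and it scales the
gradient by a learning rate using `konst` and the elementwise `Num`
instances from *hmatrix*.

```
-- Hypothetical gradient descent step built on top of netGrad
stepNet
    :: (KnownNat m, KnownNat n, KnownNat o)
    => Double                             -- learning rate
    -> R m                                -- input
    -> R o                                -- target
    -> Tuple '[ L n m, R n, L o n, R o ]  -- current parameters
    -> Tuple '[ L n m, R n, L o n, R o ]  -- updated parameters
stepNet rate inp targ params@(I w1 :< I b1 :< I w2 :< I b2 :< Ø) =
    case netGrad inp targ params of
      I gW1 :< I gB1 :< I gW2 :< I gB2 :< Ø ->
           I (w1 - konst rate * gW1)
        :< I (b1 - konst rate * gB1)
        :< I (w2 - konst rate * gW2)
        :< I (b2 - konst rate * gB2)
        :< Ø
```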

For a more fleshed-out example, see the MNIST tutorial (also rendered as a
PDF).

## Todo

Actual profiling and benchmarking, to gauge how much overhead this library
adds over "manual" back-propagation. Ideally this overhead could be brought
down to zero.

Some simple performance and API tweaks are probably possible now, and would
be clear improvements (if you want to contribute):

a. Providing optimized `Num`/`Fractional`/`Floating` instances for `BVal`
   by supplying known gradients directly instead of relying on *ad*. (See
   the sketch after this list for an example of such a known gradient.)

b. Switching from `ST s` to `IO`, and using `unsafePerformIO` to
   automatically bind `BVal`s (like *ad* does) when using `liftB`. This
   might remove some overhead during graph building, and, from an API
   standpoint, remove the need for explicit binding.

c. Switching from `STRef`s/`IORef`s to `Array`. (I'm unsure whether this
   one would help at all.)
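
To illustrate (a): the `logistic` function from the example above has a
simple closed-form derivative that an optimized instance could supply
directly, instead of having *ad* re-derive it. (A standalone sketch, not
tied to any particular `Op` constructor.)

```
-- Closed-form derivative of the logistic function from the example
-- above: logistic' x = logistic x * (1 - logistic x)
logisticGrad :: Floating a => a -> a
logisticGrad x = z * (1 - z)
  where
    z = logistic x
```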

Benchmark against competing back-propagation libraries like *ad*, and
against auto-differentiating tensor libraries like *grenade*.

Explore opportunities for parallelization. There are some naive ways of
directly parallelizing right now, but potential overhead should be
investigated.

Some open questions:

a. Is it possible to offer pattern matching on sum types/different
   constructors for implicit-graph backprop? It's already possible in
   explicit-graph mode with `choicesVar`, but not yet with the
   implicit-graph interface. This could be similar to the "Applicative vs.
   Monad" issue, where only pre-determined, fixed computation paths are
   possible with `Applicative`, but I'm not sure. Still, it would be nice:
   if this were possible, we could potentially do away with explicit-graph
   mode completely.

b. Though we already have sum-type support in explicit-graph mode, we
   can't support GADTs yet. It'd be nice to see if this is possible,
   because a lot of dependently typed neural network code is made much
   simpler with GADTs; see the sketch below for the kind of type in
   question.
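
As a rough illustration of (b), here is a hypothetical size-indexed network
GADT in the spirit of the MNIST tutorial. Pattern matching on its
constructors inside a backprop graph is exactly what isn't supported yet.

```
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}

import GHC.TypeLits (Nat)
import Numeric.LinearAlgebra.Static (L, R)

-- A network indexed by its input size, hidden layer sizes, and output size.
data Network :: Nat -> [Nat] -> Nat -> * where
    -- a single layer going straight to the output
    NetOut  :: !(L o i) -> !(R o) -> Network i '[] o
    -- prepend a hidden layer of size h
    NetStep :: !(L h i) -> !(R h) -> !(Network h hs o) -> Network i (h ': hs) o
```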