backprop
Literate Haskell Tutorial/Demo on MNIST data set (and PDF
rendering)
Automatic heterogeneous backpropagation that can be used either implicitly
(in the style of the ad library) or using explicit graphs built in
monadic style. Implements reversemode automatic differentiation. Differs
from ad by offering full heterogeneity  each intermediate step and the
resulting value can have different types. Mostly intended for usage with
tensor manipulation libraries to implement automatic backpropagation for
gradient descent and other optimization techniques.
Documentation is currently rendered on github pages!
MNIST Digit Classifier Example
Tutorial and example on training on the MNIST data set available here as a
literate haskell file, or rendered here as a PDF!
Read this first!
Brief example
The quick example below describes the running of a neural network with one
hidden layer to calculate its squared error with respect to target targ
,
which is parameterized by two weight matrices and two bias vectors.
Vector/matrix types are from the hmatrix package.
logistic :: Floating a => a > a
logistic x = 1 / (1 + exp (x))
matVec
:: (KnownNat m, KnownNat n)
=> Op '[ L m n, R n ] (R m)
neuralNetImplicit
:: (KnownNat m, KnownNat n, KnownNat o)
=> R m
> BPOpI s '[ L n m, R n, L o n, R o ] (R o)
neuralNetImplicit inp = \(w1 :< b1 :< w2 :< b2 :< Ø) >
let z = logistic (liftB2 matVec w1 x + b1)
in logistic (liftB2 matVec w2 z + b2)
where
x = constRef inp
neuralNetExplicit
:: (KnownNat m, KnownNat n, KnownNat o)
=> R m
> BPOp s '[ L n m, R n, L o n, R o ] (R o)
neuralNetExplicit inp = withInps $ \(w1 :< b1 :< w2 :< b2 :< Ø) > do
y1 < matVec ~$ (w1 :< x1 :< Ø)
let x2 = logistic (y1 + b1)
y2 < matVec ~$ (w2 :< x2 :< Ø)
return $ logistic (y2 + b2)
where
x1 = constVar inp
Now neuralNetExplicit
and neuralNetImplicit
can be "run" with the input
vectors and parameters (a L n m
, R n
, L o n
, and R o
) and calculate the
output of the neural net.
runNet
:: (KnownNat m, KnownNat n, KnownNat o)
=> R m
> Tuple '[ L n m, R n, L o n, R o ]
> R o
runNet inp = evalBPOp (neuralNetExplicit inp)
But, in defining neuralNet
, we also generated a graph that backprop can
use to do backpropagation, too!
dot :: KnownNat n
=> Op '[ R n , R n ] Double
netGrad
:: forall m n o. (KnownNat m, KnownNat n, KnownNat o)
=> R m
> R o
> Tuple '[ L n m, R n, L o n, R o ]
> Tuple '[ L n m, R n, L o n, R o ]
netGrad inp targ params = gradBPOp opError params
where
 calculate squared error, in *explicit* style
opError :: BPOp s '[ L n m, R n, L o n, R o ] Double
opError = do
res < neuralNetExplicit inp
err < bindRef (res  t)
dot ~$ (err :< err :< Ø)
where
t = constRef targ
The result is the gradient of the input tuple's components, with respect
to the Double
result of opError
(the squared error). We can then use
this gradient to do gradient descent.
For a more fleshed out example, see the MNIST tutorial (also
rendered as a pdf)
Todo

Actual profiling and benchmarking, to gauge how much overhead this library
adds over "manual" backpropagation.
Ideally this can be brought down to 0?

Some simple performance and API tweaks that are probably possible now and
would clearly benefit: (if you want to contribute)
a. Providing optimized Num
/Fractional
/Floating
instances for BVal
by supplying known gradients directly instead of relying on ad.
b. Switch from ST s
to IO
, and use unsafePerformIO
to automatically
bind BVal
s (like ad does) when using liftB
. This might remove
some overhead during graph building, and, from an API standpoint,
remove the need for explicit binding.
c. Switch from STRef
s/IORef
s to Array
. (This one I'm unclear if it
would help any)

Benchmark against competing backpropagation libraries like ad, and
autodifferentiating tensor libraries like grenade

Explore opportunities for parallelization. There are some naive ways of
directly parallelizing right now, but potential overhead should be
investigated.

Some open questions:
a. Is it possible to offer pattern matching on sum types/with different
constructors for implicitgraph backprop? It's possible for
explicitgraph versions already, with choicesVar
, but not yet with
the implicitgraph interface. Could be similar to an "Applicative vs.
Monad" issue where you can only have predetermined fixed computation
paths when using Applicative
, but I'm not sure. Still, it would be
nice, because if this was possible, we could possibly do away with
explicitgraph mode completely.
b. Though we already have sum type support with explicitgraph mode, we
can't support GADTs yet. It'd be nice to see if this is possible,
because a lot of dependently typed neural network stuff is made much
simpler with GADTs.