backprop
========

[![Build Status](https://travis-ci.org/mstksg/backprop.svg?branch=master)](https://travis-ci.org/mstksg/backprop)

[**Literate Haskell Tutorial/Demo on MNIST data set**][mnist-lhs] (and [PDF
rendering][mnist-pdf])

Automatic *heterogeneous* back-propagation that can be used either
*implicitly* (in the style of the [ad][] library) or using *explicit* graphs
built in monadic style.  Implements reverse-mode automatic differentiation.
Differs from [ad][] by offering full heterogeneity -- each intermediate step
and the resulting value can have different types.  Mostly intended for use
with tensor manipulation libraries to implement automatic back-propagation
for gradient descent and other optimization techniques.

[ad]: http://hackage.haskell.org/package/ad

Documentation is currently rendered [on GitHub Pages][docs]!

[docs]: https://mstksg.github.io/backprop

MNIST Digit Classifier Example
------------------------------

Tutorial and example on training on the MNIST data set [available here as a
literate haskell file][mnist-lhs], or [rendered here as a PDF][mnist-pdf]!
**Read this first!**

[mnist-lhs]: https://github.com/mstksg/backprop/blob/master/samples/MNIST.lhs
[mnist-pdf]: https://github.com/mstksg/backprop/blob/master/renders/MNIST.pdf

Brief example
-------------

The quick example below defines a neural network with one hidden layer,
parameterized by two weight matrices and two bias vectors, and computes its
squared error with respect to a target `targ`.  Vector/matrix types are from
the *hmatrix* package.

~~~haskell
logistic :: Floating a => a -> a
logistic x = 1 / (1 + exp (-x))

matVec
    :: (KnownNat m, KnownNat n)
    => Op '[ L m n, R n ] (R m)

neuralNetImplicit
    :: (KnownNat m, KnownNat n, KnownNat o)
    => R m
    -> BPOpI s '[ L n m, R n, L o n, R o ] (R o)
neuralNetImplicit inp = \(w1 :< b1 :< w2 :< b2 :< Ø) ->
    let z = logistic (liftB2 matVec w1 x + b1)
    in  logistic (liftB2 matVec w2 z + b2)
  where
    x = constRef inp

neuralNetExplicit
    :: (KnownNat m, KnownNat n, KnownNat o)
    => R m
    -> BPOp s '[ L n m, R n, L o n, R o ] (R o)
neuralNetExplicit inp = withInps $ \(w1 :< b1 :< w2 :< b2 :< Ø) -> do
    y1 <- matVec ~$ (w1 :< x1 :< Ø)
    let x2 = logistic (y1 + b1)
    y2 <- matVec ~$ (w2 :< x2 :< Ø)
    return $ logistic (y2 + b2)
  where
    x1 = constVar inp
~~~

Now `neuralNetExplicit` and `neuralNetImplicit` can be "run" with the input
vectors and parameters (a `L n m`, `R n`, `L o n`, and `R o`) to calculate the
output of the neural net.

~~~haskell
runNet
    :: (KnownNat m, KnownNat n, KnownNat o)
    => R m
    -> Tuple '[ L n m, R n, L o n, R o ]
    -> R o
runNet inp = evalBPOp (neuralNetExplicit inp)
~~~

But, in defining the network, we also generated a graph that *backprop* can
use to do back-propagation, too!

~~~haskell
dot :: KnownNat n
    => Op '[ R n, R n ] Double

netGrad
    :: forall m n o. (KnownNat m, KnownNat n, KnownNat o)
    => R m
    -> R o
    -> Tuple '[ L n m, R n, L o n, R o ]
    -> Tuple '[ L n m, R n, L o n, R o ]
netGrad inp targ params = gradBPOp opError params
  where
    -- calculate squared error, in *explicit* style
    opError :: BPOp s '[ L n m, R n, L o n, R o ] Double
    opError = do
        res <- neuralNetExplicit inp
        err <- bindRef (res - t)
        dot ~$ (err :< err :< Ø)
      where
        t = constRef targ
~~~

The result is the gradient of the input tuple's components with respect to
the `Double` result of `opError` (the squared error).  We can then use this
gradient to do gradient descent.
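
To make that last step concrete, here is a hypothetical single step of
gradient descent written against the API shown above.  `stepNet`, its
learning-rate argument, and the `I x :< …` pattern used to unpack the
parameter `Tuple` (assuming `Tuple` is the `Prod I` type from
*type-combinators*) are illustrative assumptions for this sketch, not part of
the library's API; `konst` is the constant-fill function from *hmatrix*'s
static interface.

~~~haskell
-- Hypothetical sketch: one step of gradient descent on top of netGrad.
-- Assumes Tuple ~ Prod I (type-combinators), so components unpack as I x :< …,
-- and uses the elementwise Num instances of hmatrix's static types.
stepNet
    :: (KnownNat m, KnownNat n, KnownNat o)
    => Double                              -- ^ learning rate (assumed)
    -> R m                                 -- ^ input
    -> R o                                 -- ^ target
    -> Tuple '[ L n m, R n, L o n, R o ]   -- ^ current parameters
    -> Tuple '[ L n m, R n, L o n, R o ]   -- ^ updated parameters
stepNet rate inp targ params@(I w1 :< I b1 :< I w2 :< I b2 :< Ø) =
    case netGrad inp targ params of
      I gW1 :< I gB1 :< I gW2 :< I gB2 :< Ø ->
        -- move each parameter against its gradient, scaled by the rate
           I (w1 - konst rate * gW1)
        :< I (b1 - konst rate * gB1)
        :< I (w2 - konst rate * gW2)
        :< I (b2 - konst rate * gB2)
        :< Ø
~~~

Iterating a step like this over a stream of input/target pairs gives plain
stochastic gradient descent.
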
For a more fleshed-out example, see the [MNIST tutorial][mnist-lhs] (also
[rendered as a PDF][mnist-pdf]).

Todo
----

1.  Actual profiling and benchmarking, to gauge how much overhead this
    library adds over "manual" back-propagation.  Ideally this could be
    brought down to zero.

2.  Some simple performance and API tweaks that are probably possible now and
    would clearly benefit (if you want to contribute):

    a.  Providing optimized `Num`/`Fractional`/`Floating` instances for
        `BVal` by supplying known gradients directly instead of relying on
        *ad*.

    b.  Switching from `ST s` to `IO`, and using `unsafePerformIO` to
        automatically bind `BVal`s (like *ad* does) when using `liftB`.  This
        might remove some overhead during graph building and, from an API
        standpoint, remove the need for explicit binding.

    c.  Switching from `STRef`s/`IORef`s to `Array`.  (I'm unclear whether
        this one would help at all.)

3.  Benchmarking against competing back-propagation libraries like *ad*, and
    against auto-differentiating tensor libraries like *[grenade][]*.

    [grenade]: https://github.com/HuwCampbell/grenade

4.  Exploring opportunities for parallelization.  There are some naive ways
    of directly parallelizing right now, but the potential overhead should be
    investigated.

5.  Some open questions:

    a.  Is it possible to offer pattern matching on sum types/with different
        constructors for implicit-graph backprop?  It's already possible for
        explicit-graph versions with `choicesVar`, but not yet with the
        implicit-graph interface.  This could be similar to an "Applicative
        vs. Monad" issue, where only pre-determined, fixed computation paths
        are possible with `Applicative`, but I'm not sure.  Still, it would
        be nice: if this were possible, we could potentially do away with
        explicit-graph mode completely.

    b.  Though we already have sum type support in explicit-graph mode, we
        can't support GADTs yet.  It'd be nice to see if this is possible,
        because a lot of dependently typed neural network code is made much
        simpler with GADTs.