---
author:
- Justin Le
fontfamily: 'palatino,cmtt'
geometry: margin=1in
links-as-notes: true
title: Neural networks with backprop library
---

The *backprop* library performs back-propagation over a *heterogeneous*
system of relationships. It offers both an implicit ([ad]-like) and an
explicit graph-building usage style. Let’s use it to build neural
networks!

[ad]: http://hackage.haskell.org/package/ad

Repository source is [on github], and so are the [rendered unstable
docs].

[on github]: https://github.com/mstksg/backprop
[rendered unstable docs]: https://mstksg.github.io/backprop

``` {.sourceCode .literate .haskell}
{-# LANGUAGE DeriveGeneric       #-}
{-# LANGUAGE GADTs               #-}
{-# LANGUAGE LambdaCase          #-}
{-# LANGUAGE RankNTypes          #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE StandaloneDeriving  #-}
{-# LANGUAGE TypeApplications    #-}
{-# LANGUAGE TypeInType          #-}
{-# LANGUAGE TypeOperators       #-}
{-# LANGUAGE ViewPatterns        #-}
{-# OPTIONS_GHC -fno-warn-orphans #-}
{-# OPTIONS_GHC -fno-warn-unused-top-binds #-}

import           Data.Functor
import           Data.Kind
import           Data.Maybe
import           Data.Singletons
import           Data.Singletons.Prelude
import           Data.Singletons.TypeLits
import           Data.Type.Combinator
import           Data.Type.Product
import           GHC.Generics                 (Generic)
import           Numeric.Backprop
import           Numeric.Backprop.Iso
import           Numeric.LinearAlgebra.Static hiding (dot)
import           System.Random.MWC
import qualified Generics.SOP                 as SOP
```

Ops
===

First, we define values of `Op` for the operations we want to do. `Op`s
are bundles of functions packaged with their heterogeneous gradients.
For simple numeric functions, *backprop* can derive `Op`s
automatically, but for matrix operations we have to define them
ourselves. The types help us match up the dimensions, but we still need
to be careful that our gradients are calculated correctly.

`L` and `R` are matrix and vector types from the great *hmatrix*
library.

First, matrix-vector multiplication:

``` {.sourceCode .literate .haskell}
matVec
    :: (KnownNat m, KnownNat n)
    => Op '[ L m n, R n ] (R m)
matVec = op2' $ \m v ->
    ( m #> v
    , \(fromMaybe 1 -> g) ->
        (g `outer` v, tr m #> g)
    )
```

Now, dot products:

``` {.sourceCode .literate .haskell}
dot :: KnownNat n
    => Op '[ R n, R n ] Double
dot = op2' $ \x y ->
    ( x <.> y
    , \case Nothing -> (y, x)
            Just g  -> (konst g * y, x * konst g)
    )
```

Polymorphic functions can easily be turned into `Op`s with `op1`,
`op2`, etc., but they can also be run directly on graph nodes.

``` {.sourceCode .literate .haskell}
logistic :: Floating a => a -> a
logistic x = 1 / (1 + exp (-x))
```

A Simple Complete Example
=========================

At this point, we already have enough to train a simple
single-hidden-layer neural network:

``` {.sourceCode .literate .haskell}
simpleOp
    :: (KnownNat m, KnownNat n, KnownNat o)
    => R m
    -> BPOpI s '[ L n m, R n, L o n, R o ] (R o)
simpleOp inp = \(w1 :< b1 :< w2 :< b2 :< Ø) ->
    let z = logistic $ liftB2 matVec w1 x + b1
    in  logistic $ liftB2 matVec w2 z + b2
  where
    x = constVar inp
```

Here, `simpleOp` is defined in implicit (non-monadic) style: it is
given a tuple of inputs and returns an output. Now `simpleOp` can be
“run” with the input vector and parameters (an `L n m`, `R n`, `L o n`,
and `R o`) to calculate the output of the neural net.

``` {.sourceCode .literate .haskell}
runSimple
    :: (KnownNat m, KnownNat n, KnownNat o)
    => R m
    -> Tuple '[ L n m, R n, L o n, R o ]
    -> R o
runSimple inp = evalBPOp (implicitly $ simpleOp inp)
```
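
For a quick sanity check that the types line up, here is a small sketch
(not from the original write-up; the `testRun` name is made up) that
evaluates a 4-3-1 network on an all-ones input with all-zero
parameters. `konst` is *hmatrix*’s constant vector/matrix builder, and
`(::<)`/`Ø` build the parameter tuple (they are explained further
below).

``` {.sourceCode .haskell}
-- A hypothetical sanity check: with all-zero parameters, every unit
-- outputs logistic 0 = 0.5, so the result is the constant 0.5 vector.
testRun :: R 1
testRun = runSimple (konst 1 :: R 4)
    (   (konst 0 :: L 3 4)
    ::< (konst 0 :: R 3)
    ::< (konst 0 :: L 1 3)
    ::< (konst 0 :: R 1)
    ::< Ø
    )
```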

Alternatively, we can define `simpleOp` in explicit monadic style,
where we specify our graph nodes explicitly. The results should be the
same.

``` {.sourceCode .literate .haskell}
simpleOpExplicit
    :: (KnownNat m, KnownNat n, KnownNat o)
    => R m
    -> BPOp s '[ L n m, R n, L o n, R o ] (R o)
simpleOpExplicit inp = withInps $ \(w1 :< b1 :< w2 :< b2 :< Ø) -> do
    -- First layer
    y1 <- matVec ~$ (w1 :< x1 :< Ø)
    let x2 = logistic (y1 + b1)
    -- Second layer
    y2 <- matVec ~$ (w2 :< x2 :< Ø)
    return $ logistic (y2 + b2)
  where
    x1 = constVar inp
```

Now, for the magic of *backprop*: the library can take advantage of the
implicit (or explicit) graph and use it to do back-propagation, too!

``` {.sourceCode .literate .haskell}
simpleGrad
    :: forall m n o. (KnownNat m, KnownNat n, KnownNat o)
    => R m
    -> R o
    -> Tuple '[ L n m, R n, L o n, R o ]
    -> Tuple '[ L n m, R n, L o n, R o ]
simpleGrad inp targ params = gradBPOp opError params
  where
    opError :: BPOp s '[ L n m, R n, L o n, R o ] Double
    opError = do
        res <- implicitly $ simpleOp inp
        -- we explicitly bind err to prevent recomputation
        err <- bindVar $ res - t
        dot ~$ (err :< err :< Ø)
      where
        t = constVar targ
```

The result is the gradient of the input tuple’s components, with
respect to the `Double` result of `opError` (the squared error). We can
then use this gradient to do gradient descent.

With Parameter Containers
=========================

This method doesn’t quite scale, because we might want to build
networks with many layers and parameterize networks by their layers.
Let’s make some basic container data types to help us organize our
types, including a recursive `Network` type that lets us chain multiple
layers.

``` {.sourceCode .literate .haskell}
data Layer :: Nat -> Nat -> Type where
    Layer :: { _lWeights :: L m n
             , _lBiases  :: R m
             }
          -> Layer n m
  deriving (Show, Generic)

data Network :: Nat -> [Nat] -> Nat -> Type where
    NØ   :: !(Layer a b) -> Network a '[] b
    (:&) :: !(Layer a b) -> Network b bs c -> Network a (b ': bs) c
```

A `Layer n m` is a layer taking an `n`-vector and returning an
`m`-vector. A `Network a '[b, c, d] e` is a network that takes in an
`a`-vector and outputs an `e`-vector, with hidden layers of sizes `b`,
`c`, and `d`.

Isomorphisms
------------

The *backprop* library lets you apply operations to “parts” of data
types (like the weights and biases of a `Layer`) by using `Iso`s
(isomorphisms), like the ones from the *lens* library. The library
doesn’t depend on *lens*, but it can use `Iso`s from that library as
well as custom-defined ones.

First, we can auto-generate isomorphisms using the *generics-sop*
library:

``` {.sourceCode .literate .haskell}
instance SOP.Generic (Layer n m)
```

And then we can create isomorphisms by hand for the two `Network`
constructors:

``` {.sourceCode .literate .haskell}
netExternal :: Iso' (Network a '[] b) (Tuple '[Layer a b])
netExternal = iso (\case NØ x     -> x ::< Ø)
                  (\case I x :< Ø -> NØ x   )

netInternal :: Iso' (Network a (b ': bs) c) (Tuple '[Layer a b, Network b bs c])
netInternal = iso (\case x :& xs          -> x ::< xs ::< Ø)
                  (\case I x :< I xs :< Ø -> x :& xs       )
```

An `Iso' a (Tuple as)` means that an `a` can really just be seen as a
tuple of `as`.

Running a network
=================

Now we can write the `BPOp` that represents running the network and
getting a result. We pass in a `Sing bs` (a singleton list of the
hidden layer sizes) so that we can “pattern match” on the list and
handle the different network constructors differently.

``` {.sourceCode .literate .haskell}
netOp
    :: forall s a bs c. (KnownNat a, KnownNat c)
    => Sing bs
    -> BPOp s '[ R a, Network a bs c ] (R c)
netOp sbs = go sbs
  where
    go :: forall d es.
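          -- 'go' peels hidden-layer sizes off the singleton list, building
          -- the BPOp that runs the remainder of the network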
          KnownNat d
       => Sing es
       -> BPOp s '[ R d, Network d es c ] (R c)
    go = \case
      SNil -> withInps $ \(x :< n :< Ø) -> do
        -- peek into the NØ using netExternal iso
        l :< Ø <- netExternal #<~ n
        -- run the 'layerOp' BP, with x and l as inputs
        bpOp layerOp ~$ (x :< l :< Ø)
      SNat `SCons` ses -> withInps $ \(x :< n :< Ø) -> withSingI ses $ do
        -- peek into the (:&) using the netInternal iso
        l :< n' :< Ø <- netInternal #<~ n
        -- run the 'layerOp' BP, with x and l as inputs
        z <- bpOp layerOp ~$ (x :< l :< Ø)
        -- run the 'go ses' BP, with z and n as inputs
        bpOp (go ses) ~$ (z :< n' :< Ø)
    layerOp
        :: forall d e. (KnownNat d, KnownNat e)
        => BPOp s '[ R d, Layer d e ] (R e)
    layerOp = withInps $ \(x :< l :< Ø) -> do
        -- peek into the layer using the gTuple iso,
        -- auto-generated with SOP.Generic
        w :< b :< Ø <- gTuple #<~ l
        y <- matVec ~$ (w :< x :< Ø)
        return $ logistic (y + b)
```

There’s some singletons work going on here, but it’s fairly standard
singletons stuff. Most of the complexity here is from the static typing
in our neural network type, and *not* from *backprop*. From *backprop*
specifically, the only new elements are `#<~`, which lets you “split”
an input ref with the given iso, and `bpOp`, which converts a `BPOp`
into an `Op` that you can bind with `~$`.

Note that this library doesn’t support truly pattern matching on GADTs,
which is why we had to pass in `Sing bs` as a reference to the
structure of our network.

Gradient Descent
----------------

Now we can do simple gradient descent. Defining an error function:

``` {.sourceCode .literate .haskell}
errOp
    :: KnownNat m
    => R m
    -> BVar s rs (R m)
    -> BPOp s rs Double
errOp targ r = do
    err <- bindVar $ r - t
    dot ~$ (err :< err :< Ø)
  where
    t = constVar targ
```

And now we can use `backprop` to generate the gradient and shift the
`Network`! Things are made a bit cleaner by the fact that
`Network a bs c` has a `Num` instance, so we can use `(-)`, `(*)`, etc.

``` {.sourceCode .literate .haskell}
train
    :: (KnownNat a, SingI bs, KnownNat c)
    => Double
    -> R a
    -> R c
    -> Network a bs c
    -> Network a bs c
train r x t n = case backprop (errOp t =<< netOp sing) (x ::< n ::< Ø) of
    (_, _ :< I g :< Ø) -> n - (realToFrac r * g)
```

(`(::<)` is cons and `Ø` is nil for tuples.)

Main
====

`main`, which will train on sample data sets, is still in progress!
Right now it just generates a random network using the *mwc-random*
library and prints each internal layer.

``` {.sourceCode .literate .haskell}
main :: IO ()
main = withSystemRandom $ \g -> do
    n <- uniform @(Network 4 '[3,2] 1) g
    void $ traverseNetwork sing (\l -> l <$ print l) n
```

Appendix: Boilerplate
=====================

And now, some typeclass instances and boilerplate unrelated to the
*backprop* library that make our custom types easier to use.

``` {.sourceCode .literate .haskell}
instance KnownNat n => Variate (R n) where
    uniform g = randomVector <$> uniform g <*> pure Uniform
    uniformR (l, h) g = (\x -> x * (h - l) + l) <$> uniform g

instance (KnownNat m, KnownNat n) => Variate (L m n) where
    uniform g = uniformSample <$> uniform g <*> pure 0 <*> pure 1
    uniformR (l, h) g = (\x -> x * (h - l) + l) <$> uniform g

instance (KnownNat n, KnownNat m) => Variate (Layer n m) where
    uniform g = subtract 1 . (* 2)
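                    -- scale the Uniform [0,1) weights and biases into [-1,1)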
                    <$> (Layer <$> uniform g <*> uniform g)
    uniformR (l, h) g = (\x -> x * (h - l) + l) <$> uniform g

instance (KnownNat m, KnownNat n) => Num (Layer n m) where
    Layer w1 b1 + Layer w2 b2 = Layer (w1 + w2) (b1 + b2)
    Layer w1 b1 - Layer w2 b2 = Layer (w1 - w2) (b1 - b2)
    Layer w1 b1 * Layer w2 b2 = Layer (w1 * w2) (b1 * b2)
    abs    (Layer w b) = Layer (abs w)    (abs b)
    signum (Layer w b) = Layer (signum w) (signum b)
    negate (Layer w b) = Layer (negate w) (negate b)
    fromInteger x = Layer (fromInteger x) (fromInteger x)

instance (KnownNat m, KnownNat n) => Fractional (Layer n m) where
    Layer w1 b1 / Layer w2 b2 = Layer (w1 / w2) (b1 / b2)
    recip (Layer w b) = Layer (recip w) (recip b)
    fromRational x = Layer (fromRational x) (fromRational x)

instance (KnownNat a, SingI bs, KnownNat c) => Variate (Network a bs c) where
    uniform g = genNet sing (uniform g)
    uniformR (l, h) g = (\x -> x * (h - l) + l) <$> uniform g

genNet
    :: forall f a bs c. (Applicative f, KnownNat a, KnownNat c)
    => Sing bs
    -> (forall d e. (KnownNat d, KnownNat e) => f (Layer d e))
    -> f (Network a bs c)
genNet sbs f = go sbs
  where
    go :: forall d es. KnownNat d => Sing es -> f (Network d es c)
    go = \case
      SNil             -> NØ <$> f
      SNat `SCons` ses -> (:&) <$> f <*> go ses

mapNetwork0
    :: forall a bs c. (KnownNat a, KnownNat c)
    => Sing bs
    -> (forall d e. (KnownNat d, KnownNat e) => Layer d e)
    -> Network a bs c
mapNetwork0 sbs f = getI $ genNet sbs (I f)

traverseNetwork
    :: forall a bs c f. (KnownNat a, KnownNat c, Applicative f)
    => Sing bs
    -> (forall d e. (KnownNat d, KnownNat e) => Layer d e -> f (Layer d e))
    -> Network a bs c
    -> f (Network a bs c)
traverseNetwork sbs f = go sbs
  where
    go :: forall d es. KnownNat d => Sing es -> Network d es c -> f (Network d es c)
    go = \case
      SNil -> \case
        NØ x -> NØ <$> f x
      SNat `SCons` ses -> \case
        x :& xs -> (:&) <$> f x <*> go ses xs

mapNetwork1
    :: forall a bs c. (KnownNat a, KnownNat c)
    => Sing bs
    -> (forall d e. (KnownNat d, KnownNat e) => Layer d e -> Layer d e)
    -> Network a bs c
    -> Network a bs c
mapNetwork1 sbs f = getI . traverseNetwork sbs (I . f)

mapNetwork2
    :: forall a bs c. (KnownNat a, KnownNat c)
    => Sing bs
    -> (forall d e. (KnownNat d, KnownNat e) => Layer d e -> Layer d e -> Layer d e)
    -> Network a bs c
    -> Network a bs c
    -> Network a bs c
mapNetwork2 sbs f = go sbs
  where
    go :: forall d es. KnownNat d
       => Sing es
       -> Network d es c
       -> Network d es c
       -> Network d es c
    go = \case
      SNil -> \case
        NØ x -> \case
          NØ y -> NØ (f x y)
      SNat `SCons` ses -> \case
        x :& xs -> \case
          y :& ys -> f x y :& go ses xs ys

instance (KnownNat a, SingI bs, KnownNat c) => Num (Network a bs c) where
    (+)           = mapNetwork2 sing (+)
    (-)           = mapNetwork2 sing (-)
    (*)           = mapNetwork2 sing (*)
    negate        = mapNetwork1 sing negate
    abs           = mapNetwork1 sing abs
    signum        = mapNetwork1 sing signum
    fromInteger x = mapNetwork0 sing (fromInteger x)

instance (KnownNat a, SingI bs, KnownNat c) => Fractional (Network a bs c) where
    (/)            = mapNetwork2 sing (/)
    recip          = mapNetwork1 sing recip
    fromRational x = mapNetwork0 sing (fromRational x)
```
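
As a final, illustrative aside (not part of the original write-up): the
`train` function from the Gradient Descent section composes directly
into a training loop. Here is a minimal sketch, assuming the
definitions above are in scope; the `trainList` helper and its name are
hypothetical.

``` {.sourceCode .haskell}
-- Hypothetical helper: one pass of gradient descent over a list of
-- (input, target) pairs, threading the updated network through a fold.
trainList
    :: (KnownNat a, SingI bs, KnownNat c)
    => Double               -- learning rate
    -> [(R a, R c)]         -- training samples
    -> Network a bs c
    -> Network a bs c
trainList r samples n0 = foldl (\n (x, t) -> train r x t n) n0 samples
```

In practice you would probably want a strict fold (`foldl'` from
`Data.List`) and to shuffle the samples, but this shows how the
`Num`-based update in `train` threads through a data set.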