This library provides definitions and algorithms for various graphical models such as mixture models, Kalman filters, and restricted Boltzmann machines, as well as algorithms for fitting them, e.g. expectation maximization and contrastive divergence minimization. Underlying all of these models is a generalized linear object known as a Harmonium, which I will briefly introduce in what follows.

The core definition of this library is a `Manifold` of joint distributions that I call an `AffineHarmonium`
```haskell
newtype AffineHarmonium f y x z w = AffineHarmonium (Affine f y z x, w)
```
which is a product `Manifold` composed of a `Manifold` of likelihood functions `Affine f y z x`, and a `Manifold` of distributions `w` that partially defines the space of priors. `AffineHarmonium`s provide a bit more flexibility than what I call a `Harmonium`
```haskell
type Harmonium f z w = AffineHarmonium f z w z w
```
which is a simpler object. Nevertheless, from a theoretical point of view, an `AffineHarmonium` is a special case of a `Harmonium`, and we may think of them more or less equivalently.

A `Harmonium` is a model over observable variables and latent variables, and represents a sort of generalized linear joint distribution over the two of them. The theory of `Harmonium`s is well summarized by [this paper](https://papers.nips.cc/paper/2004/hash/0e900ad84f63618452210ab8baae0218-Abstract.html) and [this paper](https://proceedings.neurips.cc/paper/2013/hash/28f0b864598a1291557bed248a998d4e-Abstract.html). Although `Harmonium`s might seem like a little-studied and esoteric object, various well-known models, such as mixture models and restricted Boltzmann machines, are in fact `Harmonium`s, and various other models, such as factor analysis, Kalman filters, or hidden Markov models, can be expressed in terms of them.

All of the aforementioned models can be fit with expectation-maximization (EM), and EM can be expressed in an entirely general manner for `Harmonium`s. First, the expectation step of a `Harmonium` is implemented by
```haskell
expectationStep
    :: ( ExponentialFamily z, Map Natural f x y, Bilinear f y x
       , Translation z y, Translation w x, LegendreExponentialFamily w )
    => Sample z -- ^ Model Samples
    -> Natural # AffineHarmonium f y x z w -- ^ Harmonium
    -> Mean # AffineHarmonium f y x z w -- ^ Harmonium expected sufficient statistics
expectationStep zs hrm =
    let mzs = sufficientStatistic <$> zs -- sufficient statistics of the observations
        mys = anchor <$> mzs -- their components that interact with the latent variables
        pstr = fst . split $ transposeHarmonium hrm -- affine map from observable to latent variables
        mws = transition <$> pstr >$> mys -- latent predictions, moved from Natural to Mean coordinates
        mxs = anchor <$> mws -- their components that interact with the observable variables
        myx = (>$<) mys mxs -- interaction statistics between observable and latent components
     in joinHarmonium (average mzs) myx $ average mws -- assemble the Mean statistics of the joint
```
In summary, what we do is
- take some observations,
- compute their sufficient statistics with `sufficientStatistic`,
- map these statistics into predictions of the latent variables,
- transition these latent predictions from `Natural` coordinates to `Mean` coordinates,
- and assemble the results into the `Mean` sufficient statistics of the joint distribution.
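To make these steps concrete, here is a minimal, self-contained sketch of the same expectation step for a univariate Gaussian mixture, written in plain Haskell without the library; all names here (`Component`, `responsibilities`, `eStep`) are illustrative and not part of this package. Each observation is mapped to its component responsibilities (the latent predictions), and the results are averaged into expected sufficient statistics.
```haskell
import Data.List (transpose)

-- Component parameters of a univariate Gaussian mixture: (weight, mean, variance).
type Component = (Double, Double, Double)

-- Gaussian density.
density :: Double -> Double -> Double -> Double
density mu vr x =
    let dx = x - mu
     in exp (negate (dx * dx) / (2 * vr)) / sqrt (2 * pi * vr)

-- Posterior probability of each component given an observation (the "latent prediction").
responsibilities :: [Component] -> Double -> [Double]
responsibilities cmps x =
    let ws = [ wght * density mu vr x | (wght, mu, vr) <- cmps ]
     in map (/ sum ws) ws

-- The expectation step: average the statistics over the sample. For each component
-- we collect the average responsibility and the responsibility-weighted moments.
eStep :: [Component] -> [Double] -> [(Double, Double, Double)]
eStep cmps xs =
    let rss = transpose $ map (responsibilities cmps) xs -- one row per component
        avg ys = sum ys / fromIntegral (length ys)
     in [ ( avg rs                                    -- average responsibility
          , avg (zipWith (*) rs xs)                   -- weighted first moment
          , avg (zipWith (\r x -> r * x * x) rs xs) ) -- weighted second moment
        | rs <- rss ]

main :: IO ()
main = mapM_ print $ eStep [(0.5, -1, 1), (0.5, 1, 1)] [-1.2, -0.8, 0.9, 1.1]
```
The triples returned by `eStep` are exactly the quantities that the maximization step of classical mixture-model EM re-normalizes into new component weights, means, and variances.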
The maximization step then consists simply of mapping the whole joint distribution from `Mean` back to `Natural` coordinates, such that `expectationMaximization` may be expressed as
```haskell
expectationMaximization
    :: ( DuallyFlatExponentialFamily (AffineHarmonium f y x z w)
       , ExponentialFamily z, Map Natural f x y, Bilinear f y x
       , Translation z y, Translation w x, LegendreExponentialFamily w )
    => Sample z
    -> Natural # AffineHarmonium f y x z w
    -> Natural # AffineHarmonium f y x z w
expectationMaximization zs hrm = transition $ expectationStep zs hrm
```
As such, for a wide variety of models, we may reduce implementing expectation maximization to instantiating the class requirements of the `expectationMaximization` function. This is rarely trivial, but in some sense it is much more straightforward and well-defined than deriving EM algorithms from scratch. For in-depth tutorials visit my [blog](https://sacha-sokoloski.gitlab.io/website/pages/blog.html).
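As a usage sketch (assuming the relevant Goal modules are in scope; `emSequence`, `zs`, and `hrm0` are illustrative names rather than part of the library), fitting a model then amounts to iterating `expectationMaximization` on a fixed sample:
```haskell
-- A minimal usage sketch: EM is just iterated application of
-- `expectationMaximization` to the same sample.
emSequence
    :: ( DuallyFlatExponentialFamily (AffineHarmonium f y x z w)
       , ExponentialFamily z, Map Natural f x y, Bilinear f y x
       , Translation z y, Translation w x, LegendreExponentialFamily w )
    => Sample z -- ^ Observations
    -> Natural # AffineHarmonium f y x z w -- ^ Initial model
    -> [Natural # AffineHarmonium f y x z w] -- ^ Sequence of EM iterates
emSequence zs = iterate (expectationMaximization zs)

-- e.g. the model after 100 EM steps:
-- hrm100 = emSequence zs hrm0 !! 100
```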