Safe Haskell | None |
---|---|
Language | Haskell98 |
Main module of the stochastic gradient descent (SGD) library.
SGD is a method for optimizing a global objective function defined as a sum of smaller, differentiable functions. The individual component functions share the same set of parameters, represented by the ParamSet class. This allows for heterogeneous parameter representation (vectors, maps, custom records, etc.).
The library adopts a Pipe-based interface in which SGD takes the form of a process consuming dataset subsets (so-called mini-batches) and producing a stream of output parameter values. The library implements different variants of SGD (momentum, adam, adaDelta) which can be executed either in a pure context (run) or in IO (runIO). The use of lower-level pipe-processing combinators (pipeRan, batch, result, etc.) is also possible.
To perform SGD, the gradients of the individual component functions need to be determined. This can be done either manually or automatically, using an automatic differentiation library (e.g. ad or backprop).
Synopsis
- momentum :: (Monad m, ParamSet p) => Config -> (e -> p -> p) -> SGD m e p
- adaDelta :: (Monad m, ParamSet p) => Config -> (e -> p -> p) -> SGD m e p
- adam :: (Monad m, ParamSet p) => Config -> (e -> p -> p) -> SGD m e p
- run :: ParamSet p => SGD Identity e p -> [e] -> p -> p
- data Config = Config {}
- runIO :: ParamSet p => Config -> SGD IO [e] p -> (e -> p -> Double) -> DataSet e -> p -> IO p
- pipeSeq :: DataSet e -> Producer e IO ()
- pipeRan :: DataSet e -> Producer e IO ()
- batch :: Monad m => Int -> Pipe e [e] m ()
- batchGradSeq :: ParamSet p => (e -> p -> p) -> [e] -> p -> p
- batchGradPar :: (ParamSet p, NFData p) => (e -> p -> p) -> [e] -> p -> p
- batchGradPar' :: ParamSet p => (e -> p -> p) -> [e] -> p -> p
- result :: Monad m => p -> Producer p m () -> m p
- keepEvery :: Monad m => Int -> Pipe a a m x
- decreasingBy :: (Monad m, Ord a) => (p -> m a) -> Pipe p p m x
- def :: Default a => a
Example
Let's say we have a list of functions defined as:
funs = [\x -> 0.3*x^2, \x -> -2*x, const 3, sin]
The global objective (which we want to minimize) is then defined as:
objective x = sum $ map ($x) funs
To perform SGD, we can either manually determine the individual derivatives:
derivs = [\x -> 0.6*x, const (-2), const 0, cos]
or use an automatic differentiation library, for instance:
import qualified Numeric.AD as AD
derivs = map (\k -> AD.diff (funs !! k)) [0..length funs-1]
Finally, run allows us to approach a (potentially local) minimum of the global objective function:
>>> run (momentum def id) (take 10000 $ cycle derivs) 0.0
4.180177042912455
The result is the final parameter value, i.e. the point at which the global objective is (approximately) locally minimal.
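The other variants plug in analogously. For illustration (a sketch, not taken from the library's documentation; chunksOf from the split package is assumed here only to form mini-batches, and derivs is the definition above):

import Data.List.Split (chunksOf)

-- Minimize the same objective with Adam, feeding mini-batches of two
-- derivatives at a time via batchGradSeq (the per-element gradient is
-- again simply id, since the training elements are the derivatives).
adamBatched :: Double
adamBatched =
  run (adam def (batchGradSeq id)) (take 5000 $ cycle (chunksOf 2 derivs)) 0.0

The result should again approach the minimum near 4.18, the exact value depending on the Adam defaults and the number of steps.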
SGD variants
momentum Source #
:: (Monad m, ParamSet p) | |
=> Config | Momentum configuration |
-> (e -> p -> p) | Gradient on a training element |
-> SGD m e p |
Stochastic gradient descent with momentum. See Numeric.SGD.Momentum for more information.
adaDelta Source #
:: (Monad m, ParamSet p) | |
=> Config | AdaDelta configuration |
-> (e -> p -> p) | Gradient on a training element |
-> SGD m e p |
Perform gradient descent using the AdaDelta algorithm. See Numeric.SGD.AdaDelta for more information.
adam Source #
:: (Monad m, ParamSet p) | |
=> Config | Adam configuration |
-> (e -> p -> p) | Gradient on a training element |
-> SGD m e p |
Perform gradient descent using the Adam algorithm. See Numeric.SGD.Adam for more information.
Pure SGD
run Source #
:: ParamSet p | |
=> SGD Identity e p | Selected SGD method |
-> [e] | Training data stream |
-> p | Initial parameters |
-> p |
Traverse all the elements in the training data stream in one pass, calculate the subsequent gradients, and apply them progressively starting from the initial parameter values.
Consider using runIO if your training dataset is large.
IO-based SGD
data Config Source #
High-level IO-based SGD configuration
Constructors
Config | |
iterNum :: Natural | |
batchSize :: Natural | |
batchOverlap :: Natural | |
batchRandom :: Bool | |
reportEvery :: Double | |
Instances
Eq Config Source # | |
Ord Config Source # | |
Show Config Source # | |
Generic Config Source # | |
Default Config Source # | |
Defined in Numeric.SGD | |
type Rep Config Source # | |
Defined in Numeric.SGD
type Rep Config = D1 (MetaData "Config" "Numeric.SGD" "sgd-0.7.0.1-JCqXBaIkTOOGdrVvsAY0u" False) (C1 (MetaCons "Config" PrefixI True) ((S1 (MetaSel (Just "iterNum") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Natural) :*: S1 (MetaSel (Just "batchSize") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Natural)) :*: (S1 (MetaSel (Just "batchOverlap") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Natural) :*: (S1 (MetaSel (Just "batchRandom") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Bool) :*: S1 (MetaSel (Just "reportEvery") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Double))))) |
runIO Source #
:: ParamSet p | |
=> Config | SGD configuration |
-> SGD IO [e] p | SGD pipe consuming mini-batches of dataset elements |
-> (e -> p -> Double) | Value of the objective function on a dataset element (used for model quality reporting) |
-> DataSet e | Training dataset |
-> p | Initial parameter values |
-> IO p |
Perform SGD in the IO monad, regularly reporting the value of the objective function on the entire dataset. A higher-level wrapper which should be convenient to use when the training dataset is large.
An alternative is to use the simpler function run, or to build a custom SGD pipeline based on lower-level combinators (pipeSeq, batch, adam, keepEvery, result, etc.).
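For illustration, a minimal sketch of such an IO-based setup for the running example (the Elem, grad, quality and train names are ours, not part of the library; how the DataSet value is obtained is left open, and the DataSet type itself may need to be imported from Numeric.SGD.DataSet):

import Numeric.SGD

-- A dataset element: a component function paired with its derivative,
-- so that both the gradient and the reported objective value can be
-- computed from the element itself.
type Elem = (Double -> Double, Double -> Double)

-- Gradient of a single component at the current parameter value.
grad :: Elem -> Double -> Double
grad (_, f') x = f' x

-- Objective value of a single component (used for progress reporting).
quality :: Elem -> Double -> Double
quality (f, _) x = f x

-- Run Adam over mini-batches drawn from the dataset, starting from 0.0.
train :: DataSet Elem -> IO Double
train dataSet = runIO def (adam def (batchGradSeq grad)) quality dataSet 0.0

Given an NFData instance for the parameters (Double has one), batchGradSeq could be swapped for batchGradPar to compute the sub-gradients of each mini-batch in parallel.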
Combinators
Input
pipeSeq :: DataSet e -> Producer e IO () Source #
Pipe all the elements in the dataset sequentially.
pipeRan :: DataSet e -> Producer e IO () Source #
Pipe all the elements in the dataset in a random order.
Batch
batch :: Monad m => Int -> Pipe e [e] m () Source #
Group dataset elements into (mini-)batches of the given size.
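For instance (a sketch; the expected result in the comment assumes that element order is preserved within each batch):

import Pipes (each, (>->))
import qualified Pipes.Prelude as P

-- Group a pure stream of six elements into mini-batches of size 3.
-- Expected: [[1,2,3],[4,5,6]]
batches :: [[Int]]
batches = P.toList (each [1..6] >-> batch 3)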
batchGradSeq :: ParamSet p => (e -> p -> p) -> [e] -> p -> p Source #
Adapt the gradient function to handle (mini-)batches. The function calculates the individual sub-gradients sequentially.
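For instance (a sketch reusing derivs from the example above; whether the sub-gradients of a batch are summed or averaged is left to the implementation):

-- Gradient of a whole mini-batch of derivatives at the point x = 1.0.
batchGrad :: Double
batchGrad = batchGradSeq id derivs 1.0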
batchGradPar :: (ParamSet p, NFData p) => (e -> p -> p) -> [e] -> p -> p Source #
Adapt the gradient function to handle (mini-)batches. Relies on p's NFData instance to efficiently calculate the gradients in parallel.
batchGradPar' :: ParamSet p => (e -> p -> p) -> [e] -> p -> p Source #
A version of batchGradPar with no NFData constraint. Evaluates the sub-gradients calculated in parallel to weak head normal form.
Output
result Source #
:: Monad m | |
=> p | Default value (in case the stream is empty) |
-> Producer p m () | Stream of parameter sets |
-> m p |
Extract the result of the SGD calculation (the last parameter set flowing downstream).
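For instance (a sketch using a pre-computed pure stream in place of an actual SGD pipeline):

import Pipes (each)

-- Extract the last parameter value from a stream, falling back to the
-- default 0.0 if the stream were empty.
lastParam :: IO Double
lastParam = result 0.0 (each [1.0, 2.5, 4.2])
-- lastParam yields 4.2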
Misc
keepEvery :: Monad m => Int -> Pipe a a m x Source #
Keep every k-th element flowing downstream and discard all the others.
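For instance (a sketch; exactly which element of each group of k is kept is an assumption in the comment below):

import Pipes (each, (>->))
import qualified Pipes.Prelude as P

-- Thin a stream of ten elements, keeping one out of every 3.
-- Expected, assuming the k-th, 2k-th, ... elements are kept: [3,6,9]
thinned :: [Int]
thinned = P.toList (each [1..10] >-> keepEvery 3)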
decreasingBy :: (Monad m, Ord a) => (p -> m a) -> Pipe p p m x Source #
Make the stream decreasing in the given (monadic) function by discarding elements with values higher than those already seen.
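For instance (a sketch; the treatment of ties and of the first element is an assumption):

import Pipes (each, (>->))
import qualified Pipes.Prelude as P

-- Discard every element whose monitored value is higher than one already
-- seen; with pure, the elements themselves are compared.
-- Expected: [5,3,2,1]
improving :: [Int]
improving = P.toList (each [5, 3, 4, 2, 6, 1] >-> decreasingBy pure)

In an SGD pipeline the monitored function could, for example, compute the objective value on the dataset, so that only improving parameter sets flow further downstream.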