sgd-0.5.0.0: API documentation notes

Modules covered: Numeric.SGD, Numeric.SGD.Type, Numeric.SGD.DataSet,
Numeric.SGD.ParamSet, Numeric.SGD.Momentum, Numeric.SGD.AdaDelta,
Numeric.SGD.Sparse, Numeric.SGD.Sparse.Momentum, Numeric.SGD.Sparse.Grad,
Numeric.SGD.Sparse.LogSigned.

Numeric.SGD.DataSet

  DataSet
    Dataset stored on a disk.
  size
    The size of the dataset; the individual indices are [0, 1, ..., size - 1].
  elemAt
    Get the dataset element with the given identifier.
  loadData
    Lazily load the entire dataset from a disk.
  randomSample
    Random dataset sample with a specified number of elements (loaded eagerly).
  withVect
    Construct a dataset from a list of elements, store it as a vector, and run
    the given handler.
  withDisk
    Construct a dataset from a list of elements, store it on disk, and run the
    given handler.  Training elements must have a Binary instance for this
    function to work.
  lazySequence
    Lazily evaluate each action in the sequence from left to right, and collect
    the results.
  lazyMapM
    lazyMapM f is equivalent to lazySequence . map f.

Numeric.SGD.ParamSet

  ParamSet
    Class of types that can be treated as parameter sets.  It provides the
    basic element-wise operations (addition, multiplication, mapping) required
    to perform stochastic gradient descent.  Many of the operations (add, sub,
    mul, div, etc.) have the same interpretation and follow the same laws
    (e.g. associativity) as the corresponding operations in Num and Fractional.

    zero takes a parameter set as argument and "zeroes out" all its elements
    (as in the backprop library).  This allows instances for Maybe, Map, etc.,
    where the structure of the parameter set is dynamic.  This leads to the
    following property:

      add (zero x) x = x

    However, it does not have to obey (add (zero x) y = y).

    A ParamSet can also be seen as a (structured) vector, hence add and
    norm_2.  The latter is not strictly necessary to perform SGD, but it is
    useful for controlling the training process.

    pmap should obey the following law:

      pmap id x = x

    If you leave the body of an instance declaration blank, GHC Generics will
    be used to derive the instance, provided the type has a single constructor
    and each field is an instance of ParamSet.

  pmap
    Element-wise mapping.
  zero
    Zero out all elements.
  add
    Element-wise addition.
  sub
    Element-wise subtraction.
  mul
    Element-wise multiplication.
  div
    Element-wise division.
  norm_2
    L2 norm.

  genericPMap, genericAdd, genericSub, genericMul, genericDiv, genericNorm2
    Generic implementations of the corresponding methods, using GHC Generics;
    they work if all fields are instances of ParamSet, but only for values
    with single constructors.
  GAdd, GSub, GMul, GDiv, GNorm2, GPMap
    Helper classes for automatically deriving ParamSet using GHC Generics.

  Map instance
    A map with different parameter sets (of the same type) assigned to the
    individual keys.  When combining two maps with different sets of keys,
    only their intersection is preserved.
  Maybe instance
    Nothing represents a deactivated parameter set component.  If Nothing is
    given as an argument to one of the ParamSet operations, the result is
    Nothing as well.  This differs from the corresponding instance in the
    backprop library, where Nothing is equivalent to `Just 0`.  However, this
    implementation corresponds adequately to the notion that a particular
    component is either active or not in both the parameter set and the
    gradient, so it does not make sense to combine Just with Nothing.
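As a concrete illustration of the Generics-based deriving described above,
here is a minimal sketch.  The Params type and its field names are made up
for the example; it relies only on the documented rule (single constructor,
all fields instances of ParamSet) and on the Double instance provided by the
module.

    {-# LANGUAGE DeriveGeneric #-}

    import GHC.Generics (Generic)
    import Numeric.SGD.ParamSet (ParamSet)

    -- Hypothetical two-field parameter set.  Both fields are Doubles and
    -- Double is a ParamSet instance, so the blank instance body below is
    -- filled in generically.
    data Params = Params
      { weight :: Double
      , bias   :: Double
      } deriving (Show, Generic)

    instance ParamSet Params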
Numeric.SGD.Sparse.LogSigned

  LogSigned
    Signed real value in the logarithmic domain.
  pos
    Positive component.
  neg
    Negative component.
  logSigned
    Smart LogSigned constructor.
  fromPos
    Make a LogSigned from a positive, log-domain number.
  fromNeg
    Make a LogSigned from a negative, log-domain number.
  toNorm
    Shift a LogSigned to the normal domain.
  toLogFloat
    Change the LogSigned to either a negative (Left) or a positive (Right)
    LogFloat.

Numeric.SGD.Sparse.Grad

  Grad
    Gradient with nonzero values stored in a logarithmic domain.  Since values
    equal to zero have no impact on the update phase of the SGD method, it is
    more efficient not to store those components in the gradient.
  add
    Add a normal-domain Double to the gradient at the given position.
  addL
    Add a log-domain, signed number to the gradient at the given position.
  fromList
    Construct a gradient from a list of (index, value) pairs.  All values from
    the list are added at the respective gradient positions.
  fromLogList
    Construct a gradient from a list of (index, signed log-domain number)
    pairs.  All values from the list are added at the respective gradient
    positions.
  toList
    Collect gradient components with values in the normal domain.
  empty
    Empty gradient, i.e. with all elements set to 0.
  parUnions
    Perform the parallel unions operation on a list of gradients
    (experimental version; implemented as parallel unions in the Par monad).
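A small sketch of the Grad list interface described above.  The signatures
are assumed here (fromList :: [(Int, Double)] -> Grad and
toList :: Grad -> [(Int, Double)]); the accumulation behaviour follows the
documentation, which says that all values from the list are added at the
respective positions.

    import qualified Numeric.SGD.Sparse.Grad as Grad

    -- Values given for the same index are added together; zero-valued
    -- components are simply not stored in the gradient.
    demo :: [(Int, Double)]
    demo = Grad.toList (Grad.fromList [(0, 1.5), (3, -2.0), (0, 0.5)])
    -- expected components: index 0 -> 2.0, index 3 -> -2.0 (order unspecified)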
Numeric.SGD.Sparse

  Para
    Vector of parameters (internally, a type synonym for a mutable vector of
    Double values is used).
  SgdArgs
    SGD parameters controlling the learning process:
      batchSize -- size of the batch
      regVar    -- regularization variance
      iterNum   -- number of iterations
      gain0     -- initial gain parameter
      tau       -- after how many iterations over the entire dataset the gain
                   parameter is halved
  sgdArgsDefault
    Default SGD parameter values.
  sgd
    A stochastic gradient descent method.  A notification function can be used
    to provide the user with information about the progress of the learning.
    Arguments: SGD parameter values, notification run on every update,
    gradient for a dataset element, dataset, starting point; the result is the
    final parameter vector.
  Internal helpers: addUp (add up all gradients and store the results in the
  normal domain), scale (scale the vector by the given value), addTo (apply
  the gradient to the parameter vector, i.e. add the first vector to the
  second one).

Numeric.SGD.Sparse.Momentum

  The momentum variant of Numeric.SGD.Sparse.  It uses the same SgdArgs
  parameters and the same sgd interface (SGD parameter values, notification
  run on every update, gradient for a dataset element, dataset, starting
  point), plus:
    gamma
      The gamma parameter which drives momentum.  TODO: put in SgdArgs.
    updateMomentum (internal)
      Compute the new momentum (gradient) vector from the gamma parameter, the
      previous momentum, and the scaled current gradient.
    applyRegularization (internal)
      Apply regularization, given the regularization parameter, the
      parameters, and the current gradient.

Numeric.SGD.Type

  SGD
    SGD is a pipe which, given the initial parameter values, consumes training
    elements of type e and outputs the subsequently calculated parameter sets
    of type p.

Numeric.SGD.Momentum

  Config
    Momentum configuration:
      gain0 -- initial gain parameter, used to scale the gradient
      tau   -- after how many gradient calculations the gain parameter is
               halved
      gamma -- momentum term
  momentum
    Stochastic gradient descent with momentum, given the momentum
    configuration and the gradient on a training element.  See
    Numeric.SGD.Momentum for more information.

Numeric.SGD.AdaDelta

  Config
    AdaDelta configuration:
      decay -- exponential decay parameter
      eps   -- epsilon value
  adaDelta
    Perform gradient descent using the AdaDelta algorithm, given the AdaDelta
    configuration and the gradient on a training element.  See
    Numeric.SGD.AdaDelta for more information.
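For illustration, a sketch of building configurations for the two methods
above.  It assumes the Config types are plain records with Default instances
(the package depends on data-default-class) and uses only the field names
documented here.

    import Data.Default.Class (def)
    import qualified Numeric.SGD.AdaDelta as Ada
    import qualified Numeric.SGD.Momentum as Mom

    -- Start from the default configurations and override selected fields.
    momCfg :: Mom.Config
    momCfg = def { Mom.gamma = 0.9 }   -- momentum term

    adaCfg :: Ada.Config
    adaCfg = def { Ada.decay = 0.9, Ada.eps = 1.0e-8 }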
Numeric.SGD

  Config
    High-level IO-based SGD configuration:
      iterNum     -- number of iterations over the entire training dataset
      batchRandom -- should the mini-batch be selected at random?  If not, the
                     subsequent training elements are picked sequentially.
                     Random selection gives no guarantee of seeing each
                     training sample in every epoch.
      reportEvery -- how often the value of the objective function should be
                     reported (with 1 meaning once per pass over the training
                     data)
  run
    Traverse all the elements in the training data stream in one pass,
    calculate the subsequent gradients, and apply them progressively, starting
    from the initial parameter values.  Arguments: the selected SGD method,
    the training data stream, and the initial parameters.  Consider using
    runIO if your training dataset is large.
  runIO
    Perform SGD in the IO monad, regularly reporting the value of the
    objective function on the entire dataset.  A higher-level wrapper which
    should be convenient to use when the training dataset is large.
    Arguments: the SGD configuration, the selected SGD method, the value of
    the objective function on a sample element (needed for model quality
    reporting), the training dataset, and the initial parameter values.  An
    alternative is to use the simpler function run, or to build a custom SGD
    pipeline based on lower-level combinators (pipeSeq, adaDelta, every,
    result, etc.).
  pipeSeq
    Pipe the dataset sequentially in a loop.
  pipeRan
    Pipe the dataset randomly in a loop.
  result
    Extract the result of the SGD calculation (the last parameter set flowing
    downstream).  Arguments: a default value (in case the stream is empty) and
    the stream of parameter sets.
  every
    Apply the given function every k parameter sets flowing downstream.
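To show how the pieces fit together, here is an end-to-end sketch written
under several assumptions: the runIO argument order follows the documentation
above, withVect has the shape [a] -> (DataSet a -> IO b) -> IO b, the
configuration records have Default instances, and the function supplied to
adaDelta is the gradient of the per-element error being minimised (the exact
sign convention should be checked against the library).  The model, data, and
names below are made up for illustration.

    {-# LANGUAGE DeriveGeneric #-}

    import GHC.Generics (Generic)
    import Data.Default.Class (def)
    import qualified Numeric.SGD as SGD
    import qualified Numeric.SGD.AdaDelta as Ada
    import qualified Numeric.SGD.DataSet as D
    import Numeric.SGD.ParamSet (ParamSet)

    -- One-parameter linear model y = slope * x (hypothetical example).
    newtype Param = Param { slope :: Double } deriving (Show, Generic)
    instance ParamSet Param

    -- Gradient of the squared error on a single training element
    -- (assumes the descent sign convention discussed above).
    grad :: (Double, Double) -> Param -> Param
    grad (x, y) (Param a) = Param (2 * (a * x - y) * x)

    -- Per-element objective value, used only for progress reporting.
    quality :: (Double, Double) -> Param -> Double
    quality (x, y) (Param a) = (a * x - y) ^ (2 :: Int)

    main :: IO ()
    main = do
      let trainData = [(x, 2 * x) | x <- [-5 .. 5 :: Double]]
      D.withVect trainData $ \dataSet -> do
        p <- SGD.runIO def (Ada.adaDelta def grad) quality dataSet (Param 0)
        print p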