sgd-0.7.0.0: stochastic gradient descent library
=================================================

Numeric.SGD.DataSet
-------------------

  DataSet
    Dataset stored on a disk.
      size:   the size of the dataset; the individual indices are
              [0, 1, ..., size - 1]
      elemAt: get the dataset element with the given identifier

  loadData
    Lazily load the entire dataset from disk.

  shuffle
    Shuffle the dataset.

  randomSample
    Random dataset sample with a specified number of elements (loaded eagerly).

  withVect
    Construct a dataset from a list of elements, store it as a vector, and run
    the given handler.

  withDisk
    Construct a dataset from a list of elements, store it on disk, and run the
    given handler. Training elements must have a Binary instance for this
    function to work.

  lazySequence
    Lazily evaluate each action in the sequence from left to right and collect
    the results.

  lazyMapM
    lazyMapM f is equivalent to lazySequence . map f.

Numeric.SGD.ParamSet
--------------------

  ParamSet
    Class of types that can be treated as parameter sets. It provides basic
    element-wise operations (addition, multiplication, mapping) which are
    required to perform stochastic gradient descent. Many of the operations
    (add, sub, mul, div, etc.) have the same interpretation and follow the
    same laws (e.g. associativity) as the corresponding operations in Num and
    Fractional.

    zero takes a parameter set as an argument and zeroes out all its elements
    (as in the backprop library). This allows instances for Maybe, Map, etc.,
    where the structure of the parameter set is dynamic, and leads to the
    following property:

      add (zero x) x = x

    However, zero does not have to obey add (zero x) y = y.

    A ParamSet can also be seen as a (structured) vector, hence norm_2 and
    pmap. The latter is not strictly necessary to perform SGD, but it is
    useful for controlling the training process. pmap should obey the
    following law:

      pmap id x = x

    If you leave the body of an instance declaration blank, GHC Generics will
    be used to derive the instance, provided that the type has a single
    constructor and each of its fields is an instance of ParamSet.

    Methods:
      pmap:   element-wise mapping
      zero:   zero out all elements
      add:    element-wise addition
      sub:    element-wise subtraction
      mul:    element-wise multiplication
      div:    element-wise division
      norm_2: L2 norm

  genericAdd, genericSub, genericMul, genericDiv, genericNorm2, genericPMap
    Implementations of the corresponding ParamSet methods based on GHC
    Generics; they work if all fields are instances of ParamSet, but only for
    values with single constructors.

  GAdd, GSub, GMul, GDiv, GNorm2, GPMap
    Helper classes for automatically deriving the ParamSet methods using GHC
    Generics.

  Instance for Map
    A map with different parameter sets (of the same type) assigned to the
    individual keys. When combining two maps with different sets of keys, only
    their intersection is preserved.

  Instance for Maybe
    Nothing represents a deactivated parameter set component.
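The blank-instance mechanism described above is the usual way to declare a
parameter set. A minimal sketch (the Line type and its fields are
hypothetical, not part of the library):

    {-# LANGUAGE DeriveGeneric #-}

    import GHC.Generics (Generic)
    import Numeric.SGD.ParamSet (ParamSet)

    -- A hypothetical parameter set: a line y = slope * x + intercept.
    -- It has a single constructor and each field is a Double, which is
    -- itself a ParamSet instance.
    data Line = Line
      { slope     :: Double
      , intercept :: Double
      } deriving (Show, Generic)

    -- Blank instance body: all methods are derived via GHC Generics.
    instance ParamSet Line

With this instance in place, Line values can be combined with add and sub,
rescaled with pmap, and measured with norm_2 like any other parameter set.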
    If Nothing is given as an argument to one of the ParamSet operations, the
    result is Nothing as well. This differs from the corresponding instance in
    the backprop library, where Nothing is equivalent to `Just 0`. The
    behaviour here reflects the notion that a particular component is either
    active or not in both the parameter set and the gradient, so it does not
    make sense to combine Just with Nothing.

Numeric.SGD.Sparse.LogSigned
----------------------------

  LogSigned
    Signed real value in the logarithmic domain.
      pos: positive component
      neg: negative component

  logSigned
    Smart LogSigned constructor.

  fromPos
    Make a LogSigned from a positive, log-domain number.

  fromNeg
    Make a LogSigned from a negative, log-domain number.

  toNorm
    Shift a LogSigned to the normal domain.

  toLogFloat
    Change the LogSigned to either a negative (Left LogFloat) or a positive
    (Right LogFloat) value.

Numeric.SGD.Sparse.Grad
-----------------------

  Grad
    Gradient with nonzero values stored in a logarithmic domain. Since values
    equal to zero have no impact on the update phase of the SGD method, it is
    more efficient not to store those components in the gradient.

  add
    Add a normal-domain double to the gradient at the given position.

  addL
    Add a log-domain, signed number to the gradient at the given position.

  fromList
    Construct a gradient from a list of (index, value) pairs. All values from
    the list are added at the respective gradient positions.

  fromLogList
    Construct a gradient from a list of (index, signed log-domain number)
    pairs. All values from the list are added at the respective gradient
    positions.

  toList
    Collect the gradient components with values in the normal domain.

  empty
    Empty gradient, i.e. with all elements set to 0.

  parUnions
    Perform the parallel unions operation on a list of gradients (internally,
    the unions are computed in the Par monad). Experimental version.

Numeric.SGD.Sparse
------------------

  Para
    Vector of parameters.

  SgdArgs
    SGD parameters controlling the learning process.
      batchSize: size of the batch
      regVar:    regularization variance
      iterNum:   number of iterations
      gain0:     initial gain parameter
      tau:       after how many iterations over the entire dataset the gain
                 parameter is halved

  sgdArgsDefault
    Default SGD parameter values.

  sgd
    A stochastic gradient descent method. A notification function can be used
    to provide the user with information about the progress of the learning.
    Arguments: SGD parameter values, notification run after every update,
    gradient for a dataset element, dataset, starting point. Returns the SGD
    result.

  Internal helpers: addUp adds up all gradients and stores the results in the
  normal domain; scale scales a vector by a given value; addTo applies a
  gradient to the parameter vector, that is, adds the first vector to the
  second one.

Numeric.SGD.Sparse.Momentum
---------------------------

  The same interface as Numeric.SGD.Sparse (Para, SgdArgs, sgdArgsDefault),
  extended with a momentum term.

  gamma (internal)
    The gamma parameter which drives the momentum. TODO: put in SgdArgs.

  sgd
    A stochastic gradient descent method. A notification function can be used
    to provide the user with information about the progress of the learning.
    Arguments: SGD parameter values, notification run after every update,
    gradient for a dataset element, dataset, starting point. Returns the SGD
    result.

  Internal helpers: applyRegularization applies regularization, given the
  regularization parameter, the parameters, and the current gradient;
  updateMomentum computes the new momentum (gradient) vector from the gamma
  parameter, the previous momentum, and the scaled current gradient; addUp,
  scale, and addTo work as in Numeric.SGD.Sparse.
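A small sketch of the sparse-gradient interface; the concrete element types
(Int indices, Double values) are assumptions based on the descriptions above:

    import qualified Numeric.SGD.Sparse.Grad as Grad

    -- Accumulate sparse gradient components and read them back in the
    -- normal domain. Values added at the same index are summed, so the
    -- component at index 0 ends up as 1.5 + 0.25 = 1.75.
    components :: [(Int, Double)]
    components = Grad.toList (Grad.fromList [(0, 1.5), (3, -0.5), (0, 0.25)])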
Numeric.SGD.Type
----------------

  SGD
    An SGD method is a pipe which, given the initial parameter values,
    consumes training elements of type e and outputs the subsequently
    calculated parameter sets of type p.

Numeric.SGD.Momentum
--------------------

  Config
    Momentum configuration.
      gain0: initial gain parameter, used to scale the gradient
      tau:   after how many gradient calculations the gain parameter is halved
      gamma: momentum term

  momentum
    Stochastic gradient descent with momentum. See Numeric.SGD.Momentum for
    more information. Arguments: momentum configuration, gradient on a
    training element.

Numeric.SGD.Adam
----------------

  Config
    Adam configuration.
      alpha: step size
      beta1: decay of the 1st exponential moment
      beta2: decay of the 2nd exponential moment
      eps:   epsilon

  adam
    Perform gradient descent using the Adam algorithm. See Numeric.SGD.Adam
    for more information. Arguments: Adam configuration, gradient on a
    training element.

Numeric.SGD.AdaDelta
--------------------

  Config
    AdaDelta configuration.
      decay: exponential decay parameter
      eps:   epsilon value

  adaDelta
    Perform gradient descent using the AdaDelta algorithm. See
    Numeric.SGD.AdaDelta for more information. Arguments: AdaDelta
    configuration, gradient on a training element.
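The momentum configuration has a Default instance, so it is typically built by
overriding individual fields of def (from Data.Default.Class); the Adam and
AdaDelta configurations follow the same pattern. A minimal sketch, with field
values chosen for illustration only:

    import Data.Default.Class (def)
    import qualified Numeric.SGD.Momentum as Mom

    -- Start from the default momentum configuration and override the
    -- initial gain and the momentum term.
    myConfig :: Mom.Config
    myConfig = def { Mom.gain0 = 0.01, Mom.gamma = 0.95 }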
Numeric.SGD
-----------

  Config
    High-level IO-based SGD configuration.
      iterNum:      number of iterations over the entire training dataset
      batchSize:    mini-batch size
      batchOverlap: the number of overlapping elements in subsequent
                    mini-batches
      batchRandom:  should the mini-batch be selected at random? If not, the
                    subsequent training elements will be picked sequentially.
                    Random selection gives no guarantee of seeing each
                    training sample in every epoch.
      reportEvery:  how often the value of the objective function should be
                    reported (with 1 meaning once per pass over the training
                    data)

  run
    Traverse all the elements in the training data stream in one pass,
    calculate the subsequent gradients, and apply them progressively, starting
    from the initial parameter values. Consider using runIO if your training
    dataset is large. Arguments: selected SGD method, training data stream,
    initial parameters. Returns the SGD result.

  runIO
    Perform SGD in the IO monad, regularly reporting the value of the
    objective function on the entire dataset. A higher-level wrapper which
    should be convenient to use when the training dataset is large. An
    alternative is to use the simpler function run, or to build a custom SGD
    pipeline based on lower-level combinators (pipeSeq, batch, adam,
    keepEvery, result, etc.). Arguments: SGD configuration, SGD pipe consuming
    mini-batches of dataset elements, value of the objective function on a
    dataset element (used for model quality reporting), training dataset,
    initial parameter values.

  pipeSeq
    Pipe all the elements in the dataset sequentially.

  pipeRan
    Pipe all the elements in the dataset in a random order.

  batch
    Group dataset elements into (mini-)batches of the given size.

  batchGradPar
    Adapt the gradient function to handle (mini-)batches. Relies on the
    parameter set's NFData instance to efficiently calculate the gradients in
    parallel.

  batchGradPar'
    A version of batchGradPar with no NFData constraint. Evaluates the
    sub-gradients calculated in parallel to weak head normal form. (The
    internal helper batchGradWith' evaluates the sub-gradients of the
    individual batch elements in parallel based on a given Strategy.)

  batchGradSeq
    Adapt the gradient function to handle (mini-)batches. The function
    calculates the individual sub-gradients sequentially.

  result
    Extract the result of the SGD calculation (the last parameter set flowing
    downstream). Arguments: default value (in case the stream is empty),
    stream of parameter sets.

  keepEvery
    Keep every k-th element flowing downstream and discard all the others.

  decreasingBy
    Make the stream decreasing in the given (monadic) function by discarding
    elements with values higher than those already seen.

  The module also re-exports the momentum, adam, and adaDelta methods.
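Putting the high-level interface together, the sketch below fits a
two-parameter model with the momentum method and the pure run wrapper. It
assumes that run takes the training data stream as a pipes Producer (built
here with Pipes.each) and that the method performs descent on the supplied
gradient; the Line type, gradOn, and all numeric values are illustrative, not
part of the library.

    {-# LANGUAGE DeriveGeneric #-}

    import GHC.Generics (Generic)
    import Data.Default.Class (def)
    import Pipes (each)

    import Numeric.SGD (run, momentum)
    import Numeric.SGD.ParamSet (ParamSet)

    -- The same hypothetical parameter set as in the ParamSet sketch above.
    data Line = Line
      { slope     :: Double
      , intercept :: Double
      } deriving (Show, Generic)

    instance ParamSet Line

    -- Gradient of the squared error on a single (x, y) training pair.
    gradOn :: (Double, Double) -> Line -> Line
    gradOn (x, y) (Line a b) =
      let err = (a * x + b) - y
      in  Line (2 * err * x) (2 * err)

    main :: IO ()
    main = do
      let trainData = [(1, 2), (2, 4), (3, 6)] :: [(Double, Double)]
          -- Cycle through the data a number of times to emulate epochs.
          stream = each (concat (replicate 1000 trainData))
      fitted <- run (momentum def gradOn) stream (Line 0 0)
      print fitted

For large datasets, the same gradient function can instead be passed to runIO
together with a Config and an objective function for periodic quality
reporting, as described above.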