sgd-0.8.0.2: stochastic gradient descent -- module documentation

Numeric.SGD.DataSet

  DataSet        -- Dataset stored on a disk.
    size         -- The size of the dataset; the individual indices are [0, 1, ..., size - 1].
    elemAt       -- Get the dataset element with the given identifier.
  loadData       -- Lazily load the entire dataset from disk.
  shuffle        -- Shuffle the dataset.
  randomSample   -- Random dataset sample with a specified number of elements (loaded eagerly).
  withVect       -- Construct a dataset from a list of elements, store it as a vector, and run the given handler.
  withDisk       -- Construct a dataset from a list of elements, store it on disk, and run the given handler.
                    Training elements must have a Binary instance for this function to work.
  lazySequence   -- Lazily evaluate each action in the sequence from left to right, and collect the results.
  lazyMapM       -- lazyMapM f is equivalent to lazySequence . map f.

Numeric.SGD.ParamSet

  GAdd, GSub, GMul, GDiv, GNorm2, GPMap
    -- Helper classes for automatically deriving the corresponding ParamSet methods using GHC Generics.

  ParamSet -- Class of types that can be treated as parameter sets. It provides basic
    element-wise operations (addition, multiplication, mapping) which are required to
    perform stochastic gradient descent. Many of the operations (add, sub, mul, div, etc.)
    have the same interpretation and follow the same laws (e.g. associativity) as the
    corresponding operations in Num and Fractional.

    zero takes a parameter set as argument and zeroes out all its elements (as in the
    backprop library). This allows instances for Map, Maybe, etc., where the structure
    of the parameter set is dynamic. This leads to the following property:

      add (zero x) x = x

    However, zero does not have to obey (add (zero x) y = y).

    A ParamSet can also be seen as a (structured) vector, hence pmap and norm_2. The L2
    norm is not strictly necessary to perform SGD, but it is useful to control the
    training process. pmap should obey the following law:

      pmap id x = x

    If you leave the body of an instance declaration blank, GHC Generics will be used to
    derive the instance, provided that the type has a single constructor and each field
    is an instance of ParamSet.

    Methods:
      pmap   -- Element-wise mapping.
      zero   -- Zero out all elements.
      add    -- Element-wise addition.
      sub    -- Element-wise subtraction.
      mul    -- Element-wise multiplication.
      div    -- Element-wise division.
      norm_2 -- L2 norm.

  genericPMap, genericAdd, genericSub, genericMul, genericDiv, genericNorm2
    -- Generic implementations of the corresponding methods using GHC Generics; they work
       if all fields are instances of ParamSet, but only for values with single constructors.

  Map instance -- A map with different parameter sets (of the same type) assigned to the
    individual keys. When combining two maps with different sets of keys, only their
    intersection is preserved.

  Maybe instance -- Nothing represents a deactivated parameter set component. If Nothing
    is given as an argument to one of the ParamSet operations, the result is Nothing as
    well. This differs from the corresponding instance in the backprop library, where
    Nothing is equivalent to `Just 0`. However, the implementation here reflects the
    notion that a particular component is either active or not in both the parameter set
    and the gradient, so it makes no sense to combine Nothing with Just.
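  For illustration, a minimal sketch of the Generics-based derivation described above.
  The Params type and its fields are hypothetical; it follows the stated rule (a single
  constructor, every field an instance of ParamSet, with Double being one such instance)
  and additionally derives Generic, which the default method implementations rely on.

    {-# LANGUAGE DeriveGeneric #-}

    import GHC.Generics (Generic)
    import Numeric.SGD.ParamSet (ParamSet)

    -- Hypothetical parameter set: a single constructor whose fields are
    -- all ParamSet instances (Double is one).
    data Params = Params
      { weight :: Double
      , bias   :: Double
      } deriving (Show, Generic)

    -- Leaving the instance body blank lets GHC Generics derive
    -- pmap, zero, add, sub, mul, div and norm_2.
    instance ParamSet Params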
Numeric.SGD.Sparse.LogSigned

  LogSigned    -- Signed real value in the logarithmic domain.
    pos        -- Positive component.
    neg        -- Negative component.
  logSigned    -- Smart LogSigned constructor.
  fromPos      -- Make a LogSigned from a positive, log-domain number.
  fromNeg      -- Make a LogSigned from a negative, log-domain number.
  toNorm       -- Shift a LogSigned to the normal domain.
  toLogFloat   -- Change the LogSigned to either a negative or a positive LogFloat.

Numeric.SGD.Sparse.Grad

  Grad         -- Gradient with nonzero values stored in a logarithmic domain. Since
                  values equal to zero have no impact on the update phase of the SGD
                  method, it is more efficient not to store those components in the
                  gradient.
  add          -- Add a normal-domain Double to the gradient at the given position.
  addL         -- Add a log-domain, signed number to the gradient at the given position.
  fromList     -- Construct a gradient from a list of (index, value) pairs. All values
                  from the list are added at the respective gradient positions.
  fromLogList  -- Construct a gradient from a list of (index, signed log-domain number)
                  pairs. All values from the list are added at the respective gradient
                  positions.
  toList       -- Collect the gradient components with values in the normal domain.
  empty        -- Empty gradient, i.e. with all elements set to 0.
  parUnions    -- Perform the parallel-unions operation on a list of gradients.
                  Experimental version; internally, the parallel unions are computed in
                  the Par monad.

Numeric.SGD.Sparse

  MVect           -- Type synonym for a mutable vector of Double values.
  Para            -- Vector of parameters.
  SgdArgs         -- SGD parameters controlling the learning process.
    batchSize     -- Size of the batch.
    regVar        -- Regularization variance.
    iterNum       -- Number of iterations.
    gain0         -- Initial gain parameter.
    tau           -- After how many iterations over the entire dataset the gain parameter
                     is halved.
  sgdArgsDefault  -- Default SGD parameter values.
  sgd             -- A stochastic gradient descent method. A notification function can be
                     used to provide the user with information about the progress of the
                     learning. Arguments: the SGD parameter values, a notification run on
                     every update, the gradient for a dataset element, the dataset, and
                     the starting point; returns the SGD result. Internally, the
                     implementation adds up all gradients and stores the results in the
                     normal domain, scales vectors by a given value, and applies a
                     gradient to the parameter vector by adding the first vector to the
                     second.
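  A small sketch of building a sparse gradient, based on the fromList and toList
  descriptions above. The exact signatures (Int indices, Double values) are assumptions,
  not taken from the package.

    import qualified Numeric.SGD.Sparse.Grad as Grad

    -- Assumed signatures: fromList :: [(Int, Double)] -> Grad
    --                     toList   :: Grad -> [(Int, Double)]
    -- Values added at the same index are summed.
    grad :: Grad.Grad
    grad = Grad.fromList [(0, 1.5), (3, -0.25), (0, 0.5)]

    -- Non-zero components, back in the normal domain; expected to
    -- contain (0, 2.0) and (3, -0.25).
    components :: [(Int, Double)]
    components = Grad.toList grad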
Numeric.SGD.Sparse.Momentum

  This module mirrors Numeric.SGD.Sparse (MVect, Para, SgdArgs with batchSize, regVar,
  iterNum, gain0 and tau, and sgdArgsDefault) but performs the updates with momentum.
  The gamma parameter drives the momentum (TODO: put it in SgdArgs).

  sgd  -- A stochastic gradient descent method. A notification function can be used to
          provide the user with information about the progress of the learning.
          Arguments: the SGD parameter values, a notification run on every update, the
          gradient for a dataset element, the dataset, and the starting point; returns
          the SGD result. Internally, the implementation computes the new momentum
          (gradient) vector from the gamma parameter, the previous momentum and the
          scaled current gradient; applies regularization given the regularization
          parameter, the parameters and the current gradient; adds up all gradients and
          stores the results in the normal domain; scales vectors by a given value; and
          applies a gradient to the parameter vector by adding the first vector to the
          second.

Numeric.SGD.Type

  SGD  -- An SGD method is a pipe which, given the initial parameter values, consumes
          training elements of type e and outputs the subsequently calculated parameter
          sets of type p.

Numeric.SGD.Momentum

  Config     -- Momentum configuration.
    alpha0   -- Initial step size, used to scale the gradient.
    tau      -- The step size after k * tau iterations is alpha0 / (k + 1).
    gamma    -- Momentum term.
  scaleTau   -- Scale the tau parameter. Useful e.g. to account for the size of the
                training dataset.
  momentum   -- Stochastic gradient descent with momentum; see the Numeric.SGD.Momentum
                module header for more information. Arguments: the momentum configuration
                and the gradient on a training element. (An internal helper performs the
                gradient scaling.)

Numeric.SGD.Adam

  Config     -- Adam configuration.
    alpha0   -- Initial step size.
    tau      -- The step size after k * tau iterations is alpha0 / (k + 1).
    beta1    -- 1st exponential moment decay.
    beta2    -- 2nd exponential moment decay.
    eps      -- Epsilon.
  scaleTau   -- Scale the tau parameter. Useful e.g. to account for the size of the
                training dataset.
  adam       -- Perform gradient descent using the Adam algorithm; see the
                Numeric.SGD.Adam module header for more information. Arguments: the Adam
                configuration and the gradient on a training element.

Numeric.SGD.AdaDelta

  Config     -- AdaDelta configuration.
    decay    -- Exponential decay parameter.
    eps      -- Epsilon value.
  adaDelta   -- Perform gradient descent using the AdaDelta algorithm; see the
                Numeric.SGD.AdaDelta module header for more information. Arguments: the
                AdaDelta configuration and the gradient on a training element. (Internal
                helpers perform the scaling, square and root-square steps of the update.)
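  As an illustration of the Config/method pairing described above, a hedged sketch of
  customizing the Adam configuration. The field names (alpha0, tau) and the Default
  instance (def) follow the documentation above, but the exact record types are
  assumptions.

    import Data.Default.Class (def)
    import qualified Numeric.SGD.Adam as Adam

    -- Start from the default Adam configuration and lower the initial
    -- step size; alpha0 and tau are the fields documented above.
    adamCfg :: Adam.Config
    adamCfg = def { Adam.alpha0 = 0.001, Adam.tau = 50 }

    -- Given a gradient on a training element, the SGD method would then
    -- be built as:  Adam.adam adamCfg gradOn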
Numeric.SGD

  Config          -- High-level, IO-based SGD configuration.
    iterNum       -- Number of iterations over the entire training dataset.
    batchSize     -- Mini-batch size.
    batchOverlap  -- The number of overlapping elements in subsequent mini-batches.
    batchRandom   -- Should the mini-batch be selected at random? If not, the subsequent
                     training elements are picked sequentially. Random selection gives no
                     guarantee of seeing each training sample in every epoch.
    reportEvery   -- How often the value of the objective function should be reported
                     (with 1 meaning once per pass over the training data).

  run             -- Traverse all the elements in the training data stream in one pass,
                     calculate the subsequent gradients, and apply them progressively,
                     starting from the initial parameter values. Consider using runIO if
                     your training dataset is large. Arguments: the selected SGD method,
                     the training data stream and the initial parameters.
  iterNumPerEpoch -- Calculate the effective number of SGD iterations (and gradient
                     calculations) performed per epoch, given the dataset size.
                     (Internally, the number of new elements in each new batch is taken
                     into account.)
  reportObjective -- Report the total objective value on stdout. Arguments: the value of
                     the objective function on a dataset element and the training dataset.
  objectiveWith   -- Value of the objective function over the entire dataset (i.e. the
                     sum of the objectives on all dataset elements). Arguments: the value
                     of the objective function on a dataset element and the training
                     dataset.
  runIO           -- Perform SGD in the IO monad, regularly reporting the value of the
                     objective function on the entire dataset. A higher-level wrapper
                     which should be convenient to use when the training dataset is
                     large. An alternative is to use the simpler run, or to build a
                     custom SGD pipeline from the lower-level combinators (pipeSeq,
                     batch, adam, result, etc.). Arguments: the SGD configuration, an SGD
                     pipe consuming mini-batches of dataset elements, a quality reporting
                     function (the reporting frequency is specified via reportEvery), the
                     training dataset and the initial parameter values.

  pipeSeq         -- Pipe all the elements in the dataset sequentially.
  pipeRan         -- Pipe all the elements in the dataset in a random order.
  batch           -- Group dataset elements into (mini-)batches of the given size.
  batchGradPar    -- Adapt the gradient function to handle (mini-)batches. Relies on the
                     parameter set's NFData instance to efficiently calculate the
                     gradients in parallel.
  batchGradPar'   -- A version of batchGradPar with no NFData constraint. Evaluates the
                     sub-gradients calculated in parallel to weak head normal form.
                     (An internal variant evaluates the sub-gradients of the individual
                     batch elements in parallel based on a given evaluation Strategy.)
  batchGradSeq    -- Adapt the gradient function to handle (mini-)batches. The individual
                     sub-gradients are calculated sequentially.
  result          -- Extract the result of the SGD calculation (the last parameter set
                     flowing downstream). Arguments: a default value (in case the stream
                     is empty) and the stream of parameter sets.
  keepEvery       -- Keep every k-th element flowing downstream and discard all the
                     others.
  decreasingBy    -- Make the stream decreasing in the given (monadic) function by
                     discarding elements with values higher than those already seen.
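  To make the recurring "objective on a dataset element" and "gradient on a training
  element" arguments concrete, a self-contained toy sketch (no sgd API calls). The
  element type and the linear model are hypothetical, and the element-then-parameters
  shape of the gradient function is an assumption based on the argument documentation
  above.

    -- Hypothetical training element: an (input, target) pair. The single
    -- model parameter is a Double, which the library documents as a
    -- ParamSet instance.
    type Elem = (Double, Double)

    -- Objective on one element: squared error of the linear model y ~ p * x.
    -- This is the kind of function passed to objectiveWith / reportObjective.
    objectiveOn :: Elem -> Double -> Double
    objectiveOn (x, y) p = (p * x - y) ^ (2 :: Int)

    -- Gradient of that objective with respect to the parameter, in the
    -- shape expected by the momentum / adam / adaDelta methods.
    gradOn :: Elem -> Double -> Double
    gradOn (x, y) p = 2 * (p * x - y) * x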