hasktorch-indef-0.0.1.0: Core Hasktorch abstractions wrapping FFI bindings

Copyright (c) Sam Stites 2017 BSD3 sam@stites.io experimental non-portable None Haskell2010

Torch.Indef.Static.NN.Criterion

Description

Synopsis

# Documentation

Arguments

 :: Tensor d input -> Tensor d' target -> Tensor d'' output -> Bool size average -> Bool reduce -> IO ()

absCriterion forward pass (updates the output tensor)

Arguments

 :: Tensor d input -> Tensor d' target -> Tensor d'' gradOutput -> Tensor d'' gradInput -> Bool size average -> Bool reduce -> IO ()

absCriterion backward-update (updates the layer and bias tensors)

Arguments

 :: (Reifies s W, KnownNat n, KnownDim n) => Bool sizeAverage (TODO: swap this out with Reduction) -> Bool reduce (TODO: swap this out with Reduction) -> Maybe (Tensor '[n]) weights -> Tensor '[n] target -> BVar s (Tensor '[n]) input -> BVar s (Tensor '[1]) output

Binary cross-entropy for Sigmoid (two-class version of ClassNLLCriterion)

Creates a criterion that measures the Binary Cross Entropy between the target and the output:  loss(o, t) = - 1/n sum_i (t[i] * log(o[i]) + (1 - t[i]) * log(1 - o[i]))  or in the case of the weights argument being specified:  loss(o, t) = - 1/n sum_i weights[i] * (t[i] * log(o[i]) + (1 - t[i]) * log(1 - o[i]))  This is used for measuring the error of a reconstruction in for example an auto-encoder. Note that the outputs o[i] should be numbers between 0 and 1, for instance, the output of an nn.Sigmoid layer and should be interpreted as the probability of predicting t[i] = 1. Note t[i] can be either 0 or 1.

By default, the losses are averaged for each minibatch over observations as well as over dimensions. However, if the field sizeAverage is set to false, the losses are instead summed.

Arguments

 :: (Reifies s W, KnownNat n, KnownDim n) => Tensor '[n] target -> BVar s (Tensor '[n]) input -> BVar s (Tensor '[1]) output

_marginCriterion_updateOutput :: Tensor d -> Tensor d -> Tensor d -> Bool -> Double -> IO () Source #

marginCriterion forward pass (updates the output tensor)

_marginCriterion_updateGradInput :: Tensor d -> Tensor d -> Tensor d -> Bool -> Double -> IO () Source #

marginCriterion backward-update (updates the layer and bias tensors)

_softMarginCriterion_updateOutput :: Tensor d -> Tensor d -> Tensor d -> Bool -> Bool -> IO () Source #

softMarginCriterion forward pass (updates the output tensor)

_softMarginCriterion_updateGradInput :: Tensor d -> Tensor d -> Tensor d -> Tensor d -> Bool -> Bool -> IO () Source #

softMarginCriterion backward-update (updates the layer and bias tensors)

Arguments

 :: Reifies s W => All Dimensions '[d', d, out] => KnownDim bs => d ~ (bs :+ d') => out ~ If reduce '[1] d => SBool size_average size_average: By default, the losses are averaged over each loss element in the batch. Note that for some losses, there multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True -> SBool reduce reduce: By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True -> Tensor d target -> BVar s (Tensor d) input -> BVar s (Tensor out) loss value and arrow from output gradient to input gradient

MSECriterion

Creates a criterion that measures the mean squared error between n elements in the input x and output y:

  loss(x, y) = 1/n sum |x_i - y_i|^2 .


If x and y are d-dimensional Tensors with a total of n elements, the sum operation still operates over all the elements, and divides by n. The two Tensors must have the same number of elements (but their sizes might be different).

The division by n can be avoided if one sets the internal variable sizeAverage to false:

criterion = nn.MSECriterion() criterion.sizeAverage = false

By default, the losses are averaged over observations for each minibatch. However, if the field sizeAverage is set to false, the losses are instead summed.

Arguments

 :: All Dimensions '[d', d] => Reifies s W => KnownDim bs => d ~ (bs :+ d') => Tensor d target -> BVar s (Tensor d) input -> BVar s (Tensor '[1]) loss value and arrow from output gradient to input gradient

Arguments

 :: All Dimensions '[d', d, out] => KnownDim bs => d ~ (bs :+ d') => out ~ If reduce '[1] d => SBool size_average size_average: By default, the losses are averaged over each loss element in the batch. Note that for some losses, there multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True -> SBool reduce reduce: By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True -> Tensor d target -> Tensor d input -> IO (Tensor out, Tensor out -> IO (Tensor d)) loss value and arrow from output gradient to input gradient

Arguments

 :: All Dimensions '[d', d] => KnownDim bs => d ~ (bs :+ d') => Tensor d target -> Tensor d input -> IO (Tensor '[1], Tensor '[1] -> IO (Tensor d)) loss value and arrow from output gradient to input gradient

Arguments

 :: Tensor d output tensor to update -> Tensor d input tensor -> Tensor d comparative tensor -> Bool size_average -> Bool reduce -> IO ()

The Kullback-Leibler divergence Loss

KL divergence is a useful distance measure for continuous distributions and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions.

As with NLLLoss, the input given is expected to contain log-probabilities, however unlike ClassNLLLoss, input is not restricted to a 2D Tensor, because the criterion is applied element-wise.

This criterion expects a target Tensor of the same size as the input Tensor.

The loss can be described as:  ell(x, y) = L = {l_1,dots,l_N}^top, quad l_n = y_n odot left( log y_n - x_n right), 

where N is the batch size. If reduce is True, then: @ begin{split}ell(x, y) = begin{cases} operatorname{mean}(L), & text{if}; text{size_average} = text{True},-- operatorname{sum}(L), & text{if}; text{size_average} = text{False}.

distKLDivCriterion forward pass (updates the output tensor)

_distKLDivCriterion_updateGradInput :: Tensor d -> Tensor d -> Tensor d -> Tensor d -> Bool -> Bool -> IO () Source #

distKLDivCriterion backward-update (updates the layer and bias tensors)

_smoothL1Criterion_updateOutput :: Tensor d -> Tensor d -> Tensor d -> Bool -> Bool -> IO () Source #

smoothL1Criterion forward pass (updates the output tensor)

_smoothL1Criterion_updateGradInput :: Tensor d -> Tensor d -> Tensor d -> Tensor d -> Bool -> Bool -> IO () Source #

smoothL1Criterion backward-update (updates the layer and bias tensors)

_l1Cost_updateOutput :: Tensor d -> Tensor d -> IO () Source #

l1Cost forward pass (updates the output tensor)

_l1Cost_updateGradInput :: Tensor d -> Tensor d -> Tensor d -> IO () Source #

l1Cost backward-update (updates the layer and bias tensors)

classNLLCriterion' :: forall s i sz ps. (Reifies s W, All KnownDim '[sz, ps]) => Integer -> Bool -> Bool -> IndexTensor '[sz] -> BVar s (Tensor '[sz, ps]) -> BVar s (Tensor '[1]) Source #

ClassNLLCriterion

The negative log likelihood (NLL) criterion. It is useful to train a classification problem with n classes. If provided, the optional argument weights should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.

The input given through a forward() is expected to contain log-probabilities of each class: input has to be a 1D Tensor of size n. Obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftMax layer in the last layer of your neural network. You may use CrossEntropyCriterion instead, if you prefer not to add an extra layer to your network. This criterion expects a class index (1 to the number of class) as target when calling forward(input, target) and backward(input, target).

The loss can be described as:

loss(x, class) = -x[class]

or in the case of the weights argument, it is specified as follows:

loss(x, class) = -weights[class] * x[class]

or in the case of the ignoreIndex argument:

loss(x, class) = class != ignoreIndex ? -weights[class] * x[class] : 0

Indeed, the ignoreIndex (defaults to -100) specifies a value for targets to be ignored. The commensurate gradInput for that target will be zero. When sizeAverage=true (the default), the gradInput and output are averaged over non-ignored targets.

Due to the behaviour of the backend code, it is necessary to set sizeAverage to false when calculating losses in non-batch mode.

The following is a code fragment showing how to make a gradient step given an input x, a desired output y (an integer 1 to n, in this case n = 2 classes), a network mlp and a learning rate learningRate:

function gradUpdate(mlp, x, y, learningRate) local criterion = nn.ClassNLLCriterion() local pred = mlp:forward(x) local err = criterion:forward(pred, y) mlp:zeroGradParameters() local t = criterion:backward(pred, y) mlp:backward(x, t) mlp:updateParameters(learningRate) end

By default, the losses are averaged over observations for each minibatch. However, if the argument sizeAverage is set to false, the losses are instead summed for each minibatch. FIXME: add batch dimension

classNLLCriterion :: (Reifies s W, All KnownDim '[n, c]) => IndexTensor '[n] -> BVar s (Tensor '[n, c]) -> BVar s (Tensor '[1]) Source #

Due to behaviour of backend code, it is nessecary to set sizeAverage to False in Non-Batch mode.

classNLLIO :: forall sz ps. (KnownDim sz, KnownDim ps) => IndexTensor '[sz] -> Tensor '[sz, ps] -> IO (Tensor '[1], Tensor '[1] -> IO (Tensor '[sz, ps])) Source #

classNLLWithIO :: forall sz ps. (KnownDim sz, KnownDim ps) => Maybe (Tensor '[1]) -> Maybe (Tensor '[1]) -> Maybe (Tensor '[sz, ps]) -> IndexTensor '[sz] -> Tensor '[sz, ps] -> IO (Tensor '[1], Tensor '[1] -> IO (Tensor '[sz, ps])) Source #