The Kullback-Leibler divergence loss measure. Kullback-Leibler divergence is a useful distance measure for continuous distributions and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions.
Arguments
- reduction (string, optional): Specifies the reduction to apply to the output:
  'none' | 'batchmean' | 'sum' | 'mean'.
  'none': no reduction will be applied.
  'batchmean': the sum of the output will be divided by the batch size.
  'sum': the output will be summed.
  'mean': the output will be divided by the number of elements in the output.
  Default: 'mean'
Details
As with nn_nll_loss(), the input given is expected to contain log-probabilities and is not restricted to a 2D Tensor. The targets are interpreted as probabilities by default, but could be considered as log-probabilities with log_target set to TRUE.

This criterion expects a target Tensor of the same size as the input Tensor.
The unreduced (i.e. with reduction set to 'none') loss can be described as:
$$ l(x,y) = L = \{ l_1,\dots,l_N \}, \quad l_n = y_n \cdot \left( \log y_n - x_n \right) $$
where the index \(N\) spans all dimensions of input and \(L\) has the same shape as input. If reduction is not 'none' (the default is 'mean'), then:
$$ \ell(x, y) = \begin{cases} \mbox{mean}(L), & \mbox{if reduction} = \mbox{'mean';} \\ \mbox{sum}(L), & \mbox{if reduction} = \mbox{'sum'.} \end{cases} $$
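As a rough illustration, here is a minimal sketch (assuming the torch R package, with nnf_log_softmax() and nnf_softmax() used only to build example tensors) comparing the unreduced loss against the element-wise formula above:

```r
library(torch)

# Illustrative tensors: input x holds log-probabilities, target y holds probabilities.
x <- nnf_log_softmax(torch_randn(3, 5), dim = 2)
y <- nnf_softmax(torch_randn(3, 5), dim = 2)

# Unreduced loss: one value per element, same shape as the input.
loss_none <- nn_kl_div_loss(reduction = "none")
out <- loss_none(x, y)

# Element-wise formula l_n = y_n * (log y_n - x_n) should give the same values.
manual <- y * (torch_log(y) - x)
torch_allclose(out, manual)
```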
In the default reduction mode 'mean', the losses are averaged for each minibatch over observations as well as over dimensions. 'batchmean' mode gives the correct KL divergence, where losses are averaged over the batch dimension only. 'mean' mode's behavior will be changed to match 'batchmean' in the next major release.
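The difference between the two averaging modes can be sketched as follows (the 4 x 10 shape and the softmax helpers are illustrative assumptions, not part of this page):

```r
library(torch)

x <- nnf_log_softmax(torch_randn(4, 10), dim = 2)  # log-probabilities
y <- nnf_softmax(torch_randn(4, 10), dim = 2)      # probabilities

elementwise <- nn_kl_div_loss(reduction = "none")(x, y)

# 'batchmean': summed loss divided by the batch size (4) -- the
# mathematically correct KL divergence for a minibatch.
nn_kl_div_loss(reduction = "batchmean")(x, y)  # equals elementwise$sum() / 4

# 'mean': summed loss divided by the total number of elements (4 * 10).
nn_kl_div_loss(reduction = "mean")(x, y)       # equals elementwise$mean()
```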