The Kullback-Leibler divergence loss measure. Kullback-Leibler divergence is a useful distance measure for continuous distributions and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions.

## Arguments

- reduction (string, optional): Specifies the reduction to apply to the output:
  `'none'` | `'batchmean'` | `'sum'` | `'mean'`. `'none'`: no reduction will be
  applied. `'batchmean'`: the sum of the output will be divided by batch size.
  `'sum'`: the output will be summed. `'mean'`: the output will be divided by the
  number of elements in the output. Default: `'mean'`. A comparison of the
  reduction modes is sketched below.
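
A minimal sketch, assuming the torch R package is loaded, of what each `reduction` value returns; the tensor sizes here are arbitrary illustrations:

```r
library(torch)

# input must hold log-probabilities, target holds probabilities (see Details)
input  <- nnf_log_softmax(torch_randn(3, 5), dim = 2)
target <- nnf_softmax(torch_randn(3, 5), dim = 2)

nn_kl_div_loss(reduction = "none")(input, target)$shape  # same shape as input: 3 5
nn_kl_div_loss(reduction = "batchmean")(input, target)   # scalar: sum of losses / 3
nn_kl_div_loss(reduction = "sum")(input, target)         # scalar: sum of losses
nn_kl_div_loss(reduction = "mean")(input, target)        # scalar: sum of losses / 15
```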

## Details

As with `nn_nll_loss()`, the `input` given is expected to contain
*log-probabilities* and is not restricted to a 2D Tensor.
The targets are interpreted as *probabilities* by default, but could be considered
as *log-probabilities* with `log_target` set to `TRUE`.

This criterion expects a `target` `Tensor` of the same size as the
`input` `Tensor`.
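
A short sketch of the expected `input` and `target` formats; note that passing `log_target = TRUE` to the `nn_kl_div_loss()` constructor is an assumption based on the description above:

```r
library(torch)

# input holds log-probabilities; the target has the same size and holds
# probabilities by default
input       <- nnf_log_softmax(torch_randn(4, 10), dim = 2)
target_prob <- nnf_softmax(torch_randn(4, 10), dim = 2)

nn_kl_div_loss()(input, target_prob)

# if the target is itself given as log-probabilities, set log_target = TRUE
# (assumed to be accepted by the constructor, per the description above)
target_log <- nnf_log_softmax(torch_randn(4, 10), dim = 2)
nn_kl_div_loss(log_target = TRUE)(input, target_log)
```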

The unreduced (i.e. with `reduction` set to `'none'`) loss can be described
as:

$$ l(x,y) = L = \{ l_1,\dots,l_N \}, \quad l_n = y_n \cdot \left( \log y_n - x_n \right) $$

where the index \(N\) spans all dimensions of `input` and \(L\) has the same
shape as `input`. If `reduction` is not `'none'` (default `'mean'`), then:

$$ \ell(x, y) = \begin{cases} \mbox{mean}(L), & \mbox{if reduction} = \mbox{'mean';} \\ \mbox{sum}(L), & \mbox{if reduction} = \mbox{'sum'.} \end{cases} $$

In the default `reduction` mode `'mean'`, the losses are averaged for each minibatch
over observations **as well as** over dimensions. `'batchmean'` mode gives the
correct KL divergence, where losses are averaged over the batch dimension only.
`'mean'` mode's behavior will be changed to match `'batchmean'` in the next
major release.
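
A minimal sketch checking the reduction arithmetic described above, assuming the torch R package is loaded; the batch size of 8 and the 20 classes are arbitrary:

```r
library(torch)

input  <- nnf_log_softmax(torch_randn(8, 20), dim = 2)  # x: log-probabilities
target <- nnf_softmax(torch_randn(8, 20), dim = 2)      # y: probabilities

# unreduced, element-wise losses: l_n = y_n * (log(y_n) - x_n)
l <- nn_kl_div_loss(reduction = "none")(input, target)

# 'batchmean' divides the summed loss by the batch size only (the true KL
# divergence); 'mean' divides by the total number of elements (8 * 20 here)
batchmean <- nn_kl_div_loss(reduction = "batchmean")(input, target)
mean_loss <- nn_kl_div_loss(reduction = "mean")(input, target)

all.equal(batchmean$item(), (l$sum() / 8)$item(), tolerance = 1e-5)
all.equal(mean_loss$item(), l$mean()$item(), tolerance = 1e-5)
```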