Creates a criterion that uses a squared term if the absolute
element-wise error falls below 1 and an L1 term otherwise.
It is less sensitive to outliers than MSELoss and in some cases
prevents exploding gradients (see, e.g., the Fast R-CNN paper by Ross Girshick).
It is also known as the Huber loss.
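The claim about outliers and exploding gradients can be made concrete. Write d = x_i - y_i for a single element. The squared term d^2 used by MSELoss has gradient 2d, which grows without bound. The smooth L1 term has gradient d while |d| < 1 and sign(d) once |d| >= 1, so its magnitude never exceeds 1. The sketch below is plain Python, not the library's implementation, and the function names are hypothetical.

```python
def squared_error_grad(d: float) -> float:
    # Derivative of the per-element MSE term d**2 with respect to d.
    return 2.0 * d


def smooth_l1_grad(d: float) -> float:
    # Derivative of the smooth L1 term (threshold 1) with respect to d:
    #   0.5 * d**2   when |d| < 1  -> gradient d
    #   |d| - 0.5    otherwise     -> gradient sign(d)
    if abs(d) < 1.0:
        return d
    return 1.0 if d > 0 else -1.0


# A large residual (e.g. an outlier) blows up the squared-error gradient,
# while the smooth L1 gradient stays bounded by 1 in magnitude.
for d in [0.5, 3.0, 100.0]:
    print(d, squared_error_grad(d), smooth_l1_grad(d))
```

At d = 100 the squared-error gradient is 200. The smooth L1 gradient is still 1.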
Arguments
- reduction (string, optional): Specifies the reduction to apply to the output:
'none' | 'mean' | 'sum'. 'none': no reduction will be applied;
'mean': the sum of the output will be divided by the number of
elements in the output; 'sum': the output will be summed.
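The three reduction modes differ only in how the per-element terms are combined. The plain-Python sketch below assumes flat lists of floats, and smooth_l1_loss is a hypothetical helper rather than the library function. It shows that 'mean' is exactly the 'sum' result divided by the number of elements.

```python
from typing import List, Union


def smooth_l1_loss(x: List[float], y: List[float],
                   reduction: str = "mean") -> Union[float, List[float]]:
    # Hypothetical plain-Python helper; not the library's API.
    if len(x) != len(y):
        raise ValueError("input and target must have the same number of elements")
    # Per-element terms: squared branch below 1, L1 branch otherwise.
    z = []
    for xi, yi in zip(x, y):
        d = abs(xi - yi)
        z.append(0.5 * d * d if d < 1.0 else d - 0.5)
    if reduction == "none":
        return z                    # no reduction: one value per element
    if reduction == "sum":
        return sum(z)               # summed over all elements
    if reduction == "mean":
        return sum(z) / len(z)      # sum divided by the element count
    raise ValueError(f"unknown reduction: {reduction!r}")


x = [0.0, 0.5, 3.0]
y = [0.0, 0.0, 0.0]
print(smooth_l1_loss(x, y, "none"))  # [0.0, 0.125, 2.5]
print(smooth_l1_loss(x, y, "sum"))   # 2.625
print(smooth_l1_loss(x, y, "mean"))  # 0.875
```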
Details
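$$
\text{loss}(x, y) = \frac{1}{n} \sum_{i} z_{i}
$$

where $z_{i}$ is given by:

$$
z_{i} =
\begin{cases}
0.5 (x_i - y_i)^2, & \text{if } |x_i - y_i| < 1 \\
|x_i - y_i| - 0.5, & \text{otherwise}
\end{cases}
$$

$x$ and $y$ can have arbitrary shapes, with a total of $n$ elements each.
Under reduction = 'mean', the sum runs over all the elements and is divided by $n$.
The division by $n$ can be avoided by setting reduction = 'sum'.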
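The two branches of $z_i$ meet at $|x_i - y_i| = 1$, where both equal 0.5, so the loss is continuous. With threshold 1 the term also coincides with the standard Huber loss with $\delta = 1$, which is $0.5a^2$ for $|a| \le \delta$ and $\delta(|a| - 0.5\delta)$ otherwise. That Huber definition is the textbook form and is brought in here only for comparison. The sketch below is plain Python with hypothetical function names.

```python
def smooth_l1_term(d: float) -> float:
    # z_i as a function of the difference d = x_i - y_i.
    a = abs(d)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def huber_term(d: float, delta: float = 1.0) -> float:
    # Textbook Huber loss with threshold delta, used only for comparison.
    a = abs(d)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


# Both branches give 0.5 at |d| = 1, so the piecewise definition is continuous.
print(smooth_l1_term(1.0))  # 0.5

# With delta = 1 the two definitions agree on every sampled difference.
for d in [-4.0, -1.0, -0.25, 0.0, 0.25, 1.0, 4.0]:
    assert smooth_l1_term(d) == huber_term(d, delta=1.0)
```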
Shape
Input: $(N, *)$ where $*$ means any number of additional
dimensions
Target: $(N, *)$, same shape as the input
Output: scalar. If reduction is 'none', then
$(N, *)$, same shape as the input
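The shape rules can be illustrated with nested Python lists standing in for tensors of shape $(N, *)$. This sketch is not the library's implementation, and smooth_l1_loss, _elementwise and _flatten are hypothetical names. With reduction 'none' the result keeps the input's nesting. With 'mean' or 'sum' it collapses to a scalar computed over all $n$ elements, whatever the shape.

```python
from typing import Any, List


def _elementwise(x: Any, y: Any) -> Any:
    # Recurse through nested lists so any (N, *) shape is handled.
    if isinstance(x, list):
        if not isinstance(y, list) or len(x) != len(y):
            raise ValueError("target must have the same shape as the input")
        return [_elementwise(xi, yi) for xi, yi in zip(x, y)]
    a = abs(x - y)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def _flatten(t: Any) -> List[float]:
    # Collect every leaf value in order, ignoring the nesting.
    if isinstance(t, list):
        return [v for item in t for v in _flatten(item)]
    return [t]


def smooth_l1_loss(x: Any, y: Any, reduction: str = "mean") -> Any:
    z = _elementwise(x, y)
    if reduction == "none":
        return z                        # same nested shape as the input
    flat = _flatten(z)
    if reduction == "sum":
        return sum(flat)                # scalar
    if reduction == "mean":
        return sum(flat) / len(flat)    # scalar: divide by total element count n
    raise ValueError(f"unknown reduction: {reduction!r}")


x = [[0.0, 0.5], [3.0, -2.0]]   # shape (N=2, 2)
y = [[0.0, 0.0], [0.0, 0.0]]    # same shape as the input
print(smooth_l1_loss(x, y, "none"))  # [[0.0, 0.125], [2.5, 1.5]]
print(smooth_l1_loss(x, y, "mean"))  # 1.03125
```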