Adding operations to autograd requires implementing a new autograd_function for each operation. Recall that autograd_functions are what autograd uses to compute the results and gradients, and to encode the operation history.
Every new function requires you to implement 2 methods:
forward() - the code that performs the operation. It can take as many arguments as you want, and some of them can be optional if you specify default values. All kinds of R objects are accepted here. Tensor arguments that track history (i.e., with requires_grad = TRUE) will be converted to ones that don’t track history before the call, and their use will be registered in the graph. Note that this logic won’t traverse lists or any other data structures and will only consider Tensors that are direct arguments to the call. You can return either a single Tensor output, or a list of Tensors if there are multiple outputs. Also, please refer to the docs of autograd_function to find descriptions of useful methods that can be called only from forward().
backward() - the gradient formula. It will be given as many Tensor arguments as there were outputs, with each of them representing the gradient w.r.t. that output. It should return as many Tensors as there were Tensors that required gradients in forward(), with each of them containing the gradient w.r.t. its corresponding input.
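As a minimal sketch of the two methods working together, the hypothetical function below (the name square and the argument name x are illustrative and not part of the library) computes y = x^2 in forward() and returns the gradient 2 * x * grad_output from backward(). It assumes the torch package is installed.

```r
library(torch)

# Hypothetical example: y = x^2, so dy/dx = 2 * x.
square <- autograd_function(
  forward = function(ctx, x) {
    # Save the input; backward() needs it to evaluate 2 * x.
    ctx$save_for_backward(x = x)
    x * x
  },
  backward = function(ctx, grad_output) {
    s <- ctx$saved_variables
    # Named list: the entry name matches the forward() argument it belongs to.
    list(x = 2 * s$x * grad_output)
  }
)

x <- torch_tensor(c(1, 2, 3), requires_grad = TRUE)
y <- square(x)
y$sum()$backward()
x$grad  # equals 2 * x for this input
```

The gradient list is named after the forward() arguments, which is the same convention the examples in this document follow.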
Note
It’s the user’s responsibility to use the special functions in forward()’s ctx properly in order to ensure that the new autograd_function works properly with the autograd engine.
save_for_backward() must be used when saving an input or output of forward() to be used later in backward().
mark_dirty() must be used to mark any input that is modified in place by forward().
mark_non_differentiable() must be used to tell the engine that an output is not differentiable.
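To illustrate the first of these rules, here is a small sketch in which an output of forward(), rather than an input, is saved for backward(). The function name my_exp is hypothetical; the sketch relies on the fact that the derivative of exp(x) is exp(x) itself, so the saved result can be reused directly. mark_dirty() and mark_non_differentiable() are not exercised here.

```r
library(torch)

# Hypothetical example: y = exp(x), and dy/dx = exp(x) = y.
my_exp <- autograd_function(
  forward = function(ctx, x) {
    result <- x$exp()
    # Save the OUTPUT through ctx instead of keeping it in a closure or global.
    ctx$save_for_backward(result = result)
    result
  },
  backward = function(ctx, grad_output) {
    # Reuse the saved output as the local derivative.
    list(x = grad_output * ctx$saved_variables$result)
  }
)

x <- torch_tensor(c(0, 1), requires_grad = TRUE)
my_exp(x)$sum()$backward()
x$grad  # equals exp(x) for this input
```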
Examples
Below you can find code for a linear function:
library(torch)

linear <- autograd_function(
  forward = function(ctx, input, weight, bias = NULL) {
    ctx$save_for_backward(input = input, weight = weight, bias = bias)
    output <- input$mm(weight$t())
    if (!is.null(bias)) {
      # Dimensions are 1-based in torch for R, so the new leading axis is dim 1.
      output <- output + bias$unsqueeze(1)$expand_as(output)
    }
    output
  },
  backward = function(ctx, grad_output) {
    s <- ctx$saved_variables
    grads <- list(
      input = NULL,
      weight = NULL,
      bias = NULL
    )
    # Only compute the gradients that autograd actually needs.
    if (ctx$needs_input_grad$input)
      grads$input <- grad_output$mm(s$weight)
    if (ctx$needs_input_grad$weight)
      grads$weight <- grad_output$t()$mm(s$input)
    if (!is.null(s$bias) && ctx$needs_input_grad$bias)
      grads$bias <- grad_output$sum(dim = 1)  # sum over the batch dimension
    grads
  }
)
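The three expressions in backward() follow from the chain rule applied to the forward computation. As a sketch, with notation introduced here for illustration only (X for input with one sample per row, W for weight, b for bias, L for a scalar loss, G for grad_output, and a column vector of ones written as 1):

```latex
\begin{aligned}
Y &= X W^{\top} + \mathbf{1}\, b^{\top}, \qquad G = \frac{\partial L}{\partial Y} \\
\frac{\partial L}{\partial X} &= G\, W \\
\frac{\partial L}{\partial W} &= G^{\top} X \\
\frac{\partial L}{\partial b} &= G^{\top} \mathbf{1}
\end{aligned}
```

The last line is the sum of G over its rows, which is why the bias gradient sums grad_output over the batch (first) dimension.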
Here, we give an additional example of a function that is parametrized by non-Tensor arguments:
mul_constant <- autograd_function(
  forward = function(ctx, tensor, constant) {
    ctx$save_for_backward(constant = constant)
    tensor * constant
  },
  backward = function(ctx, grad_output) {
    v <- ctx$saved_variables
    list(
      tensor = grad_output * v$constant
    )
  }
)
x <- torch_tensor(1, requires_grad = TRUE)
o <- mul_constant(x, 2)
o$backward()
x$grad
#> torch_tensor
#> 2
#> [ CPUFloatType{1} ]