Note: This is an R port of the official tutorial available here. All credits goes to Justin Johnson.

Sometimes you will want to specify models that are more complex than a sequence of existing Modules; for these cases you can define your own Modules by using nn_module function and defining a forward which receives input Tensors and produces output Tensors using other modules or other autograd operations on Tensors.

In this example we implement our two-layer network as a custom Module subclass:

two_layer_net <- nn_module(
   initialize = function(D_in, H, D_out) {
      self$linear1 <- nn_linear(D_in, H)
      self$linear2 <- nn_linear(H, D_out)
   forward = function(x) {
      x %>% 
         self$linear1() %>% 
         nnf_relu() %>% 

if (cuda_is_available()) {
   device <- torch_device("cuda")
} else {
   device <- torch_device("cpu")
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N <- 64
D_in <- 1000
H <- 100
D_out <- 10

# Create random input and output data
# Setting requires_grad=FALSE (the default) indicates that we do not need to 
# compute gradients with respect to these Tensors during the backward pass.
x <- torch_randn(N, D_in, device=device)
y <- torch_randn(N, D_out, device=device)

# Construct our model by instantiating the class defined above
model <- two_layer_net(D_in, H, D_out)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn <- nnf_mse_loss

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate <- 1e-4
optimizer <- optim_sgd(model$parameters, lr=learning_rate)

for (t in seq_len(500)) {
   # Forward pass: compute predicted y by passing x to the model. Module objects
   # can be called like functions. When doing so you pass a Tensor of input
   # data to the Module and it produces a Tensor of output data.
   y_pred <- model(x)
   # Compute and print loss. We pass Tensors containing the predicted and true
   # values of y, and the loss function returns a Tensor containing the
   # loss.
   loss <- loss_fn(y_pred, y)
   if (t %% 100 == 0 || t == 1)
      cat("Step:", t, ":", as.numeric(loss), "\n")
   # Before the backward pass, use the optimizer object to zero all of the
   # gradients for the variables it will update (which are the learnable
   # weights of the model). This is because by default, gradients are
   # accumulated in buffers( i.e, not overwritten) whenever $backward()
   # is called. Checkout docs of `autograd_backward` for more details.

   # Backward pass: compute gradient of the loss with respect to model
   # parameters

   # Calling the step function on an Optimizer makes an update to its
   # parameters
#> Step: 1 : 1.057115 
#> Step: 100 : 1.045371 
#> Step: 200 : 1.03376 
#> Step: 300 : 1.022367 
#> Step: 400 : 1.011174 
#> Step: 500 : 1.000201

In the next example we will about dynamic graphs in torch.