Note: This is an R port of the official tutorial available here. All credit goes to Justin Johnson.

library(torch)

Sometimes you will want to specify models that are more complex than a sequence of existing Modules. In these cases you can define your own Modules with the nn_module function, supplying a forward method that receives input Tensors and produces output Tensors using other modules or other autograd operations on Tensors.

In this example we implement our two-layer network as a custom Module subclass:

two_layer_net <- nn_module(
  "two_layer_net",
  initialize = function(D_in, H, D_out) {
    self$linear1 <- nn_linear(D_in, H)
    self$linear2 <- nn_linear(H, D_out)
  },
  forward = function(x) {
    x %>%
      self$linear1() %>%
      nnf_relu() %>%
      self$linear2()
  }
)
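As a quick check of the definition above, the following sketch instantiates the module with small, arbitrary sizes (4, 3, and 2 are chosen only for illustration and are not the sizes used in this tutorial), lists its learnable parameters, and runs one forward pass. The definition is repeated so the snippet runs on its own; the name small_model is hypothetical.

```r
library(torch)

# Same definition as in the tutorial, repeated so this snippet is self-contained.
two_layer_net <- nn_module(
  "two_layer_net",
  initialize = function(D_in, H, D_out) {
    self$linear1 <- nn_linear(D_in, H)
    self$linear2 <- nn_linear(H, D_out)
  },
  forward = function(x) {
    x %>%
      self$linear1() %>%
      nnf_relu() %>%
      self$linear2()
  }
)

# Illustrative sizes only: 4 inputs, 3 hidden units, 2 outputs.
small_model <- two_layer_net(D_in = 4, H = 3, D_out = 2)

# Each nn_linear layer contributes a weight and a bias tensor.
param_names <- names(small_model$parameters)
print(param_names)

# Calling the module like a function runs forward() on a batch of 5 rows.
out <- small_model(torch_randn(5, 4))
out_dim <- dim(out)
print(out_dim)
```

Assigning a submodule to self inside initialize is what registers its weights, which is why model$parameters can later be handed to an optimizer without listing the tensors by hand.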

if (cuda_is_available()) {
  device <- torch_device("cuda")
} else {
  device <- torch_device("cpu")
}

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N <- 64
D_in <- 1000
H <- 100
D_out <- 10

# Create random input and output data
# Setting requires_grad=FALSE (the default) indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x <- torch_randn(N, D_in, device=device)
y <- torch_randn(N, D_out, device=device)

# Construct our model by instantiating the class defined above, then move its
# parameters to the same device as the input data.
model <- two_layer_net(D_in, H, D_out)
model$to(device = device)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn <- nnf_mse_loss

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use stochastic gradient descent (SGD); the
# optim package contains many other optimization algorithms, such as Adam. The
# first argument to the optimizer constructor tells the optimizer which Tensors
# it should update.
learning_rate <- 1e-4
optimizer <- optim_sgd(model$parameters, lr = learning_rate)

for (t in seq_len(500)) {
  # Forward pass: compute predicted y by passing x to the model. Module objects
  # can be called like functions. When doing so you pass a Tensor of input
  # data to the Module and it produces a Tensor of output data.
  y_pred <- model(x)

  # Compute and print loss. We pass Tensors containing the predicted and true
  # values of y, and the loss function returns a Tensor containing the
  # loss.
  loss <- loss_fn(y_pred, y)
  if (t %% 100 == 0 || t == 1)
    cat("Step:", t, ":", as.numeric(loss), "\n")

  # Before the backward pass, use the optimizer object to zero all of the
  # gradients for the variables it will update (which are the learnable
  # weights of the model). This is because by default, gradients are
  # accumulated in buffers (i.e., not overwritten) whenever $backward()
  # is called. See the documentation for autograd_backward for more details.
  optimizer$zero_grad()

  # Backward pass: compute gradient of the loss with respect to model
  # parameters
  loss$backward()

  # Calling the step function on an Optimizer makes an update to its
  # parameters
  optimizer$step()
}
#> Step: 1 : 1.057115
#> Step: 100 : 1.045371
#> Step: 200 : 1.03376
#> Step: 300 : 1.022367
#> Step: 400 : 1.011174
#> Step: 500 : 1.000201
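The training loop calls optimizer$zero_grad() before every backward pass because gradients accumulate rather than being overwritten. A minimal sketch of that behaviour on a single tensor, independent of the model above (the tensor w and the toy loss are assumptions made up for this illustration):

```r
library(torch)

# A small leaf tensor that tracks gradients.
w <- torch_tensor(c(1, 2, 3), requires_grad = TRUE)

# First backward pass: the gradient of sum(2 * w) is 2 for every element.
loss <- (2 * w)$sum()
loss$backward()
grad_first <- as.numeric(w$grad)
print(grad_first)

# Second backward pass without zeroing: the new gradient is added to the old.
loss <- (2 * w)$sum()
loss$backward()
grad_accumulated <- as.numeric(w$grad)
print(grad_accumulated)

# Reset the gradient in place, which is the effect zero_grad() has on each
# parameter the optimizer manages.
w$grad$zero_()
grad_zeroed <- as.numeric(w$grad)
print(grad_zeroed)
```

Without the reset, each optimizer step would use the sum of all gradients computed so far instead of the gradient of the current loss.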

In the next example we will learn about dynamic graphs in torch.