Note: This is an R port of the official tutorial available here. All credit goes to Justin Johnson.

A fully-connected ReLU network with one hidden layer and no biases, trained to predict y from x by minimizing the squared Euclidean distance.

This implementation uses pure R to manually compute the forward pass, loss, and backward pass.
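
In matrix form, the forward pass, loss, and gradients that the code below computes are (a restatement of the code, writing $W_1$ and $W_2$ for the weight arrays w1 and w2; $\odot$ denotes element-wise multiplication):

$$
h = x W_1, \qquad h_{relu} = \max(h, 0), \qquad \hat{y} = h_{relu} W_2, \qquad \mathrm{loss} = \sum_{i,j} (\hat{y}_{ij} - y_{ij})^2
$$

$$
\frac{\partial \mathrm{loss}}{\partial \hat{y}} = 2(\hat{y} - y), \qquad
\frac{\partial \mathrm{loss}}{\partial W_2} = h_{relu}^\top \, \frac{\partial \mathrm{loss}}{\partial \hat{y}}, \qquad
\frac{\partial \mathrm{loss}}{\partial W_1} = x^\top \Big( \frac{\partial \mathrm{loss}}{\partial \hat{y}} \, W_2^\top \odot \mathbf{1}[h \ge 0] \Big)
$$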

An R array is a generic n-dimensional array; it does not know anything about deep learning or gradients or computational graphs, and is just a way to perform generic numeric computations.
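
As a quick illustration (not part of the tutorial code), a two-dimensional array created with array() is treated as a matrix by the %*% operator, which is all the linear algebra this example needs:

a <- array(rnorm(2 * 3), dim = c(2, 3))  # a 2 x 3 array of random normals
b <- array(rnorm(3 * 4), dim = c(3, 4))  # a 3 x 4 array
dim(a %*% b)                             # matrix product yields a 2 x 4 result
#> [1] 2 4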

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N <- 64
D_in <- 1000
H <- 100
D_out <- 10

# Create random input and output data
x <- array(rnorm(N*D_in), dim = c(N, D_in))
y <- array(rnorm(N*D_out), dim = c(N, D_out))

# Randomly initialize weights
w1 <- array(rnorm(D_in*H), dim = c(D_in, H))
w2 <- array(rnorm(H*D_out), dim = c(H, D_out))

learning_rate <- 1e-6
for (t in seq_len(500)) {
   # Forward pass: compute predicted y
   h <- x %*% w1
   h_relu <- ifelse(h < 0, 0, h)   # apply the ReLU non-linearity
   y_pred <- h_relu %*% w2
   
   # Compute and print loss
   loss <- sum((y_pred - y)^2)
   if (t %% 100 == 0 || t == 1)
      cat("Step:", t, ":", loss, "\n")
   
   # Backprop to compute gradients of w1 and w2 with respect to loss
   grad_y_pred <- 2 * (y_pred - y)         # d(loss)/d(y_pred) for the squared error
   grad_w2 <- t(h_relu) %*% grad_y_pred    # gradient through y_pred = h_relu %*% w2
   grad_h_relu <- grad_y_pred %*% t(w2)    # backprop into the hidden activations
   grad_h <- grad_h_relu
   grad_h[h < 0] <- 0                      # zero the gradient where the ReLU was inactive
   grad_w1 <- t(x) %*% grad_h              # gradient through h = x %*% w1
   
   # Update weights
   w1 <- w1 - learning_rate * grad_w1
   w2 <- w2 - learning_rate * grad_w2
}
#> Step: 1 : 24544741 
#> Step: 100 : 548.1007 
#> Step: 200 : 2.193206 
#> Step: 300 : 0.01385914 
#> Step: 400 : 0.0001005609 
#> Step: 500 : 7.614407e-07
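
As a small usage sketch (not part of the original example), the trained weights can be applied to new data by re-running the same forward pass; x_new below is a hypothetical new batch:

# Predict on a hypothetical new batch of 5 inputs using the trained weights
x_new <- array(rnorm(5 * D_in), dim = c(5, D_in))
h_new <- x_new %*% w1
y_new_pred <- ifelse(h_new < 0, 0, h_new) %*% w2   # a 5 x D_out array of predictions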

In the next example, we will replace the R array with a torch Tensor.