Let’s dig deeper into torch tensors. You’ll learn:

how to create them;
how to manipulate their contents and/or modify their shapes;
how to convert them to R arrays, matrices or vectors;
and of course, given the omnipresent need for speed: how to get all those operations executed on the GPU.

Creating tensors

Tensors may be created by specifying individual values. Here we create two one-dimensional tensors (vectors), of types float and bool, respectively:

library(torch)
# a 1d vector of length 2
t <- torch_tensor(c(1, 2))
t

## torch_tensor
##  1
##  2
## [ CPUFloatType{2} ]

# also 1d, but of type boolean
t <- torch_tensor(c(TRUE, FALSE))
t

## torch_tensor
##  1
##  0
## [ CPUBoolType{2} ]

And here are two ways to create two-dimensional tensors (matrices). Note how in the second approach, you need to specify byrow = TRUE in the call to matrix() to get values arranged in row-major order.

# a 3x3 tensor (matrix)
t <- torch_tensor(rbind(c(1,2,0), c(3,0,0), c(4,5,6)))
t

## torch_tensor
##  1  2  0
##  3  0  0
##  4  5  6
## [ CPUFloatType{3,3} ]

# also 3x3
t <- torch_tensor(matrix(1:9, ncol = 3, byrow = TRUE))
t

## torch_tensor
##  1  2  3
##  4  5  6
##  7  8  9
## [ CPULongType{3,3} ]

In higher dimensions especially, it can be easier to specify the type of tensor abstractly, as in: “give me a tensor of <…> of shape n1 x n2”, where <…> could be “zeros”; or “ones”; or, say, “values drawn from a standard normal distribution”:

# a 3x3 tensor of standard-normally distributed values
t <- torch_randn(3, 3)
t

## torch_tensor
## -0.2148  0.3871 -0.6968
##  0.1063 -0.3069 -1.5072
##  0.0432 -1.5837  0.2322
## [ CPUFloatType{3,3} ]

# a 4x2x2 (3d) tensor of zeroes
t <- torch_zeros(4, 2, 2)
t

## torch_tensor
## (1,.,.) = 
##   0  0
##   0  0
## 
## (2,.,.) = 
##   0  0
##   0  0
## 
## (3,.,.) = 
##   0  0
##   0  0
## 
## (4,.,.) = 
##   0  0
##   0  0
## [ CPUFloatType{4,2,2} ]

Many similar functions exist, including, e.g., torch_arange() to create a tensor holding a sequence of evenly spaced values, torch_eye() which returns an identity matrix, and torch_logspace() which fills a specified range with a list of values spaced logarithmically.
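As a quick illustration of the functions just mentioned, here is a minimal sketch. The argument values are arbitrary; for torch_arange() it is worth inspecting the result to see whether the end value is included in your installed version.

```r
library(torch)

# 3x3 identity matrix
eye <- torch_eye(3)

# three values spaced logarithmically between 10^0 and 10^2
ls <- torch_logspace(start = 0, end = 2, steps = 3)

# evenly spaced values starting at 0, in steps of 2
ar <- torch_arange(start = 0, end = 10, step = 2)

eye
ls
ar
```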

If no dtype argument is specified, torch will infer the data type from the passed-in value(s). For example:

t <- torch_tensor(c(3, 5, 7))
t$dtype

## torch_Float

t <- torch_tensor(1L)
t$dtype

## torch_Long

But we can explicitly request a different dtype if we want:

t <- torch_tensor(2, dtype = torch_double())
t$dtype

## torch_Double
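An existing tensor can be converted as well. The following sketch assumes the to() method with a dtype argument; it returns a converted tensor and leaves the original untouched.

```r
library(torch)

t <- torch_tensor(c(1, 2, 3))          # inferred as float
t_long <- t$to(dtype = torch_long())   # converted copy of type long

t$dtype        # unchanged
t_long$dtype
```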

torch tensors live on a device. By default, this will be the CPU:

t$device

## torch_device(type='cpu')

But we could also define a tensor to live on the GPU:

t <- torch_tensor(2, device = "cuda")
t$device

## torch_device(type='cuda', index=0)

We’ll talk more about devices below.

There is another very important parameter to the tensor-creation functions: requires_grad. Here though, I need to ask for your patience: This one will prominently figure in the next section.

Conversion to built-in R data types

To convert torch tensors to R, use as_array():

t <- torch_tensor(matrix(1:9, ncol = 3, byrow = TRUE))
as_array(t)

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

Depending on whether the tensor is one-, two-, or three-dimensional, the resulting R object will be a vector, a matrix, or an array:

t <- torch_tensor(c(1, 2, 3))
class(as_array(t))

## [1] "numeric"

t <- torch_ones(c(2, 2))
class(as_array(t))

## [1] "matrix" "array"

t <- torch_ones(c(2, 2, 2))
class(as_array(t))

## [1] "array"

For one-dimensional and two-dimensional tensors, it is also possible to use as.integer() / as.matrix(). (One reason you might want to do this is to have more self-documenting code.)

If a tensor currently lives on the GPU, you need to move it to the CPU first:

t <- torch_tensor(2, device = "cuda")
as.integer(t$cpu())

## [1] 2

Indexing and slicing tensors

Often, we want to retrieve not a complete tensor, but only some of the values it holds, or even just a single value. In these cases, we talk about slicing and indexing, respectively.

In R, these operations are 1-based: the very first element of an array resides at offset 1. The same behavior was implemented for torch. Thus, a lot of the functionality described in this section should feel intuitive.

The way I’m organizing this section is the following. We’ll inspect the intuitive parts first, where by intuitive I mean: intuitive to the R user who has not yet worked with Python’s NumPy. Then come things which, to this user, may look more surprising, but will turn out to be pretty useful.

Indexing and slicing: the R-like part

None of these should be overly surprising:

t <- torch_tensor(rbind(c(1,2,3), c(4,5,6)))
t

## torch_tensor
##  1  2  3
##  4  5  6
## [ CPUFloatType{2,3} ]

# a single value
t[1, 1]

## torch_tensor
## 1
## [ CPUFloatType{} ]

# first row, all columns
t[1, ]

## torch_tensor
##  1
##  2
##  3
## [ CPUFloatType{3} ]

# first row, a subset of columns
t[1, 1:2]

## torch_tensor
##  1
##  2
## [ CPUFloatType{2} ]

Note how, just as in R, singleton dimensions are dropped:

t <- torch_tensor(rbind(c(1,2,3), c(4,5,6)))

# 2x3
t$size()

## [1] 2 3

# just a single row: will be returned as a vector
t[1, 1:2]$size()

## [1] 2

# a single element
t[1, 1]$size()

## integer(0)

And just like in R, you can specify drop = FALSE to keep those dimensions:

t[1, 1:2, drop = FALSE]$size()

## [1] 1 2

t[1, 1, drop = FALSE]$size()

## [1] 1 1

Indexing and slicing: What to look out for

Whereas R uses negative numbers to remove elements at specified positions, in torch negative values indicate that we start counting from the end of a tensor – with -1 pointing to its last element:

t <- torch_tensor(rbind(c(1,2,3), c(4,5,6)))

t[1, -1]

## torch_tensor
## 3
## [ CPUFloatType{} ]

t[ , -2:-1]

## torch_tensor
##  2  3
##  5  6
## [ CPUFloatType{2,2} ]
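To make the contrast concrete, here is a small side-by-side sketch that applies the same index expression to a base R matrix and to a tensor built from it. The variable names are chosen for illustration only.

```r
library(torch)

m <- rbind(c(1, 2, 3), c(4, 5, 6))
t <- torch_tensor(m)

# base R: -1 removes the first column from the first row
m[1, -1]

# torch: -1 selects the last column of the first row
t[1, -1]

# the base R way to ask for the last column
m[1, ncol(m)]
```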

This is a feature you might know from NumPy. Same with the following.

When the slicing expression m:n is augmented by another colon and a third number, as in m:n:o, we take every o-th item from the range specified by m and n:

t <- torch_tensor(1:10)
t[2:10:2]

## torch_tensor
##   2
##   4
##   6
##   8
##  10
## [ CPULongType{5} ]
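For R users, the closest base R analogue is probably seq() with a by argument. The sketch below compares the two on a different step size:

```r
library(torch)

t <- torch_tensor(1:10)

# every third item, starting at position 1
strided <- t[1:10:3]
as_array(strided)

# the same positions, generated in base R
seq(1, 10, by = 3)
```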

Sometimes we don’t know how many dimensions a tensor has, but we do know what to do with the final dimension, or the first one. To subsume all others, we can use ..:

t <- torch_randint(-7, 7, size = c(2, 2, 2))
t

## torch_tensor
## (1,.,.) = 
##  -1 -3
##  -7  1
## 
## (2,.,.) = 
##   1 -7
##  -6 -6
## [ CPUFloatType{2,2,2} ]

t[.., 1]

## torch_tensor
## -1 -7
##  1 -6
## [ CPUFloatType{2,2} ]

t[2, ..]

## torch_tensor
##  1 -7
## -6 -6
## [ CPUFloatType{2,2} ]

Now we move on to a topic that, in practice, is just as indispensable as slicing: changing tensor shapes.

Reshaping tensors

Changes in shape can occur in two fundamentally different ways. Seeing how “reshape” really means: keep the values but modify their layout, we could either alter how they’re arranged physically, or keep the physical structure as-is and just change the “mapping” (a semantic change, as it were).

In the first case, storage will have to be allocated for two tensors, source and target, and elements will be copied from the former to the latter. In the second, physically there will be just a single tensor, referenced by two logical entities with distinct metadata.

Not surprisingly, for performance reasons, the second operation is preferred.

Zero-copy reshaping

We start with zero-copy methods, as we’ll want to use them whenever we can.

A special case often seen in practice is adding or removing a singleton dimension.

unsqueeze() adds a dimension of size 1 at a position specified by dim:

t1 <- torch_randint(low = 3, high = 7, size = c(3, 3, 3))
t1$size()

## [1] 3 3 3

t2 <- t1$unsqueeze(dim = 1)
t2$size()

## [1] 1 3 3 3

t3 <- t1$unsqueeze(dim = 2)
t3$size()

## [1] 3 1 3 3

Conversely, squeeze() removes singleton dimensions:

t4 <- t3$squeeze()
t4$size()

## [1] 3 3 3
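Called without arguments, squeeze() removes every singleton dimension, which is not always what is wanted (think of a batch that happens to have size 1). As a sketch, assuming the optional dimension argument of squeeze(), a single dimension can be targeted instead:

```r
library(torch)

t <- torch_zeros(1, 3, 1)

# removes both singleton dimensions
t$squeeze()$size()

# removes only the first dimension
t$squeeze(1)$size()
```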

The same could be accomplished with view(). view(), however, is much more general, in that it allows you to reshape the data to any valid dimensionality. (Valid meaning: The number of elements stays the same.)

Here we have a 3x2 tensor that is reshaped to size 2x3:

t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))
t1

## torch_tensor
##  1  2
##  3  4
##  5  6
## [ CPUFloatType{3,2} ]

t2 <- t1$view(c(2, 3))
t2

## torch_tensor
##  1  2  3
##  4  5  6
## [ CPUFloatType{2,3} ]

(Note how this is different from matrix transposition.)

Instead of rearranging the 3x2 matrix as 2x3, we can flatten it into a single row. Passing -1 for a dimension asks torch to infer its size from the remaining ones:

t4 <- t1$view(c(-1, 6))

t4$size()

## [1] 1 6

t4

## torch_tensor
##  1  2  3  4  5  6
## [ CPUFloatType{1,6} ]

In contrast to indexing operations, this does not drop dimensions.

Like we said above, operations like squeeze() or view() do not make copies. Or, put differently: The output tensor shares storage with the input tensor. We can in fact verify this ourselves:

t1$storage()$data_ptr()

## [1] "0x55cbf2def000"

t2$storage()$data_ptr()

## [1] "0x55cbf2def000"

What’s different is the storage metadata torch keeps about both tensors. Here, the relevant information is the stride:

A tensor’s stride() method tracks, for every dimension, how many elements have to be traversed to arrive at its next element (row or column, in two dimensions). For t1 above, of shape 3x2, we have to skip over 2 items to arrive at the next row. To arrive at the next column though, in every row we just have to skip a single entry:

t1$stride()

## [1] 2 1

For t2, of shape 2x3, the distance between column elements is the same, but the distance between rows is now 3:

t2$stride()

## [1] 3 1
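One way to read the strides is as a recipe for locating an element in the flat storage. The sketch below uses a hypothetical helper, element_at(), which is not part of torch: for 1-based indices i and j, the position in storage is (i - 1) * stride[1] + (j - 1) * stride[2] + 1. It assumes the 3x2 tensor t1 and its 2x3 view t2 from above.

```r
library(torch)

t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))
t2 <- t1$view(c(2, 3))

# the values in physical order (t1 is contiguous, so this is just 1..6)
flat <- as_array(t1$view(-1))

# hypothetical helper: look up element (i, j) using nothing but the strides
element_at <- function(flat, stride, i, j) {
  flat[(i - 1) * stride[1] + (j - 1) * stride[2] + 1]
}

element_at(flat, t1$stride(), 3, 2)   # compare with t1[3, 2]
element_at(flat, t2$stride(), 2, 1)   # compare with t2[2, 1]
```

The same flat storage serves both tensors; only the strides differ.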

While zero-copy operations are optimal, there are cases where they won’t work.

With view(), this can happen when a tensor was obtained via an operation, other than view() itself, that has already modified the stride. One example would be transposition, here via t():

t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))
t1

## torch_tensor
##  1  2
##  3  4
##  5  6
## [ CPUFloatType{3,2} ]

t1$stride()

## [1] 2 1

t2 <- t1$t()
t2

## torch_tensor
##  1  3  5
##  2  4  6
## [ CPUFloatType{2,3} ]

t2$stride()

## [1] 1 2

In torch lingo, tensors – like t2 – that re-use existing storage (and just read it differently), are said not to be “contiguous”¹. One way to reshape them is to use contiguous() on them before. We’ll see this in the next subsection.
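Whether a given tensor is contiguous in this sense can be queried directly. A minimal sketch, assuming the is_contiguous() method:

```r
library(torch)

t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))
t2 <- t1$t()

t1$is_contiguous()   # freshly created tensor
t2$is_contiguous()   # transposed tensor that re-uses t1's storage
```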

Reshape with copy

In the following snippet, trying to reshape t2 using view() fails, as it already carries information indicating that the underlying data should not be read in physical order.

t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))

t2 <- t1$t()

#t2$view(6) # error!

However, if we first call contiguous() on it, a new tensor is created, which may then be (virtually) reshaped using view().²

t3 <- t2$contiguous()

t3$view(6)

## torch_tensor
##  1
##  3
##  5
##  2
##  4
##  6
## [ CPUFloatType{6} ]

Alternatively, we can use reshape(). reshape() defaults to view()-like behavior if possible; otherwise it will create a physical copy.

t2$storage()$data_ptr()

## [1] "0x55cb93d5fc40"

t4 <- t2$reshape(6)

t4$storage()$data_ptr()

## [1] "0x55cb93669e80"
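Another way to tell a shared view from a physical copy, without comparing pointers, is to modify the result in place and check whether the original changes. The sketch below uses the in-place methods mul_() and zero_() (the trailing underscore marks a mutating method).

```r
library(torch)

t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))

# zero-copy: v shares storage with t1
v <- t1$view(6)
v$mul_(10)
t1   # reflects the change made through v

# t1$t() is not contiguous, so reshape() has to copy
r <- t1$t()$reshape(6)
r$zero_()
t1   # unaffected by the change to r
```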

Operations on tensors

Unsurprisingly, torch provides a bunch of mathematical operations on tensors. We’ve already seen some of them in action in the neural network code, and you’ll encounter lots more when you continue your torch journey. Here, we quickly take a look at the overall tensor method semantics.

Tensor methods normally return references to new objects. Here, we add to t1 a clone of itself:

t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))
t2 <- t1$clone()

t1$add(t2)

## torch_tensor
##   2   4
##   6   8
##  10  12
## [ CPUFloatType{3,2} ]

In this process, t1 has not been modified:

t1

## torch_tensor
##  1  2
##  3  4
##  5  6
## [ CPUFloatType{3,2} ]

Many tensor methods have variants for mutating operations. These all carry a trailing underscore:

t1$add_(t1)

## torch_tensor
##   2   4
##   6   8
##  10  12
## [ CPUFloatType{3,2} ]

# now t1 has been modified
t1

## torch_tensor
##   2   4
##   6   8
##  10  12
## [ CPUFloatType{3,2} ]

Alternatively, you can of course assign the new object to a new reference variable:

t3 <- t1$add(t1)

t3

## torch_tensor
##   4   8
##  12  16
##  20  24
## [ CPUFloatType{3,2} ]
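The usual arithmetic operators are available on tensors too, and reductions take a 1-based dim argument. A brief sketch, using torch_equal() to compare results:

```r
library(torch)

t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))

# operator and method forms give the same result
torch_equal(t1 + t1, t1$add(t1))
torch_equal(t1 * 2, t1$mul(2))

# sum over the first dimension (rows), leaving one value per column
col_sums <- t1$sum(dim = 1)
col_sums
```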

There is one thing we need to discuss before we wrap up our introduction to tensors: How can we have all those operations executed on the GPU?

Running on GPU

To check if your GPU(s) is/are visible to torch, run

cuda_is_available()

## [1] TRUE

cuda_device_count()

## [1] 1

Tensors may be requested to live on the GPU right at creation:

device <- torch_device("cuda")

t <- torch_ones(c(2, 2), device = device)

Alternatively, they can be moved between devices at any time:

t2 <- t$cuda()
t2$device

## torch_device(type='cuda', index=0)

t3 <- t2$cpu()
t3$device

## torch_device(type='cpu')
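For code that should also run on machines without a GPU, a common pattern is to pick the device once and pass it along. This is a sketch of that idiom, not something prescribed by torch:

```r
library(torch)

# fall back to the CPU when no GPU is visible
device <- torch_device(if (cuda_is_available()) "cuda" else "cpu")

t <- torch_ones(c(2, 2), device = device)
t$device

# move to the CPU before converting to an R object
as_array(t$cpu())
```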

That’s it for our discussion on tensors — almost. There is one torch feature that, although related to tensor operations, deserves special mention. It is called broadcasting, and “bilingual” (R + Python) users will know it from NumPy.

Broadcasting

We often have to perform operations on tensors with shapes that don’t match exactly.

Unsurprisingly, we can add a scalar to a tensor:

t1 <- torch_randn(c(3,5))

t1 + 22

## torch_tensor
##  21.4376  20.4072  21.0580  22.1885  23.7295
##  20.6981  21.8371  22.3340  21.7595  21.2021
##  20.7050  21.7176  21.8482  23.0839  21.1010
## [ CPUFloatType{3,5} ]

The same will work if we add a tensor of size 1:

t1 <- torch_randn(c(3,5))

t1 + torch_tensor(c(22))

## torch_tensor
##  20.8751  22.3712  23.3040  19.7106  22.7791
##  21.5658  21.1157  23.7083  21.9827  23.2701
##  21.0319  22.8206  20.9895  22.3276  21.9052
## [ CPUFloatType{3,5} ]

Adding tensors of different sizes normally won’t work:

t1 <- torch_randn(c(3,5))
t2 <- torch_randn(c(5,5))

#t1$add(t2) # error

However, under certain conditions, one or both tensors may be virtually expanded so both tensors line up. This behavior is what is meant by broadcasting. The way it works in torch is not just inspired by, but actually identical to that of NumPy.

The rules are:

1. We align array shapes, starting from the right.

Say we have two tensors, one of size 8x1x6x1, the other of size 7x1x5.

Here they are, right-aligned:

# t1, shape:     8  1  6  1
# t2, shape:        7  1  5

2. Starting to look from the right, the sizes along aligned axes either have to match exactly, or one of them has to be equal to 1: in which case the latter is broadcast to the larger one.

In the above example, this is the case for the second-from-last dimension. This now gives

# t1, shape:     8  1  6  1
# t2, shape:        7  6  5

, with broadcasting happening in t2.

3. If on the left, one of the arrays has an additional axis (or more than one), the other is virtually expanded to have a size of 1 in that place, in which case broadcasting will happen as stated in (2).

This is the case with t1’s leftmost dimension. First, there is a virtual expansion

# t1, shape:     8  1  6  1
# t2, shape:     1  7  1  5

and then, broadcasting happens:

# t1, shape:     8  1  6  1
# t2, shape:     8  7  1  5
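The rules can be condensed into a few lines of base R. The function below, broadcast_shape(), is a hypothetical helper written for illustration and is not part of torch; it computes the shape of the result, or fails if the shapes are incompatible. The last line compares it with what torch itself produces for the example shapes.

```r
library(torch)

# hypothetical helper: shape that results from broadcasting shapes a and b
broadcast_shape <- function(a, b) {
  n <- max(length(a), length(b))
  # right-align by padding the shorter shape with 1s on the left
  a <- c(rep(1, n - length(a)), a)
  b <- c(rep(1, n - length(b)), b)
  # aligned sizes must match, or one of them must be 1
  compatible <- a == b | a == 1 | b == 1
  if (!all(compatible)) stop("shapes cannot be broadcast together")
  # a size of 1 is expanded to the other tensor's size
  ifelse(a == 1, b, a)
}

broadcast_shape(c(8, 1, 6, 1), c(7, 1, 5))

# what torch reports for the same shapes
(torch_zeros(8, 1, 6, 1) + torch_zeros(7, 1, 5))$size()
```

Calling broadcast_shape(c(3, 5), c(5, 5)) raises an error, since 3 and 5 neither match nor equal 1.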

According to these rules, our above example

t1 <- torch_randn(c(3,5))
t2 <- torch_randn(c(5,5))

#t1$add(t2)

could be modified in various ways that would allow for adding two tensors.

For example, if t2 were 1x5, it would only need to get broadcast to size 3x5 before the addition operation:

t1 <- torch_randn(c(3,5))
t2 <- torch_randn(c(1,5))

t1$add(t2)

## torch_tensor
## -0.8691 -0.7508  1.0622  0.8551  0.4162
##  0.2965 -0.3144  1.3594  0.4468 -0.6887
## -1.8092  0.3860  1.1249  0.5361 -1.2814
## [ CPUFloatType{3,5} ]

If it were of size 5, a virtual leading dimension would be added, and then, the same broadcasting would take place as in the previous case.

t1 <- torch_randn(c(3,5))
t2 <- torch_randn(c(5))

t1$add(t2)

## torch_tensor
## -1.8092 -0.0934  1.5282 -0.7218 -1.0960
## -0.8556  1.2954 -0.8784 -0.6804  1.1016
## -1.9484  2.3079  0.6009 -0.8610 -0.7090
## [ CPUFloatType{3,5} ]

Here is a more complex example. Broadcasting now happens both in t1 and in t2:

t1 <- torch_randn(c(1,5))
t2 <- torch_randn(c(3,1))

t1$add(t2)

## torch_tensor
##  0.5589  1.4205  0.5445  0.7654 -0.2538
## -0.0216  0.8400 -0.0360  0.1850 -0.8343
##  1.3144  2.1760  1.2999  1.5209  0.5017
## [ CPUFloatType{3,5} ]

As a nice concluding example, through broadcasting an outer product can be computed like so:

t1 <- torch_tensor(c(0, 10, 20, 30))

t2 <- torch_tensor(c(1, 2, 3))

t1$view(c(4,1)) * t2

## torch_tensor
##   0   0   0
##  10  20  30
##  20  40  60
##  30  60  90
## [ CPUFloatType{4,3} ]
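As a sanity check, the result can be compared with base R's outer(). This sketch uses unsqueeze() instead of view() to create the 4x1 column; either should work here.

```r
library(torch)

x <- c(0, 10, 20, 30)
y <- c(1, 2, 3)

# 4x1 column times a length-3 vector broadcasts to 4x3
via_broadcast <- torch_tensor(x)$unsqueeze(2) * torch_tensor(y)

all.equal(as_array(via_broadcast), outer(x, y))
```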

That’s it for tensors – now we get back to that network, and see how we make use of torch’s automatic differentiation capabilities!

¹ Although the assumption may be tempting, “contiguous” does not correspond to what we’d call “contiguous in memory” in casual language.
² For correctness’ sake, contiguous() will only make a copy if the tensor it is called on is not contiguous already.
