torch.cat along negative dimension


In the following,

 x_6 = torch.cat((x_1, x_2_1, x_3_1, x_5_1), dim=-3)

The tensors x_1, x_2_1, x_3_1 and x_5_1 each have size

 torch.Size([1, 256, 7, 7])

The size of x_6 turns out to be torch.Size([1, 1024, 7, 7]).

I couldn't understand or visualise this concatenation along a negative dimension (-3 in this case). What exactly is happening here? How does the same call behave if dim=3? Is there any constraint on dim for a given set of tensors?


There are 2 answers

kmario23 (BEST ANSWER)

The answer by danin is not completely correct, and actually wrong from the perspective of tensor algebra, since it suggests that the problem is about accessing or indexing a Python list. It isn't.

The -3 means that we concatenate the tensors along the 2nd dimension (dimension index 1; you could just as well have passed dim=1 instead of the more confusing -3).
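
A minimal sketch (with a dummy tensor, for illustration only) of how PyTorch normalizes a negative dim:

 import torch

 x = torch.randn(1, 256, 7, 7)  # 4-D, so valid dims run from -4 to 3

 # A negative dim is normalized by adding the number of dimensions:
 # -3 + 4 = 1, i.e. the channel dimension.
 print(-3 + x.ndim)  # 1

 # Hence these two calls are equivalent:
 a = torch.cat((x, x), dim=-3)
 b = torch.cat((x, x), dim=1)
 print(a.shape, b.shape)  # torch.Size([1, 512, 7, 7]) both times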


Taking a closer look at the tensor shapes, they appear to represent (b, c, h, w), where b stands for batch size, c for the number of channels, h for height, and w for width.

This is usually the case at the final stages of encoding (possibly of images) in a deep neural network, where we arrive at feature maps like these.

The torch.cat() operation with dim=-3 says that we concatenate these 4 tensors along the channel dimension c (see above):

4 * 256 => 1024

Hence, the resultant tensor ends up with the shape torch.Size([1, 1024, 7, 7]).
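
Not part of the original answer, but for completeness, here is a runnable sketch with random tensors standing in for the real feature maps; it also shows what dim=3 (equivalently dim=-1) would do instead:

 import torch

 # Dummy stand-ins for the feature maps in the question.
 x_1 = torch.randn(1, 256, 7, 7)
 x_2_1 = torch.randn(1, 256, 7, 7)
 x_3_1 = torch.randn(1, 256, 7, 7)
 x_5_1 = torch.randn(1, 256, 7, 7)

 # dim=-3 concatenates along channels: 4 * 256 = 1024.
 x_6 = torch.cat((x_1, x_2_1, x_3_1, x_5_1), dim=-3)
 print(x_6.shape)  # torch.Size([1, 1024, 7, 7])

 # dim=3 would concatenate along width instead: 4 * 7 = 28.
 x_w = torch.cat((x_1, x_2_1, x_3_1, x_5_1), dim=3)
 print(x_w.shape)  # torch.Size([1, 256, 7, 28])

As for the constraint on dim: it must lie in the range [-ndim, ndim - 1] (here, -4 to 3), and all tensors must have matching sizes in every dimension except the one being concatenated.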


Notes: it is hard to visualize a 4-dimensional space, since we humans live in an inherently 3-D world. Nevertheless, here are some answers that I wrote a while ago which may help build a mental picture.

danin

Python provides negative indexing, so you can access elements starting from the end of a list, e.g. -1 is the last element. In this case the tensor has 4 dimensions, so -3 actually refers to the 2nd dimension (index 1).
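
A tiny illustrative sketch of that list analogy:

 dims = [0, 1, 2, 3]   # indices of a 4-D tensor's dimensions
 print(dims[-3])       # 1 -> -3 counts back from the end, landing on index 1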