I am in the early stages of learning PyTorch for deep learning, and I have come across something I don't understand. I wrote a very simple script just to make sure I fully understand the broadcasting mechanism, but I am getting an error that I find confusing.
import torch
X = torch.tensor([[1,5,2,7],[8,2,5,3]])
Y = torch.tensor([[2,9],[11,4],[9,2],[22,7]])
print(X.shape, Y.shape)
outputs
>>> torch.Size([2, 4]) torch.Size([4, 2])
But when I try to execute a basic mathematical operation on these tensors, where I would expect the broadcasting mechanism to bring them to the same size, I get the following error.
print(X + Y)
outputs
RuntimeError Traceback (most recent call last)
<ipython-input-7-e4a642f73c42> in <cell line: 1>()
----> 1 X + Y
RuntimeError: The size of tensor a (4) must match the size of tensor b (2) at non-singleton dimension 1
All the explanations I have seen say that the matrices simply need to be compatible for matrix multiplication, which to my knowledge in this case they are.
X = 2x4, Y = 4x2
The number of columns of X matches the number of rows of Y, so I don't understand the error.
First of all, in PyTorch you need to use matmul() (or the @ operator) for matrix multiplication. (I assume you are talking about multiplication, even though your example uses +.)
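With your tensors, that would look like this (a minimal sketch reusing the X and Y from the question):

```python
import torch

X = torch.tensor([[1, 5, 2, 7],
                  [8, 2, 5, 3]])
Y = torch.tensor([[2, 9],
                  [11, 4],
                  [9, 2],
                  [22, 7]])

# Matrix multiplication: (2, 4) @ (4, 2) -> (2, 2)
Z = X.matmul(Y)   # equivalent to X @ Y
print(Z.shape)    # torch.Size([2, 2])
print(Z)          # tensor([[229,  82],
                  #         [149, 111]])
```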
Second, this has nothing to do with broadcasting. Broadcasting is when you have an operation that requires two tensors to be of compatible shapes (usually the same), and they are not, but one of them can be expanded to an equivalent shape so that they become compatible.
An example from the broadcasting documentation:
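The code from the documentation did not survive above; the standard example from the PyTorch broadcasting semantics page looks roughly like this (shapes as in those docs):

```python
import torch

x = torch.empty(5, 3, 4, 1)
y = torch.empty(   3, 1, 1)

# x and y are broadcastable: trailing dimensions are compared right to left,
# and each pair must be equal, or one of them must be 1, or one is missing.
print((x + y).shape)  # torch.Size([5, 3, 4, 1])
```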
Another example would be adding an additional outer dimension to X in your example, giving it shape (1, 2, 4). Elementwise addition with Y still fails, but matmul() works: Y is broadcast to a (1, 4, 2) tensor (by prepending the so-called batch dimension), producing a (1, 2, 2) result.
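A sketch of that batched case, again reusing X and Y from the question:

```python
import torch

X = torch.tensor([[1, 5, 2, 7],
                  [8, 2, 5, 3]])
Y = torch.tensor([[2, 9],
                  [11, 4],
                  [9, 2],
                  [22, 7]])

Xb = X.unsqueeze(0)   # add a leading batch dimension: (2, 4) -> (1, 2, 4)
print(Xb.shape)       # torch.Size([1, 2, 4])

# Batched matmul: the 2-D Y is treated as (1, 4, 2) by prepending a
# batch dimension, so the result has shape (1, 2, 2).
Z = Xb.matmul(Y)
print(Z.shape)        # torch.Size([1, 2, 2])
print(Z)              # tensor([[[229,  82],
                      #          [149, 111]]])
```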