After the line x = self.preTransformerInput(x), my input has shape (2, 4, 16) (batch size, sequence length, embedding dimension). How can I access the attention scores for each head when using nn.TransformerEncoderLayer? I do not see a flag that makes it return the attention scores as well.
import torch
import torch.nn as nn
# define input tensor
batch_size = 2
sequence_length = 4
input_size = 16 # set input size to match d_model
x = torch.randn(batch_size, sequence_length, input_size)
# define transformer encoder
d_model = 16
nhead = 4
d_hid = 64
dropout = 0.1
nlayers = 2
# batch_first=True so the layer reads x as (batch, seq, feature); the default is (seq, batch, feature)
encoder_layers = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, dim_feedforward=d_hid, dropout=dropout, batch_first=True)
transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers=nlayers)
# pass input through transformer encoder
output = transformer_encoder(x)
# print output shape
print(output.shape)
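
For reference, nn.TransformerEncoderLayer exposes no such flag: internally it calls its nn.MultiheadAttention submodule with need_weights=False and discards the weights. One workaround (a sketch, assuming PyTorch >= 2.0, where module hooks accept with_kwargs=True) is to register a forward pre-hook on each layer's self_attn that forces need_weights=True and average_attn_weights=False, plus a forward hook that collects the per-head weights:

attn_maps = []

def force_weights(module, args, kwargs):
    # override the kwargs TransformerEncoderLayer passes to self_attn
    kwargs["need_weights"] = True
    kwargs["average_attn_weights"] = False  # keep one map per head instead of the mean
    return args, kwargs

def grab_weights(module, args, kwargs, output):
    # nn.MultiheadAttention returns (attn_output, attn_weights)
    attn_maps.append(output[1].detach())

for layer in transformer_encoder.layers:
    layer.self_attn.register_forward_pre_hook(force_weights, with_kwargs=True)
    layer.self_attn.register_forward_hook(grab_weights, with_kwargs=True)

# run in train mode: in eval mode the layer can take a fused fast path
# that bypasses self_attn.forward entirely, so the hooks would never fire
output = transformer_encoder(x)

for i, attn in enumerate(attn_maps):
    # (batch_size, nhead, sequence_length, sequence_length) -> (2, 4, 4, 4)
    print(f"layer {i}: {attn.shape}")

Each collected tensor then holds one attention map per head for the corresponding layer.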