Access attention scores when using TransformerEncoderLayer / TransformerEncoder

My input, after x = self.preTransformerInput(x), is of shape (2, 4, 16) (batch size, sequence length, embedding dimension). How can I access the attention scores for each head when using TransformerEncoderLayer? I do not see a flag that makes it return the attention scores as well.

import torch
import torch.nn as nn

# define input tensor
batch_size = 2
sequence_length = 4
input_size = 16  # set input size to match d_model
x = torch.randn(batch_size, sequence_length, input_size)

# define transformer encoder
d_model = 16
nhead = 4
d_hid = 64
dropout = 0.1
nlayers = 2
# batch_first=True so the layer expects (batch, seq, embed); the default is (seq, batch, embed),
# which would silently misinterpret the dimensions of x above
encoder_layers = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, dim_feedforward=d_hid,
                                            dropout=dropout, batch_first=True)
transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers=nlayers)

# pass input through transformer encoder
output = transformer_encoder(x)

# print output shape
print(output.shape)  # torch.Size([2, 4, 16])
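There is indeed no such flag: nn.TransformerEncoderLayer calls its internal nn.MultiheadAttention and keeps only the output tensor, so the weights are never exposed. The attention module itself is reachable as layer.self_attn, though, and returns per-head weights when called directly with need_weights=True and average_attn_weights=False (the latter requires PyTorch >= 1.11). A minimal sketch, assuming the default post-norm layer, so that the first layer's attention really does see the raw input:

import torch
import torch.nn as nn

# Same configuration as above; batch_first=True so x is (batch, seq, embed)
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, dim_feedforward=64,
                                   dropout=0.1, batch_first=True)
layer.eval()  # disable dropout so the returned weights are deterministic

x = torch.randn(2, 4, 16)

# layer.self_attn is an ordinary nn.MultiheadAttention; calling it directly
# with need_weights=True returns the attention weights alongside the output.
# average_attn_weights=False keeps one (seq, seq) map per head.
with torch.no_grad():
    _, attn_weights = layer.self_attn(x, x, x,
                                      need_weights=True,
                                      average_attn_weights=False)

print(attn_weights.shape)  # torch.Size([2, 4, 4, 4]): (batch, nhead, seq, seq)

This reproduces the first layer's attention exactly (in eval mode); deeper layers would need the corresponding intermediate activations as input.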
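To capture the weights of every layer in a single forward pass, one common workaround (not an official API, so treat this as a sketch) is to override each layer's self_attn.forward at the instance level so that it forces need_weights=True and records the result. Caveat: on recent PyTorch versions, the fused fast path taken in eval mode with no masks can bypass this Python-level forward; in training mode, as below, the standard path is used.

import torch
import torch.nn as nn

d_model, nhead, nlayers = 16, 4, 2
enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                       dim_feedforward=64, dropout=0.1,
                                       batch_first=True)
encoder = nn.TransformerEncoder(enc_layer, num_layers=nlayers)

attn_maps = []  # will hold one (batch, nhead, seq, seq) tensor per layer

def patch_attention(mha):
    # Wrap this MultiheadAttention so every call also records per-head weights.
    orig_forward = mha.forward
    def wrapped(*args, **kwargs):
        kwargs["need_weights"] = True
        kwargs["average_attn_weights"] = False  # PyTorch >= 1.11
        out, weights = orig_forward(*args, **kwargs)
        attn_maps.append(weights.detach())
        return out, weights
    mha.forward = wrapped  # instance attribute shadows the class method

for layer in encoder.layers:
    patch_attention(layer.self_attn)

x = torch.randn(2, 4, d_model)
output = encoder(x)

print(output.shape)        # torch.Size([2, 4, 16])
print(len(attn_maps))      # 2 -- one entry per layer
print(attn_maps[0].shape)  # torch.Size([2, 4, 4, 4])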
