I'm trying to make a TransformerEncoder work with variable-length sequences. I understand I can pass a src_key_padding_mask to the forward method. Here's some example code.
import torch
import torch.nn as nn

embedding_dim = 4
num_heads = 1
ff_dim = 16

encoder = nn.TransformerEncoderLayer(
    d_model=embedding_dim,
    nhead=num_heads,
    dim_feedforward=ff_dim,
    batch_first=True,
)

# Batch of 3 sequences with max length 6; zero out the "padding" positions.
input_tensor = torch.randn(3, 6, embedding_dim)
input_tensor[0, 5, :] = 0  # sequence 0: last two positions are padding
input_tensor[0, 4, :] = 0
input_tensor[1, 5, :] = 0  # sequence 1: last position is padding

print(f"input\n{input_tensor}")
print(f"no mask\n{encoder(input_tensor)}")

# True marks a position that attention should ignore.
bool_src_key_padding_mask = torch.tensor(
    [[False, False, False, False, True,  True],
     [False, False, False, False, False, True],
     [False, False, False, False, False, False]])

print(f"mask\n{encoder(input_tensor, src_key_padding_mask=bool_src_key_padding_mask)}")
I would expect the result of the last line to print a tensor that still contains the padding tokens (0 in this case) at the masked positions, but it doesn't. What am I doing wrong?
src_key_padding_mask causes the masked positions to contribute nothing to the attention calculation for the other positions in the sequence. It does not stop computation at the masked positions themselves: the whole padded batch still flows through the same dense tensor operations, so the encoder produces (meaningless) outputs at the padded positions, which you should simply ignore or zero out afterwards. Such is the nature of batched GPU computation.
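Here is a minimal sketch, reusing encoder, input_tensor, and bool_src_key_padding_mask from your snippet (the names out_masked, noisy_input, and keep are just for illustration). It demonstrates both points: whatever sits in the padded slots has no effect on the outputs at the unmasked positions, and if you want zeros at the padded positions you have to apply the mask to the output yourself.

# Disable dropout so the two forward passes below are comparable.
encoder.eval()

out_masked = encoder(input_tensor, src_key_padding_mask=bool_src_key_padding_mask)

# Put garbage into the padded slots of sequence 0 and run the encoder again.
noisy_input = input_tensor.clone()
noisy_input[0, 4:, :] = torch.randn(2, embedding_dim)
out_noisy = encoder(noisy_input, src_key_padding_mask=bool_src_key_padding_mask)

# The unmasked positions of sequence 0 are unchanged: the padded positions
# contributed nothing to their attention, garbage or not.
print(torch.allclose(out_masked[0, :4], out_noisy[0, :4]))  # expected: True

# If you want zeros at the padded positions, mask the output explicitly.
keep = (~bool_src_key_padding_mask).unsqueeze(-1)  # shape (3, 6, 1)
print(out_masked * keep)

In practice you usually just ignore the outputs at padded positions downstream, e.g. by masking them out before a pooling step or a loss computation, since the encoder will always produce some value there.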