I am creating a class that can generate sentence embeddings for both a single sentence and a list of sentences using a pretrained BertModel. In some sample code, I see the statement:
    outputs = self.model(tokens_tensor, segments_tensors)
which is without the attention_mask argument. Yet it produces the same result if I do pass in the attention mask tensor argument:
    outputs = self.model(tokens_tensor, attention_tensors, segments_tensors)
When running the code over an entire dataset, however, the attention_tensors argument is absolutely needed.
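For context, here is a minimal sketch of the batched case (assuming the current transformers API and the bert-base-uncased checkpoint; the sentences and variable names are mine), where inputs of different lengths force padding and therefore a real attention mask:

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
    model.eval()

    sentences = ["A short sentence.",
                 "A noticeably longer sentence that forces the first one to be padded."]
    # padding=True pads to the longest sequence in the batch and returns an
    # attention mask with 0s over the padded positions
    encoded = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(input_ids=encoded["input_ids"],
                        attention_mask=encoded["attention_mask"],
                        token_type_ids=encoded["token_type_ids"])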
I understand why the attention mask is not needed for a single sentence, but how does the Python code know that the second argument is actually segments_tensors, when the documentation says it expects attention_mask to be the second argument?
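Python does not know; positional arguments are bound strictly in the order the forward signature declares them. One plausible explanation (an assumption on my part, not something the sample code confirms): the snippet was written for the older pytorch-pretrained-bert package, whose BertModel.forward took token_type_ids as the second positional parameter, whereas in the current transformers library the second parameter is attention_mask. Passing keyword arguments sidesteps the ambiguity in either version:

    # Keyword arguments make the binding explicit regardless of parameter order
    outputs = self.model(input_ids=tokens_tensor,
                         token_type_ids=segments_tensors,
                         attention_mask=attention_tensors)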
If the attention_mask is not set (and is thus None), it is explicitly set to ones everywhere. See l. 803 in modeling_bert.py.
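In effect, the forward pass then behaves as if it contained the following (a paraphrase of the relevant lines in modeling_bert.py, not a verbatim quote):

    if attention_mask is None:
        # Every position is attended to, which is correct for a single
        # unpadded sentence and is why both calls above give the same result
        attention_mask = torch.ones_like(input_ids)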