I am creating a class that can generate sentence embeddings for both a single sentence and a list of sentences using a pretrained BertModel. In some sample code, I see the statement:
    outputs = self.model(tokens_tensor, segments_tensors)
which is without the attention_mask argument. Yet it produces the same result if I do pass in the attention mask tensor argument:
    outputs = self.model(tokens_tensor, attention_tensors, segments_tensors)
When running the code over an entire dataset, however, the attention_tensors argument is absolutely needed.
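For context, here is a minimal sketch of the batched case (assuming the current transformers API and the bert-base-uncased checkpoint; the sentences and variable names are mine), where inputs of different lengths force padding and therefore a real attention mask:

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
    model.eval()

    sentences = ["A short sentence.",
                 "A noticeably longer sentence that forces the first one to be padded."]
    # padding=True pads to the longest sequence in the batch and returns an
    # attention mask with 0s over the padded positions
    encoded = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(input_ids=encoded["input_ids"],
                        attention_mask=encoded["attention_mask"],
                        token_type_ids=encoded["token_type_ids"])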
I understand why the attention mask is not needed for a single sentence, but how does the Python code know that the second argument is actually segments_tensors, when the documentation says it expects attention_mask to be the second argument?
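Python does not know; positional arguments are bound strictly in the order the forward signature declares them. One plausible explanation (an assumption on my part, not something the sample code confirms): the snippet was written for the older pytorch-pretrained-bert package, whose BertModel.forward took token_type_ids as the second positional parameter, whereas in the current transformers library the second parameter is attention_mask. Passing keyword arguments sidesteps the ambiguity in either version:

    # Keyword arguments make the binding explicit regardless of parameter order
    outputs = self.model(input_ids=tokens_tensor,
                         token_type_ids=segments_tensors,
                         attention_mask=attention_tensors)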
If the attention_mask is not set (and is thus None), it is explicitly set to ones everywhere. See l. 803 in modeling_bert.py.
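In effect, the forward pass then behaves as if it contained the following (a paraphrase of the relevant lines in modeling_bert.py, not a verbatim quote):

    if attention_mask is None:
        # Every position is attended to, which is correct for a single
        # unpadded sentence and is why both calls above give the same result
        attention_mask = torch.ones_like(input_ids)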