I am following the tutorial https://www.depends-on-the-definition.com/named-entity-recognition-with-bert/ to do Named Entity Recognition with BERT.
While fine-tuning, before feeding the tokens to the model, the author does:
from keras.preprocessing.sequence import pad_sequences

input_ids = pad_sequences([tokenizer.convert_tokens_to_ids(txt) for txt in tokenized_texts],
                          maxlen=MAX_LEN, dtype="long", value=0.0,
                          truncating="post", padding="post")
According to my tests, this doesn't add the special tokens to the ids. So am I missing something, or is it not always necessary to include [CLS] (101) and [SEP] (102)?
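To illustrate what I mean, here is a toy stand-in for the tokenizer (the vocab and word ids are made up; only 101/102 match the real BERT vocab): convert_tokens_to_ids is a plain per-token lookup, so [CLS]/[SEP] never appear in the output unless they are already in the token list.

```python
# Toy substitute for tokenizer.convert_tokens_to_ids: a plain lookup.
# Vocab ids here are invented, except 101/102 (real BERT special-token ids).
vocab = {"[CLS]": 101, "[SEP]": 102, "hello": 7592, "world": 2088}

def convert_tokens_to_ids(tokens):
    return [vocab[t] for t in tokens]

tokens = ["hello", "world"]
print(convert_tokens_to_ids(tokens))                          # no 101/102
print(convert_tokens_to_ids(["[CLS]"] + tokens + ["[SEP]"]))  # with them
```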
I'm also following this tutorial. It worked for me without adding these tokens; however, I found in another tutorial (https://vamvas.ch/bert-for-ner) that it is better to add them, because the model was pretrained on inputs in this format.
[Update] I just checked: accuracy improved by 20% after adding the tokens. Note, though, that I am using a different dataset.
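As a minimal sketch of what adding them looks like (the helper function is my own, not from either tutorial; 101/102/0 are the standard BERT ids for [CLS], [SEP], and padding): wrap each id sequence before padding, reserving two slots so truncation doesn't cut off [SEP].

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # standard BERT vocab ids

def add_special_tokens(token_ids, max_len):
    """Prepend [CLS], append [SEP], then pad/truncate to max_len."""
    # Reserve two positions for the special tokens before truncating.
    ids = [CLS_ID] + token_ids[: max_len - 2] + [SEP_ID]
    return ids + [PAD_ID] * (max_len - len(ids))

# Example with dummy wordpiece ids:
print(add_special_tokens([7592, 2088], max_len=6))
# -> [101, 7592, 2088, 102, 0, 0]
```

For NER, remember to extend the label sequences the same way (e.g. with the pad label at the [CLS]/[SEP] positions) so tokens and labels stay aligned.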