As far as I understand, the RoBERTa model implemented in the Hugging Face library uses a BPE tokenizer. Here is the link to the documentation:
However, I have a custom tokenizer based on WordPiece tokenization, which I use through BertTokenizer. Because my custom tokenizer is much more relevant for my task, I prefer not to use BPE.
When I pre-trained RoBERTa from scratch (RobertaForMaskedLM) with my custom tokenizer, the MLM loss was much better than the loss with BPE. However, when it comes to fine-tuning, the model (RobertaForSequenceClassification) performs poorly. I am almost sure the problem is not the tokenizer itself; I wonder whether the RobertaForSequenceClassification implementation in the Hugging Face library is simply not compatible with my tokenizer.
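Roughly, the pre-training setup looks like this (a simplified sketch, not my exact code: the tokenizer path, the corpus, and the config values are placeholders):

```python
from transformers import (
    BertTokenizer,
    DataCollatorForLanguageModeling,
    RobertaConfig,
    RobertaForMaskedLM,
    Trainer,
    TrainingArguments,
)

# Custom WordPiece vocabulary loaded through BertTokenizer
tokenizer = BertTokenizer.from_pretrained("path/to/custom_wordpiece_tokenizer")

# RoBERTa architecture sized to the custom vocabulary
config = RobertaConfig(
    vocab_size=tokenizer.vocab_size,
    max_position_embeddings=514,
    type_vocab_size=1,
)
model = RobertaForMaskedLM(config=config)

# Placeholder corpus; in reality this is my domain-specific pre-training data
corpus = ["first example sentence.", "second example sentence."]
train_dataset = [tokenizer(text, truncation=True, max_length=128) for text in corpus]

# Standard dynamic masking for the MLM objective
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

training_args = TrainingArguments(
    output_dir="roberta-custom-tokenizer",
    num_train_epochs=1,
    per_device_train_batch_size=16,
)
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)
trainer.train()
trainer.save_model()  # writes the MLM checkpoint to output_dir
```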
Details about the fine-tuning (a simplified sketch of my training setup follows the list):
task: multilabel classification with imbalanced labels.
epochs: 20
loss: BCEWithLogitsLoss()
optimizer: Adam, weight_decay_rate: 0.01, lr: 2e-5, correct_bias: True
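The sketch below uses the settings listed above; the number of labels and the toy batch are placeholders, and torch's AdamW stands in for the optimizer configuration I listed (correct_bias=True is the default of the AdamW implementation that ships with transformers):

```python
import torch
from torch.nn import BCEWithLogitsLoss
from transformers import BertTokenizer, RobertaForSequenceClassification

NUM_LABELS = 10  # placeholder for the number of labels in my multilabel task

tokenizer = BertTokenizer.from_pretrained("path/to/custom_wordpiece_tokenizer")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-custom-tokenizer",  # the MLM checkpoint from pre-training
    num_labels=NUM_LABELS,
)

# Optimizer with the hyperparameters listed above
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
loss_fn = BCEWithLogitsLoss()

# Placeholder batch: two texts with multi-hot label vectors
texts = ["first training example", "second training example"]
labels = torch.tensor([[1.0] + [0.0] * (NUM_LABELS - 1),
                       [0.0, 1.0] + [0.0] * (NUM_LABELS - 2)])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

model.train()
for epoch in range(20):
    optimizer.zero_grad()
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"])
    loss = loss_fn(outputs.logits, labels)  # logits: (batch_size, NUM_LABELS)
    loss.backward()
    optimizer.step()
```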
The F1 and AUC scores were very low because the output probabilities for the labels did not match the actual labels (even with a very low threshold), which means the model couldn't learn anything.
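This is roughly how I compute those metrics, continuing from the sketch above (the validation texts, labels, and threshold value are toy placeholders):

```python
import torch
from sklearn.metrics import f1_score, roc_auc_score

# Toy validation data; labels alternate so every column has both classes
val_texts = ["first held-out example", "second held-out example"]
val_labels = torch.tensor([[float(i % 2) for i in range(NUM_LABELS)],
                           [float((i + 1) % 2) for i in range(NUM_LABELS)]])
val_batch = tokenizer(val_texts, padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    logits = model(input_ids=val_batch["input_ids"],
                   attention_mask=val_batch["attention_mask"]).logits

probs = torch.sigmoid(logits)   # independent per-label probabilities
preds = (probs > 0.3).int()     # thresholded multi-hot predictions; I also tried lower thresholds

f1 = f1_score(val_labels.numpy(), preds.numpy(), average="micro")
auc = roc_auc_score(val_labels.numpy(), probs.numpy(), average="macro")
print(f"F1: {f1:.3f}  AUC: {auc:.3f}")
```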
Note: RoBERTa pre-trained and fine-tuned with the BPE tokenizer performs better than the version pre-trained and fine-tuned with my custom tokenizer, even though the MLM loss with the custom tokenizer was better than with BPE.