How to add a new token to the T5 tokenizer, which uses SentencePiece

Question

2.2k views Asked by Ahmad At 21 April 2021 at 09:57

I am training the T5 transformer, which is based on TensorFlow, at the following link:

Here is a sample (input, output):

input:

b'[atomic]:<subject>PersonX plays a ___ in the war</subject><relation>oReact</relation>'

output:

<object>none</object>

However, for the prediction I get:

 ⁇ object>none ⁇ /object>

which replaces < with ⁇. What should I do to resolve this problem?

Update: I found that, strangely, < is out of vocabulary for the T5 tokenizer, which is based on SentencePiece. I just don't know how to add it.

There is 1 answer

**Arij Aladel** · Answer 1 · 2021-06-20T14:32:56+00:00


To my knowledge, you can add new tokens using the tokenizer's add_tokens() method. More details can be found in the Hugging Face tokenizer documentation.
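
A minimal sketch of that approach, assuming the Hugging Face transformers port of T5 rather than the original TensorFlow t5 library from the question. The helper name add_missing_tokens, the demo function, and the "t5-small" checkpoint are placeholders chosen for illustration. The calls add_tokens() and resize_token_embeddings() are the transformers methods I would expect to use here, but check them against the version you have installed.

```python
from typing import Sequence


def add_missing_tokens(tokenizer, new_tokens: Sequence[str], model=None) -> int:
    """Register new_tokens with the tokenizer and, if a model is given,
    resize its embedding matrix to match the enlarged vocabulary.

    Hypothetical helper: it only assumes the tokenizer offers add_tokens()
    and __len__(), and the model offers resize_token_embeddings().
    """
    # add_tokens() returns how many of the tokens were actually new.
    num_added = tokenizer.add_tokens(list(new_tokens))
    if model is not None and num_added > 0:
        # The new ids need embedding rows. Those rows start untrained,
        # so the model has to be fine-tuned before they are useful.
        model.resize_token_embeddings(len(tokenizer))
    return num_added


def demo() -> None:
    # Not called automatically: needs transformers, sentencepiece and torch
    # installed, and downloads the assumed "t5-small" checkpoint.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # "<" is the character reported as out of vocabulary in the question.
    add_missing_tokens(tokenizer, ["<"], model)

    ids = tokenizer("<object>none</object>").input_ids
    print(tokenizer.decode(ids, skip_special_tokens=True))
```

Calling demo() should let you check whether the ⁇ placeholder still appears after the token is added. Because the embedding for the added token is freshly initialized, the model needs further training on data containing < before it can be expected to generate it reliably.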