I am training the T5 transformer, which is based on TensorFlow, from the following repository:
https://github.com/google-research/text-to-text-transfer-transformer
Here is a sample (input, output):
input:
b'[atomic]:<subject>PersonX plays a ___ in the war</subject><relation>oReact</relation>'
output:
<object>none</object>
However, for the prediction I get:
⁇ object>none ⁇ /object>
Each < is replaced with ⁇. What should I do to resolve this problem?
Update: I found that, strangely, < is out of vocabulary for the T5 tokenizer, which is SentencePiece. I just don't know how to add it.
To my knowledge, you can add new tokens with the tokenizer's add_tokens() method. More details can be found in the Hugging Face tokenizer documentation.
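A minimal sketch of that suggestion, assuming the Hugging Face transformers port of T5 rather than the original TensorFlow repository from the question (which would need a different mechanism). The helpers collect_markup_tokens, extend_tokenizer and demo are hypothetical names made up for this example, and the "t5-small" checkpoint is only an illustrative choice. The idea is to register whole tags such as <object> as new tokens and then resize the model's embedding matrix so that the new ids have rows to look up.

```python
import re

# Matches simple opening and closing tags such as <object> or </object>.
TAG_PATTERN = re.compile(r"</?\w+>")


def collect_markup_tokens(examples):
    """Return the distinct tags found in the examples, in first-seen order."""
    seen = {}
    for text in examples:
        for tag in TAG_PATTERN.findall(text):
            seen.setdefault(tag, None)
    return list(seen)


def extend_tokenizer(tokenizer, model, new_tokens):
    """Add tokens to the tokenizer and grow the model embeddings to match.

    Works with any objects exposing add_tokens(), __len__() and
    resize_token_embeddings(), as Hugging Face tokenizers and models do.
    """
    num_added = tokenizer.add_tokens(new_tokens)
    if num_added > 0:
        # New token ids need embedding rows, otherwise lookups go out of range.
        model.resize_token_embeddings(len(tokenizer))
    return num_added


def demo():
    """Assumed usage; needs transformers and sentencepiece plus a model download."""
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    samples = [
        "[atomic]:<subject>PersonX plays a ___ in the war</subject>"
        "<relation>oReact</relation>",
        "<object>none</object>",
    ]
    tags = collect_markup_tokens(samples)
    added = extend_tokenizer(tokenizer, model, tags)
    print("tokens added:", added)
    print(tokenizer.tokenize("<object>none</object>"))
```

Calling demo() should show the tags kept as single tokens instead of the unknown symbol. The embeddings for the added tokens start out freshly initialized, so the model still has to be fine-tuned on data containing the tags before its predictions use them sensibly.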