I would like to append words to the vocabulary created by tft.vocabulary that are not part of the training samples (i.e. <mask> and <pad> tokens).
I see in the docs that the tft.vocabulary function can take an argument key_fn, about which the docs say:
"Supply key_fn if you would like to generate a vocabulary with coverage over specific keys."
But with the key_fn below, it still does not append the <mask> and <pad> tokens to the vocabulary.
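
    import tensorflow as tf
    import tensorflow_transform as tft

    def _key_fn(x):
        # Ignores its input and always returns the two special tokens.
        return tf.constant(['<mask>', '<pad>'])

    # words and config come from the surrounding preprocessing code.
    vocab = tft.vocabulary(
        words,
        key_fn=_key_fn,
        top_k=config.VOCAB_SIZE,
    )
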
What is it that you're trying to achieve?
I don't think that key_fn is relevant here, as it only affects the ordering of the vocabulary (and top_k when provided).
Could you compute the vocabulary after appending the added information?
    tft.vocabulary(tf.strings.join([words, '<mask>', '<pad>']), ...)
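This would result in the vocabulary entries including the added tokens as a suffix, because tf.strings.join concatenates strings element-wise rather than appending new elements.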
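
To make the difference concrete, here is a minimal pure-Python sketch of the two behaviours being discussed: an element-wise join, which glues the tokens onto every word, versus building a frequency-ordered vocabulary and then appending the special tokens as extra entries. It does not use tft at all; join_elementwise and build_vocab are hypothetical helpers written for illustration, and the alphabetical tie-break is an assumption rather than what tft.vocabulary does.

```python
from collections import Counter

SPECIAL_TOKENS = ["<mask>", "<pad>"]  # the tokens from the question


def join_elementwise(words, suffixes):
    """Mimic an element-wise string join: every word gets the suffixes glued on."""
    suffix = "".join(suffixes)
    return [w + suffix for w in words]


def build_vocab(words, top_k=None, special_tokens=()):
    """Frequency-ordered vocabulary with special tokens appended at the end.

    Hypothetical helper; a stand-in for the idea, not the tft implementation.
    """
    counts = Counter(words)
    # Sort by descending count, then alphabetically so ties are deterministic
    # (the tie-break rule is an assumption made for this sketch).
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    if top_k is not None:
        ordered = ordered[:top_k]
    # Append each special token once, skipping any that already made the cut.
    for tok in special_tokens:
        if tok not in ordered:
            ordered.append(tok)
    return ordered


words = ["the", "cat", "the", "dog", "the", "cat"]
joined = join_elementwise(words, SPECIAL_TOKENS)
vocab = build_vocab(words, top_k=2, special_tokens=SPECIAL_TOKENS)
print(joined[0])
print(vocab)
```

With the join, each entry carries the tokens as a suffix; with the append, the tokens become their own vocabulary entries after the top_k cut. In TensorFlow terms the appending counterpart would presumably be tf.concat along axis 0 on a 1-D tensor of words, though that is not tested here, and tokens added that way still compete for top_k slots on frequency.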