I would like to append words to the vocabulary created by tft.vocabulary that are not a part of the training samples (i.e. <mask> and <pad> tokens).
I see in the docs that tft.vocabulary accepts a key_fn argument, which the docs describe as follows:
Supply key_fn if you would like to generate a vocabulary with coverage over specific keys.
However, with the key_fn below, the <mask> and <pad> tokens are still not appended to the vocabulary.
def _key_fn(x):
    return tf.constant(['<mask>', '<pad>'])

vocab = tft.vocabulary(
    words,
    key_fn=lambda x: _key_fn(x),
    top_k=config.VOCAB_SIZE
)
What is it that you're trying to achieve?
I don't think that key_fn is related, as it only affects the ordering of the vocabulary (and top_k when provided). Could you compute the vocabulary after appending the added information?
tft.vocabulary(tf.strings.join([words, <mask>, <pad>]), ...)
This would result in the vocabulary including the added suffix.
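For illustration only, here is a minimal plain-Python sketch of the underlying idea: build a frequency-ranked vocabulary from the data, then add the reserved tokens as separate entries. It does not use the tft API. The names build_vocab and SPECIAL_TOKENS, the sample words, and the alphabetical tie-breaking are assumptions made up for this example. One thing worth keeping in mind is that tf.strings.join concatenates strings element-wise, so the sketch treats the tokens as extra vocabulary entries rather than as a suffix on each word.

```python
from collections import Counter

# Hypothetical reserved tokens that never occur in the training samples.
SPECIAL_TOKENS = ["<mask>", "<pad>"]


def build_vocab(words, top_k, special_tokens=SPECIAL_TOKENS):
    """Return the top_k most frequent words followed by the special tokens."""
    # Count only real words so a special token is never counted twice.
    counts = Counter(w for w in words if w not in special_tokens)
    # Rank by descending frequency; break ties alphabetically (an assumption
    # made here for determinism, not a statement about tft's ordering).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [word for word, _ in ranked[:top_k]]
    # Append the reserved tokens as their own entries at the end.
    return vocab + list(special_tokens)


words = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]
vocab = build_vocab(words, top_k=3)
token_to_id = {token: index for index, token in enumerate(vocab)}
print(vocab)
```

If the total vocabulary size has to stay fixed, one option is to pass top_k reduced by the number of special tokens, so that the reserved entries still fit.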