I'm experimenting with text generators such as OpenAI's GPT-2, Hugging Face's transformers, and Facebook's ParlAI, and I'm wondering whether I can limit or weight the output toward a specified list of words. For example, how can I limit the output to only words that start with the letter 'a'?
One obvious idea is to train on a dataset limited to that vocabulary, but I only have a laundry list of words, not a natural corpus that contains only those words.
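Yes. For instance, if you're using Hugging Face transformers, have a look at the force_words_ids argument of generate() (see the Generation docs). Note that force_words_ids makes the listed token ids appear somewhere in the output (it requires beam search); it does not by itself keep other tokens out. To make the model generate using only the list of token ids that you've created, mask out every other token at each step, for example with the prefix_allowed_tokens_fn argument of generate() or a custom LogitsProcessor.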
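To make the idea of restricting generation to a fixed list concrete, here is a minimal, library-free sketch of the masking step itself. The toy vocabulary, the logits, and the helper names (mask_logits, softmax, sample_allowed) are all made up for illustration and are not part of any library; a real model would supply the logits, and real tokenizers such as GPT-2's work on subword pieces, so a filter on token text does not map cleanly onto whole words.

```python
import math
import random

def mask_logits(logits, allowed_ids):
    """Return a copy of logits with every id outside allowed_ids set to -inf."""
    allowed = set(allowed_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def softmax(logits):
    """Convert logits to probabilities; masked (-inf) entries get probability 0."""
    m = max(logits)  # assumes at least one entry is finite (allowed_ids not empty)
    exps = [math.exp(x - m) for x in logits]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

def sample_allowed(logits, allowed_ids, rng=random):
    """Sample one token id, considering only the allowed ids."""
    probs = softmax(mask_logits(logits, allowed_ids))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical toy vocabulary; keep only entries that start with 'a'.
vocab = ["apple", "banana", "avocado", "cherry", "and", "the"]
allowed_ids = [i for i, w in enumerate(vocab) if w.startswith("a")]

# Made-up scores for a single generation step.
logits = [1.0, 3.0, 0.5, 2.0, 0.1, 4.0]
token_id = sample_allowed(logits, allowed_ids, random.Random(0))
print(vocab[token_id])  # always one of: apple, avocado, and
```

To weight the output toward the list rather than limit it, one option is to add a finite bonus to the allowed logits instead of setting the others to -inf. In transformers, the same per-step restriction can be expressed by returning the allowed ids from a function passed as prefix_allowed_tokens_fn to generate().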