NLP - Specify custom vocabulary / word list for text generation


I'm experimenting with text generators, like OpenAI's GPT-2, Hugging Face's transformers, and Facebook's ParlAI, and I'm wondering whether I can limit the output to a specified list of words, or weight it toward them. For example, how can I limit the output to only words that start with the letter 'a'?

One obvious idea is to train on a dataset limited to that vocabulary, but I only have a laundry list of words, not a natural corpus that uses only those words.



Answer by Minions

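Yes. For instance, if you're using Hugging Face transformers, have a look at the parameters of generate (Generation). Be aware that force_words_ids is not a whitelist: it forces the listed token ids to appear somewhere in the output (through constrained beam search, so it needs num_beams > 1), but it does not stop the model from using other tokens. To limit generation to a list of token ids you've created, pass prefix_allowed_tokens_fn, a function called at every step that returns the ids allowed next; to block specific tokens instead, use bad_words_ids.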
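A minimal sketch of restricting (rather than forcing) the vocabulary with the prefix_allowed_tokens_fn parameter of generate. It assumes a GPT-2-style byte-level BPE tokenizer, where a token that starts a new word is stored with a leading "Ġ" (U+0120); the helper names (whitelist_from_words, word_start_ids, generate_restricted) are hypothetical, not part of transformers. whitelist_from_words is a token-level filter, so the model can still recombine the allowed pieces into strings that are not on your list. word_start_ids covers the letter-'a' example by allowing only word-initial tokens, which means each generated word is a single token.

```python
from typing import Dict, Iterable, List


def whitelist_from_words(tokenizer, words: Iterable[str]) -> List[int]:
    """Collect every token id used to spell the listed words.

    Each word is encoded with a leading space, which is how GPT-2-style
    tokenizers spell a word in the middle of a sentence (an assumption
    for other tokenizers).
    """
    allowed = set()
    for word in words:
        allowed.update(tokenizer.encode(" " + word, add_special_tokens=False))
    return sorted(allowed)


def word_start_ids(vocab: Dict[str, int], letter: str,
                   space_marker: str = "\u0120") -> List[int]:
    """Ids of word-initial tokens whose text begins with `letter` (case-insensitive).

    Assumes GPT-2's byte-level BPE, where a leading space is stored as "Ġ".
    """
    letter = letter.lower()
    return sorted(
        idx
        for token, idx in vocab.items()
        if token.startswith(space_marker)
        and token[len(space_marker):].lower().startswith(letter)
    )


def generate_restricted(model, tokenizer, prompt: str, allowed_ids: List[int],
                        max_new_tokens: int = 20) -> str:
    """Sample a continuation that only uses `allowed_ids` (plus EOS)."""
    # Allow EOS so generation can stop before max_new_tokens.
    allowed = sorted(set(allowed_ids) | {tokenizer.eos_token_id})
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        # Called at every step; all ids not returned here are masked out.
        prefix_allowed_tokens_fn=lambda batch_id, input_ids: allowed,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = output[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Usage, assuming transformers and torch are installed and the "gpt2" checkpoint is used: load tok = AutoTokenizer.from_pretrained("gpt2") and model = AutoModelForCausalLM.from_pretrained("gpt2"), then call generate_restricted(model, tok, "Once upon a time", word_start_ids(tok.get_vocab(), "a")) for the letter example, or pass whitelist_from_words(tok, my_words) for a word list. If you only want to weight the output toward the list rather than hard-limit it, a custom logits processor that adds a bias to these ids is the softer alternative.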