I am writing a program that should spit out a random sentence of a complexity of my choosing. As a concrete example, I would like to aid my language learning by spitting out valid sentences of a grammar structure and using words that I have already learned. I would like to use python and nltk to do this, although I am open to other ideas.
It seems like there are a couple of approaches:
- Define a grammar file that uses the grammar and lexicon I know about, and then generate all valid sentences from this list, then selecting a random answer.
- Load in corpora to train ngrams, which then can be used to construct a sentence.
Am I thinking about this correctly? Is one approach preferred over the other? Any tips are appreciated. Thanks!
If I'm getting it right and if the purpose is to test yourself on the vocabulary you already have learned, then another approach could be taken:
Instead of going through the difficult labor of NLG (Natural Language Generation), you could create a search program that goes online, reads news feeds or even simply Wikipedia, and finds sentences with only the words you have defined.
In any case, for what you want, you will have to create lists of words that you have learned. You could then create search algorithms for sentences that contain only / nearly only these words.
That would have the major advantage of testing yourself on real sentences, as opposed to artificially-constructed ones (which are likely to sound not quite right in a number of cases).
An app like this would actually be a great help for learning a foreign language. If you did it nicely I'm sure a lot of people would benefit from using it.