Generate Random Sentence From Grammar or Ngrams?


I am writing a program that should produce a random sentence of a complexity of my choosing. As a concrete example, I would like to aid my language learning by generating valid sentences that follow a given grammatical structure and use only words I have already learned. I would like to use Python and NLTK to do this, although I am open to other ideas.

It seems like there are a couple of approaches:

  • Define a grammar file that uses the grammar and lexicon I know about, generate all valid sentences from that grammar, and then select one at random.
  • Load in corpora to train ngrams, which then can be used to construct a sentence.

Am I thinking about this correctly? Is one approach preferred over the other? Any tips are appreciated. Thanks!


There are 2 answers

Karpov On BEST ANSWER

If I understand correctly, and the purpose is to test yourself on the vocabulary you have already learned, then another approach could be taken:

Instead of going through the difficult labor of NLG (Natural Language Generation), you could create a search program that goes online, reads news feeds or even simply Wikipedia, and finds sentences with only the words you have defined.

In any case, for what you want, you will have to create lists of the words you have learned. You could then write search algorithms that find sentences containing only, or almost only, these words.
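As a rough illustration of that filtering step, here is a minimal sketch using only the Python standard library. The function names `known_ratio` and `filter_sentences`, the `min_ratio` parameter, and the sample data are all made up for this example; fetching and sentence-splitting the source text (news feeds, Wikipedia) is assumed to happen elsewhere.

```python
import re

# Matches runs of letters (any script), optionally with one internal apostrophe.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")

def known_ratio(sentence, known_words):
    """Return the fraction of word tokens in `sentence` found in `known_words`."""
    tokens = WORD.findall(sentence.lower())
    if not tokens:
        return 0.0
    return sum(token in known_words for token in tokens) / len(tokens)

def filter_sentences(sentences, known_words, min_ratio=1.0):
    """Keep sentences in which at least `min_ratio` of the words are known."""
    return [s for s in sentences if known_ratio(s, known_words) >= min_ratio]

# Hypothetical data: a learned vocabulary and some candidate sentences.
known = {"the", "cat", "eats", "fish", "a", "dog"}
candidates = [
    "The cat eats fish.",
    "A dog eats the newspaper.",
    "The cat sleeps.",
]

strict = filter_sentences(candidates, known)                   # only known words
lenient = filter_sentences(candidates, known, min_ratio=0.75)  # "almost only"
```

Lowering `min_ratio` lets through sentences with a few unknown words, which may be useful if you want to meet new vocabulary in context.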

That would have the major advantage of testing yourself on real sentences, as opposed to artificially-constructed ones (which are likely to sound not quite right in a number of cases).

An app like this would actually be a great help for learning a foreign language. If you did it nicely I'm sure a lot of people would benefit from using it.

alexis On

If your purpose is really to make a language learning aid, you need to generate grammatical (i.e., correct) sentences. If so, do not use ngrams. They chain words together based only on which words tend to follow which, with no notion of sentence structure, so you just get intriguingly natural-looking nonsense.
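To see why, here is a toy bigram generator written with the standard library only. It is a sketch, not a recommendation: the names `train_bigrams` and `generate`, the `<s>`/`</s>` boundary markers, and the three-sentence corpus are assumptions made for illustration.

```python
import random
from collections import defaultdict

def train_bigrams(sentences):
    """Map each token to the list of tokens observed right after it."""
    model = defaultdict(list)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev].append(nxt)
    return model

def generate(model, rng, max_len=20):
    """Walk the bigram table from <s> until </s> or `max_len` words."""
    word, out = "<s>", []
    while len(out) < max_len:
        word = rng.choice(model[word])  # depends on the previous word only
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

# Hypothetical miniature corpus.
corpus = ["the cat sees the dog", "the dog sleeps", "a cat sleeps"]
model = train_bigrams(corpus)
sample = generate(model, random.Random(0))
```

Every adjacent pair of words in the output was seen in the corpus, yet nothing ties the sentence together as a whole. With this corpus, "the cat sees the dog sleeps" is a possible output, because each step is locally fine.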

You could use a grammar in principle, but it will have to be a very good and probably very large grammar.
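For a sense of what the grammar route looks like at small scale, here is a minimal sketch that samples one sentence directly from a toy context-free grammar, rather than enumerating every sentence first. The grammar, the symbol names, and the functions `expand` and `random_sentence` are invented for this example. NLTK has comparable tools (`nltk.CFG.fromstring` and `nltk.parse.generate.generate`), but the sketch below sticks to the standard library.

```python
import random

# Toy grammar: each nonterminal maps to a list of alternative productions.
# Anything that is not a key is treated as a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["Vt", "NP"], ["Vi"]],  # transitive and intransitive verbs kept apart
    "Det": [["the"], ["a"]],
    "N": [["cat"], ["dog"], ["teacher"]],
    "Vt": [["sees"], ["greets"]],
    "Vi": [["sleeps"], ["waits"]],
}

def expand(symbol, grammar, rng):
    """Recursively expand `symbol` into a list of terminal words."""
    if symbol not in grammar:
        return [symbol]
    production = rng.choice(grammar[symbol])
    words = []
    for part in production:
        words.extend(expand(part, grammar, rng))
    return words

def random_sentence(grammar, rng, start="S"):
    """Sample one sentence; assumes the grammar has no unbounded recursion."""
    return " ".join(expand(start, grammar, rng))

sentence = random_sentence(GRAMMAR, random.Random(1))
```

Even here, transitive and intransitive verbs need separate categories to avoid output like "the cat sleeps the dog". Agreement, tense, and word choice each add more rules, which is why a usable grammar gets large quickly.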

Another option you haven't considered is to use a template method. Get yourself a bunch of sentences, identify some word classes you are interested in, and generate variants by fitting, e.g., different nouns as the subject or object. This method is much more likely to give you usable results in a finite amount of time. There's any number of well-known bots that work on this principle, and it's also pretty much what language-teaching books do.
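A minimal sketch of the template idea, again with the standard library only. The templates, the word classes, and the function names `fill` and `random_template_sentence` are hypothetical. In practice the templates would come from real sentences, and the word lists from what you have learned.

```python
import random
import re

# Hypothetical templates with named slots in curly braces.
TEMPLATES = [
    "The {noun} is {adjective}.",
    "I see the {adjective} {noun}.",
    "Where is the {noun}?",
]

# Hypothetical word classes; each slot name must appear here.
WORD_CLASSES = {
    "noun": ["cat", "teacher", "house"],
    "adjective": ["old", "small", "friendly"],
}

SLOT = re.compile(r"\{(\w+)\}")

def fill(template, word_classes, rng):
    """Replace every {slot} with a random word from the matching class."""
    return SLOT.sub(lambda m: rng.choice(word_classes[m.group(1)]), template)

def random_template_sentence(templates, word_classes, rng):
    """Pick a template at random and fill its slots."""
    return fill(rng.choice(templates), word_classes, rng)

example = random_template_sentence(TEMPLATES, WORD_CLASSES, random.Random(2))
```

Because each template is already a correct sentence, the output stays grammatical as long as every word in a class can really stand in that slot. Languages with gender or case agreement would need finer-grained classes.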