I'm trying to write a function that checks whether a given phrase matches at least one item in a list of phrases and returns the matching items. The inputs are the phrase, a list of phrases, and a dictionary that maps words to lists of synonyms. The point is to make it general.
Here is an example:
phrase = 'This is a little house'
dictSyns = {'little': ['small', 'tiny', 'little'],
            'house': ['cottage', 'house']}
listPhrases = ['This is a tiny house','This is a small cottage','This is a small building','I need advice']
I can write code that handles this specific example and gives a bool:
if any('This' + ' ' + 'is' + ' ' + 'a' + ' ' + x + ' ' + y == phrase for x in dictSyns['little'] for y in dictSyns['house']):
    print 'match'
The first problem is that the function has to be general rather than hard-coded for this one phrase. The second is that I want the function to return the list of matched phrases.
Can you give me some advice on how to do that, so that the function returns ['This is a tiny house','This is a small cottage'] in this case?
The output should look like this:
>>> getMatches(phrase, dictSyns, listPhrases)
['This is a tiny house','This is a small cottage']
I would approach this as follows:
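A minimal sketch of this approach, reconstructed from the description below. The names new_phrases, words, phrase and syns come from the answer text; get_matches and candidates are names assumed here for illustration, and returning the matches in the order of the input list is a choice made to reproduce the output requested in the question.

```python
import itertools


def new_phrases(phrase, syns):
    """Yield every phrase obtainable by swapping words for their synonyms."""
    # Each element of words is the list of acceptable choices for one word;
    # a word with no entry in syns can only be itself.
    words = [syns.get(word, [word]) for word in phrase.split(' ')]
    for combination in itertools.product(*words):
        yield ' '.join(combination)


def get_matches(phrase, syns, phrases):
    """Return the items of phrases that can be generated from phrase."""
    # Assumed design: collect the candidates in a set so that each
    # membership test is O(1), then keep the order of the input list.
    candidates = set(new_phrases(phrase, syns))
    return [item for item in phrases if item in candidates]
```

Called with the example data from the question, get_matches(phrase, dictSyns, listPhrases) gives ['This is a tiny house', 'This is a small cottage'].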
The root of the code is the assignment of words, in new_phrases, which transforms the phrase and syns into a more usable form: a list where each element is a list of the acceptable choices for that word.
Note the following:
- a set for efficient membership testing (O(1), vs. O(n) for a list); and
- itertools.product to generate the possible combinations of phrase based on the syns (you could also use itertools.ifilter in implementing this).
In use:
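The following self-contained walk-through runs the idea on the example data from the question. It is only an illustration: the variable names allowed, candidates and matches are assumptions, and the built-in filter is used in place of itertools.ifilter, which exists only on Python 2.

```python
import itertools

phrase = 'This is a little house'
syns = {'little': ['small', 'tiny', 'little'],
        'house': ['cottage', 'house']}
phrases = ['This is a tiny house', 'This is a small cottage',
           'This is a small building', 'I need advice']

# One list of acceptable choices per word of the phrase:
# [['This'], ['is'], ['a'], ['small', 'tiny', 'little'], ['cottage', 'house']]
words = [syns.get(word, [word]) for word in phrase.split(' ')]

allowed = set(phrases)  # O(1) membership tests
candidates = (' '.join(choice) for choice in itertools.product(*words))

# Keep only the generated phrases that appear in the list of phrases.
matches = list(filter(allowed.__contains__, candidates))
print(matches)  # ['This is a small cottage', 'This is a tiny house']
```

Here the matches come out in the order in which itertools.product generates them, not in the order of the input list.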
Things to think about:
- multi-word terms (how should "House of Commons" be treated)?