Conceptual: Collect "synonyms" from a list of "words"

220 views Asked by At

This question is inspired by: Generating a list of repetitions regardless of the order and its accepted answer: https://stackoverflow.com/a/20336020/1463143

Here, "alphabet" is any set of letters e.g. '012' or 'EDCRFV'

"words" are obtained by doing a cartesian product over the alphabet. We should be able to specify n for getting n-lettered words. Example:

from itertools import product
alphabet = '012'
wordLen = 3
wordList = [''.join(letter) for letter in product(alphabet,repeat=wordLen)]
print wordList

which gives:

['000', '001', '002', '010', '011', '012', '020', '021', '022', '100', '101', '102', '110', '111', '112', '120', '121', '122', '200', '201', '202', '210', '211', '212', '220', '221', '222']

a "synonym" is obtained by... uh... if only I could articulate this...

these lists contain all the possible "synonyms" within wordList:

['000',
 '111',
 '222'] 

['001',
 '002',
 '110',
 '112',
 '220',
 '221']

['010',
 '020',
 '101',
 '121',
 '202',
 '212']

['011',
 '022',
 '100',
 '122',
 '200',
 '211']

['012',
 '021',
 '102',
 '120',
 '201',
 '210']

Sadly, I'm unable to articulate how I obtained the above lists of "synonyms". I would like to do something as above for an arbitrary alphabet forming n-lettered words.

3

There are 3 answers

0
georg On BEST ANSWER

Looks quite easy:

syns = collections.defaultdict(list)

for w in wordList:
    hash = tuple(w.index(c) for c in w)
    syns[hash].append(w)

print syns.values()
0
ile On

It seems the rule you want (for larger n as well) is the following:

A word u is a synonym of v iff u can be obtained from v by swapping two characters in the alphabet, i.e. all the words obtained from all alphabet's permutations will be synonyms.

Example: Let u = 001, and alphabet be 012.

There are six permutations of the alphabet: '012', '021', '102', '120', '201', '210'. Map u with all this permutations to get synonyms for u:

'001'
'002'
'110'
'112'
'220'
'221'
2
akaRem On

A:

[ word for word in wordList 
    if  word[0] == word[1]
    and word[0] == word[2] ]

B:

[ word for word in wordList 
    if  word[0] == word[1]
    and word[0] != word[2] ]

C:

[ word for word in wordList 
    if  word[0] != word[1]
    and word[0] == word[2] ]

D:

[ word for word in wordList 
    if  word[0] != word[1]
    and word[1] == word[2] ]

E:

[ word for word in wordList 
    if  word[0] != word[1]
    and word[0] != word[2] ]

So, its groups of all variations of equality letters in word:
'abc' -> a<>b, b=c, c<>a ; a=b, b=c, c=a ; etc..

every empty result (for ex: a<>b, b=c, c=a) is excluded