I guess you could classify this as a Scrabble style problem, but it started out due to a friend mentioning the UK TV quiz show Countdown. Various rounds in the show involve the contestants being presented a scrambled set of letters and they have to come up with the longest word they can. The one my friend mentioned was "RAEPKWAEN".
In fairly short order I whipped up something in Python to handle this problem, using PyEnchant to handle the dictionary look-ups, however I'm noticing that it really can't scale all that well.
Here's what I have currently:
#!/usr/bin/python
from itertools import permutations
import enchant
from sys import argv
def find_longest(origin):
s = enchant.Dict("en_US")
for i in range(len(origin),0,-1):
print "Checking against words of length %d" % i
pool = permutations(origin,i)
for comb in pool:
word = ''.join(comb)
if s.check(word):
return word
return ""
if (__name__)== '__main__':
result = find_longest(argv[1])
print result
That's fine on a 9 letter example like they use in the show, 9 factorial = 362,880 and 8 factorial = 40,320. On that scale even if it would have to check all possible permutations and word lengths it's not that many.
However once you reach 14 characters that's 87,178,291,200 possibly combinations, meaning you're reliant on luck that a 14 character word is quickly found.
With the example word above it's taking my machine about 12 1/2 seconds to find "reawaken". With 14 character scrambled words we could be talking on the scale of 23 days just to check all possible 14 character permutations.
Is there any more efficient way to handle this?
Implementation of Jeroen Coupé idea from his answer with letters count:
Output (for my small 58000 words dict):
Notes:
It's simple implementation without optimizations.
words_list.txt
- can be/usr/share/dict/words
on Linux.UPDATE
In case we need to find word only once, and we have dictionary with words sorted by length, e.g. by this script:
We can find longest word without loading full dict to memory: