Running a file through a function


I'm trying to create a new file of rows of anagrams from a current file.

def isAnagram(str1, str2):
    str1_list = list(str1)
    str1_list.sort()
    str2_list = list(str2)
    str2_list.sort()
    return (str1_list == str2_list)

newerfile=open("ana.txt","w")
f = open("words.txt")
for word in f:
    s = str(word)
    for word2 in f:
        if word!=word2:
            if isAnagram(word, word2) is True:
                s += (' ') + str(word2)

if s!=str(word):
    newerfile.write(s)

The above is my current code, but all it gives me is an empty file. After experimenting a bit, I believe the problem is at the 4th last line - "if isAnagram(word, word2) is True:"

The function is not working for the file I have. I've tried a more basic version of the code to test the entire file against one word. Since the words 'was' and 'saw' are in the file, I should be getting those, but nothing is being printed.

y = 'was'
for line in open('real_words.txt'):
    if isAnagram(line,'was') is True:
        y += (' ') + str(line)
print(y)

The function is working fine when I provide a list of words, but not for a file. Any help is appreciated.

Also, is there any way to delete the word and all of its anagrams from the file if the function returns true?


There are 2 answers

0
tobias_k On

(Assuming words.txt is a file with one word per line, and you are trying to find all pairs of anagrams of those words and print them to a second file, one pair per line.)

There seem to be two problems with your code:

  1. When you do for word in f you are iterating over the file object, which is its own iterator, and for word2 in f uses that same iterator. The inner loop therefore consumes all remaining lines, and the outer loop ends after its first iteration.
  2. You seem to write to the file only after the loops have completed, but s will then hold just the string built for the last word, so at most that one line would be written. (This could just be a problem with indentation.)
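The first point can be reproduced without a real file. This is a minimal sketch, assuming an in-memory io.StringIO object as a stand-in for words.txt; the sample words are made up for illustration. It also shows that lines read from a file keep their trailing newline, which is probably why the single-word test against 'was' finds nothing.

```python
import io

# In-memory stand-in for words.txt (hypothetical sample contents).
f = io.StringIO("was\nsaw\ncat\nact\n")

outer_seen = []
inner_seen = []
for word in f:
    outer_seen.append(word)
    for word2 in f:  # same iterator as the outer loop
        inner_seen.append(word2)

# The inner loop consumed every remaining line, so the outer loop ran once.
print(outer_seen)  # ['was\n']
print(inner_seen)  # ['saw\n', 'cat\n', 'act\n']

# Lines keep their trailing newline, so a raw line never matches 'saw'.
raw_match = sorted(outer_seen[0]) == sorted("saw")
stripped_match = sorted(outer_seen[0].strip()) == sorted("saw")
print(raw_match, stripped_match)  # False True
```

Reading the words into a list first and stripping each line avoids both effects.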

For getting all the combinations of two words, it would be best to use itertools.combinations, somewhat like this (untested, kind of pseudo-code):

import itertools

# infile and outfile are assumed to be already-open file objects
words = infile.read().splitlines()
for w1, w2 in itertools.combinations(words, 2):
    if isAnagram(w1, w2):
        outfile.write("%s %s\n" % (w1, w2))

However, this will write just one pair of anagrams per line. If you want to write entire groups into one line, I guess you will need two loops, like in your code. Just remember not to use the same iterator for both loops, e.g. put the contents of the file into a list first, and then use that list for both loops.

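You could also use a generator expression (much like a list comprehension) for this:

words = infile.read().splitlines()
for w1 in words:
    # w1 is an anagram of itself, so each line starts a full group
    outfile.write(" ".join(w2 for w2 in words if isAnagram(w1, w2)) + "\n")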

(Note that this still is not perfect, as the lines will be repeated, once for each word in the group. But I'm sure you can figure out the rest by yourself.)
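One possible way to avoid the repeated lines is to remember which words have already been placed in a group. The sketch below is only an illustration: anagram_groups is a hypothetical helper name, isAnagram is re-defined in a shorter but equivalent form so the example is self-contained, and the word list is made-up sample data standing in for infile.read().splitlines(). It also drops words that have no anagram, which is a choice and not something the code above does.

```python
def isAnagram(str1, str2):
    # Same check as in the question: equal when the sorted letters match.
    return sorted(str1) == sorted(str2)

def anagram_groups(words):
    """Return each group of two or more anagrams once, in first-seen order."""
    seen = set()
    groups = []
    for w1 in words:
        if w1 in seen:
            continue  # already part of an earlier group
        group = [w2 for w2 in words if isAnagram(w1, w2)]
        seen.update(group)
        if len(group) > 1:
            groups.append(group)
    return groups

# Stand-in for infile.read().splitlines() (hypothetical sample contents).
words = "was\nsaw\ncat\nact\ndog\n".splitlines()
lines = [" ".join(group) for group in anagram_groups(words)]
print(lines)  # ['was saw', 'cat act']
```

Each entry of lines could then be written to the output file followed by a newline.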

1
Peter DeGlopper On

The best data structure for this is a dict of lists, where the sorted version of each string is the key. Each word that shares that sorted version will go into the list. After generating that, keys with only one word go into the file for words with no anagrams, while keys with multiple words go into the anagrams file.

from collections import defaultdict
words_by_sorted = defaultdict(list)
f = open("words.txt")
for line in f:
    word = line.strip() # remove the newline
    sorted_key = tuple(sorted(word))
    words_by_sorted[sorted_key].append(word)
f.close()
unanagrammed = open("unanagrammed.txt", "w")
anagrammed = open("anagrammed.txt", "w")
for words in words_by_sorted.values():  # .values() works on both Python 2 and 3
    if len(words) == 1:
        unanagrammed.write(words[0] + '\n')
    else:
        anagrammed.write(' '.join(words) + '\n')
unanagrammed.close()
anagrammed.close()

On Python versions before 3.7, this does not maintain order in either file, because plain dicts (and defaultdict) were not guaranteed to keep insertion order there. If you need to do that, you could maintain a list of the sorted keys in the order you first saw them, or use an ordered dict (and explicitly create the lists as needed rather than use a defaultdict) if you're on 2.7 or later.
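The ordered-dict variant might look roughly like this. It is a sketch under assumptions: group_in_order is a hypothetical helper name, and the input is a plain list of already-stripped words in place of the file.

```python
from collections import OrderedDict

def group_in_order(words):
    """Group words by sorted letters, keeping keys in first-seen order."""
    words_by_sorted = OrderedDict()
    for word in words:
        sorted_key = tuple(sorted(word))
        # Create the list explicitly the first time a key is seen.
        words_by_sorted.setdefault(sorted_key, []).append(word)
    return list(words_by_sorted.values())

print(group_in_order(["was", "cat", "saw", "dog", "act"]))
# [['was', 'saw'], ['cat', 'act'], ['dog']]
```

Groups come out in the order their first member appeared, and words inside a group keep their file order.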

It also creates two files - you can't really "delete them from the old file", but you could overwrite it if you like. This approach lets you examine your output more closely before removing the input.

It would also be a good practice to use with statements for opening the files, but your base code uses raw open so I stuck with that.
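For comparison, the same approach written with with statements could look like the sketch below. The function name split_anagrams and its parameters are assumptions made for illustration, and the demo at the end runs on a throwaway temporary directory with made-up sample words in place of the real words.txt.

```python
import os
import tempfile
from collections import defaultdict

def split_anagrams(in_path, anagrammed_path, unanagrammed_path):
    """Write anagram groups and lone words to two separate files."""
    words_by_sorted = defaultdict(list)
    with open(in_path) as f:
        for line in f:
            word = line.strip()  # remove the newline
            if word:  # skip blank lines
                words_by_sorted[tuple(sorted(word))].append(word)
    # Both output files are closed automatically, even if an error occurs.
    with open(anagrammed_path, "w") as anagrammed, \
         open(unanagrammed_path, "w") as unanagrammed:
        for words in words_by_sorted.values():
            if len(words) == 1:
                unanagrammed.write(words[0] + "\n")
            else:
                anagrammed.write(" ".join(words) + "\n")

# Demo on a throwaway directory with made-up sample words.
tmp = tempfile.mkdtemp()
paths = [os.path.join(tmp, name)
         for name in ("words.txt", "anagrammed.txt", "unanagrammed.txt")]
with open(paths[0], "w") as f:
    f.write("was\nsaw\ndog\n")
split_anagrams(*paths)
with open(paths[1]) as f:
    anagram_lines = f.read().splitlines()
with open(paths[2]) as f:
    single_lines = f.read().splitlines()
print(anagram_lines, single_lines)  # ['was saw'] ['dog']
```

With real data the call would be something like split_anagrams("words.txt", "anagrammed.txt", "unanagrammed.txt").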