Removing Stopwords in Python


I'm trying to remove the stopwords from a user input string using the str.join method. My code looks like this:

while True:
    line = raw_input()
    if line.strip() == stopword:
        break
    remove_stopwords = ''.join(word for word in line.split() if word not in stop_words)

I've defined stop_words as a list at the top of the script. The problem is that when I type in a string to remove the stop words from, only the first word is removed and the rest are left in place. Any help would be great. I'm new to this, so it's probably something stupid.

There is 1 answer

AudioBubble (best answer)

Here is a one-liner using the built-in filter function:

" ".join(filter(lambda word: word not in stop_words, line.split()))

Additionally, consider storing your stop words in a set rather than a list. The average-case time complexity of a membership test (in) is constant for a set and linear in the number of elements for a list.
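
A rough way to see the set-versus-list difference is to time the same membership test against both containers. This is only a sketch: the 10,000 generated words and the probe string are made-up test data, not anything from the question, and the absolute timings will vary by machine.

```python
import timeit

# Hypothetical test data: 10,000 generated "stop words".
stop_words_list = ["word%d" % i for i in range(10000)]
stop_words_set = set(stop_words_list)

# A word that is absent forces the list lookup to scan every element.
probe = "not-a-stop-word"

# Time 1,000 membership tests against each container.
list_time = timeit.timeit(lambda: probe in stop_words_list, number=1000)
set_time = timeit.timeit(lambda: probe in stop_words_set, number=1000)

print("list: %.6f s, set: %.6f s" % (list_time, set_time))
```

Both containers give the same answer to every membership test, so switching stop_words to a set should not change the filtered output, only how long each lookup takes.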

Edit: Your program appears to work as expected once the join string is a single space (" " instead of ''). This makes sense, as the generator expression (x for x in y if f(x)) is roughly equivalent to filter(f, y):

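  # A set gives constant-time membership tests on average.
  stop_words = {"hi", "bye"}
  stopword = "DONE"  # sentinel line that ends the input loop
  while True:
      line = raw_input()  # Python 2; use input() on Python 3
      if line.strip() == stopword:
          break
      # Join with a single space so the remaining words stay separated.
      print(" ".join(word for word in line.split() if word not in stop_words))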

input:

hello hi my name is bye justin

output:

hello my name is justin
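
To make the two points above concrete, here is a small sketch that runs without user input. It reuses the sample line and stop words from the example; the variable names via_filter, via_genexp, glued and spaced are only illustrative. It checks that filter and the generator expression keep the same words, then shows what the separator passed to join changes.

```python
stop_words = {"hi", "bye"}
line = "hello hi my name is bye justin"

# Two equivalent ways to drop the stop words.
via_filter = list(filter(lambda word: word not in stop_words, line.split()))
via_genexp = list(word for word in line.split() if word not in stop_words)

glued = "".join(via_genexp)    # empty separator, as in the question
spaced = " ".join(via_genexp)  # single-space separator, as in the answer

print(via_filter == via_genexp)  # True
print(glued)                     # hellomynameisjustin
print(spaced)                    # hello my name is justin
```

With the empty separator the stop words are still removed, but the remaining words run together into one token.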

Your bug must be somewhere else in your program. What else are you doing?