Though this is a common question, I couldn't find a solution that works for my case. My data is comma-separated, like this:
['my scientific','data']['is comma-separated','frequency']
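I'm trying to remove stop words with:
from nltk.corpus import stopwords
stopword = set(stopwords.words('english'))  # a set makes "not in" lookups fast
mynewtext = [w for w in transposed if w not in stopword]
out_file.writerow(mynewtext)  # write the filtered list, not the leftover loop variable w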
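A minimal sketch of the filter-and-write step, assuming each row is a list of phrases (as in the sample data) and that stop words should be removed word by word inside each phrase. The small STOPWORDS set is a hypothetical stand-in for stopwords.words('english') so the snippet runs without NLTK, and the file name output.csv is likewise an assumption. It targets Python 3; on Python 2.7 the csv module expects byte strings and a file opened with "wb".

```python
import csv

# Hypothetical stand-in for nltk.corpus.stopwords.words('english').
STOPWORDS = {"my", "is", "the", "a", "and"}


def strip_stopwords(phrase, stopwords=STOPWORDS):
    """Drop stop words from one cell, comparing in lower case."""
    kept = [word for word in phrase.split() if word.lower() not in stopwords]
    return " ".join(kept)


def clean_rows(rows, stopwords=STOPWORDS):
    """Apply strip_stopwords to every cell of every row."""
    return [[strip_stopwords(cell, stopwords) for cell in row] for row in rows]


rows = [["my scientific", "data"], ["is comma-separated", "frequency"]]
cleaned = clean_rows(rows)

# newline="" is what the Python 3 csv docs recommend for csv files.
with open("output.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerows(cleaned)  # one CSV line per filtered row
```

With the sample rows, output.csv holds the two lines scientific,data and comma-separated,frequency.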
But I get this warning: 'UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal'. I'm not sure where I'm going wrong. I want the output in a CSV file to look like this:
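In Python 2 this warning appears when a byte string (str) containing non-ASCII bytes is compared with a unicode string; here that is likely because the rows come from a file as bytes while the stop words are unicode. A common fix is to decode the data to text before filtering. The sketch below assumes the input is UTF-8; the helper names to_text and decode_row are hypothetical. It runs on both Python 2 (where bytes is an alias of str) and Python 3.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings first.

    Python 2: str -> unicode. Python 3: bytes -> str.
    The utf-8 default is an assumption about the input encoding.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text, leave unchanged


def decode_row(row, encoding="utf-8"):
    """Decode every cell of a row read from a byte-oriented source."""
    return [to_text(cell, encoding) for cell in row]


raw_row = [b"caf\xc3\xa9 data", b"frequency"]
text_row = decode_row(raw_row)
```

Once every cell is text, both sides of the `not in` check have the same type and the comparison no longer triggers the warning.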
scientific,data
comma-separated,frequency
I'd also like the matching to work for both upper- and lower-case words, but casefold() isn't available in my Python version (2.7).
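Because str.casefold() only exists from Python 3.3, one version-tolerant approach is to normalise the stop words and the text the same way, using casefold() when the string has it and lower() otherwise. For English stop words, lower() is enough; casefold() differs mainly for characters such as the German ß. The helper names below are hypothetical.

```python
def normalize(word):
    """Case-fold when available (Python 3.3+), else fall back to lower()."""
    fold = getattr(word, "casefold", None)
    return fold() if fold is not None else word.lower()


def make_stopword_set(words):
    """Normalise the stop-word list once so lookups are case-insensitive."""
    return {normalize(w) for w in words}


def is_stopword(word, stopword_set):
    """True if word matches a stop word regardless of case."""
    return normalize(word) in stopword_set


stops = make_stopword_set(["The", "is", "my"])
```

Normalising the list once up front keeps each lookup to a single set membership test.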
Try adding a UTF-8 encoding declaration, such as # -*- coding: utf-8 -*-, in the header of your source code.
It tells Python that the source file you've saved is UTF-8. The default for Python 2 is ASCII (for Python 3 it's UTF-8). This only affects how the interpreter reads the characters in the file itself.