I am trying to apply stemming and a CountVectorizer to the disaster tweets dataset from Kaggle (https://www.kaggle.com/datasets/vstepanenko/disaster-tweets/data). I dropped the keyword, location, and target columns. When I run the code below, I get this error: `FileNotFoundError: [Errno 2] No such file or directory: 'id'`. How do I fix this?
```python
import re

import pandas as pd
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

STEMMER = PorterStemmer()

# Use NLTK's PorterStemmer in a function
def MY_STEMMER(str_input):
    words = re.sub(r"[^A-Za-z\-]", " ", str_input).lower().split()
    words = [STEMMER.stem(word) for word in words]
    return words

## Create a CountVectorizer object that you can use
MyCV1 = CountVectorizer(input="filename",
                        #stop_words='english',
                        tokenizer=MY_STEMMER,
                        lowercase=True)

## Call your MyCV1 on the data
## (tweet is the DataFrame left after dropping keyword, location and target)
DTM1 = MyCV1.fit_transform(tweet)

## get col names
ColNames = MyCV1.get_feature_names_out()
print(ColNames)

## convert DTM to DF
MyDF1 = pd.DataFrame(DTM1.toarray(), columns=ColNames)
print(MyDF1)
```