I used TfidfVectorizer to extract TF-IDF values, but I don't know how it calculates the results. When I calculate them manually, I get a different answer, so I want to extract the values that the function computes in order to learn how it works.
from sklearn.feature_extraction.text import TfidfVectorizer

data = ['Souvenir shop|Architecture and art|Culture and history', 'Souvenir shop|Resort|Diverse cuisine|Fishing|Folk games|Beautiful scenery', 'Diverse cuisine|Resort|Beautiful scenery']
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(data)
Have a look at the Attributes section of the scikit-learn documentation for TfidfVectorizer. Try this:
Output
You can get the IDF values with:
print(vectorizer.idf_)
Output
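A likely reason a hand calculation disagrees is that, with default parameters (`smooth_idf=True`, `norm='l2'`, `sublinear_tf=False`), scikit-learn documents the IDF as `ln((1 + n) / (1 + df)) + 1` and then scales each document row to unit L2 length. The sketch below reproduces that in plain Python; the regular expression is an assumption meant to mimic the default tokenizer, and names such as `tfidf_row` and `manual` are only illustrative:

```python
import math
import re
from collections import Counter

data = [
    'Souvenir shop|Architecture and art|Culture and history',
    'Souvenir shop|Resort|Diverse cuisine|Fishing|Folk games|Beautiful scenery',
    'Diverse cuisine|Resort|Beautiful scenery',
]

# Assumption: lowercase, then keep tokens of two or more word characters
token_pattern = re.compile(r"(?u)\b\w\w+\b")
docs = [token_pattern.findall(text.lower()) for text in data]

n_docs = len(docs)
vocab = sorted({term for doc in docs for term in doc})

# Document frequency: in how many documents each term occurs
df = {term: sum(term in doc for doc in docs) for term in vocab}

# Smoothed IDF, as described in the scikit-learn documentation
idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}

def tfidf_row(doc):
    """Raw count times IDF for every vocabulary term, then L2-normalized."""
    counts = Counter(doc)
    raw = [counts[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

manual = [tfidf_row(doc) for doc in docs]

for term in vocab:
    print(f"{term:12s} df={df[term]} idf={idf[term]:.4f}")
```

If these assumptions hold, `manual` should agree with `tfidf_matrix.toarray()` up to floating-point error, which can be checked with `numpy.allclose`. A textbook formula such as `log(n / df)` without the smoothing, the `+ 1`, or the normalization will give different numbers.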
For your case you can do this (with pandas):
Output