How to add a word_count column to a DataFrame using scikit-learn?


I am trying to build a sentiment analysis model on Amazon review data. I started by reading the data:

import pandas as pd
data = pd.read_csv("amazon_baby.csv")


---name---|------review-----|----rating----|
__________|_________________|______________|
          |                 |              |
          |                 |              |
          |                 |              |
___________________________________________|

I want to add another column that contains the word count vector for each review, so that the table looks like this:

---name---|------review-----|----rating----|----word_count----|
__________|_________________|______________|__________________|
          |                 |              |                  |
          |                 |              |                  |
          |                 |              |                  |
______________________________________________________________|

I use the following code:

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
data['word_count'] = vectorizer.fit_transform(data['review'])
data.head()

I expect each cell to contain output like this:

[{'recommend': 1.0, 'disappointed': 1.0, 'wise': 1.0, 'love': 1.0, 'it': 3.0, 'planet': 1.0, 'and': 3.0, 'bags': 1.0, 'wipes': 1.0, 'highly': 1.0, 'not': 2.0, 'early': 1.0, 'came': 1.0, 'i': 1.0, 'does': 1.0, 'my': 2.0, 'was': 1.0, 'now': 1.0, 'wipe': 1.0, 'holder': 1.0, 'leak': 1.0, 'keps': 1.0, 'osocozy': 1.0, 'moist': 1.0}]

Instead, I get the following output:

(0, 60077)\t1\n  (0, 24510)\t1\n  (0, 66612)...
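
For context, CountVectorizer.fit_transform returns a single SciPy sparse matrix with one row per review and one column per vocabulary term, which is why the cells show (row, column) index pairs rather than per-review dictionaries. Below is a minimal sketch of one possible way to get a dictionary of word counts per row: it reuses the vectorizer's own analyzer (via build_analyzer) and counts tokens with collections.Counter. The small DataFrame is a hypothetical stand-in for amazon_baby.csv, and the fillna("") call assumes some reviews may be missing.

```python
from collections import Counter

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for amazon_baby.csv (same column names as in the question)
data = pd.DataFrame({
    "name": ["wipes", "blanket"],
    "review": ["love it love it", "not soft, not warm"],
    "rating": [5, 2],
})

vectorizer = CountVectorizer()
# The analyzer applies the vectorizer's preprocessing and tokenization
# (lowercasing, default token pattern) to a single string.
analyzer = vectorizer.build_analyzer()

# Count the tokens of each review separately and store one dict per row.
# fillna("") guards against missing reviews (an assumption about the data).
data["word_count"] = data["review"].fillna("").apply(
    lambda text: dict(Counter(analyzer(text)))
)

print(data[["review", "word_count"]])
```

Note that the default token pattern drops single-character tokens such as "i". If those should be counted, passing something like token_pattern=r"(?u)\b\w+\b" to CountVectorizer is one option.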