I am trying to build a sentiment analysis model on Amazon review data. I started by reading the data:
import pandas as pd
data = pd.read_csv("amazon_baby.csv")
---name---|------review-----|----rating----|
__________|_________________|______________|
          |                 |              |
          |                 |              |
          |                 |              |
___________________________________________|
I want to add another column that contains the word count vector for each review, so that the data looks like this:
---name---|------review-----|----rating----|----word_count----|
__________|_________________|______________|__________________|
          |                 |              |                  |
          |                 |              |                  |
          |                 |              |                  |
______________________________________________________________|
I used the following code:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
data['word_count'] = vectorizer.fit_transform(data['review'])
data.head()
I expected each cell to contain output like this:
[{'recommend': 1.0, 'disappointed': 1.0, 'wise': 1.0, 'love': 1.0, 'it': 3.0, 'planet': 1.0, 'and': 3.0, 'bags': 1.0, 'wipes': 1.0, 'highly': 1.0, 'not': 2.0, 'early': 1.0, 'came': 1.0, 'i': 1.0, 'does': 1.0, 'my': 2.0, 'was': 1.0, 'now': 1.0, 'wipe': 1.0, 'holder': 1.0, 'leak': 1.0, 'keps': 1.0, 'osocozy': 1.0, 'moist': 1.0}]
Instead, I get the following output:
(0, 60077)\t1\n (0, 24510)\t1\n (0, 66612)...
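
For context, the output above looks like the printed form of a SciPy sparse matrix: CountVectorizer.fit_transform returns one document-term matrix for the whole column (one row per review, one column per vocabulary word), not one dictionary per review. A minimal sketch of one way to get the per-review dictionaries instead, using only the standard library, might look like the following. The helper name word_counts, the tokenization rule, and the handling of missing reviews are all assumptions of this sketch, not something CountVectorizer does for you.

```python
import re
from collections import Counter


def word_counts(text):
    """Return a {word: count} dict for one review (hypothetical helper)."""
    # Assumption: a missing review (e.g. NaN) should yield an empty dict.
    if not isinstance(text, str):
        return {}
    # Assumption: a "word" is any run of letters, digits or underscores,
    # lowercased. This keeps one-letter words such as "i".
    tokens = re.findall(r"\w+", text.lower())
    return dict(Counter(tokens))


# Small made-up sample standing in for the 'review' column.
reviews = ["It does not leak, and I love it", "Highly recommend"]
counts = [word_counts(r) for r in reviews]
print(counts[0])
```

With the DataFrame from the question, the same helper could presumably be applied row by row with data['word_count'] = data['review'].apply(word_counts). Note that the counts here are integers rather than the floats shown in the expected output.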