I have been working with the sentiments dataset and found that the bing and nrc datasets contain a few words that have both positive and negative sentiment.
** bing – three words with positive and negative sentiment **
library(dplyr)
library(tidytext)
env_test_bing_raw <- get_sentiments("bing") %>%
  filter(word %in% c("envious", "enviously", "enviousness"))
# A tibble: 6 x 2
word sentiment
<chr> <chr>
1 envious positive
2 envious negative
3 enviously positive
4 enviously negative
5 enviousness positive
6 enviousness negative
** nrc – 81 words with positive and negative sentiment **
test_nrc <- as.data.frame(
  get_sentiments("nrc") %>%
    filter(sentiment %in% c("positive", "negative")) %>%
    group_by(word) %>%
    summarize(count = n()) %>%
    filter(count > 1))
env_test_nrc <- get_sentiments("nrc") %>%
  filter(sentiment %in% c("positive", "negative")) %>%
  filter(word %in% test_nrc$word)
# A tibble: 162 x 2
word sentiment
<chr> <chr>
1 abundance negative
2 abundance positive
3 armed negative
4 armed positive
5 balm negative
6 balm positive
7 boast negative
8 boast positive
9 boisterous negative
10 boisterous positive
# ... with 152 more rows
Have I done something wrong, or can a word really have both negative and positive sentiment within a single lexicon? What are the standard practices for handling these situations?
Thank you!
Nope! You have not done anything wrong.
These lexicons were built in different ways. For example, the NRC lexicon was built via Amazon Mechanical Turk, showing human beings lots of words and asking them whether they associated each word with joy, sadness, a positive or negative affect, etc. The researchers then did a careful job of validation, calibration, etc. There are some English words that we as human language users can associate with both positive and negative feeling, such as "boisterous", and the researchers who built these particular lexicons decided to list these words under both sentiments.
If you have a text dataset that has the word "boisterous" in it and use a lexicon like this one, that word will contribute in both the positive and negative direction (and also toward anger, anticipation, and joy, in that particular case). If you end up calculating a net sentiment (positive minus negative) for some sentence, section, or document, the effect of that particular word will cancel out.
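To make the cancellation concrete, here is a minimal sketch. It uses a small hand-made lexicon and a toy set of tokens rather than the real NRC data, so `toy_lexicon`, `toy_words`, and the `doc` column are hypothetical names chosen only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for a positive/negative lexicon.
# "boisterous" is listed under both sentiments, as it is in NRC.
toy_lexicon <- tibble(
  word      = c("boisterous", "boisterous", "happy",    "gloomy"),
  sentiment = c("positive",   "negative",   "positive", "negative")
)

# Hypothetical tokenized text: one row per word occurrence.
toy_words <- tibble(
  doc  = c("a", "a", "b"),
  word = c("boisterous", "happy", "gloomy")
)

net_sentiment <- toy_words %>%
  # "boisterous" matches two lexicon rows, so it appears once per sentiment
  inner_join(toy_lexicon, by = "word") %>%
  count(doc, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  # net sentiment = positive minus negative
  mutate(net = positive - negative)

print(net_sentiment)
```

In document "a", "boisterous" adds one to the positive count and one to the negative count, so the net score is driven by "happy" alone.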