tidytext words with both positive and negative sentiment


I have been working with the sentiments dataset and found that the bing and nrc datasets contain a few words that have both positive and negative sentiment.

**bing: three words with both positive and negative sentiment**

library(tidytext)
library(dplyr)
env_test_bing_raw <- get_sentiments("bing") %>%
  filter(word %in% c("envious", "enviously", "enviousness"))

# A tibble: 6 x 2
         word sentiment
        <chr>     <chr>
1     envious  positive
2     envious  negative
3   enviously  positive
4   enviously  negative
5 enviousness  positive
6 enviousness  negative

**nrc: 81 words with both positive and negative sentiment**

# Words that appear under both "positive" and "negative" in nrc
test_nrc <- get_sentiments("nrc") %>%
  filter(sentiment %in% c("positive", "negative")) %>%
  count(word, name = "count") %>%
  filter(count > 1)

env_test_nrc <- get_sentiments("nrc") %>%
  filter(sentiment %in% c("positive","negative")) %>%
  filter(word %in% test_nrc$word)

# A tibble: 162 x 2
         word sentiment
        <chr>     <chr>
 1  abundance  negative
 2  abundance  positive
 3      armed  negative
 4      armed  positive
 5       balm  negative
 6       balm  positive
 7      boast  negative
 8      boast  positive
 9 boisterous  negative
10 boisterous  positive
# ... with 152 more rows

Have I done something wrong, or can a word really have both negative and positive sentiment within a single lexicon? What are the standard practices for handling these situations?

Thank you!

Answer by Julia Silge

Nope! You have not done anything wrong.

These lexicons were built in different ways. For example, the NRC lexicon was built via Amazon Mechanical Turk: human annotators were shown lots of words and asked whether they associated each word with joy, sadness, a positive or negative affect, etc. The researchers then did a careful job of validation, calibration, etc. There are some English words that we as human language users can associate with both positive and negative feeling, such as "boisterous", and the researchers who built these particular lexicons decided to include these words as both.

If you have a text dataset that has the word "boisterous" in it and use a lexicon like this one, it will contribute in both the positive and negative direction (and also toward anger, anticipation, and joy, in that particular case). If you end up calculating a net sentiment (positive minus negative) for some sentence, section, or document, the effect of that particular word will cancel out.

library(tidytext)
library(dplyr)

get_sentiments("nrc") %>%
  filter(word == "boisterous")

#> # A tibble: 5 x 2
#>         word    sentiment
#>        <chr>        <chr>
#> 1 boisterous        anger
#> 2 boisterous anticipation
#> 3 boisterous          joy
#> 4 boisterous     negative
#> 5 boisterous     positive
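
To make the cancelling-out concrete, here is a minimal sketch using a tiny hand-written lexicon rather than the real NRC data. The names `toy_lexicon`, `toy_tokens`, and `net_sentiment` are made up for illustration, and `pivot_wider()` comes from tidyr, which the code above does not load. In document 1, "boisterous" adds one to both the positive and the negative count, so the net score is the same as if the word were absent.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy lexicon (hand-written for illustration, NOT the real NRC data).
# "boisterous" is deliberately listed as both positive and negative.
toy_lexicon <- tibble(
  word      = c("boisterous", "boisterous", "happy",    "gloomy"),
  sentiment = c("positive",   "negative",   "positive", "negative")
)

# Hypothetical tokenized text: one row per word, the shape unnest_tokens() produces.
toy_tokens <- tibble(
  doc  = c(1, 1, 2),
  word = c("boisterous", "happy", "gloomy")
)

net_sentiment <- toy_tokens %>%
  inner_join(toy_lexicon, by = "word") %>%   # "boisterous" matches two lexicon rows
  count(doc, sentiment) %>%                  # tally positive/negative per document
  pivot_wider(names_from = sentiment,
              values_from = n,
              values_fill = 0) %>%           # a document with no positives gets 0
  mutate(net = positive - negative)          # +1 and -1 from "boisterous" cancel

net_sentiment
```

Document 1 ends up with two positive matches and one negative match, for a net of 1, which is exactly the contribution of "happy" alone. If you would rather not count such words at all, removing them from the lexicon before joining gives the same net score but lower positive and negative totals.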