qdap ngram polarity dictionary

Dear Stack Overflow crowd,

I managed to use the qdap polarity function to calculate the polarity of some blog entries, loading my own dictionary based on SentiWS. Now I have a new sentiment dictionary (SePL) that contains not only single words but also phrases. For example, in "simply good", "simply" is neither a negator nor an amplifier, but it makes the phrase more precise. So I was wondering whether I could search for n-grams using qdap's polarity function.

As an example:

library(qdap)
phrase <- "This is simply the best"
key <- sentiment_frame(c("simply", "best", "simply the best"), "", c(0.1,0.3,0.8))
counts(polarity(phrase, polarity.frame=key))

gives:

  all wc polarity    pos.words neg.words                text.var
1 all  5    0.179 simply, best         - This is simply the best
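For reference, the 0.179 above is consistent with the scoring described in qdap's polarity documentation for the simple case with no negators or amplifiers: the weights of the matched words are summed and divided by the square root of the word count. A minimal sketch of that arithmetic, where `simple_polarity` is a hypothetical helper and not part of qdap:

```r
# Hypothetical helper (not a qdap function) approximating qdap's polarity
# score for the simple case with no negators, amplifiers or de-amplifiers:
# sum of matched weights divided by sqrt(word count).
simple_polarity <- function(weights, wc) {
  sum(weights) / sqrt(wc)
}

# Unigram matching: "simply" (0.1) and "best" (0.3) in a 5-word sentence
score <- simple_polarity(c(0.1, 0.3), wc = 5)
round(score, 3)  # 0.179
```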

However, I would like to get an output like:

  all wc polarity       pos.words neg.words                text.var
1 all  5     0.76 simply the best         - This is simply the best

Does anyone have an idea how to get it working like that?

All the best, Ben

Accepted answer by Tyler Rinker

This is a bug that was reintroduced by changes to the bag_o_words function earlier this year. It is the second time a bug like this has affected n-gram polarity since I enabled the use of n-grams in polarity.frame: https://github.com/trinker/qdap/issues/185

I have fixed the bug and added a unit test to ensure it doesn't creep back into the code. In qdap 2.2.1 your code now gives the desired output, though my warning that this goes against the original intent of the algorithm still stands:

> library(qdap)
> phrase <- "This is simply the best"
> key <- sentiment_frame(c("simply", "best", "simply the best"), "", c(0.1,0.3,0.8))
> counts(polarity(phrase, polarity.frame=key))

  all wc polarity       pos.words neg.words                text.var
1 all  5    0.358 simply the best         - This is simply the best
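The fixed result, 0.358, equals 0.8 / sqrt(5): the trigram is scored once, and its component words "simply" and "best" are no longer counted separately. Below is a minimal base-R sketch of longest-match-first n-gram lookup that reproduces this number. `match_ngrams` is a hypothetical illustration, not qdap's internal implementation, and it assumes plain whitespace tokenization with no punctuation handling, negators, amplifiers or context clusters.

```r
# Hypothetical sketch of longest-match-first n-gram lookup; NOT qdap's code.
# Assumes whitespace tokenization and the simple sum / sqrt(wc) score.
match_ngrams <- function(text, key_terms, key_values) {
  tokens <- strsplit(tolower(text), "\\s+")[[1]]
  key_tokens <- strsplit(tolower(key_terms), "\\s+")
  # Try longer keys first so "simply the best" wins over "simply" and "best"
  ord <- order(lengths(key_tokens), decreasing = TRUE)
  matched <- character(0)
  weights <- numeric(0)
  i <- 1
  while (i <= length(tokens)) {
    hit <- FALSE
    for (k in ord) {
      n <- length(key_tokens[[k]])
      if (i + n - 1 <= length(tokens) &&
          identical(tokens[i:(i + n - 1)], key_tokens[[k]])) {
        matched <- c(matched, key_terms[k])
        weights <- c(weights, key_values[k])
        i <- i + n  # skip past the matched n-gram so its words are not re-counted
        hit <- TRUE
        break
      }
    }
    if (!hit) i <- i + 1
  }
  list(words = matched, polarity = sum(weights) / sqrt(length(tokens)))
}

res <- match_ngrams("This is simply the best",
                    c("simply", "best", "simply the best"),
                    c(0.1, 0.3, 0.8))
res$words               # "simply the best"
round(res$polarity, 3)  # 0.358
```

Skipping past a matched n-gram is what prevents double counting; without that step the unigrams inside the phrase would also add their weights.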

qdap's polarity function uses an algorithm that was not designed to operate like this. You can also do it with the following hack, but be aware that it falls outside the intent of the theory underlying the function's algorithm:

library(qdap)
phrase <- "This is simply the best"

terms <- c("simply", "best", "simply the best")
key <- sentiment_frame(space_fill(terms, terms, sep="xxx"), NULL, c(0.1,0.3,0.8))

counts(polarity(space_fill(phrase, terms, "xxx"), polarity.frame=key))

##   all wc polarity           pos.words neg.words                    text.var
## 1 all  3    0.462 simplyxxxthexxxbest         - This is simplyxxxthexxxbest
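The hack works because the joined phrase becomes a single token, so the unigrams "simply" and "best" can no longer match inside it. The sketch below mimics that idea in base R; `join_phrases` is a hypothetical stand-in for what `space_fill()` does here, not its actual implementation, and the scoring again assumes the simple sum / sqrt(word count) form.

```r
# Hypothetical base-R stand-in for the idea behind qdap::space_fill():
# glue the words of each multi-word term together with a separator so the
# phrase is tokenized as one word.
join_phrases <- function(text, terms, sep = "xxx") {
  phrases <- terms[grepl(" ", terms)]
  # Longest phrases first so a shorter phrase cannot break up a longer one
  phrases <- phrases[order(nchar(phrases), decreasing = TRUE)]
  for (p in phrases) {
    text <- gsub(p, gsub(" ", sep, p, fixed = TRUE), text, fixed = TRUE)
  }
  text
}

terms <- c("simply", "best", "simply the best")
weights <- c(0.1, 0.3, 0.8)
joined_terms <- join_phrases(terms, terms)                     # dictionary side
joined_text <- join_phrases("This is simply the best", terms)  # text side

tokens <- strsplit(joined_text, "\\s+")[[1]]
wc <- length(tokens)  # 3: "This", "is", "simplyxxxthexxxbest"
hits <- match(tolower(tokens), tolower(joined_terms))
pol_score <- sum(weights[hits], na.rm = TRUE) / sqrt(wc)
round(pol_score, 3)  # 0.462
```

Because the phrase collapses to one token, the word count also drops from 5 to 3, which is why the hack reports 0.462 (0.8 / sqrt(3)) rather than the 0.358 (0.8 / sqrt(5)) from the bug-fixed polarity call.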