How is PcGw computed in quanteda's Naive Bayes?

Consider the usual example, which replicates the example from 13.1 of An Introduction to Information Retrieval (https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf):

library(quanteda)

txt <- c(d1 = "Chinese Beijing Chinese",
         d2 = "Chinese Chinese Shanghai",
         d3 = "Chinese Macao",
         d4 = "Tokyo Japan Chinese",
         d5 = "Chinese Chinese Chinese Tokyo Japan")

trainingset <- dfm(txt, tolower = FALSE)
trainingclass <- factor(c("Y", "Y", "Y", "N", NA), ordered = TRUE)

tmod1 <- textmodel_nb(trainingset, y = trainingclass, prior = "docfreq")

According to the docs, PcGw is the posterior class probability given the word. How is it computed? I thought what we cared about was the other way around, that is, P(word | class).

> tmod1$PcGw
       features
classes   Chinese   Beijing  Shanghai     Macao     Tokyo     Japan
      N 0.1473684 0.2058824 0.2058824 0.2058824 0.5090909 0.5090909
      Y 0.8526316 0.7941176 0.7941176 0.7941176 0.4909091 0.4909091
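
For reference, the printed values can be reproduced by hand if one assumes that PcGw applies Bayes' rule to each word separately, P(class | word) ∝ P(word | class) P(class), with add-one smoothing for P(word | class) and the document-frequency prior over the four labelled documents. The following is only a base R sketch under that assumption, not quanteda's actual code; the names counts, smoothed, PwGc, Pc and joint are made up for illustration.

```r
# Term counts per class from the labelled documents d1-d4 (d5 has class NA).
feats <- c("Chinese", "Beijing", "Shanghai", "Macao", "Tokyo", "Japan")
counts <- rbind(
  N = c(1, 0, 0, 0, 1, 1),   # d4
  Y = c(5, 1, 1, 1, 0, 0)    # d1 + d2 + d3
)
colnames(counts) <- feats

# Assumption: add-one (Laplace) smoothing of the word likelihoods.
smoothed <- counts + 1
PwGc <- smoothed / rowSums(smoothed)   # P(word | class), each row sums to 1

# Assumption: prior = "docfreq" means class shares among labelled documents.
Pc <- c(N = 1 / 4, Y = 3 / 4)

# Bayes' rule per word: scale each row by its class prior, then
# normalise every column so the two class probabilities sum to 1.
joint <- PwGc * Pc                     # row i is multiplied by Pc[i]
PcGw <- sweep(joint, 2, colSums(joint), "/")

print(round(PcGw, 7))
```

Under these assumptions the result agrees with tmod1$PcGw above, for example 81/95 ≈ 0.8526316 for Chinese given Y. So P(word | class) is still the quantity being estimated; PcGw seems to be that same quantity turned around with the prior.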

Thanks!
