I am implementing a simple Naive Bayes classifier, but I do not understand how to properly calculate the class-conditional probability P(d|c). For completeness, let me briefly explain the terminology I use. Naive Bayes probabilities are computed by Bayes' rule:

P(c|d) = P(c) * P(d|c) / P(d)
c denotes an arbitrary class, while d is a document. Let x = {x_1, x_2, ..., x_n} be a list of n features (e.g. the 50 most frequent bigrams).
In my training set there are i classes (each represented by a folder called c_i), and each class contains k documents (plain text files).
The a-priori probability P(c) can be calculated easily:

P(c) = (number of documents in class c) / (total number of documents in the training set)
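For concreteness, here is a minimal sketch of that computation, assuming the folder layout described above (one folder c_i per class, one text file per document; training_dir is a placeholder path):

```python
import os

def priors(training_dir):
    """Estimate P(c) for every class folder c_i under training_dir."""
    # Count the documents (text files) inside each class folder.
    doc_counts = {
        c: len(os.listdir(os.path.join(training_dir, c)))
        for c in os.listdir(training_dir)
    }
    total = sum(doc_counts.values())
    # P(c) = (# documents in class c) / (total # of documents)
    return {c: n / total for c, n in doc_counts.items()}
```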
Now I want to calculate P(d|c). Under the naive independence assumption, this should be done by

P(d|c) = P(x_1|c) * P(x_2|c) * ... * P(x_n|c)
What I don't quite understand is how to compute P(x_i|c). I take a feature x_i (say, the bigram "th") and check how often it appears in class c. But how do I do that? Each class is represented by k documents. Do I have to concatenate all those files? I certainly have to divide by the "total count of all features" afterwards. And is the count for the bigram "th" simply its frequency in all (concatenated) documents of the class?
The naive Bayes approach assumes that a document is a set of features (words, bigrams, ...) that were drawn independently from some probability distribution. Based on this independence assumption, you can indeed concatenate all the documents in a class and use the feature frequencies in that union of the class's documents as your estimate of the class probability distribution.
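Here is a minimal sketch of that estimate in Python, assuming character bigrams as features. The add-one (Laplace) smoothing is my own addition, not part of the recipe above; it keeps an unseen bigram from zeroing out the whole product P(x_1|c) * ... * P(x_n|c):

```python
from collections import Counter

def char_bigrams(text):
    # Character bigrams, e.g. "the" -> ["th", "he"].
    return [text[i:i + 2] for i in range(len(text) - 1)]

def class_conditionals(class_docs, features):
    """Estimate P(x_i|c) for each feature from the documents of one class."""
    # Summing the counts per document is equivalent to concatenating the
    # k files, without creating spurious bigrams across file boundaries.
    counts = Counter()
    for doc in class_docs:
        counts.update(char_bigrams(doc))
    # Denominator: total count of all n features in the class, plus n
    # from the add-one (Laplace) smoothing assumed here.
    total = sum(counts[f] for f in features) + len(features)
    return {f: (counts[f] + 1) / total for f in features}

# Hypothetical usage: the contents of the k files in one class folder.
docs = ["the cat sat on the mat", "the hat is on the cat"]
print(class_conditionals(docs, ["th", "he", "at"]))
```

So the answer to the question is yes on both counts: the numerator is the frequency of "th" across all the class's documents taken together, and the denominator is the total count of all n features in that same union.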