Can tfidf be weighted to improve classification of sparse data in a corpus?

130 views Asked by filpa At 16 December 2014 at 11:52

I am currently using tfidf prior to performing classification on a number of websites based on their content. Unfortunately, my training data is not uniformly distributed across labels: about 70% of the pre-labeled websites are news sites, while the remaining categories (tech, arts, entertainment, etc.) are each a small minority.

My questions are the following:

1. Is it possible to adjust tfidf so that it weights different labels differently and behaves as if the data were uniform? Should I perhaps be using a different approach in this case? I am currently using the Gaussian Naive Bayes classifier after the tfidf analysis; would something else be better suited to this specific case?
2. Is it possible to get a list of possible labels when the probability of the single most likely label is not clearly separated from the rest? For example, if the vector entries are close enough that one class is only slightly (< 1-2%) more probable than another, can it print both?
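
To make both questions concrete, here is a minimal sketch in plain Python, not a definitive solution. It rests on two assumptions: that the imbalance is handled at the classifier stage (tfidf itself never sees the labels), and that the classifier exposes per-class probabilities. The names `balanced_class_weights` and `candidate_labels`, the label counts, and the probabilities are all hypothetical and chosen only for illustration. The weighting formula, n_samples / (n_classes * count), is the same one scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency.

    Uses n_samples / (n_classes * count), so every class contributes
    the same total weight regardless of how many examples it has.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


def candidate_labels(probabilities, margin=0.02):
    """Return every label within `margin` of the top probability, best first."""
    best = max(probabilities.values())
    close = [(p, label) for label, p in probabilities.items()
             if best - p <= margin]
    return [label for p, label in sorted(close, key=lambda item: -item[0])]


# Hypothetical label distribution mirroring the question: 70% news.
labels = ["news"] * 70 + ["tech"] * 10 + ["arts"] * 10 + ["entertainment"] * 10
weights = balanced_class_weights(labels)

# Hypothetical per-class probabilities from a classifier for one website.
probs = {"news": 0.41, "tech": 0.40, "arts": 0.12, "entertainment": 0.07}

print(weights)
print(candidate_labels(probs))
```

With scikit-learn, comparable behaviour is available without hand-written helpers: `LogisticRegression` and `LinearSVC` accept `class_weight="balanced"`, `MultinomialNB` accepts `fit_prior=False` for a uniform class prior, and any classifier with `predict_proba` can feed a margin check like `candidate_labels`.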


There are 0 answers
