Allocating a category to a comment in pandas


My task is to assign a broad and a fine category to each piece of text in a pandas dataframe.

My df is something like this:

          Text  
     I like this pen
     this is the worst light bulb ever
     these pants fit me just fine

Desired output:

          Text                                Broad_cat                Fine_cat 
     I like this pen                         Stationary                  Pen
     this is the worst light bulb ever       Electrical                  light Bulb
     these pants fit me just fine            Clothing                    Pants

The text could be from any category, so I can't use a prepared dictionary. These are reviews that I can get from any source. I was hoping there is an open-source Python package that can help with the specific task of categorizing a comment. I already tried the YAKE, RAKE, Summa and KeyBERT methods, and while each of them gives me keywords, the keywords don't always turn out to be the category. Is this even possible? Any help in this regard is much appreciated.

There is 1 answer


I presume you have a list of allowed categories?

This is a multiclass classification problem.

A fiddly approach is to embed the sentences into some sort of vector space, use something like the softmax function to select the class, and train the model on labelled training data. This post discusses this.
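As a rough illustration of that embed-then-softmax idea, here is a minimal sketch in plain Python. The bag-of-words `embed` function is only a stand-in for a real sentence embedding, and the helper names (`embed`, `train`, `predict`), the learning rate and the tiny training set (taken from the question's example rows) are all assumptions for illustration, not a recommended setup.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def embed(text, vocab):
    """Toy bag-of-words embedding (assumption: a real model would go here)."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def class_probs(x, weights):
    """Linear score per class, then softmax over the classes."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])

def train(samples, labels, classes, vocab, epochs=200, lr=0.1):
    """Fit one weight vector per class by gradient descent on cross-entropy."""
    weights = [[0.0] * len(vocab) for _ in classes]
    for _ in range(epochs):
        for text, label in zip(samples, labels):
            x = embed(text, vocab)
            probs = class_probs(x, weights)
            for k, row in enumerate(weights):
                # Gradient of cross-entropy w.r.t. the class score.
                err = probs[k] - (1.0 if classes[k] == label else 0.0)
                for j, xj in enumerate(x):
                    row[j] -= lr * err * xj
    return weights

def predict(text, weights, classes, vocab):
    """Return the class with the highest softmax probability."""
    probs = class_probs(embed(text, vocab), weights)
    return classes[probs.index(max(probs))]

# Hypothetical labelled data, reusing the rows from the question.
samples = [
    "I like this pen",
    "this is the worst light bulb ever",
    "these pants fit me just fine",
]
labels = ["Stationary", "Electrical", "Clothing"]
classes = sorted(set(labels))
vocab = sorted({w for s in samples for w in s.lower().split()})

weights = train(samples, labels, classes, vocab)
print(predict("I like this pen", weights, classes, vocab))
```

In practice this approach needs a reasonably large labelled dataset per category, which is what makes it fiddly when the categories are open-ended.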

I think you might be more interested in zero-shot text classification. Hugging Face has a pipeline (a way of using models for certain tasks) for this, which takes a candidate_labels parameter. So you should be able to use it with an appropriate model and specify the candidate labels, though the underlying model has to support this in some way; presumably some do. cross-encoder/nli-distilroberta-base appears to support it.
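A minimal sketch of how that could look with the `transformers` zero-shot pipeline and a pandas dataframe follows. The label lists and the helper names `build_classifier` and `categorize` are hypothetical, and the sketch assumes `transformers` (with a backend such as PyTorch) and `pandas` are installed and that the model can be downloaded. Running the broad and fine labels as two separate passes is just one possible design.

```python
def build_classifier(model_name="cross-encoder/nli-distilroberta-base"):
    """Create a zero-shot classification pipeline (downloads the model)."""
    # Imported lazily so the helper below can be used without the dependency.
    from transformers import pipeline
    return pipeline("zero-shot-classification", model=model_name)

def categorize(texts, candidate_labels, classifier):
    """Return the top-scoring candidate label for each text.

    The pipeline returns a dict whose "labels" list is sorted by
    descending score, so the first entry is the best match.
    """
    results = [classifier(t, candidate_labels=candidate_labels) for t in texts]
    return [r["labels"][0] for r in results]

if __name__ == "__main__":
    import pandas as pd

    df = pd.DataFrame({"Text": [
        "I like this pen",
        "this is the worst light bulb ever",
        "these pants fit me just fine",
    ]})

    # Hypothetical label sets; replace with your own allowed categories.
    broad_labels = ["Stationary", "Electrical", "Clothing"]
    fine_labels = ["Pen", "Light bulb", "Pants"]

    clf = build_classifier()
    texts = df["Text"].tolist()
    df["Broad_cat"] = categorize(texts, broad_labels, clf)
    df["Fine_cat"] = categorize(texts, fine_labels, clf)
    print(df)
```

Note that this still requires a list of candidate labels up front; the model only chooses among the labels it is given.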
