keyword extraction and Keyword based text classification

546 views Asked by At

Currently i am working on a project which requires keywords extraction or we can say keyword based text classification . The dataset contains 3 columns text, keywords and cc terms, I need to extract keywords from text and then classify the text based on those keywords, each row in dataset has their own keywords, i want to extract similar kind of keywords. I want to train the by providing text and keyword column so that the model is able to extract keywords for unknown text.please help

image contains dataset

1

There are 1 answers

2
Jindřich On BEST ANSWER

Keyword extraction is typically done using TF-IDF scores simply by setting a score threshold. When training a classifier, it does not make much sense to cut off the keywords at a certain threshold, knowing that something is not likely to be a keyword might also be a valuable piece of information for the classifier.

The simplest way to get the TF-IDF scores for particular words is using TfIdfVectorizer in scikit-learn that does all the laborious text preprocessing steps (tokenization, removing stop words).

You can probably achieve better results by fine-tuning BERT for your classification task (but of course at the expense of much higher computational costs).