Assign scores to sentences for their 'quality'


I have scraped a lot of pages from a specific domain, and I would like to identify which sentences in the text of these pages are most useful in terms of the information they carry. Is there an NLP technique for this? An example would be:

sent0 = "The cat is white"
sent1 = "Cat"
sent2 = "The reason why the cat is white is due to a certain type of pigmentation its fur contains"

Here the scores should decrease in the order: sent2, sent0, sent1.


There is 1 answer

harry

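One thing you can try is an information-retrieval-style score. You could use a traditional information retrieval method like TF-IDF (Term Frequency-Inverse Document Frequency) to score the sentences: each term gets a weight that grows with how often it appears in a sentence and shrinks with how many sentences contain it, and the weights of a sentence's terms are combined into one score. Sentences with higher scores are considered more important.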

However, it seems to me that TF-IDF doesn't handle semantics well; it only reflects how often certain terms appear, not what the sentence actually means.
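As a rough illustration of the TF-IDF idea, here is a minimal pure-Python sketch applied to the three sentences from the question. Several things in it are my assumptions rather than anything fixed by TF-IDF itself: each sentence is treated as its own "document", the sentence score is the plain sum of its term weights, the IDF uses one common smoothed variant, and the names `tokenize` and `tfidf_sentence_scores` are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_sentence_scores(sentences):
    """Score each sentence by the sum of its terms' TF-IDF weights.

    Assumption: every sentence is treated as one document, so the
    document frequency of a term is the number of sentences containing it.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)

    # Document frequency: in how many sentences does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this sentence
        # Smoothed IDF (one common variant): log((1 + n) / (1 + df)) + 1
        score = sum(
            count * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        )
        scores.append(score)
    return scores


sentences = [
    "The cat is white",
    "Cat",
    "The reason why the cat is white is due to a certain type of "
    "pigmentation its fur contains",
]

scores = tfidf_sentence_scores(sentences)

# Indices of the sentences, highest score first.
ranking = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [2, 0, 1], i.e. sent2, sent0, sent1
```

Because the score is a sum over terms, it tends to favor longer sentences; dividing by the number of tokens (or computing the IDF over the whole scraped corpus instead of just these sentences) may be a better fit, depending on what "useful" should mean for your pages.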