I am summarizing documents using the TextRank pipeline in spaCy (via pytextrank). I need to summarize both long and short documents. Can you suggest a good approach to choosing the right value of `limit_phrases`?

This is the approach I am currently using, but I am sure it can be improved:
```python
import spacy
import pytextrank

# spacy_model and text are defined elsewhere
nlp = spacy.load(spacy_model)
nlp.add_pipe("textrank", last=True)

# Process the input text
doc = nlp(text)
doc_sentences = len(list(doc.sents))
print(f"Number of document sentences = {doc_sentences}")

# Keep a fixed percentage of sentences, and twice as many phrases
limit_sentences = int(doc_sentences * percentage)
limit_phrases = limit_sentences * 2
top_sentences = doc._.textrank.summary(
    limit_phrases=limit_phrases,
    limit_sentences=limit_sentences,
    preserve_order=True,
)
```
The optimal values for `limit_phrases` will depend strongly on your content. Do you have any kind of benchmark against which you could run tests, essentially doing a grid search to find a reasonable setting for this parameter?

FWIW, I'm one of the authors of pytextrank, and this is a really good question. There's no analytic way of determining how to set this parameter, as far as our team knows.
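A grid search along those lines could be sketched as below. This assumes you have a small benchmark of `(document, reference_summary)` pairs; `score_summary` here is a toy stand-in for whatever quality metric you trust (e.g. ROUGE overlap against the reference), and the `en_core_web_sm` model name, the `percentages`, and the `phrase_factors` are all illustrative assumptions, not anything prescribed by pytextrank.

```python
import itertools


def score_summary(summary_sentences, reference):
    """Toy metric: fraction of reference words covered by the summary.
    Replace with a real metric (e.g. ROUGE) for serious tuning."""
    summary_words = {w for s in summary_sentences for w in s.lower().split()}
    reference_words = set(reference.lower().split())
    return len(summary_words & reference_words) / max(len(reference_words), 1)


def grid_search(benchmark, percentages=(0.1, 0.2, 0.3), phrase_factors=(1, 2, 4)):
    """Return the (percentage, phrase_factor) pair with the best average score
    over the benchmark, trying every combination in the grid."""
    import spacy      # deferred imports so score_summary can be used
    import pytextrank  # without spaCy installed

    nlp = spacy.load("en_core_web_sm")  # assumed model name
    nlp.add_pipe("textrank", last=True)

    best_params, best_score = None, float("-inf")
    for pct, factor in itertools.product(percentages, phrase_factors):
        total = 0.0
        for text, reference in benchmark:
            doc = nlp(text)
            limit_sentences = max(1, int(len(list(doc.sents)) * pct))
            summary = [str(s) for s in doc._.textrank.summary(
                limit_phrases=limit_sentences * factor,
                limit_sentences=limit_sentences,
                preserve_order=True,
            )]
            total += score_summary(summary, reference)
        avg = total / len(benchmark)
        if avg > best_score:
            best_params, best_score = (pct, factor), avg
    return best_params, best_score
```

Even a handful of benchmark documents is enough to see whether a phrase factor of 2 (as in your current code) beats the alternatives on your kind of content; since phrase extraction is the expensive step, you could also cache the processed `doc` objects if the grid grows.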