PyTextRank: avoid lowercasing tags in key phrase extraction

I want to avoid lowercasing tags in PyTextRank key phrase extraction. Any suggestions on how to achieve that?

Answer by Paco:

As of PyTextRank version 2.1.0 (released on 2021-01-31), when an application iterates through the ranked phrases, for example:

for phrase in doc._.phrases[:10]:
    print(phrase.text)

... the default text for each phrase is its most popular instance appearing in the document. That's what gets set in the text field of the Phrase data class.
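The "most popular instance" rule can be illustrated in plain Python. This is a sketch of the behavior described above, not PyTextRank's own implementation: the helper name most_popular_form is hypothetical, and it assumes ties go to the form that appears first in the document.

```python
from collections import Counter
from typing import List


def most_popular_form(surface_forms: List[str]) -> str:
    """Return the most frequent surface form of one phrase, case preserved.

    Hypothetical helper illustrating the "most popular instance" rule;
    it is not taken from PyTextRank's source. Raises ValueError if empty.
    """
    counts = Counter(surface_forms)  # Counter keeps first-seen insertion order
    # max() returns the first maximal key it meets, so ties go to the earliest form
    return max(counts, key=counts.__getitem__)


print(most_popular_form(["Apache Spark", "apache spark", "Apache Spark"]))  # Apache Spark
```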

However, check the chunks field, which lists every instance of the phrase that occurs in the document. Since these instances are extracted from the document's raw text, they are not forced to lowercase.
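A minimal sketch of reading the case-preserved instances from the chunks field, assuming each element of phrase.chunks exposes a .text attribute (as spaCy spans do). The helper original_case_forms and the FakeChunk/FakePhrase classes are hypothetical stand-ins, defined only so the example runs without spaCy; with a real pipeline you would pass the phrase objects from doc._.phrases instead.

```python
from dataclasses import dataclass, field
from typing import List


def original_case_forms(phrase) -> List[str]:
    """Distinct chunk texts of a ranked phrase, in document order, case preserved.

    Assumes phrase.chunks is a sequence of span-like objects with a .text attribute.
    """
    # dict.fromkeys drops duplicates while keeping first-occurrence order
    return list(dict.fromkeys(chunk.text for chunk in phrase.chunks))


@dataclass
class FakeChunk:  # hypothetical stand-in for a spaCy Span
    text: str


@dataclass
class FakePhrase:  # hypothetical stand-in for PyTextRank's Phrase data class
    text: str
    chunks: List[FakeChunk] = field(default_factory=list)


phrase = FakePhrase(
    text="Apache Spark",
    chunks=[FakeChunk("Apache Spark"), FakeChunk("apache spark"), FakeChunk("Apache Spark")],
)
print(original_case_forms(phrase))  # ['Apache Spark', 'apache spark']

# With a real spaCy pipeline that includes PyTextRank (assumed installed):
# for p in doc._.phrases[:10]:
#     print(p.text, original_case_forms(p))
```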

On the other hand, when the algorithm constructs its internal lemma graph data structure, the lemmatized tokens are forced to lowercase. However, you don't need to use the lemma graph for the end results. Perhaps that is the source of the confusion?