Module 'pytextrank' has no attribute 'parse_doc'


I am working on an NLP task and have written the following code. Running it raises the error shown below. Any suggestions for resolving it would be helpful. I am using a Python 3 environment in Google Colab.

# Pytextrank
import pytextrank
import json

# Sample text
sample_text = 'I Like Flipkart. He likes Amazone. she likes Snapdeal. Flipkart and amazone is on top of google search.'

# Create dictionary to feed into json file

file_dic = {"id" : 0,"text" : sample_text}
file_dic = json.dumps(file_dic)
loaded_file_dic = json.loads(file_dic)

# Create test.json and feed file_dic into it.
with open('test.json', 'w') as outfile:
    json.dump(loaded_file_dic, outfile)

path_stage0 = "test.json"
path_stage1 = "o1.json"

# Extract keyword using pytextrank
with open(path_stage1, 'w') as f:
    for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
        f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
        print(pytextrank.pretty_print(graf._asdict()))

I am getting the following error:

  AttributeError                            Traceback (most recent call last)
  <ipython-input-33-286ce104df34> in <module>()
       20 # Extract keyword using pytextrank
       21 with open(path_stage1, 'w') as f:
  ---> 22   for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
       23     f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
       24     print(pytextrank.pretty_print(graf._asdict()))

  AttributeError: module 'pytextrank' has no attribute 'parse_doc'

There are 3 answers

1
Kum_R On

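PyTextRank is an implementation of TextRank in Python for use in spaCy pipelines, so keyword extraction runs as a spaCy pipeline component rather than through parse_doc. With the PyTextRank 2.x calling style:

import spacy
import pytextrank

nlp = spacy.load('en_core_web_sm')

# PyTextRank 2.x style: build the component, then add it to the spaCy pipeline
tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name='textrank', last=True)

# Sample text
sample_text = 'I Like Flipkart. He likes Amazone. she likes Snapdeal. Flipkart and amazone is on top of google search.'

# Run the pipeline on the text, then print the ranked phrases
doc = nlp(sample_text)
for p in doc._.phrases:
    print(p.text)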
2
Paco On

There's a newer release of PyTextRank which simplifies the calling code, and makes these steps unnecessary: https://spacy.io/universe/project/spacy-pytextrank

0
patme On

AttributeError: module 'pytextrank' has no attribute 'TextRank'

reproduce err:

run:

def summarize_text_returns_expected_summary(nlp, text):
    doc = process_text(nlp, text)
    if 'textrank' not in nlp.pipe_names:
        tr = pytextrank.TextRank()
        nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)
    doc = nlp(text)
    return [str(sent) for sent in doc._.textrank.summary(limit_phrases=15, limit_sentences=5)]

error:

AttributeError: module 'pytextrank' has no attribute 'TextRank'

fix:

step_1

check pytextrank installation

pip list | grep pytextrank
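
in a notebook such as colab, the same check can be sketched in python instead of the shell. this is only an illustration: it assumes python 3.8+ (for importlib.metadata), installed_version is a hypothetical helper name rather than anything pytextrank provides, and the package names are assumed to match the names used with pip.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distributions are installed under the names "pytextrank" and "spacy".
for name in ("pytextrank", "spacy"):
    print(name, installed_version(name) or "not installed")
```

if the version printed for pytextrank starts with 3, the string-based nlp.add_pipe("textrank") call in step_2 is the one to use.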

step_2

replace:

tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)

with:

nlp.add_pipe("textrank")

updated code:

import pytextrank  # importing registers the "textrank" factory with spacy

def summarize_text_returns_expected_summary(nlp, text):
    # add the component only once per nlp object
    if 'textrank' not in nlp.pipe_names:
        nlp.add_pipe("textrank")
    doc = nlp(text)
    return [str(sent) for sent in doc._.textrank.summary(limit_phrases=15, limit_sentences=5)]

keep the if statement: when the function is called more than once with the same nlp object, a second nlp.add_pipe("textrank") raises a ValueError, because a component with that name is already in the pipeline.

why?

spacy pipeline: sequence of processing steps (tokenization, POS tagging, NER).

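the incorrect code instantiates pytextrank.TextRank() by hand, then passes tr.PipelineComponent to add_pipe. that is the calling style of older pytextrank releases; in the release that raises this error, the module has no top-level TextRank attribute.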

tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)

correct code:

nlp.add_pipe("textrank")

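importing pytextrank registers a component factory under the name "textrank"; nlp.add_pipe("textrank") looks that factory up by name, builds the component, and appends it to the pipeline.

once the component is in the pipeline, its results are exposed as custom extension attributes on documents, accessed via ._ (e.g., doc._.phrases, doc._.textrank.summary()).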

notes on module 'pytextrank' has no attribute 'parse_doc'

pytextrank ranks phrases using the output of a spacy parser (tokens, part-of-speech tags, noun chunks), so the pipeline needs a parser in front of it.

the parser and pytextrank sit in the same pipeline.

since:

the error msg says that the pytextrank module has no parse_doc function. the most likely cause is a change in the library: the installed release no longer provides the file-based functions the question relies on (parse_doc, json_iter, pretty_print).
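
one way to confirm this is to ask the installed module which of these names it actually exposes. a minimal sketch using only the standard library; exposed_names is a hypothetical helper, and the names probed are simply the ones that appear in this thread.

```python
import importlib
import importlib.util


def exposed_names(module_name, candidates):
    """Map each candidate attribute name to True/False for the given module.

    Returns None when the module cannot be found at all.
    """
    if importlib.util.find_spec(module_name) is None:
        return None
    module = importlib.import_module(module_name)
    return {name: hasattr(module, name) for name in candidates}


# Names taken from the code posted in this thread.
print(exposed_names("pytextrank", ["parse_doc", "json_iter", "pretty_print", "TextRank"]))
```

a False next to parse_doc means the code in the question cannot run against that installation, whatever the cause.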

do instead:

load a spacy model that includes a parser, and add pytextrank to its pipeline.

e.g. the spacy small english model en_core_web_sm tokenizes, tags, and parses the text before textrank runs.

example:

import spacy
import pytextrank  # importing registers the "textrank" factory with spacy

def get_top_ranked_phrases(text):
    # load the small english model, which includes a parser
    nlp = spacy.load("en_core_web_sm")

    # add pytextrank at the end of the pipeline and process the text
    nlp.add_pipe("textrank")
    doc = nlp(text)

    top_phrases = []

    # doc._.phrases holds the ranked phrases
    for phrase in doc._.phrases:
        top_phrases.append({
            "text": phrase.text,
            "rank": phrase.rank,
            "count": phrase.count,
            "chunks": phrase.chunks
        })

    return top_phrases

sample_text = 'I Like Flipkart. He likes Amazone. she likes Snapdeal. Flipkart and amazone is on top of google search.'

top_phrases = get_top_ranked_phrases(sample_text)

for phrase in top_phrases:
    print(phrase["text"], phrase["rank"], phrase["count"], phrase["chunks"])

output:

[image: output_of_sample.py]

code notes:

✔︎ load spacy small english model

✔︎ add pytextrank to pipeline

✔︎ store the top-ranked phrases

✔︎ examine the top-ranked phrases in the document

✔︎ print the top-ranked phrases

references:

-DerwenAI

-(https://spacy.io/universe/project/spacy-pytextrank)

-textrank: bringing order into text

-keywords and sentence extraction with textrank (pytextrank)

-module 'pytextrank' has no attribute 'parse_doc'

-scattertext/issues/92

-AttributeError: module 'pytextrank' has no attribute 'TextRank' #2