ValueError in spaCy when using pytextrank (Python implementation of TextRank)

I used pytextrank to extract keywords. I installed both pytextrank and spaCy using the commands below.

pip install pytextrank
pip install -U spacy
python -m spacy download en

Here is my code:

import pytextrank
import sys

path_stage0 = jsonPath
path_stage1 = "data/json/temp/o1.json"

with open(path_stage1, 'w') as f:
    for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
        f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
        # to view output in this notebook
        print(pytextrank.pretty_print(graf))

I get the error below when I try to execute it:

ValueError                                Traceback (most recent call last)
<ipython-input-12-07819fc6acea> in <module>()
  6 
  7 with open(path_stage1, 'w') as f:
----> 8     for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
  9         f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
 10         # to view output in this notebook

/home/sameera/anaconda2/lib/python2.7/site-packages/pytextrank/pytextrank.pyc in parse_doc(json_iter)
259                 print("graf_text:", graf_text)
260 
--> 261             grafs, new_base_idx = parse_graf(meta["id"], graf_text, base_idx)
262             base_idx = new_base_idx
263 

/home/sameera/anaconda2/lib/python2.7/site-packages/pytextrank/pytextrank.pyc in parse_graf(doc_id, graf_text, base_idx, spacy_nlp)
193     doc = spacy_nlp(graf_text, parse=True)
194 
--> 195     for span in doc.sents:
196         graf = []
197         digest = hashlib.sha1()

/home/sameera/anaconda2/lib/python2.7/site-packages/spacy/tokens/doc.pyx in __get__ (spacy/tokens/doc.cpp:9664)()
432 
433             if not self.is_parsed:
--> 434                 raise ValueError(
435                     "sentence boundary detection requires the dependency parse, which "
436                     "requires data to be installed. If you haven't done so, run: "

ValueError: sentence boundary detection requires the dependency parse, which 
requires data to be installed. If you haven't done so, run: 
python -m spacy download en
to install the data

I am using Python 2.7, Anaconda 4.3, Jupyter Notebook, and Ubuntu 14.04.
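
One way to narrow this down is to check, outside pytextrank, whether the loaded spaCy pipeline actually produces a dependency parse. The snippet below is only a rough diagnostic sketch, not part of pytextrank: check_parser is a hypothetical helper name, and it assumes spacy.load("en") plus the doc.is_parsed flag that shows up in the traceback above. spaCy releases that no longer expose that flag would need a different check.

```python
def check_parser(model_name="en"):
    """Return a short status string saying whether sentence splitting can work.

    Hypothetical helper for illustration; "en" is the model name used in the
    install commands above.
    """
    try:
        import spacy
    except ImportError:
        return "spacy-not-installed"

    try:
        nlp = spacy.load(model_name)
    except Exception as exc:
        # Typically means the model data is missing or not linked.
        return "model-not-loadable: %s" % exc

    doc = nlp(u"This is one sentence. This is another.")

    # is_parsed is the flag the traceback tests before iterating doc.sents.
    # If the installed release does not have the attribute, this sketch
    # reports "parser-missing", which may be a false alarm.
    if getattr(doc, "is_parsed", False):
        return "parser-ok"
    return "parser-missing"


print(check_parser())
```

If this reports anything other than parser-ok in the same environment as the notebook, the problem is most likely with the spaCy model installation rather than with the pytextrank call.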

There are 2 answers

dukeluke

This may just be an error in how you copied your code to StackOverflow, but if not:

Be sure to indent everything underneath the "with" statement, including the for loop.

Basically:

with open(path_stage1, 'w') as f:
    for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
        f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
        print(pytextrank.pretty_print(graf))

Paco

It might be better to install from the requirements.txt in the pytextrank package instead of running pip install -U spacy. spaCy is evolving rapidly, -U installs the latest version, and those updates haven't always been backwards compatible.
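
To see whether the environment has drifted from the pinned versions, something like the following could help. It is a minimal sketch under a few assumptions: the helper names (parse_pins, installed_versions, find_mismatches) are made up for illustration, only lines of the form name==version are handled, importlib.metadata requires Python 3.8 or later, and the sample pin is a placeholder rather than the real contents of pytextrank's requirements.txt.

```python
import re
from importlib import metadata


def parse_pins(requirements_text):
    """Map package name -> pinned version for lines like 'name==version'."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        match = re.match(r"^([A-Za-z0-9_.\-]+)\s*==\s*([^\s;]+)", line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def installed_versions(names):
    """Look up the installed version of each package, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(pins, installed):
    """Return {name: (pinned, installed)} where the two versions differ."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


# Placeholder requirements text; the version number is NOT the real pin.
sample = "spacy==0.0.0  # placeholder\nnetworkx>=1.11\n"
pins = parse_pins(sample)
print(find_mismatches(pins, installed_versions(pins)))
```

In practice the text would come from reading the actual requirements.txt; any package listed in the result is either missing or installed at a different version than the pin.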

Also, feel free to post issues on the GitHub repo for pytextrank: https://github.com/ceteri/pytextrank/issues

Glad to hear about usage :)