ValueError in spaCy when using pytextrank (Python implementation of TextRank)

Question

Asked by Pathmila Kariyawasam on 18 July 2017 at 23:21

I am using pytextrank to extract keywords. I installed both pytextrank and spaCy with the commands below:

pip install pytextrank
pip install -U spacy
python -m spacy download en

Here is my code

import pytextrank
import sys

path_stage0 = jsonPath
path_stage1 = "data/json/temp/o1.json"

with open(path_stage1, 'w') as f:
    for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
        f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
        # to view output in this notebook
        print(pytextrank.pretty_print(graf))

I get the following error when I try to execute it:

ValueError                                Traceback (most recent call last)
<ipython-input-12-07819fc6acea> in <module>()
      6 
      7 with open(path_stage1, 'w') as f:
----> 8     for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
      9         f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
     10         # to view output in this notebook

/home/sameera/anaconda2/lib/python2.7/site-packages/pytextrank/pytextrank.pyc in parse_doc(json_iter)
    259                 print("graf_text:", graf_text)
    260 
--> 261             grafs, new_base_idx = parse_graf(meta["id"], graf_text, base_idx)
    262             base_idx = new_base_idx
    263 

/home/sameera/anaconda2/lib/python2.7/site-packages/pytextrank/pytextrank.pyc in parse_graf(doc_id, graf_text, base_idx, spacy_nlp)
    193     doc = spacy_nlp(graf_text, parse=True)
    194 
--> 195     for span in doc.sents:
    196         graf = []
    197         digest = hashlib.sha1()

/home/sameera/anaconda2/lib/python2.7/site-packages/spacy/tokens/doc.pyx in __get__ (spacy/tokens/doc.cpp:9664)()
    432 
    433             if not self.is_parsed:
--> 434                 raise ValueError(
    435                     "sentence boundary detection requires the dependency parse, which "
    436                     "requires data to be installed. If you haven't done so, run: "

ValueError: sentence boundary detection requires the dependency parse, which requires data to be installed. If you haven't done so, run:
python -m spacy download en
to install the data

I am using Python 2.7, Anaconda 4.3, Jupyter Notebook and Ubuntu 14.04.
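The error message names the root cause: in the spaCy version used here, `doc.sents` derives sentence boundaries from the dependency parse, so it fails when the parser's data is missing. The sketch below is conceptual only, not spaCy's code: given each token's head index (a sentence root is its own head, the same convention spaCy uses for `token.head`), tokens that share a root form one sentence. The function names `find_root` and `sentences_from_heads` and the hand-written parse are hypothetical.

```python
def find_root(i, heads):
    """Follow head pointers from token i up to its sentence root."""
    seen = set()
    while heads[i] != i:
        if i in seen:
            raise ValueError("cycle in head indices at token %d" % i)
        seen.add(i)
        i = heads[i]
    return i


def sentences_from_heads(words, heads):
    """Group tokens into sentences using only a dependency parse.

    heads[i] is the index of token i's syntactic head; a root is its own head.
    Assumes each sentence is a contiguous run of tokens sharing one root.
    """
    if heads is None:
        # No parse available: nothing to derive boundaries from,
        # analogous to the ValueError in the question.
        raise ValueError("sentence boundary detection requires the dependency parse")
    sentences = []
    current_root = None
    for i, word in enumerate(words):
        root = find_root(i, heads)
        if root != current_root:
            # A new root means a new dependency tree, i.e. a new sentence
            sentences.append([])
            current_root = root
        sentences[-1].append(word)
    return sentences


words = ["Run", "it", ".", "It", "works", "."]
heads = [0, 0, 0, 4, 4, 4]  # hypothetical parse: "Run" and "works" are roots
print(sentences_from_heads(words, heads))
# [['Run', 'it', '.'], ['It', 'works', '.']]
```

Without the `heads` array there is simply no information to split on, which is why installing the model data (the `python -m spacy download en` step) matters.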


There are 2 answers

**dukeluke** · Answer 1 · 2017-07-18T23:42:08+00:00

This may just be an error in how you copied your code to StackOverflow, but if not:

Be sure to indent everything underneath the "with" statement, including the for loop.

Basically:

with open(path_stage1, 'w') as f:
    for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
        f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
        print(pytextrank.pretty_print(graf))
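To illustrate the indentation point: everything that uses `f` must sit inside the `with` block, because the file is closed as soon as the block ends. A minimal stand-alone sketch, where `Graf` is a hypothetical namedtuple standing in for the records `parse_doc()` yields and `json.dumps(..., sort_keys=True)` stands in for `pytextrank.pretty_print` (its exact output format is not assumed here):

```python
import json
from collections import namedtuple

# Hypothetical stand-in for the paragraph records that parse_doc() yields
Graf = namedtuple("Graf", ["graf_id", "text"])


def write_grafs(path, grafs):
    """Write one JSON object per line, following the same loop pattern."""
    with open(path, "w") as f:
        # The loop and the writes are indented under `with`,
        # so they run while the file is still open.
        for graf in grafs:
            line = json.dumps(graf._asdict(), sort_keys=True)
            f.write("%s\n" % line)
            print(line)
    # After the block exits, the file has been closed automatically.
    return f.closed
```

Calling `write_grafs("o1.json", [Graf(1, "first paragraph")])` writes one JSON line and returns `True`. Writing to `f` after the block instead would raise `ValueError: I/O operation on closed file.`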

**Paco** · Answer 2 · 2017-09-26T16:26:52+00:00

It might be better to install from the requirements.txt in the pytextrank package instead of running pip install -U spacy, since spaCy is evolving rapidly and -U installs the latest version. Those updates haven't always been backwards compatible.
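Installing the pinned dependencies is done with `pip install -r requirements.txt` from a checkout of the pytextrank repository, and `pip show spacy` then reports the installed version. As a hedged sketch of the underlying check, the helper below compares an installed version string against an exact `name==version` pin. The pin `spacy==1.8.2` is made up for illustration (check pytextrank's own requirements.txt for the real constraint), and the parser handles only `==` pins with purely numeric dotted versions of the same length.

```python
def parse_pin(requirement):
    """Split an exact pin such as 'spacy==1.8.2' into (name, version)."""
    name, sep, version = requirement.strip().partition("==")
    if not sep:
        raise ValueError("expected an exact 'name==version' pin: %r" % requirement)
    return name.strip().lower(), version.strip()


def version_tuple(version):
    """Turn '1.8.2' into (1, 8, 2); assumes purely numeric components."""
    return tuple(int(part) for part in version.split("."))


def check_pin(requirement, installed_version):
    """Return 'ok', 'newer' or 'older' for the installed version vs. the pin."""
    _, pinned = parse_pin(requirement)
    installed, wanted = version_tuple(installed_version), version_tuple(pinned)
    if installed == wanted:
        return "ok"
    return "newer" if installed > wanted else "older"


# Hypothetical pin; an upgrade via `pip install -U spacy` would show up as "newer"
print(check_pin("spacy==1.8.2", "1.9.0"))  # newer
```

On Python 3.8+, `importlib.metadata.version("spacy")` returns the installed version string to pass as `installed_version`.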

Also, feel free to post issues on the GitHub repo for pytextrank: https://github.com/ceteri/pytextrank/issues

Glad to hear that pytextrank is being used :)

TechQA.
