read pyTextRank file


I have a piece of text that I wish to present as a graph using pytextrank. The code (copied from source) is

    import spacy
    nlp = spacy.load("en_core_web_sm")
    import pytextrank
    import graphviz
    tr = pytextrank.TextRank()
    nlp.add_pipe(tr.PipelineComponent, name='textrank', last=True)
    
    line = "the ballistic nuclear threat can be thwarted by building a nuclear shield"
    doc = nlp(line)
    tr.write_dot(path="graph.dot")

This writes something to the file "graph.dot". The file looks like JSON, with "digraph {}" as its first field. At this point I'm lost. How do I create a nice graph of the text (or any graph at all, for that matter)?

thanks,

Andreas

Using Ubuntu 20.04.1 LTS, Python 3.8, pytextrank 2.0.3

There is 1 answer

Answer by Paco

The online documentation for PyTextRank has been updated; in particular, see the "Getting Started" page at https://derwen.ai/docs/ptr/start/ for example code. Similar code is also shown in the sample.py script in the GitHub repo.

BTW, the most recent release is 3.0.1, which is tracking the new spaCy 3.x updates.

Here's a simple usage example:

    import spacy
    import pytextrank

    # example text
    text = "the ballistic nuclear threat can be thwarted by building a nuclear shield"

    # load a spaCy model, depending on language, scale, etc.
    nlp = spacy.load("en_core_web_sm")

    # add PyTextRank to the spaCy pipeline
    nlp.add_pipe("textrank", last=True)
    doc = nlp(text)

    # examine the top-ranked phrases in the document
    for p in doc._.phrases:
        print("{:.4f} {:5d}  {}".format(p.rank, p.count, p.text))
        print(p.chunks)

The output would be:

    0.1712     1  a nuclear shield
    [a nuclear shield]
    0.1652     1  the ballistic nuclear threat
    [the ballistic nuclear threat]

If you want to visualize the lemma graph in Graphviz or other libraries which read the DOT file format, you can add:

    tr = doc._.textrank
    tr.write_dot(path="graph.dot")

That will write output to a "graph.dot" file. See the Graphviz docs for examples of how to read and render it.
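
As a minimal sketch of the rendering step, assuming the Graphviz command-line tools are installed (on Ubuntu, the "graphviz" package) and "dot" is on the PATH, the DOT file can be converted to an image from Python. The helper name render_dot is made up for this example; it is not part of PyTextRank or Graphviz.

```python
import shutil
import subprocess
from pathlib import Path


def render_dot(dot_path, fmt="png"):
    """Render a DOT file with the Graphviz `dot` program.

    Hypothetical helper for illustration. Returns the path of the
    rendered file, or None if `dot` is not found on the PATH.
    """
    dot_exe = shutil.which("dot")
    if dot_exe is None:
        return None  # Graphviz command-line tools are not installed

    src = Path(dot_path)
    out = src.with_suffix("." + fmt)  # e.g. graph.dot -> graph.png
    # Same as running on the shell: dot -Tpng graph.dot -o graph.png
    subprocess.run([dot_exe, f"-T{fmt}", str(src), "-o", str(out)], check=True)
    return out


# Render the file written by tr.write_dot(path="graph.dot"), if it exists.
if Path("graph.dot").exists():
    result = render_dot("graph.dot")
    print(result if result else "Graphviz `dot` was not found on the PATH")
```

Passing fmt="svg" instead gives a scalable image that opens in any browser.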

FWIW, we are currently working on integration of the kglab library, which will open up a much broader range of graph manipulation and visualization capabilities, since it integrates with many other libraries and file formats.

Also, if you have any suggestions or requests in terms of how you'd like to visualize results from PyTextRank, it's really helpful to create an issue at https://github.com/DerwenAI/pytextrank/issues and our developer community can help more there.

My apologies if I'm misinterpreting "present the text as a graph", since another way to think about that would be to use the displaCy dependency visualizer, which shows a grammatical dependency graph of the tokens in a sentence. There's an example given in the spaCy tutorial.