Stanford Stanza -- Dependency Parsing Module -- Output for document with more than one sentence


I have a query about formatting the output of the dependency parsing module when the document to be parsed contains more than one sentence.

One of the examples of using the dependency parsing module in the Stanza manual (https://stanfordnlp.github.io/stanza/depparse.html) is as follows:

import stanza
nlp = stanza.Pipeline(lang='fr', processors='tokenize,mwt,pos,lemma,depparse')
doc = nlp('Nous avons atteint la fin du sentier.')
print(*[f'id: {word.id}\tword: {word.text}\thead id: {word.head}\thead: {sent.words[word.head-1].text if word.head > 0 else "root"}\tdeprel: {word.deprel}' for sent in doc.sentences for word in sent.words], sep='\n')

This example contains only one sentence. I would like to adapt the code for a document that has more than one sentence, so that every output line also shows the number of the sentence the word belongs to.

The following is what I myself have come up with:

import stanza
nlp = stanza.Pipeline(lang='en', processors='tokenize,mwt,pos,lemma,depparse')
doc = nlp("Chris Manning teaches at Stanford University. He lives in the Bay Area.")
# i is 0-based, so i+1 gives a 1-based sentence number on every line
for i, sentence in enumerate(doc.sentences):
    print(*[f'sentence: {i+1}\tid: {word.id}\tword: {word.text}\thead id: {word.head}\thead: {sentence.words[word.head-1].text if word.head > 0 else "root"}\tdeprel: {word.deprel}' for word in sentence.words], sep='\n')

This seems to work properly. However, given that I have little experience with writing code, I would very much appreciate knowing if what I have come up with is okay or whether you would recommend doing something different.
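
For comparison, here is one possible restructuring of the same logic. It is only a sketch, not the recommended Stanza way of doing this. The helper names dependency_rows and print_dependencies are made up for illustration and are not part of Stanza. The sketch assumes the same attributes used above (sentence.words, word.id, word.text, word.head, word.deprel), and moves the formatting into a small generator so it can be checked without loading a model:

```python
def dependency_rows(sentences):
    """Yield one tab-separated line per word, prefixed with a 1-based sentence number.

    `sentences` is assumed to look like Stanza's `doc.sentences`: each item has a
    `.words` list whose elements carry `.id`, `.text`, `.head`, and `.deprel`.
    """
    for sent_no, sentence in enumerate(sentences, start=1):
        for word in sentence.words:
            # head == 0 marks the root; otherwise head is the 1-based id of
            # the governing word within the same sentence.
            head_text = sentence.words[word.head - 1].text if word.head > 0 else "root"
            yield (
                f"sentence: {sent_no}\tid: {word.id}\tword: {word.text}\t"
                f"head id: {word.head}\thead: {head_text}\tdeprel: {word.deprel}"
            )


def print_dependencies(text, lang="en"):
    """Parse `text` with Stanza and print the rows (needs stanza and its models)."""
    import stanza  # imported here so dependency_rows works without Stanza installed

    nlp = stanza.Pipeline(lang=lang, processors="tokenize,mwt,pos,lemma,depparse")
    doc = nlp(text)
    print(*dependency_rows(doc.sentences), sep="\n")
```

Calling print_dependencies("Chris Manning teaches at Stanford University. He lives in the Bay Area.") should print the same lines as the loop above. The only real differences are that enumerate(..., start=1) replaces the i+1 arithmetic and that the long f-string is split up for readability.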

Thank you in advance for your time and assistance.
