How to get Dependency Tree in JSON format in SyntaxNet?

Question

566 views Asked by Fr_nkenstien At 21 September 2017 at 18:56

I am trying to get a dependency tree in JSON format from SyntaxNet, but all I get from the examples is a Sentence object, which provides no accessors for the parsed data and no obvious way to iterate through the items listed.

When I run the examples from the Docker file provided by TensorFlow/SyntaxNet, I get the following:

text: "Alex saw Bob"
token {
  word: "Alex"
  start: 0
  end: 3
  head: 1
  tag: "attribute { name: \"Number\" value: \"Sing\" } attribute { name: \"fPOS\" value: \"PROPN++NNP\" } "
  category: ""
  label: "nsubj"
  break_level: NO_BREAK
}
token {
  word: "saw"
  start: 5
  end: 7
  tag: "attribute { name: \"Mood\" value: \"Ind\" } attribute { name: \"Tense\" value: \"Past\" } attribute { name: \"VerbForm\" value: \"Fin\" } attribute { name: \"fPOS\" value: \"VERB++VBD\" } "
  category: ""
  label: "root"
  break_level: SPACE_BREAK
}
token {
  word: "Bob"
  start: 9
  end: 11
  head: 1
  tag: "attribute { name: \"Number\" value: \"Sing\" } attribute { name: \"fPOS\" value: \"PROPN++NNP\" } "
  category: ""
  label: "parataxis"
  break_level: SPACE_BREAK
}

The type of this object is class 'syntaxnet.sentence_pb2.Sentence', which itself does not have any documentation.

I need to be able to access the above output programmatically.

As seen in this question, it prints a table in string format and does not give me a programmatic response.

How can I get the response as data rather than as printed output? Or should I write a parser for this output?

There is 1 answer

**Ido.Schwartzman** · Accepted Answer · 2018-08-09T10:01:09+00:00

TL;DR Code at the end...

The Sentence object is an instance of the sentence_pb2.Sentence class, which is generated from protobuf definition files, specifically sentence.proto. If you look at sentence.proto, you will see the fields that are defined for that class and their types: "tag" is a string, "label" is a string, "head" is an integer, and so on. In theory, converting to JSON with Python's built-in functions should work, but since protobuf classes are generated at runtime through metaclasses, that may produce some undesired results.
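To make the field access concrete, here is a minimal sketch of reading those fields in a loop. It uses a stand-in object built with `types.SimpleNamespace` rather than a real Sentence, so it runs without SyntaxNet installed; the field names are copied from the printed output in the question, and a real Sentence is assumed to expose the same attributes. The helper name `describe` and the `-1` head for the root are assumptions for illustration.

```python
from types import SimpleNamespace

# Stand-in for a syntaxnet.sentence_pb2.Sentence (assumption: a real
# Sentence exposes the same attribute names as the printed output).
# The root token prints no "head" field; -1 is an assumed placeholder.
sentence = SimpleNamespace(
    text="Alex saw Bob",
    token=[
        SimpleNamespace(word="Alex", start=0, end=3, head=1, label="nsubj"),
        SimpleNamespace(word="saw", start=5, end=7, head=-1, label="root"),
        SimpleNamespace(word="Bob", start=9, end=11, head=1, label="parataxis"),
    ],
)


def describe(sentence):
    """Collect (index, word, head, label) for every token in the sentence."""
    rows = []
    for index, token in enumerate(sentence.token):
        rows.append((index, token.word, token.head, token.label))
    return rows


for row in describe(sentence):
    print(row)
```

If the `protobuf` package is importable in your environment, `google.protobuf.json_format.MessageToDict` may be another route worth trying on the real object; I have not checked what it produces for this particular message.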

So what I did was first build a dict with all the info I wanted, then convert that to JSON:

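import logging
import re

# Assumed pattern (not shown in the original answer): matches the
# 'attribute { name: "..." value: "..." }' entries found in token.tag.
attribute_expression = re.compile(r'attribute \{ name: "([^"]*)" value: "([^"]*)" \}')

def parse_attributes(attributes):
    # Turn the tag string into a {name: value} dict.
    matches = attribute_expression.findall(attributes)
    return {k: v for k, v in matches}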

def token_to_dict(token):
    def extract_pos(fpos):
        i = fpos.find("++")
        if i == -1:
            return fpos, "<error>"
        else:
            return fpos[:i], fpos[i + 2:]

    attributes = parse_attributes(token.tag)
    if "fPOS" not in attributes:
        # logging.warn is a deprecated alias; logging.warning is the supported name.
        logging.warning("token {} has no fPOS attribute".format(token.word))
        logging.warning("attributes are: {}".format(attributes))
        fpos = ""
    else:
        fpos = attributes["fPOS"]

    upos, xpos = extract_pos(fpos)
    return {
        'word': token.word,
        'start': token.start,
        'end': token.end,
        'head': token.head,
        'features': attributes,
        'tag': token.tag,
        'deprel': token.label,
        'upos': upos,
        'xpos': xpos
    }

def sentence_to_dict(anno):
    return {
        'text': anno.text,
        'tokens': [token_to_dict(token) for token in anno.token]
    }

If you call sentence_to_dict on the Sentence object, you'll get a plain dict, which can then be serialized as JSON (for example with json.dumps).
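Since the question asks for a tree rather than a flat token list, here is a rough sketch of nesting the tokens by their head index and serializing the result. It takes a dict shaped like the output of sentence_to_dict as input, so it runs on its own. The function name `build_tree`, the `children` key, and the treatment of a head of `-1` (or any out-of-range value) as "this is a root" are my assumptions, not something taken from SyntaxNet.

```python
import json


def build_tree(sentence_dict):
    """Nest tokens under their heads and return the list of root nodes.

    Assumes each token dict has 'word', 'head' and 'deprel' keys, and that
    'head' is the index of the parent token within the same sentence.
    """
    tokens = sentence_dict["tokens"]
    nodes = [
        {"word": t["word"], "deprel": t["deprel"], "children": []}
        for t in tokens
    ]
    roots = []
    for index, token in enumerate(tokens):
        head = token["head"]
        if 0 <= head < len(tokens) and head != index:
            # Valid parent index: attach this node to its head.
            nodes[head]["children"].append(nodes[index])
        else:
            # No usable head (assumed -1 for the root): treat as a root.
            roots.append(nodes[index])
    return roots


# Hand-written input mirroring the sentence from the question.
example = {
    "text": "Alex saw Bob",
    "tokens": [
        {"word": "Alex", "head": 1, "deprel": "nsubj"},
        {"word": "saw", "head": -1, "deprel": "root"},
        {"word": "Bob", "head": 1, "deprel": "parataxis"},
    ],
}

tree = build_tree(example)
print(json.dumps(tree, indent=2))
```

Returning a list of roots rather than a single node keeps the sketch from failing on input where more than one token lacks a valid head.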
