SyntaxNet / Parsey McParseface Python API


I've installed SyntaxNet and am able to run the parser with the provided demo script. Ideally, I would like to run it directly from Python. The only code I found was this:

import subprocess
import os

os.chdir("../models/syntaxnet")
subprocess.call(
    "echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh",
    shell=True,
)

which is a complete disaster: inefficient and over-complex (Python calling Python should stay within Python, not detour through a shell).

How can I call the Python APIs directly, without going through shell scripts, standard I/O, etc.?

EDIT - Why isn't this as easy as opening syntaxnet/demo.sh and reading it?

This shell script calls two Python programs (parser_eval and conll2tree) that are written as standalone scripts and can't be imported into a Python module without raising multiple errors. A closer look reveals additional script-like layers and native code. These upper layers would need to be refactored to run the whole thing in a Python context. Hasn't anyone forked SyntaxNet with such a modification, or planned to?


There are 4 answers

David Batista

The best way to integrate SyntaxNet with your own code is to have it as a web service. I did that to parse Portuguese text.

I started from an existing Docker container with SyntaxNet and TensorFlow Serving and adapted it to load only the Portuguese model, to keep memory usage low. It runs fast and is easy to integrate with your code.

I did a blog post about it, and you can easily adapt it to any other language:

http://davidsbatista.net/blog/2017/07/22/SyntaxNet-API-Portuguese/
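Once such a service is running, calling it from Python is a few lines. A minimal client sketch, assuming a JSON interface like the one the REST containers elsewhere in this thread expose; the endpoint URL and payload shape below are assumptions, not a documented API (the blog post describes the actual interface):

```python
import json
from urllib import request

# Assumed endpoint; adjust host, port, and language to your deployment.
SERVICE_URL = "http://localhost:9000/api/v1/query/Portuguese"

def build_parse_request(sentences, url=SERVICE_URL):
    """Build a POST request for a batch of raw sentences (assumed payload shape)."""
    body = json.dumps({"strings": [sentences]}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
    )

def parse(sentences, url=SERVICE_URL):
    """Send the batch to the running service and decode the JSON reply."""
    with request.urlopen(build_parse_request(sentences, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The parsing itself happens in the container, so the Python side stays a thin, dependency-free client.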

Steven Du

There is a REST API here for both SyntaxNet and DRAGNN.

I have run them successfully on my cloud server. Some points worth sharing:

  1. build docker

    sudo docker build - < ./Dockerfile

    Some errors may occur when building SyntaxNet; if so, just follow the steps in ./Dockerfile and build the image manually. It's easy to follow.

  2. download pre-trained model

    The model for SyntaxNet is here; e.g. the Chinese model: http://download.tensorflow.org/models/parsey_universal/Chinese.zip

    The model for DRAGNN is located here.

    Unzip them into folders, e.g. ./synataxnet_data, so you end up with something like ./synataxnet_data/Chinese

  3. run and test

    3.1 SyntaxNet

    run (docker's -v needs an absolute host path, hence $(pwd))
    
        docker run -p 9000:9000 -v $(pwd)/synataxnet_data:/models ljm625/syntaxnet-rest-api
    
    test
    
         curl -X POST -d '{ "strings": [["今天天气很好","猴子爱吃 桃子"]] }' -H "Content-Type: application/json" http://xxx.xxx.xxx.xxx:9000/api/v1/query/Chinese
    

    3.2 dragnn

    run
    
        sudo docker run -p 9001:9000 -v $(pwd)/dragnn_data:/models ljm625/syntaxnet-rest-api:dragnn
    
    test
    
        http://Yourip:9001/api/v1/use/Chinese
    
        curl -X POST -d '{ "strings": ["今天 天气 很好","猴子 爱  吃 桃子"],"tree":true }' -H "Content-Type: application/json" http://xxx.xx.xx.xx:9001/api/v1/query
    

  4. test results and problems

From my testing with the Chinese model, SyntaxNet is slow: it spends 3 seconds processing one query and 9 seconds on a batch of 50 queries; there is a fixed cost for loading the model.

The DRAGNN model is fast, but I'm not satisfied with the parsing results (only tested with Chinese).

PS: I don't like the way SyntaxNet works, e.g. using bazel and reading data from stdin; if you want to customize it, you can find some info here.

Another resource that helps: https://github.com/dsindex/syntaxnet/blob/master/README_api.md
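The curl test in step 3.2 can be reproduced from Python without extra dependencies. A minimal sketch, assuming the container's JSON interface shown in the curl examples above (the response schema is not documented here, so it is returned as-is):

```python
import json
from urllib import request

def build_dragnn_request(sentences, base_url="http://localhost:9001", tree=True):
    """Build the POST request for pre-segmented sentences (tokens separated by spaces)."""
    body = json.dumps({"strings": sentences, "tree": tree}).encode("utf-8")
    return request.Request(
        base_url + "/api/v1/query",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def query_dragnn(sentences, base_url="http://localhost:9001", tree=True):
    """Send the batch to a running dragnn container and decode the JSON reply."""
    with request.urlopen(build_dragnn_request(sentences, base_url, tree)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Batching sentences into one request amortizes the per-query overhead noted in the timing results below.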

micimize

From what I can tell, the currently recommended way to use SyntaxNet from Python is via DRAGNN.

AKX

All in all, it doesn't look like it would be hard to refactor the two scripts demo.sh runs (https://github.com/tensorflow/models/blob/master/syntaxnet/syntaxnet/parser_eval.py and https://github.com/tensorflow/models/blob/master/syntaxnet/syntaxnet/conll2tree.py) into a Python module that exposes a Python API you can call.

Both scripts use TensorFlow's tf.app.flags API (described in this SO question: What's the purpose of tf.app.flags in TensorFlow?), so those flags would have to be refactored into regular arguments, since tf.app.flags is a process-level singleton.

So yeah, you'd just have to do the work to make these callable as a Python API :)
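A minimal sketch of the refactor described above. Every name here is hypothetical, and the function body is a stub standing in for the real graph-building code; the point is the shape of the API once the flags singleton is gone:

```python
# Hypothetical refactor: parser_eval.py currently reads its settings from the
# process-wide tf.app.flags singleton; a callable API takes them as plain
# arguments instead, so two configurations can coexist in one process.

def parse_tokens(tokens, model_dir, batch_size=32):
    """Stand-in for a refactored parser_eval entry point.

    A real implementation would build the SyntaxNet graph from `model_dir`
    and run the parser; this stub only echoes its configuration to show the
    intended signature (no global FLAGS involved).
    """
    return {"tokens": tokens, "model_dir": model_dir, "batch_size": batch_size}

# With no singleton state, different models can be used side by side:
english = parse_tokens(["Bob", "brought", "the", "pizza"], model_dir="models/English")
chinese = parse_tokens(["今天", "天气", "很好"], model_dir="models/Chinese")
```

With tf.app.flags, the second call would silently reuse (or clash with) the first call's flag values; passing configuration explicitly is what makes the module importable.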