Why am I getting a string where I should get a dict when using pycorenlp.StanfordCoreNLP.annotate?


I'm running this example with pycorenlp, the Python wrapper for Stanford CoreNLP, but the annotate function returns a string instead of a dict, so when I iterate over it to get each sentence's sentiment value I get the following error: "string indices must be integers".

How can I get past this? Could anyone help me? Thanks in advance. The code is below:

from pycorenlp import StanfordCoreNLP
nlp_wrapper = StanfordCoreNLP('http://localhost:9000')
doc = "I like this chocolate. This chocolate is not good. The chocolate is delicious. Its a very 
    tasty chocolate. This is so bad"
annot_doc = nlp_wrapper.annotate(doc,
                                 properties={
                                            'annotators': 'sentiment',
                                            'outputFormat': 'json',
                                            'timeout': 100000,
                                 })
for sentence in annot_doc["sentences"]:
      print(" ".join([word["word"] for word in sentence["tokens"]]) + " => "\
            + str(sentence["sentimentValue"]) + " = "+ sentence["sentiment"])

There are 2 answers

StanfordNLPHelp (best answer)

You should just use the official stanfordnlp package! (note: the name is going to be changed to stanza at some point)

Here are all the details, and you can get various output formats from the server including JSON.

https://stanfordnlp.github.io/stanfordnlp/corenlp_client.html

from stanfordnlp.server import CoreNLPClient

text = "I like this chocolate. This chocolate is not good."  # example text to annotate

# start a CoreNLP server in the background and shut it down when the block exits
with CoreNLPClient(annotators=['tokenize','ssplit','pos','lemma','ner','parse','depparse','coref'], timeout=30000, memory='16G') as client:
    # submit the request to the server
    ann = client.annotate(text)
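For the per-sentence sentiment the question asks about, something along these lines should work with that client. This is only a sketch, not part of the original answer: it assumes the sentiment annotator and output_format='json', which makes annotate return a plain dict with per-sentence sentimentValue/sentiment fields, just like the pycorenlp JSON output above; the sample text and memory setting are illustrative.

from stanfordnlp.server import CoreNLPClient

doc = "I like this chocolate. This chocolate is not good."  # illustrative sample text

with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos', 'parse', 'sentiment'],
                   timeout=30000, memory='4G') as client:
    # ask the server for JSON so the result is a plain dict
    ann = client.annotate(doc, output_format='json')
    for sentence in ann['sentences']:
        words = " ".join(token['word'] for token in sentence['tokens'])
        print(words, "=>", sentence['sentimentValue'], "=", sentence['sentiment'])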
Lambert

It would help if you provided the full error stack trace. The likely reason is that the annotator hits the timeout and returns an error message such as 'the text is too large..' instead of JSON, so the result comes back as a plain string rather than a dict.

Further, to expand on Petr Matuska's comment: looking at your example, it is clear that your goal is to find the sentiment of each sentence along with its sentiment score. The sentiment score is not included in the result when using CoreNLPClient. I faced a similar issue, but a workaround fixed it: if the text is large, you must set the timeout to a much higher value (e.g. timeout = 500000).

Also, the annotator returns a dictionary, which consumes a lot of memory; for a larger text corpus this becomes a real problem, so it is up to you how you handle the data structure in your code. There are alternatives, such as classes with __slots__, tuples, or namedtuples, for faster access and a smaller footprint.
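A minimal sketch of that workaround with pycorenlp (the isinstance check and the concrete values are illustrative, not from the original answer): pycorenlp falls back to returning the raw server response as a string when it cannot be parsed as JSON, so checking the type before indexing avoids the "string indices must be integers" error.

from pycorenlp import StanfordCoreNLP

nlp_wrapper = StanfordCoreNLP('http://localhost:9000')
doc = "I like this chocolate. This chocolate is not good."

annot_doc = nlp_wrapper.annotate(doc, properties={
    'annotators': 'sentiment',
    'outputFormat': 'json',
    'timeout': 500000,  # much higher timeout, as suggested above
})

# If the server timed out or rejected the text, annotate returns the error
# message as a plain string instead of a dict, so check before indexing.
if isinstance(annot_doc, str):
    print("Server error:", annot_doc)
else:
    for sentence in annot_doc['sentences']:
        print(sentence['sentimentValue'], sentence['sentiment'])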