Match byte spans from an annotation into a text document, Python or Java

Question

Asked by user2587333 on 16 July 2013 at 12:42 (147 views)

I'm using the MPQA opinion corpus, in which annotations and documents are saved in separate files. The annotation files contain character offsets (byte spans) into the documents. For example, the span 850,861 comes with this annotation:

string  GATE_direct-subjective   
expression-intensity="medium"
attitude-link="a4"
nested-source="w, patient" 
intensity="medium" 
polarity="negative"

How can I map these byte spans back to the corresponding text in the document? I'm grateful for any ideas! I prefer using Python, but a solution in Java is also fine.

Original Q&A

There is 1 answer

**GrantD71** · Answer 1 · 2013-07-16T17:47:07+00:00

I'm not 100% sure I understand the question properly, but if you need a substring and you have the character positions, the solution is simple.

Python solution:

>>> sometext = "Grant D is a great guy."
>>> character_offset = [0, 7]
>>> substring = sometext[character_offset[0]:character_offset[1]]
>>> print(substring)
Grant D
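
One caveat with the slicing approach above: if the documents are plain ASCII, byte offsets and character offsets are the same thing, but as soon as the text contains multi-byte characters they drift apart. Here is a rough sketch of how that case could be handled, assuming the offsets really count bytes, that the span is stored as a "start,end" field like 850,861, and that the files are UTF-8. The helper names `parse_span` and `extract_span` and the sample sentence are made up for illustration; they are not part of the MPQA tooling.

```python
def parse_span(span_field):
    """Turn a 'start,end' field such as '850,861' into two integers."""
    start, end = span_field.strip().split(",")
    return int(start), int(end)


def extract_span(doc_bytes, span_field, encoding="utf-8"):
    """Slice the raw bytes of a document, then decode only that slice."""
    start, end = parse_span(span_field)
    return doc_bytes[start:end].decode(encoding, errors="replace")


# Stand-in for open(doc_path, "rb").read(); the sentence is invented.
# The encoding (UTF-8) is an assumption about the corpus files.
doc_bytes = "Café owners said the patient was upset.".encode("utf-8")

# Byte-based slicing lands on the intended word.
print(extract_span(doc_bytes, "22,29"))

# Slicing the decoded string with the same numbers is off by one here,
# because "é" takes two bytes in UTF-8 but counts as one character.
print(repr(doc_bytes.decode("utf-8")[22:29]))
```

For a real document you would read the file in binary mode (`open(path, "rb")`) and pass the bytes in. If the offsets turn out to be character offsets after all, plain string slicing as in the session above is enough.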
