How to store relation data in CoNLL-U format


I have a CSV dataset with four columns: "sentence", "term1", "term2", and "relation". Each "sentence" value is a sentence in which term1 and term2 stand in the given relation. I process this dataset with stanza.Pipeline() from the Stanza library and would like to store the result in CoNLL-U format. Later, this dataset will be used to train a model that extracts triples of the form <term1><relation type><term2> from a sentence.

What is the best practice for storing the term1, term2, and relation information in CoNLL-U format?

For example, given this row of data, where should the annotations for term1, term2, and relation go in the CoNLL-U output?

A row from the CSV file:

"sentence", "term1", "term2", "relation"
"Ibuprofen helps with headaches.", "Ibuprofen", "headaches", "treat"

Is it fine to add this information to the miscellaneous (MISC) field, as below (tag=term1|relation=treat)?

# text = Ibuprofen helps with headaches.
# sent_id = 0
# constituency = (ROOT (S (NP (NNP Ibuprofen)) (VP (VBZ helps) (PP (IN with) (NP (NNS headaches)))) (. .)))
# sentiment = 0
1   Ibuprofen   Ibuprofen   PROPN   NNP Number=Sing 2   nsubj   _   tag=term1|relation=treat|start_char=0|end_char=9|ner=O
2   helps   help    VERB    VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   0   root    _   start_char=10|end_char=15|ner=O
3   with    with    ADP IN  _   4   case    _   start_char=16|end_char=20|ner=O
4   headaches   headache    NOUN    NNS Number=Plur 2   obl _   tag=term2|relation=treat|start_char=21|end_char=30|ner=O
5   .   .   PUNCT   .   _   2   punct   _   start_char=30|end_char=31|ner=O
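
For reference, here is a minimal sketch of how the MISC annotations above could be produced from a CSV row. It works on plain tab-separated CoNLL-U text rather than on Stanza objects. The function name annotate_misc and the keys tag and relation are my own choices, not part of any standard. It assumes each term appears verbatim in the sentence (only the first occurrence is used) and that every token already carries start_char and end_char in MISC, as in the output above.

```python
def annotate_misc(conllu, sentence, term1, term2, relation):
    """Prefix MISC with tag=<term>|relation=<relation> for tokens inside a term span.

    Hypothetical helper: assumes each term occurs verbatim in `sentence`
    and uses only its first occurrence.
    """
    spans = {}
    for tag, term in (("term1", term1), ("term2", term2)):
        start = sentence.find(term)
        if start == -1:
            raise ValueError(f"{term!r} not found in sentence")
        spans[tag] = (start, start + len(term))

    out = []
    for line in conllu.splitlines():
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 10:
            out.append(line)  # comments and blank lines pass through unchanged
            continue
        # MISC (column 10) holds key=value pairs separated by "|".
        misc = dict(kv.split("=", 1) for kv in cols[9].split("|") if "=" in kv)
        if "start_char" in misc and "end_char" in misc:
            tok_start, tok_end = int(misc["start_char"]), int(misc["end_char"])
            for tag, (span_start, span_end) in spans.items():
                # Token lies entirely inside the term's character span.
                if span_start <= tok_start and tok_end <= span_end:
                    cols[9] = f"tag={tag}|relation={relation}|{cols[9]}"
                    break
        out.append("\t".join(cols))
    return "\n".join(out)


sample = "\n".join([
    "# text = Ibuprofen helps with headaches.",
    "1\tIbuprofen\tIbuprofen\tPROPN\tNNP\tNumber=Sing\t2\tnsubj\t_\tstart_char=0|end_char=9|ner=O",
    "2\thelps\thelp\tVERB\tVBZ\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t0\troot\t_\tstart_char=10|end_char=15|ner=O",
    "3\twith\twith\tADP\tIN\t_\t4\tcase\t_\tstart_char=16|end_char=20|ner=O",
    "4\theadaches\theadache\tNOUN\tNNS\tNumber=Plur\t2\tobl\t_\tstart_char=21|end_char=30|ner=O",
    "5\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\tstart_char=30|end_char=31|ner=O",
])

annotated = annotate_misc(
    sample, "Ibuprofen helps with headaches.", "Ibuprofen", "headaches", "treat"
)
print(annotated)
```

Matching on character offsets rather than on token text means a multi-token term would have every one of its tokens tagged, which may or may not be what the downstream model expects.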
