I have been searching for a while but haven't found a solution to my problem. For a relation classification task, I have annotated several news-like text documents with the Prodigy annotation software. Prodigy exports its annotations as a JSONL file, which can be converted into a .spacy file. In the JSONL format, each line represents one news article with its annotations.
Now I want to convert my annotations into a more standardized format like CoNLL, so that I can work with them in other open-source software such as INCEpTION (unfortunately, Prodigy has not been a good choice). Unfortunately, I haven't found any library, script, or tool that can convert Prodigy JSONL or .spacy files to CoNLL.
Here is an example of what the Prodigy JSONL format looks like:
{
"text": "My mother’s name is Sasha Smith. She likes dogs and pedigree cats.",
"tokens": [
{"text": "My", "start": 0, "end": 2, "id": 0, "ws": true},
{"text": "mother", "start": 3, "end": 9, "id": 1, "ws": false},
{"text": "’s", "start": 9, "end": 11, "id": 2, "ws": true},
{"text": "name", "start": 12, "end": 16, "id": 3, "ws": true },
{"text": "is", "start": 17, "end": 19, "id": 4, "ws": true },
{"text": "Sasha", "start": 20, "end": 25, "id": 5, "ws": true},
{"text": "Smith", "start": 26, "end": 31, "id": 6, "ws": true},
{"text": ".", "start": 31, "end": 32, "id": 7, "ws": true, "disabled": true},
{"text": "She", "start": 33, "end": 36, "id": 8, "ws": true},
{"text": "likes", "start": 37, "end": 42, "id": 9, "ws": true},
{"text": "dogs", "start": 43, "end": 47, "id": 10, "ws": true},
{"text": "and", "start": 48, "end": 51, "id": 11, "ws": true, "disabled": true},
{"text": "pedigree", "start": 52, "end": 60, "id": 12, "ws": true},
{"text": "cats", "start": 61, "end": 65, "id": 13, "ws": true},
{"text": ".", "start": 65, "end": 66, "id": 14, "ws": false, "disabled": true}
],
"spans": [
{"start": 20, "end": 31, "token_start": 5, "token_end": 6, "label": "PERSON"},
{"start": 43, "end": 47, "token_start": 10, "token_end": 10, "label": "NP"},
{"start": 52, "end": 65, "token_start": 12, "token_end": 13, "label": "NP"}
],
"relations": [
{
"head": 0,
"child": 1,
"label": "POSS",
"head_span": {"start": 0, "end": 2, "token_start": 0, "token_end": 0, "label": null},
"child_span": {"start": 3, "end": 9, "token_start": 1, "token_end": 1, "label": null}
},
{
"head": 1,
"child": 8,
"label": "COREF",
"head_span": {"start": 3, "end": 9, "token_start": 1, "token_end": 1, "label": null},
"child_span": {"start": 33, "end": 36, "token_start": 8, "token_end": 8, "label": null}
},
{
"head": 9,
"child": 13,
"label": "OBJECT",
"head_span": {"start": 37, "end": 42, "token_start": 9, "token_end": 9, "label": null},
"child_span": {"start": 52, "end": 65, "token_start": 12, "token_end": 13, "label": "NP"}
}
]
}
Thanks in advance
I want to convert either the Prodigy JSONL file or the .spacy annotation file into CoNLL.
You can load in your spaCy Docs from the .spacy file and use spacy-conll to dump them as CoNLL files.
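If you would rather go straight from the JSONL, here is a minimal sketch using only the standard library. It is not an official Prodigy or spaCy exporter: the function names `prodigy_to_conll` and `convert_file` are made up for illustration, and it assumes that each record has the `tokens` and `spans` keys shown in the question, that `token_end` is inclusive, and that token ids match their position in the `tokens` list. It writes a simple two-column, CoNLL-2002-style layout (token, BIO tag). The `relations` are not exported, since that layout has no column for them.

```python
import json


def prodigy_to_conll(record):
    """Return CoNLL-style lines (token<TAB>BIO tag) for one Prodigy record.

    Assumption: span["token_start"] and span["token_end"] are inclusive
    indices into record["tokens"], as in the example from the question.
    """
    tokens = record["tokens"]
    tags = ["O"] * len(tokens)
    for span in record.get("spans", []):
        first, last = span["token_start"], span["token_end"]
        tags[first] = "B-" + span["label"]
        for i in range(first + 1, last + 1):
            tags[i] = "I-" + span["label"]
    return [f"{tok['text']}\t{tag}" for tok, tag in zip(tokens, tags)]


def convert_file(jsonl_path, conll_path):
    """Convert a Prodigy JSONL file into one CoNLL-style text file.

    Each JSONL line (one article) becomes one block of token lines,
    followed by a blank line as separator.
    """
    with open(jsonl_path, encoding="utf-8") as src, \
            open(conll_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip empty lines
            record = json.loads(line)
            dst.write("\n".join(prodigy_to_conll(record)) + "\n\n")
```

Calling `convert_file("annotations.jsonl", "annotations.conll")` (both file names are placeholders) would write one block per article. Whether INCEpTION accepts this exact column layout depends on which CoNLL variant you pick on import, so check that before converting everything.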