I'm currently doing NER with 3 labels:
- PERSON
- PHONE
- ADDRESS
I am able to train my model with Python code, but I want to use CLI training, which gives more flexibility.
I have converted my data to spaCy's offset training format, which looks like this:
[
["Bonjour\r\n\r\n\r\n\r\ncordialement, Thomas\r\n\r\n tel 0102030405",{"entities": [[70,79,"PHONE"],[56,61,"PER"]]}]
]
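As a quick sanity check of this format, each `[start, end, label]` triple should slice the entity text out of the string. Here is a minimal sketch; the shortened sample and the helper name `entity_texts` are made up for illustration and are not my real data:

```python
# Hypothetical, shortened sample in the same [text, {"entities": [...]}] layout.
TRAIN_DATA = [
    ["Cordialement, Thomas\r\n\r\n tel 0102030405",
     {"entities": [[29, 39, "PHONE"], [14, 20, "PERSON"]]}],
]


def entity_texts(example):
    """Return (entity_text, label) pairs by slicing the text with the character offsets."""
    text, annotations = example
    return [(text[start:end], label) for start, end, label in annotations["entities"]]


for example in TRAIN_DATA:
    for span_text, label in entity_texts(example):
        print(label, repr(span_text))
```

If a slice does not show the expected entity text, the offsets and the text are out of sync and need fixing before any conversion.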
To train and evaluate my model with the CLI, I need to convert this data to the gold format.
I'm already aware of the method below, but it needs an existing nlp object:
doc = nlp(text)
tags = biluo_tags_from_offsets(doc, offsets)
My question is: how can I convert spaCy offsets to the gold format if I need to create a model with specific labels?
You only need the model here for tokenization and sentence segmentation, so it would also work to say:
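A minimal sketch of that idea, assuming a blank French pipeline (the sample text looks French, so swap in another language code if needed) and a shortened, made-up sample whose offsets line up with token boundaries. The import is written to cope with both spaCy v2 (`spacy.gold.biluo_tags_from_offsets`, as in the question) and spaCy v3 (`spacy.training.offsets_to_biluo_tags`); the alias `to_biluo` is only for illustration:

```python
import spacy

try:
    # spaCy v3 name for the conversion helper
    from spacy.training import offsets_to_biluo_tags as to_biluo
except ImportError:
    # spaCy v2 name, as used in the question
    from spacy.gold import biluo_tags_from_offsets as to_biluo

# A blank pipeline only contains the tokenizer; no trained model is required.
nlp = spacy.blank("fr")

# Hypothetical, shortened sample with character offsets (start, end, label).
text = "Cordialement, Thomas\r\n\r\n tel 0102030405"
entities = [(14, 20, "PERSON"), (29, 39, "PHONE")]

doc = nlp(text)
tags = to_biluo(doc, entities)  # one BILUO tag per token

for token, tag in zip(doc, tags):
    print(repr(token.text), tag)
```

A tag of `-` would mean an offset does not align with the tokenizer's token boundaries, so that annotation would need adjusting. Note that a blank pipeline does no sentence segmentation; if you need sentence boundaries you would have to add a sentencizer component yourself.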