I am new to Prodigy and haven't fully figured out the paradigm. For a project, I would like to manually annotate names in texts. My team has developed its own model to recognize the names, so I only want to use the annotated texts (produced with Prodigy) as a gold standard for our model.
To do so, I have a CSV file, texts.csv, with the text in one of its columns. Do I need to convert this file to JSON, or can I run Prodigy on the CSV file directly?
Also, what command do I need to run to start ner.manual with this dataset?
I suppose I have to start with:
!python -m prodigy ner.manual
However, it is unclear to me what the rest of the command should look like. Can someone help me with this?
File Format
I believe that for recipes whose source argument is listed as a "Text Source", you can use JSONL, JSON, CSV, or TXT files (see the "Text Source" section of https://prodi.gy/docs/api-loaders). The ner.manual recipe takes a "Text Source", so I think a CSV file should work directly (see https://prodi.gy/docs/recipes#ner-manual).
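If the CSV route gives you trouble, for instance because the loader can't find your text column (my understanding is that it looks for a column named text, but check the loaders page), converting to JSONL is a safe fallback. This is a minimal sketch assuming Prodigy's JSONL format of one JSON object per line with the text under a "text" key. The column name "body" in the usage comment is a hypothetical placeholder for whatever your CSV actually uses.

```python
import csv
import json


def csv_to_jsonl(csv_path: str, jsonl_path: str, text_column: str = "text") -> int:
    """Copy one column of a CSV file into a JSONL file of {"text": ...} records.

    Returns the number of records written. Rows with blank text are skipped.
    """
    written = 0
    # "utf-8-sig" also handles the byte-order mark that Excel often adds
    with open(csv_path, newline="", encoding="utf-8-sig") as src:
        reader = csv.DictReader(src)
        # Fail early, before creating the output file, if the column is missing
        if reader.fieldnames is None or text_column not in reader.fieldnames:
            raise ValueError(f"column {text_column!r} not found in {csv_path}")
        with open(jsonl_path, "w", encoding="utf-8") as dst:
            for row in reader:
                text = row[text_column] or ""
                if not text.strip():
                    continue  # skip blank rows so they don't become empty tasks
                # Keep the text unmodified so character offsets of the gold
                # annotations line up with the original CSV text
                dst.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
                written += 1
    return written


# Usage (hypothetical column name "body"; replace it with your real column):
# csv_to_jsonl("texts.csv", "texts.jsonl", text_column="body")
```

The text is deliberately not stripped or cleaned: since the annotations will serve as a gold standard, any change to the text would shift span offsets relative to what your own model sees.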
ner.manual
Regarding running ner.manual, take a look at the documentation: https://prodi.gy/docs/
The documentation contains a good example:
python -m prodigy ner.manual ner_news_headlines blank:en ./news_headlines.jsonl --label PERSON,ORG,PRODUCT,LOCATION
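Adapted to your case, the positional arguments are the dataset name, the spaCy model (blank:en just provides English tokenization), and the text source, followed by --label. Since you seem to be working in a notebook, here is a minimal sketch that builds the same command from Python. The dataset name "names_gold" and the label "PERSON" are hypothetical choices, and the optional --loader flag is only needed if Prodigy can't infer the format from the file extension.

```python
import subprocess
import sys


def ner_manual_command(dataset, spacy_model, source, labels, loader=None):
    """Build the argument list for `prodigy ner.manual`.

    Positional order: dataset, spaCy model (used for tokenization), text source.
    """
    # sys.executable runs Prodigy in the same Python environment as this kernel
    cmd = [sys.executable, "-m", "prodigy", "ner.manual",
           dataset, spacy_model, source, "--label", ",".join(labels)]
    if loader is not None:
        # Optional: force a specific loader instead of guessing from the extension
        cmd += ["--loader", loader]
    return cmd


# Hypothetical dataset name and label; adjust them to your project.
cmd = ner_manual_command("names_gold", "blank:en", "./texts.csv", ["PERSON"])
print(" ".join(cmd))
# Starts the annotation server; the call blocks until you stop the server.
# subprocess.run(cmd, check=True)
```

In a notebook cell, the equivalent one-liner would be !python -m prodigy ner.manual names_gold blank:en ./texts.csv --label PERSON. The annotations are saved to the names_gold dataset as you accept examples in the web app.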
I'm also fairly new to Prodigy, so someone else may have a better answer.