Run ner.manual in Prodigy on a CSV file

I am new to Prodigy and haven't fully figured out the paradigm yet. For a project, I would like to manually annotate names in texts. My team has developed its own model to recognize names, so I only want to use the annotated texts (produced with Prodigy) as a gold standard for evaluating our model.

I have a CSV file, texts.csv, with the text in one of its columns. Do I need to convert this file to JSON, or can I run Prodigy directly on the CSV file?

Also, what command do I need to run to start ner.manual with this dataset?

I suppose I have to start with:

!python -m prodigy ner.manual

However, it is unclear to me how I should run the rest. Can someone help me with this?

There is 1 answer

Answer by yoghurt

File Format

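I believe that any recipe whose input is described as a "Text Source" can read jsonl, json, csv, or txt files (see the "Text Source" section at https://prodi.gy/docs/api-loaders). ner.manual lists its input as a "Text Source" (see https://prodi.gy/docs/recipes#ner-manual), so it should work on your CSV file without converting it first. As far as I can tell, the CSV loader looks for a column with the header text (or Text), so you may need to rename the column that holds your texts.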
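If the CSV loader does not pick up your column, one fallback is to convert the file to JSONL yourself. The following is a minimal sketch using only the Python standard library, not anything from Prodigy itself. The function name csv_to_jsonl and the column name "body" in the usage comment are hypothetical; substitute the header your texts.csv actually uses.

```python
import csv
import json


def csv_to_jsonl(csv_path, jsonl_path, text_column="text"):
    """Write one {"text": ...} JSON object per CSV row; return the number of rows written."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            # Missing or empty cells are treated as "no text".
            text = (row.get(text_column) or "").strip()
            if not text:
                continue  # skip rows with nothing to annotate
            dst.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


# Hypothetical usage, assuming the texts live in a column named "body":
# csv_to_jsonl("texts.csv", "texts.jsonl", text_column="body")
```

The resulting texts.jsonl has one JSON object with a "text" key per line, which is the same shape as the JSONL input used in the documentation example.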

ner.manual

For running ner.manual, take a look at the documentation: https://prodi.gy/docs/

The documentation contains a good example:

python -m prodigy ner.manual ner_news_headlines blank:en ./news_headlines.jsonl --label PERSON,ORG,PRODUCT,LOCATION

  1. ner_news_headlines is the name of the dataset that the annotations are saved to (it can be named anything)
  2. blank:en is a blank English model, which is only used to tokenize the text
  3. ./news_headlines.jsonl is the path to the file that you will be annotating (replace it with the path to your own file, e.g. ./texts.csv)
  4. PERSON,ORG,PRODUCT,LOCATION, passed via --label, are the labels that you will annotate your data with (change these to whatever labels you want to use, and be sure to separate them with commas, not spaces)
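Since the goal is to use the annotations as a gold standard, here is a rough sketch of how the saved annotations could be read back in Python. It assumes the dataset was exported to a file with something like python -m prodigy db-out ner_news_headlines > annotations.jsonl, and that each exported line carries "text", "answer", and a "spans" list with "start", "end", and "label" keys, which is my understanding of the NER task format. The function name load_gold_spans and the file name annotations.jsonl are hypothetical.

```python
import json


def load_gold_spans(jsonl_path):
    """Return a list of (text, spans) pairs for accepted examples.

    Each span is a (start, end, label, surface_text) tuple using
    character offsets into the example text.
    """
    gold = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the export
            task = json.loads(line)
            # Assumption: rejected or ignored examples should not count as gold.
            if task.get("answer") != "accept":
                continue
            text = task["text"]
            spans = [
                (s["start"], s["end"], s["label"], text[s["start"]:s["end"]])
                for s in task.get("spans", [])
            ]
            gold.append((text, spans))
    return gold


# Hypothetical usage:
# for text, spans in load_gold_spans("annotations.jsonl"):
#     print(text, spans)
```

The character offsets can then be compared against the spans your own model predicts for the same texts.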

I'm also pretty new to Prodigy, so someone else may have a better answer.