KeyError: 'answers' when using the BioASQ dataset with Huggingface Transformers


I am using run_squad.py (https://github.com/huggingface/transformers/blob/master/examples/run_squad.py) from Huggingface Transformers for fine-tuning on the BioASQ Question Answering dataset.

I have converted the TensorFlow weights provided by the authors of BioBERT (https://github.com/dmis-lab/bioasq-biobert) to PyTorch, as discussed here: https://github.com/huggingface/transformers/issues/312.
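For context, the conversion can be done along the lines of that issue (e.g. with the convert_bert_original_tf_checkpoint_to_pytorch.py script in the transformers repo, or with a few lines of Python). The sketch below is only illustrative; the paths are placeholders, not my exact ones:

import torch
from transformers import BertConfig, BertForPreTraining, load_tf_weights_in_bert

# Placeholder paths for the BioBERT TensorFlow checkpoint and its config
tf_checkpoint = "biobert_v1.1_pubmed/model.ckpt-1000000"
config_file = "biobert_v1.1_pubmed/bert_config.json"
dump_path = "BioBERT-PyTorch/pytorch_model.bin"

config = BertConfig.from_json_file(config_file)
model = BertForPreTraining(config)
# Loads the TF variables into the PyTorch model; TF variables without a
# matching parameter are skipped. Requires TensorFlow to be installed.
load_tf_weights_in_bert(model, config, tf_checkpoint)
torch.save(model.state_dict(), dump_path)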

Further, I am using the preprocessed BioASQ data (https://github.com/dmis-lab/bioasq-biobert), which has been converted to the SQuAD format. However, when I run the run_squad.py script with the parameters below

 --model_type bert \
  --model_name_or_path /scratch/oe7/uk1594/BioBERT/BioBERT-PyTorch/BioBERTv1.1-SQuADv1.1-Factoid-PyTorch/ \
  --do_train \
  --do_eval \
  --save_steps 1000 \
  --train_file $data/BioASQ-train-factoid-6b.json \
  --predict_file $data/BioASQ-test-factoid-6b-1.json \
  --per_gpu_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /scratch/oe7/uk1594/BioBERT/BioBERT-PyTorch/QA_output_squad/BioASQ-factoid-6b/BioASQ-factoid-6b-1-issue-23mar/


I get the following error:

03/23/2020 12:53:12 - INFO - transformers.modeling_utils -   loading weights file /scratch/oe7/uk1594/BioBERT/BioBERT-PyTorch/QA_output_squad/BioASQ-factoid-6b/BioASQ-factoid-6b-1-issue-23mar/pytorch_model.bin
03/23/2020 12:53:15 - INFO - __main__ -   Creating features from dataset file at .

  0%|          | 0/1 [00:00<?, ?it/s]
  0%|          | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run_squad.py", line 856, in <module>
    main()
  File "run_squad.py", line 845, in main
    result = evaluate(args, model, tokenizer, prefix=global_step)
  File "run_squad.py", line 299, in evaluate
    dataset, examples, features = load_and_cache_examples(args, tokenizer, evaluate=True, output_examples=True)
  File "run_squad.py", line 475, in load_and_cache_examples
    examples = processor.get_dev_examples(args.data_dir, filename=args.predict_file)
  File "/scratch/oe7/uk1594/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 522, in get_dev_examples
    return self._create_examples(input_data, "dev")
  File "/scratch/oe7/uk1594/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 549, in _create_examples
    answers = qa["answers"]
KeyError: 'answers'


I really appreciate your help. Thanks a lot for your guidance.

The evaluation dataset looks like this:

{
  "version": "BioASQ6b", 
  "data": [
    {
      "title": "BioASQ6b", 
      "paragraphs": [
        {
          "context": "emMAW: computing minimal absent words in external memory. Motivation: The biological significance of minimal absent words has been investigated in genomes of organisms from all domains of life. For instance, three minimal absent words of the human genome were found in Ebola virus genomes",
          "qas": [
            {
              "question": "Which algorithm is available for computing minimal absent words using external memory?", 
              "id": "5a6a3335b750ff4455000025_000"
            }
          ]
        }
      ]
    }
  ]
}
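If I read the processor correctly, get_dev_examples reads qa["answers"] for every entry, so each qas item seems to need at least an empty answers list. Below is a small hypothetical patch (file names are placeholders) that would presumably get past the KeyError, although any evaluation metrics would be meaningless because the gold answers are still missing:

import json

# Placeholder file names for the BioASQ test file and the patched copy
with open("BioASQ-test-factoid-6b-1.json") as f:
    data = json.load(f)

# Give every qas entry an empty "answers" list so the SQuAD dev
# processor does not raise KeyError: 'answers'
for entry in data["data"]:
    for paragraph in entry["paragraphs"]:
        for qa in paragraph["qas"]:
            qa.setdefault("answers", [])

with open("BioASQ-test-factoid-6b-1-patched.json", "w") as f:
    json.dump(data, f)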




1 Answer

Abdullah Bashir

The BioASQ evaluation files are test files that don't contain answers; they are only used for predictions. For evaluation during training, you can use a portion of the training files.
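A rough sketch of such a split, assuming the training file uses the usual SQuAD layout shown above (file names and the 90/10 ratio are arbitrary placeholders), might look like this:

import json
import random

# Placeholder path to the SQuAD-format BioASQ training file
with open("BioASQ-train-factoid-6b.json") as f:
    squad = json.load(f)

# Collect (title, paragraph) pairs so the split still works when
# everything sits under a single "data" entry, as in the BioASQ files
paragraphs = [(entry["title"], p)
              for entry in squad["data"]
              for p in entry["paragraphs"]]
random.seed(0)
random.shuffle(paragraphs)
cut = int(0.9 * len(paragraphs))   # arbitrary 90/10 split

def to_squad(pairs, version):
    return {"version": version,
            "data": [{"title": t, "paragraphs": [p]} for t, p in pairs]}

version = squad.get("version", "BioASQ")
with open("BioASQ-train-split.json", "w") as f:   # placeholder output names
    json.dump(to_squad(paragraphs[:cut], version), f)
with open("BioASQ-dev-split.json", "w") as f:
    json.dump(to_squad(paragraphs[cut:], version), f)

The dev portion produced this way still has the "answers" field, so it can be passed to --predict_file with --do_eval without hitting the KeyError.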