Is there any NLP question answering dataset with multiple answers?


I'm building a QA system. My problem is that one question may have multiple answers, located at different positions in the context. For example:

Question: What does Chris have to do?

Context: ....Chris has to wash dishes....(more text)....Chris has to do his homework....

Correct answers:

  • wash dishes
  • do homework

After extracting the candidate answers for a question, I use a clustering algorithm to deduplicate them and obtain the distinct answers. I therefore need a dataset with one-question, many-answers pairs like the one above to evaluate my clustering algorithm and sentence embedding model.

Is there any public dataset that pairs one question with multiple correct, non-duplicated answers? I tried MS MARCO, but most of the multiple answers in that dataset are duplicates.
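To make the deduplication and evaluation step concrete, here is a minimal sketch of what such a pipeline might look like. It is not the actual setup: token-set Jaccard overlap stands in for the cosine similarity of sentence embeddings, a greedy single pass stands in for the clustering algorithm, and the names `similarity`, `deduplicate`, `evaluate` and the `0.5` threshold are all hypothetical.

```python
import re


def tokens(text):
    """Lowercase word tokens; a crude stand-in for a sentence embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Jaccard overlap of token sets (assumption: replaces embedding cosine similarity)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def deduplicate(answers, threshold=0.5):
    """Greedy clustering: an answer joins the first cluster whose
    representative (its first member) is similar enough, else starts a new one."""
    clusters = []
    for answer in answers:
        for cluster in clusters:
            if similarity(answer, cluster[0]) >= threshold:
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters


def evaluate(clusters, gold, threshold=0.5):
    """Precision, recall and F1 of cluster representatives against gold answers."""
    reps = [cluster[0] for cluster in clusters]
    hit_reps = [r for r in reps if any(similarity(r, g) >= threshold for g in gold)]
    hit_gold = [g for g in gold if any(similarity(r, g) >= threshold for r in reps)]
    precision = len(hit_reps) / len(reps) if reps else 0.0
    recall = len(hit_gold) / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Extracted candidates for "What does Chris have to do?", with near-duplicates.
candidates = ["wash dishes", "wash the dishes", "do his homework", "do homework"]
gold_answers = ["wash dishes", "do homework"]

clusters = deduplicate(candidates)
print(clusters)
print(evaluate(clusters, gold_answers))
```

A dataset with one question and several distinct gold answers would supply `gold_answers`; a dataset whose multiple answers are mostly duplicates cannot tell a good clustering from a bad one, because a single cluster would already score well on recall.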


There are 2 answers

0
Ibtesam Ahmed On

I was looking for something similar: question answering techniques or datasets with multiple non-redundant answers.

This is the dataset: https://github.com/mingzhu0527/MASHQA

and the paper: https://www.aclweb.org/anthology/2020.findings-emnlp.342.pdf

However, this paper poses QA as a sentence classification task: the model decides, for each sentence in the context, whether it answers the query or not.

So if your multiple answers are short phrases rather than whole sentences, I wouldn't recommend this dataset.
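As a rough illustration of the sentence-classification framing described above, the sketch below turns a context and its gold answers into one binary label per sentence. This is only an assumption about how such labels could be derived; it is not the format or preprocessing of the MASHQA dataset, and `split_sentences` and `sentence_labels` are hypothetical helper names.

```python
import re


def split_sentences(context):
    """Naive splitter on ., ! or ? followed by whitespace (assumption: enough for a sketch)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]


def sentence_labels(context, gold_answers):
    """Label each sentence 1 if it contains any gold answer phrase, else 0."""
    labelled = []
    for sentence in split_sentences(context):
        lowered = sentence.lower()
        label = int(any(answer.lower() in lowered for answer in gold_answers))
        labelled.append((sentence, label))
    return labelled


context = (
    "Chris has to wash dishes. "
    "The weather is nice today. "
    "Chris has to do his homework."
)
gold_answers = ["wash dishes", "do his homework"]

for sentence, label in sentence_labels(context, gold_answers):
    print(label, sentence)
```

The unit of prediction here is the whole sentence, which is why this framing fits poorly when the answers you need are short phrases inside a sentence.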

0
Tary Tong On

Muc2004 is a document-level event extraction dataset in which each event role can have multiple answers. For example:

Question: Who are the victims of the attack?

Context: ....because of Carlos Valencia Garcia's death sentence is the last night....(more text)...The assassination of Maria Elena Diaz...

Correct answers:

  • Carlos Valencia Garcia
  • Maria Elena Diaz