While looking through the information retrieval research, I stumbled upon two metrics, namely Accuracy@k and Success@k. However, I have a hard time understanding the difference between them.
Success@k is defined as:
Specifically, we award a point to the system for each query where it finds an accepted or upvoted (score ≥ 1) answer from the target page in the top-5 hits. https://arxiv.org/pdf/2112.01488.pdf
Similar to Chen et al. (2017), Guu et al. (2020), and Karpukhin et al. (2020), we study the retrieval quality in OpenQA by reporting Success@k (S@k for short), namely, the percentage of questions for which a retrieved passage (up to depth k) contains the gold answer string. https://arxiv.org/pdf/2007.00814.pdf
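If I read those definitions correctly, Success@k is computed over the retrieved passage text: a query counts as a hit if any of its top-k passages contains a gold answer string. Here is a minimal sketch of that idea; the function name success_at_k, the argument names, and the case-insensitive substring match are my own simplifications, not something taken from the papers (which may normalize answers differently):

from typing import Dict, List

def success_at_k(
        retrieved_passages: Dict[str, List[str]],   # query_id -> passage texts, ranked by score
        gold_answers: Dict[str, List[str]],         # query_id -> gold answer strings
        k: int) -> float:
    # Award one point per query whose top-k passages contain any gold answer string.
    hits = 0
    for query_id, passages in retrieved_passages.items():
        found = any(
            answer.lower() in passage.lower()
            for passage in passages[:k]
            for answer in gold_answers[query_id]
        )
        if found:
            hits += 1
    # Report the fraction of questions answered within depth k.
    return hits / len(retrieved_passages)

So, at least in this reading, Success@k only needs the answer strings and the retrieved passage text, not document-level relevance judgments.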
And the Accuracy@k I am using comes from the BEIR research. The repo can be found here: https://github.com/beir-cellar/beir/blob/main/beir/retrieval/custom_metrics.py
import logging
from typing import Dict, List, Tuple


def top_k_accuracy(
        qrels: Dict[str, Dict[str, int]],
        results: Dict[str, Dict[str, float]],
        k_values: List[int]) -> Tuple[Dict[str, float]]:

    top_k_acc = {}

    for k in k_values:
        top_k_acc[f"Accuracy@{k}"] = 0.0

    k_max, top_hits = max(k_values), {}
    logging.info("\n")

    # Rank the retrieved doc IDs per query by score and keep the top k_max.
    for query_id, doc_scores in results.items():
        top_hits[query_id] = [item[0] for item in sorted(doc_scores.items(), key=lambda item: item[1], reverse=True)[0:k_max]]

    # A query counts as a hit at depth k if any relevant doc ID (qrels score > 0)
    # appears among its top-k retrieved doc IDs.
    for query_id in top_hits:
        query_relevant_docs = set([doc_id for doc_id in qrels[query_id] if qrels[query_id][doc_id] > 0])
        for k in k_values:
            for relevant_doc_id in query_relevant_docs:
                if relevant_doc_id in top_hits[query_id][0:k]:
                    top_k_acc[f"Accuracy@{k}"] += 1.0
                    break

    # Average the hit counts over all queries in qrels.
    for k in k_values:
        top_k_acc[f"Accuracy@{k}"] = round(top_k_acc[f"Accuracy@{k}"] / len(qrels), 5)
        logging.info("Accuracy@{}: {:.4f}".format(k, top_k_acc[f"Accuracy@{k}"]))

    return top_k_acc
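For comparison, here is how that function behaves on a toy input (the query and document IDs below are made up). Note that it only checks whether a relevant document ID from qrels shows up among the top-k retrieved IDs; it never looks at passage text or answer strings.

# Hypothetical qrels/results for a single query "q1"; "d1" is the only relevant doc.
qrels = {"q1": {"d1": 1, "d2": 0}}
results = {"q1": {"d1": 0.9, "d3": 0.8, "d2": 0.1}}

print(top_k_accuracy(qrels, results, k_values=[1, 3]))
# {'Accuracy@1': 1.0, 'Accuracy@3': 1.0}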
So, the question is: are these two metrics the same?