Watson Discovery deciphering passage score and result score

I am trying to decipher what the passage_score and results[].score fields mean (in percentile terms) in Discovery query results. This is so that we can filter out passages and results that do not meet a minimum confidence threshold.

For example in this result set:

{
  ...
  "passages": [
    {
      "document_id": "AA",
      "passage_score": 14.303232050575723,
      ...
    },
    {
      "document_id": "BB",
      "passage_score": 14.089714658115533,
      ...
    }
  ],
  "results": [
    {
      "id": "AA",
      "score": 1.5188946,
      ...
    },
    {
      "id": "BB",
      "score": 1.5188946,
      ...
    }
  ]
}

How would I convert these scores into a percentile equivalent for comparison? In Retrieve and Rank (RnR), I used to do this using the ranker.confidence field.


There are 2 answers

Sayuri Mizuguchi

According to the official Watson Discovery documentation, passages are generated by sophisticated Watson algorithms that determine the best passages of text from all of the documents returned by the query.

I think you may be able to use the highlight parameter. highlight is a boolean that specifies whether the returned output includes a highlight object, in which the keys are field names and the values are arrays containing segments of query-matching text highlighted by the HTML <em> tag.
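
For illustration, here is roughly what that looks like with the ibm-watson Python SDK's v1 query method (the API key, service URL, environment and collection IDs, and query text are all placeholders):

from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholders throughout: substitute your own credentials, URL, and IDs.
discovery = DiscoveryV1(
    version='2019-04-30',
    authenticator=IAMAuthenticator('YOUR_APIKEY'),
)
discovery.set_service_url('https://api.us-south.discovery.watson.cloud.ibm.com')

response = discovery.query(
    environment_id='YOUR_ENVIRONMENT_ID',
    collection_id='YOUR_COLLECTION_ID',
    natural_language_query='pension eligibility',
    passages=True,
    highlight=True,  # adds a per-result highlight object with <em>-tagged snippets
).get_result()

for result in response['results']:
    print(result['id'], result.get('highlight', {}))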

Or the top_hits aggregation: it returns the documents ranked by the score of the query or enrichment, and it can be used with any query parameter or aggregation. For example, nested inside a term aggregation it returns the 10 top hits per term (see the sketch below).
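
As a sketch of that (reusing the discovery client from above; enriched_text.concepts.text is just a placeholder for a field in your own collection):

# Sketch: nest top_hits(10) inside a term aggregation, passed as a string
# through the aggregation parameter of the v1 query method.
response = discovery.query(
    environment_id='YOUR_ENVIRONMENT_ID',
    collection_id='YOUR_COLLECTION_ID',
    query='pension',
    aggregation='term(enriched_text.concepts.text).top_hits(10)',
).get_result()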

  • Check the list in the Query Building reference about queries with Discovery.
  • Check these articles with more Watson Discovery examples: article 1, article 2.
  • Playlist by IBM using Watson Discovery.

Will Chaparro

The passage score and the document score are not confidence scores, nor are they normalized. Each score is calculated based on the query and on how "good" a match the documents are for the query the user submitted.

It would not be correct to compare scores across different queries, and normalization, while it can be done, is not appropriate for the scores we generate. You could attempt to normalize them, but any normalization factor you come up with will be thrown off as soon as you add or delete documents from your index.
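
That said, if you decide to rescale anyway, do it only within a single response. A minimal min-max sketch over one query's results:

# Sketch: rescale scores to [0, 1] within one response only. The resulting
# values are not comparable across queries and will shift whenever documents
# are added to or deleted from the index.
def minmax_normalize(results):
    scores = [r['score'] for r in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(results)  # every document scored identically
    return [(s - lo) / (hi - lo) for s in scores]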

The score calculation is entirely dependent on the documents and their relevance to the specific query. In other words, it is based on term frequencies (how often a word appears) in the documents, along with other sophisticated algorithmic adjustments. The score is specific to the query and is produced by an algorithm that tries to predict the likelihood that a document is the most relevant one for that query. It is not a normalized score.
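
For intuition only (Discovery does not publish its exact formula), a classic TF-IDF calculation shows why raw scores shift when the index changes: the idf part depends on the total number of documents and on how many of them contain the term.

import math

# Intuition only, not Discovery's actual formula. tf rewards documents that
# mention the term often; idf depends on the whole index, so adding or
# deleting documents changes every score.
def tf_idf(term_count_in_doc, docs_containing_term, total_docs):
    tf = term_count_in_doc
    idf = math.log(total_docs / (1 + docs_containing_term))
    return tf * idf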

I would instead recommend using the top n documents as a more reasonable threshold, where n is the maximum number of documents you return to the user. Passages use additional algorithms that are likewise focused on generating the best passages for that particular query; again, the score is calculated specifically for the query.
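
For example (again with placeholder credentials and IDs, and assuming the ibm-watson Python SDK's v1 query method), you can cap the response with the count and passages_count parameters instead of thresholding on the raw score:

from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Sketch: ask for at most n documents and n passages rather than filtering
# on a score threshold. All credentials and IDs are placeholders.
n = 5  # max number of documents shown to the user
discovery = DiscoveryV1(version='2019-04-30',
                        authenticator=IAMAuthenticator('YOUR_APIKEY'))
discovery.set_service_url('https://api.us-south.discovery.watson.cloud.ibm.com')

response = discovery.query(
    environment_id='YOUR_ENVIRONMENT_ID',
    collection_id='YOUR_COLLECTION_ID',
    natural_language_query='pension eligibility',
    count=n,           # top n documents
    passages=True,
    passages_count=n,  # top n passages
).get_result()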

There are plans to improve scores in the future for re-ranked documents.