Lucene documents scoring/ranking with regex query


I am using Azure Search, but I suppose my question is more relevant to Lucene. I can't find any information on how document ranks (scores) are calculated when a query consists fully or partly of a regex. For example:

Searching for "microsoft" returns normally calculated scores:

{ score: 6.088776, name: "Microsoft Research" }
{ score: 5.9090853, name: "Microsoft Corporation" }
{ score: 5.0747375, name: "Microsoft Philippines, Inc." }
{ score: 4.93202, name: "Microsoft Dynamics, Inc." }

Searching for "/.micro./" returns scores that all equal 1:

{ score: 1, name: "Microsoft Dynamics, Inc." }
{ score: 1, name: "Microsoft Philippines, Inc." }
{ score: 1, name: "Microsoft Startup Alley" }

And searching for "microsoft /.micro./" returns what I suppose is the sum of the "microsoft" term score and the /.micro./ term score (which always equals 1):

{ score: 5.2132897, name: "Microsoft Research" }
{ score: 5.198583, name: "Microsoft Corporation" }
{ score: 4.973414, name: "Microsoft Philippines, Inc." }

What I need is to run a purely regex query and still get calculated scores.


There is 1 answer

Nate Ko (best answer)

In Azure Search, wildcard-style queries (prefix, regex, and fuzzy) go through an internal query rewriting process and return a constant score. This is mainly for performance reasons, and also to keep our default term-frequency-based scoring (TF-IDF) from biasing towards matches on less frequent terms. The behavior is documented at https://learn.microsoft.com/en-us/rest/api/searchservice/lucene-query-syntax-in-azure-search#bkmk_searchscoreforwildcardandregexqueries. There currently isn't a way to change this default behavior. If this feature is important to you, please create an entry on our UserVoice (https://feedback.azure.com/forums/263029-azure-search) to help us prioritize. Thank you.

Nate
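
To illustrate the difference described in the answer, here is a minimal toy sketch in Python. It is not Azure Search or Lucene code. The tokenizer, the simplified TF-IDF formula, and the helper names (`tokenize`, `term_score`, `regex_constant_score`) are all assumptions made for illustration, and the pattern `.*micro.*` is a guess at what the question's regex was meant to be. It only shows why a scored term query produces different values per document while a constant-score regex match gives every hit the same score.

```python
import math
import re

# Toy corpus: the result names that appear in the question.
DOCS = [
    "Microsoft Research",
    "Microsoft Corporation",
    "Microsoft Philippines, Inc.",
    "Microsoft Dynamics, Inc.",
    "Microsoft Startup Alley",
]


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a stand-in for a real analyzer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def term_score(term, doc_tokens, all_docs_tokens):
    """Simplified TF-IDF with length normalization; NOT Lucene's exact formula."""
    tf = doc_tokens.count(term)
    if tf == 0:
        return 0.0
    df = sum(1 for toks in all_docs_tokens if term in toks)
    idf = 1.0 + math.log(len(all_docs_tokens) / (df + 1.0))
    return math.sqrt(tf) * idf / math.sqrt(len(doc_tokens))


def regex_constant_score(pattern, doc_tokens):
    """Constant scoring: 1.0 if any whole term matches the pattern, else 0.0."""
    rx = re.compile(pattern)
    return 1.0 if any(rx.fullmatch(t) for t in doc_tokens) else 0.0


ALL_TOKENS = [tokenize(d) for d in DOCS]
term_scores = [term_score("microsoft", toks, ALL_TOKENS) for toks in ALL_TOKENS]
regex_scores = [regex_constant_score(r".*micro.*", toks) for toks in ALL_TOKENS]

for name, ts, rs in zip(DOCS, term_scores, regex_scores):
    print(f"{name!r}: term score={ts:.4f}, regex score={rs}")
```

In this sketch the shorter names get a higher term score because of length normalization, while every regex hit gets the same constant score, which mirrors the pattern in the results shown in the question.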