Azure Search Highlight Partial Match


I have turned hit highlighting on and it works well for whole-word matches. However, we append a wildcard character to the end of each word the user specifies, and highlighting does not work for the partial matches. We get the results back, but the .Highlights object is null, so no highlighting is available for partial matches.

Here is how we configure the SearchParameters:

var parameters = new SearchParameters
{
    Filter = newFilter,
    QueryType = QueryType.Full,
    Top = recordsPerPage,
    Skip = skip,
    SearchMode = SearchMode.Any,
    IncludeTotalResultCount = true,
    HighlightFields = new List<string> { "RESULT" },
    HighlightPreTag = "<font style=\"color:blue; background-color:yellow;\">",
    HighlightPostTag = "</font>"
};
return parameters;

// The parameters are then passed to the search call:
response = indexClient.Documents.Search<SearchResultReturn>(query, parameters);

Here is an example of our query string: ("the") the*^99.95

The idea is that we search for the exact string the user specified (which may contain multiple words) and then do a wildcard search for each individual word.
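For reference, a minimal sketch of how a query string of this shape could be assembled. The class name `WildcardQueryBuilder` and the `wildcardBoost` parameter are hypothetical and not our actual code. The sketch also assumes the input contains no characters that need escaping in the full Lucene syntax.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class WildcardQueryBuilder
{
    // Builds: ("<exact phrase>") word1*^boost word2*^boost ...
    // Assumption: userInput has no Lucene special characters (no escaping is done here).
    public static string Build(string userInput, double wildcardBoost)
    {
        string phrase = userInput.Trim();

        // Passing a null separator array splits on any whitespace.
        string[] words = phrase.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // Invariant culture keeps the decimal separator a '.' regardless of locale.
        string boost = wildcardBoost.ToString(CultureInfo.InvariantCulture);

        var wildcardTerms = words.Select(w => w + "*^" + boost);
        return "(\"" + phrase + "\") " + string.Join(" ", wildcardTerms);
    }
}
```

Calling `WildcardQueryBuilder.Build("the", 99.95)` yields the example query shown above.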

So for the above example we get all the results that contain "the" or match "the*", but only occurrences of the word "the" are highlighted. "They", "There", etc. are not highlighted, even when "They" is the only match in the result (that is, "the" itself does not appear in the result).

Again, the query brings back the correct results; it is only the highlighting that does not work for partial matches.

Is there some other setting I need in order to highlight partial matches?


There are 2 answers

77Vetter On

Thanks for the reply, but that doesn't seem to be the issue. It seems to be caused by the boosting function I have on the search.

When I removed the boosting function, partial highlighting worked as expected. When I added it back, partial highlighting stopped working. Can you verify whether that is a bug?

Here is my boosting function:

"scoringProfiles": [
  {
    "name": "PreRiskBoost",
    "text": null,
    "functions": [
      {
        "fieldName": "PreRiskCount",
        "freshness": null,
        "interpolation": "linear",
        "magnitude": {
          "boostingRangeStart": 1,
          "boostingRangeEnd": 99,
          "constantBoostBeyondRange": true
        },
        "distance": null,
        "tag": null,
        "type": "magnitude",
        "boost": 10
      }
    ],
    "functionAggregation": "sum"
  }
],
"defaultScoringProfile": "PreRiskBoost"

Do you know why having the boosting function prevents partial highlighting from working?

Nate Ko On

Thanks for reporting the issue.

Unfortunately, it is a known limitation in Azure Search that matches are sometimes not highlighted for broad wildcard searches. Highlighting is an independent process that runs after the search. Once the matching documents are retrieved, the highlighter looks up all terms in the search index that match the wildcard criteria and uses those terms to highlight the retrieved documents. For broad wildcard queries, like a* (or the*), the highlighter uses only the top N most significant terms, based on their frequencies in the corpus, for performance reasons. In your example, 'they' and 'there' are probably not included in the highlights because they appear in most documents.

As this is a limitation of wildcard queries, one workaround is to preprocess the index so that you avoid issuing wildcard/prefix queries. Please take a look at custom analysis (https://learn.microsoft.com/en-us/rest/api/searchservice/custom-analyzers-in-azure-search). You can, for example, use the edgeNGram token filter to store prefixes of words in the index, and then issue a regular term query with the prefix (without the '*' operator).
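As a rough sketch of that workaround, an index definition fragment along these lines could apply an edge n-gram analyzer at indexing time only. The names `prefixAnalyzer` and `prefixEdgeNGram` and the `minGram`/`maxGram` values are illustrative assumptions. The field name `RESULT` is taken from the question, and the rest of the index definition is omitted.

```json
{
  "fields": [
    {
      "name": "RESULT",
      "type": "Edm.String",
      "searchable": true,
      "indexAnalyzer": "prefixAnalyzer",
      "searchAnalyzer": "standard.lucene"
    }
  ],
  "analyzers": [
    {
      "name": "prefixAnalyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "standard_v2",
      "tokenFilters": ["lowercase", "prefixEdgeNGram"]
    }
  ],
  "tokenFilters": [
    {
      "name": "prefixEdgeNGram",
      "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
      "minGram": 2,
      "maxGram": 20,
      "side": "front"
    }
  ]
}
```

With a setup like this, the query would use the plain term (for example the instead of the*), since the prefixes are already stored as terms in the index. Note that changing the analyzer on an existing field generally means rebuilding the index.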

Hope this helps. Please let me know if you have any further questions.

Nate