This is a question for the Azure Cognitive Search team.
We are constantly running into issues with the hit-highlighting mechanism in Azure Cognitive Search. The maximum size of a highlight is limited to 1,000 characters, and this limit cannot be increased through API parameters.
The problem is that we fairly often see highlights with no keywords highlighted in them at all. These "highlights" are exactly 1,000 characters long, so it is very likely they were simply cropped to fit the 1,000-character limit. There is not much point in showing our users a highlight if the hits are not actually highlighted.
What is the point of trimming the highlight without any logic behind it? Sometimes the highlight is even cropped right in the middle of a match. In other words, the highlight ends with the text ' ... some highlighted text [match]keyword[/ma'. As you can see, the closing tag was cropped, and we get '[/ma' instead of '[/match]'.
How do you expect anybody to use this? Is there any workaround?
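Until the service-side behavior changes, one possible client-side mitigation is to post-process each highlight before displaying it: drop a trailing piece of a cropped tag, re-close a match that was left open, and skip highlights that contain no match at all. This is a minimal sketch under stated assumptions, not something the service provides. `repair_highlight` and `_strip_partial_tag` are hypothetical helpers, and the `[match]`/`[/match]` defaults mirror the tags in the question (presumably set through the `highlightPreTag`/`highlightPostTag` query parameters). Stripping a trailing partial tag is a heuristic: a fragment that legitimately ends in `[` would also lose that character.

```python
from typing import Optional, Sequence


def _strip_partial_tag(text: str, tags: Sequence[str]) -> str:
    """Drop a trailing, incomplete prefix of any tag, e.g. '[/ma'."""
    longest = 0
    for tag in tags:
        # Proper prefixes only, so a complete tag at the end is kept.
        for k in range(len(tag) - 1, 0, -1):
            if text.endswith(tag[:k]):
                longest = max(longest, k)
                break
    return text[: len(text) - longest]


def repair_highlight(
    fragment: str, pre_tag: str = "[match]", post_tag: str = "[/match]"
) -> Optional[str]:
    """Hypothetical client-side clean-up for a cropped highlight.

    Returns the repaired fragment, or None if it contains no highlighted hit.
    """
    text = _strip_partial_tag(fragment, (pre_tag, post_tag))
    if pre_tag not in text:
        return None  # nothing is actually highlighted, so don't show it
    # A crop in the middle of a hit leaves an opening tag without its close.
    if text.count(pre_tag) > text.count(post_tag):
        text += post_tag
    return text


# prints " ... some highlighted text [match]keyword[/match]"
print(repair_highlight(" ... some highlighted text [match]keyword[/ma"))
```

Returning `None` lets the caller fall back to, for example, the start of the field instead of showing a highlight with no hits in it.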
I am an engineer on the Azure Cognitive Search team. We are aware of these edge cases in highlight trimming and apologize for the negative impact on your use case. This is a recent change, intended as a stop-gap measure against service stability issues caused by highlighting extremely large fragments.
We are working on upgrading the hit-highlighting experience overall, and it will be available to customers from July 15, 2020. More details can be found here. However, the new experience is only enabled for services created after that date. For older services, the only workaround at the moment is to pre-process the field text so that each sentence (the highlighting boundary) is shorter than 1,000 characters.
Feel free to reach out to the PG at [email protected] with more details about your scenario, and we will try our best to alleviate your issues.