I am using Solr to do a fuzzy search (e.g., foo~2 bar~2). Highlighting allows me to highlight matching document fragments from the result set.
For example:
Result 1: <em>food</em> <em> bars</em>
Result 2: mars <em>bar</em>
and so on.
For each match highlighted in the document, I need to figure out which query term the fragment matched against, along with the offsets of that query term in the query. Something like:
Result 1: {<em>food</em> MATCHED_AGAINST foo QUERY_OFFSET 0,2} {<em> bars</em> MATCHED_AGAINST bar QUERY_OFFSET 3,5}
Result 2: mars {<em>bar</em> MATCHED_AGAINST bar QUERY_OFFSET 3,5}
Is there a way to do this in Solr?
One hack I could figure out is to use a different (unique) boost factor for each term in the query, and then retrieve the boost factor for each matched term from the debug score so as to deduce which term that score came from. For example, we can query with
foo~2^3.0 bar~2^2.0
(boost scores from foo by 3.0 and scores from bar by 2.0). From the debug score output, check the boost factors. From these it is clear that food matched with a boost factor of 3.0, and bars as well as bar matched with a boost factor of 2.0. By maintaining a lookup dictionary of which term had which boost to begin with, it is easy to figure out which terms matched.

Among the factors to consider: when a boost factor is 1.0, the Solr debug score does not print it.

Hope this helps someone.
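
For anyone who wants to try the boost trick, here is a minimal Python sketch of the bookkeeping side only. It builds the boosted fuzzy query and the lookup dictionary from boost to query term and offsets. The function names, the field name text, and the offset convention (character positions in the plain query "foo bar", inclusive end) are my own assumptions for illustration. Reading the boost values out of the debug output is not shown, since that format depends on the Solr version.

```python
from urllib.parse import urlencode


def build_boosted_query(term_boosts, fuzziness=2):
    """Build a fuzzy query with one unique boost per term.

    term_boosts: list of (term, boost) pairs, in query order.
    Returns (query_string, lookup), where lookup maps each boost to
    (term, start, end) offsets in the plain space-joined query.
    """
    parts = []
    lookup = {}
    offset = 0
    for term, boost in term_boosts:
        boost = float(boost)
        # A boost of 1.0 does not show up in the debug score, and a
        # repeated boost would make the reverse lookup ambiguous.
        if boost == 1.0 or boost in lookup:
            raise ValueError(f"boost {boost} for {term!r} must be unique and not 1.0")
        parts.append(f"{term}~{fuzziness}^{boost}")
        lookup[boost] = (term, offset, offset + len(term) - 1)
        offset += len(term) + 1  # +1 for the separating space
    return " ".join(parts), lookup


def term_for_boost(lookup, boost):
    """Map a boost seen in the debug score back to (term, start, end)."""
    return lookup.get(float(boost))


query, lookup = build_boosted_query([("foo", 3.0), ("bar", 2.0)])

# Request parameters; "text" is a hypothetical field name.
params = urlencode({
    "q": query,
    "df": "text",
    "hl": "true",
    "hl.fl": "text",
    "debugQuery": "true",
})
```

With the query from the example above, term_for_boost(lookup, 3.0) gives back foo with its offsets, and a boost that was never assigned gives None.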