I am using Lucene's Highlighter class to highlight fragments of matched search results and it works well. I would like to switch from searching with the StandardAnalyzer to the EnglishAnalyzer, which will perform stemming of terms.
The search results are good, but now the highlighter doesn't always find a match. Here's an example of what I'm looking at:
document field text 1: Everyone likes goats.
document field text 2: I have a goat that eats everything.
Using the EnglishAnalyzer and searching for "goat", both documents are matched, but the highlighter is only able to find a matched fragment from document 2. Is there a way to have the highlighter return data for both documents?
I understand that the characters are different for the tokens, but the same tokens are still there, so it seems reasonable for it to just highlight whatever token is present at that location.
If it helps, this is using Lucene 3.5.
I found a solution to this problem. I changed from using the
Highlighter
class to using theFastVectorHighlighter
. It looks like I'll pick up some speed improvements too (at the expense of storage of term vector data). For the benefit of anyone coming across this question later, here's a unit test showing how this all works together: