Goal: execute fuzzy search, then wildcard search with those similar terms
I have a boolean query in place at the moment, shown below:
$query = new Zend_Search_Lucene_Search_Query_Boolean();
$pattern = new Zend_Search_Lucene_Index_Term("*$string*");
$subquery1 = new Zend_Search_Lucene_Search_Query_Wildcard($pattern);
$term = new Zend_Search_Lucene_Index_Term("$string");
$subquery2 = new Zend_Search_Lucene_Search_Query_Fuzzy($term);
$query->addSubquery($subquery1, null /* optional */);
$query->addSubquery($subquery2, null /* optional */);
$hits = $index->find($query);
This seems to be executing an either/or search. For example: if I search for the term
"berry"
I hit everything with "berry" anywhere in the title
berry, wild berry, strawberry, blueberry
But if I search for
"bery"
I only hit results like
berry
I'm not exactly sure how the fuzzy search is powered. Is there a way to modify my query so that I can wildcard search after the fuzzy search returns the similar terms?
I suspect that the field is not analyzed when indexed, so each title is stored as a single term.
So, with the first query, you are getting hits from the wildcard query.
*berry* matches all of the examples you've given. *bery* doesn't match any of the documents, though, since it's not actually a substring of any of them.
For the fuzzy query, terms are compared by edit distance (Damerau–Levenshtein distance). An edit distance of two is the default maximum for a match.
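The edit distances between the misspelled query and each title can be checked with a short sketch. It uses PHP's built-in levenshtein(), which computes plain Levenshtein distance rather than Damerau–Levenshtein. The two agree for these pairs, since each distance already equals the difference in length between the strings. The variable names are only illustrative.

```php
<?php
// Plain Levenshtein distance via PHP's built-in levenshtein().
// Assumption: titles are compared as whole strings, the way an
// unanalyzed field would be stored as a single term.
$needle = 'bery';
$titles = array('berry', 'wild berry', 'strawberry', 'blueberry');

$distances = array();
foreach ($titles as $title) {
    $distances[$title] = levenshtein($needle, $title);
}

// Print one line per title, e.g. "bery to berry - edit distance: 1".
foreach ($distances as $title => $distance) {
    echo "$needle to $title - edit distance: $distance\n";
}
```

Only "berry" falls within a maximum distance of two, which is consistent with the single hit seen for the "bery" search.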
bery to berry - edit distance: 1
bery to wild berry - edit distance: 6
bery to strawberry - edit distance: 6
bery to blueberry - edit distance: 5
This could be handled in part by using an analyzer, instead of indexing the entire string as a single token. Standard analyzer would split "wild berry" up into the tokens "wild" and "berry", and you could expect a fuzzy match on that.
As far as strawberry and blueberry, unless your analyzer splits apart "straw" and "berry" somehow, you could manually specify terms to split apart by incorporating a SynonymFilter into your analyzer.
Another option would be to attempt to correct the query spelling before searching, using lucene's SpellChecker.
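To make the "fuzzy first, then wildcard" idea from the question concrete, here is a minimal sketch over plain PHP strings. It does not use the Zend_Search_Lucene API at all: the function fuzzyThenSubstring() and its $maxDistance parameter are hypothetical, levenshtein() stands in for the fuzzy query, and strpos() stands in for a *term* wildcard. How you would obtain the fuzzy-matched terms from a real index is left out.

```php
<?php
// Hypothetical helper: emulates "fuzzy first, then wildcard" on an
// in-memory list of titles. Not part of Zend_Search_Lucene.
function fuzzyThenSubstring($needle, array $titles, $maxDistance = 2)
{
    // Step 1: collect whole titles (single terms, as in an unanalyzed
    // field) that lie within $maxDistance edits of the search string.
    $similar = array();
    foreach ($titles as $title) {
        if (levenshtein($needle, $title) <= $maxDistance) {
            $similar[] = $title;
        }
    }

    // Step 2: treat each similar term like *term* and keep every title
    // that contains it as a substring.
    $hits = array();
    foreach ($titles as $title) {
        foreach ($similar as $term) {
            if (strpos($title, $term) !== false) {
                $hits[] = $title;
                break; // one matching term is enough for this title
            }
        }
    }
    return $hits;
}

$titles = array('berry', 'wild berry', 'strawberry', 'blueberry', 'apple');
$hits = fuzzyThenSubstring('bery', $titles);
print_r($hits);
```

With this sample data the misspelled "bery" first resolves to "berry", and the substring pass then picks up the other berry titles while leaving "apple" out. Against a real index the same two steps would mean one wildcard subquery per fuzzy-matched term, which can get expensive if the fuzzy step matches many terms.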