I have set up an Amazon CloudSearch domain with records that hold addresses. I want to do a fuzzy text search on an address field.
Say I have a record with the following address:
1600 Amphitheatre Parkway, Mountain View, CA 94043.
If I search for 'Amphitheatre Parkway, Muntain View'~5
I get no results. I basically deleted the 'o' in "Mountain" and it doesn't find any results.
If I search for Muntain~5
it finds it, but again if I search for Miunntain~5
it doesn't find anything.
I should add I created a free text Analysis Scheme, with no stemming, stopwords or synonyms. This is what is used for the address field which is of type text.
How should I set up CloudSearch to be able to do this sort of query?
Querying
'Amphitheatre Parkway, Muntain View'~5
is actually performing a sloppy phrase search: it looks for those words within 5 words of one another, not for misspelled variants of them. I don't think that's what you intended.
The
Miunntain~5
query is really interesting: it does indeed return no results, but
miunntain~5
(lowercase m) does. I did notice that switching between lowercase and uppercase in my queries slightly affects the match scores, so perhaps the capital M just makes it too weak a match. I don't have a good explanation for that; it's certainly counterintuitive, so maybe it is a bug.
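One possible way to reason about the uppercase/lowercase difference, offered as a guess rather than a confirmed explanation: if the fuzzy term is compared exactly as typed against a lowercased indexed token (an assumption about CloudSearch's behavior, not something verified here), then the capital M costs one extra edit. The sketch below computes a plain case-sensitive edit distance between each query term and the indexed word "mountain"; the function name levenshtein is just an illustrative helper.

```python
def levenshtein(a: str, b: str) -> int:
    """Case-sensitive edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

# Assumption: the index holds the lowercased token "mountain".
for term in ("muntain", "miunntain", "Miunntain"):
    print(term, levenshtein(term, "mountain"))
# muntain 1
# miunntain 2
# Miunntain 3
```

Lucene's fuzzy queries allow at most 2 edits, so if CloudSearch inherits that cap and does not lowercase fuzzy terms, a distance of 3 for Miunntain would be consistent with the empty result. That remains a hypothesis.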
Finally, your actual question about setting up CloudSearch to handle those queries: unfortunately, CloudSearch doesn't expose the "Did you mean..." spellcheck feature from Solr, so there isn't really a good way to do this; slapping some tildes on things is about the best you can do.
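As a rough sketch of adding a tilde to each term, the helper below turns free-form address text into a query string for the simple query parser, where every word gets its own ~N fuzzy operator instead of one ~N on the whole quoted phrase. The name fuzzy_query, the default distance of 1, and the choice to lowercase and strip punctuation are assumptions for illustration, not anything CloudSearch requires.

```python
import re

def fuzzy_query(text: str, distance: int = 1) -> str:
    """Append ~distance to every word so each term is fuzzy-matched on its own."""
    # Lowercasing is an assumption, based on the lowercase query matching
    # where the capitalized one did not.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(f"{w}~{distance}" for w in words)

print(fuzzy_query("Amphitheatre Parkway, Muntain View"))
# amphitheatre~1 parkway~1 muntain~1 view~1
```

The resulting string would be passed as the q parameter of a search request. Making every term fuzzy can also pull in unrelated near-matches, so it may be worth keeping the distance small.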
See http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-text.html