CloudSearch fuzzy matching of whole string doesn't work

I have set up an Amazon CloudSearch domain with records that hold addresses. I want to do a fuzzy text search on an address field.

Say I have a record with the following address:

1600 Amphitheatre Parkway, Mountain View, CA 94043.

If I search for 'Amphitheatre Parkway, Muntain View'~5 I get no results. All I did was delete the 'o' in "Mountain".

If I search for Muntain~5 it finds the record, but if I search for Miunntain~5 it finds nothing.

I should add that I created a free-text Analysis Scheme with no stemming, stopwords or synonyms, and applied it to the address field, which is of type text.

How should I set up CloudSearch to be able to do this sort of query?

Best answer, by alexroussos:
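  1. Querying 'Amphitheatre Parkway, Muntain View'~5 is actually performing a sloppy phrase search: because the ~5 follows a quoted phrase, it searches for those words within 5 words of one another. The words themselves still have to match, which is why the misspelled "Muntain" finds nothing. I don't think that's what you intended.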

  2. The Miunntain~5 query is really interesting: it does indeed return no results, but miunntain~5 (lowercase m) does.

    I did notice that switching between lowercase and uppercase in my queries slightly affects the match scores, so perhaps the capital M just makes it too weak a match. I don't have a good explanation for that; it's certainly counterintuitive, so maybe it is a bug.

  3. Finally, your actual question about setting up CloudSearch to handle those queries: unfortunately CloudSearch doesn't expose the "Did you mean..." spellcheck feature from Solr, so there isn't really a good way to do this; slapping some tildes on things is about the best you can do.
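
To make "slapping some tildes on things" concrete, here is a rough Python sketch that turns a whole address string into a per-word fuzzy query for the simple query parser. The helper names (fuzzy_query, search_kwargs), the default edit distance of 1, the choice to leave numeric tokens exact, and the field name "address" are all my assumptions, not anything CloudSearch prescribes. It lowercases every word, given the uppercase/lowercase oddity in point 2.

```python
import json
import re


def fuzzy_query(text, distance=1):
    """Build a simple-query-parser string with ~distance on each word.

    Hypothetical helper: splits on anything that is not a letter or digit
    and lowercases each word.
    """
    terms = re.findall(r"[A-Za-z0-9]+", text.lower())
    parts = []
    for term in terms:
        # Assumption: purely numeric tokens (house numbers, ZIP codes)
        # should match exactly, so they get no fuzzy operator.
        parts.append(term if term.isdigit() else f"{term}~{distance}")
    return " ".join(parts)


def search_kwargs(text, field="address", distance=1):
    """Keyword arguments intended for a cloudsearchdomain search() call.

    The field name "address" is an assumption about the index layout.
    """
    return {
        "query": fuzzy_query(text, distance),
        "queryParser": "simple",
        "queryOptions": json.dumps(
            {"fields": [field], "defaultOperator": "and"}
        ),
    }


if __name__ == "__main__":
    print(fuzzy_query("Amphitheatre Parkway, Muntain View"))
    # prints: amphitheatre~1 parkway~1 muntain~1 view~1
```

If you use boto3, the dictionary should be usable as client.search(**search_kwargs("Amphitheatre Parkway, Muntain View")) on a client created with boto3.client("cloudsearchdomain", endpoint_url=...) pointing at your domain's search endpoint. Each word is then matched fuzzily on its own, rather than the whole string being treated as a sloppy phrase.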

See http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-text.html