Azure Search: partial word search doesn't work


Searching for part of a phrase returns results in a strange order. For example, given these two documents:

{
    "@search.score": 0.5696786,
    "Guid": "ce73ca06-f170-46df-b0ef-a6e6e72b76ce",
    "FirstName": "Ruy",
    "LastName": "Bssaf",
    "Phone": "560523791699",
    "CustomerId": "-1",
    "CustomerEmail": "guy@twingocoil",
    "MySuperpharm": "True"
},
{
    "@search.score": 0.5619051,
    "Guid": "090c623f-5993-458e-93cc-8ef3d885eb29",
    "FirstName": "ruy",
    "LastName": "reffen",
    "Phone": "0522545833",
    "CustomerId": "76016443160",
    "CustomerEmail": "guy@geffenmedicalcom",
    "MySuperpharm": "False"
},

Searching for "guy@twingoco" returns the second document before the first, although one would clearly expect the first to rank higher, since its "CustomerEmail" field is almost identical to the search term.

The search is run in the portal, with no extra parameters besides the search term. When searching for the full email address, the expected result does come first.

Please do not focus on this specific case of an email address; I'm asking in general how to make the search also take partial phrases into account.


There is 1 answer

PartlyCloudy On

This issue has to do with how Lucene handles email addresses. Azure Search uses the standard Lucene analyzer as its default analyzer: https://lucene.apache.org/core/5_2_0/core/org/apache/lucene/analysis/Analyzer.html

The standard Lucene analyzer treats an email address as a single token, which is why a partial search will not produce a hit on it. (Similarly, if you search for "car" you will not get a hit for "careful", even though it is a prefix.) This issue is explained further here: Querying email addresses indexed by lucene
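To make the "car" versus "careful" point concrete, here is a toy sketch of whole-token matching. It is not Lucene or Azure Search code: the index is a plain Python dictionary, the tokenizer simply splits on whitespace, and `build_index` and `search` are hypothetical helpers invented for this illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Map each whole lowercase token to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive whitespace tokenization, just for illustration.
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, term):
    """Exact token lookup: a term matches only if it equals an indexed token."""
    return sorted(index.get(term.lower(), set()))

docs = {1: "Drive careful", 2: "My car is red"}
index = build_index(docs)

print(search(index, "car"))   # finds document 2 only; "careful" is a different token
print(search(index, "care"))  # finds nothing; no document has the token "care"
```

Because lookups compare complete tokens, a fragment of a token never matches unless that fragment was itself indexed.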

The good news is that you can create a custom tokenizer to address this issue. Check the accepted answer to Using Lucene to search for email addresses for one way to implement such a tokenizer, and see this Azure Search documentation on how to use custom analyzers: https://azure.microsoft.com/en-gb/blog/custom-analyzers-in-azure-search
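One common way a custom analyzer enables partial matching is to index every prefix of each token (edge n-grams). The sketch below only illustrates that idea in plain Python; it is not the tokenizer from the linked answer and not Azure Search code. The helpers `edge_ngrams` and `build_prefix_index`, the gram sizes, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import defaultdict

def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the prefixes of `token` whose length is between min_gram and max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]

def build_prefix_index(docs, min_gram=2, max_gram=20):
    """Index every edge n-gram of every token instead of whole tokens only."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: whitespace tokenization keeps each email as one token.
        for token in text.lower().split():
            for gram in edge_ngrams(token, min_gram, max_gram):
                index[gram].add(doc_id)
    return index

# Email values taken from the two documents in the question.
docs = {"first": "guy@twingocoil", "second": "guy@geffenmedicalcom"}
index = build_prefix_index(docs)

# The partial term is now an indexed prefix of the first document only.
print(sorted(index.get("guy@twingoco", set())))
# A shorter prefix shared by both emails matches both documents.
print(sorted(index.get("guy@", set())))
```

Note that the query term is looked up as-is and is not split into n-grams itself. In a real index this usually means applying the prefix-generating analyzer at indexing time only and a simpler analyzer at query time.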

Good luck!