How to handle noise words (NN) in Full text search

1.2k views Asked by At

We are using Full-text search to check postcodes and addresses, we have found that NNx, where x is a number, is considered a noise word. We know the issue but can't think of a clean solution, besides checking if the search word contains NN and then using LIKE instead.

We are using this code.

SELECT  DISTINCT
    *
    FROM    CONTAINSTABLE(dbo.AddressData, Data, N'NN1', 1000) C
        INNER JOIN dbo.AddressData AD ON AD.Id = C.[KEY]
        INNER JOIN dbo.Address A ON A.Id = AD.AddressId

Does anyone have any ideas on how to escape or stop full text from seeing the NN as a noise word?

UPDATE: We added an if that checks for NN in the input string, and if NN is there we just use Like instead but this has a big effect on performance. went from less than a seconds to over 5 :/ any one have a better solution?

2

There are 2 answers

0
Alex from Jitbit On

This is a bug (?) in MS SQL server. "NN123" is how numbers are stored in the index. See this post at dba.stackexchange for more details:

https://dba.stackexchange.com/questions/316848/full-text-search-wont-find-post-codes-nnx-where-x-is-a-digit/316849#316849

0
Tim van Lint On

You are using SQL server right?

From SQL Server 2005 noise words have been replaced by stopwords. You can edit the "stoplists" so that NNx is not a stopword anymore.

https://learn.microsoft.com/en-us/sql/relational-databases/search/configure-and-manage-stopwords-and-stoplists-for-full-text-search?view=sql-server-2017