Find Calulated Match on Existing Data Using Levensthein Method

51 views Asked by At

I currently have a function that performs a Levensthein comparison between two values. I am interested in using this type of function in two ways.

Option 1: I would like to first go through an existing database field (Product Description) and compare all of the existing products against each other and return a calculated percent value that indicates the likelihood that a similar product already exists. The results could look something like this:

[product1]   [percentmatch]   [product2]
'Large Cup'      66.66%        'Lg Cup'

I would assume that my select statement would then loop to the next product in the table and perform the same comparison to find a similar product. Therefore, eventually I would see a duplicate result in the reverse order from what is listed above.

Option 2: I would also like to use a script to search the database for similar products when I provide a variable to search for and have the same results be shown for any product that exceeds a 50% match for example.

[My New Variable]     [percentmatch]      [existingproductname]
X-Large Cup                81.81%              Large Cup
X-Large Cup                54.54%              Lg Cup

Thank you for your help with this in advance. I look forward to seeing how I can do this.

Thanks, D

1

There are 1 answers

0
dnoeth On

You probably want to calculate percentage based on the number of edit operations vs. length of the 1st string:

100 * (1 - CAST(EDITDISTANCE(product1,product2) AS FLOAT) / CHAR_LENGTH(product1))

Option #1 will be a CROSS JOIN:

select product1
   ,100 * (1 - CAST(EDITDISTANCE(product1,product2) AS FLOAT) / CHAR_LENGTH(product1)) as percentmatch
   ,product2
from tab as t1
cross join t2
where product1 < product2  -- don't compare product with itself and don't compare two products twice
order by percent match desc

Option #2:

select 'X-Large Cup'
   ,100 * (1 - CAST(EDITDISTANCE('X-Large Cup',product) AS FLOAT) / CHAR_LENGTH('X-Large Cup')) as percentmatch
   ,product
from tab
order by percent match desc