Can anyone improve on the below Fuzzyfind function for VBA?

1.4k views Asked by At

This function lets you find similar strings from a range without having to do an exact search.

The formula looks like this: =FuzzyFind(A1,B$1:B$20) assuming the string you are performing a search for is in A1 and your reference or options table is B1:B20

The code is here:

Function FuzzyFind(lookup_value As String, tbl_array As Range) As String
Dim i As Integer, str As String, Value As String
Dim a As Integer, b As Integer, cell As Variant
For Each cell In tbl_array
  str = cell
  For i = 1 To Len(lookup_value)
    If InStr(cell, Mid(lookup_value, i, 1)) > 0 Then
      a = a + 1
      cell = Mid(cell, 1, InStr(cell, Mid(lookup_value, i, 1)) - 1) & Mid(cell, InStr(cell, Mid(lookup_value, i, 1)) + 1, 9999)
    End If
  Next i
  a = a - Len(cell)
  If a > b Then
    b = a
    Value = str
  End If
  a = 0
Next cell
FuzzyFind = Value
End Function

The results from this function are hit and miss. Can anyone improve the intelligence of this algorithm?

Thank you :)

3

There are 3 answers

3
Holmes IV On BEST ANSWER

Try this out, I think it will find the best match

Function FuzzyFind2(lookup_value As String, tbl_array As Range) As String
Dim i As Integer, str As String, Value As String
Dim a As Integer, b As Integer, cell As Variant
Dim Found As Boolean

b = 0
For Each cell In tbl_array
  str = cell
  i = 1
  Found = True
  Do While Found = True
    Found = False
    If InStr(i, str, lookup_value) > 0 Then
        a = a + 1
        Found = True
        i = InStr(i, str, lookup_value) + 1
    End If
  Loop

  If a > b Then
    b = a
    Value = str
  End If
  a = 0
Next cell
FuzzyFind2 = Value
End Function
0
user1274820 On

I'm not sure exactly what "FuzzyFind" entails, but this is a VLOOKUP that uses the Levenshtein distance to find similar data.

The Levenshtein distance lets you select a "percentage match" that you can specify instead of the typical TRUE or FALSE from a normal VLOOKUP:

Usage is: DTVLookup(A1,$C$1:$C$100,1,90) where 90 is the Levenshtein Distance.

DTVLookup(Value To Find, Range to Search, Column to Return, [Percentage Match])

I typically use this when comparing names that come from different databases like:

Correct Name    Example Lookup  Percentage Match    Other Report
John S Smith    John Smith      83                  John Smith
Barb Jones      Barbara Jones   77                  Barbara Jones
Jeffrey Bridge  Jeff Bridge     79                  Jeff Bridge
Joseph Park     Joseph P. Park  79                  Joseph P. Park
Jefrey Jones    jefre jon       75                  jefre jon
Peter Bridge    peter f. bridge 80                  peter f. bridge

Here's the code:

Function DTVLookup(TheValue As Variant, TheRange As Range, TheColumn As Long, Optional PercentageMatch As Double = 100) As Variant
If TheColumn < 1 Then
    DTVLookup = CVErr(xlErrValue)
    Exit Function
End If
If TheColumn > TheRange.Columns.Count Then
    DTVLookup = CVErr(xlErrRef)
    Exit Function
End If
Dim c As Range
For Each c In TheRange.Columns(1).Cells
    If UCase(TheValue) = UCase(c) Then
        DTVLookup = c.Offset(0, TheColumn - 1)
        Exit Function
    ElseIf PercentageMatch <> 100 Then
        If Levenshtein3(UCase(TheValue), UCase(c)) >= PercentageMatch Then
            DTVLookup = c.Offset(0, TheColumn - 1)
            Exit Function
        End If
    End If
Next c
DTVLookup = CVErr(xlErrNA)
End Function

Function Levenshtein3(ByVal string1 As String, ByVal string2 As String) As Long

Dim i As Long, j As Long, string1_length As Long, string2_length As Long
Dim distance(0 To 60, 0 To 50) As Long, smStr1(1 To 60) As Long, smStr2(1 To 50) As Long
Dim min1 As Long, min2 As Long, min3 As Long, minmin As Long, MaxL As Long

string1_length = Len(string1):  string2_length = Len(string2)

distance(0, 0) = 0
For i = 1 To string1_length:    distance(i, 0) = i: smStr1(i) = Asc(LCase(Mid$(string1, i, 1))): Next
For j = 1 To string2_length:    distance(0, j) = j: smStr2(j) = Asc(LCase(Mid$(string2, j, 1))): Next
For i = 1 To string1_length
    For j = 1 To string2_length
        If smStr1(i) = smStr2(j) Then
            distance(i, j) = distance(i - 1, j - 1)
        Else
            min1 = distance(i - 1, j) + 1
            min2 = distance(i, j - 1) + 1
            min3 = distance(i - 1, j - 1) + 1
            If min2 < min1 Then
                If min2 < min3 Then minmin = min2 Else minmin = min3
            Else
                If min1 < min3 Then minmin = min1 Else minmin = min3
            End If
            distance(i, j) = minmin
        End If
    Next
Next

' Levenshtein3 will properly return a percent match (100%=exact) based on similarities and Lengths etc...
MaxL = string1_length: If string2_length > MaxL Then MaxL = string2_length
Levenshtein3 = 100 - CLng((distance(string1_length, string2_length) * 100) / MaxL)

End Function
0
Orange_guy On

I've been looking for this theme a lot and definitely Holmes IV answer is the best. I would just add a small update to compare always in uppercase. For my problems it recommended me more accurate options.

Function FuzzyFind3(lookup_value As String, tbl_array As Range) As String
Dim i As Integer, str As String, Value As String
Dim a As Integer, b As Integer, cell As Variant
Dim Found As Boolean

b = 0
For Each cell In tbl_array
  str = UCase(cell)
  i = 1
  Found = True
  Do While Found = True
    Found = False
    If InStr(i, str, UCase(lookup_value)) > 0 Then
        a = a + 1
        Found = True
        i = InStr(i, str, UCase(lookup_value)) + 1
    End If
  Loop

  If a > b Then
    b = a
    Value = str
  End If
  a = 0
Next cell
FuzzyFind3 = Value