Excel word matching

2.8k views Asked by At

Hi there I'm hoping someone here can help me out.

I've got some pretty untidy data and I am looking to do a VLOOKUP / INDEX MATCH sort of function but having no luck as the data is quite untidy.

So basically I've got a list of names and there websites in one sheet.

Then a second sheet with just a list of names that I want to populate with the websites from the other sheet as to cut down the work load.

The problem is the company names are not entered in the correct format (i.e for an abbreviated company spaces are used instead of .'s). Another example is when there is multiple branches of a company.

So worksheet 1 would be

Company Name: Company
Website: www.company.com

Worksheet 2 would be: Company Name : Company (UK) Company Name : Company (USA)

Where both should have the same website as found in worksheet 1 is there any possible way to do this. I have tried VLOOKUP and INDEX/MATCH and set the condition to TRUE but it does not return good results.

Thanks in advance for any advice.

3

There are 3 answers

0
Trum On

Fuzzy matching is a far from an exact science, particularly when it comes to the built in Excel functions.

If I was to recommend the safest bet - I would say create a de-duped list of the values in sheet 2 and create a matching lookup value for sheet one. This would certainly be the most accurate (but time consuming and depending on how many times you will use these values)

Alternatively, there are 'fuzzy matches' from various sources that can be used. One of the famous ones is the 'Jaro Winkler Distance' : a version of which can be found at this link: http://garonfolo.dk/herbert/2013/07/excel-vba-jaro-winkler-distance-fuzzy-matching/

0
Kalev Maricq On

Without seeing the data it's impossible to say for sure, but here are some options.
1. You could use MATCH("*Company*",Sheets1!A:A,0).
2. If there is always the same pattern, you could create a helper column using LEFT, RIGHT, and MID to extract the company name.
3. VBA offers some support for regular expressions, using Set reg = CreateObject("VBScript.RegExp") If you know how to use regular expressions, you could write a macro to match based on it.

0
AutomotiveIndustrialEngineer On

To expand on the answer by Trum because the link he posted is no longer available, I have translated the JaroWinklerProximity algorithm found in this C# implementation https://stackoverflow.com/a/19165108/8031589 by user leebickmtu to VBA for Excel:

Option Base 1

Function JaroWinklerProximity(String1 As Range, String2 As Range) As Double

Dim mWeightThreshold As Double
mWeightThreshold = 0.7
Dim mNumChars As Integer
mNumChars = 4
Dim aString1 As String
aString1 = LCase(String1.Text)
Dim aString2 As String
aString2 = LCase(String2.Text)
Dim lLen1 As Integer
lLen1 = Len(aString1)
Dim lLen2 As Integer
lLen2 = Len(aString2)

If lLen1 = 0 Then
    If lLen2 = 0 Then
        JaroWinklerProximity = 1
        Exit Function
    Else
        JaroWinklerProximity = 0
        Exit Function
    End If
End If
    
Dim lSearchRange As Integer
lSearchRange = WorksheetFunction.Max(1, WorksheetFunction.Max(lLen1, lLen2) / 2)

ReDim lMatched1(lLen1) As Boolean
ReDim lMatched2(lLen2) As Boolean
Dim lNumCommon As Integer
lNumCommon = 0

Dim i As Integer
For i = 1 To lLen1 Step 1
    Dim lStart As Integer
    lStart = WorksheetFunction.Max(1, i - lSearchRange)
    Dim lEnd As Integer
    lEnd = WorksheetFunction.Min(i + lSearchRange, lLen2)

    Dim j As Integer
    For j = lStart To lEnd - 1 Step 1
        If lMatched2(j) Then
            GoTo NextIteration1
        End If
        Dim charAtIndex1 As String
        charAtIndex1 = Mid(aString1, i, 1)
        Dim charAtIndex2 As String
        charAtIndex2 = Mid(aString2, j, 1)
        If charAtIndex1 <> charAtIndex2 Then
            GoTo NextIteration1
        End If
        lMatched1(i) = True
        lMatched2(j) = True
        lNumCommon = lNumCommon + 1
        Exit For
NextIteration1:
        Next j
Next i

If lNumCommon = 0 Then
    JaroWinklerProximity = 0
    Exit Function
End If

Dim lNumHalfTransposed As Integer
lNumHalfTransposed = 0
Dim k As Integer
k = 1
For i = 1 To lLen1 Step 1
    If Not lMatched1(i) Then
        GoTo NextIteration2
    End If
    
    Do While Not lMatched2(k)
        k = k + 1
    Loop

    If Mid(aString1, i, 1) <> Mid(aString2, j, 1) Then
        lNumHalfTransposed = lNumHalfTransposed + 1
    End If
    
    k = k + 1
NextIteration2:
Next

Dim lNumTransposed As Integer
lNumTransposed = lNumHalfTransposed / 2
Dim lNumCommonD As Double
lNumCommonD = lNumCommon
Dim lWeight As Double
lWeight = (lNumCommonD / lLen1 + lNumCommonD / lLen2 + (lNumCommon - lNumTransposed) / lNumCommonD) / 3
If lWeight <= mWeightThreshold Then
    JaroWinklerProximity = lWeight
    Exit Function
End If
Dim lMax As Integer
lMax = WorksheetFunction.Min(mNumChars, WorksheetFunction.Min(Len(aString1), Len(aString2)))
Dim lPos As Integer
lPos = 1

Do While lPos < lMax And Mid(aString1, lPos, 1) = Mid(aString2, lPos, 1)
    lPos = lPos + 1
Loop

If lPos = 1 Then
    JaroWinklerProximity = lWeight
    Exit Function
End If
JaroWinklerProximity = lWeight + 0.1 * lPos * (1# - lWeight)

End Function