Is it possible for Excel to recognize a pattern of comma separated numbers in a cell and remove duplicates of that pattern?

132 views Asked by At

Its an easy task for the human eye, but it will take me 2 weeks to do this if I do it manually.

I'm using a UDF that I got from this site to add the TextJoin functionality to my Excel 2011. Works great, but I've encountered a problem on a new spreadsheet that I'm working with that has nearly 50,000 lines. The problem, is that I can't just simply remove duplicates, because there are some instances where the data repeats, and I need to capture that. However, I located another column with unique data that correlates to the duplicate data. By looking at these 2nd data column I can easily tell what is a duplicate and what needs to be included.

I created a simplified spreadsheet example, because its much easier to show that explain with words.

Source Data:

Acct#   Lname,Fname Date    Data#1  Data#2
42  Doe, John   1/1/17  10001   1001
42  Doe, John   1/1/17  10001   1001
42  Doe, John   1/1/17  30003   1001
42  Doe, John   1/1/17  10001   1002
42  Doe, John   1/1/17  10001   1002
42  Doe, John   1/1/17  30003   1002
70  Smith, Jane 2/1/17  10001   2001
70  Smith, Jane 2/1/17  20002   2001
70  Smith, Jane 2/1/17  30003   2001
70  Smith, Jane 2/1/17  10001   2002
70  Smith, Jane 2/1/17  20002   2002
70  Smith, Jane 2/1/17  30003   2002
70  Smith, Jane 2/1/17  10001   2003
70  Smith, Jane 2/1/17  20002   2003
70  Smith, Jane 2/1/17  30003   2003
93  Blow, Joe   1/1/17  10001   3001
93  Blow, Joe   1/1/17  20002   3001
93  Blow, Joe   1/1/17  30003   3001
93  Blow, Joe   1/1/17  10001   3002
93  Blow, Joe   1/1/17  20002   3002
93  Blow, Joe   1/1/17  30003   3002
177 Bryant, Kobe    2/1/17  10001   4001
177 Bryant, Kobe    2/1/17  30003   4001
177 Bryant, Kobe    2/1/17  30003   4001
177 Bryant, Kobe    2/1/17  10001   4002
177 Bryant, Kobe    2/1/17  30003   4002
177 Bryant, Kobe    2/1/17  30003   4002
177 Bryant, Kobe    2/1/17  10001   4003
177 Bryant, Kobe    2/1/17  30003   4003
177 Bryant, Kobe    2/1/17  30003   4003

OUTPUT DATA:

Acct#   (Lname, Fname)  Date    Data#1  Data#2
42  Doe, John   1/1/17  10001, 10001, 30003, 10001, 10001, 30003    
70  Smith, Jane 2/1/17  10001, 20002, 30003, 10001, 20002, 30003, 10001, 20002, 30003   
93  Blow ,Joe   1/1/17  10001, 20002, 30003, 10001, 20002, 30003    
177 Bryant, Kobe    2/1/17  10001, 30003, 30003, 10001, 30003, 30003, 10001, 30003, 30003   

I don't know how to put the excel spreadsheet in my question so here is a screenshot of what I currently have, and the desired output (I'm just focusing on the Data#1 column right now).

enter image description here

As requested, this is the VBA code I'm using to extract the Data#1:

Function TEXTJOIN(delim As String, skipblank As Boolean, arr)
    Dim d As Long
    Dim c As Long
    Dim arr2()
    Dim t As Long, y As Long
    t = -1
    y = -1
    If TypeName(arr) = "Range" Then
        arr2 = arr.Value
    Else
        arr2 = arr
    End If
    On Error Resume Next
    t = UBound(arr2, 2)
    y = UBound(arr2, 1)
    On Error GoTo 0

    If t >= 0 And y >= 0 Then
        For c = LBound(arr2, 1) To UBound(arr2, 1)
            For d = LBound(arr2, 1) To UBound(arr2, 2)
                If arr2(c, d) <> "" Or Not skipblank Then
                    TEXTJOIN = TEXTJOIN & arr2(c, d) & delim
                End If
            Next d
        Next c
    Else
        For c = LBound(arr2) To UBound(arr2)
            If arr2(c) <> "" Or Not skipblank Then
                TEXTJOIN = TEXTJOIN & arr2(c) & delim
            End If
        Next c
    End If
        TEXTJOIN = Left(TEXTJOIN, Len(TEXTJOIN) - Len(delim))
    End Function

I tried the suggested subroutine and I got this output:

enter image description here

1

There are 1 answers

10
Ethan On

Here is some code that at least works with your sample data, though I suspect how well it works will drop off severely as soon as the data changes even a little bit.

Option Explicit
Sub PatternFilter()

    Dim ws As Worksheet

    Dim index1_col As String
    Dim index2_col As String
    Dim data1_col As String
    Dim data2_col As String
    Dim lastrow As Long
    Dim lastentryrow As Long
    Dim outputline As Long
    Dim iter1 As Long
    Dim iter2 As Long
    Dim datastring As String

    Set ws = ThisWorkbook.Sheets("Sheet1")

    index1_col = "A" ' the column with the unique identifier, such as name or acct#
    index2_col = "C" ' another column which can either be unique
    data1_col = "D" ' data1 column
    data2_col = "E" ' data2 column

    lastrow = ws.Range(index1_col & ws.Rows.Count).End(xlUp).Row ' so we know where to stop
    outputline = 2 ' just an incrementor to keep putting data on unique lines

    For iter1 = 2 To lastrow Step 1
        datastring = "" ' reset our output string
        For iter2 = iter1 + 1 To lastrow Step 1 ' a for loop to find the last row in each pattern set
            If (ws.Range(index1_col & iter2).Value <> ws.Range(index1_col & iter2 + 1).Value) _
                    Or (ws.Range(index2_col & iter2).Value <> ws.Range(index2_col & iter2 + 1).Value) Then
                lastentryrow = iter2
                Exit For
            End If
        Next
        For iter2 = iter1 To lastentryrow Step 1 ' a for loop to collect all of the data1 pattern
            If ws.Range(data2_col & iter2).Value <> ws.Range(data2_col & iter2 + 1).Value Then
                datastring = datastring & "," & ws.Range(data1_col & iter2).Value
                Exit For
            End If
            If datastring = "" Then
                datastring = ws.Range(data1_col & iter2).Value
            Else
                datastring = datastring & "," & ws.Range(data1_col & iter2).Value
            End If
        Next
        ws.Range("I" & outputline).Value = datastring ' save the data1 pattern
        datastring = "" ' reset the output string
        For iter2 = iter1 To lastentryrow Step 1 ' a for loop to collect all of the data2 pattern
            If ws.Range(data2_col & iter2).Value <> ws.Range(data2_col & iter2 - 1).Value Then
                If datastring = "" Then
                    datastring = ws.Range(data2_col & iter2).Value
                Else
                    datastring = datastring & "," & ws.Range(data2_col & iter2).Value
                End If
            End If
        Next
        ws.Range("J" & outputline).Value = datastring ' save the data2 pattern
        ws.Range("G" & outputline).Value = ws.Range(index1_col & iter1).Value ' put the unique identifier with the data so we know who the data belongs to
        ws.Range("H" & outputline).Value = ws.Range(index2_col & iter1).Value
        outputline = outputline + 1 ' increment the output line to avoid overwriting existing data
        iter1 = lastentryrow ' set this to last entry line (for loop will increment to the first line of the next entry for us)
    Next

End Sub

Of course, this can be manipulated to output to wherever you want, including a new sheet, etc. Let me know if there is anything in this you don't understand or that does not work for you

*EDIT: A second index column was added per OP request