Is there a quick way for excel to identify and remove duplicate series from a cell such as this?

53 views Asked by At

Is there a built in function, or a simple UDF that can identify the pattern in the information below and remove the duplicates?

Assume the following is all within a single excel cell:

80154, 80299, 80299, 82055, 82145, 82205, 82520, 82570, 83840, 83925, 83925, 83986, 83992, 84315, 80154, 80299, 80299, 82055, 82145, 82205, 82520, 82570, 83840, 83925, 83925, 83986, 83992, 84315

There are two sets of data (starts with 80154 ends with 84315). I want to end up with only one set, but I want to do it to 50,000 lines. The final output should be just the BOLD text. Also, sometimes the data repeats itself 3 times, again, I just want the unique set of data.

NOTE: I can't just remove duplicates, because sometimes there will be duplicates in the set that I need to capture in the final output. For example, (A,A,B,C,A,A,B,C) needs to be reduced to (A,A,B,C).

1

There are 1 answers

3
Scott Craner On BEST ANSWER

This finds where the first 20% is repeated and cuts the string at that point.

IF it does not find a duplicate it will return the whole string.

=IFERROR(LEFT(A1,FIND(LEFT(A1,LEN(A1)/5),A1,2)-3),A1)

Play with the 5 till you find the proper length of string that will get you the correct answer on all your strings. The higher the number the smaller the string it compares.

Also if it is cutting off too much or not enough, like leaving the , at the end adjust the -3 up and down.