Is there a built in function, or a simple UDF that can identify the pattern in the information below and remove the duplicates?
Assume the following is all within a single excel cell:
80154, 80299, 80299, 82055, 82145, 82205, 82520, 82570, 83840, 83925, 83925, 83986, 83992, 84315, 80154, 80299, 80299, 82055, 82145, 82205, 82520, 82570, 83840, 83925, 83925, 83986, 83992, 84315
There are two sets of data (starts with 80154 ends with 84315). I want to end up with only one set, but I want to do it to 50,000 lines. The final output should be just the BOLD text. Also, sometimes the data repeats itself 3 times, again, I just want the unique set of data.
NOTE: I can't just remove duplicates, because sometimes there will be duplicates in the set that I need to capture in the final output. For example, (A,A,B,C,A,A,B,C) needs to be reduced to (A,A,B,C).
This finds where the first 20% is repeated and cuts the string at that point.
IF it does not find a duplicate it will return the whole string.
Play with the
5till you find the proper length of string that will get you the correct answer on all your strings. The higher the number the smaller the string it compares.Also if it is cutting off too much or not enough, like leaving the
,at the end adjust the-3up and down.