I'm looking for an Algorithm (Preferably with a java implementation) for merging Strings.
my problem is as following :
suppose I have an Array/List of Strings {"myString1" , "my String1" , "my-String-1" ... } I'd like the algorithm to point out that there is a very high probability that all of these values denote the "myString1".
so I would like to compact my list. maybe this can be done with KMP or maybe there is something more suitable.
Thanks.
I think that Edit distance is good heuristic for merging strings.
EDIT:
You can modify the edit distance algorithm:
You can give different value for d(-,c) for character c.
So in the following example: "String1","String2", you can "punish" the score but letting d(1,2) be high, in contrast to "String 1","String1" that won't be punished because the score will be d(-,' ').