Identify Indian names in a given string of combined name tokens


I have a set of name tokens, and data in which several name tokens have been run together into a single string. For example, if a name has three tokens, "abc def ghi", and I am given the combined string "abcdef" or "abcdefghi", I would like to identify the valid tokens within that string. Can we build a dictionary of name tokens and use NLP techniques or Python libraries to achieve this? Please suggest how to get started.


There is 1 answer

jaaq On

If you only need to check which tokens occur as substrings of the combined name, a list of tokens and a loop are enough:

tokens = ['abc', 'def', 'ghi']
name = 'abcdef'
for token in tokens:
    if token in name:
        print(token, 'is part of', name)
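The loop above reports every token that occurs somewhere in the name, but it does not tell you whether the whole string splits cleanly into known tokens. One way to check that is a dictionary-based split using dynamic programming. This is only a rough sketch: the function name `split_name` is made up for illustration, it assumes exact matches against your token list, and it returns the first split it finds rather than ranking alternatives.

```python
def split_name(name, tokens):
    """Return one list of known tokens that concatenates to `name`, or None."""
    vocab = set(tokens)
    max_len = max(map(len, vocab), default=0)
    # best[i] holds a token list covering name[:i], or None if no split exists
    best = [None] * (len(name) + 1)
    best[0] = []
    for end in range(1, len(name) + 1):
        # only look back as far as the longest known token
        for start in range(max(0, end - max_len), end):
            piece = name[start:end]
            if best[start] is not None and piece in vocab:
                best[end] = best[start] + [piece]
                break
    return best[len(name)]


tokens = ['abc', 'def', 'ghi']
print(split_name('abcdef', tokens))     # ['abc', 'def']
print(split_name('abcdefghi', tokens))  # ['abc', 'def', 'ghi']
print(split_name('abcxyz', tokens))     # None
```

With real names you will probably hit ambiguous splits (one token being a prefix of another), so you may want to collect all valid splits and pick between them, for example by token frequency.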

If you also need the position of the substring within the string, Python's built-in str.find() returns its starting index, or -1 if the substring is absent.
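For the position lookup, something along these lines should work. The helper name `token_positions` is hypothetical, and it only records the first occurrence of each token, since that is what `str.find` returns.

```python
def token_positions(name, tokens):
    """Map each token found in `name` to the index of its first occurrence."""
    positions = {}
    for token in tokens:
        index = name.find(token)  # str.find returns -1 when the token is absent
        if index != -1:
            positions[token] = index
    return positions


tokens = ['abc', 'def', 'ghi']
print(token_positions('abcdef', tokens))  # {'abc': 0, 'def': 3}
```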