My current project needs to improve the data quality of our customers' details.
One issue is that customers' names are captured in separate input fields for first name, middle names and surname; however, in many cases each part of the name was entered incorrectly.
We need to clean up the data we hold.
This data quality issue affects our correspondence with customers: because we do not know which parts are their first name, middle names and surname, we offend some customers by using an inappropriate salutation.
We need a named entity recognition library that can not only detect PERSON names, but also detect first names, middle names and surnames.
What makes this data quality task harder is that we have almost 100 million customers and our customer base is worldwide, so we need to be able to identify not only first names, middle names and surnames but also other structures, e.g. given name, patronymic, and different orders of parts. What will help is that we also know each customer's nationality.
Does a named entity recognition library exist specifically for person name PARTS?
I realise that a "perfect" solution is impossible; however, I am sure I can improve the data quality we currently have.
I only mentioned first, middle and surnames as that is the name structure I am most familiar with; however, I do understand that the following are examples of what I am facing:
In many parts of the world, parts of names are derived from titles, locations, genealogical information, caste, religious references, and so on. Here are a few examples:
the Indian name Kogaddu Birappa Timappa Nair follows the order villageName-fathersName-givenName-lastName.
the Rajasthani name Aditya Pratap Singh Chauhan is composed of givenName-fathersName-surname-casteName.
in another part of India the name Madurai Mani Iyer represents townName-givenName-casteName.
the Arabic Abu Karim Muhammad al-Jamil ibn Nidal ibn Abdulaziz al-Filistini translates as "Father of Karim, Muhammad (given name), The beautiful, Son of Nidal, Son of Abdulaziz, the Palestinian". Karim is Muhammad's first-born son.
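To make the output I am after concrete, here is a minimal Python sketch that labels the parts of a name once its structure is known. The template keys, `NAME_TEMPLATES` and `label_parts` are names I made up for illustration, and the templates are copied only from the examples above; choosing the right template for a given customer (for instance from nationality) is exactly the part I do not know how to solve.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical templates, taken only from the examples above. Real data would
# need many more patterns, and a single name may fit more than one of them.
NAME_TEMPLATES: Dict[str, List[str]] = {
    "village-father-given-last": ["villageName", "fathersName", "givenName", "lastName"],
    "given-father-surname-caste": ["givenName", "fathersName", "surname", "casteName"],
    "town-given-caste": ["townName", "givenName", "casteName"],
}


def label_parts(full_name: str, template_key: str) -> Optional[List[Tuple[str, str]]]:
    """Pair each whitespace-separated token with a part label.

    Returns None when the number of tokens does not match the template, so
    the record can be sent for manual review instead of being guessed at.
    """
    template = NAME_TEMPLATES[template_key]
    tokens = full_name.split()
    if len(tokens) != len(template):
        return None
    return list(zip(template, tokens))
```

For example, `label_parts("Madurai Mani Iyer", "town-given-caste")` returns `[("townName", "Madurai"), ("givenName", "Mani"), ("casteName", "Iyer")]`, while the same name with the four-part template returns `None`.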
There is a simple, universal solution that companies seem surprisingly unwilling to apply:
Include a salutation if, and only if, the communication really is from a human being who is preparing that communication specifically for the recipient. In that case, part of paying attention to the recipient is writing a correct salutation taking into account the recipient's culture.
If you are computer-generating a communication using names from a database, be honest about what you are doing. Simply show the name as it was supplied to you on whatever form it came from. Do not attempt to use it to construct a formal salutation. Do not change it in any way. Communications that are obviously computer-generated but that try to feign individual attention just look silly, even if they are not sufficiently incorrect to cause actual annoyance.
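As a rough sketch of that rule in Python, assuming the name is stored as one string exactly as the customer entered it (the function and parameter names here are hypothetical):

```python
def render_letter(name_as_supplied: str, body: str) -> str:
    """Build a computer-generated letter without inventing a salutation."""
    # The name is emitted verbatim: no re-casing, no reordering, no title,
    # and no "Dear ..." line assembled from guessed name parts.
    return f"{name_as_supplied}\n\n{body}"
```

The point of keeping the function this small is that there is nothing in it that can get a culture's naming conventions wrong.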