I've browsed some of the questions on Stack Overflow, but can't seem to find an answer. I have imported a really large database of customer information (approximately 6 million entries) into a MySQL database, and I'm using PHP to query it. The data was not entered in a computer-friendly way. When a customer checks their details, I also need to query the database for anyone else who has the exact same physical address and inform the user.
The problem is that the same address has been entered in all kinds of ways, for example:
105 Ocean Avenue
105 Ocean Ave.
Some addresses also have extra spaces around commas, or double spaces, for example:
105 Ocean Avenue, New York
105 Ocean Avenue , New York
This makes the equals (=) operator useless. Is there an easy way to query the database for addresses that are, for example, 80% similar or more?
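You can make the comparison in PHP, for example with the similar_text or levenshtein functions. Both measure how similar two strings are: similar_text counts the matching characters and can also report the similarity as a percentage (which maps directly onto your 80% threshold), while levenshtein returns the edit distance, so a lower number means a closer match.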
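As a minimal sketch of the PHP-side comparison: the helper names normalizeAddress and addressSimilarity, and the abbreviation list, are my own assumptions for illustration, not anything built into PHP. The idea is to strip out the formatting noise first (case, stray spaces, a few abbreviations) and only then ask similar_text for a percentage.

```php
<?php
// Hypothetical helper: remove formatting differences before comparing.
function normalizeAddress(string $address): string
{
    $s = strtolower(trim($address));
    // Drop whitespace in front of commas, then collapse runs of whitespace.
    $s = preg_replace('/\s+,/', ',', $s);
    $s = preg_replace('/\s+/', ' ', $s);
    // Expand a few common abbreviations (illustrative list, not exhaustive).
    $abbreviations = [
        '/\bave\b\.?/'  => 'avenue',
        '/\bblvd\b\.?/' => 'boulevard',
        '/\brd\b\.?/'   => 'road',
    ];
    return preg_replace(array_keys($abbreviations), array_values($abbreviations), $s);
}

// Hypothetical helper: similarity of two addresses as a percentage (0 to 100).
function addressSimilarity(string $a, string $b): float
{
    // similar_text() writes the percentage into its third argument.
    similar_text(normalizeAddress($a), normalizeAddress($b), $percent);
    return $percent;
}

// Treat two addresses as the same place at 80% similarity or more.
$isMatch = addressSimilarity('105 Ocean Avenue', '105 Ocean Ave.') >= 80;
```

Both pairs from the question normalize to the same string, so they match exactly. Comparing one address against all 6 million rows this way would probably be too slow, so you would likely want to narrow the candidates in SQL first (for example by postcode or city, if you have those columns) and only score that smaller set in PHP.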
Alternatively, you can use MySQL's natural language full-text search (MATCH ... AGAINST), which requires a FULLTEXT index on the address column.
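A rough sketch of the full-text route from PHP, assuming a table called customers with id and address columns and a FULLTEXT index on address (those names, and the buildAddressSearch helper, are made up for the example). It only builds the query and its parameters; running it needs your own PDO connection.

```php
<?php
// Hypothetical helper: build a natural-language full-text query for MySQL.
// Assumes a `customers` table with a FULLTEXT index on `address`.
function buildAddressSearch(string $address, int $limit = 20): array
{
    // Two distinct placeholders, because PDO does not reliably allow
    // reusing the same named parameter when emulated prepares are off.
    $sql = 'SELECT id, address,
                   MATCH(address) AGAINST (:q1 IN NATURAL LANGUAGE MODE) AS score
            FROM customers
            WHERE MATCH(address) AGAINST (:q2 IN NATURAL LANGUAGE MODE)
            ORDER BY score DESC
            LIMIT ' . max(1, $limit);

    return [$sql, [':q1' => $address, ':q2' => $address]];
}

// Usage with an existing PDO connection ($pdo is assumed, not created here):
// [$sql, $params] = buildAddressSearch('105 Ocean Avenue, New York');
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

Note that the score is a relevance ranking, not a percentage, so there is no direct "80%" cutoff. Very short tokens such as house numbers or "Ave" may also be ignored, depending on the server's minimum token length settings. One option is to use this query to pull a short list of candidates and then score them in PHP.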