I have two tables that I need to merge together in PostgreSQL, on the common variable "company name." Unfortunately many of the company names don't match exactly (i.e. MICROSOFT in one table, MICROSFT in the other). I've tried removing common words from both columns such as "corporation" or "inc" or "ltd" in order to try to standardize names across both tables, but I'm having trouble thinking of additional strategies. Any ideas?
Thanks.
Also, if necessary I can do this in R.
Have you considered the fuzzystrmatch module? You can use
soundex
,difference
,levenshtein
,metaphone
anddmetaphone
, or a combination.fuzzystrmatch documentation
For example the levenshtein distance from MICROSOFT to MICROSFT is one (1).
The above returns zero (0). Combining levenshtein and dmetaphone could help you match lots of misspellings.