Normalize human names in PostgreSQL

What is the easiest way to normalize a text field in a PostgreSQL table?

I am trying to find duplicates. For example, I want to consider O'Reilly a duplicate of oreilly. La Salle should be a duplicate of la'salle as well.

In a nutshell, we want to

  1. lowercase all text,
  2. strip accents,
  3. strip punctuation marks such as these [.'-_], and
  4. strip spaces.

Can this all be done in one or two simple steps? Ideally using built-in PostgreSQL functions.

Cheers

There is 1 answer

Belayer

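The following will give you what you want using functions that ship with Postgres. Note that unaccent is not part of core; it comes from the unaccent contrib extension, which has to be enabled once per database with CREATE EXTENSION unaccent: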

regexp_replace (lower(unaccent(string_in)),'[^0-9a-z]','','g')
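
A minimal sketch of trying that expression on the names from the question, assuming the unaccent extension is installed on the server and you have the privileges to enable it. The inline VALUES list stands in for a real table and is illustrative only:

```sql
-- Enable unaccent once per database (it is a contrib extension, not core).
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Sample inputs taken from the question; t(string_in) is a throwaway alias.
SELECT string_in,
       regexp_replace(lower(unaccent(string_in)), '[^0-9a-z]', '', 'g') AS normalized
FROM (VALUES ('O''Reilly'), ('oreilly'), ('La Salle'), ('la''salle')) AS t(string_in);

-- string_in | normalized
-- O'Reilly  | oreilly
-- oreilly   | oreilly
-- La Salle  | lasalle
-- la'salle  | lasalle
```

The 'g' flag matters here: without it, regexp_replace removes only the first character that falls outside the allowed set.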

If you do not want to keep digits, then just use:

regexp_replace (lower(unaccent(string_in)),'[^a-z]','','g')
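
To list the duplicates the question is after, the normalized value can serve as a grouping key. This is a sketch under stated assumptions: people and its text column name are hypothetical (the question does not give a table definition), and the unaccent extension is assumed to be enabled already:

```sql
-- people(name) is a hypothetical table; substitute your own table and column.
SELECT regexp_replace(lower(unaccent(name)), '[^a-z]', '', 'g') AS normalized,
       array_agg(name ORDER BY name) AS variants,  -- the original spellings
       count(*) AS n
FROM people
GROUP BY 1            -- group by the first output column (the normalized key)
HAVING count(*) > 1;  -- keep only keys shared by more than one row
```

Each returned row is one group of names that collapse to the same key, with the original spellings collected in variants so they can be reviewed before merging.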