I have a data set of about 1 million employer names. These names are from a free-form text field so they include misspellings and variations in the way they are inputted (e.g. "Amazon" .. "Amzaon" .. "Amazon.com" .. "Amazon Web Services" .. "AWS").
I want to either A) group these 1 million so I have a somewhat accurate sense of how many unique employers are in the data set or B) be able to find all variations of any given employer.
So far, I've been using the data in Tableau, then filtering on "employer name" and searching all variations of the name I can think of. But it's tedious and I'm pretty sure I'm leaving many out.
I've also used the fuzzy add-in for excel but it hasn't worked that well on misspellings, special characters...
Try experimenting with Tableau Prep Builder - the companion tool that comes with your Tableau Creator license. It has a group feature that is designed for just these problems.
In Prep Builder, you’ll just need to connect to your data, add a cleaning step, and then add a group to your cleaning step.