How to Use faker in a Hash-like Anonymization Function


I'm looking for a function to make reproducible string anonymizations. Right now I'm focused on names, and I'd like a hash-like function which maps an input name string to a unique output name.

>>> anon_name('Alice Lopez')
'Grant Forsythe'

Since this project will be open-source, there can't be an explicit lookup table where the input name can be found from the output. If possible, I'd also like to avoid having to store any sort of external lookup table; I'd rather keep it all contained in a single function.

The faker library seems to provide the bulk of the functionality I want. It can generate lots of names based on a random number generator. But from what I've read there's no reproducible hash-like capability to map an input to a generated name. I thought that maybe I could use the input name to set the random seed every time before generating an output, something like:

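from faker import Faker


class FakeMapper:
    def __init__(self):
        self.fake = Faker()

    def anon_name(self, name: str) -> str:
        # Reseed from the input so the same name yields the same fake name.
        # Caveat: hash() of a str is salted per process unless PYTHONHASHSEED is set.
        Faker.seed(hash(name))
        return self.fake.name()
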
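
One caveat on the seeding idea: Python's built-in hash() is salted per interpreter process for strings, so the seed would differ between runs unless PYTHONHASHSEED is fixed. A minimal sketch of a run-independent seed, assuming SHA-256 from hashlib is acceptable; stable_seed is a hypothetical helper name, not part of faker:

```python
import hashlib


def stable_seed(name: str) -> int:
    """Hypothetical helper: derive a 64-bit integer seed from a name via SHA-256."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Keep the first 8 bytes; any fixed slice works as long as it never changes.
    return int.from_bytes(digest[:8], "big")
```

The result could then be passed to Faker.seed() in place of hash(name), so the same input name selects the same generator state on every run.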
  1. Does this implementation have any performance or data security problems? I expect some collision risk, since the random generator can produce the same name for two different seeds, but I can think of some workarounds.
  2. Is faker implemented with lookup tables such that someone could back out which random seed(s) produce a given output? For this application that's not a security issue, since I'm calling hash() on the name, but it could matter for other data types I'm writing similar anonymization functions for.
  3. Is there another implementation that makes more sense for my requirements?
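
On the data security point in question 1, one thing to weigh: if the mapping depends only on the input name and public code, anyone can run candidate names through the same function and build a reverse table. A minimal sketch of one possible mitigation, assuming a secret key kept outside the open-source repository; keyed_seed and secret_key are hypothetical names used only for illustration:

```python
import hashlib
import hmac


def keyed_seed(name: str, secret_key: bytes) -> int:
    """Hypothetical helper: derive a seed from a name with HMAC-SHA256.

    Without secret_key, the seed for a guessed name cannot be recomputed.
    """
    mac = hmac.new(secret_key, name.encode("utf-8"), hashlib.sha256).digest()
    # Truncate to 8 bytes to get a 64-bit integer seed.
    return int.from_bytes(mac[:8], "big")
```

The same key and name always give the same seed, so the mapping stays reproducible for whoever holds the key. This does not address collisions: any mapping into a finite pool of generated names can send two inputs to the same output.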
