Managing Entity Resolution in Anchor Modeling


I've been reading about anchor modeling and really like the concept. I hope to incorporate it into a data management framework where I consolidate multiple data sources into an anchor model, then either make it directly available or have it feed data marts for our data scientists.

But I'm not sure how to approach entity resolution. The guidelines call for inserts only, no updates, with deletes reserved for removing erroneous data. So let's say my source systems contain duplicate entities (e.g. John Smith appears more than once), and these make their way into my anchor model. What is the best way to clean this up?

My rubber duck is telling me to create an entity resolution layer on top of my anchor model that looks for these issues and corrects them. Correcting would mean merging entities in anchors and fixing the ties that reference them accordingly. But now I'm updating my anchor model, which is against best practices.
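To make the idea concrete, here is a minimal Python sketch of what such a layer might look like if merge decisions were recorded as inserts instead of updates. Everything in it is an assumption for illustration: `ResolutionLayer`, `record_merge`, `canonical`, `resolve_ties`, and the sample ids are hypothetical names, not part of the anchor modeling guidelines or any tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResolutionLayer:
    """Hypothetical append-only log of 'same entity' decisions kept beside the anchors."""

    # Each row is (duplicate_id, surviving_id). Rows are only appended.
    same_as: list[tuple[int, int]] = field(default_factory=list)

    def record_merge(self, duplicate_id: int, surviving_id: int) -> None:
        # Recording a merge is an insert; no anchor or tie row is modified.
        self.same_as.append((duplicate_id, surviving_id))

    def canonical(self, anchor_id: int) -> int:
        """Follow merge decisions until reaching an id with no further merge."""
        mapping = dict(self.same_as)  # a later row for the same duplicate wins
        seen: set[int] = set()
        # The 'seen' set stops the walk if decisions accidentally form a cycle.
        while anchor_id in mapping and anchor_id not in seen:
            seen.add(anchor_id)
            anchor_id = mapping[anchor_id]
        return anchor_id


def resolve_ties(
    ties: list[tuple[int, int]], layer: ResolutionLayer
) -> list[tuple[int, int]]:
    """Return a de-duplicated, sorted view of ties with the first id made canonical.

    The stored ties are left untouched; resolution happens at read time.
    """
    return sorted({(layer.canonical(person_id), other_id) for person_id, other_id in ties})


# Assumed sample data: anchors 1 and 2 are both "John Smith".
ties = [(1, 100), (2, 101), (3, 102)]  # (person_id, order_id), illustrative only

layer = ResolutionLayer()
layer.record_merge(duplicate_id=2, surviving_id=1)

resolved = resolve_ties(ties, layer)
```

In this sketch the duplicate anchor and its ties stay in place, and a mistaken merge could be handled by deleting the erroneous row from the log. Whether that counts as staying within the guidelines is exactly what I'm unsure about.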

Or am I looking at this wrong, and entity resolution should be dealt with before data gets into the anchor model? But mistakes can happen, and it would be nice to know I could address the issue inside the anchor model should it present itself.

