I'm trying to find an algorithm for checking similarity between two data entries. Say I have two data structures (entries in contact lists) with the following data:
// UserA addressbook.
name: Frank Sinatra
mobile: +44 555 555 555 55
// UserB addressbook.
name: Frank Albert Sinatra
phone: 004455555555555
I got these entries from different providers: UserA synced his Google account, while UserB synced his Microsoft account. I want my algorithm to tell me that both users know the same person (within some probability).
Does anyone know where I should look? I've tried to find a hashing algorithm that creates "unsafe" hashes, i.e. similar hashes for similar data, but that route wasn't productive.
Some keywords you could look into further: data similarity, distance/similarity measures (metrics), correlation, inexact matching.
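To make those keywords a bit more concrete, here is a minimal sketch of inexact matching on the two example contacts, using only the Python standard library. The function names (`normalize_phone`, `token_overlap`, `match_score`), the dictionary layout, and the 0.6 phone weight are all assumptions made up for illustration, not an established method; the phone normalization only handles the `+` and `00` international prefixes and would need a proper phone-number library for real data.

```python
import re
from difflib import SequenceMatcher


def normalize_phone(raw: str) -> str:
    """Reduce a phone number to bare digits, treating a leading '00' like '+'."""
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return digits
    if digits.startswith("00"):
        return digits[2:]
    return digits


def token_overlap(a: str, b: str) -> float:
    """Share of the shorter name's words that also appear in the other name."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / min(len(tokens_a), len(tokens_b))


def match_score(c1: dict, c2: dict, phone_weight: float = 0.6) -> float:
    """Combine phone and name evidence into a rough score between 0 and 1.

    The weight is an arbitrary assumption and should be tuned on real data.
    """
    p1 = normalize_phone(c1.get("phone", ""))
    p2 = normalize_phone(c2.get("phone", ""))
    phone_match = 1.0 if p1 and p1 == p2 else 0.0

    n1 = c1.get("name", "")
    n2 = c2.get("name", "")
    # Take the better of word-level overlap and character-level similarity,
    # so both missing middle names and small typos are tolerated.
    char_ratio = SequenceMatcher(None, n1.lower(), n2.lower()).ratio()
    name_score = max(token_overlap(n1, n2), char_ratio)

    return phone_weight * phone_match + (1 - phone_weight) * name_score


user_a = {"name": "Frank Sinatra", "phone": "+44 555 555 555 55"}
user_b = {"name": "Frank Albert Sinatra", "phone": "004455555555555"}
score = match_score(user_a, user_b)
```

With the example data, both phone numbers normalize to the same digit string and every word of the shorter name appears in the longer one, so the score comes out at the top of the range. You would then pick a threshold above which two entries are treated as the same person.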