Comparing two data structures for similarities

I'm trying to find an algorithm for checking similarities between two data entries. Say I have two data structures (entries in users' contact lists) with the following data:

// UserA addressbook.
name: Frank Sinatra
mobile: +44 555 555 555 55

// UserB addressbook.
name: Frank Albert Sinatra
phone: 004455555555555

I got those entries from different providers: UserA synced his Google account, while UserB synced his Microsoft account. I want my algorithm to tell me that both users know the same person (with some probability).

Does anyone know where I should look? I've tried to find a hashing algorithm that creates "unsafe" hashes, i.e. similar hashes for similar data, but that route wasn't productive.

There are 2 answers

0
ile On

Some keywords you could look into further: data similarity, distance/similarity measures (metrics), correlation, inexact matching.
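To make "similarity measure" concrete, here is a minimal sketch of one such measure, the Jaccard index over name tokens. The choice of Jaccard and the function name `jaccard_similarity` are assumptions for illustration, not something the question or this answer prescribes.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index of the lowercase word sets of two strings (0.0 to 1.0)."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        # Two empty inputs are treated as identical (a convention chosen here).
        return 1.0
    shared = tokens_a & tokens_b
    combined = tokens_a | tokens_b
    return len(shared) / len(combined)


# The two names from the question share 2 of 3 distinct tokens.
score = jaccard_similarity("Frank Sinatra", "Frank Albert Sinatra")
print(round(score, 3))
```

A token-based measure like this ignores word order and missing middle names reasonably well, but it does not tolerate typos inside a word; an edit-distance measure handles those better.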

1
Daniel On

The similarity of strings can be measured with the Levenshtein distance. The strings should be prepared before the test, e.g. by removing special characters or splitting the string. For data structures, have a look at "How do you measure similarity between 2 series of data?"
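The preparation and comparison steps described above could be sketched roughly as follows. The helper names (`normalize_phone`, `normalize_name`, `levenshtein`, `similarity`) and the normalization rules (treating a leading "+" or "00" as an international prefix, scaling the distance by the longer string's length) are assumptions made for this example, not part of the answer.

```python
import re


def normalize_phone(raw: str) -> str:
    """Drop a leading '+' or '00' prefix (assumed convention), then keep digits only."""
    text = raw.strip()
    if text.startswith("+"):
        text = text[1:]
    elif text.startswith("00"):
        text = text[2:]
    return re.sub(r"\D", "", text)


def normalize_name(raw: str) -> str:
    """Lowercase, strip non-alphanumeric characters, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", raw.lower())
    return " ".join(cleaned.split())


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one previous row."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the edit distance to 0.0..1.0, where 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


name_score = similarity(normalize_name("Frank Sinatra"),
                        normalize_name("Frank Albert Sinatra"))
phone_score = similarity(normalize_phone("+44 555 555 555 55"),
                         normalize_phone("004455555555555"))
print(name_score, phone_score)
```

How the per-field scores are combined into one "same person" probability is left open here; a weighted average with a threshold tuned on known matches is one possible starting point.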