Should we consider two sets to be similar if their rows contain the same hashes but in different order?


Suppose we have minhash signatures for two sets and we want to calculate the Jaccard similarity of the two sets. We have:

      S1   S2
h1    0    1
h2    1    2
h3    2    0
h4    3    3

S1 and S2 contain the same signature values, but in a different order. Is the estimated Jaccard similarity 1/8, or approximately 1?


There is 1 answer

lejlot

h1 through h4 are different hash functions, so h2(S1) == h1(S2) means nothing: values are only comparable when they come from the same hash function. The estimate is the fraction of rows in which both sets have the same value. Here only h4 agrees (3 and 3), so the estimated similarity is 1/4, neither 1/8 nor 1.
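
To make the row-by-row comparison concrete, here is a minimal sketch in Python. The function name `minhash_similarity` and the list layout (one entry per hash function, in the order h1..h4) are my own assumptions for illustration, not part of any library. It also computes the order-ignoring comparison the question hints at, only to show how the two differ.

```python
def minhash_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of hash functions
    (positions) on which the two signatures agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must come from the same hash functions")
    if not sig_a:
        raise ValueError("signatures must not be empty")
    # Compare position i only with position i: both values come from h_i.
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Signatures from the question, listed in the order h1, h2, h3, h4.
s1 = [0, 1, 2, 3]
s2 = [1, 2, 0, 3]

# Position-wise comparison: the MinHash estimate.
row_wise = minhash_similarity(s1, s2)

# Order-ignoring comparison: treats values from different hash
# functions as interchangeable, which is not a valid estimate.
unordered = len(set(s1) & set(s2)) / len(set(s1) | set(s2))

print(row_wise, unordered)
```

The position-wise version counts a match only when the same hash function yields the same minimum for both sets, which is the event whose probability equals the Jaccard similarity. With only four hash functions the estimate is very coarse; real signatures typically use many more.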