64bit HashCodes, IEqualityComparer & Intersect/Except

245 views Asked by At

I'm generating 64 bit hashcodes from strings, and storing this value in a database

Is it possible to override GetHashCode with a 64 bit long type instead of 32 byte int?

If this is not possible, is it possible to implement Equals and GetHashCode somewhere else, and still use Except and Intersect?

public class RecordComparer : IEqualityComparer<Record>
{
    public bool Equals(Record x, Record y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.RecordHash.Equals(y.RecordHash);
    }

    public long GetHashCode(Record obj)
    {
        return obj.RecordHash;
    }
 }
2

There are 2 answers

2
Martin Liversage On

Assuming that you do not care about the possible problems arising from different records having equal hash codes and thus being considered equal even though they are different you can simply implement RecordComparer like this:

public class RecordComparer : IEqualityComparer<Record>
{
    public bool Equals(Record x, Record y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.RecordHash.Equals(y.RecordHash);
    }

    public int GetHashCode(Record obj)
    {
        return unchecked((int) obj.RecordHash);
    }
}

IEqualityComparer<T> is correctly implemented by returning a 32 bit hash code created by truncating the 64 bit hash code identifying the record.

There is no requirement that GetHashCode should return unique hash codes for unequal records. However, avoiding collisions will make generic collections like Dictionary<Record> perform better and basing the 32 bit hash code on the 64 bit hash code is probably the best thing to do.

If you look at the source code for Enumerable.Except and Enumerable.Intersect you can see that they use the internal class Set<T> which is some sort of hash table so your implementation of GetHashCode can affect the performance of your code but not the correctness (as long as equal records return the same hash code).

2
usr On

None of the built-in collections, algorithms and interfaces support 64 bit hash codes. You must build everything yourself. You need a whole parallel infrastructure.

This will likely turn out to be not worth the effort. Rather, use a 32 bit hash code and rely on your equality comparison to make sure that no false matches happen. This is required for correctness anyway.

Maybe this question is based on a misunderstanding:

Basically I will have 2 lists of with 64 bit hash code integers. I need to be able to use Except/Intersect on these 2 lists to find the differences, based on the 64 bit hascode value. As everything stands, IEqualityComparer only works with 32 bit integers.

Simply treat this hash code as the key in the built-in collections and algorithms. You can process those lists using Except just fine.