According to the MSDN documentation for the `GetHashCode` method:
```csharp
using System;

public struct Point
{
    private int x;
    private int y;

    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    public override bool Equals(Object obj)
    {
        if (!(obj is Point)) return false;

        Point p = (Point) obj;
        return x == p.x & y == p.y;
    }

    public override int GetHashCode()
    {
        return ShiftAndWrap(x.GetHashCode(), 2) ^ y.GetHashCode();
    }

    private int ShiftAndWrap(int value, int positions)
    {
        positions = positions & 0x1F;

        // Save the existing bit pattern, but interpret it as an unsigned integer.
        uint number = BitConverter.ToUInt32(BitConverter.GetBytes(value), 0);
        // Preserve the bits to be discarded.
        uint wrapped = number >> (32 - positions);
        // Shift and wrap the discarded bits.
        return BitConverter.ToInt32(BitConverter.GetBytes((number << positions) | wrapped), 0);
    }
}
```
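The `BitConverter` round-trip in `ShiftAndWrap` only reinterprets the same 32 bits as a `uint` and back, so the method amounts to a 32-bit left rotation. The sketch below (the class and method names are hypothetical, not part of the MSDN sample) compares it with a version that uses `unchecked` casts instead of byte arrays; the two should agree for every input. Note that C# masks a 32-bit shift count to its low five bits, so with `positions` = 0 the right shift by 32 acts as a shift by 0 and both versions return the value unchanged.

```csharp
using System;

public static class ShiftAndWrapSketch
{
    // MSDN-style version: reinterprets the bits through BitConverter byte arrays.
    public static int ShiftAndWrapBitConverter(int value, int positions)
    {
        positions = positions & 0x1F;
        uint number = BitConverter.ToUInt32(BitConverter.GetBytes(value), 0);
        uint wrapped = number >> (32 - positions);
        return BitConverter.ToInt32(BitConverter.GetBytes((number << positions) | wrapped), 0);
    }

    // Equivalent version: an unchecked cast reinterprets the same 32 bits without allocating.
    public static int ShiftAndWrapUnchecked(int value, int positions)
    {
        positions &= 0x1F;
        uint number = unchecked((uint)value);
        // Rotate left: bits shifted out on the left re-enter on the right.
        uint rotated = (number << positions) | (number >> (32 - positions));
        return unchecked((int)rotated);
    }

    public static void Main()
    {
        int[] samples = { 0, 1, 5, -1, int.MinValue, int.MaxValue, 123456789 };
        foreach (int v in samples)
        {
            for (int p = 0; p < 32; p++)
            {
                if (ShiftAndWrapBitConverter(v, p) != ShiftAndWrapUnchecked(v, p))
                    throw new Exception($"Mismatch for value {v}, positions {p}");
            }
        }
        Console.WriteLine("Both versions agree on all samples.");
    }
}
```

On .NET Core 3.0 and later, `System.Numerics.BitOperations.RotateLeft(uint, int)` should perform the same rotation directly.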
I'm confused about the `ShiftAndWrap` method. I know it is used to avoid generating colliding hash codes, but I have the following questions:

1. Why is the parameter `positions` set to 2?
2. Why does the method first right-shift by (32 - `positions`) and then left-shift by `positions`? Does that have a specific meaning?
3. As mentioned above, this method is used to reduce collisions, e.g. between `new Point(5,8)` and `new Point(8,5)`. But if I create `new Point(3,16)`, it gets the same hash code as `new Point(5,8)`. So what is the real effect of this method?
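To make the numbers in the third question concrete, here is a minimal sketch that recomputes the `Point` hash formula outside the struct. `PointHashDemo`, `RotateLeft`, and `PointHash` are hypothetical names, and the sketch assumes `Int32.GetHashCode()` returns the integer itself, which is how .NET implements it.

```csharp
using System;

public static class PointHashDemo
{
    // Rotate-left helper with the same behavior as the MSDN ShiftAndWrap.
    static int RotateLeft(int value, int positions)
    {
        positions &= 0x1F;
        uint number = unchecked((uint)value);
        return unchecked((int)((number << positions) | (number >> (32 - positions))));
    }

    // Same formula as Point.GetHashCode in the question.
    public static int PointHash(int x, int y) => RotateLeft(x.GetHashCode(), 2) ^ y.GetHashCode();

    public static void Main()
    {
        Console.WriteLine(PointHash(5, 8));   // 20 ^ 8  = 28
        Console.WriteLine(PointHash(8, 5));   // 32 ^ 5  = 37, swapped order no longer collides
        Console.WriteLine(PointHash(3, 16));  // 12 ^ 16 = 28, same as (5, 8)
        Console.WriteLine(5 ^ 8);             // plain XOR: 13
        Console.WriteLine(8 ^ 5);             // plain XOR: 13, swapped order collides
    }
}
```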
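I couldn't say why they chose this particular hash code implementation, but with regard to the question of why the method right-shifts by (32 - `positions`) first and then left-shifts by `positions`:

The `ShiftAndWrap()` method here is a generic implementation of an algorithm for left-shifting a value by N bits and wrapping the overflow back around to the right end (in other words, a bitwise left rotation). A plain left shift would discard the left-most N bits, so before they do the shift, they first save those bits (by right-shifting by 32 - N) so they can then append them onto the right end with a bitwise OR. Both shifts are computed from the same unmodified `number`, so the order in which they are written doesn't matter; doing the right shift first just matches the comment "Preserve the bits to be discarded."

So here's what calling `ShiftAndWrap()` would look like if we were just working with 8-bit values (`byte`s) and called it with `value` = (binary) `11010010` and `positions` = 3:

- Preserve the bits to be discarded: `11010010 >> (8 - 3)` = `00000110`
- Shift: `11010010 << 3` = `10010000` (the three left-most bits fall off the 8-bit value)
- Wrap: `10010000 | 00000110` = `10010110`

We can see that the return value `10010110` is the result of shifting `11010010` left by three bits and wrapping the overflow around to the right.

As to the question of why they don't just use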
`x ^ y`, I suspect this is because that would mean `Point(N, M)` always produces the same hash code as `Point(M, N)`. By shifting the `x` value, we get a hash code that takes into account not only the `x` and `y` values but also their order, whereas `x ^ y` would ignore their order.

That also explains the collision in the third question: there are 2^64 possible (`x`, `y`) pairs but only 2^32 possible `int` hash codes, so collisions such as `Point(3,16)` vs `Point(5,8)` can't be avoided entirely. The shift only removes the systematic collision between points with swapped coordinates.

When hashing a data structure that contains sub-components of the same type, it's common to have the hash function treat each sub-component differently so that its position matters. For example, Java uses this hash formula for strings (here `^` denotes exponentiation, not XOR):

`s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]`

where `s[i]` is the i-th character and `n` is the length of the string. We can see that each character is multiplied by a different power of 31, so that `stop` has a different hash code from `pots`.

As for why they chose `2` as the number of positions to shift, that might be arbitrary, or they may have done some evaluations to see what degree of shifting would be likely to produce the best distribution.
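The 8-bit walkthrough of `ShiftAndWrap()` can be checked with a small sketch. `ByteRotateDemo` and `ShiftAndWrap8` are hypothetical names for an 8-bit analogue of the MSDN method; the only change from the 32-bit version is that the shift count is masked to three bits and the shifted value is truncated to one byte.

```csharp
using System;

public static class ByteRotateDemo
{
    // 8-bit analogue of ShiftAndWrap: positions is taken modulo 8 instead of modulo 32.
    public static byte ShiftAndWrap8(byte value, int positions)
    {
        positions &= 0x7;
        // Preserve the bits that the left shift will push out of the byte.
        int wrapped = value >> (8 - positions);
        // Shift left and keep only the low 8 bits.
        int shifted = (value << positions) & 0xFF;
        // Append the preserved bits on the right.
        return (byte)(shifted | wrapped);
    }

    public static void Main()
    {
        byte value = 0b11010010;
        byte result = ShiftAndWrap8(value, 3);
        Console.WriteLine(Convert.ToString(result, 2).PadLeft(8, '0')); // 10010110
    }
}
```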
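The effect of giving each position its own weight can be seen by evaluating the string formula above in C#. This is a minimal sketch, not Java's actual source: `PolynomialHashDemo`, `Hash31`, and `SumHash` are hypothetical names, and `Hash31` evaluates `s[0]*31^(n-1) + ... + s[n-1]` with Horner's rule and 32-bit wraparound. `SumHash` is an order-insensitive contrast that, like `x ^ y`, ignores position.

```csharp
using System;

public static class PolynomialHashDemo
{
    // ((s[0]*31 + s[1])*31 + s[2])... expands to s[0]*31^(n-1) + ... + s[n-1].
    public static int Hash31(string s)
    {
        int h = 0;
        foreach (char c in s)
        {
            h = unchecked(31 * h + c);
        }
        return h;
    }

    // Order-insensitive contrast: every anagram gets the same value.
    public static int SumHash(string s)
    {
        int h = 0;
        foreach (char c in s)
        {
            h = unchecked(h + c);
        }
        return h;
    }

    public static void Main()
    {
        Console.WriteLine(Hash31("stop"));  // 3540994
        Console.WriteLine(Hash31("pots"));  // 3446974
        Console.WriteLine(SumHash("stop")); // 454
        Console.WriteLine(SumHash("pots")); // 454
    }
}
```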