In my initial attempt at creating a disjoint set data structure I created a Point
data type with a parent
pointer to another Point
:
data Point a = Point
{ _value :: a
, _parent :: Point a
, _rank :: Int
}
To create a singleton set, a Point
is created that has itself as its parent (I believe this is called tying the knot):
makeSet' :: a -> Point a
makeSet' x = let p = Point x p 0 in p
Now, when I wanted to write findSet
(i.e. follow parent pointers until you find a Point
whose parent is itself) I hit a problem: Is it possible to check if this is the case? A naïve Eq
instance would of course loop infinitely – but is this check conceptually possible to write?
(I ended up using a Maybe Point
for the parent field, see my other question.)
No, what you are asking for is known in the Haskell world as referential identity: the idea that for two values of a certain type, you can check whether they are the same value in memory or two separate values that happen to have exactly the same properties.
For your example, you can ask yourself whether you would consider the following two values the same or not:
Haskell considers both values completely equal. I.e. Haskell does not support referential identity. One of the reasons for this is that it would violate other features that Haskell does support. It is for example the case in Haskell that we can always replace a reference to a function by that function's implementation without changing the meaning (equational reasoning). For example if we take the implementation of
pl2
:Point 0 pl2 1
and replacepl2
by its definition, we getPoint 0 (Point 0 pl2 1) 1
, makingpl2
's definition equivalent topl1
's. This shows that Haskell cannot allow you to observe the difference betweenpl1
andpl2
without violating properties implied by equational reasoning.You could use unsafe features like
unsafePerformIO
(as suggested above) to work around the lack of referential identity in Haskell, but you should know that you are then breaking core principles of Haskell and you may observe weird bugs when GHC starts optimizing (e.g. inlining) your code. It is better to use a different representation of your data, e.g. the one you mentioned using aMaybe Point
value.