Self-reference in data structure – Checking for equality

1.3k views Asked by At

In my initial attempt at creating a disjoint set data structure I created a Point data type with a parent pointer to another Point:

data Point a = Point
  { _value  :: a
  , _parent :: Point a
  , _rank   :: Int
  }

To create a singleton set, a Point is created that has itself as its parent (I believe this is called tying the knot):

makeSet' :: a -> Point a
makeSet' x = let p = Point x p 0 in p

Now, when I wanted to write findSet (i.e. follow parent pointers until you find a Point whose parent is itself) I hit a problem: Is it possible to check if this is the case? A naïve Eq instance would of course loop infinitely – but is this check conceptually possible to write?

(I ended up using a Maybe Point for the parent field, see my other question.)

4

There are 4 answers

0
Dominique Devriese On BEST ANSWER

No, what you are asking for is known in the Haskell world as referential identity: the idea that for two values of a certain type, you can check whether they are the same value in memory or two separate values that happen to have exactly the same properties.

For your example, you can ask yourself whether you would consider the following two values the same or not:

pl1 :: Point Int
pl1 = Point 0 (Point 0 pl1 1) 1

pl2 :: Point Int
pl2 = Point 0 pl2 1

Haskell considers both values completely equal. I.e. Haskell does not support referential identity. One of the reasons for this is that it would violate other features that Haskell does support. It is for example the case in Haskell that we can always replace a reference to a function by that function's implementation without changing the meaning (equational reasoning). For example if we take the implementation of pl2: Point 0 pl2 1 and replace pl2 by its definition, we get Point 0 (Point 0 pl2 1) 1, making pl2's definition equivalent to pl1's. This shows that Haskell cannot allow you to observe the difference between pl1 and pl2 without violating properties implied by equational reasoning.

You could use unsafe features like unsafePerformIO (as suggested above) to work around the lack of referential identity in Haskell, but you should know that you are then breaking core principles of Haskell and you may observe weird bugs when GHC starts optimizing (e.g. inlining) your code. It is better to use a different representation of your data, e.g. the one you mentioned using a Maybe Point value.

0
wit On

In Haskell data is immutable (except IO a, IORef a, ST a, STRef a, ..) and data is function. So, any data is "singleton"

When you type

data C = C {a, b :: Int}
changeCa :: C -> Int -> C
changeCa c newa = c {a = newa}

You don't change a variable, you destroy old data and create a new one. You could try use links and pointers, but it is useless and complex

And last,

data Point a = Point {p :: Point a}

is a infinite list-like data = Point { p=Point { p=Point { p=Point { ....

0
J. Abrahamson On

In order the observed the effect you're most likely interested in—pointer equality instead of value equality—you most likely will want to write your algorithm in the ST monad. The ST monad can be thought of kind of like "locally impure IO, globally pure API" though, by the nature of Union Find, you'll likely have to dip the entire lookup process into the impure section of your code.

Fortunately, monads still contain this impurity fairly well.

There is also already an implementation of Union Find in Haskell which uses three methods to achieve the kind of identity you're looking for: using the IO monad, using IntSets, and using the ST monad.

0
Alexey Romanov On

You could try to do this using StableName (or StablePtr) and unsafePerformIO, but it seems like a worse idea than Maybe Point for this case.