I'm looking for input on a way to deal with issues arising with the use of id/ref links in immutable data (specifically with React using Scala-js and scala-js-react, but I think the solutions are probably common to any similar system, e.g. React in javascript, or another reactive system).
The concept is to use a Ref[A]
in place of an A
in my data where that same item will be referenced from multiple places in the data.
The Ref[A]
would contain at least an Id[A]
allowing the data to be looked up somehow - more details later.
That allows for updating the data in one place and then having that updated version still be referenced from other places.
To describe the problem, we can start from a system with an immutable data model without refs - in that case I can know that if two versions of the data x and y are equal that nothing in the data has changed. We might have something like Post(User("Alice", "[email protected]"), "Hello World")
- all data is included in the model, and we can just compare different Post
s as plain data.
We can also use lenses to navigate through the data and the same logic applies throughout.
However it's entirely possible that a User's email will change in future, and we may not want to either leave out of date emails scattered around all our Posts, or have to update the Posts' data to match the new email. To allow this we could introduce Refs.
To update the example, we could introduce a Ref[A]
with an id field containing an Id[A]
for the referent of type A. We now have Post(Ref(Id(42)), "Hello World")
, and User(Id(42), "Alice", "[email protected]")
. To display the Post, we need to (somehow) follow the Ref from the Post to the User instance. If Alice's email address changes, nothing in the Post data changes - we still just have a Ref(42).
This is good in one way - when we follow the Ref we will reach the new data. However this is a problem for systems like React, where we need to be able to tell when the model has changed by just comparing the old version of the model to the new one. We will generally pass the data model as part of Props, and then compare old and new props using equality to see if they have changed. This would lead to the rendering of the Post not changing at all when the User's email changes.
The solution to this seems to be the obvious one, to include the full set of referenced data in something like an immutable Cache
that provides a Id[A] => Option[A]
. This will then be replaced with a new Cache when the data referenced by any Id changes. This would be considered to be part of the data model, and passed around with it, say as a (A, Cache)
. We can still use lenses on the A, and just keep the same Cache around. When the Cache changes, the (A, Cache)
will also change, allowing us to see the changes again. We are making sure we adhere to the contract of React Components (at least those without state), that the rendered output only needs to change when the Props change.
The problem here is that the (A, Cache)
will change whenever any item in the cache changes, and we may have a variety of data in the cache that is irrelevant to any given component. This gets us the required change, but also many pointless changes leading to wasted re-renders.
I feel like there is probably a better way of handling this, but I've not located anything yet. Is there a general approach to this problem?
In the meantime the best approach I can think of would be to extend
Ref[A]
so that it contains not just anId[A]
but also a specific revision of the data we refer to. Essentially we model the referenced data as a time series - the data has a value at each revision of itself, that value never changes, and we can only reference a specific revision. Revisions are for examplecase class Rev(r: Int)
. Then if the data model contains aRef(Id(42), Rev(0))
it specifically references revision 0 of the data with id 42. If it continues to reference this revision, then the Ref won't change, but neither will the referenced data. In order to see a new revision of referenced data, we need the Ref to be updated to that new revision. Doing this changes the Ref itself, and so the data as a whole - this can in turn be detected by React. Revisions are fairly arbitrary - they just need to increment when data changes, they could be specific to each data item, or an overall counter.This change to Refs in turn allows us to exclude the Cache from comparisons made to decide whether to re-render data, getting rid of wasted rendering for data we don't reference. We know that the data itself will change if any referenced data that can be reached by following a single reference changes.
Note that this does not help when following 2 or more jumps - however this can be dealt with relatively easily, for example in React we simply need to make sure that each time we follow a reference to reach new data, we pass that new data to a child component's props. This allows React to "detect" changes to that data and trigger re-rendering. The child component can then follow further refs, again passing the looked-up data to its own children, and we can recurse through an arbitrary number of refs.
We will still pass the Cache through to all components, but they will not need to use it to detect changes, only to look up the referenced data. In React we can provide a shouldComponentUpdate function (or Reusability) that will just look at the
A
in(A, Cache)
.Managing the required rewriting of refs would be handled by a central system (for example in React this would probably be a root component with a cache of all data). It would provide a map from id to relevant data, but would also manage rewriting that data when necessary so that it contains the most recent revision of each reference, and then passing this data to child components for rendering.
To update the example, we would now need an Id for all top level data items. Using scala-ish pseudo-code we have cache contents a little like:
Id(1) -> (data = Post(id = Id(1), userRef = Ref(Id(42), Rev(0)), message = "Hello World!"), rev = Rev(0), refersTo = Set(Id(42))) Id(42) -> (data = User(id = Id(42), name = "Alice", email = "[email protected]"), rev = Rev(0), refersTo = Set())
So for each Id we have the data item itself, the revision that data item is at, and a set of other data items it refers to. We only store the most recent revision - old revisions will resolve to None if looked up.
The cache then has a function updateRefs that traverses a data item, looking for Refs. This will produce an updated data item with all refs at their most recent version, and a set of refs that were found in the data.
We run this updateRefs operation when:
If we are using React, when a data item is updated or has references updated, in one of the above cases, it is re-rendered. This allows React components to notice the changes to data OR ref revisions to trigger a re-render.
As described above, we only increment revisions to show changes to the contents of the data item itself, NOT data items reached from it by a reference. So if we change a Post's message, it will increment revision. If we change the Id in the Post's userRef, this will again increment the revision. However if the User changes, the revision of the Post does NOT change. Rewriting the Post so that the userRef has an updated revision does NOT update the revision of the Post itself.
This means that we can tolerate cyclic references, since updating of refs' revisions will not trigger further updates.
To see this in effect, lets assume we have the Post and User in the cache as above.
This system seems like it should extend to server-client use as well, where we can use something resembling Diode's
Pot
system to represent data that is being retrieved from the server. Rather thanId[A] => Option[A]
we would haveId[A] => Pot[A]
, and the specific Pot state would change as the data is retrieved, and then as it is updated by the server. We would tie into failed lookups to trigger retrieving referenced data, and perhaps periodically clear out data that has not been recently looked up.The requirement to traverse the data seems a little annoying at first, however at least in some systems this kind of traversal could be added to typeclasses used for encoding/decoding data, which already need to be able to traverse the data model fully.
We also introduce the need for rewriting - however this just replaces the rewriting of the data model that would be required to update data that was nested in the model rather than referenced (e.g. using lenses), and can use the same lenses to work. Admittedly it is less efficient to have to traverse the whole data model than just to apply a lens, but hopefully the traversal would not be too much of a burden.