React with references in immutable data

466 views Asked by At

I'm looking for input on a way to deal with issues arising with the use of id/ref links in immutable data (specifically with React using Scala-js and scala-js-react, but I think the solutions are probably common to any similar system, e.g. React in javascript, or another reactive system).

The concept is to use a Ref[A] in place of an A in my data where that same item will be referenced from multiple places in the data.

The Ref[A] would contain at least an Id[A] allowing the data to be looked up somehow - more details later.

That allows for updating the data in one place and then having that updated version still be referenced from other places.

To describe the problem, we can start from a system with an immutable data model without refs - in that case I can know that if two versions of the data x and y are equal that nothing in the data has changed. We might have something like Post(User("Alice", "[email protected]"), "Hello World") - all data is included in the model, and we can just compare different Posts as plain data.

We can also use lenses to navigate through the data and the same logic applies throughout.

However it's entirely possible that a User's email will change in future, and we may not want to either leave out of date emails scattered around all our Posts, or have to update the Posts' data to match the new email. To allow this we could introduce Refs.

To update the example, we could introduce a Ref[A] with an id field containing an Id[A] for the referent of type A. We now have Post(Ref(Id(42)), "Hello World"), and User(Id(42), "Alice", "[email protected]"). To display the Post, we need to (somehow) follow the Ref from the Post to the User instance. If Alice's email address changes, nothing in the Post data changes - we still just have a Ref(42).

This is good in one way - when we follow the Ref we will reach the new data. However this is a problem for systems like React, where we need to be able to tell when the model has changed by just comparing the old version of the model to the new one. We will generally pass the data model as part of Props, and then compare old and new props using equality to see if they have changed. This would lead to the rendering of the Post not changing at all when the User's email changes.

The solution to this seems to be the obvious one, to include the full set of referenced data in something like an immutable Cache that provides a Id[A] => Option[A]. This will then be replaced with a new Cache when the data referenced by any Id changes. This would be considered to be part of the data model, and passed around with it, say as a (A, Cache). We can still use lenses on the A, and just keep the same Cache around. When the Cache changes, the (A, Cache) will also change, allowing us to see the changes again. We are making sure we adhere to the contract of React Components (at least those without state), that the rendered output only needs to change when the Props change.

The problem here is that the (A, Cache) will change whenever any item in the cache changes, and we may have a variety of data in the cache that is irrelevant to any given component. This gets us the required change, but also many pointless changes leading to wasted re-renders.

I feel like there is probably a better way of handling this, but I've not located anything yet. Is there a general approach to this problem?

1

There are 1 answers

0
trepidacious On

In the meantime the best approach I can think of would be to extend Ref[A] so that it contains not just an Id[A] but also a specific revision of the data we refer to. Essentially we model the referenced data as a time series - the data has a value at each revision of itself, that value never changes, and we can only reference a specific revision. Revisions are for example case class Rev(r: Int). Then if the data model contains a Ref(Id(42), Rev(0)) it specifically references revision 0 of the data with id 42. If it continues to reference this revision, then the Ref won't change, but neither will the referenced data. In order to see a new revision of referenced data, we need the Ref to be updated to that new revision. Doing this changes the Ref itself, and so the data as a whole - this can in turn be detected by React. Revisions are fairly arbitrary - they just need to increment when data changes, they could be specific to each data item, or an overall counter.

This change to Refs in turn allows us to exclude the Cache from comparisons made to decide whether to re-render data, getting rid of wasted rendering for data we don't reference. We know that the data itself will change if any referenced data that can be reached by following a single reference changes.

Note that this does not help when following 2 or more jumps - however this can be dealt with relatively easily, for example in React we simply need to make sure that each time we follow a reference to reach new data, we pass that new data to a child component's props. This allows React to "detect" changes to that data and trigger re-rendering. The child component can then follow further refs, again passing the looked-up data to its own children, and we can recurse through an arbitrary number of refs.

We will still pass the Cache through to all components, but they will not need to use it to detect changes, only to look up the referenced data. In React we can provide a shouldComponentUpdate function (or Reusability) that will just look at the A in (A, Cache).

Managing the required rewriting of refs would be handled by a central system (for example in React this would probably be a root component with a cache of all data). It would provide a map from id to relevant data, but would also manage rewriting that data when necessary so that it contains the most recent revision of each reference, and then passing this data to child components for rendering.

To update the example, we would now need an Id for all top level data items. Using scala-ish pseudo-code we have cache contents a little like:

Id(1) -> (data = Post(id = Id(1), userRef = Ref(Id(42), Rev(0)), message = "Hello World!"), rev = Rev(0), refersTo = Set(Id(42))) Id(42) -> (data = User(id = Id(42), name = "Alice", email = "[email protected]"), rev = Rev(0), refersTo = Set())

So for each Id we have the data item itself, the revision that data item is at, and a set of other data items it refers to. We only store the most recent revision - old revisions will resolve to None if looked up.

The cache then has a function updateRefs that traverses a data item, looking for Refs. This will produce an updated data item with all refs at their most recent version, and a set of refs that were found in the data.

We run this updateRefs operation when:

  1. Adding a new data item to the cache. The data with updated refs is added to the cache, with refersTo field set to the refs we found. We then also update refs in any OTHER data items that refer to the added one.
  2. Updating a data item in the cache. The data has refs updated before it is stored, and refersTo field is updated to the new refs. We then also update refs in any OTHER data items that refer to the updated one.
  3. When a data item refers to another data item, and that data item is added to the cache or updated in the cache. This does NOT increment the revision number of that data item, and so does not cause a transitive update of data that refers to the traversed data. This matches up with the contract that we will update Refs when the data they reach by one step changes, NOT when 2 or more Refs need to be followed to see the change.

If we are using React, when a data item is updated or has references updated, in one of the above cases, it is re-rendered. This allows React components to notice the changes to data OR ref revisions to trigger a re-render.

As described above, we only increment revisions to show changes to the contents of the data item itself, NOT data items reached from it by a reference. So if we change a Post's message, it will increment revision. If we change the Id in the Post's userRef, this will again increment the revision. However if the User changes, the revision of the Post does NOT change. Rewriting the Post so that the userRef has an updated revision does NOT update the revision of the Post itself.

This means that we can tolerate cyclic references, since updating of refs' revisions will not trigger further updates.

To see this in effect, lets assume we have the Post and User in the cache as above.

  1. We wish to update the User to have a new email address. This could be done for example by dispatching an action to the Cache to request this.
  2. Because the User has been updated, it is traversed and found to contain no refs, so no rewriting of refs is done. If any refs were contained, they would be updated. The cache is then updated with the new User, which now has Rev(1).
  3. The cache then looks up all other data items that refer to the user - this is just the Post. It then traverses the post and finds one ref to the User via Id(42). This ref is updated to the current revision of User - Rev(1).
  4. At this point, note that if there WERE any data items that referred to the Post, they would NOT be traversed - this would be a "deep" change not a shallow one, and so does not cause any updates. If for example User contained a (cycle-forming) ref to Post, this would not cause an infinite update loop.
  5. The cache updates the entry for Post - it is still at Rev(0), and still refers to the User, but has a new data field with the rewritten Ref. This data is passed through to any components rendering the Post.
  6. We would then expect a component rendering Post to re-render say the message that is directly held in the post, and look up the User from the Cache. It then must pass this User to a child component, via props, to render. Since the child component has the User as its Props, it will re-render as well, since React allows comparing the old User to the new one. If the component rendering the Post just directly rendered the User it would work in this case, since the userRef will update whenever the User data is changed in a shallow way. However if the User contained a Ref, we would NOT see changes to the target of that Ref in the Post component.

This system seems like it should extend to server-client use as well, where we can use something resembling Diode's Pot system to represent data that is being retrieved from the server. Rather than Id[A] => Option[A] we would have Id[A] => Pot[A], and the specific Pot state would change as the data is retrieved, and then as it is updated by the server. We would tie into failed lookups to trigger retrieving referenced data, and perhaps periodically clear out data that has not been recently looked up.

The requirement to traverse the data seems a little annoying at first, however at least in some systems this kind of traversal could be added to typeclasses used for encoding/decoding data, which already need to be able to traverse the data model fully.

We also introduce the need for rewriting - however this just replaces the rewriting of the data model that would be required to update data that was nested in the model rather than referenced (e.g. using lenses), and can use the same lenses to work. Admittedly it is less efficient to have to traverse the whole data model than just to apply a lens, but hopefully the traversal would not be too much of a burden.