I read Mark Shapiro's article on CRDTs but it is not clear for me whether all the operations of operation-based CRDTs should be commutative or not.
For any two operations f, g in C(xi): (1) if they are not causally related then they are concurrent under <d (that is never stronger than causality) and must commute; (2) if they are causally related a → b but are not ordered in delivery order <d they must also commute; (3) if they are causaly related a → b and delivered in that order a <d b then they are applied in that same order everyware
(Shapiro et al., A comprehensive study of Convergent and Commutative Replicated Data Types, 2011, p. 11)
It seems that causally related events that are delivered in that order could be non-commutative. In this case, this would allow a CRDT based on a defined data type operation with add and remove operations.
Assume a reliable broadcast channel and 3 replicas (A, B, C). A adds "a" to its set and broadcasts the operation to B and C. But due to network conditions, C receives the operation with high latency. Before C receives the operation, B already sends another operation to remove "a" from the set (see pic below). In the end, the three replicas do not converge, even though the events were broadcast in a causal order. So should the operations of an operation-based CRDT be commutative? Should operation-based CRDTs increase monotonically? Or is there something I don't understand?
You are correct that only concurrent operations need to be commutative, assuming causal order delivery.
The issue in your example is that causal order delivery is violated: the
add(a)
message is causally prior to theremove(a)
message, but replica C receives them in the opposite order.What matters is the order of receipt, not broadcast. (Or in practice, the order of processing, since you can just delay processing a message that you received before its causal predecessors.)