I'm having trouble figuring out how to maintain attribute updates across multiple tables to ensure data consistency.
For example, suppose I have many-to-many relationship between actors and fans. A fan can support many actors, and an actor have many fans. I make several tables to support my queries
CREATE TABLE fans (
fan_id uuid,
fan_attr_1 int,
fan_attr_2 int
PRIMARY KEY ((fan_id))
)
CREATE TABLE actors (
actor_id uuid,
actor_attr_1 int,
actor_attr_2 int
PRIMARY KEY ((actor_id))
)
CREATE TABLE actors_by_fan (
fan_id uuid,
actor_id uuid,
actor_attr_1 int,
actor_attr_2 int
PRIMARY KEY (fan_id, actor_id)
)
CREATE TABLE fans_by_actor (
actor_id uuid,
fan_id uuid,
fan_attr_1 int,
fan_attr_2 int
PRIMARY KEY (actor_id, fan_id)
)
Let's say I'm a fan and I'm on my settings page and I want to change my fan_attr_1
to a different value.
On the fans
table I can update my attribute just fine since the application knows my fan_id and can key on that.
However I cannot change my fan_attr_1
on the fans_by_actor
without first querying for the actor_ids tied to the fan.
This problem occurs for any time you want to update any attribute of either fans
or actors
.
I've tried looking online for people experiencing similar problems, but I couldn't find them. For example, in Datastax's Data Modeling course they use the examples with actors and videos in a many to many relationship where they have tables actors_by_video
and videos_by_actor
. The course, like the other online resources I've consulted, discussed modeling tables after queries, but haven't dug into how to maintain data integrity. In the actors_by_video
table, what would happen if I want to change an actor's attribute? Wouldn't have have to go through every row of actors_by_video
to find the partitions that contain the actor and update the attribute? That sounds very inefficient. The other option is to look for the video id's beforehand, but I read elsewhere that reads before writes are an antipattern in Cassandra.
What would be the best approach for tackling this problem either from a data modeling standpoint or from a CQL standpoint?
EDIT: - Fixed sentence stubs - Added context and prior research
Materialized Views are probably the best choice:
In versions prior to 3.0, create secondary indices and evaluate if their performance is acceptable. Later, after upgrading to 3.x, just drop the secondary indexes and create materialized views.