akka-cluster: unable to get rid of actors from a scaled-down node


I'm facing an issue with Akka Cluster.

System setup: 2 nodes, each running 2 instances of a worker actor. All of these actors (4 in total) are tracked by a singleton actor in the application.

The issue is that when one of the nodes is scaled down, the singleton actor still holds references to the actors on the downed node.

I haven't been able to find the root cause. I'm not sure which part of Akka Cluster is responsible for handling this scenario, or where to look for the issue.

Answer from David Ogren:

This is expected behavior. When an actor goes away (because it was stopped, it failed, its node left the cluster, or for any other reason), the ActorRefs that point to it are not deleted or invalidated. That isn't even possible given how ActorRefs work: they are ordinary objects that can be passed around, copied, stored, and so on.

You have a few options:

  1. An actor can register, via watch or watchWith, to receive a message when another actor terminates. (This works because an actor can be sent messages, whereas an ActorRef is just an ordinary object that cannot be notified of anything.)

  2. You could subscribe to cluster membership events in a similar way, and drop references to actors on nodes that have been removed.

  3. Rather than building your own singleton to keep track of actors, you could use the built-in Receptionist, which does this kind of actor watching and cluster-membership watching automatically.

  4. Fundamentally, in a system like the one you describe, message senders need to be resilient. For example, if one of these worker actors ends up behind a network partition, it won't be able to respond or deregister itself. You need some kind of fallback (such as timeouts and retries) in addition to the options above.
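
Options 1 and 2 can be combined in the singleton itself. The following is a minimal sketch, assuming Akka Typed 2.6 with akka-cluster-typed on the classpath and a cluster that is already configured; the Worker protocol and the RegisterWorker, WorkerTerminated, and MemberChange messages are hypothetical names, not taken from the question. Each worker registers with the singleton, which watches it with watchWith and also subscribes to cluster MemberEvents so it can drop any reference whose address belongs to a removed node.

```scala
import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.ClusterEvent.{MemberEvent, MemberRemoved}
import akka.cluster.typed.{Cluster, Subscribe}

// Hypothetical worker protocol standing in for the question's worker actor.
// Assumes Akka Typed 2.6 + akka-cluster-typed.
object Worker {
  sealed trait Command
  final case class DoWork(payload: String) extends Command
}

object WorkerRegistry {
  sealed trait Command
  // Hypothetical message each worker sends to the singleton on startup.
  final case class RegisterWorker(worker: ActorRef[Worker.Command]) extends Command
  // Delivered by watchWith when a registered worker stops or its node is removed.
  private final case class WorkerTerminated(worker: ActorRef[Worker.Command]) extends Command
  // Wraps cluster membership events delivered through a message adapter.
  private final case class MemberChange(event: MemberEvent) extends Command

  def apply(): Behavior[Command] = Behaviors.setup { context =>
    // Option 2: subscribe to membership changes (MemberUp, MemberRemoved, ...).
    val memberAdapter = context.messageAdapter[MemberEvent](MemberChange.apply)
    Cluster(context.system).subscriptions ! Subscribe(memberAdapter, classOf[MemberEvent])
    tracking(Set.empty)
  }

  private def tracking(workers: Set[ActorRef[Worker.Command]]): Behavior[Command] =
    Behaviors.receive { (context, message) =>
      message match {
        case RegisterWorker(worker) =>
          // Option 1: ask to be told when this worker terminates.
          context.watchWith(worker, WorkerTerminated(worker))
          tracking(workers + worker)

        case WorkerTerminated(worker) =>
          context.log.info("Worker {} terminated, removing it", worker)
          tracking(workers - worker)

        case MemberChange(MemberRemoved(member, _)) =>
          // Refs received from other nodes carry that node's address; refs to
          // workers on the singleton's own node only matter if this node is
          // removed, in which case the singleton itself stops anyway.
          val (gone, remaining) = workers.partition(_.path.address == member.address)
          gone.foreach(ref => context.unwatch(ref))
          tracking(remaining)

        case MemberChange(_) =>
          Behaviors.same
      }
    }
}
```

Death watch alone should already deliver WorkerTerminated once a watched actor's node has been downed and removed, so the MemberRemoved branch is an illustration of option 2 rather than a requirement. Because an unreachable node is not removed until it is downed, neither branch fires during a partition, which is why option 4 still applies.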

EDIT 1/19/24: As an example of #4, note that none of the native cluster features (the clustered Receptionist, Cluster Singleton, Cluster Sharding, and Sharded Daemon Process are the ones that come to mind immediately) have you keep a direct reference to a clustered actor. You almost always work with a "proxy" on your local node, and that proxy actor does the work of tracking liveness and reachability. That's one of the reasons I suggested using the built-in Receptionist.
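
Following the suggestion to use the built-in Receptionist, the hand-written registry can be replaced by a subscription. This is a minimal sketch under the same Akka Typed 2.6 assumptions; WorkerKey, WorkerTracker, Dispatch, and WorkersChanged are hypothetical names. Each worker registers itself under a shared ServiceKey and the tracker subscribes to that key. The clustered Receptionist sends an updated Listing whenever the registrations change (for example when a worker stops or its node leaves the cluster), so the tracker never prunes references by hand.

```scala
import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.receptionist.{Receptionist, ServiceKey}
import akka.actor.typed.scaladsl.Behaviors

// Assumes Akka Typed 2.6 + akka-cluster-typed (clustered Receptionist).
object Worker {
  sealed trait Command
  final case class DoWork(payload: String) extends Command

  // Hypothetical service key shared by every worker in the cluster.
  val WorkerKey: ServiceKey[Command] = ServiceKey[Command]("worker")

  def apply(): Behavior[Command] = Behaviors.setup { context =>
    // Register with the local receptionist; the registration is replicated
    // across the cluster and dropped when this actor or its node goes away.
    context.system.receptionist ! Receptionist.Register(WorkerKey, context.self)
    Behaviors.receiveMessage { case DoWork(payload) =>
      context.log.info("Working on {}", payload)
      Behaviors.same
    }
  }
}

object WorkerTracker {
  sealed trait Command
  // Hypothetical request to hand a payload to some live worker.
  final case class Dispatch(payload: String) extends Command
  private final case class WorkersChanged(listing: Receptionist.Listing) extends Command

  def apply(): Behavior[Command] = Behaviors.setup { context =>
    val listingAdapter = context.messageAdapter[Receptionist.Listing](WorkersChanged.apply)
    // Receive the current Listing now and a new one on every change.
    context.system.receptionist ! Receptionist.Subscribe(Worker.WorkerKey, listingAdapter)
    tracking(Set.empty)
  }

  private def tracking(workers: Set[ActorRef[Worker.Command]]): Behavior[Command] =
    Behaviors.receive { (context, message) =>
      message match {
        case WorkersChanged(Worker.WorkerKey.Listing(current)) =>
          // Replace the whole set; stale refs disappear with the new listing.
          tracking(current)

        case WorkersChanged(_) =>
          Behaviors.same

        case Dispatch(payload) =>
          workers.headOption match {
            case Some(worker) => worker ! Worker.DoWork(payload)
            case None         => context.log.warn("No workers available for {}", payload)
          }
          Behaviors.same
      }
    }
}
```

Picking headOption is deliberately naive. Even with an up-to-date listing, a worker can become unreachable between the listing update and the send, so a real dispatcher would still pair this with a timeout (for example an ask) as described in #4.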