I'm facing an issue with Akka Cluster.
System setup: 2 nodes, each running 2 instances of a worker actor. All of these actors (4 in total) are tracked by a singleton actor in the application.
The issue is that when one of the nodes is scaled down, the singleton actor still keeps references to the actors on the downed node.
I haven't been able to get to the root cause.
I'm not sure which part of Akka Cluster takes care of such scenarios, or where to look for the issue.
This is expected behavior. Just because an actor goes away (because it is stopped, it fails, its node goes away, or otherwise) doesn't mean that
its ActorRefs are deleted or invalidated. That's not really even possible with the way ActorRefs work: they are just ordinary objects that can be passed around/copied/etc. You have a few options:
1. An actor can register to receive a message when another actor is terminated, via
`watch` or `watchWith`. (As an actor it can be sent messages, unlike an ordinary object such as an `ActorRef`.)
2. You could listen for cluster member changes in a similar way.
3. Rather than building your own singleton to keep track of actors, you could use the built-in
`Receptionist`, which does this kind of actor watching/cluster watching automatically.
4. Fundamentally, in a system like the one you are describing, message senders need to be resilient. For example, if one of these worker actors ends up behind a network partition, it won't be able to respond or deregister itself. There has to be some kind of backup plan in addition to the above.
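Options 1 and 2 can be sketched roughly like this in Akka Typed Scala. The `WorkerTracker` protocol and message names here are hypothetical, not from your code — it's a minimal sketch of the watching/subscribing pattern, assuming a typed `ActorSystem` with cluster enabled:

```scala
import akka.actor.typed.scaladsl.Behaviors
import akka.actor.typed.{ActorRef, Behavior}
import akka.cluster.ClusterEvent.MemberRemoved
import akka.cluster.typed.{Cluster, Subscribe}

// Hypothetical tracker illustrating options 1 and 2
object WorkerTracker {
  sealed trait Command
  final case class Register(worker: ActorRef[Nothing]) extends Command
  final case class WorkerDied(worker: ActorRef[Nothing]) extends Command
  final case class NodeRemoved(event: MemberRemoved) extends Command

  def apply(): Behavior[Command] = Behaviors.setup { context =>
    // Option 2: adapt cluster membership events into this actor's own protocol
    val adapter = context.messageAdapter[MemberRemoved](NodeRemoved.apply)
    Cluster(context.system).subscriptions ! Subscribe(adapter, classOf[MemberRemoved])
    tracking(Set.empty)
  }

  private def tracking(workers: Set[ActorRef[Nothing]]): Behavior[Command] =
    Behaviors.receive { (context, message) =>
      message match {
        case Register(worker) =>
          // Option 1: watchWith turns the Terminated signal into our own message
          context.watchWith(worker, WorkerDied(worker))
          tracking(workers + worker)
        case WorkerDied(worker) =>
          tracking(workers - worker)
        case NodeRemoved(event) =>
          // Drop any refs whose address matches the removed member's address
          tracking(workers.filterNot(_.path.address == event.member.address))
      }
    }
}
```

Note that `watchWith` will also fire for actors on a node that is removed, so the `MemberRemoved` handling is partly a belt-and-braces measure; it lets you clean up by address even if a Terminated signal is delayed.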
EDIT 1/19/24: As an example of #4, you'll note that none of the native cluster features (the clustered receptionist, singleton, sharding, and sharded daemon process are the ones that come immediately to mind) have you keep a direct reference to a clustered actor. You are almost always working with a "proxy" on your local node, and that proxy actor does the work of tracking liveness and reachability. That's one of the reasons I suggested using the built-in Receptionist.
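As a sketch of the Receptionist approach (option 3): workers register under a `ServiceKey`, and anyone interested subscribes and gets a fresh `Listing` whenever the set of registered actors changes. In a cluster, the receptionist's listings shrink automatically when a node with registered actors is removed. The `"worker"` key and object names are illustrative, not from your application:

```scala
import akka.actor.typed.receptionist.{Receptionist, ServiceKey}
import akka.actor.typed.scaladsl.Behaviors
import akka.actor.typed.Behavior

object WorkerProtocol {
  sealed trait Command
  // Hypothetical key; each distinct service gets its own key
  val WorkerKey: ServiceKey[Command] = ServiceKey[Command]("worker")
}

object Worker {
  def apply(): Behavior[WorkerProtocol.Command] =
    Behaviors.setup { context =>
      // Register with the (cluster-aware) receptionist on startup
      context.system.receptionist ! Receptionist.Register(WorkerProtocol.WorkerKey, context.self)
      Behaviors.receiveMessage { _ => Behaviors.same }
    }
}

object WorkerUser {
  def apply(): Behavior[Receptionist.Listing] =
    Behaviors.setup { context =>
      // Subscribe: we get an updated Listing on every change, including node removal
      context.system.receptionist ! Receptionist.Subscribe(WorkerProtocol.WorkerKey, context.self)
      Behaviors.receiveMessage { listing =>
        val workers = listing.serviceInstances(WorkerProtocol.WorkerKey)
        context.log.info("Currently registered workers: {}", workers)
        Behaviors.same
      }
    }
}
```

This replaces the hand-rolled singleton registry entirely: the receptionist does the watching and cluster tracking for you, and your code only ever sees the current set of live registrations.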