Who is in charge of re-creating persistent actor instances after a JVM crash?

I am evaluating whether to use Akka and Akka Persistence as the key toolkit for a project that has a complex background process (it might be triggered by Quartz at a fixed time each day).

The background process will communicate with many different external services over HTTP, generate many encrypted files locally, and transfer them via SFTP.

From a business perspective:

  • The service is mission critical: roughly, it will automatically charge N million users' bank cards and purchase fund products on their behalf.

From a technical perspective:

  • Any of the external services might be unavailable for whatever reason, such as network issues or the external service running out of resources (e.g. JDBC connections).
  • Our service might be killed, restarted, or re-deployed for urgent reasons, or it might crash with an unexpected error.
  • Once the process is restarted with incomplete jobs, it needs to complete them gracefully using different strategies, such as redoing the work, confirming the business state with the external system, or resuming from a certain checkpoint.

I have been reading the official AkkaScala.PDF and watching some YouTube conference videos. All of them mention that an actor's state can be restored after a JVM crash by replaying the events from the journal.

This may be a stupid question, but I have not found it discussed anywhere:

Imagine there are 1000 persistent actors living in the service, and the service's JVM crashes and restarts. Who is in charge of triggering the re-creation of those 1000 persistent actors in the newly created actor system, in both single-process mode and clustered mode? And how? Or which articles should I read first?

There is 1 answer

Branislav Lazic (best answer)

You should read the basics of Akka Persistence and Akka Persistence Query. The first thing that comes to mind is to use Akka Persistence Query's AllPersistenceIdsQuery or CurrentPersistenceIdsQuery. It gives you all persistence IDs, which you can use to re-create your persistent actors. Each persistent actor, once started with its persistence ID, replays all of its events from the journal (the event store). You can take snapshots to speed up recovery. Your event store will probably be some kind of database (e.g. Cassandra). Since your persistent actor holds mutable state, that state is brought back to its last persisted value after recovery. Recovery might take some time.
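
To make the recovery flow concrete, here is a minimal, library-free sketch in plain Scala. It is not Akka's API: `Journal`, `Event`, `Snapshot`, `Recovery` and the account IDs are all hypothetical names invented for illustration, and the in-memory maps stand in for a real event store. It only models the steps described above: enumerate the stored persistence IDs, start each entity from its latest snapshot if one exists, and replay the newer events.

```scala
// Library-free model of journal-based recovery; every name here is hypothetical.
final case class Event(seqNr: Long, amount: Long)
final case class Snapshot(seqNr: Long, total: Long)

// Stand-in for a durable event store (e.g. database tables keyed by persistence ID).
final class Journal(
    events: Map[String, Vector[Event]],
    snapshots: Map[String, Snapshot]
) {
  // Plays the role of a "current persistence IDs" query.
  def persistenceIds: Set[String] = events.keySet

  def latestSnapshot(id: String): Option[Snapshot] = snapshots.get(id)

  // Events strictly after the given sequence number, in stored order.
  def eventsAfter(id: String, seqNr: Long): Vector[Event] =
    events.getOrElse(id, Vector.empty).filter(_.seqNr > seqNr)
}

object Recovery {
  // Rebuild one entity's state: start from the snapshot (if any), then replay newer events.
  def recover(journal: Journal, id: String): Long = {
    val start = journal.latestSnapshot(id).getOrElse(Snapshot(0L, 0L))
    journal.eventsAfter(id, start.seqNr).foldLeft(start.total)(_ + _.amount)
  }

  // What the application itself does at startup: enumerate IDs and recover each one.
  def recoverAll(journal: Journal): Map[String, Long] =
    journal.persistenceIds.map(id => id -> recover(journal, id)).toMap
}

object Demo {
  val journal = new Journal(
    events = Map(
      "account-1" -> Vector(Event(1, 100), Event(2, 50), Event(3, 25)),
      "account-2" -> Vector(Event(1, 10))
    ),
    // account-1 was snapshotted after its second event, so only event 3 is replayed.
    snapshots = Map("account-1" -> Snapshot(seqNr = 2, total = 150))
  )

  val recovered: Map[String, Long] = Recovery.recoverAll(journal)
}
```

In this model `Demo.recovered` maps `account-1` to 175 (snapshot total 150 plus the one later event) and `account-2` to 10 (full replay, no snapshot). The point it illustrates is that something in the application has to call the equivalent of `recoverAll` after a restart. In a real Akka setup the IDs would come from a read journal query such as the ones named above, and each actor would perform its own replay when it is started with its persistence ID.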