I am currently working on an application that pulls mail from many IMAP mailboxes. It seems like Celluloid is a good fit for this part, but I'm unsure how to employ actors.
The application will be run in a distributed fashion. There are x mailboxes to poll and y processes among which these will be divided. So each process has a list of mailboxes it has to poll, and this list will change every now and then. This means the pool of connections maintained by each process is dynamic.
My biggest question is: should I spawn a separate ImapConnection actor for each mailbox, or should I make a single ImapListener actor that manages all connections internally?
My current design features the former solution. There's one central Coordinator actor that keeps an array of actors that each manage one IMAP connection. A new connection is added with a simple:
@connections << ImapConnection.supervise(account_info)
The ImapConnection either polls the IMAP server at regular intervals, or maintains an IDLE connection. If the Coordinator wants to stop polling a mailbox it looks it up in its @connections array and properly disposes of it.
This seems like a logical approach to me that yields many benefits of Celluloid (such as automatic restarting of crashed actors), but I'm struggling to find examples of other software that uses this approach. Is spawning hundreds of actors in this fashion proper use of the actor model, or should I use a different approach?
Very glad to hear you are using `Celluloid`. Good question.

I'm not sure how you create and maintain connections, or whether you have a `TCPSocket` you can manage directly. If you do, you ought to use `Celluloid::IO` as well as `Celluloid` itself. I also don't know where you put the information pulled in from the IMAP connections. These two things influence your strategy.

Your approach is not bad, but yes, it could possibly be improved by splitting it into three parts: polling workers to do the heavy lifting; another actor to hold `account_info` only; and a final actor to trigger the work and/or maintain the IDLE state. So you'd end up with `ImapWorker` (a pool), `ImapMaintainer`, and `ImapRegistry`. Right here I wonder, since you are polling, whether you need to keep an open connection at all rather than allowing information to be pushed. If you plan to poll and still keep connections open, here is what the three actors would do:

`ImapRegistry`
holds your `account_info` in a `Hash`. It would have methods like `add`, `get`, and `remove`. I recommend a `Hash` for `@credentials` so you can use the same ID in both `ImapMaintainer` and `ImapRegistry`; one holds live connections in its `@connections`, and the other holds `account_info` instances in its `@credentials`. Both `@connections` and `@credentials` are accessed by the same ID, but one keeps a volatile connection whereas the other only has static data usable to recreate a connection if necessary. In this way, your heavy lifters could die and be respawned, and the entire system could regenerate itself.

`ImapMaintainer`
would have the actual `@connections` in it, along with `every( interval ) { }` tasks built into it, which are added to when `account_info` is stored in `ImapRegistry`. I see two tasks, depending on how frequently you plan to poll. One could be to simply touch the IMAP connection to maintain it, and the other could be to poll the IMAP server with `ImapWorker`. `ImapWorker` would be a pool saved in `ImapMaintainer` as, say, `@worker`. So `ImapMaintainer` has `@connections`, `@worker`, `#polling`, and `#keepalive`. `#polling` could be an `@connections.each` situation, or you could have a timer per connection, added at the point a connection is created.

`ImapWorker`
has two methods. One is `#touch`, which keeps a connection alive. The main one is `#poll`, which takes a connection you maintain and runs a polling process on it. That method returns the information, or even better stores it as well, and then the worker returns to the `@worker` pool. This gives you the benefit of having the polling happen in a separate thread rather than just a separate fiber, and it keeps the trickiest aspect in the most robust yet most unaware kind of actor.

Working backward, if
`ImapRegistry` receives `#add`, it stores `account_info` and gives that to `ImapMaintainer`, which creates the connection and timers. `ImapMaintainer` then forgets `account_info`: it either creates the connection and its timer(s), or just creates the connection and lets one big timer maintain the connection with `@worker`, which is a pool. `ImapMaintainer` inevitably hits a timer, so at the start and end of its timer it can check its connection. If the connection is gone for some reason, it can recreate it with the information from `@registry.get`. Within its timer-prompted task, it can run `@worker.poll` or `@worker.touch`.

This illustrates the above requirements, showing how the initializers would put together the actor system, and has an incomplete skeleton of methods mentioned.
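One way such a skeleton could look is sketched below, as a minimal plain-Ruby stand-in rather than actual Celluloid code. It leaves Celluloid out so it runs on its own: in a real system each class would presumably `include Celluloid`, `ImapWorker` would be created as a pool, and `#polling` would be scheduled with `every( interval ) { }`. `FakeConnection`, `drop!`, and the `add`/`remove` methods on `ImapMaintainer` are hypothetical names used only for illustration, and for simplicity the call site tells both the registry and the maintainer about a new mailbox instead of the registry notifying the maintainer.

```ruby
# Plain-Ruby sketch of the registry / maintainer / worker split.
# Assumption: no Celluloid here; actors, pools and timers are left out.

# Hypothetical stand-in for a live IMAP connection.
FakeConnection = Struct.new(:account_info, :alive) do
  def alive?
    alive
  end
end

# Holds only static account_info, keyed by mailbox ID.
class ImapRegistry
  def initialize
    @credentials = {}
  end

  def add(id, account_info)
    @credentials[id] = account_info
  end

  def get(id)
    @credentials[id]
  end

  def remove(id)
    @credentials.delete(id)
  end
end

# Stateless heavy lifter: it is handed a connection and works on it.
class ImapWorker
  # Keep-alive check; a real worker might send a no-op to the server.
  def touch(connection)
    connection.alive?
  end

  # Placeholder for the real polling logic.
  def poll(connection)
    "polled #{connection.account_info[:user]}"
  end
end

# Holds the volatile connections under the same IDs as the registry.
class ImapMaintainer
  def initialize(registry, worker)
    @registry = registry
    @worker = worker
    @connections = {}
  end

  def add(id)
    @connections[id] = connect(id)
  end

  def remove(id)
    @connections.delete(id)
  end

  # Body of the periodic task: check each connection, then poll it.
  def polling
    @connections.keys.map do |id|
      @connections[id] = connect(id) unless @worker.touch(@connections[id])
      @worker.poll(@connections[id])
    end
  end

  # Hypothetical helper that simulates a lost connection.
  def drop!(id)
    @connections[id].alive = false
  end

  private

  # Recreate a connection from the static data kept in the registry.
  def connect(id)
    FakeConnection.new(@registry.get(id), true)
  end
end

registry = ImapRegistry.new
maintainer = ImapMaintainer.new(registry, ImapWorker.new)

registry.add(:alice, { user: "alice" })
maintainer.add(:alice)
maintainer.polling        # polls the live connection
maintainer.drop!(:alice)  # simulate a dead connection
maintainer.polling        # recreated from the registry, then polled
```

The point of the split is visible in `#polling`: the maintainer never stores `account_info` itself, so a dead connection is rebuilt from whatever the registry holds under the same ID.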
I love `Celluloid` and hope you have a lot of success with it. Please ask if you want anything clarified, but this at least is another strategy for you to consider.