I am currently working on an application that pulls mail from many IMAP mailboxes. It seems like Celluloid is a good fit for this part, but I'm unsure how to employ actors.
The application will be run in a distributed fashion. There are x mailboxes to poll and y processes among which these will be divided. So each process has a list of mailboxes it has to poll, and this list will change every now and then. This means the pool of connections maintained by each process is dynamic.
My biggest question is: should I spawn a separate ImapConnection actor for each mailbox, or should I make a single ImapListener actor that manages all connections internally?
My current design features the former solution. There's one central Coordinator actor that keeps an array of actors that each manage one imap connection. A new connection is added with a simple:
@connections << ImapConnection.supervise(account_info)
The ImapConnection either polls the IMAP server at regular intervals, or maintains an IDLE connection. If the Coordinator wants to stop polling a mailbox it looks it up in its @connections array and properly disposes of it.
This seems like a logical approach to me that yields many benefits of Celluloid (such as automatic restarting of crashed actors), but I'm struggling to find examples of other software that uses this approach. Is spawning hundreds of actors in this fashion proper use of the actor model, or should I use a different approach?
Very glad to hear you are using `Celluloid`. Good question.
I'm not sure how you create and maintain connections, and whether that is through a `TCPSocket` you are able to manage or not. If you can manage a `TCPSocket` directly, you ought to use `Celluloid::IO` as well as `Celluloid` itself. I also don't know where you put the information pulled in from the IMAP connections. These two things influence your strategy.
Your approach is not bad, but yes, it could possibly be improved by adding three pieces: polling workers to do your heavy lifting; another actor to hold `account_info` only; and a final actor to trigger the work and/or maintain the IDLE state. So you'd end up with `ImapWorker` (a pool), `ImapMaintainer`, and `ImapRegistry`. I do wonder whether, since you are polling, you need to keep an open connection rather than allowing information to be pushed. If you plan to poll and still keep connections open, here is what the three actors would do:
`ImapRegistry` holds your `account_info` in a `Hash`. It would have methods like `add`, `get`, and `remove`. I recommend a `Hash` of `@credentials` so you can use the same ID between `ImapMaintainer` and `ImapRegistry`; one holds live connections in its `@connections`, and the other holds `account_info` instances in its `@credentials`. Both `@connections` and `@credentials` are accessed by the same ID, but one keeps a volatile connection whereas the other only has static data usable to recreate a connection if necessary. In this way, your heavy lifters could die and be respawned, and the entire system could regenerate itself.
`ImapMaintainer` would have the actual `@connections` in it, plus `every( interval ) { }` tasks built into it, added to when `account_info` is stored in `ImapRegistry`. There are two tasks I see, depending on how frequently you plan to poll. One could be to simply touch the IMAP connection to maintain it, and the other could be to poll the IMAP server with `ImapWorker`. `ImapWorker` would be a pool saved in `ImapMaintainer` as, say, `@worker`. So it has `@connections`, `@worker`, `#polling`, and `#keepalive`. `polling` could be an `@connections.each` situation, or you could have a timer per connection, added at the point a connection is created.
`ImapWorker` has two methods. One is `#touch`, which keeps a connection alive. The main one is `#poll`, which takes a connection you maintain and runs a polling process on it. That method returns the information, or even better also stores it; then the worker returns to the `@worker` pool.
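To make the shared-ID idea concrete, here is a minimal plain-Ruby sketch of the registry and maintainer bookkeeping. It deliberately leaves Celluloid out (in the real system each class would `include Celluloid`), and `FakeConnection` plus the `:host`/`:user` keys are hypothetical stand-ins rather than real IMAP code.

```ruby
# Plain-Ruby sketch only: no actors, no real IMAP.
# FakeConnection and the :host/:user keys are assumptions for illustration.
FakeConnection = Struct.new(:host, :user)

# Holds only static account data, keyed by an ID.
class ImapRegistry
  def initialize
    @credentials = {}
  end

  def add(id, account_info)
    @credentials[id] = account_info
  end

  def get(id)
    @credentials[id]
  end

  def remove(id)
    @credentials.delete(id)
  end
end

# Holds the volatile connections, keyed by the same ID.
class ImapMaintainer
  def initialize(registry)
    @registry = registry
    @connections = {}
  end

  # Build (or rebuild) a connection purely from the registry's static data.
  # Returns nil when the registry no longer knows the ID.
  def connect(id)
    info = @registry.get(id)
    return nil unless info
    @connections[id] = FakeConnection.new(info[:host], info[:user])
  end

  def drop(id)
    @connections.delete(id)
  end

  def connected?(id)
    @connections.key?(id)
  end
end

registry = ImapRegistry.new
maintainer = ImapMaintainer.new(registry)
registry.add(:alice, { host: "imap.example.com", user: "alice" })
maintainer.connect(:alice)
maintainer.drop(:alice)     # simulate a lost connection
maintainer.connect(:alice)  # recreated from registry data alone
```

Because the maintainer never stores `account_info` itself, losing everything in `@connections` costs nothing that the registry cannot rebuild.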
This would give you the benefit of having the polling process happen in a separate thread rather than just a separate fiber, and it also keeps the trickiest aspect out in the most robust yet least aware kind of actor.
Working backward: if `ImapRegistry` receives `#add`, it stores `account_info` and gives that to `ImapMaintainer`, which creates the connection and timers. `ImapMaintainer` then forgets `account_info`: it either creates the connection and its timer(s), or just creates the connection and lets one big timer maintain the connections with `@worker`, which is a pool. `ImapMaintainer` inevitably hits a timer, so at the start and end of its timer it can check its connection. If the connection is gone for some reason, it can recreate it with the information from `@registry.get`. Within its timer-prompted task, it can run `@worker.poll` or `@worker.touch`.
This illustrates the above requirements, showing how the initializers would put together the actor system, and has an incomplete skeleton of methods mentioned.
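The sketch below is only a plain-Ruby approximation of that add, connect, and timer-tick flow, not Celluloid code. It assumes `tick` would be driven by an `every( interval ) { }` timer and that `@worker` would be a pool in the real system; `StubConnection`, `StubWorker`, `Registry`, `Maintainer`, and `tick` are hypothetical names chosen for illustration.

```ruby
# Plain-Ruby approximation of the flow; every class here is a stand-in.
class StubConnection
  attr_reader :user

  def initialize(user)
    @user = user
    @open = true
  end

  def open?
    @open
  end

  def close
    @open = false
  end
end

# Stands in for the pooled heavy lifter.
class StubWorker
  def poll(connection)
    "polled #{connection.user}"
  end
end

class Maintainer
  attr_reader :connections

  def initialize(registry, worker)
    @registry = registry
    @worker = worker
    @connections = {}
  end

  # Uses account_info once to build the connection, then forgets it.
  def connect(id, account_info)
    @connections[id] = StubConnection.new(account_info[:user])
  end

  # What a timer would call: check each connection, rebuild if needed, poll.
  def tick
    @connections.keys.map do |id|
      ensure_connection(id)
      @worker.poll(@connections[id])
    end
  end

  private

  def ensure_connection(id)
    return if @connections[id]&.open?
    connect(id, @registry.get(id))
  end
end

class Registry
  attr_reader :maintainer

  # The registry's initializer puts the rest of the system together.
  def initialize(worker)
    @credentials = {}
    @maintainer = Maintainer.new(self, worker)
  end

  def add(id, account_info)
    @credentials[id] = account_info
    @maintainer.connect(id, account_info)
  end

  def get(id)
    @credentials[id]
  end
end

registry = Registry.new(StubWorker.new)
registry.add(:alice, { user: "alice" })
registry.maintainer.connections[:alice].close  # simulate a dropped connection
results = registry.maintainer.tick             # reconnects, then polls
```

The point of the design is visible in `ensure_connection`: the maintainer only ever goes back to the registry when a connection has actually gone away.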
I love `Celluloid` and hope you have a lot of success with it. Please ask if you want anything clarified, but this at least is another strategy for you to consider.