Proper way to maintain many connections with Celluloid?

I am currently working on an application that pulls mail from many IMAP mailboxes. Celluloid seems like a good fit for this part, but I'm not sure how to employ actors.

The application will run in a distributed fashion. There are x mailboxes to poll, divided among y processes, so each process has a list of mailboxes it has to poll, and that list changes every now and then. This means the pool of connections maintained by each process is dynamic.

My biggest question is: should I spawn a separate ImapConnection actor for each mailbox, or should I make a single ImapListener actor that manages all connections internally?

My current design features the former solution. There's one central Coordinator actor that keeps an array of actors, each managing one IMAP connection. A new connection is added with a simple:

 @connections << ImapConnection.supervise(account_info)

The ImapConnection either polls the IMAP server at regular intervals or maintains an IDLE connection. If the Coordinator wants to stop polling a mailbox, it looks the connection up in its @connections array and properly disposes of it.
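
Sketched more fully, the Coordinator looks roughly like this ( assuming an ImapConnection actor that exposes an account_info reader, which I've added here purely for lookup purposes ):

 class Coordinator
     include Celluloid

     def initialize
         @connections = []
     end

     def add( account_info )
         @connections << ImapConnection.supervise( account_info )
     end

     def remove( account_info )
         # each entry is a supervisor wrapping one ImapConnection actor
         supervisor = @connections.find { |s| s.actors.first.account_info == account_info }
         return unless supervisor
         @connections.delete( supervisor )
         # terminating the supervisor keeps it from respawning the actor
         supervisor.terminate
     end
 end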

This seems like a logical approach to me that yields many of Celluloid's benefits (such as automatic restarting of crashed actors), but I'm struggling to find examples of other software that takes it. Is spawning hundreds of actors in this fashion proper use of the actor model, or should I use a different approach?

1 Answer

digitalextremist (Best Answer)

Very glad to hear you are using Celluloid. Good question.

I'm not sure how you create and maintain connections, in particular whether you have direct control over the underlying TCPSocket. If you do, you ought to use Celluloid::IO as well as Celluloid itself. I also don't know where you put the information pulled in from IMAP connections. Both of these things influence your strategy.
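
If you do control the socket directly, the shape is roughly this ( a sketch only; the host and port fields and the handle method are placeholders, and real IMAP IDLE needs the protocol handshake on top ):

 require 'celluloid/io'

 class ImapConnection
     include Celluloid::IO #de socket reads become evented inside this actor

     def initialize( account_info )
         #de Celluloid::IO::TCPSocket suspends only this actor while waiting
         @socket = Celluloid::IO::TCPSocket.new( account_info[:host], account_info[:port] || 143 )
     end

     def listen
         loop { handle @socket.readpartial( 4096 ) }
     end

     def handle( data )
         #de parse untagged IMAP responses here
     end
 end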

Your approach is not bad, but yes, it could be improved: add one actor to do your heavy lifting (a pool of polling workers), another to hold account_info only, and a final actor to trigger the work and/or maintain the IDLE state. You'd end up with ImapWorker ( a pool ), ImapMaintainer, and ImapRegistry. Right here I also wonder: since you are polling, do you need to keep an open connection at all, rather than letting information be pushed to you? If you plan to poll and still keep connections open, here is what the three actors would do:

ImapRegistry holds your account_info in a Hash. This would have methods on it like add, get, and remove. I recommend a Hash of @credentials so you can use the same ID between ImapMaintainer and ImapRegistry; one holds live connections in its @connections, and the other holds account_info instances in its @credentials. Both @connections and @credentials are accessed by the same ID, but one keeps a volatile connection whereas the other holds only the static data needed to recreate a connection if necessary. In this way your heavy lifters could die, be respawned, and the entire system could regenerate itself.

ImapMaintainer would hold the actual @connections, with every( interval ) { } tasks built into it, added to whenever account_info is stored in ImapRegistry. There are two tasks I see, depending on how frequently you plan to poll. One could simply touch the IMAP connection to maintain it, and the other could poll the IMAP server via ImapWorker. ImapWorker would be a pool saved in ImapMaintainer as, say, @worker. So it has @connections, @worker, #polling, and #keepalive. #polling could be an @connections.each situation, or you could have a timer per connection, added at the point a connection is created.

ImapWorker has two methods. One is #touch, which keeps a connection alive. The main one is #poll, which takes a connection you maintain and runs a polling process on it. That method returns the information, or better yet stores it as well, and then the worker returns to the @worker pool. This gives you the benefit of running the polling process in a separate thread rather than just a separate fiber, and it keeps the trickiest aspect in the most robust yet most unaware kind of actor.

Working backward: when ImapRegistry receives #add, it stores account_info and hands it to ImapMaintainer, which creates the connection and its timer(s). ImapMaintainer forgets the account_info and keeps only the connection, either with a timer per connection or with one big timer that maintains every connection through the @worker pool. ImapMaintainer inevitably hits a timer, so at the start and end of each timer-driven task it can check its connection. If a connection is gone for some reason, it can recreate it with information from @registry.get. Within its timer-prompted task it can run @worker.poll or @worker.touch.

This illustrates the above requirements, showing how the initializers put the actor system together, with minimal bodies sketched in for the methods mentioned ( the Net::IMAP calls are just one way to fill them in ):

 require 'celluloid'
 require 'net/imap' #de only because the sketch below uses Net::IMAP

 WORKERS = 9 #de arbitrarily chosen
 POLL_INTERVAL = 60 #de seconds; also arbitrary
 KEEPALIVE_INTERVAL = 240 #de seconds; keep it under your server's idle timeout

 class ImapRegistry
     include Celluloid

     def initialize
         #de supervise hands back a supervisor; take the actor itself off of it
         @maintainer = ImapMaintainer.supervise.actors.first
         @credentials = {}
     end

     def add( account_info )
         id = account_info[:id] #de assumes each account_info carries a unique ID
         @credentials[id] = account_info
         @maintainer.add( id, account_info )
     end

     def get( id )
         @credentials[id]
     end

     def remove( id )
         @credentials.delete( id )
         @maintainer.remove( id )
     end
 end

 class ImapMaintainer
     include Celluloid

     def initialize
         @worker = ImapWorker.pool size: WORKERS
         @connections = {}
         #de One big timer per concern; a timer per connection works too:
         every( POLL_INTERVAL ) { polling }
         every( KEEPALIVE_INTERVAL ) { keepalive }
     end

     def add( id, credential )
         #de Open the connection however suits you; Net::IMAP is one option
         connection = Net::IMAP.new( credential[:host] )
         connection.login( credential[:user], credential[:password] )
         @connections[id] = connection
     end

     def remove( id )
         connection = @connections.delete( id )
         connection.disconnect if connection
     end

     #de These exist if there is one big timer:
     def polling
         @connections.each_value { |connection| @worker.async.poll( connection ) }
     end

     def keepalive
         @connections.each_value { |connection| @worker.async.touch( connection ) }
     end
 end

 class ImapWorker
     include Celluloid

     def poll( connection )
         #de Whatever polling means for you; unseen messages, for example:
         connection.select( "INBOX" )
         connection.search( [ "UNSEEN" ] )
     end

     def touch( connection )
         connection.noop #de NOOP is the standard IMAP keepalive
     end
 end

 supervisor = ImapRegistry.supervise
 registry = supervisor.actors.first #de again, supervise returns the supervisor
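
From the outside the whole thing is driven through the registry ( the account_info fields below are invented for illustration ):

 registry.add(
     id: "me@example.com",
     host: "imap.example.com",
     user: "me@example.com",
     password: "secret"
 )

 #de ...and when another process takes over that mailbox:
 registry.remove( "me@example.com" )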

I love Celluloid and hope you have a lot of success with it. Please ask if you want anything clarified, but this at least is another strategy for you to consider.