Background

  • I have an application that is auto-started when a node starts (using .rel, .boot, etc.).
  • I want the application to fail over to alternate nodes if the first node goes down.
  • I use Erlang's Distributed Application capability to handle fail-over and take-over.
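
For readers unfamiliar with that capability, it is driven by kernel configuration parameters. The fragment below is only an illustrative sketch, not my actual configuration: the node names, the application name my_app and the timeout values are all hypothetical.

```erlang
%% sys.config fragment for node a@host1 (all node names are hypothetical).
[{kernel,
  [%% my_app runs on a@host1 when it is up; otherwise on b@host2 or
   %% c@host3 (equal priority). A fail-over waits 5000 ms after the
   %% running node goes down before restarting the application.
   {distributed, [{my_app, 5000, ['a@host1', {'b@host2', 'c@host3'}]}]},
   %% Other nodes this node waits for at start-up.
   {sync_nodes_mandatory, ['b@host2', 'c@host3']},
   %% How long to wait for them, in milliseconds.
   {sync_nodes_timeout, 30000}
  ]}
].
```

Each node would carry the same distributed entry, with the sync_nodes_mandatory list naming the other nodes.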

Problem

The problem lies in the distributed application negotiation: when the nodes handshake to determine which node will stay up and which will be quiesced, the application is started on all of the nodes. I need the application NOT to be up on multiple nodes, if at all possible.

Question

  • Is there a way to have nodes automatically start my application except when they are involved in the distributed application start-up negotiation?
  • Alternatively, how can I have my application start unattended and fail over without requiring it to be up (even briefly) on multiple nodes?

There is 1 answer

Roberto Aloi (best answer)

Unfortunately, Erlang's take-over and fail-over capabilities are currently pretty limited: for them to work, your application has to be started on all the nodes.

The only idea that comes to mind is a bit crazy and involves one more level of indirection, but it might actually work.

You could write a fake, lightweight, wrapper application, which you then start on all the nodes. This application uses the standard Erlang Distribution capabilities. You then implement your takeover/failover strategies by simply starting your original application:

-module(wrapper).
-behaviour(application).

[...]

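%% Invoked on the node that takes the wrapper over from _Node.
%% Caveat: start/2 is expected to return {ok, Pid}, whereas
%% application:start/1 returns ok, so a real callback would also
%% need to start and return a top-level supervisor.
start({takeover, _Node}, _Args) ->
  application:start(original_app).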

[...]
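
Spelled out a little further, a minimal version of such a wrapper might look like the sketch below. I have not tested it against a real cluster, and several details are assumptions of mine rather than part of the idea itself: the module doubling as its own empty supervisor, the handling of already_started, and the error tuple. It accepts any start type because, as far as I can tell from the application documentation, a plain fail-over reports normal rather than {failover, Node} unless start_phases is defined.

```erlang
-module(wrapper).
-behaviour(application).
-behaviour(supervisor).

-export([start/2, stop/1]).   % application callbacks
-export([init/1]).            % supervisor callback

%% Called only on the node where the distributed wrapper is started.
%% The start type may be normal or {takeover, Node}; either way this
%% node is now responsible for original_app, so the type is ignored.
start(_StartType, _Args) ->
    case application:start(original_app) of
        ok ->
            supervisor:start_link(?MODULE, []);
        {error, {already_started, original_app}} ->
            supervisor:start_link(?MODULE, []);
        {error, Reason} ->
            {error, {original_app_not_started, Reason}}
    end.

%% Deliberately does not stop original_app; see the note below.
stop(_State) ->
    ok.

%% Empty supervisor: exists only so that start/2 can return {ok, Pid}.
init([]) ->
    {ok, {{one_for_one, 1, 5}, []}}.
```

Stopping original_app when the wrapper leaves a node is left out on purpose. As far as I know, calling application:stop/1 directly from stop/1 can block on the application controller, so that step needs separate care, and until it is handled a take-over would leave original_app running on the old node.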

Also, bear in mind that when you call application:start(my_app) for a distributed application on all of your nodes, the application is not actually started on all of them. You can verify that by calling application:which_applications() on each node: you will see that the application is running on a single node only.
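
If you would rather check this from one shell than node by node, a small helper along these lines could do it. The module and function names are made up for illustration; the code relies only on node/0, nodes/0, rpc:call/4 and application:which_applications/0.

```erlang
-module(where_running).
-export([nodes_running/1]).

%% Return the nodes (this one plus all connected ones) on which App
%% currently appears in application:which_applications/0.
nodes_running(App) ->
    [N || N <- [node() | nodes()], is_running(N, App)].

%% True if App is listed as running on Node; unreachable nodes count
%% as "not running".
is_running(Node, App) ->
    case rpc:call(Node, application, which_applications, []) of
        Apps when is_list(Apps) -> lists:keymember(App, 1, Apps);
        {badrpc, _Reason} -> false
    end.
```

Calling where_running:nodes_running(my_app) from any connected node should then return a list containing a single node if the distributed application controller is doing its job.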

Finally, may I ask why you cannot start the application on more than one node?