Handling exit signals in hand-rolled supervisors in Erlang?

I am trying to write a supervisor for a process that I have made. I have investigated this for some time to no avail so hopefully someone can help.

I have certain restrictions on the interface I have to use, as this is for an assignment. I am aware of the examples using lists and of the more elaborate OTP examples on the Erlang site, but these are not suitable. I have provided an example abstracted from my application to demonstrate the issue.

I am trying to restart an arbitrary worker when it exits for a non-normal reason. The worker process is simply:

-module(my_mod).

-export([start/1, init/1]).

start(Pid)->
  {ok, spawn_link(?MODULE, init, [Pid])}.

init(Pid) ->
  register(Pid, self()),
  io:format("Started ~p~n",[Pid]),
  loop().

loop() ->
  receive stop -> exit(byebye) end.

In the supervisor I am using an ETS table to keep track of the workers and restart them. The supervisor is as follows:

-module(my_sup).

-export([start_link/0, init/1, add_item/1, remove_item/1]).


start_link() ->
  spawn(?MODULE, init, [self()]).

init(Pid) ->
  process_flag(trap_exit, true),
  register(?MODULE, Pid),
  ets:new(?MODULE, [set, named_table, public]),
  loop().

add_item(Pid) ->
  ets:insert(?MODULE, {Pid}),
  my_mod:start(Pid),
  {ok, Pid}.

remove_item(Pid) ->
  ets:delete(?MODULE, [Pid]).

loop() ->
  io:format("Looping ~n"),
  receive
    {'EXIT', Pid, _Reason} ->
      remove_item(Pid),
      add_item(Pid)
  end.

So I believe I am doing some things right here: my_mod is linked back to the supervisor so that it is notified of the exit signal, and the supervisor has trap_exit set so that it has the opportunity to handle the signal. However, I am finding that I just get a ** exception exit: byebye thrown, and I am not sure why.

My test case is the following:

1> c(my_sup), c(my_mod), my_sup:start_link().
Looping 
<0.42.0>
2> my_sup:add_item(a). 
Started a
{ok,a}
3> a ! stop .
** exception exit: byebye

Can anyone point me in the right direction?

There are 2 answers

1
Steve Vinoski

In your shell your add_item/1 call occurs within the shell process, not within the supervisor process, which means the supervisor is not linked to the newly-added process, but rather your shell is. In add_item/1 you should instead send a message into the supervisor process to tell it to launch a new worker, and change your supervisor loop to handle that new message and launch the worker from there.
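The point about which process ends up linked can be checked directly. The following is a minimal sketch, not part of the original answer; the module name link_demo is hypothetical. It spawns a child with spawn_link and then inspects the child's links to confirm that the caller, whichever process that happens to be, is the one linked.

```erlang
-module(link_demo).
-export([run/0]).

%% spawn_link links the new process to the process that calls it.
%% Called from the shell, that is the shell process, not a supervisor.
run() ->
    Caller = self(),
    Child = spawn_link(fun() -> receive stop -> ok end end),
    %% The child is blocked in receive, so it is alive when inspected.
    {links, Links} = process_info(Child, links),
    IsLinkedToCaller = lists:member(Caller, Links),
    %% Let the child finish normally; a normal exit does not affect the caller.
    Child ! stop,
    IsLinkedToCaller.
```

Calling link_demo:run() from the shell should return true, which is the same situation as calling add_item/1 from the shell in the question.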

0
Opentuned

OK, so as Steve V pointed out, my issue was that I was actually linking to the shell process rather than the supervisor when calling add_item/1. I have found the following solution. There are still some issues, in that things blow up if you try to add a name that is already registered, but it is an adequate solution for the initial question. my_mod was changed to the following:

-module(my_mod).

-export([start/1, init/1]).

start(Name)->
  {ok, spawn_link(?MODULE, init, [Name])}.

init(Name) ->
  register(Name, self()),
  io:format("Started ~p~n",[Name]),
  loop().

loop() ->
  receive 
    exit -> exit(kill);
    stop -> exit(graceful)
  end. 

and the supervisor amended to:

-module(my_sup).

-export([start_link/0, init/0, add_item/1, remove_item/1]).

start_link() -> register(?MODULE, spawn(?MODULE, init, [])).

init() ->
  %% Trap exits so worker deaths arrive as {'EXIT', Pid, Reason} messages.
  process_flag(trap_exit, true),
  ets:new(?MODULE, [set, named_table, public]),
  loop().

%% The interface functions only send messages; the work is done in loop/0,
%% so spawn_link runs inside the supervisor process.
add_item(Name) -> ?MODULE ! {add_item, Name}.

update_item(Pid, Name) -> ?MODULE ! {update_item, Pid, Name}.

remove_item(Name) -> ?MODULE ! {remove_item, Name}.

loop() ->
  io:format("Looping ~n"),
  receive
    {'EXIT', Pid, graceful} ->
      io:format("~p exiting gracefully. ~n", [Pid]),
      loop();
    {'EXIT', Pid, Reason} ->
      io:format("ERROR: ~p, ~p ~n", [Pid, Reason]),
      %% Find the name stored for the dead worker, then ask for a restart.
      [[Name, _Id]] = ets:select(?MODULE, [{{'$1', '$2'}, [{'==', '$2', pid_to_list(Pid)}], [['$1', '$2']]}]),
      update_item(Pid, Name),
      loop();
    {add_item, Name} ->
      {ok, Pid} = my_mod:start(Name),
      ets:insert(?MODULE, {Name, pid_to_list(Pid)}),
      loop();
    {update_item, _OldPid, Name} ->
      {ok, NewPid} = my_mod:start(Name),
      ets:update_element(?MODULE, Name, {2, pid_to_list(NewPid)}),
      loop();
    {remove_item, Name} ->
      ets:delete(?MODULE, Name),
      Name ! stop,
      loop()
  end.

Note how I am now calling the my_mod functions from the supervisor, so when spawn_link is called in my_mod it links back to the supervisor and not the shell. I am also able to keep the prescribed supervisor interface, add_item/1 and remove_item/1, by passing the commands off to the receive loop, where I can then perform other actions that don't break the arity of the interface functions. I tested with the following:

1> c(my_sup), c(my_mod), my_sup:start_link(), my_sup:add_item(a), my_sup:add_item(b), observer:start().
Looping 
Started a
Looping 
Started b
ok
2> my_sup:remove_item(a).
Looping 
<0.43.0> exiting gracefully. 
{remove_item,a}
Looping 
3> b ! exit .
ERROR: <0.44.0>, kill 
Looping 
exit
Looping 
Started b
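The remaining problem mentioned above, adding a name that is already registered, could be guarded against before the worker is started. The following is only a sketch under assumptions: the module add_guard_demo and the function safe_start/2 are hypothetical and not part of the original solution. It checks whereis/1 first and only calls the supplied start function when the name is free.

```erlang
-module(add_guard_demo).
-export([safe_start/2]).

%% Hypothetical helper: run StartFun(Name) only if Name is not registered.
%% StartFun stands in for something like fun my_mod:start/1.
safe_start(Name, StartFun) when is_atom(Name), is_function(StartFun, 1) ->
    case whereis(Name) of
        undefined ->
            StartFun(Name);
        Existing ->
            %% Name is taken; report it instead of spawning a worker
            %% whose register/2 call would fail.
            {error, {already_started, Existing}}
    end.
```

In the supervisor this check would belong in the {add_item, Name} clause, so that it runs in the supervisor process. It is not fully race-free, since another process could register the name between the check and the start.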

Oh, and I also spent some time going in circles as to why I could not call

exit(whereis(Pid), normal).

It turns out this is explained by:

"...a call to exit(Pid, normal). This command doesn't do anything useful, because a process can not be remotely killed with the reason normal as an argument."

http://learnyousomeerlang.com/errors-and-processes
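The behaviour described in that quote can be observed with a short experiment. This is a minimal sketch with a hypothetical module name, exit_normal_demo. It sends exit(Pid, normal) to a process that is not trapping exits, checks that the process is still alive, and then sends a different reason to show that this one does terminate it.

```erlang
-module(exit_normal_demo).
-export([run/0]).

run() ->
    %% A plain process that does not trap exits and waits for a message.
    Pid = spawn(fun() -> receive stop -> ok end end),
    Ref = erlang:monitor(process, Pid),
    %% An exit signal with reason normal is ignored by the receiver.
    exit(Pid, normal),
    timer:sleep(50),
    AliveAfterNormal = is_process_alive(Pid),
    %% Any other reason terminates a process that is not trapping exits.
    exit(Pid, shutdown),
    DownReason =
        receive
            {'DOWN', Ref, process, Pid, Reason} -> Reason
        after 1000 ->
            timeout
        end,
    {AliveAfterNormal, DownReason}.
```

Calling exit_normal_demo:run() should return {true, shutdown}: the process survives the normal signal and dies from the second one.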

Hope this helps others...