I am trying to implement a socket server in my supervision tree. I have added the following to the main supervisor:
supervisor(AcceptorSup, [Application.get_env(:appname, :port)])
supervisor(Task.Supervisor, [[name: ClientSupervisor]])
And here is the AcceptorSup module:
defmodule AcceptorSup do
  use Supervisor
  require Logger

  def start_link(port) do
    Supervisor.start_link __MODULE__, [port: port], name: __MODULE__
  end

  def init(args \\ []) do
    port = Keyword.get args, :port, 8000
    {:ok, socket} = :gen_tcp.listen port, active: false, reuseaddr: true, packet: :raw
    Logger.info "Server started on port #{port}"

    children = [
      worker(Acceptor, [socket], function: :start)
    ]

    num_childs = Application.get_env :appname, :acceptors_num, 50
    for _ <- 1..num_childs do
      Task.start fn ->
        Supervisor.start_child AcceptorSup, []
      end
    end

    supervise(children, strategy: :simple_one_for_one, max_restarts: 1000, max_seconds: 10)
  end
end
And here is the basic code of Acceptor:
defmodule Acceptor do
  require Logger

  def start(socket) when is_port(socket) do
    Task.start_link fn -> serve_client(socket) end
  end

  defp serve_client(socket) when is_port(socket) do
    {:ok, client} = :gen_tcp.accept socket
    Logger.info "A client connected #{address client}"
    {:ok, pid} = Task.Supervisor.start_child ClientSupervisor, fn -> serve(client) end
    :ok = :gen_tcp.controlling_process client, pid
    Supervisor.start_child AcceptorSup, []
  end
end
So, I start a Task.Supervisor to handle clients, and that works fine. I also start a supervisor of type simple_one_for_one to handle listeners. Each child waits for a connection and, once a client connects, spawns a task for that client under the task supervisor and starts another child to take its place. The problem is that if the rate of clients connecting and disconnecting is high enough, the supervisor crashes, because it exceeds the limits set by max_restarts and max_seconds. I can raise the threshold by increasing those values, but that doesn't seem right to me. I am looking for a way to handle clients connecting and disconnecting without hitting these limits.
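The restart accounting described above can be reproduced in isolation. The following is only a minimal sketch, not the code from this question: the module name RestartLimitDemo, the child id :short_lived, and the limits of 3 restarts in 5 seconds are all made up for illustration, and it uses a plain :one_for_one supervisor with a map child spec instead of the simple_one_for_one setup shown earlier. It assumes a child that returns normally right after starting, which is what an acceptor task does once it has handed off its client.

```elixir
defmodule RestartLimitDemo do
  # Hypothetical reproduction: a :permanent child that exits normally is
  # restarted by its supervisor, and every such restart counts towards
  # max_restarts within max_seconds.
  def run do
    # Trap exits so the linked supervisor's shutdown arrives as a message
    # instead of taking the calling process down with it.
    Process.flag(:trap_exit, true)

    child = %{
      id: :short_lived,
      # The task finishes immediately, much like an acceptor that has
      # just handed a client over to another process.
      start: {Task, :start_link, [fn -> :ok end]},
      restart: :permanent
    }

    {:ok, sup} =
      Supervisor.start_link([child],
        strategy: :one_for_one,
        max_restarts: 3,
        max_seconds: 5
      )

    receive do
      {:EXIT, ^sup, reason} -> {:supervisor_exited, reason}
    after
      1_000 -> :supervisor_alive
    end
  end
end

IO.inspect(RestartLimitDemo.run())
```

In this sketch nothing crashes, yet the supervisor still gives up, because a :permanent child is restarted even after a normal exit. Children declared with restart: :temporary are never restarted, and :transient ones only after an abnormal exit, so whether either fits the acceptor design above is something I have not verified.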
How should one implement this while still respecting the supervision tree? I don't want to resort to custom process management, because I'm almost sure I'm missing something here. After all, Erlang/OTP was designed to handle exactly this kind of problem, right?