Using System.cmd within a Poolboy worker (gen_server) causes silent failure


I've got a function that is spawned from a Poolboy worker.

Basic overview:

  • Phoenix Controller calls Dispatcher with data
  • Dispatcher passes data to Poolboy worker
  • Poolboy worker spawns a new process with the given data to process
  • New process uses the data to call a system command (wget in this instance)
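To make the flow above concrete, here is a minimal sketch of the same pipeline. It is an illustration under assumptions, not the poster's code: `DemoWorker` and `DemoDispatcher` are hypothetical names, a plain GenServer stands in for the Poolboy-managed worker so there are no dependencies, and `echo` stands in for `wget`. One design choice worth noting is that the dispatcher waits for a reply from the spawned process; if nothing waits, the caller (for example an ExUnit test) can finish before the command does.

```elixir
defmodule DemoWorker do
  # Hypothetical stand-in for the Poolboy worker (plain GenServer, no Poolboy).
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Asks the worker to process `data`; the result is sent to `reply_to`.
  def process(worker, data, reply_to), do: GenServer.cast(worker, {:process, data, reply_to})

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_cast({:process, data, reply_to}, state) do
    # The worker spawns a new process, which runs the external command
    # (`echo` here, standing in for `wget`) and reports back.
    spawn(fn ->
      {output, status} = System.cmd("echo", [data])
      send(reply_to, {:cmd_done, output, status})
    end)

    {:noreply, state}
  end
end

defmodule DemoDispatcher do
  # Hypothetical dispatcher: hands data to the worker, then waits up to
  # `timeout` ms so the caller does not return before the command finishes.
  def dispatch(worker, data, timeout \\ 5_000) do
    :ok = DemoWorker.process(worker, data, self())

    receive do
      {:cmd_done, output, status} -> {:ok, output, status}
    after
      timeout -> {:error, :timeout}
    end
  end
end

{:ok, worker} = DemoWorker.start_link()
IO.inspect(DemoDispatcher.dispatch(worker, "hello"))
```

Running this as a script should print `{:ok, "hello\n", 0}`.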

The problem: when I run the ExUnit test, execution gets all the way to the spawned process, and I can output the data there (using IO.inspect).

When I run System.cmd("wget", ...), I see the wget output in the terminal during the ExUnit run, so the command is actually being executed, but nothing I do after that call runs.

So in my worker if I do this:

IO.puts "hello"
System.cmd("wget", opts)
IO.puts "world"

Then I see "hello" and the output from wget, but I never see "world".

If I do something else like:

IO.puts "hello"
File.write("/tmp/temp.txt", "test")
IO.puts "world"

Then I see both "hello" and "world", and the file is written.

Is there something specific about System.cmd that I'm missing? It works fine when it's not run within the separate process, so the problem seems to be the combination of the spawned process and System.cmd.

Any ideas? Thanks!


There are 2 answers

Fred the Magic Wonder Dog

You have entered the part of Elixir marked "Here Be Dragons".

System.cmd is just a simple wrapper around Port, and Port is a largely undocumented wrapper around the Erlang open_port function.

http://www.erlang.org/doc/man/erlang.html#open_port-2

The BEAM process scheduler is built on the assumption that it can preempt ("swap") processes at very short intervals. If you only use Erlang/Elixir code, everything is engineered to cooperate with this. Any code that can block or hang in a system call, however, has to run behind a driver: a special interface into the Erlang VM that isolates the Erlang scheduler from operations that can hang on system calls.
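As a rough illustration of that isolation, the sketch below (assuming a Unix-like host with a `sleep` binary) has one process blocked in System.cmd while another process keeps running. The blocked BEAM process is simply suspended in a `receive` waiting on its port; the scheduler keeps running other processes in the meantime.

```elixir
parent = self()

# This process blocks inside System.cmd until the external `sleep 1` exits.
spawn(fn ->
  {_out, status} = System.cmd("sleep", ["1"])
  send(parent, {:blocked_done, status})
end)

# This process is scheduled normally and finishes long before `sleep` does.
spawn(fn ->
  send(parent, {:busy_done, Enum.sum(1..1_000)})
end)

first =
  receive do
    msg -> msg
  end

second =
  receive do
    msg -> msg
  end

IO.inspect({first, second})
```

The second process reports first (`{:busy_done, 500500}`), and the System.cmd process reports about a second later (`{:blocked_done, 0}`).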

The port driver is set up to handle calls to external programs.

System.cmd ultimately calls:

 do_cmd(Port.open({:spawn_executable, cmd}, opts), initial, fun)

The external program runs as a separate OS process, and the do_cmd routine runs a receive loop until it receives the exit status from the underlying Erlang port. So System.cmd will "block" that particular BEAM process until the wget Unix process exits.
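The open-then-receive pattern described above can be sketched directly with Port.open. This is a simplified illustration of the idea, not Elixir's actual System.cmd implementation; `PortRunner` is a hypothetical module name, and `echo` is assumed to be on the PATH.

```elixir
defmodule PortRunner do
  # Runs `cmd` with `args` through an Erlang port and collects its stdout:
  # open the port, then loop in `receive` until the port reports the
  # program's exit status. The calling process is blocked the whole time.
  def run(cmd, args) do
    path = System.find_executable(cmd) || raise(ArgumentError, "#{cmd} not found")

    port = Port.open({:spawn_executable, path}, [:binary, :exit_status, args: args])
    collect(port, [])
  end

  defp collect(port, acc) do
    receive do
      # A chunk of the program's stdout.
      {^port, {:data, data}} ->
        collect(port, [acc, data])

      # The external program has exited; return output and status.
      {^port, {:exit_status, status}} ->
        {IO.iodata_to_binary(acc), status}
    end
  end
end

IO.inspect(PortRunner.run("echo", ["hello", "port"]))
```

This should print `{"hello port\n", 0}`, the same shape of result System.cmd returns.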

However, the rest of the BEAM processes go on their merry way. I'm not familiar enough with Poolboy to know whether there is some kind of timeout monitor or heartbeat on your workers. If there is, and the wget command exceeds that timeout, the worker process may exit before the wget command completes.
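As one hedged example of the kind of timeout meant here (not a claim about how Poolboy is configured in the question), GenServer.call gives up after 5,000 ms by default. In the sketch below, `SlowWorker` is a hypothetical worker whose call blocks in System.cmd for about 2 seconds (`sleep` stands in for a long wget), and the caller only waits 500 ms. The caller exits with a timeout while the worker is still blocked.

```elixir
defmodule SlowWorker do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:ok, nil}

  # Blocks this worker for about 2 seconds, the way a long wget would.
  @impl true
  def handle_call(:run, _from, state) do
    {_out, status} = System.cmd("sleep", ["2"])
    {:reply, status, state}
  end
end

{:ok, pid} = SlowWorker.start_link()

result =
  try do
    # The caller waits only 500 ms here (GenServer.call defaults to 5_000 ms).
    GenServer.call(pid, :run, 500)
  catch
    # Without this catch, the caller process would exit with the timeout.
    :exit, {:timeout, _} -> :caller_timed_out
  end

IO.inspect(result)
```

This prints `:caller_timed_out`; the worker keeps running `sleep` regardless.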

System.cmd isn't really set up to deal with all the issues around a command that could take a long time. I'd suggest looking into the Porcelain library as a nice wrapper around the rather complex topic of Erlang ports.
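If you want to stay with the standard library for now, one common pattern (swapped in here; it is not Porcelain) is to run System.cmd inside a Task and stop waiting after a deadline. `CmdWithTimeout` is a hypothetical name, and `echo`/`sleep` stand in for real commands.

```elixir
defmodule CmdWithTimeout do
  # Runs System.cmd in a Task and gives up after `timeout` ms.
  # Returns {:ok, {output, status}}, {:error, :timeout}, or {:error, reason}.
  def run(cmd, args, timeout) do
    task = Task.async(fn -> System.cmd(cmd, args) end)

    # Wait up to `timeout` ms; if there is no result yet, kill the Task.
    case Task.yield(task, timeout) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {:ok, result}
      {:exit, reason} -> {:error, reason}
      nil -> {:error, :timeout}
    end
  end
end

IO.inspect(CmdWithTimeout.run("echo", ["fast"], 1_000))
IO.inspect(CmdWithTimeout.run("sleep", ["2"], 200))
```

The first call should return `{:ok, {"fast\n", 0}}` and the second `{:error, :timeout}`. Killing the Task closes its port, but an external program that never reads its stdin (like `sleep`) may keep running as an OS process until it finishes on its own, which is part of why port management is the "rather complex topic" mentioned above.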

https://github.com/alco/porcelain

Or, since you are only doing a simple wget, an Elixir or Erlang HTTP client library would likely work much better within the BEAM.
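As one possible example, Erlang ships an HTTP client, `:httpc`, as part of the `:inets` application. The sketch below is a minimal download helper under stated assumptions: `HttpDownload.fetch/2` is a hypothetical name, it handles plain `http://` URLs only (https would also need `:ssl` and suitable ssl options, not shown), and in a Mix project `:inets` may need to be listed in `extra_applications`.

```elixir
defmodule HttpDownload do
  # Downloads `url` to the file `dest` using :httpc instead of shelling out
  # to wget. Returns :ok on success, or {:error, reason}.
  def fetch(url, dest) do
    {:ok, _} = Application.ensure_all_started(:inets)

    request = {String.to_charlist(url), []}

    case :httpc.request(:get, request, [], body_format: :binary) do
      # 200 OK: write the body to disk (File.write returns :ok or {:error, _}).
      {:ok, {{_version, 200, _reason}, _headers, body}} ->
        File.write(dest, body)

      # Any other HTTP status is reported as an error.
      {:ok, {{_version, status, _reason}, _headers, _body}} ->
        {:error, {:http_status, status}}

      # Connection failures and similar errors.
      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

Usage would look like `HttpDownload.fetch("http://example.com/", "/tmp/example.html")`. Because the request runs inside the BEAM, timeouts and failures come back as ordinary return values rather than as an external process you have to manage.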

Sean Tan

I had the same problem and was able to sidestep it by running wget with the -q option:

System.cmd("wget", ["-q", url])

This seems to keep the process from getting stuck, by quieting wget's output.
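Related to this: wget writes its progress messages to stderr, and System.cmd captures only stdout by default, so stderr goes straight to the terminal (which matches seeing wget's output during the test run). The sketch below shows the `stderr_to_stdout: true` option, which routes stderr into the captured output instead. A small `sh` script stands in for wget; whether this is the reason `-q` helps is not established here.

```elixir
# A command that writes one line to stdout and one to stderr.
script = "echo to-stdout; echo to-stderr 1>&2"

# Default: only stdout is captured; "to-stderr" is printed to the terminal.
{default_out, 0} = System.cmd("sh", ["-c", script])

# With stderr_to_stdout: true, both streams end up in the returned output.
{merged_out, 0} = System.cmd("sh", ["-c", script], stderr_to_stdout: true)

IO.inspect(default_out)
IO.inspect(merged_out)
```

`default_out` should be `"to-stdout\n"`, while `merged_out` also contains the `"to-stderr"` line.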