lists:map with side-effects in Erlang

I have a list of batches (sublists) of ids, and I want to iterate over it and spawn a worker process for each id in each batch. Each worker queries some service, gets the result, and sends it back to the caller. In other words, I want to map a list of ids to a list of the data fetched for those ids. I managed to achieve this, but in what I believe is an unidiomatic way:

Self = self(),
lists:map(fun(Ids) ->
    Pids = [spawn_link(fun() ->
        Result = [...], % Here goes a side-effect operation (http request)
        Self ! {received_data, process(Result)}
    end) || Id <- Ids],
    [receive {received_data, Data} -> Data end || _Pid <- Pids]
end, JobChunks).

As you can see, I am misusing the map function here, since it is designed to be side-effect free. But I don't see another option. There is foreach(), but it is used only to run side effects and just returns ok, whereas in my case I also want to preserve the shape of the list. Haskell has a handy type class, Traversable, whose traverse function does exactly this: it runs fmap and at the same time lets you perform an action (effect) on each item. Is there something similar in Erlang (like smap, perhaps)?

There are 2 answers

AudioBubble On BEST ANSWER

Unlike Haskell, Erlang is not a pure functional programming language, so it puts no restrictions on whether a function may have side effects. In Haskell, even the I/O subsystem cannot break purity, which is why there is a type-level distinction between Functor and Traversable (fmap and traverse): traverse can run an effect on each element of the container, while fmap cannot. Erlang has no such distinction. Given a function execute(Container), you cannot tell whether it will run effects just by looking at its signature. That is why having both map and smap (or traverse, or whatever you call it) in Erlang would not make sense or add any value.

It is true, though, that using lists:map for this sort of operation breaks the contract of map, which is supposed to be pure. In this situation I would recommend a list comprehension, which in my opinion is more idiomatic:

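[begin
    Self = self(),
    Pids = [spawn_link(fun() ->
        %% Side-effect operation which the worker performs.
        %% fetch/1 is only a placeholder name for that operation.
        Self ! {received_data, fetch(Id)}
    end) || Id <- Ids],
    [receive {received_data, Data} -> Data end || _Pid <- Pids]
end || Ids <- JobChunks].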

Again, in my view, side effects are the major difference between how list comprehensions and lists:map() should be used. When list comprehensions are used in the way shown above, I usually think of them as Haskell's monad comprehensions.
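
To make the comparison concrete, here is a minimal sketch. The module name effect_shapes and the function effect/1 are made up for illustration; effect/1 stands in for any side-effecting call. It runs the same effectful function through lists:map/2, a list comprehension, and lists:foreach/2. Nothing in the language stops the first two from running effects, and both keep the shape of the input, while foreach discards the results.

```erlang
-module(effect_shapes).
-export([demo/0]).

%% Hypothetical stand-in for a side-effecting operation:
%% it prints the id (the effect) and returns a derived value.
effect(Id) ->
    io:format("fetching ~p~n", [Id]),
    Id * 10.

demo() ->
    Ids = [1, 2, 3],
    %% Both of these run the effect and return a list of results.
    ViaMap = lists:map(fun effect/1, Ids),
    ViaComprehension = [effect(Id) || Id <- Ids],
    %% foreach runs the effect too, but only returns the atom ok.
    ViaForeach = lists:foreach(fun effect/1, Ids),
    {ViaMap, ViaComprehension, ViaForeach}.
```

Calling effect_shapes:demo() prints one line per id for each of the three traversals and returns {[10,20,30],[10,20,30],ok}.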

Brujo Benavides On

I like @Oleksandr's answer, but using a begin..end block within a list comprehension feels a bit dirty. I would use functions for that.

It’s also important to note that the second part of his answer is not guaranteed to respect the order of the original list: the result will have the same number of elements, but they will appear in the order in which the messages arrive. That may be fine for you, but if you want to be able to match inputs (Ids) with outputs (Results), you have to use selective receives, as I’ll show you below.

So, this is how I would implement it without OTP (since you’re not using OTP either):

your_function(JobChunks) ->
    [process_chunk(Ids) || Ids <- JobChunks].

process_chunk(Ids) ->
    Pids = [spawn_side_effect_fun(Id) || Id <- Ids],
    [get_result_for(Pid) || Pid <- Pids].

spawn_side_effect_fun(Id) ->
    Self = self(),
    spawn_link(fun() ->
        Self ! {received_data, self(), your_side_effect_operation(Id)}
    end).

get_result_for(Pid) ->
    receive
        %% Here we're pattern-matching on Pid
        %% so that we get the result for this particular Pid
        %% therefore the order is preserved in the final list.
        {received_data, Pid, Data} -> Data
    end.
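
The effect of matching on Pid can be seen in isolation with a small sketch. The module name selective_receive_demo and the slow/fast tags are made up for illustration. Two workers are spawned, and the one spawned first answers last. The results are collected once with a receive that ignores the sender and once with a receive that matches on the worker's pid.

```erlang
-module(selective_receive_demo).
-export([demo/0]).

demo() ->
    %% Take whatever message is first in the mailbox: arrival order.
    Unordered = collect(fun(_Pid) ->
        receive {received_data, _, Data} -> Data end
    end),
    %% Wait for the message from one specific worker: input order.
    Ordered = collect(fun(Pid) ->
        receive {received_data, Pid, Data} -> Data end
    end),
    {Unordered, Ordered}.

%% Spawns a slow worker first and a fast worker second, then
%% collects one result per worker using the given receive fun.
collect(Receive) ->
    Self = self(),
    Delays = [{slow, 50}, {fast, 0}],
    Pids = [spawn_link(fun() ->
                timer:sleep(Ms),
                Self ! {received_data, self(), Tag}
            end) || {Tag, Ms} <- Delays],
    [Receive(Pid) || Pid <- Pids].
```

The second element of the returned tuple is always [slow, fast], matching the spawn order. The first element depends on timing and will normally be [fast, slow], because the fast worker's message arrives first.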

It’s also important to notice that you’re not handling any errors here. The workers are started with spawn_link and the caller is not trapping exits, so an error in any spawned process will also kill the main one.
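
One possible way to deal with that, sketched under assumptions rather than taken from the question: swap spawn_link for spawn_monitor, so a crashing worker produces a 'DOWN' message instead of killing the caller. This is an alternative to trapping exits, not the same technique. The module name monitored_chunks, the run/2 signature, and the {ok, _} / {error, _} result tuples are all made up for illustration. Fun stands in for the side-effecting operation.

```erlang
-module(monitored_chunks).
-export([run/2]).

%% Applies Fun to every id in every batch, one process per id.
%% Each result is {ok, Value} or {error, Reason}, in input order.
run(Fun, JobChunks) ->
    [process_chunk(Fun, Ids) || Ids <- JobChunks].

process_chunk(Fun, Ids) ->
    Workers = [spawn_worker(Fun, Id) || Id <- Ids],
    [await(Worker) || Worker <- Workers].

spawn_worker(Fun, Id) ->
    Self = self(),
    %% spawn_monitor/1 returns {Pid, MonitorRef}; no link is created.
    spawn_monitor(fun() -> Self ! {received_data, self(), Fun(Id)} end).

await({Pid, Ref}) ->
    receive
        {received_data, Pid, Data} ->
            %% The worker answered; drop the monitor and any
            %% 'DOWN' message it may already have produced.
            erlang:demonitor(Ref, [flush]),
            {ok, Data};
        {'DOWN', Ref, process, Pid, Reason} ->
            %% The worker died before sending a result.
            {error, Reason}
    end.
```

For example, monitored_chunks:run(fun(X) -> 10 div X end, [[1, 0], [5]]) returns a list of the same shape as the input, where the entry for 0 is {error, {badarith, Stacktrace}} and the other entries are {ok, 10} and {ok, 2}. The caller itself keeps running.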