Why tuples are not enumerable in Elixir?

3.1k views Asked by At

I need an efficient structure for array of thousands of elements of the same type with ability to do random access.

While list is most efficient on iteration and prepending, it is too slow on random access, so it does not fit my needs.

Map works better. Howerver it causes some overheads because it is intended for key-value pairs where key may be anything, while I need an array with indexes from 0 to N. As a result my app worked too slow with maps. I think this is not acceptable overhead for such a simple task like handling ordered lists with random access.

I've found that tuple is most efficient structure in Elixir for my task. When comparing to map on my machine it is faster

  1. on iteration - 1.02x for 1_000, 1.13x for 1_000_000 elements
  2. on random access - 1.68x for 1_000, 2.48x for 1_000_000
  3. and on copying - 2.82x for 1_000, 6.37x for 1_000_000.

As a result, my code on tuples is 5x faster than the same code on maps. It probably does not need explanation why tuple is more efficient than map. The goal is achieved, but everybody tells "don't use tuples for a list of similar elements", and nobody can explain this rule (example of such cases https://stackoverflow.com/a/31193180/5796559).

Btw, there are tuples in Python. They are also immutable, but still iterable.

So,

1. Why tuples are not enumerable in Elixir? Is there any technical or logical limitation?

2. And why should not I use them as lists of similar elements? Is there any downsides?

Please note: the questions is "why", not "how". The explanation above is just an example where tuples works better than lists and maps.

2

There are 2 answers

0
raacer On BEST ANSWER

1. The reason not to implement Enumerable for Tuple

From the retired Elixir talk mailing list:

If there is a protocol implementation for tuple it would conflict with all records. Given that custom instances for a protocol virtually always are defined for records adding a tuple would make the whole Enumerable protocol rather useless.

-- Peter Minten

I wanted tuples to be enumerable at first, and even eventually implemented Enumerable on them, which did not work out.

-- Chris Keele

How does this break the protocol? I'll try to put things together and explain the problem from the technical point of view.

Tuples. What's interesting about tuples is that they are mostly used for a kind of duck typing using pattern matching. You are not required to create new module for new struct every time you want some new simple type. Instead of this you create a tuple - a kind of object of virtual type. Atoms are often used as first elements as type names, for example {:ok, result} and {:error, description}. This is how tuples are used almost anywhere in Elixir, because this is their purpose by design. They are also used as a basis for "records" that comes from Erlang. Elixir has structs for this purpose, but it also provides module Record for compatibility with Erlang. So in most cases tuples represent single structures of heterogenous data which are not meant to be enumerated. Tuples should be considered like instances of various virtual types. There is even @type directive that allows to define custom types based on tuples. But remember they are virtual, and is_tuple/1 still returns true for all those tuples.

Protocols. On the other hand, protocols in Elixir is a kind of type classes which provide ad hoc polymorphism. For those who come from OOP this is something similar to superclasses and multiple inheritance. One important thing that protocol is doing for you is automatic type checking. When you pass some data to a protocol function, it checks that the data belongs to this class, i.e. that protocol is implemented for this data type. If not then you'll get error like this:

** (Protocol.UndefinedError) protocol Enumerable not implemented for {}

This way Elixir saves your code from stupid mistakes and complex errors unless you make wrong architectural decisions

Altogether. Now imagine we implement Enumerable for Tuple. What it does is making all tuples enumerable while 99.9% of tuples in Elixir are not intended to be so. All the checks are broken. The tragedy is the same as if all animals in the world begin quacking. If a tuple is passed to Enum or Stream module accidentally then you will not see useful error message. Instead of this your code will produce unexpected results, unpredictable behaviour and possibly data corruption.

2. The reason not to use tuples as collections

Good robust Elixir code should contain typespecs that help developers to understand the code, and give Dialyzer ability to check the code for you. Imagine you want a collection of similar elements. The typespec for lists and maps may look like this:

@type list_of_type :: [type]
@type map_of_type :: %{optional(key_type) => value_type}

But you can't write same typespec for tuple, because {type} means "a tuple of single element of type type". You can write typespec for a tuple of predefined length like {type, type, type} or for a tuple of any elements like tuple(), but there is no way to write a typespec for a tuple of similar elements just by design. So choosing tuples to store your collection of elemenets means you lose such a nice ability to make your code robust.

Conclusion

The rule not to use tuples as lists of similar elements is a rule of thumb that explains how to choose right type in Elixir in most cases. Violation of this rule may be considered as possible signal of bad design choice. When people say "tuples are not intended for collections by design" this means not just "you do something unusual", but "you can break the Elixir features by doing wrong design in your application".

If you really want to use tuple as a collection for some reason and you are sure you know what you do, then it is a good idea to wrap it into some struct. You can implement Enumerable protocol for your struct without risk to break all things around tuples. It worth to note that Erlang uses tuples as collections for internal representation of array, gb_trees, gb_sets, etc.

iex(1)> :array.from_list ['a', 'b', 'c']
{:array, 3, 10, :undefined,
 {'a', 'b', 'c', :undefined, :undefined, :undefined, :undefined, :undefined,
  :undefined, :undefined}}

Not sure if there is any other technical reason not to use tuples as collections. If somebody can provide another good explanation for the conflict between the Record and the Enumerable protocol, he is welcome to improve this answer.

3
Aleksei Matiushkin On

As you are sure you need to use tuples there, you might achieve the requested functionality at a cost of compilation time. The solution below will be compiling for long (consider ≈100s for @max_items 1000.) Once compiled the execution time would gladden you. The same approach is used in Elixir core to build up-to-date UTF-8 string matchers.

defmodule Tuple.Enumerable do
  defimpl Enumerable, for: Tuple do
    @max_items 1000

    def count(tuple), do: tuple_size(tuple)

    def member?(_, _), do: false # for the sake of compiling time

    def reduce(tuple, acc, fun), do: do_reduce(tuple, acc, fun)

    defp do_reduce(_,       {:halt, acc}, _fun),   do: {:halted, acc}
    defp do_reduce(tuple,   {:suspend, acc}, fun)  do
      {:suspended, acc, &do_reduce(tuple, &1, fun)}
    end
    defp do_reduce({},      {:cont, acc}, _fun),   do: {:done, acc}
    defp do_reduce({value}, {:cont, acc}, fun)     do
      do_reduce({}, fun.(value, acc), fun)
    end

    Enum.each(1..@max_items-1, fn tot ->
      tail = Enum.join(Enum.map(1..tot, & "e_★_#{&1}"), ",")
      match = Enum.join(["value"] ++ [tail], ",")
      Code.eval_string(
        "defp do_reduce({#{match}}, {:cont, acc}, fun) do
           do_reduce({#{tail}}, fun.(value, acc), fun)
         end", [], __ENV__
      )
    end)

    defp do_reduce(huge,    {:cont, _}, _) do
      raise Protocol.UndefinedError, 
            description: "too huge #{tuple_size(huge)} > #{@max_items}",
            protocol: Enumerable,
            value: Tuple
    end
  end
end

Enum.each({:a, :b, :c}, fn e ->  IO.puts "Iterating: #{e}" end)
#⇒ Iterating: a
#  Iterating: b
#  Iterating: c

The code above explicitly avoids the implementation of member?, since it would take even more time to compile while you have requested the iteration only.