I need an efficient structure for array of thousands of elements of the same type with ability to do random access.
While list is most efficient on iteration and prepending, it is too slow on random access, so it does not fit my needs.
Map works better. Howerver it causes some overheads because it is intended for key-value pairs where key may be anything, while I need an array with indexes from 0 to N. As a result my app worked too slow with maps. I think this is not acceptable overhead for such a simple task like handling ordered lists with random access.
I've found that tuple is most efficient structure in Elixir for my task. When comparing to map on my machine it is faster
- on iteration - 1.02x for 1_000, 1.13x for 1_000_000 elements
- on random access - 1.68x for 1_000, 2.48x for 1_000_000
- and on copying - 2.82x for 1_000, 6.37x for 1_000_000.
As a result, my code on tuples is 5x faster than the same code on maps. It probably does not need explanation why tuple is more efficient than map. The goal is achieved, but everybody tells "don't use tuples for a list of similar elements", and nobody can explain this rule (example of such cases https://stackoverflow.com/a/31193180/5796559).
Btw, there are tuples in Python. They are also immutable, but still iterable.
So,
1. Why tuples are not enumerable in Elixir? Is there any technical or logical limitation?
2. And why should not I use them as lists of similar elements? Is there any downsides?
Please note: the questions is "why", not "how". The explanation above is just an example where tuples works better than lists and maps.
1. The reason not to implement Enumerable for Tuple
From the retired Elixir talk mailing list:
-- Peter Minten
-- Chris Keele
How does this break the protocol? I'll try to put things together and explain the problem from the technical point of view.
Tuples. What's interesting about tuples is that they are mostly used for a kind of duck typing using pattern matching. You are not required to create new module for new struct every time you want some new simple type. Instead of this you create a tuple - a kind of object of virtual type. Atoms are often used as first elements as type names, for example
{:ok, result}
and{:error, description}
. This is how tuples are used almost anywhere in Elixir, because this is their purpose by design. They are also used as a basis for "records" that comes from Erlang. Elixir has structs for this purpose, but it also provides module Record for compatibility with Erlang. So in most cases tuples represent single structures of heterogenous data which are not meant to be enumerated. Tuples should be considered like instances of various virtual types. There is even@type
directive that allows to define custom types based on tuples. But remember they are virtual, andis_tuple/1
still returns true for all those tuples.Protocols. On the other hand, protocols in Elixir is a kind of type classes which provide ad hoc polymorphism. For those who come from OOP this is something similar to superclasses and multiple inheritance. One important thing that protocol is doing for you is automatic type checking. When you pass some data to a protocol function, it checks that the data belongs to this class, i.e. that protocol is implemented for this data type. If not then you'll get error like this:
This way Elixir saves your code from stupid mistakes and complex errors unless you make wrong architectural decisions
Altogether. Now imagine we implement Enumerable for Tuple. What it does is making all tuples enumerable while 99.9% of tuples in Elixir are not intended to be so. All the checks are broken. The tragedy is the same as if all animals in the world begin quacking. If a tuple is passed to Enum or Stream module accidentally then you will not see useful error message. Instead of this your code will produce unexpected results, unpredictable behaviour and possibly data corruption.
2. The reason not to use tuples as collections
Good robust Elixir code should contain typespecs that help developers to understand the code, and give Dialyzer ability to check the code for you. Imagine you want a collection of similar elements. The typespec for lists and maps may look like this:
But you can't write same typespec for tuple, because
{type}
means "a tuple of single element of typetype
". You can write typespec for a tuple of predefined length like{type, type, type}
or for a tuple of any elements liketuple()
, but there is no way to write a typespec for a tuple of similar elements just by design. So choosing tuples to store your collection of elemenets means you lose such a nice ability to make your code robust.Conclusion
The rule not to use tuples as lists of similar elements is a rule of thumb that explains how to choose right type in Elixir in most cases. Violation of this rule may be considered as possible signal of bad design choice. When people say "tuples are not intended for collections by design" this means not just "you do something unusual", but "you can break the Elixir features by doing wrong design in your application".
If you really want to use tuple as a collection for some reason and you are sure you know what you do, then it is a good idea to wrap it into some struct. You can implement Enumerable protocol for your struct without risk to break all things around tuples. It worth to note that Erlang uses tuples as collections for internal representation of
array
,gb_trees
,gb_sets
, etc.Not sure if there is any other technical reason not to use tuples as collections. If somebody can provide another good explanation for the conflict between the Record and the Enumerable protocol, he is welcome to improve this answer.