Wouldn’t you like to define a data structure as a module, with the internals
hidden? Obviously you can, but because modules are not records the data
structure then can’t be used with protocols. You could define a record and in
do block put functions to manipulate it but then you’re making it easy for
your users to use the record directly, and trample on the invariants in the
Luckily there is a trick. It’s not very obvious but if you read the
Kernel.defrecordp/3 closely there is a tag field which is
used in an example to set the tag to the name of the enclosing module. What’s
the tag? Well in Erlang, and Elixir, records are stored as tuples with the first
field being a “tag” that distinguishes different records and the other fields
being the data of the record. With
defrecord the tag is always the name of the
record module, with
defrecordp it is the record name.
1 2 3 4
When defining implementations of protocols the name you pass to
for: is the
tag of the record. This makes perfect sense as the protocol doesn’t need to know
anything about your record except how to recognize it, and that’s what the tag
An abstract binary tree
Using record tags you can write an abstract binary tree module like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
__MODULE__ is a special variable that expands to the name of the module. I
could have written
BinTree in it’s place, but
__MODULE__ stands out and is a
good reminder of what trick is used here.
The module works as you’d expect:
1 2 3 4
We’re still showing the guts of the record through
inspect though, but that’s
1 2 3
Just remember that we’re only hiding the representation when pretty printed, if
raw: true to inspect you’ll still see the internal representation.
Due to how Erlang (and thus Elixir) works this is unavoidable, but the pretty
printing serves as a powerful reminder to the user that the internal
representation should not be relied upon.
Dispatching on the record tag
There is another trick I’d like to show you. If you look at the source of the
Set module from the standard library most functions look like this:
1 2 3
Huh? Object-oriented programming. Nope, just plain old
apply/3 (if the first
argument is not a literal symbol Elixir translates this to runtime apply call).
The magic is in the
1 2 3 4 5 6 7 8 9
Remember that the first field of a record is the tag, so
elem(rec, 0) extracts
the record tag. If you call
Set.difference(s1, s2) (where
target(s1).difference(s1, s2) gets called, which expands to
(ignoring the tuple check)
elem(s1, 0).difference(s1, s2), which effectively
apply(HashSet, :difference, [s1, s2]).
When I first saw that code I spent some time scratching my head over why they
didn’t use a protocol. I’m still not 100% sure, but I strongly suspect
performance is one of the main reasons. As a test I made a very simple benchmark
to compare the overhead of calling through a protocol vs calling through
Running this a number of times gives a pretty consistent result after the first few runs: the protocol calls are about 14-15 times slower than the dispatch calls (which is how I named the “through apply” calls, not sure if it’s the right term).
Now protocol calls are known to be slow (that’s a big part of why Elixir has
reduce based collections, those only require a few protocol calls even for
large amounts of data). There are
ideas on how to fix that and
once those get implemented things may change for released code. For now however
indirect calls through
apply are significantly faster.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
If you look at
Set again notice that it’s a behaviour with callbacks of the
same types and arity for each exposed function. All the implementation modules
HashSet in the standard library) implement the
Set behaviour and the
compiler makes sure that all the functions are indeed implemented. If you’re
going to do the same protocol replacing trick as
Set it’s a good idea to
define a behaviour as well.
So what are the downsides of the abstract data structure technique? Well you
lose the nice record syntax and decomposition and have to work with the macros
defrecordp generates for you. And it’s a bit more work getting everything set
But let’s face it, those are minor issues. What you get is the ability to offer a data structure without exposing it’s representation. You get the ability to make a data structure that (ugly hacks aside) can only be manipulated through the API you provide, an API that maintains the internal invariants.
If you define a public record stop and think for a moment why you are defining the record that way, because often hiding it in a module results in cleaner, more maintainable code.