A Cauldron of Black and White Stones

Peter Minten's blog

Parallelism in Elixir

Parallelization is something that comes up more and more now that processors get more and more cores. In this blog post I will take a look at how Elixir supports parallelism.

First the usual lecture on parallelism and concurrency. These are often mixed up, probably because traditionally to get parallelism you needed to work with the tools of concurrency (threads and mutexes and all that jazz).

Parallelism is fundamentally about taking something you can do in a sequential (one process/thread) program and making it faster by doing parts of it at the same time. Concurrency on the other hand is about making things possible that are impossible in a sequential program.

For example when you have a function that blocks until it receives input from the keyboard and a function that blocks until it receives input from the network and you want to watch for both you’re going to need multiple threads/processes and thus concurrency (assuming there’s no select(3) or something available).

On the other hand it’s easy to count the words in a giant file sequentially, in just a single process/thread. But if you can split up the file and have multiple workers each count part of the file and those workers can run all at the same time (in parallel) you can get your result much faster.

When working with parallelism you ideally want to convert a sequential algorithm into a parallel one without changing the semantics. Now the bad news, in Elixir this will not work. In Haskell it will. Haskell has deterministic parallelism, Elixir doesn’t. Well, technically Haskell doesn’t have deterministic parallelism (that’s simply impossible to do with how computers work) but it’s so good at hiding the non-determinism that for all intents and purposes it has deterministic parallelism.

In Elixir we don’t quite have it so easy as Haskell programmers. We have to make do with the tools that Erlang gives us. Ok, that’s perhaps not the worst fate. Erlang supports concurrency quite well (understatement of the century) and with concurrency we can build parallelism.

A simple parallel word count

Let’s think about parallel word count. How can we implement it using the tools of Elixir? First, consider how the algorithm works:

  1. Divide the input lines into equally sized groups.
  2. For each of the groups, in parallel, count the number of words.
  3. Sum the results.

This is actually quite a well known pattern: divide-map-reduce. Any time you hear somebody talk about MapReduce this is the fundamental idea behind it: divide the work, map it to workers and reduce the results. There’s more to it than that of course, but fundamentally it’s this simple.

Now, how can we map (pardon the pun) these steps to Elixir code? Dividing is fairly easy, because addition is commutative (a + b == b + a) we can simply split the lines into N lists like this:

1
2
3
4
5
def divide(lines, n), do: do_divide(lines, List.duplicate([], n), [])

defp do_divide([], o1, o2), do: o1 ++ o2
defp do_divide([l|ls], [o|o1], o2), do: do_divide(ls, o1, [[l|o]|o2])
defp do_divide(ls, [], o2), do: do_divide(ls, o2, [])

Running divide([1,2,3,4,5,6,7,8,9,10], 3) gives [[8, 5, 2], [7, 6, 1], [10, 9, 4, 3]], so the lines are neatly divided. Ok, order is weird but we’ve already seen that that doesn’t matter (addition is commutative). Compared to the obvious solution of slicing the list into equal pieces we don’t need the length, which means this algorithm will be easier to adapt to a form that works with any enumerable. Mostly though it’s because I learned functional programming in Haskell and this algorithm works well with lazyness. :)

For each of these groups of lines we’ll want to spawn a worker.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
def parcount_spawn(groups) do
  master = self()
  Enum.each(groups, fn group ->
    Process.spawn(fn ->
      # Gotta love closures: group is captured
      # (If only I could do that so easily for the white Go groups)
      count = Enum.map(group, &length(split_words(&1)))
              |> Enum.reduce(0, &(&1+&2))
      master <- count
    end)
  end)
end

defp split_words(""), do: [] # String.split gives [""] in this case
defp split_words(s), do: String.split(s)

Each of the workers counts the words in the lines in its group, sums them and sends the count to the master process (the one that spawned the workers).

All that remains is to collect the subtotals and sum them.

1
2
3
4
5
6
7
def parcount_collect(n) do
  Enum.reduce(1..n, 0, fn _, total ->
    receive do
      count -> total + count
    end
  end)
end

Here the reduce of a range is a bit of a trick in order to go into receive a specific number of times.

To put it all together:

1
2
3
4
5
def parcount(lines, n) do
  groups = divide(lines, n)
  parcount_spawn(groups)
  parcount_collect(n)
end

When we pass parcount(["A", "B C", " ", "D D D ", ""], 3) it neatly gives us 6.

Thinking in Elixir: Hide Your Messages

In this new blog series I will try to explain some of the concepts behind programming in Elixir. This will be less practical oriented than my Elixir Patterns series and more focussed on the big ideas of functional concurrent programming as supported by Elixir.

Interface, messages and implementation

Elixir is a language in which concurrency is done by passing messages between processes. However in practical code it’s actually pretty rare to see explicit message passing. For example say we’re working on a chat server. A chat room could be written somewhat like this, using otp_dsl:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
defmodule Chatroom do
  use OtpDsl.GenServer, initial_state: HashDict.new()

  defcall enter(name), users do
    send_all(users, "#{name} has entered the room")
    # _from is secretly filled in by defcall and contains the PID of the caller
    reply(:ok, Dict.put(users, name, _from))
  end

  defcall leave(name), users do
    d = Dict.delete(users, name)
    send_all(users, "#{name} has left the room")
    reply(:ok, d)
  end

  defcall message(name, message) do
    send_all(users, message)
    reply(:ok, d)
  end

  defp send_all(users, message) do
    Enum.each(Dict.values(users), User.send_line(&1, message))
  end
end

Elixir Patterns: Abstract Data Structures

Wouldn’t you like to define a data structure as a module, with the internals hidden? Obviously you can, but because modules are not records the data structure then can’t be used with protocols. You could define a record and in the do block put functions to manipulate it but then you’re making it easy for your users to use the record directly, and trample on the invariants in the process.

Record tags

Luckily there is a trick. It’s not very obvious but if you read the documentation for Kernel.defrecordp/3 closely there is a tag field which is used in an example to set the tag to the name of the enclosing module. What’s the tag? Well in Erlang, and Elixir, records are stored as tuples with the first field being a “tag” that distinguishes different records and the other fields being the data of the record. With defrecord the tag is always the name of the record module, with defrecordp it is the record name.

1
2
3
4
defrecordp :foo, a: nil, b: nil
# name = :foo, tag = :foo, foo(a: 1, b: 2) ==> { :foo, 1, 2 }
defrecordp :foo, :bar, a: nil, b: nil
# name = :foo, tag = :bar, foo(a: 1, b: 2) ==> { :bar, 1, 2 }

When defining implementations of protocols the name you pass to for: is the tag of the record. This makes perfect sense as the protocol doesn’t need to know anything about your record except how to recognize it, and that’s what the tag is for.

Elixir Patterns: Mixins

Elixir’s macros are a powerful way to add language features without encoding them in the language core itself. One of such features is a pattern one could call “mixin” (by analogy to the Mixin concept in class based object-oriented languages.

In Ruby mixin modules are used for example to define the ==, <, >=, etc operators for a class if the <=> (compare) operator is defined. The compare operator returns -1 if the first argument is smaller than the second, 0 if it is equal to the second and 1 if it is greater than the second. Obviously if you have that it’s easy to define < and friends. By including the Comparable mixin you get most of the comparison operators for free, just remember to define <=>.

Default functions for protocols

In Elixir we don’t have classes but we do have similar situations where you generally want to define something in terms of something else. Take for example Enumerable. The Enumerable protocol has three methods: reduce/3, count/1 and member?/2. However you can always define count and member? in terms of reduce, they’re just there so that you can override them with a more efficient implementation.

Because protocols don’t support default definitions for a method you always have to define all three, even if you don’t do anything special for count and member?. That is to say for a simple binary tree you have to write:

defrecord BinTree, value: nil, left: nil, right: nil

defimpl Enumerable, for: BinTree do
  def reduce(nil, acc, _), do: acc
  def reduce(BinTree[value: value, left: left, right: right], acc, fun) do
    acc1 = fun.(value, acc)
    acc2 = reduce(left, acc1, fun)
    reduce(right, acc2, fun)
  end

  def count(BinTree[] = t) do
    reduce(t, 0, fn (_, acc) -> acc + 1 end)
  end
  def member?(BinTree[] = t, x) do
    reduce(t, false, fn (v, acc) -> acc or x == v end)
  end
end

Using a mixin we can simplify this. A mixin in Elixir is generally defined as a module with a __using__ macro.

defmodule Enumerable.Mixin do
  defmacro __using__(_) do
    quote location: keep do
      def count(e) do
        reduce(e, 0, fn (_, acc) -> acc + 1 end)
      end
      def member?(e, x) do
        reduce(e, false, fn (v, acc) -> acc or x == v end)
      end
    end
  end
end

Elixir’s Enumerable

Sometimes Elixir can be quite a pain. Often it’s because the libraries don’t yet have what you need or because a feature that looks like it’s going to do exactly what you want turns out to be more limited than you would like (|> I’m looking at you). Sometimes however the pain is of a different kind, the kind you get from trying to understand why that seemingly logical Go move is the exact opposite of what’s good, the kind you get from reading about category theory, the kind you get from trying to use the newest kind of lenses in Haskell. In other words the kind of pain that comes from desperately trying to understand something that seems to make no sense at all.

And then, after beating your head against the proverbial desk enough time, you suddenly realize that things are starting to make sense and that it’s actually all quite clear.

Today I had one of those moments. After trying to produce a better stream design based on functions that return a value and a replacement for themselves (think Continuation Passing Style) I ran into a problem where my Iter.to_list was much slower than Enum.to_list. When I asked on IRC @ericmj told me that it’s just not possible to make my iterator approach work fast enough and that that’s the reason Elixir switched to a reduce based approach. He also gave me the excellent tip to look at Clojure’s reducers, which was the inspiration for Elixir’s design.

While the blog posts linked from Clojure’s documentation are informative they are (naturally) about Clojure and not everybody understands that. So here’s an attempt to explain things in an Elixir context.