When Discord handles 10 million concurrent voice users, or when a telecom switch routes thousands of calls per second, they can’t afford downtime. The secret? Architecture patterns originally developed for telephone switches in the 1980s—patterns that automatically recover from failures and scale horizontally without breaking a sweat.
In this deep dive, we’ll explore how to build systems that self-heal using Elixir’s GenServer—the same foundation powering WhatsApp (2 billion users) and countless gaming backends.
What is a GenServer?
A GenServer is a process that implements a client-server relationship. It encapsulates state, provides synchronous and asynchronous communication patterns, and integrates seamlessly with OTP supervision trees for fault tolerance.
Creating Your First GenServer
Let’s build a simple counter that demonstrates the core concepts:
```elixir
defmodule Counter do
  use GenServer

  # Client API

  def start_link(initial_value \\ 0) do
    GenServer.start_link(__MODULE__, initial_value, name: __MODULE__)
  end

  def increment do
    GenServer.call(__MODULE__, :increment)
  end

  def decrement do
    GenServer.call(__MODULE__, :decrement)
  end

  def get_value do
    GenServer.call(__MODULE__, :get_value)
  end

  def increment_async do
    GenServer.cast(__MODULE__, :increment)
  end

  # Server Callbacks

  @impl true
  def init(initial_value) do
    {:ok, initial_value}
  end

  @impl true
  def handle_call(:increment, _from, state) do
    new_state = state + 1
    {:reply, new_state, new_state}
  end

  @impl true
  def handle_call(:decrement, _from, state) do
    new_state = state - 1
    {:reply, new_state, new_state}
  end

  @impl true
  def handle_call(:get_value, _from, state) do
    {:reply, state, state}
  end

  @impl true
  def handle_cast(:increment, state) do
    {:noreply, state + 1}
  end
end
```
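A quick session in `iex` shows the public API in action; the return values below assume a fresh counter starting at 0:

```elixir
# Start the counter (registered under the module name, so no pid needed).
{:ok, _pid} = Counter.start_link()

Counter.increment()       # => 1
Counter.increment()       # => 2
Counter.decrement()       # => 1

Counter.increment_async() # => :ok (fire-and-forget)
Counter.get_value()       # => 2
```

Note that `get_value/0` sees the effect of the preceding cast: messages from the same sender are delivered in order, so the cast is processed before the call.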
Understanding Call vs Cast
GenServer provides two primary communication patterns:
- call/2: Synchronous. The client blocks until the server replies (or the call times out). Use it when you need the result immediately.
- cast/2: Asynchronous, fire-and-forget. Use it for operations where you don't need a response.
```elixir
# Synchronous - waits for the response
Counter.increment()       # Returns the new value

# Asynchronous - returns immediately
Counter.increment_async() # Returns :ok
```
Real-World Example: Rate Limiter
Let’s build something more practical—a rate limiter that tracks API requests per client:
```elixir
defmodule RateLimiter do
  use GenServer

  @max_requests 100
  @window_ms 60_000  # 1 minute

  # Client API

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, %{}, opts)
  end

  def check_rate(pid, client_id) do
    GenServer.call(pid, {:check_rate, client_id})
  end

  # Server Callbacks

  @impl true
  def init(_opts) do
    schedule_cleanup()
    {:ok, %{}}
  end

  @impl true
  def handle_call({:check_rate, client_id}, _from, state) do
    now = System.monotonic_time(:millisecond)

    client_requests =
      state
      |> Map.get(client_id, [])
      |> Enum.filter(fn timestamp -> now - timestamp < @window_ms end)

    if length(client_requests) < @max_requests do
      new_requests = [now | client_requests]
      new_state = Map.put(state, client_id, new_requests)
      {:reply, {:ok, @max_requests - length(new_requests)}, new_state}
    else
      {:reply, {:error, :rate_limited}, state}
    end
  end

  @impl true
  def handle_info(:cleanup, state) do
    now = System.monotonic_time(:millisecond)

    cleaned_state =
      state
      |> Enum.map(fn {client_id, requests} ->
        {client_id, Enum.filter(requests, fn ts -> now - ts < @window_ms end)}
      end)
      |> Enum.reject(fn {_client_id, requests} -> Enum.empty?(requests) end)
      |> Map.new()

    schedule_cleanup()
    {:noreply, cleaned_state}
  end

  defp schedule_cleanup do
    Process.send_after(self(), :cleanup, @window_ms)
  end
end
```
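Using it from an API endpoint might look like this (a minimal sketch using the functions defined above; `"client_42"` is a made-up client identifier):

```elixir
{:ok, limiter} = RateLimiter.start_link()

case RateLimiter.check_rate(limiter, "client_42") do
  {:ok, remaining} ->
    # Proceed with the request; `remaining` is the budget left in this window.
    IO.puts("Allowed; #{remaining} requests left")

  {:error, :rate_limited} ->
    # Reject the request, e.g. respond with HTTP 429.
    IO.puts("Rejected; try again later")
end
```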
Supervision and Fault Tolerance
The real power of GenServer comes when combined with Supervisors. If your GenServer crashes, the supervisor can restart it automatically:
```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      {Counter, 0},
      {RateLimiter, name: RateLimiter}
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
```
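You can watch the restart behavior by killing the Counter process by hand. A sketch, assuming the application above is running (the `Process.sleep/1` is a crude way to wait out the restart in a demo; real code would not poll like this):

```elixir
# The registered name resolves to the currently running process.
pid_before = Process.whereis(Counter)
Process.exit(pid_before, :kill)

# Give the supervisor a moment to restart the child.
Process.sleep(100)

# The name now points at a fresh process with freshly initialized
# state: the counter is back to its initial value of 0.
pid_after = Process.whereis(Counter)
true = pid_before != pid_after
```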
Best Practices
- Keep state minimal: Only store what you need. Large states increase memory usage and make restarts slower.
- Use handle_continue/2 for initialization: Don't block init/1 with expensive operations. init/1 runs before start_link/3 returns, so a slow init blocks whoever starts the process, often a supervisor bringing up the whole tree:
```elixir
@impl true
def init(_opts) do
  # Return quickly; defer the expensive work to handle_continue/2,
  # which runs before any other message is processed.
  {:ok, %{}, {:continue, :load_data}}
end

@impl true
def handle_continue(:load_data, state) do
  data = expensive_data_load()
  {:noreply, Map.put(state, :data, data)}
end
```
- Handle timeouts: Protect against slow operations:

```elixir
def get_data(pid) do
  GenServer.call(pid, :get_data, 5_000) # 5 second timeout (also the default)
end
```
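If the timeout expires, GenServer.call/3 exits the calling process. Letting the caller crash and be restarted is often the idiomatic choice, but when the caller must survive, it can catch the exit. A sketch (`get_data_safe/1` is a hypothetical wrapper around the call above):

```elixir
def get_data_safe(pid) do
  try do
    {:ok, GenServer.call(pid, :get_data, 5_000)}
  catch
    # A timed-out call exits with {:timeout, {GenServer, :call, args}}.
    :exit, {:timeout, _} -> {:error, :timeout}
  end
end
```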
- Use terminate/2 for cleanup:

```elixir
@impl true
def terminate(_reason, state) do
  save_state_to_disk(state)
  :ok
end
```
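Be aware that terminate/2 is not guaranteed to run: a brutal kill, for example, skips it entirely. To have it called on a normal supervisor shutdown, trap exits in init/1, and persist anything truly critical as you go rather than only in terminate/2. A sketch:

```elixir
@impl true
def init(opts) do
  # With exits trapped, the supervisor's shutdown signal arrives as a
  # message instead of killing the process outright, so the GenServer
  # gets a chance to run terminate/2 before exiting.
  Process.flag(:trap_exit, true)
  {:ok, opts}
end
```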
Conclusion
These patterns are why gaming companies choose this architecture for multiplayer servers, why telecom providers trust it for carrier-grade reliability, and why fintech platforms use it for payment processing. The ability to handle failures gracefully, automatically isolating and restarting failed components, means a single crashed process rarely becomes an outage your users notice.
Whether you’re building a game that needs to handle launch-day traffic spikes, a trading platform that can’t lose a single transaction, or a messaging system for millions of users, these architectural patterns are your foundation.
At Sajima Solutions, we build fault-tolerant systems for gaming, telecom, and finance across the Gulf region. Contact us to discuss your next project.