Circular lock-free buffer

circular buffer

lock-free programming

concurrency

data structures

buffer management

Circular lock-free buffer

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

A circular lock-free buffer, often called a ring buffer, is a fixed-size queue designed for concurrent reads and writes without using a mutex around every operation. It is especially useful when one thread produces data and another thread consumes it at high frequency. The simplest practical design is a single-producer, single-consumer buffer built with atomics and careful memory ordering.

Start with the Single-Producer, Single-Consumer Case

The phrase "lock-free" covers many designs, but they do not all have the same complexity. A single-producer, single-consumer buffer is much easier to reason about than a queue with many writers and many readers.

In the SPSC model:

one thread is the only writer
one thread is the only reader
both threads share a fixed array
the producer advances tail
the consumer advances head

Because each index has exactly one owner, the design can avoid compare-and-swap loops in the common case. That makes the code faster and easier to verify.

Represent the Buffer with a Ring

The array is treated as circular. When an index reaches the end, it wraps back to the beginning. A common implementation stores Capacity + 1 slots so that head == tail can mean empty and the next position after tail colliding with head can mean full.

Here is a runnable C++ example:

cpp

1#include <array>
2#include <atomic>
3#include <cstddef>
4#include <iostream>
5
6template <typename T, std::size_t Capacity>
7class SpscRingBuffer {
8public:
9    bool push(const T& value) {
10        const auto tail = tail_.load(std::memory_order_relaxed);
11        const auto next = increment(tail);
12
13        if (next == head_.load(std::memory_order_acquire)) {
14            return false; // full
15        }
16
17        buffer_[tail] = value;
18        tail_.store(next, std::memory_order_release);
19        return true;
20    }
21
22    bool pop(T& value) {
23        const auto head = head_.load(std::memory_order_relaxed);
24
25        if (head == tail_.load(std::memory_order_acquire)) {
26            return false; // empty
27        }
28
29        value = buffer_[head];
30        head_.store(increment(head), std::memory_order_release);
31        return true;
32    }
33
34private:
35    static constexpr std::size_t StorageSize = Capacity + 1;
36
37    std::size_t increment(std::size_t index) const {
38        return (index + 1) % StorageSize;
39    }
40
41    std::array<T, StorageSize> buffer_{};
42    std::atomic<std::size_t> head_{0};
43    std::atomic<std::size_t> tail_{0};
44};
45
46int main() {
47    SpscRingBuffer<int, 4> queue;
48    queue.push(10);
49    queue.push(20);
50
51    int value = 0;
52    while (queue.pop(value)) {
53        std::cout << value << '\n';
54    }
55}

This example is intentionally narrow. It does not attempt multiple producers, blocking waits, dynamic growth, or memory reclamation. That is a strength, not a weakness, because most production bugs in lock-free code come from overreaching.

Why the Memory Orders Matter

The buffer works only if the consumer sees the written element before it sees the updated tail, and the producer sees the consumed slot before it reuses it. That is why the code uses memory_order_release when publishing a new index and memory_order_acquire when observing the other side's index.

The sequence is:

producer writes element into the array
producer stores the new tail with release semantics
consumer loads tail with acquire semantics
consumer can now safely read the published element

The same logic applies in reverse for head. If you weaken the memory ordering without understanding the consequences, the queue may pass tests on one machine and fail under load on another.

Full and Empty Conditions

A ring buffer always needs a rule to distinguish full from empty. The implementation above sacrifices one slot and uses these rules:

empty when head == tail
full when increment(tail) == head

That choice keeps the logic simple. Another design can maintain a separate count, but that introduces more shared state and more synchronization pressure.

If you know Capacity is a power of two, you can replace modulo with a bit mask for slightly faster wraparound. The tradeoff is that the code becomes more specialized.

When You Need More Than SPSC

Many questions about "lock-free circular buffers" really mean MPMC queues. That is a different class of problem. Once multiple producers or consumers enter the picture, simple head and tail ownership is gone. You usually need compare-and-swap, per-slot sequence numbers, or a different data structure entirely.

If the requirement is genuinely MPMC, do not stretch an SPSC design until it breaks. Use an algorithm built for that case, or a well-tested library implementation.

Common Pitfalls

The most common mistake is calling an SPSC buffer from more than one producer or more than one consumer. The code may appear to work in light testing and then corrupt data later.

Another frequent problem is using the wrong memory ordering. Replacing acquire and release operations with relaxed operations everywhere can produce stale reads or reordered writes.

Index overflow is less dramatic but still worth handling correctly. Use unsigned index types and wrap in a controlled way.

Finally, avoid storing objects with expensive copy behavior unless that tradeoff is acceptable. A ring buffer is often chosen for low latency, so the element type matters.

Summary

A circular lock-free buffer is easiest to implement correctly in the SPSC case
The core shared state is a fixed array plus atomic head and tail
Acquire and release semantics are what make the publication of elements safe
One slot is commonly sacrificed to distinguish full from empty
SPSC code is not automatically safe for MPMC use
Keep the design narrow unless you truly need a more complex queue