C++11
Async Programming
Tree Data Structure
Concurrent Operations
Threading

C11 Async operation on a tree

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Tree algorithms are naturally recursive, which makes them a strong candidate for task-based parallelism. In C++11, std::async can run work for one subtree while the current thread continues with another, giving you concurrency without managing raw threads directly.

Why Trees Work Well With std::async

Many read-only tree operations split cleanly into independent left and right branches. Summing values, counting nodes, computing height, or searching for a predicate can all be expressed as "solve the left subtree, solve the right subtree, then combine the answers."

That independence matters. If the two branches do not share mutable state, you can run them concurrently with little coordination. C++11 provides this through the \<future> header, mainly with std::async and std::future.

Use this approach when the work per subtree is large enough to amortize scheduling cost. Spawning a task for every node is usually too expensive, especially for shallow or tiny trees.

A Practical Example

The program below computes the sum of a binary tree. It spawns asynchronous work only for the upper part of the tree, then falls back to plain recursion deeper down. That depth limit prevents the function from creating hundreds of tiny tasks.

cpp
1#include <future>
2#include <iostream>
3#include <memory>
4
5struct Node {
6    int value;
7    std::unique_ptr<Node> left;
8    std::unique_ptr<Node> right;
9};
10
11int sum_tree(const Node* node, int depth = 0) {
12    if (!node) {
13        return 0;
14    }
15
16    // Stop creating tasks once the subtree becomes small.
17    if (depth >= 3) {
18        return node->value
19             + sum_tree(node->left.get(), depth + 1)
20             + sum_tree(node->right.get(), depth + 1);
21    }
22
23    auto left_future = std::async(std::launch::async, [node, depth]() {
24        return sum_tree(node->left.get(), depth + 1);
25    });
26
27    int right_sum = sum_tree(node->right.get(), depth + 1);
28    int left_sum = left_future.get();
29
30    return node->value + left_sum + right_sum;
31}
32
33int main() {
34    auto root = std::make_unique<Node>(Node{10});
35    root->left = std::make_unique<Node>(Node{4});
36    root->right = std::make_unique<Node>(Node{6});
37    root->left->left = std::make_unique<Node>(Node{2});
38    root->left->right = std::make_unique<Node>(Node{3});
39    root->right->right = std::make_unique<Node>(Node{8});
40
41    std::cout << "Tree sum: " << sum_tree(root.get()) << '\n';
42}

This pattern does a few important things:

  • 'std::launch::async requests real asynchronous execution instead of deferred execution.'
  • The current thread keeps useful work by computing the right subtree immediately.
  • 'left_future.get() synchronizes with the spawned task and rethrows any exception from that branch.'

Designing The Combination Step

The safest pattern is to make each subtree return a value and combine the results at the parent. That avoids locks, atomics, and race conditions. For example, a height calculation looks nearly identical:

cpp
1#include <algorithm>
2
3int height(const Node* node, int depth = 0) {
4    if (!node) {
5        return 0;
6    }
7
8    if (depth >= 3) {
9        return 1 + std::max(height(node->left.get(), depth + 1),
10                            height(node->right.get(), depth + 1));
11    }
12
13    auto left_future = std::async(std::launch::async, [node, depth]() {
14        return height(node->left.get(), depth + 1);
15    });
16
17    int right_height = height(node->right.get(), depth + 1);
18    int left_height = left_future.get();
19    return 1 + std::max(left_height, right_height);
20}

Returning values scales better than having every task update a shared counter. Once shared mutation enters the picture, the tree is no longer embarrassingly parallel and the code gets much harder to reason about.

When Async Helps And When It Does Not

std::async is useful when each subtree performs enough CPU work to justify task overhead. Examples include evaluating expensive expressions, scanning large node payloads, or running heavier numeric logic per branch.

It helps less when the tree is tiny, highly unbalanced, or dominated by memory latency instead of actual computation. A linked-list-shaped tree offers very little parallelism because one side is almost always empty. In those cases, the asynchronous version can be slower than a plain recursive function.

A practical rule is to measure with representative data. Parallel recursion often looks elegant, but runtime behavior depends on tree shape, node cost, and the number of available cores.

Common Pitfalls

  • Launching a task for every node. Task creation overhead can erase any speedup. Add a cutoff by depth or estimated subtree size.
  • Forgetting to call get(). The program then loses synchronization and may also skip exception propagation from the worker.
  • Writing to shared state from both branches. Prefer returning values from each subtree instead of protecting a shared accumulator.
  • Capturing data with the wrong lifetime. If a task outlives an object captured by reference, the program has undefined behavior.
  • Assuming std::async always means a new thread unless you pass std::launch::async. Without that policy, execution may be deferred until get().

Summary

  • Trees are a good fit for parallel recursion because left and right subtrees are often independent.
  • In C++11, std::async plus std::future is the simplest way to express subtree concurrency.
  • Limit task creation with a cutoff so the runtime does not spend more time scheduling than computing.
  • Return values from each branch to keep the code race-free and easier to test.

Course illustration
Course illustration

All Rights Reserved.