Concatenating red-black trees

Red-black trees

Tree data structures

Concatenation algorithms

Data structure optimization

Computer science algorithms

Concatenating red-black trees

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

Concatenating red-black trees is usually described as a join operation: combine two trees T1 and T2, where every key in T1 is less than every key in T2, into one valid red-black tree. The subtle part is that you cannot just make one tree the child of the other and hope the balancing rules survive; you need to respect black height and then repair red-black properties carefully.

The Right Problem Statement

Most textbook formulations use a separator key x such that:

every key in T1 is less than x
every key in T2 is greater than x

Then the operation is JOIN(T1, x, T2).

If you only have two trees and no separator key, a common trick is:

remove the maximum key from T1, or
remove the minimum key from T2

and use that removed key as the separator.

That reduces concatenation to the standard join problem.

Why a Naive Root Merge Is Not Enough

A tempting but incorrect idea is:

remove max from T1
make that key the new root
attach T1 and T2 directly as children

This works only in special cases. In general, the black heights of the two trees may differ, and a direct splice can violate the red-black invariants.

The real algorithm must place the separator at a point where the black-height conditions can still be restored with local fix-up.

Black Height Is the Key Invariant

A red-black tree requires every path from a node to a leaf to contain the same number of black nodes. That number is called the black height.

Suppose:

'bh(T1) == bh(T2)'

Then the join is relatively easy: create a new node containing x, attach T1 as left subtree and T2 as right subtree, color the new node red or black as needed, and then repair any red-violation locally.

The harder case is when the black heights differ.

Joining When Black Heights Differ

Assume bh(T1) > bh(T2). Then you walk down the right spine of T1 until you find a node whose subtree black height matches bh(T2). That is the place where the separator node can be inserted.

Conceptually:

descend along the side closer to the other tree
stop where black heights line up
splice in a new node containing x
attach the matching subtree and T2
run the usual red-black insertion fix-up

That is why the operation runs in O(log n): the search for the splice point follows a tree path, not a full traversal.

High-Level Pseudocode

Here is simplified pseudocode for the case bh(T1) >= bh(T2).

text

1JOIN(T1, x, T2):
2    if black_height(T1) == black_height(T2):
3        make node x with left = T1 and right = T2
4        color x red
5        fix red-black violations if needed
6        return repaired tree
7
8    walk down right side of T1 until subtree black height == black_height(T2)
9    splice new node x above that subtree
10    set x.left to matched subtree
11    set x.right to T2
12    reconnect parent links
13    color x red
14    run standard red-black insertion fix-up
15    return repaired root

The mirror case for bh(T2) > bh(T1) walks down the left spine of T2 instead.

Example with a Separator Key

Imagine:

'T1 contains 1, 3, 5, 7'
'x = 9'
'T2 contains 10, 12, 14, 16'

Because all keys in T1 are less than 9 and all keys in T2 are greater than 9, 9 is a valid separator.

If the black heights match, 9 can be used as the join node directly and then normal insertion-style repair handles any coloring issues.

If the black heights differ, 9 is inserted at the matching black-height boundary rather than being forced to become the global root.

Time Complexity

The join operation is efficient because it does not rebuild the whole tree.

The main costs are:

following one root-to-leaf style path to find the splice point
performing the usual red-black repair rotations and recolorings

That gives an overall time complexity of O(log n + log m), usually summarized as O(log n) when discussing balanced trees of comparable scale.

The important point is that concatenation is much cheaper than flattening both trees into arrays and rebuilding from scratch.

Why This Matters

Red-black tree join is not just a theoretical trick. It is useful in:

ordered set splitting and joining
persistent balanced-tree operations
interval structures and rope-like data structures
advanced functional or systems data structures

Once you know how to join efficiently, split and concatenation-based operations become much more practical.

Common Pitfalls

A common mistake is ignoring black height and trying to attach the trees directly under a new root. That usually breaks the red-black invariants.

Another issue is forgetting that concatenation assumes all keys in one tree are strictly less than all keys in the other, or that a valid separator key exists between them.

Developers also sometimes say "concatenate" when they really mean union. Join is much simpler because the ordering boundary is already known.

Finally, do not skip the post-splice red-black fix-up. Even with a correct splice point, coloring and rotation repair is still required.

Summary

Concatenating red-black trees is usually solved as a join operation.
You need a separator key or must extract one from one of the trees.
The splice point is chosen by matching black height, not by naive attachment.
The operation is efficient because it follows one path and then runs normal red-black repair.
Ignoring black height is the main reason naive concatenation fails.