combinatorics
subsets
algorithm
mathematics
programming

generate all subsets of size k from a set

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Generating all subsets of size k from a set is the same as generating all combinations of k distinct elements. This problem appears in search, testing, scheduling, and feature selection because you often need every possible group of a fixed size, not every subset overall.

What the Output Size Looks Like

If a set has n elements, the number of subsets of size k is "n choose k". That matters because even a perfect algorithm still has to output every combination, so the runtime cannot be smaller than the number of results you ask for.

For example, if the input is ["a", "b", "c", "d"] and k = 2, the valid subsets are:

  • '["a", "b"]'
  • '["a", "c"]'
  • '["a", "d"]'
  • '["b", "c"]'
  • '["b", "d"]'
  • '["c", "d"]'

There are six of them, which matches the combination count.

Backtracking Is the Standard Approach

The most common manual solution is backtracking. Build a partial subset, choose the next element from the remaining positions, and stop when the subset reaches size k.

python
1def combinations_of_size_k(items, k):
2    result = []
3
4    def backtrack(start, current):
5        if len(current) == k:
6            result.append(current.copy())
7            return
8
9        for index in range(start, len(items)):
10            current.append(items[index])
11            backtrack(index + 1, current)
12            current.pop()
13
14    backtrack(0, [])
15    return result
16
17
18print(combinations_of_size_k(["a", "b", "c", "d"], 2))

This works because each recursive call only considers later elements, so combinations are generated without duplicates.

Why the start Index Matters

The start argument prevents the algorithm from going backward and generating the same subset in a different order. Without that restriction, ["a", "b"] and ["b", "a"] would both appear even though they represent the same subset.

The recursion tree is also easy to reason about:

  • choose the current element
  • recurse on the remaining suffix
  • undo the choice and try the next element

That pattern makes backtracking a good fit for interviews and for code you may need to modify later.

Add Pruning for Better Efficiency

A useful optimization is to stop early when there are not enough remaining elements to finish the subset. This does not change the result, but it avoids useless recursion.

python
1def combinations_of_size_k(items, k):
2    result = []
3
4    def backtrack(start, current):
5        if len(current) == k:
6            result.append(current.copy())
7            return
8
9        remaining_needed = k - len(current)
10        last_start = len(items) - remaining_needed
11
12        for index in range(start, last_start + 1):
13            current.append(items[index])
14            backtrack(index + 1, current)
15            current.pop()
16
17    backtrack(0, [])
18    return result

This matters more when n is large and k is small, because the algorithm can skip branches that are guaranteed to fail.

Built-In Library Solutions

In Python, the best production choice is often itertools.combinations because it is concise, well tested, and lazy.

python
1from itertools import combinations
2
3items = ["a", "b", "c", "d"]
4
5for subset in combinations(items, 2):
6    print(subset)

This yields tuples one by one instead of building the entire result immediately. That is useful when the number of combinations is large and you want to stream them into another computation.

Handling Edge Cases

Define the expected behavior for special values of k:

  • If k == 0, there is exactly one subset: the empty subset.
  • If k > n, there are no valid subsets.
  • If the input contains duplicates, your "set" is not really a mathematical set anymore, so duplicate combinations may appear unless you normalize the input first.

Making those rules explicit keeps bugs out of calling code.

Common Pitfalls

The first pitfall is generating permutations instead of combinations. If order is not supposed to matter, your recursive search must move forward through the input rather than retry earlier items.

Another mistake is appending the same mutable list object into the result. In Python, you need current.copy() or a slice copy. Otherwise, later backtracking steps modify every stored result.

A third issue is forgetting the growth rate. Even though the algorithm looks simple, the output size can explode quickly. Ten choose five already gives 252 results, and larger inputs get expensive fast.

Finally, be careful with duplicate input values. If the source data contains repeated elements, the output may contain repeated subsets unless you deduplicate or sort and skip repeats deliberately.

Summary

  • Subsets of size k are combinations, not permutations.
  • Backtracking is the standard manual approach because it is simple and avoids duplicates naturally.
  • A start index ensures each combination is built in one order only.
  • 'itertools.combinations is often the best choice in Python when you do not need custom recursion logic.'
  • The total number of results grows combinatorially, so output size is the main cost driver.

Course illustration
Course illustration