generate all subsets of size k from a set
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Generating all subsets of size k from a set is the same as generating all combinations of k distinct elements. This problem appears in search, testing, scheduling, and feature selection because you often need every possible group of a fixed size, not every subset overall.
What the Output Size Looks Like
If a set has n elements, the number of subsets of size k is "n choose k". That matters because even a perfect algorithm still has to output every combination, so the runtime cannot be smaller than the number of results you ask for.
For example, if the input is ["a", "b", "c", "d"] and k = 2, the valid subsets are:
- '
["a", "b"]' - '
["a", "c"]' - '
["a", "d"]' - '
["b", "c"]' - '
["b", "d"]' - '
["c", "d"]'
There are six of them, which matches the combination count.
Backtracking Is the Standard Approach
The most common manual solution is backtracking. Build a partial subset, choose the next element from the remaining positions, and stop when the subset reaches size k.
This works because each recursive call only considers later elements, so combinations are generated without duplicates.
Why the start Index Matters
The start argument prevents the algorithm from going backward and generating the same subset in a different order. Without that restriction, ["a", "b"] and ["b", "a"] would both appear even though they represent the same subset.
The recursion tree is also easy to reason about:
- choose the current element
- recurse on the remaining suffix
- undo the choice and try the next element
That pattern makes backtracking a good fit for interviews and for code you may need to modify later.
Add Pruning for Better Efficiency
A useful optimization is to stop early when there are not enough remaining elements to finish the subset. This does not change the result, but it avoids useless recursion.
This matters more when n is large and k is small, because the algorithm can skip branches that are guaranteed to fail.
Built-In Library Solutions
In Python, the best production choice is often itertools.combinations because it is concise, well tested, and lazy.
This yields tuples one by one instead of building the entire result immediately. That is useful when the number of combinations is large and you want to stream them into another computation.
Handling Edge Cases
Define the expected behavior for special values of k:
- If
k == 0, there is exactly one subset: the empty subset. - If
k > n, there are no valid subsets. - If the input contains duplicates, your "set" is not really a mathematical set anymore, so duplicate combinations may appear unless you normalize the input first.
Making those rules explicit keeps bugs out of calling code.
Common Pitfalls
The first pitfall is generating permutations instead of combinations. If order is not supposed to matter, your recursive search must move forward through the input rather than retry earlier items.
Another mistake is appending the same mutable list object into the result. In Python, you need current.copy() or a slice copy. Otherwise, later backtracking steps modify every stored result.
A third issue is forgetting the growth rate. Even though the algorithm looks simple, the output size can explode quickly. Ten choose five already gives 252 results, and larger inputs get expensive fast.
Finally, be careful with duplicate input values. If the source data contains repeated elements, the output may contain repeated subsets unless you deduplicate or sort and skip repeats deliberately.
Summary
- Subsets of size
kare combinations, not permutations. - Backtracking is the standard manual approach because it is simple and avoids duplicates naturally.
- A
startindex ensures each combination is built in one order only. - '
itertools.combinationsis often the best choice in Python when you do not need custom recursion logic.' - The total number of results grows combinatorially, so output size is the main cost driver.

