How collections.defaultdict Differs from a Normal dict

Introduction

Both dict and collections.defaultdict store key-value pairs, but they behave differently when a key is missing. A normal dict raises KeyError, while defaultdict creates a default value automatically using a factory function. That small difference can simplify counting, grouping, and accumulation code.
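As a minimal sketch of that difference (the variable names are illustrative only), the following looks up the same missing key in both types:

```python
from collections import defaultdict

plain = {}
auto = defaultdict(int)  # int() is called for each missing key and returns 0

try:
    plain["missing"]
    raised = False
except KeyError:
    raised = True  # a regular dict refuses to invent a value

value = auto["missing"]  # the factory runs and the new key is stored

print(raised)      # True
print(value)       # 0
print(dict(auto))  # {'missing': 0}
```

Note that the defaultdict lookup also inserted the key, which matters later when reads are supposed to be side-effect free.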

How a Normal dict Handles Missing Keys

With a standard dictionary, you must initialize keys before updating nested or numeric values.

python
counts = {}
words = ["red", "blue", "red", "green", "blue", "red"]

for word in words:
    if word not in counts:
        counts[word] = 0
    counts[word] += 1

print(counts)

This is explicit and sometimes preferred for clarity, but repetitive in many loops.

How defaultdict Changes the Pattern

defaultdict takes a callable that creates initial values for missing keys.

python
from collections import defaultdict

counts = defaultdict(int)
words = ["red", "blue", "red", "green", "blue", "red"]

for word in words:
    counts[word] += 1

print(dict(counts))

Because int() returns zero, each first access starts from zero automatically.
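The factory does not have to be int. As a rough sketch, any zero-argument callable should work, and the chosen factory is stored on the instance as default_factory. The dictionary names and the "unknown" label below are made up for illustration:

```python
from collections import defaultdict

by_count = defaultdict(int)                # missing keys start at 0
by_list = defaultdict(list)                # missing keys start as []
by_set = defaultdict(set)                  # missing keys start as set()
by_label = defaultdict(lambda: "unknown")  # any zero-argument callable works

by_count["a"] += 5
by_list["a"].append(1)
by_set["a"].add(1)
by_set["a"].add(1)      # duplicates collapse because the value is a set
label = by_label["a"]   # the lambda supplies the value

# The factory is kept on the instance and can be inspected.
print(by_count.default_factory is int)  # True
print(dict(by_count), dict(by_list), dict(by_set), label)
```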

Grouping is similarly concise:

python
from collections import defaultdict

items = [
    ("fruit", "apple"),
    ("fruit", "banana"),
    ("drink", "tea"),
    ("drink", "coffee")
]

groups = defaultdict(list)
for category, name in items:
    groups[category].append(name)

print(dict(groups))

Without defaultdict(list), you would need to check for the key and create an empty list before the first append to each category.
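For comparison, the same grouping written against a plain dict might look like the sketch below. It reuses the sample data from the grouping example so that it runs on its own:

```python
items = [
    ("fruit", "apple"),
    ("fruit", "banana"),
    ("drink", "tea"),
    ("drink", "coffee"),
]

groups = {}
for category, name in items:
    # The membership check is the boilerplate that defaultdict(list) removes.
    if category not in groups:
        groups[category] = []
    groups[category].append(name)

print(groups)  # {'fruit': ['apple', 'banana'], 'drink': ['tea', 'coffee']}
```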

Choosing Between Them

Use dict when missing keys represent true errors and should fail fast. Use defaultdict when missing keys are expected during construction of an aggregate structure.

A practical rule:

  • use dict for strict schemas and validated inputs
  • use defaultdict for accumulation pipelines

You can always convert a defaultdict to a regular dict at boundaries such as API responses or JSON serialization. The conversion is a shallow copy, so nested defaultdict values need their own conversion.

python
result = dict(groups)
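As a small sketch of such a boundary, assuming the output goes through the standard json module (the sample data is illustrative), the converted result serializes like any other dict and no longer creates keys on lookup:

```python
import json
from collections import defaultdict

groups = defaultdict(list)
groups["fruit"].append("apple")

result = dict(groups)  # shallow copy: the list values are shared, not copied
payload = json.dumps(result, sort_keys=True)

try:
    result["missing"]
    strict = False
except KeyError:
    strict = True  # the plain dict fails fast again

print(type(result).__name__)  # dict
print(payload)                # {"fruit": ["apple"]}
print(strict)                 # True
```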

Nested Data Structures

defaultdict is especially useful with nested collections.

python
from collections import defaultdict

# Each missing team gets its own defaultdict(int) of counters.
scores = defaultdict(lambda: defaultdict(int))

scores["team-a"]["wins"] += 1
scores["team-a"]["losses"] += 0
scores["team-b"]["wins"] += 2

# Convert both levels to plain dicts when needed
plain = {team: dict(stats) for team, stats in scores.items()}
print(plain)

This pattern removes layers of boilerplate and keeps intent obvious.
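When the nesting is deeper than two levels, a one-level comprehension is no longer enough. One possible approach is a small recursive helper. Both to_plain and tree below are hypothetical names introduced for this sketch, not part of the standard library:

```python
from collections import defaultdict


def to_plain(value):
    """Recursively convert defaultdicts (and nested dicts) into plain dicts."""
    if isinstance(value, dict):  # defaultdict is a dict subclass
        return {key: to_plain(inner) for key, inner in value.items()}
    return value


def tree():
    """Hypothetical helper: a defaultdict whose missing values are more trees."""
    return defaultdict(tree)


stats = tree()
stats["team-a"]["2024"]["wins"] = 3

plain = to_plain(stats)
print(plain)  # {'team-a': {'2024': {'wins': 3}}}
```

The helper only descends into dict values. Lists or other containers holding defaultdicts would need extra handling.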

Alternatives and Performance Notes

For one-off updates, dict.setdefault can be enough and may be more familiar to some teams.

python
groups = {}
for category, name in items:
    groups.setdefault(category, []).append(name)

setdefault evaluates its default argument on every call, even when the key already exists, whereas defaultdict calls the factory only for missing keys. In practice the difference is usually small, and most teams choose based on readability and consistency. If a hot path processes millions of rows, benchmark with realistic data before standardizing on one approach.
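A benchmark along those lines could be sketched with the standard timeit module. The synthetic rows, function names, and repeat count here are assumptions for illustration. Real measurements should use representative data, and the timings will vary by machine and Python version:

```python
import timeit
from collections import defaultdict

# Synthetic input: 10,000 rows spread over 100 categories (an assumption).
rows = [(f"cat-{i % 100}", i) for i in range(10_000)]


def with_setdefault():
    groups = {}
    for category, value in rows:
        groups.setdefault(category, []).append(value)
    return groups


def with_defaultdict():
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return dict(groups)


# Both approaches must produce the same grouping before timing them.
assert with_setdefault() == with_defaultdict()

for fn in (with_setdefault, with_defaultdict):
    seconds = timeit.timeit(fn, number=50)
    print(f"{fn.__name__}: {seconds:.4f}s for 50 runs")
```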

Type Hints and API Boundaries

When returning values from functions, use standard mapping type hints so callers are not tightly coupled to implementation details. You can build with defaultdict internally and return a regular dictionary.

python
from collections import defaultdict


def group_names(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    # Build with defaultdict internally, return a plain dict to callers.
    temp = defaultdict(list)
    for category, name in rows:
        temp[category].append(name)
    return dict(temp)

This keeps call sites simple and avoids surprises in serialization layers.
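If callers only need to read the result, the return annotation could be loosened further to an abstract mapping. This is a sketch assuming Python 3.9 or later, and count_by_category is a hypothetical function name:

```python
from collections import defaultdict
from collections.abc import Iterable, Mapping


def count_by_category(rows: Iterable[tuple[str, str]]) -> Mapping[str, int]:
    """Count rows per category, hiding the defaultdict from callers."""
    counts = defaultdict(int)
    for category, _name in rows:
        counts[category] += 1
    return dict(counts)  # callers receive a plain dict behind the Mapping hint


totals = count_by_category([("fruit", "apple"), ("fruit", "banana"), ("drink", "tea")])
print(totals)  # {'fruit': 2, 'drink': 1}
```

Annotating the return as Mapping leaves room to change the internal container later without touching call sites.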

Common Pitfalls

  • Indexing a missing key in a defaultdict (d[key]) mutates the object by creating that key. Use key in d or d.get(key) when a read must not create entries; neither calls the factory.
  • Passing a value such as [] instead of a factory is wrong and raises TypeError, because the first argument must be callable or None. Always pass a factory callable, such as list or set.
  • A defaultdict's repr includes its factory, and pickling fails when the factory is a lambda. Convert to a plain dict at output boundaries.
  • Switching from dict to defaultdict in strict validation logic can hide missing-input bugs.
  • Overusing nested factories can reduce readability if the data model becomes too implicit.
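The first two pitfalls can be checked with a short sketch. The variable names are illustrative:

```python
from collections import defaultdict

seen = defaultdict(list)

# Indexing a missing key creates it as a side effect.
_ = seen["a"]
after_index = list(seen)

# Membership tests and .get() do not call the factory.
has_b = "b" in seen
got_c = seen.get("c")
after_reads = list(seen)

# A non-callable default is rejected when the defaultdict is constructed.
try:
    defaultdict([])
    rejected = False
except TypeError:
    rejected = True

print(after_index)   # ['a']
print(has_b, got_c)  # False None
print(after_reads)   # ['a']
print(rejected)      # True
```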

Summary

  • dict raises KeyError for missing keys, while defaultdict creates defaults automatically.
  • defaultdict reduces boilerplate for counting and grouping workloads.
  • Choose based on intent: strict validation or expected accumulation.
  • Convert to plain dict when exporting data externally.
  • Watch for accidental key creation when reading missing entries.
