Collections.defaultdict difference with normal dict

Introduction

Both dict and collections.defaultdict store key-value pairs, but they behave differently when a key is missing. A normal dict raises KeyError, while defaultdict creates a default value automatically using a factory function. That small difference can simplify counting, grouping, and accumulation code.
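
To make the difference concrete, here is a minimal sketch that looks up the same missing key in both types. The variable names (`plain`, `auto`, `error`) are illustrative only.

```python
from collections import defaultdict

plain = {}
try:
    plain["missing"]            # a normal dict has no entry to return
except KeyError as exc:
    error = repr(exc)

auto = defaultdict(int)         # int is the factory; int() returns 0
value = auto["missing"]         # factory is called and the key is stored

print(error)                    # KeyError('missing')
print(value, dict(auto))        # 0 {'missing': 0}
```

Note that the lookup on `auto` also stored the key, which matters later when reading keys that may not exist.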

How a Normal `dict` Handles Missing Keys

With a standard dictionary, you must initialize keys before updating nested or numeric values.

python

counts = {}
words = ["red", "blue", "red", "green", "blue", "red"]

for word in words:
    if word not in counts:
        counts[word] = 0
    counts[word] += 1

print(counts)

This is explicit and sometimes preferred for clarity, but repetitive in many loops.

How `defaultdict` Changes the Pattern

defaultdict takes a callable that creates initial values for missing keys.

python

from collections import defaultdict

counts = defaultdict(int)
words = ["red", "blue", "red", "green", "blue", "red"]

for word in words:
    counts[word] += 1

print(dict(counts))

Because int() returns 0, the first access to a missing key stores 0 under that key, and += 1 then increments it as usual.
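
The factory can be any callable that takes no arguments, not only a built-in type. The sketch below assumes a hypothetical starting stock level of 10 to show a non-zero default supplied by a lambda.

```python
from collections import defaultdict

# Built-in types called with no arguments return their "empty" value.
print(int(), list(), set())       # 0 [] set()

# A lambda can supply any other starting value (10 is a made-up example).
stock = defaultdict(lambda: 10)
stock["widget"] -= 3              # starts from 10, then subtracts 3

print(stock["widget"])            # 7
```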

Grouping is similarly concise:

python

from collections import defaultdict

items = [
    ("fruit", "apple"),
    ("fruit", "banana"),
    ("drink", "tea"),
    ("drink", "coffee")
]

groups = defaultdict(list)
for category, name in items:
    groups[category].append(name)

print(dict(groups))

Without defaultdict(list), you would need to check for the key, or call setdefault, before every append.
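
For comparison, this is roughly what the same grouping looks like with a plain dict and an explicit membership check. It reuses the sample data from the grouping example and produces the same result.

```python
items = [
    ("fruit", "apple"),
    ("fruit", "banana"),
    ("drink", "tea"),
    ("drink", "coffee"),
]

groups = {}
for category, name in items:
    # Manual initialization: create the list the first time a category appears.
    if category not in groups:
        groups[category] = []
    groups[category].append(name)

print(groups)  # {'fruit': ['apple', 'banana'], 'drink': ['tea', 'coffee']}
```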

Choosing Between Them

Use dict when missing keys represent true errors and should fail fast. Use defaultdict when missing keys are expected during construction of an aggregate structure.

A practical rule:

use dict for strict schemas and validated inputs
use defaultdict for accumulation pipelines

You can always convert defaultdict to regular dict at boundaries such as API responses or JSON serialization.

python

result = dict(groups)
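
As a sketch of what such a boundary might look like, the example below converts before serializing with the standard json module. The data is illustrative. After conversion, a missing key raises KeyError again instead of being created.

```python
import json
from collections import defaultdict

groups = defaultdict(list)
groups["fruit"].append("apple")

result = dict(groups)            # shallow copy without the factory
payload = json.dumps(result)
print(payload)                   # {"fruit": ["apple"]}

try:
    result["drink"]              # plain dict: no automatic key creation
except KeyError:
    print("missing keys fail fast again")
```

The conversion is shallow, so the inner lists are shared with the original defaultdict.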

Nested Data Structures

defaultdict is especially useful with nested collections.

python

from collections import defaultdict

# Outer factory builds a fresh defaultdict(int) for each new team.
scores = defaultdict(lambda: defaultdict(int))

scores["team-a"]["wins"] += 1
scores["team-a"]["losses"] += 0  # creates the key with value 0
scores["team-b"]["wins"] += 2

# Convert both levels to plain dicts when needed
plain = {team: dict(stats) for team, stats in scores.items()}
print(plain)

This pattern removes layers of boilerplate and keeps intent obvious.
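
When the nesting depth is not fixed, a one-level comprehension is no longer enough to get back to plain dicts. The following is a minimal sketch, assuming a hypothetical self-referencing factory `tree` and a hypothetical helper `to_plain`; neither is part of the standard library.

```python
from collections import defaultdict


def tree():
    """Factory whose missing keys are themselves trees (arbitrary depth)."""
    return defaultdict(tree)


def to_plain(value):
    """Recursively copy dict-like levels into plain dicts."""
    if isinstance(value, dict):  # defaultdict is a dict subclass
        return {key: to_plain(inner) for key, inner in value.items()}
    return value


stats = tree()
stats["team-a"]["2024"]["wins"] = 3

plain = to_plain(stats)
print(plain)  # {'team-a': {'2024': {'wins': 3}}}
```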

Alternatives and Performance Notes

For one-off updates, dict.setdefault can be enough and may be more familiar to some teams.

python

# items is the list of (category, name) pairs from the grouping example
groups = {}
for category, name in items:
    groups.setdefault(category, []).append(name)

In practice, performance differences between setdefault and defaultdict are usually small for medium data sizes. Most teams choose based on readability and consistency. If a hot path processes millions of rows, benchmark with realistic data before standardizing one approach.
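
One way to run such a benchmark is with the standard timeit module. This is only a sketch: the synthetic `rows` data and the function names are assumptions, and real measurements should use data shaped like your own workload.

```python
import timeit
from collections import defaultdict

# Synthetic input: 10,000 values spread over 100 keys (an assumption).
rows = [(i % 100, i) for i in range(10_000)]


def with_setdefault():
    groups = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    return groups


def with_defaultdict():
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


# Both approaches must produce the same grouping before timing them.
assert with_setdefault() == dict(with_defaultdict())

for fn in (with_setdefault, with_defaultdict):
    elapsed = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {elapsed:.4f}s for 20 runs")
```

Timings vary by machine and Python version, so treat the numbers as relative rather than absolute.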

Type Hints and API Boundaries

When returning values from functions, use standard mapping type hints so callers are not tightly coupled to implementation details. You can build with defaultdict internally and return a regular dictionary.

python

from collections import defaultdict


def group_names(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    # Build with defaultdict internally; callers only ever see a plain dict.
    temp = defaultdict(list)
    for category, name in rows:
        temp[category].append(name)
    return dict(temp)

This keeps call sites simple and avoids surprises in serialization layers.
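
If callers should only read the result, the return annotation can be loosened further to an abstract mapping type. The sketch below assumes Python 3.9 or later and uses a hypothetical function `count_words`; `Mapping` comes from the standard `collections.abc` module.

```python
from collections import defaultdict
from collections.abc import Mapping


def count_words(words: list[str]) -> Mapping[str, int]:
    """Count occurrences; the defaultdict never leaves this function."""
    counts = defaultdict(int)
    for word in words:
        counts[word] += 1
    return dict(counts)


totals = count_words(["red", "blue", "red"])
print(totals)  # {'red': 2, 'blue': 1}
```

Annotating with `Mapping` signals read-only intent to type checkers without committing to a concrete dictionary class.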

Common Pitfalls

Accessing a missing key with d[key] mutates the defaultdict by creating that key. Use the in operator or d.get(key) when you only want to check, since neither calls the factory.
Passing a value such as [] instead of a callable is wrong and raises TypeError. Always pass a factory callable, such as list or set.
Serializing defaultdict directly can be confusing in some libraries: its repr includes the factory, and pickling fails when the factory is a lambda. Convert to plain dict at output boundaries.
Switching from dict to defaultdict in strict validation logic can hide missing-input bugs.
Overusing nested factories can reduce readability if the data model becomes too implicit.
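
The first two pitfalls are easy to reproduce. The sketch below uses made-up key names, and it also shows one possible way to stop further key creation: setting `default_factory` to None once construction is finished.

```python
from collections import defaultdict

seen = defaultdict(list)
_ = seen["typo"]                 # a plain read still inserts the key
print(len(seen))                 # 1

safe = defaultdict(list)
found = "typo" in safe           # membership test does not call the factory
fallback = safe.get("typo")      # .get() returns None without inserting
print(len(safe))                 # 0

try:
    defaultdict([])              # a list instance is not a callable
except TypeError:
    print("factory must be callable or None")

# Disabling the factory makes missing keys raise KeyError again.
seen.default_factory = None
try:
    seen["other"]
except KeyError:
    print("no more automatic keys")
```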

Summary

dict raises KeyError for missing keys, while defaultdict creates defaults automatically.
defaultdict reduces boilerplate for counting and grouping workloads.
Choose based on intent: strict validation or expected accumulation.
Convert to plain dict when exporting data externally.
Watch for accidental key creation when reading missing entries.
