Comparing two dictionaries and checking how many key, value pairs are equal

Introduction

Comparing two dictionaries and counting equal key-value pairs is a common operation in data reconciliation, cache validation, and configuration checks. The main task is matching keys and values while handling missing keys cleanly. A robust approach should be clear about case sensitivity, type comparisons, and performance.

Core Sections

Basic Python Approach

Use one dictionary as the iteration source and check membership and equality in the other.

python

def count_equal_pairs(a: dict, b: dict) -> int:
    return sum(1 for k, v in a.items() if k in b and b[k] == v)

x = {"a": 1, "b": 2, "c": 3}
y = {"a": 1, "b": 9, "d": 4}
print(count_equal_pairs(x, y))  # 1

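This makes a single pass over a, and each membership test and lookup in b is a constant-time hash operation on average, so the cost grows linearly with len(a). The count is the same whichever dictionary you iterate, so iterating the smaller one saves work.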

Compute Matched Keys and Differences

For diagnostics, return both count and mismatch details.

python

def compare_dicts(a: dict, b: dict):
    equal = []      # keys whose values match in both dictionaries
    different = []  # keys present in both, with unequal values
    missing = []    # keys present in a but absent from b

    for k, v in a.items():
        if k not in b:
            missing.append(k)
        elif b[k] == v:
            equal.append(k)
        else:
            different.append(k)

    # Note: keys that exist only in b are not reported here.
    return {
        "equal_count": len(equal),
        "equal_keys": equal,
        "different_keys": different,
        "missing_in_b": missing,
    }

print(compare_dicts({"x": 10, "y": 20}, {"x": 10, "y": 99}))

Detailed outputs are useful in reconciliation reports.
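
The function above reports only keys that are missing from b. Where a reconciliation needs both directions, one possible variant uses set operations on the key views. The name `compare_dicts_both_ways` and the extra `missing_in_a` field are illustrative assumptions, and the sorting assumes keys are mutually comparable (for example, all strings).

```python
def compare_dicts_both_ways(a: dict, b: dict) -> dict:
    """Classify keys in both directions (illustrative sketch, not a fixed API)."""
    equal, different = [], []
    for k in a.keys() & b.keys():  # keys present in both dictionaries
        (equal if a[k] == b[k] else different).append(k)

    return {
        "equal_count": len(equal),
        # sorted() assumes keys can be compared with each other, e.g. all strings
        "equal_keys": sorted(equal),
        "different_keys": sorted(different),
        "missing_in_b": sorted(a.keys() - b.keys()),  # only in a
        "missing_in_a": sorted(b.keys() - a.keys()),  # only in b
    }

report = compare_dicts_both_ways(
    {"x": 10, "y": 20, "z": 5},
    {"x": 10, "y": 99, "w": 1},
)
print(report)
```

Sorting each list also gives the report a stable order, since set iteration order is not guaranteed.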

Symmetry Considerations

If dictionaries can have different key sets, decide whether to count only overlapping keys or to require exact full-map equality. Two dictionaries can agree on every shared key and still be unequal as a whole, so the choice changes how an audit result should be read.

python

def overlap_equal_count(a: dict, b: dict) -> int:
    shared = a.keys() & b.keys()
    return sum(1 for k in shared if a[k] == b[k])

Explicit naming helps avoid ambiguity.
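
To make the two interpretations concrete, the sketch below contrasts agreement on shared keys with full equality. `dicts_agree_on_shared_keys` is a hypothetical helper name chosen for this example.

```python
def dicts_agree_on_shared_keys(a: dict, b: dict) -> bool:
    """True when every key present in both dictionaries maps to equal values."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

a = {"a": 1, "b": 2}
b = {"a": 1, "b": 2, "c": 3}

print(dicts_agree_on_shared_keys(a, b))  # True: "a" and "b" match
print(a == b)                            # False: b has an extra key "c"
```

Note that the helper returns True when there are no shared keys at all, which may or may not be the behavior an audit wants.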

Nested Values and Custom Comparisons

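Python's == already compares nested dictionaries and lists element by element, but plain equality can be too strict when values differ only in formatting, such as letter case or surrounding whitespace. In those cases you may need normalized comparisons.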

python

def normalize(s: str) -> str:
    return s.strip().lower()

print(normalize("  YES ") == normalize("yes"))

Normalization is often necessary for user-entered data comparisons.
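
One way to plug normalization into the pair count is to pass the normalizer in as a parameter. This is a minimal sketch: `normalize_value` and `count_equal_pairs_normalized` are hypothetical names, and leaving non-string values untouched is an assumption about the data.

```python
from typing import Any, Callable

def normalize_value(value: Any) -> Any:
    """Strip and lowercase strings; leave every other type unchanged."""
    if isinstance(value, str):
        return value.strip().lower()
    return value

def count_equal_pairs_normalized(
    a: dict, b: dict, norm: Callable[[Any], Any] = normalize_value
) -> int:
    """Count keys present in both dicts whose normalized values are equal."""
    return sum(1 for k, v in a.items() if k in b and norm(b[k]) == norm(v))

form = {"subscribed": "  YES ", "plan": "Pro", "seats": 3}
record = {"subscribed": "yes", "plan": "basic", "seats": 3}
print(count_equal_pairs_normalized(form, record))  # 2
```

Here "subscribed" and "seats" match after normalization, while "plan" still differs.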

Performance for Large Dictionaries

Dictionary lookup is O(1) on average, so counting over one dictionary is O(n) in the number of keys iterated. For very large datasets, avoid repeating transformations inside the loop: normalize each dictionary once, then compare the normalized copies.
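
As a rough illustration of normalizing once, the sketch below builds a normalized copy of each dictionary up front and reuses the baseline copy across several comparisons. `normalize_values` is a hypothetical helper, and the string-only normalization is an assumption.

```python
def normalize_values(d: dict) -> dict:
    """Return a copy of d with string values stripped and lowercased."""
    return {
        k: (v.strip().lower() if isinstance(v, str) else v)
        for k, v in d.items()
    }

def count_equal_pairs(a: dict, b: dict) -> int:
    return sum(1 for k, v in a.items() if k in b and b[k] == v)

# Normalize the baseline once, not once per comparison.
baseline = normalize_values({"env": " Prod ", "region": "EU"})
candidates = [
    {"env": "prod", "region": "eu"},
    {"env": "PROD", "region": "us"},
]
counts = [count_equal_pairs(baseline, normalize_values(c)) for c in candidates]
print(counts)  # [2, 1]
```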

Testing and Data Contracts

Write tests for missing keys, equal values, mismatched types, and empty dictionaries. Data contracts should define whether 1 and 1.0 are treated as equal in your domain.
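
The edge cases above can be written as plain assertions. The sketch below repeats the basic counting function so it runs on its own; `count_equal_pairs_strict` is a hypothetical variant showing one way to reject cross-type matches if the data contract requires it.

```python
def count_equal_pairs(a: dict, b: dict) -> int:
    return sum(1 for k, v in a.items() if k in b and b[k] == v)

def count_equal_pairs_strict(a: dict, b: dict) -> int:
    """Like count_equal_pairs, but values must also have the same type."""
    return sum(
        1 for k, v in a.items()
        if k in b and type(b[k]) is type(v) and b[k] == v
    )

assert count_equal_pairs({}, {}) == 0                 # empty dictionaries
assert count_equal_pairs({"a": 1}, {}) == 0           # missing key
assert count_equal_pairs({"a": 1}, {"a": 1}) == 1     # equal values
assert count_equal_pairs({"a": 1}, {"a": "1"}) == 0   # int vs str never match
assert count_equal_pairs({"a": 1}, {"a": 1.0}) == 1   # Python treats 1 == 1.0
assert count_equal_pairs({"a": True}, {"a": 1}) == 1  # bool is a subclass of int

assert count_equal_pairs_strict({"a": 1}, {"a": 1.0}) == 0
assert count_equal_pairs_strict({"a": True}, {"a": 1}) == 0
print("all checks passed")
```

Whether the default or the strict behavior is correct is a domain decision; the tests simply pin down whichever one is chosen.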

Large-scale Comparison Patterns

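For large dictionaries in data pipelines, computing the shared key set once with a.keys() & b.keys() separates key matching from value comparison. The intersection still tests each key for membership and allocates a new set, so it changes the shape of the code more than the amount of work.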

python

def count_equal_pairs_fast(a: dict, b: dict) -> int:
    shared = a.keys() & b.keys()
    return sum(1 for k in shared if a[k] == b[k])

This keeps the logic concise, but it is not automatically faster than the single-pass version. Profile on representative data before choosing it for performance-sensitive code.
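
When every value is hashable, another option is to intersect the items views directly, since they behave like sets of (key, value) pairs. This is a sketch under that assumption; `count_equal_pairs_items` is a hypothetical name, and the approach can raise TypeError if values are unhashable (lists or nested dictionaries, for example).

```python
def count_equal_pairs_items(a: dict, b: dict) -> int:
    """Count equal (key, value) pairs; assumes all values are hashable."""
    return len(a.items() & b.items())

x = {"a": 1, "b": 2, "c": 3}
y = {"a": 1, "b": 9, "d": 4}
print(count_equal_pairs_items(x, y))  # 1
```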

Reporting Precision in Reconciliation Jobs

Counts alone can hide important mismatches. Produce ratio metrics and sample mismatch records for faster debugging.

python

def match_ratio(a, b):
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    equal = sum(1 for k in shared if a[k] == b[k])
    return equal / len(shared)

print(match_ratio({"a": 1, "b": 2}, {"a": 1, "b": 3}))

Publishing both count and ratio makes reconciliation reports easier to interpret across datasets with different sizes.

When reconciliation is business-critical, persist mismatch snapshots with timestamps so teams can trace when and where divergence started.

For reproducible audits, include deterministic key ordering when exporting mismatch reports. Stable output makes code reviews and incident investigations much easier across repeated runs.
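
A timestamped snapshot with deterministic ordering could look like the sketch below. The function name `mismatch_snapshot`, the field names, and the choice of JSON are all assumptions made for illustration; it also assumes string keys and JSON-serializable values.

```python
import json
from datetime import datetime, timezone
from typing import Optional

def mismatch_snapshot(a: dict, b: dict, captured_at: Optional[str] = None) -> str:
    """Serialize mismatches on shared keys as JSON with sorted keys."""
    mismatches = {
        k: {"a": a[k], "b": b[k]}
        for k in a.keys() & b.keys()
        if a[k] != b[k]
    }
    snapshot = {
        # Default to the current UTC time; pass a fixed value for repeatable output.
        "captured_at": captured_at or datetime.now(timezone.utc).isoformat(),
        "mismatch_count": len(mismatches),
        "mismatches": mismatches,
    }
    # sort_keys=True orders keys at every nesting level, so output is stable.
    return json.dumps(snapshot, sort_keys=True, indent=2)

print(mismatch_snapshot({"b": 2, "a": 1, "c": 3}, {"c": 4, "b": 9, "a": 1}))
```

Apart from the timestamp, repeated runs on the same inputs produce identical text, which keeps diffs between reports small.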

Common Pitfalls

Counting only key overlap without documenting that behavior.
Treating type-coerced values as equal without domain approval.
Returning only a count and losing useful mismatch diagnostics.
Repeating expensive normalization inside the compare loop.
Ignoring missing-key direction when reconciliation requires both sides.

Summary

Count equal pairs by checking key presence and value equality.
Decide and document overlap versus full-map comparison semantics.
Provide mismatch details when debugging or auditing data.
Normalize values carefully for user-entered or noisy data.
Add tests for edge cases to keep comparison behavior stable.