Introduction
When comparing Python dictionaries based on a subset of their keys, you extract the relevant keys from each dictionary and compare only those values. This is useful for matching records that share an identity (like name + date) but may differ in other fields. Approaches include extracting key subsets with comprehensions, using operator.itemgetter, converting to tuples for hashing, and using pandas for large-scale record matching.
Comparing Two Dictionaries on Specific Keys
```python
def dicts_equal_on_keys(d1, d2, keys):
    """Check if two dicts have the same values for the given keys."""
    return all(d1.get(k) == d2.get(k) for k in keys)

record_a = {"name": "Alice", "age": 30, "city": "NYC", "score": 95}
record_b = {"name": "Alice", "age": 30, "city": "LA", "score": 88}

# Compare on name and age only
print(dicts_equal_on_keys(record_a, record_b, ["name", "age"]))  # True

# Compare on all fields
print(record_a == record_b)  # False (city and score differ)
```
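Because .get() returns None for an absent key, dicts_equal_on_keys treats a missing key and a key explicitly set to None as equal. If that distinction matters, a stricter variant can require every key to be present on both sides. A minimal sketch; dicts_equal_on_keys_strict is a hypothetical name, not a standard function:

```python
def dicts_equal_on_keys(d1, d2, keys):
    """Lenient: .get() returns None for missing keys."""
    return all(d1.get(k) == d2.get(k) for k in keys)

def dicts_equal_on_keys_strict(d1, d2, keys):
    """Strict (hypothetical helper): every key must exist in both dicts with equal values."""
    return all(k in d1 and k in d2 and d1[k] == d2[k] for k in keys)

a = {"name": "Alice", "email": None}
b = {"name": "Alice"}  # no "email" key at all

print(dicts_equal_on_keys(a, b, ["name", "email"]))         # True
print(dicts_equal_on_keys_strict(a, b, ["name", "email"]))  # False
```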
```python
def subset(d, keys):
    """Extract a subset of keys from a dictionary (missing keys are skipped)."""
    return {k: d[k] for k in keys if k in d}

record = {"name": "Alice", "age": 30, "city": "NYC", "email": "alice@example.com"}

identity = subset(record, ["name", "age"])
print(identity)  # {'name': 'Alice', 'age': 30}

# Compare subsets (record_a and record_b come from the first example)
print(subset(record_a, ["name", "age"]) == subset(record_b, ["name", "age"]))  # True
```
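If the identity fields are already held in a small dictionary (for example, the output of subset()), the set-like view returned by dict.items() can test containment directly: a.items() <= b.items() is True when every (key, value) pair of a also appears in b. A short sketch reusing the sample records:

```python
record_a = {"name": "Alice", "age": 30, "city": "NYC", "score": 95}
record_b = {"name": "Alice", "age": 30, "city": "LA", "score": 88}

identity = {"name": "Alice", "age": 30}

# Items views are set-like: <= checks that every (key, value) pair on the
# left also appears in the dict on the right
print(identity.items() <= record_a.items())         # True
print(identity.items() <= record_b.items())         # True
print({"city": "NYC"}.items() <= record_b.items())  # False
```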
Finding Matching Records in Two Lists
```python
from operator import itemgetter

employees = [
    {"name": "Alice", "dept": "Engineering", "id": 1},
    {"name": "Bob", "dept": "Sales", "id": 2},
    {"name": "Charlie", "dept": "Engineering", "id": 3},
]

contractors = [
    {"name": "Alice", "dept": "Engineering", "rate": 150},
    {"name": "Diana", "dept": "Sales", "rate": 120},
]

# Find employees that match contractors on (name, dept)
match_keys = ["name", "dept"]

contractor_set = {
    tuple(itemgetter(*match_keys)(c)) for c in contractors
}
# {('Alice', 'Engineering'), ('Diana', 'Sales')}

matches = [
    e for e in employees
    if tuple(itemgetter(*match_keys)(e)) in contractor_set
]
print(matches)
# [{'name': 'Alice', 'dept': 'Engineering', 'id': 1}]
```
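One caveat with itemgetter: given a single key it returns the bare value rather than a 1-tuple, so code that wraps the result in tuple() splits a string like "Alice" into characters when match_keys has only one entry. A sketch of a key builder that always returns a tuple; make_key_func is a hypothetical helper name:

```python
from operator import itemgetter

def make_key_func(keys):
    """Return a function mapping a dict to a tuple of its values for keys (keys must be non-empty)."""
    getter = itemgetter(*keys)
    if len(keys) == 1:
        # itemgetter with one key returns the bare value, so wrap it in a 1-tuple
        return lambda d: (getter(d),)
    return getter  # with 2+ keys, itemgetter already returns a tuple

rec = {"name": "Alice", "dept": "Engineering"}

print(tuple(itemgetter(*["name"])(rec)))     # ('A', 'l', 'i', 'c', 'e')
print(make_key_func(["name"])(rec))          # ('Alice',)
print(make_key_func(["name", "dept"])(rec))  # ('Alice', 'Engineering')
```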
Finding Differences Between Dictionaries
```python
def diff_on_keys(d1, d2, keys):
    """Find keys where two dicts differ."""
    return {k: (d1.get(k), d2.get(k)) for k in keys if d1.get(k) != d2.get(k)}

old = {"name": "Alice", "age": 30, "city": "NYC", "salary": 100000}
new = {"name": "Alice", "age": 31, "city": "NYC", "salary": 110000}

changes = diff_on_keys(old, new, ["age", "city", "salary"])
print(changes)
# {'age': (30, 31), 'salary': (100000, 110000)}
```
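diff_on_keys only inspects the keys you pass in. To see every difference, including keys present in only one dictionary, the set operations on dict.keys() views can split the result into added, removed, and changed entries. A sketch; full_diff is a hypothetical name, and the sorting assumes the keys are mutually comparable (for example, all strings):

```python
def full_diff(old, new):
    """Split the differences between two dicts into added, removed, and changed keys."""
    old_keys, new_keys = old.keys(), new.keys()
    # dict.keys() views support set operations; sorted() makes the output order stable
    added = {k: new[k] for k in sorted(new_keys - old_keys)}
    removed = {k: old[k] for k in sorted(old_keys - new_keys)}
    changed = {
        k: (old[k], new[k])
        for k in sorted(old_keys & new_keys)
        if old[k] != new[k]
    }
    return added, removed, changed

old = {"name": "Alice", "age": 30, "city": "NYC", "salary": 100000}
new = {"name": "Alice", "age": 31, "salary": 110000, "title": "Lead"}

added, removed, changed = full_diff(old, new)
print(added)    # {'title': 'Lead'}
print(removed)  # {'city': 'NYC'}
print(changed)  # {'age': (30, 31), 'salary': (100000, 110000)}
```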
Grouping by Composite Keys
```python
from collections import defaultdict

records = [
    {"date": "2024-01-01", "product": "A", "region": "East", "sales": 100},
    {"date": "2024-01-01", "product": "A", "region": "West", "sales": 150},
    {"date": "2024-01-01", "product": "B", "region": "East", "sales": 200},
    {"date": "2024-01-02", "product": "A", "region": "East", "sales": 120},
]

# Group by (date, product)
grouped = defaultdict(list)
for r in records:
    key = (r["date"], r["product"])
    grouped[key].append(r)

for key, group in grouped.items():
    total = sum(r["sales"] for r in group)
    print(f"{key}: total sales = {total}")

# ('2024-01-01', 'A'): total sales = 250
# ('2024-01-01', 'B'): total sales = 200
# ('2024-01-02', 'A'): total sales = 120
```
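The same grouping can also be expressed with itertools.groupby, an alternative to the defaultdict approach rather than what the example above uses. groupby only merges adjacent items with equal keys, so the records must first be sorted by the same composite key. A sketch:

```python
from itertools import groupby
from operator import itemgetter

records = [
    {"date": "2024-01-01", "product": "A", "region": "East", "sales": 100},
    {"date": "2024-01-01", "product": "A", "region": "West", "sales": 150},
    {"date": "2024-01-01", "product": "B", "region": "East", "sales": 200},
    {"date": "2024-01-02", "product": "A", "region": "East", "sales": 120},
]

# With two keys, itemgetter returns a (date, product) tuple
key_func = itemgetter("date", "product")

totals = {}
# groupby only merges adjacent items, so sort by the same key first
for key, group in groupby(sorted(records, key=key_func), key=key_func):
    totals[key] = sum(r["sales"] for r in group)

for key, total in totals.items():
    print(f"{key}: total sales = {total}")
# ('2024-01-01', 'A'): total sales = 250
# ('2024-01-01', 'B'): total sales = 200
# ('2024-01-02', 'A'): total sales = 120
```

The defaultdict version works on unsorted input in a single pass; groupby is a natural fit when the data is already sorted by the key.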
Deduplicating by Composite Key
```python
def deduplicate(records, keys):
    """Keep the first record for each unique combination of keys."""
    seen = set()
    result = []
    for record in records:
        key = tuple(record.get(k) for k in keys)
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result

data = [
    {"name": "Alice", "date": "2024-01-01", "amount": 100},
    {"name": "Alice", "date": "2024-01-01", "amount": 150},  # Duplicate
    {"name": "Bob", "date": "2024-01-01", "amount": 200},
]

unique = deduplicate(data, ["name", "date"])
print(unique)
# [{'name': 'Alice', 'date': '2024-01-01', 'amount': 100},
#  {'name': 'Bob', 'date': '2024-01-01', 'amount': 200}]
```
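deduplicate keeps the first record per key. To keep the most recent one instead, a dictionary keyed by the composite tuple can simply be overwritten as records arrive; because dicts preserve insertion order, each surviving record stays at the position where its key first appeared. A sketch with the hypothetical name deduplicate_keep_last:

```python
def deduplicate_keep_last(records, keys):
    """Keep the last record for each unique combination of keys."""
    latest = {}
    for record in records:
        # A later record with the same key overwrites the earlier one,
        # but the key keeps the position where it was first inserted
        latest[tuple(record.get(k) for k in keys)] = record
    return list(latest.values())

data = [
    {"name": "Alice", "date": "2024-01-01", "amount": 100},
    {"name": "Alice", "date": "2024-01-01", "amount": 150},  # Duplicate
    {"name": "Bob", "date": "2024-01-01", "amount": 200},
]

print(deduplicate_keep_last(data, ["name", "date"]))
# [{'name': 'Alice', 'date': '2024-01-01', 'amount': 150},
#  {'name': 'Bob', 'date': '2024-01-01', 'amount': 200}]
```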
Using Pandas for Large-Scale Comparison
```python
import pandas as pd

df1 = pd.DataFrame([
    {"name": "Alice", "dept": "Eng", "salary": 100000},
    {"name": "Bob", "dept": "Sales", "salary": 80000},
])

df2 = pd.DataFrame([
    {"name": "Alice", "dept": "Eng", "bonus": 10000},
    {"name": "Charlie", "dept": "Eng", "bonus": 8000},
])

# Merge on composite key (name, dept)
merged = pd.merge(df1, df2, on=["name", "dept"], how="inner")
print(merged)
#     name dept  salary  bonus
# 0  Alice  Eng  100000  10000

# Find records in df1 not in df2
left_only = pd.merge(df1, df2, on=["name", "dept"], how="left", indicator=True)
missing = left_only[left_only["_merge"] == "left_only"]
print(missing[["name", "dept"]])
#   name   dept
# 1  Bob  Sales
```
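For small lists, or where pandas is not available, the left_only filter above can be reproduced in plain Python with a set of key tuples, as in the earlier list-matching example. A sketch; left_only_records is a hypothetical name, and because it uses .get(), a record missing a key matches another record whose value for that key is None:

```python
def left_only_records(left, right, keys):
    """Return records in left whose composite key does not appear in right."""
    # Build the lookup set once, then each membership test is O(1) on average
    right_keys = {tuple(r.get(k) for k in keys) for r in right}
    return [r for r in left if tuple(r.get(k) for k in keys) not in right_keys]

rows1 = [
    {"name": "Alice", "dept": "Eng", "salary": 100000},
    {"name": "Bob", "dept": "Sales", "salary": 80000},
]
rows2 = [
    {"name": "Alice", "dept": "Eng", "bonus": 10000},
    {"name": "Charlie", "dept": "Eng", "bonus": 8000},
]

print(left_only_records(rows1, rows2, ["name", "dept"]))
# [{'name': 'Bob', 'dept': 'Sales', 'salary': 80000}]
```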
Common Pitfalls
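Using unhashable values as composite keys: Tuple keys like (d["name"], d["list_field"]) raise TypeError: unhashable type as soon as they are hashed (added to a set or used as a dict key) if any value is a list, dict, or set. Convert such values to hashable equivalents before building the key: lists to tuples, sets to frozensets. A quick fallback is tuple(str(d[k]) for k in keys), but str() is sensitive to dict insertion order (two equal dicts can produce different strings) and makes 1 and "1" collide.
Forgetting about missing keys: If a dictionary lacks one of the comparison keys, d[k] raises KeyError. d.get(k) avoids the error by returning None, but it also makes a missing key indistinguishable from a key explicitly set to None. Check k in d when that distinction matters, or validate that all required keys exist before comparing.
Case-sensitive string comparison: {"name": "Alice"} and {"name": "alice"} are not equal. For case-insensitive matching, normalize string values first, e.g. d.get(k, "").lower() (or .casefold() for broader Unicode handling). This only works when the stored value is a string; it fails on None or numbers.
Comparing floating-point values: Direct equality (==) on floats can fail because of rounding error (0.1 + 0.2 == 0.3 is False). Use math.isclose(d1[k], d2[k]) for pairwise comparisons. For hashed keys (sets, dict lookups), round to a fixed number of decimal places instead, since an isclose-style tolerance cannot be expressed as a hash.
O(n*m) performance with nested loops: Comparing every record in list A against every record in list B takes O(n*m) comparisons. Converting one list to a set of key tuples first brings this down to O(n+m) on average; for large datasets, pandas merge is a good option.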
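Several of these pitfalls can be handled together by normalizing values before building a composite key. The sketch below is one possible policy, not a standard recipe: it assumes case-insensitive strings, floats rounded to 6 decimal places, lists and tuples converted to tuples, sets to frozensets, and dicts to sorted tuples of items (which requires comparable keys). Missing keys map to a sentinel instead of None. normalize and make_key are hypothetical names:

```python
import math

# Sentinel that stands for "key absent", so it never collides with a stored None
_MISSING = object()

def normalize(value, ndigits=6):
    """Convert a value to a hashable, comparison-friendly form (one possible policy)."""
    if isinstance(value, str):
        return value.casefold()  # case-insensitive; casefold() is a stronger lower()
    if isinstance(value, float):
        return round(value, ndigits)  # absorb small floating-point noise
    if isinstance(value, (list, tuple)):
        return tuple(normalize(v, ndigits) for v in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(normalize(v, ndigits) for v in value)
    if isinstance(value, dict):
        # Sorting by key makes insertion order irrelevant; keys must be comparable
        return tuple(sorted((k, normalize(v, ndigits)) for k, v in value.items()))
    return value

def make_key(d, keys, ndigits=6):
    """Build a hashable composite key from d for the given keys."""
    return tuple(normalize(d[k], ndigits) if k in d else _MISSING for k in keys)

a = {"name": "Alice", "tags": ["x", "y"], "score": 0.1 + 0.2}
b = {"name": "alice", "tags": ["x", "y"], "score": 0.3}
keys = ["name", "tags", "score"]

print(make_key(a, keys) == make_key(b, keys))       # True
print(len({make_key(a, keys), make_key(b, keys)}))  # 1: the keys are hashable
print(math.isclose(a["score"], b["score"]))         # True
```

Rounding can still separate two values that sit on either side of a rounding boundary, so math.isclose remains the safer choice for direct pairwise comparison; rounding is used here only because the key must be hashable.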
Summary
Compare dictionaries on specific keys with all(d1.get(k) == d2.get(k) for k in keys)
Extract key subsets with {k: d[k] for k in keys if k in d} for partial dictionary comparison
Convert composite keys to tuples for average O(1) set lookups and efficient record matching
Use collections.defaultdict for grouping records by composite keys
Use pandas.merge(on=[keys]) for efficient large-scale record matching and comparison
Handle edge cases: missing keys (use .get(), or k in d when None is a meaningful value), case sensitivity, floating-point precision, and unhashable values