Creating a dictionary from a CSV file?

Introduction

Turning CSV rows into a dictionary is a common data-processing step when you need fast lookups by ID, SKU, email, or another key column. The main design question is not only how to read the CSV, but what the dictionary values should be and how duplicate keys should be handled.

Python's built-in csv module is enough for many cases. If the data pipeline already uses pandas, that can also be a convenient path. The important part is to choose a key strategy deliberately instead of letting duplicates overwrite data by accident.

Use csv.DictReader for Named Columns

DictReader turns each row into a dictionary keyed by the CSV headers:

python
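import csv

mapping = {}

# newline="" lets the csv module handle line endings itself.
with open("users.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # One column becomes the key, another becomes the value.
        mapping[row["id"]] = row["email"]

print(mapping)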

This is the simplest one-to-one mapping pattern: one column becomes the key and another becomes the value.
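When the mapping really is one column to another, the same loop can be written as a dictionary comprehension. The sketch below is only an illustration: it assumes the same id and email columns, and it reads hypothetical sample data from an in-memory io.StringIO object instead of users.csv so that it runs without a file on disk.

```python
import csv
import io

# Hypothetical sample data standing in for users.csv.
sample = io.StringIO("id,email\n1,ann@example.com\n2,bob@example.com\n")

# Build the id -> email mapping in a single expression.
mapping = {row["id"]: row["email"] for row in csv.DictReader(sample)}

print(mapping)
```

The keys are still strings ("1", not 1), exactly as in the loop version.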

Keep Entire Rows as Values When Needed

Sometimes you want lookup by key, but you still need the whole row:

python
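import csv

users_by_id = {}

with open("users.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # Store the whole row (a dict of column name -> value) under its id.
        users_by_id[row["id"]] = row

# Keys are strings, so look up "42" rather than 42.
print(users_by_id.get("42"))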

This is useful when the dictionary acts like an in-memory index rather than a simple value map.

Decide What to Do About Duplicate Keys

Duplicate keys are where many quick scripts become data-loss bugs. If the same key appears twice, you need an explicit rule.

Overwrite with the last row:

python
mapping[row["id"]] = row["email"]

Collect all values for the same key:

python
from collections import defaultdict
import csv

# Each key maps to a list, so repeated ids accumulate instead of overwriting.
emails_by_id = defaultdict(list)

with open("users.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        emails_by_id[row["id"]].append(row["email"])

print(dict(emails_by_id))

Neither policy is universally correct. What matters is choosing one intentionally.
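Where a duplicate signals bad input, a third possible rule is to reject it outright. The following is a sketch under that assumption, not a prescribed approach: the function name load_unique and the sample data are hypothetical, and the id and email columns match the earlier examples.

```python
import csv
import io


def load_unique(stream, key_column, value_column):
    """Build a key -> value mapping, raising if a key appears twice."""
    mapping = {}
    reader = csv.DictReader(stream)
    for row in reader:
        key = row[key_column]
        if key in mapping:
            # reader.line_num counts the source lines read so far.
            raise ValueError(f"duplicate key {key!r} near line {reader.line_num}")
        mapping[key] = row[value_column]
    return mapping


# Hypothetical data in which id 1 appears twice.
duplicated = "id,email\n1,a@example.com\n1,b@example.com\n"

try:
    load_unique(io.StringIO(duplicated), "id", "email")
    outcome = "loaded"
except ValueError as exc:
    outcome = f"rejected: {exc}"

print(outcome)
```

Failing loudly trades convenience for safety: the script stops, but no row is lost without anyone noticing.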

Convert Types While Loading

The csv module returns every field as a string by default, so numeric and boolean conversion should happen explicitly:

python
import csv

prices = {}

with open("products.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # float() raises ValueError on a malformed price, so bad data surfaces here.
        prices[row["sku"]] = float(row["price"])

print(prices)

Doing this at load time makes data problems visible earlier and keeps downstream code cleaner.
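To make a conversion failure easier to trace, the error can carry the location of the offending row. This is a minimal sketch assuming the same sku and price columns; the function name load_prices and the sample data are hypothetical.

```python
import csv
import io


def load_prices(stream):
    """Map sku -> float price, reporting where a bad value occurs."""
    prices = {}
    reader = csv.DictReader(stream)
    for row in reader:
        try:
            prices[row["sku"]] = float(row["price"])
        except ValueError as exc:
            # Re-raise with the source line and the raw value for context.
            raise ValueError(
                f"line {reader.line_num}: bad price {row['price']!r} "
                f"for sku {row['sku']!r}"
            ) from exc
    return prices


# Hypothetical sample data standing in for products.csv.
good = io.StringIO("sku,price\nA1,9.99\nB2,12.50\n")
print(load_prices(good))
```

This sketch only handles values that float() rejects with ValueError. A row that is missing the price field entirely yields None, which raises TypeError and would need its own handling.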

Use Composite Keys When One Column Is Not Enough

Some datasets need more than one column to identify a record uniquely:

python
import csv

records = {}

with open("orders.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # A tuple of both columns identifies the record.
        key = (row["customer_id"], row["order_id"])
        records[key] = row

# Both parts of the key are strings, as read from the file.
print(records.get(("10", "5001")))

Tuple keys are often a clean way to express compound identity without adding another processing pass.
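If positional tuples become hard to read, one option is a named tuple as the key type. The sketch below is an illustration under assumptions: OrderKey, the total column, and the sample data are hypothetical. Because a named tuple is a tuple subclass, it hashes and compares like a plain tuple with the same contents.

```python
import csv
import io
from collections import namedtuple

# Hypothetical key type naming the two parts of the compound identity.
OrderKey = namedtuple("OrderKey", ["customer_id", "order_id"])

# Hypothetical sample data standing in for orders.csv.
sample = io.StringIO("customer_id,order_id,total\n10,5001,19.90\n10,5002,5.00\n")

records = {}
for row in csv.DictReader(sample):
    records[OrderKey(row["customer_id"], row["order_id"])] = row

# Lookup works with the named key or with an equivalent plain tuple.
print(records[OrderKey("10", "5001")]["total"])
print(records[("10", "5002")]["total"])
```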

Validate the CSV Schema Early

If the expected columns are missing, fail fast:

python
import csv

required = {"id", "email"}

with open("users.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    # fieldnames is None for an empty file, hence the "or []".
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV missing required columns: {sorted(missing)}")

This is much better than producing a partially correct dictionary and discovering the schema problem later.

Common Pitfalls

The biggest mistake is assuming the key column is unique when the CSV actually contains duplicates. That can silently overwrite data.

Another common issue is forgetting that CSV values are strings. If you need numbers, dates, or booleans, convert them explicitly.

People also use the csv module with the wrong delimiter or encoding, then debug the dictionary logic when the parsing layer was the actual problem.
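Both settings belong to the parsing layer and can be set there explicitly. A minimal sketch, assuming a semicolon-delimited export that starts with a UTF-8 byte order mark (the sample bytes are hypothetical): decoding with utf-8-sig strips the mark, and delimiter=";" tells the csv module how to split fields.

```python
import csv
import io

# Hypothetical export: UTF-8 with a byte order mark, semicolon-delimited.
raw = "\ufeffid;email\n1;ann@example.com\n".encode("utf-8")

# Decode as "utf-8-sig" so the BOM does not end up inside the first header name.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig", newline="")

reader = csv.DictReader(text, delimiter=";")
mapping = {row["id"]: row["email"] for row in reader}

print(reader.fieldnames)
print(mapping)
```

The same arguments apply to a file on disk: pass encoding="utf-8-sig" to open() and delimiter=";" to csv.DictReader. With plain utf-8 decoding, the first header here would be read as "\ufeffid", and row["id"] would raise KeyError.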

Finally, loading a huge CSV entirely into a dictionary may be the wrong architecture if you only need streaming processing or partial lookups.
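When only a handful of keys are needed, one alternative is to scan the file once and keep just the matching rows. This is a sketch, assuming an id column; the function name find_rows and the sample data are hypothetical.

```python
import csv
import io


def find_rows(stream, key_column, wanted):
    """Yield only the rows whose key is in `wanted`, one at a time."""
    wanted = set(wanted)
    for row in csv.DictReader(stream):
        if row[key_column] in wanted:
            yield row


# Hypothetical sample data standing in for a large users.csv.
sample = io.StringIO(
    "id,email\n1,a@example.com\n2,b@example.com\n3,c@example.com\n"
)

# Only matching rows are kept; the others are discarded as they stream by.
subset = {row["id"]: row["email"] for row in find_rows(sample, "id", {"2", "3"})}

print(subset)
```

Memory use then scales with the number of matches rather than the size of the file, at the cost of rescanning the file for each new batch of keys.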

Summary

  • csv.DictReader is the simplest built-in path for CSV-to-dictionary conversion.
  • Decide early whether the dictionary should store one value or entire rows.
  • Choose and document a duplicate-key policy.
  • Convert field types while loading instead of leaving everything as strings.
  • Validate required headers so bad input fails clearly.
