
How can I import bulk data from a CSV file into DynamoDB?


Introduction

There are two practical ways to bulk-import CSV data into DynamoDB. For controlled programmatic imports, use a script that reads the CSV, converts each row into DynamoDB item types, and writes with boto3. For larger one-time migrations, AWS also provides an import-from-S3 workflow that can create a new DynamoDB table directly from a CSV file stored in S3.

First Model the Table Correctly

A CSV file is only rows and columns. DynamoDB cares about:

  • the partition key
  • an optional sort key
  • attribute types
  • duplicate-key behavior

Before importing anything, decide which CSV columns map to the table keys and how each value should be typed. That design step matters more than the mechanics of the import command.

If you skip it, you can end up with rejected rows, bad types, or accidental overwrites.
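One way to make that mapping explicit is to write it down as data before writing any import code. The following is a minimal sketch, assuming the Orders columns used later in this article and a table with order_id as its only key; COLUMN_TYPES, KEY_COLUMNS, and convert_row are hypothetical names for illustration, not part of boto3.

```python
from decimal import Decimal

# Hypothetical mapping: CSV column -> converter producing a DynamoDB-friendly value.
# "order_id" is assumed to be the partition key; this sketch has no sort key.
COLUMN_TYPES = {
    "order_id": str,
    "customer_id": str,
    "status": str,
    "total": Decimal,    # becomes a DynamoDB Number
    "created_at": str,   # timestamp kept as a String
}
KEY_COLUMNS = ("order_id",)


def convert_row(row: dict) -> dict:
    """Apply the declared converter to each column; fail loudly on empty keys."""
    for key in KEY_COLUMNS:
        if not row.get(key):
            raise ValueError(f"Missing key column: {key}")
    return {name: convert(row[name]) for name, convert in COLUMN_TYPES.items()}


item = convert_row({
    "order_id": "A1001",
    "customer_id": "C42",
    "status": "shipped",
    "total": "19.95",
    "created_at": "2024-01-15T10:30:00Z",
})
print(item["total"])  # the Decimal parsed from the CSV string
```

Keeping the mapping in one place means a change to the table design shows up as a one-line change rather than being scattered through the import loop.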

Option 1: Import With boto3

A Python script is the most flexible approach when you need to transform values or validate rows before writing them.

python
import csv
from decimal import Decimal

import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("Orders")


def row_to_item(row: dict) -> dict:
    # csv.DictReader yields every value as a string; convert types explicitly.
    return {
        "order_id": row["order_id"],
        "customer_id": row["customer_id"],
        "status": row["status"],
        "total": Decimal(row["total"]),  # stored as a DynamoDB Number
        "created_at": row["created_at"],
    }


with open("orders.csv", newline="", encoding="utf-8") as csv_file:
    reader = csv.DictReader(csv_file)

    # batch_writer() buffers items and flushes any remainder when the block exits.
    with table.batch_writer() as batch:
        for row in reader:
            batch.put_item(Item=row_to_item(row))

batch_writer() is the usual starting point because it buffers items, sends them in BatchWriteItem requests of up to 25 items, and automatically resends any unprocessed items. That is both faster and more robust than issuing single-item writes in a tight loop.

Why Decimal Matters

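DynamoDB numeric attributes should not be loaded from Python float values. The boto3 resource layer rejects float values with a TypeError, and converting an existing float to Decimal carries its binary rounding error along. The safer practice is to build Decimal values from the standard library directly from the CSV strings.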

python
from decimal import Decimal

amount = Decimal("19.95")

This avoids subtle issues where floating-point representation does not match the exact numeric value you intended to store.
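The difference is easy to see without touching DynamoDB. A small illustration using only the standard library (the variable names are just for this example):

```python
from decimal import Decimal

# Constructing from a float captures the binary approximation, not the CSV text.
from_float = Decimal(0.1)
from_text = Decimal("0.1")

print(from_float == from_text)                       # False
print(from_text + Decimal("0.2") == Decimal("0.3"))  # True
print(0.1 + 0.2 == 0.3)                              # False with binary floats
```

Because csv.DictReader already returns strings, the value can be passed to Decimal as-is; there is no reason to go through float first.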

Option 2: Import From CSV in S3

For larger migrations, AWS provides a managed import workflow from S3. That is attractive when:

  • the data already lives in S3
  • you want AWS to handle the bulk import job
  • you are creating a new table from the import

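The tradeoff is that managed import is less flexible than a custom script. It cannot load into an existing table, and with CSV input the non-key attributes are imported as strings. If your CSV needs typed numeric attributes, heavy transformation, or row-by-row cleansing, boto3 is usually still the better fit.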

So the choice is:

  • use boto3 for control and transformation
  • use S3 import for larger managed ingestion workflows
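For the managed path, the import can be started from the console, the CLI, or the ImportTable API. The sketch below uses the boto3 client's import_table call and assumes a single string partition key named order_id, a CSV whose first line is the header, and on-demand billing. The bucket name, key prefix, and the helper names build_import_request and start_import are placeholders for illustration.

```python
def build_import_request(bucket: str, key_prefix: str, table_name: str) -> dict:
    """Build keyword arguments for the DynamoDB ImportTable API (CSV input)."""
    return {
        "S3BucketSource": {"S3Bucket": bucket, "S3KeyPrefix": key_prefix},
        "InputFormat": "CSV",
        "InputCompressionType": "NONE",
        # The import always creates this table; it must not already exist.
        "TableCreationParameters": {
            "TableName": table_name,
            "AttributeDefinitions": [
                {"AttributeName": "order_id", "AttributeType": "S"},
            ],
            "KeySchema": [{"AttributeName": "order_id", "KeyType": "HASH"}],
            "BillingMode": "PAY_PER_REQUEST",
        },
    }


def start_import(request: dict, region_name: str = "us-east-1") -> str:
    """Start the import and return its ARN. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the request builder works without boto3

    client = boto3.client("dynamodb", region_name=region_name)
    response = client.import_table(**request)
    return response["ImportTableDescription"]["ImportArn"]


# Placeholder bucket and prefix; nothing is sent to AWS until start_import runs.
request = build_import_request("my-example-bucket", "exports/orders/", "Orders")
print(request["InputFormat"])
```

The import runs asynchronously, so the returned ARN is what you would pass to describe_import to check progress.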

Validate Rows Before Writing

Even in simple scripts, validate required fields before inserting.

python
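from decimal import Decimal


def row_to_item(row: dict) -> dict:
    # .get() also covers a CSV that lacks the order_id column entirely.
    if not row.get("order_id"):
        raise ValueError("Missing order_id")

    return {
        "order_id": row["order_id"],
        "customer_id": row["customer_id"],
        "status": row["status"].strip().lower(),  # normalize casing and whitespace
        "total": Decimal(row["total"]),  # raises decimal.InvalidOperation on bad input
        "created_at": row["created_at"],
    }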

That kind of validation prevents large imports from silently loading junk data.
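Raising on the first bad row is fine for small files, but for a large import it is often more useful to collect every problem in one pass before writing anything. A minimal sketch, assuming order_id, customer_id, and total are the required columns and that each CSV record occupies one physical line; validate_rows and REQUIRED are hypothetical names:

```python
import csv
import io
from decimal import Decimal, InvalidOperation

REQUIRED = ("order_id", "customer_id", "total")  # assumed required columns


def validate_rows(reader):
    """Split rows into (valid_rows, errors); errors are (line_number, message)."""
    valid, errors = [], []
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        missing = [name for name in REQUIRED if not (row.get(name) or "").strip()]
        if missing:
            errors.append((line_number, f"missing {', '.join(missing)}"))
            continue
        try:
            Decimal(row["total"])
        except InvalidOperation:
            errors.append((line_number, f"bad total: {row['total']!r}"))
            continue
        valid.append(row)
    return valid, errors


# In-memory CSV with one good row, one missing key, and one bad number.
sample = io.StringIO(
    "order_id,customer_id,total\n"
    "A1001,C42,19.95\n"
    ",C43,5.00\n"
    "A1003,C44,abc\n"
)
valid, errors = validate_rows(csv.DictReader(sample))
print(len(valid), errors)
```

Running the validation as a separate pass lets you fix the source file and start the write phase only once the error list is empty.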

Throughput and Operational Considerations

Bulk imports can stress write capacity, especially when the table has global secondary indexes. Plan for:

  • on-demand or temporarily increased write capacity
  • retry-safe import logic
  • chunking or resumability for large files
  • verification after import

If the dataset is large, operational planning matters as much as the code.
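Chunking and resumability can be as simple as recording how many rows have been written so far. The sketch below assumes the CSV is read in the same order on every run; import_in_chunks, read_checkpoint, and the checkpoint file are hypothetical, and write_chunk stands in for whatever function actually writes to DynamoDB.

```python
import os
import tempfile
from itertools import islice


def read_checkpoint(path: str) -> int:
    """Return the number of rows already written, or 0 if no checkpoint exists."""
    try:
        with open(path, encoding="utf-8") as handle:
            return int(handle.read().strip() or 0)
    except FileNotFoundError:
        return 0


def import_in_chunks(rows, write_chunk, checkpoint_path: str, chunk_size: int = 500) -> int:
    """Write rows in chunks, recording progress after each successful chunk."""
    done = read_checkpoint(checkpoint_path)
    remaining = islice(iter(rows), done, None)  # skip rows from earlier runs
    while True:
        chunk = list(islice(remaining, chunk_size))
        if not chunk:
            return done
        write_chunk(chunk)  # e.g. a function that uses table.batch_writer()
        done += len(chunk)
        with open(checkpoint_path, "w", encoding="utf-8") as handle:
            handle.write(str(done))


# Demonstration with an in-memory "writer" instead of DynamoDB.
written = []
checkpoint = os.path.join(tempfile.mkdtemp(), "orders.checkpoint")
total = import_in_chunks(range(10), written.extend, checkpoint, chunk_size=4)
print(total, len(written))
```

If a run dies partway through a chunk, the next run rewrites that chunk from its start. That is safe here because putting the same item twice leaves the table in the same state.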

Verifying the Result

After the import, do not assume success just because the script finished. Read back a few expected items.

python
response = table.get_item(Key={"order_id": "A1001"})
print(response.get("Item"))

Spot checks plus row counts or sample comparisons are usually enough to catch obvious mapping mistakes.
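A sample comparison can be automated with a few lines. This sketch is deliberately independent of boto3: verify_sample is a hypothetical helper, and fetch_item is any callable that returns the stored item for a key, so a plain dict stands in for the table here.

```python
import random


def verify_sample(expected_items: dict, fetch_item, sample_size: int = 20, seed: int = 0):
    """Compare a random sample of expected items against what the table returns.

    expected_items maps a key value to the item built from the CSV.
    fetch_item returns the stored item (or None) for a key.
    """
    rng = random.Random(seed)
    keys = sorted(expected_items)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    return [key for key in sample if fetch_item(key) != expected_items[key]]


# Demonstration with a dict standing in for the table; A1002 was never written.
expected = {
    "A1001": {"order_id": "A1001", "status": "shipped"},
    "A1002": {"order_id": "A1002", "status": "pending"},
}
fake_table = {"A1001": {"order_id": "A1001", "status": "shipped"}}
print(verify_sample(expected, fake_table.get))
```

Against a real table, fetch_item could be something like `lambda key: table.get_item(Key={"order_id": key}).get("Item")`, assuming order_id is the table's only key attribute.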

Common Pitfalls

The biggest mistake is treating CSV columns as if they map directly to DynamoDB attributes without any design or transformation step. Every CSV value is a string, and it stays a string in DynamoDB unless the import converts it.

Another issue is using Python float values for money or other precise numeric fields instead of Decimal. With the boto3 resource API, a float in an item raises a TypeError rather than being stored.

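People also often overlook duplicate primary keys. A put replaces any existing item with the same key, so a later CSV row silently overwrites an earlier one. A single batch that contains the same key twice is rejected unless batch_writer() is given overwrite_by_pkeys to de-duplicate its buffer.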
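Duplicates are cheap to detect before anything is written. A minimal sketch, assuming order_id is the only key column; find_duplicate_keys is a hypothetical helper, and key_columns should list the partition key and the sort key if the table has one:

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(reader, key_columns=("order_id",)):
    """Return primary-key tuples that appear more than once in the CSV."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in reader)
    return {key: n for key, n in counts.items() if n > 1}


# In-memory CSV where A1001 appears twice with different statuses.
sample = io.StringIO(
    "order_id,status\n"
    "A1001,pending\n"
    "A1002,pending\n"
    "A1001,shipped\n"
)
duplicates = find_duplicate_keys(csv.DictReader(sample))
print(duplicates)
```

Whether a duplicate is an error or a legitimate "last row wins" update depends on the data, which is why it is worth deciding before the import rather than discovering it afterwards.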

Finally, choose the import path that matches the job. A managed S3 import is great for large one-time loads into a new table, but a custom boto3 script is better when you need row transformation, validation, or application-specific rules.

Summary

  • Use a boto3 script with batch_writer() when you need flexible CSV-to-item transformation.
  • Use Decimal for numeric values that need DynamoDB-friendly precision.
  • AWS also offers a managed import-from-S3 workflow for CSV-based bulk loading into a new table.
  • Model the partition key, sort key, and attribute types before importing.
  • Validate and verify the data so the migration is not only fast, but correct.
