How can I import bulk data from a CSV file into DynamoDB?
Introduction
There are two practical ways to bulk-import CSV data into DynamoDB. For controlled programmatic imports, use a script that reads the CSV, converts each row into DynamoDB item types, and writes with boto3. For larger one-time migrations, AWS also provides an import-from-S3 workflow that can create a new DynamoDB table directly from a CSV file stored in S3.
First Model the Table Correctly
A CSV file is only rows and columns. DynamoDB cares about:
- the partition key
- an optional sort key
- attribute types
- duplicate-key behavior
Before importing anything, decide which CSV columns map to the table keys and how each value should be typed. That design step matters more than the mechanics of the import command.
If you skip it, you can end up with rejected rows, bad types, or accidental overwrites.
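As an illustration of that design step, the sketch below checks a CSV for repeated primary keys before anything is written. The column names `customer_id` and `order_id` are assumptions made for this example, not part of any real schema; substitute the columns that map to your own partition and sort keys.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; replace with the columns that act as your keys.
PARTITION_KEY = "customer_id"
SORT_KEY = "order_id"


def find_duplicate_keys(rows):
    """Return the (partition key, sort key) pairs that occur more than once."""
    counts = Counter((row[PARTITION_KEY], row[SORT_KEY]) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Small in-memory CSV standing in for a real file.
sample = io.StringIO(
    "customer_id,order_id,total\n"
    "c1,o1,10.50\n"
    "c1,o2,4.00\n"
    "c1,o1,99.99\n"
)
duplicates = find_duplicate_keys(csv.DictReader(sample))
print(duplicates)  # [('c1', 'o1')]
```

Running a check like this first makes the duplicate-key decision explicit: you can reject the file, keep the first row, or keep the last row, rather than finding out after the import.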
Option 1: Import With boto3
A Python script is the most flexible approach when you need to transform values or validate rows before writing them.
batch_writer() is the usual starting point: it buffers put requests, sends them as BatchWriteItem calls, and automatically resends any unprocessed items, which is both faster and more robust than issuing single-item writes in a tight loop.
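A minimal sketch of such a script is shown below. The column names (`customer_id`, `order_id`, `total`) and the helper names are assumptions made for illustration; only `boto3.resource`, `Table`, `batch_writer()`, and `put_item` are boto3 calls.

```python
import csv
from decimal import Decimal


def row_to_item(row):
    """Convert one CSV row (a dict of strings) into a typed DynamoDB item."""
    return {
        "customer_id": row["customer_id"],   # hypothetical partition key
        "order_id": row["order_id"],         # hypothetical sort key
        "total": Decimal(row["total"]),      # numbers go in as Decimal, not float
    }


def import_rows(rows, table):
    """Write rows through table.batch_writer() and return how many were sent."""
    written = 0
    with table.batch_writer() as batch:
        for row in rows:
            batch.put_item(Item=row_to_item(row))
            written += 1
    return written


def import_csv(csv_path, table_name):
    """Open the CSV and load it into the named table (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    table = boto3.resource("dynamodb").Table(table_name)
    with open(csv_path, newline="", encoding="utf-8") as f:
        return import_rows(csv.DictReader(f), table)
```

A call such as `import_csv("orders.csv", "orders")` would then load the file, where both names are placeholders. Keeping `row_to_item` separate from the write loop makes the type mapping easy to test without touching AWS.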
Why Decimal Matters
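DynamoDB numeric attributes should not be loaded from Python float values. boto3 rejects float values when it serializes an item, and converting a CSV string to float first can already lose precision. The safer practice is to build a Decimal (from the standard library's decimal module) directly from the CSV string.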
This avoids subtle issues where floating-point representation does not match the exact numeric value you intended to store.
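The difference is easy to see with a small standard-library example. The value `"19.99"` is just an arbitrary sample of what a CSV cell might contain.

```python
from decimal import Decimal

raw = "19.99"  # value exactly as it appears in the CSV

from_string = Decimal(raw)         # exact decimal value 19.99
from_float = Decimal(float(raw))   # inherits the binary floating-point error

print(from_string)                 # 19.99
print(from_string == from_float)   # False

# The same effect shows up in arithmetic.
print(0.1 + 0.2 == 0.3)                                    # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
```

The practical rule that follows is to pass the CSV string straight to `Decimal(...)` and never route it through `float` on the way.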
Option 2: Import From CSV in S3
For larger migrations, AWS provides a managed import workflow from S3. That is attractive when:
- the data already lives in S3
- you want AWS to handle the bulk import job
- you are creating a new table from the import
The tradeoff is that managed import is less flexible than a custom script. If your CSV needs heavy transformation or row-by-row cleansing, boto3 is usually still the better fit.
So the choice is:
- use boto3 for control and transformation
- use S3 import for larger managed ingestion workflows
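If you prefer to start the managed import from Python rather than the console, a rough sketch using the DynamoDB client's `import_table` call might look like this. The bucket, prefix, table name, and key attributes are all placeholders chosen for the example, and the helper names are hypothetical.

```python
def build_import_request(bucket, key_prefix, table_name):
    """Build keyword arguments for the DynamoDB client's import_table call."""
    return {
        "S3BucketSource": {"S3Bucket": bucket, "S3KeyPrefix": key_prefix},
        "InputFormat": "CSV",
        "InputCompressionType": "NONE",
        "TableCreationParameters": {
            "TableName": table_name,
            # Hypothetical key schema: string partition key plus string sort key.
            "AttributeDefinitions": [
                {"AttributeName": "customer_id", "AttributeType": "S"},
                {"AttributeName": "order_id", "AttributeType": "S"},
            ],
            "KeySchema": [
                {"AttributeName": "customer_id", "KeyType": "HASH"},
                {"AttributeName": "order_id", "KeyType": "RANGE"},
            ],
            "BillingMode": "PAY_PER_REQUEST",
        },
    }


def start_import(bucket, key_prefix, table_name):
    """Start the import job and return its ARN (needs AWS credentials)."""
    import boto3  # imported lazily so build_import_request can be tested offline

    client = boto3.client("dynamodb")
    response = client.import_table(**build_import_request(bucket, key_prefix, table_name))
    return response["ImportTableDescription"]["ImportArn"]
```

The job runs asynchronously, so the returned ARN is something to poll rather than proof of completion. It is also worth checking the AWS documentation on how CSV columns are typed: as far as the managed import is concerned, non-key CSV columns are generally loaded as strings, which is one more reason to use a script when numeric types matter.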
Validate Rows Before Writing
Even in simple scripts, validate required fields before inserting.
That kind of validation prevents large imports from silently loading junk data.
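One possible shape for such a check is sketched here. The required columns and the rule that `total` must be a finite number are assumptions for this example; real rules depend on your data.

```python
from decimal import Decimal, InvalidOperation

REQUIRED_FIELDS = ("customer_id", "order_id", "total")  # hypothetical columns


def validate_row(row):
    """Return a list of problems found in one CSV row; empty means usable."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append("missing " + field)

    total = (row.get("total") or "").strip()
    if total:
        try:
            # Decimal accepts "NaN" and "Infinity", so reject those explicitly.
            if not Decimal(total).is_finite():
                problems.append("total is not finite")
        except InvalidOperation:
            problems.append("total is not a number")
    return problems


print(validate_row({"customer_id": "c1", "order_id": "o1", "total": "10.50"}))
# []
print(validate_row({"customer_id": "", "order_id": "o1", "total": "abc"}))
# ['missing customer_id', 'total is not a number']
```

Returning a list of problems instead of raising on the first one makes it easy to log every rejected row with its line number and keep the import going.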
Throughput and Operational Considerations
Bulk imports can stress write capacity, especially when the table has global secondary indexes. Plan for:
- on-demand or temporarily increased write capacity
- retry-safe import logic
- chunking or resumability for large files
- verification after import
If the dataset is large, operational planning matters as much as the code.
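Chunking and resumability can be sketched without any AWS-specific code. In the example below, `write_chunk` and `on_progress` are hypothetical callbacks: the first would wrap a batch write, the second would persist the row offset somewhere durable such as a small file.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def import_resumable(rows, write_chunk, start_at=0, chunk_size=500, on_progress=None):
    """Skip rows already imported, write the rest in chunks, return the new offset."""
    done = start_at
    for chunk in chunked(islice(rows, start_at, None), chunk_size):
        write_chunk(chunk)         # should be safe to repeat, e.g. puts keyed by primary key
        done += len(chunk)
        if on_progress:
            on_progress(done)      # e.g. save the offset so a restart can resume here
    return done


# Resume a 10-row import from row 3, four rows at a time.
written, offsets = [], []
final = import_resumable(range(10), written.extend, start_at=3,
                         chunk_size=4, on_progress=offsets.append)
print(final, offsets)  # 10 [7, 10]
```

Because a put with the same primary key replaces the existing item, re-sending a chunk that was only partly written before a crash does not create duplicates, which is what makes this kind of offset-based restart retry-safe.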
Verifying the Result
After the import, do not assume success just because the script finished. Read back a few expected items.
Spot checks plus row counts or sample comparisons are usually enough to catch obvious mapping mistakes.
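A spot check along those lines could look like the following sketch. `verify_samples` is a hypothetical helper; it relies only on the Table resource's `get_item` call, and the expected items should use `Decimal` for numbers because that is how boto3 returns them.

```python
def verify_samples(table, expected_items, key_names):
    """Read each expected item back by key and return the keys that do not match."""
    mismatches = []
    for expected in expected_items:
        key = {name: expected[name] for name in key_names}
        # ConsistentRead avoids reading a stale copy right after the import.
        stored = table.get_item(Key=key, ConsistentRead=True).get("Item")
        if stored != expected:
            mismatches.append(key)
    return mismatches
```

With a real table this would be called as `verify_samples(table, samples, ["customer_id", "order_id"])`, using the placeholder key names from the earlier examples. For row counts, keep in mind that the table's reported item count is refreshed only periodically, so a count taken immediately after the import may lag behind.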
Common Pitfalls
The biggest mistake is treating CSV columns as if they map directly to DynamoDB item types without any design or transformation step. DynamoDB is not a relational table that accepts CSV rows as-is.
Another issue is using Python float values for money or precise numeric fields instead of Decimal.
People also often overlook duplicate primary keys (the same partition key and, if the table has one, the same sort key) and accidentally overwrite earlier items during import.
Finally, choose the import path that matches the job. A managed S3 import is great for large one-time loads into a new table, but a custom boto3 script is better when you need row transformation, validation, or application-specific rules.
Summary
- Use a boto3 script with batch_writer() when you need flexible CSV-to-item transformation.
- Use Decimal for numeric values that need DynamoDB-friendly precision.
- AWS also offers a managed import-from-S3 workflow for CSV-based bulk loading into a new table.
- Model the partition key, sort key, and attribute types before importing.
- Validate and verify the data so the migration is not only fast, but correct.

