Copy Multiple Files from an S3 Bucket

Introduction

Copying multiple files from Amazon S3 usually means copying all objects under a prefix, not traversing real directories. S3 is an object store, so the “folders” you see in the console are naming conventions inside object keys. Once that model is clear, the AWS CLI and Boto3 become much easier to use correctly.
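To make that model concrete, here is a small pure-Python sketch of how a listing with Prefix and Delimiter="/" groups flat keys into the "folders" the console displays (S3 returns these as CommonPrefixes). It runs on an in-memory list of made-up keys and is a simplified model of the listing behavior, not the service's implementation; the function name list_level is hypothetical.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Simplified model of a ListObjectsV2 call with Prefix and Delimiter.

    Returns (objects, common_prefixes): the keys that sit directly at this
    level, and the distinct "subfolder" prefixes the console shows as folders.
    """
    objects = []
    common_prefixes = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue  # Prefix is a plain string match on the key
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            objects.append(key)  # no further delimiter: an object at this level
        else:
            # Everything up to and including the next delimiter rolls up into one "folder"
            common_prefixes.add(prefix + rest[: cut + 1])
    return objects, sorted(common_prefixes)


# Hypothetical keys: S3 stores these as flat names, not as a directory tree.
keys = [
    "reports/2025/jan.csv",
    "reports/2025/feb.csv",
    "reports/summary.txt",
    "logs/app.json",
]

print(list_level(keys))              # → ([], ['logs/', 'reports/'])
print(list_level(keys, "reports/"))  # → (['reports/summary.txt'], ['reports/2025/'])
```

Nothing in the data is a directory; the "folders" exist only because the keys share text up to a "/".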

Use the AWS CLI for Bulk Copies

If you want everything under a prefix, the most direct command is aws s3 cp with --recursive:

bash
aws s3 cp s3://my-bucket/reports/ ./reports/ --recursive

That copies all objects whose keys start with reports/ into the local ./reports/ directory.

You can also copy between buckets:

bash
aws s3 cp s3://source-bucket/reports/ s3://archive-bucket/reports/ --recursive

This is a good choice for one-time bulk transfers.

Filter Which Objects Are Copied

When you need only certain file types, combine --exclude and --include:

bash
aws s3 cp s3://my-bucket/logs/ ./logs/ \
  --recursive \
  --exclude "*" \
  --include "*.json"

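The order matters: every object is included by default, and filters that appear later on the command line take precedence over earlier ones. So exclude broadly first, then include what you want. With the two flags reversed, the trailing --exclude "*" would win and nothing would be copied.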
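The precedence rule can be modeled in a few lines of Python using fnmatch, which makes it easy to see why flag order changes the result. This is a simplified illustration, not the CLI's source code; the function is_selected and the ("exclude" | "include", pattern) tuples are hypothetical, and the sketch assumes patterns are matched against each key's path relative to the source prefix.

```python
from fnmatch import fnmatch


def is_selected(relative_key, filters):
    """Simplified model of --exclude/--include evaluation.

    `filters` is an ordered list of ("exclude" | "include", pattern) pairs,
    in the same order the flags appear on the command line.
    """
    selected = True  # everything is included by default
    for action, pattern in filters:
        if fnmatch(relative_key, pattern):
            # A later matching filter overrides any earlier decision
            selected = action == "include"
    return selected


keys = ["2025-01.json", "2024-12.json", "notes.txt"]

# Exclude broadly, then include what you want: only JSON survives.
json_only = [("exclude", "*"), ("include", "*.json")]
print([k for k in keys if is_selected(k, json_only)])  # → ['2025-01.json', '2024-12.json']

# Reversed order: the trailing exclude "*" wins for every key.
reversed_order = [("include", "*.json"), ("exclude", "*")]
print([k for k in keys if is_selected(k, reversed_order)])  # → []
```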

You can refine this further:

bash
aws s3 cp s3://my-bucket/logs/ ./logs/ \
  --recursive \
  --exclude "*" \
  --include "2025-*.json"

This is often enough when selection is based on filenames or prefixes.

Use sync for Repeated Transfers

If the real goal is “keep destination aligned with source,” aws s3 sync is usually better than cp --recursive:

bash
aws s3 sync s3://my-bucket/assets/ ./assets/

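sync compares source and destination and copies only what is missing or changed. By default, an object is copied when it does not exist at the destination, when its size differs, or when the source's last-modified time is newer than the destination's; everything else is skipped. That makes it a better fit for repeated jobs and large datasets. Note that sync does not delete destination files that no longer exist at the source unless you add --delete.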
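For an S3-to-local sync, the AWS CLI documentation gives three reasons to download an object: it is missing locally, its size differs, or it was modified more recently than the local file. The sketch below expresses that default comparison in plain Python. The ObjectInfo type and needs_download function are hypothetical, and the real command also offers options such as --size-only and --exact-timestamps that change the rule.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ObjectInfo:
    size: int                # size in bytes
    last_modified: datetime  # timezone-aware timestamp


def needs_download(source: ObjectInfo, local: Optional[ObjectInfo]) -> bool:
    """Model of sync's default S3 -> local comparison."""
    if local is None:
        return True  # missing at the destination
    if source.size != local.size:
        return True  # size changed
    return source.last_modified > local.last_modified  # source is newer


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2025, 2, 1, tzinfo=timezone.utc)

print(needs_download(ObjectInfo(10, t0), None))                # True: missing locally
print(needs_download(ObjectInfo(10, t1), ObjectInfo(10, t0)))  # True: source is newer
print(needs_download(ObjectInfo(10, t0), ObjectInfo(10, t1)))  # False: skipped
```

Because unchanged objects fail every check, a second run over the same data transfers nothing, which is exactly what makes sync cheaper than repeating cp --recursive.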

Filtering also works with sync:

bash
aws s3 sync s3://my-bucket/assets/ ./assets/ \
  --exclude "*" \
  --include "*.png"

Use cp for straightforward copy tasks and sync for mirroring behavior.

Download Objects Programmatically with Boto3

If your selection logic is more than glob patterns can express, use Boto3. The common pattern is:

  1. list objects with a paginator
  2. filter keys in Python
  3. download each matching object
python
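from pathlib import Path
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"
prefix = "reports/"
target_dir = Path("downloads")

# list_objects_v2 returns at most 1,000 keys per call; the paginator follows continuation tokens.
paginator = s3.get_paginator("list_objects_v2")

for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    # "Contents" is absent when a page has no matching keys.
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Filter in Python: keep only CSV files.
        if not key.endswith(".csv"):
            continue

        # Map "reports/2025/jan.csv" to "downloads/2025/jan.csv" (str.removeprefix needs Python 3.9+).
        local_path = target_dir / key.removeprefix(prefix)
        # S3 has no real directories, so create local parents before writing.
        local_path.parent.mkdir(parents=True, exist_ok=True)
        s3.download_file(bucket, key, str(local_path))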

This approach is more verbose, but it gives you complete control over filtering, local path mapping, retries, and logging.
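One detail worth handling explicitly is local path mapping. S3 keys are arbitrary strings, so a key such as reports/../../etc/passwd would resolve outside the target directory if joined naively. Here is a hedged sketch of a stricter mapping; the helper name local_path_for is hypothetical, and rejecting such keys (rather than rewriting them) is one design choice among several.

```python
from pathlib import Path


def local_path_for(key: str, prefix: str, target_dir: Path) -> Path:
    """Map an S3 key under `prefix` to a path inside `target_dir`.

    Raises ValueError if the key is outside the prefix, is a "folder"
    placeholder (ends with "/"), or would escape target_dir.
    """
    if not key.startswith(prefix):
        raise ValueError(f"{key!r} does not start with {prefix!r}")
    relative = key[len(prefix):]
    if not relative or relative.endswith("/"):
        raise ValueError(f"{key!r} looks like a folder placeholder, not a file")

    root = target_dir.resolve()
    # resolve() collapses ".." segments; a leading "/" in `relative` makes the join absolute.
    candidate = (root / relative).resolve()
    # Path.is_relative_to needs Python 3.9+.
    if not candidate.is_relative_to(root):
        raise ValueError(f"{key!r} would be written outside {root}")
    return candidate
```

In the download loop, the naive join would become local_path = local_path_for(key, prefix, target_dir), and keys that raise can be logged and skipped.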

Copy Between Buckets Without Downloading Locally

If the destination is another S3 bucket, do not download objects to your machine unless you have to. Use server-side copy instead:

python
import boto3

s3 = boto3.client("s3")

source_bucket = "source-bucket"
target_bucket = "target-bucket"
keys = ["reports/jan.csv", "reports/feb.csv"]

for key in keys:
    # Server-side copy: S3 reads the source object and writes the new one itself,
    # so no object data passes through this machine.
    s3.copy_object(
        Bucket=target_bucket,
        Key=key,
        CopySource={"Bucket": source_bucket, "Key": key},
    )

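That keeps the transfer inside AWS and is usually simpler and cheaper than routing the data through a local host. Keep in mind that a single copy_object call is limited to objects of 5 GB or less; larger objects need a multipart copy, which Boto3's managed s3.copy() method handles for you.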
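The example above hardcodes its keys. A more typical job lists a prefix and copies everything under it, optionally to a different destination prefix. This sketch combines the paginator pattern from the download example with Boto3's managed client.copy(). The functions copy_prefix and rebase_key are hypothetical; the client is passed in so the code can be exercised with a stand-in, and in real use you would pass boto3.client("s3").

```python
def rebase_key(key: str, src_prefix: str, dst_prefix: str) -> str:
    """Swap the source prefix for the destination prefix.

    Assumes `key` starts with `src_prefix`, which the Prefix listing guarantees.
    """
    return dst_prefix + key[len(src_prefix):]


def copy_prefix(client, src_bucket, src_prefix, dst_bucket, dst_prefix):
    """Server-side copy of every object under src_prefix.

    `client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the list of destination keys written.
    """
    copied = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=src_prefix):
        for obj in page.get("Contents", []):
            src_key = obj["Key"]
            dst_key = rebase_key(src_key, src_prefix, dst_prefix)
            # Managed transfer: performs a multipart copy when the object is large.
            client.copy(
                CopySource={"Bucket": src_bucket, "Key": src_key},
                Bucket=dst_bucket,
                Key=dst_key,
            )
            copied.append(dst_key)
    return copied
```

For example, copy_prefix(boto3.client("s3"), "source-bucket", "reports/", "archive-bucket", "backup/reports/") would copy reports/jan.csv to backup/reports/jan.csv in the archive bucket.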

Think in Prefixes, Not Folders

This mental model prevents many mistakes. If you run:

bash
aws s3 cp s3://my-bucket/reports/ ./reports/ --recursive

S3 is not traversing a real folder tree. It is returning keys that begin with reports/. That is why consistent key naming matters so much in S3-heavy systems.
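One consequence is easy to check in code: because a prefix is a plain string match, leaving off the trailing slash widens the match. This short sketch applies the same starts-with test that a listing's Prefix parameter performs to a few made-up keys; the helper name matching is hypothetical.

```python
# Hypothetical keys; "reports-old/" is a different "folder" from "reports/".
keys = [
    "reports/jan.csv",
    "reports/2025/feb.csv",
    "reports-old/dec.csv",
    "reportsummary.txt",
]


def matching(keys, prefix):
    """Keys a listing with Prefix=prefix would return (plain string match)."""
    return [k for k in keys if k.startswith(prefix)]


print(matching(keys, "reports/"))  # → ['reports/jan.csv', 'reports/2025/feb.csv']
print(matching(keys, "reports"))   # → all four keys, including reports-old/ and reportsummary.txt
```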

Common Pitfalls

The most common mistake is treating S3 like a local filesystem. A “folder” is just a prefix in the object key.

Another pitfall is using cp --recursive for jobs that should really be incremental syncs. That can waste bandwidth and time by copying unchanged data repeatedly.

Developers also sometimes misuse include and exclude filters. Because later filters take precedence over earlier ones, the safest pattern is usually to exclude everything first and then include the keys you actually want.

Finally, if you download nested keys with Boto3, remember to create parent directories locally before writing the file.

Summary

  • In S3, bulk copy usually means copying all objects under a prefix.
  • Use aws s3 cp --recursive for direct bulk copies.
  • Use aws s3 sync when you want repeated or incremental mirroring behavior.
  • Use include and exclude filters to narrow which objects move.
  • Use Boto3 or server-side S3 copy when you need code-level control or S3-to-S3 transfer.
