Copy Multiple Files from an S3 Bucket
Introduction
Copying multiple files from Amazon S3 usually means copying all objects under a prefix, not traversing real directories. S3 is an object store, so the “folders” you see in the console are naming conventions inside object keys. Once that model is clear, the AWS CLI and Boto3 become much easier to use correctly.
Use the AWS CLI for Bulk Copies
If you want everything under a prefix, the most direct command is aws s3 cp with --recursive, in the form aws s3 cp s3://<bucket>/reports/ ./reports/ --recursive.
That copies all objects whose keys start with reports/ into the local ./reports/ directory.
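You can also copy between buckets by passing an s3:// URI as both source and destination, for example aws s3 cp s3://<source-bucket>/reports/ s3://<destination-bucket>/reports/ --recursive.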
This is a good choice for one-time bulk transfers.
Filter Which Objects Are Copied
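When you need only certain file types, combine --exclude and --include, for example --exclude "*" --include "*.csv" to copy only CSV files.
The order matters: filters are applied in the order given and later filters take precedence, so exclude broadly first, then include what you want.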
You can refine this further by adding more --include or --exclude flags after the first pair.
This is often enough when selection is based on filenames or prefixes.
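The ordering rule is easier to see in a small model. The sketch below is not the CLI's implementation. It is a simplified Python emulation that assumes every key starts out included and that the last matching filter decides the outcome. The helper name is_selected and the sample keys are hypothetical.

```python
from fnmatch import fnmatchcase


def is_selected(key, filters):
    """Return True if `key` would be copied under the given ordered filters.

    `filters` is a list of ("exclude" | "include", pattern) pairs in
    command-line order. Every key is included by default, and the last
    filter whose pattern matches the key decides the outcome.
    """
    selected = True
    for kind, pattern in filters:
        if fnmatchcase(key, pattern):
            selected = (kind == "include")
    return selected


# Hypothetical keys, written relative to the source prefix.
keys = ["2024/jan.csv", "2024/jan.log", "summary.csv"]

# Exclude everything, then include CSV files.
csv_only = [("exclude", "*"), ("include", "*.csv")]
print([k for k in keys if is_selected(k, csv_only)])
# prints ['2024/jan.csv', 'summary.csv']

# Same filters reversed: the trailing exclude matches every key and wins.
reversed_filters = [("include", "*.csv"), ("exclude", "*")]
print([k for k in keys if is_selected(k, reversed_filters)])
# prints []
```

Reversing the two filters selects nothing, which is why the exclude-first pattern is the safer habit.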
Use sync for Repeated Transfers
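If the real goal is “keep destination aligned with source,” aws s3 sync is usually better than cp --recursive. It takes the same source and destination arguments, for example aws s3 sync s3://<bucket>/reports/ ./reports/.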
sync compares source and destination and skips files that do not need to move. That makes it a better fit for repeated jobs and large datasets.
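As a rough illustration of that comparison, the sketch below decides which keys need copying. It assumes a simplified rule: copy when the key is missing at the destination, the sizes differ, or the source is newer. The real CLI has more cases and options (such as --size-only and --delete), so treat this as a model rather than its actual logic. The function name plan_sync and the sample data are hypothetical.

```python
def plan_sync(source, destination):
    """Return the sorted keys that would need copying.

    Both arguments map key -> (size_in_bytes, last_modified_timestamp).
    """
    to_copy = []
    for key, (size, modified) in source.items():
        if key not in destination:
            # Missing at the destination: always copy.
            to_copy.append(key)
            continue
        dest_size, dest_modified = destination[key]
        # Copy when the content size changed or the source is newer.
        if size != dest_size or modified > dest_modified:
            to_copy.append(key)
    return sorted(to_copy)


# Hypothetical listings; timestamps are plain integers for readability.
source = {
    "reports/a.csv": (100, 10),  # unchanged
    "reports/b.csv": (250, 20),  # size changed
    "reports/c.csv": (300, 30),  # missing at destination
}
destination = {
    "reports/a.csv": (100, 10),
    "reports/b.csv": (200, 20),
}

print(plan_sync(source, destination))
# prints ['reports/b.csv', 'reports/c.csv']
```

Only two of the three objects move, which is the saving that makes sync attractive for repeated jobs.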
Filtering also works with sync, which accepts the same --exclude and --include flags.
Use cp for straightforward copy tasks and sync for mirroring behavior.
Download Objects Programmatically with Boto3
If your selection logic depends on code, use Boto3. The common pattern is:
- list objects with a paginator
- filter keys in Python
- download each matching object
This approach is more verbose, but it gives you complete control over filtering, local path mapping, retries, and logging.
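The three steps above can be sketched as follows. This is a minimal example, not a complete tool: the function name download_matching, the keep callback, and the bucket name in the usage comment are assumptions made for illustration. The client is passed in as a parameter so the function can be exercised without network access.

```python
from pathlib import Path


def download_matching(s3_client, bucket, prefix, dest_dir, keep=lambda key: True):
    """Download every object under `prefix` whose key passes `keep`.

    `s3_client` is expected to be a Boto3 S3 client, i.e. boto3.client("s3").
    Returns the list of local paths that were written.
    """
    downloaded = []
    # 1. List objects with a paginator so large prefixes are fully covered.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # 2. Filter keys in Python; skip zero-byte "folder" placeholders.
            if key.endswith("/") or not keep(key):
                continue
            # Map the key to a local path relative to the prefix.
            relative = key[len(prefix):].lstrip("/")
            local_path = Path(dest_dir) / relative
            # Nested keys need their parent directories created first.
            local_path.parent.mkdir(parents=True, exist_ok=True)
            # 3. Download each matching object.
            s3_client.download_file(bucket, key, str(local_path))
            downloaded.append(local_path)
    return downloaded


# Usage with a real client (the bucket name is a placeholder):
#   import boto3
#   download_matching(boto3.client("s3"), "example-bucket", "reports/",
#                     "./reports", keep=lambda k: k.endswith(".csv"))
```

Passing the filter as a callable keeps the selection logic separate from the transfer loop, so retries or logging can be added around download_file without touching it.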
Copy Between Buckets Without Downloading Locally
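If the destination is another S3 bucket, do not download objects to your machine unless you have to. Use a server-side copy instead, either by giving aws s3 cp or aws s3 sync an s3:// URI for both source and destination, or by calling the copy operation from Boto3.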
That keeps the transfer inside AWS and is usually simpler and cheaper than routing the data through a local host.
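A Boto3 version of the same idea might look like the sketch below. It assumes the caller has permission to read the source and write the destination. The function name copy_prefix and the bucket names in the usage comment are hypothetical. The client's copy method asks S3 to perform the copy, so object data does not pass through the local machine.

```python
def copy_prefix(s3_client, src_bucket, src_prefix, dst_bucket, dst_prefix):
    """Server-side copy of every object under `src_prefix`.

    `s3_client` is expected to be a Boto3 S3 client. Each key keeps its
    path relative to `src_prefix` and is re-rooted under `dst_prefix`.
    Returns the list of destination keys.
    """
    copied = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=src_prefix):
        for obj in page.get("Contents", []):
            src_key = obj["Key"]
            dst_key = dst_prefix + src_key[len(src_prefix):]
            # Managed copy: S3 copies the object internally.
            s3_client.copy(
                {"Bucket": src_bucket, "Key": src_key},
                dst_bucket,
                dst_key,
            )
            copied.append(dst_key)
    return copied


# Usage with a real client (bucket names are placeholders):
#   import boto3
#   copy_prefix(boto3.client("s3"), "source-bucket", "reports/",
#               "destination-bucket", "archive/reports/")
```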
Think in Prefixes, Not Folders
This mental model prevents many mistakes. If you list or copy everything under s3://<bucket>/reports/, S3 is not traversing a real folder tree. It is returning keys that begin with reports/. That is why consistent key naming matters so much in S3-heavy systems.
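The prefix model can be shown without touching S3 at all. The sketch below treats a bucket as a flat list of keys and a listing as a plain string-prefix match. The keys and the helper name list_prefix are made up for illustration.

```python
# A bucket modeled as a flat collection of keys: there are no directories.
keys = [
    "reports/2024/jan.csv",
    "reports/2024/feb.csv",
    "reports-archive/2023/dec.csv",
    "logs/app.log",
]


def list_prefix(all_keys, prefix):
    """Return the keys that begin with `prefix`, as a prefix listing would."""
    return [key for key in all_keys if key.startswith(prefix)]


print(list_prefix(keys, "reports/"))
# prints ['reports/2024/jan.csv', 'reports/2024/feb.csv']

# Without the trailing slash, the prefix also matches "reports-archive/...".
print(list_prefix(keys, "reports"))
# prints ['reports/2024/jan.csv', 'reports/2024/feb.csv', 'reports-archive/2023/dec.csv']
```

The second call shows a typical surprise: omitting the trailing slash pulls in a sibling "folder" that merely shares the same leading characters.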
Common Pitfalls
The most common mistake is treating S3 like a local filesystem. A “folder” is just a prefix in the object key.
Another pitfall is using cp --recursive for jobs that should really be incremental syncs. That can waste bandwidth and time by copying unchanged data repeatedly.
Developers also sometimes misuse include and exclude filters. The safest pattern is usually exclude everything first, then include the keys you actually want.
Finally, if you download nested keys with Boto3, remember to create parent directories locally before writing the file.
Summary
- In S3, bulk copy usually means copying all objects under a prefix.
- Use aws s3 cp --recursive for direct bulk copies.
- Use aws s3 sync when you want repeated or incremental mirroring behavior.
- Use include and exclude filters to narrow which objects move.
- Use Boto3 or server-side S3 copy when you need code-level control or S3-to-S3 transfer.

