Copy S3 Bucket including versions

Introduction

Copying an S3 bucket sounds simple until versioning is involved. The important detail is that aws s3 cp and aws s3 sync only copy the current object state, not the full version history, so a real version-aware migration needs a different approach.

Why a normal S3 copy is not enough

In a versioned bucket, a key can have many object versions plus delete markers. A plain recursive copy only transfers the latest visible version for each key. That is fine for a one-time backup of current data, but it does not preserve historical versions, recovery points, or the delete state of objects.

If you need the destination bucket to behave like the source bucket, start by enabling versioning on the target:

```bash
aws s3api create-bucket \
  --bucket my-destination-bucket \
  --region us-east-1

aws s3api put-bucket-versioning \
  --bucket my-destination-bucket \
  --versioning-configuration Status=Enabled
```

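That step is required before you copy versioned content. Without it, each write to the destination overwrites the previous one, so only the last copy of each key survives. In any region other than us-east-1, create-bucket also needs --create-bucket-configuration LocationConstraint=<region>.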
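Because an unversioned destination silently keeps only the last write, a cheap guard before the copy can catch a missed step. This is a minimal sketch, assuming bash and the AWS CLI; the function name require_versioning is hypothetical. It relies on get-bucket-versioning reporting no Status (printed as None with --output text) for a bucket that never had versioning enabled, and Suspended for one where it was turned off.

```bash
#!/usr/bin/env bash

# require_versioning BUCKET  (hypothetical helper)
# Returns non-zero unless versioning on BUCKET is currently "Enabled".
require_versioning() {
  local bucket="$1"
  local status
  status=$(aws s3api get-bucket-versioning \
    --bucket "$bucket" \
    --query Status \
    --output text)

  # "None" (never enabled) and "Suspended" both mean new writes
  # will not be kept as separate versions.
  if [[ "$status" != "Enabled" ]]; then
    echo "versioning on $bucket is '$status', expected 'Enabled'" >&2
    return 1
  fi
}

# Usage, before starting any copy:
# require_versioning my-destination-bucket || exit 1
```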

Copying every object version with the AWS CLI

For a small or medium bucket, the direct method is:

List all versions in the source bucket.
Copy each version by VersionId.
Recreate delete markers if you need the same latest object state.

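The AWS CLI exposes versions through list-object-versions. The CLI follows pagination automatically, so the saved file covers every version and delete marker under the prefix: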

```bash
aws s3api list-object-versions \
  --bucket my-source-bucket \
  --prefix documents/ \
  --output json > versions.json
```
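Before copying anything, it helps to check what the listing actually contains. The sketch below uses a hypothetical helper named summarize_versions that reads versions.json, counts versions and delete markers, and flags versions too large for a single copy-object call. It assumes the 5 GB copy-object limit means 5 GiB (5368709120 bytes).

```bash
#!/usr/bin/env bash

# summarize_versions FILE  (hypothetical helper)
# FILE is the JSON output of list-object-versions.
summarize_versions() {
  local file="$1"
  # Assumption: the 5 GB copy-object limit is 5 GiB.
  local limit=$((5 * 1024 * 1024 * 1024))

  # Totals; "// []" handles listings with no Versions or DeleteMarkers.
  jq -r '"versions: \((.Versions // []) | length)",
         "delete markers: \((.DeleteMarkers // []) | length)"' "$file"

  # Versions above the limit need a multipart copy instead of copy-object.
  jq -r --argjson limit "$limit" \
    '(.Versions // [])[] | select(.Size > $limit)
     | "too large for copy-object: \(.Key) \(.VersionId) \(.Size)"' "$file"
}

# Usage:
# summarize_versions versions.json
```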

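Then copy each object version. list-object-versions returns each key's versions newest first, so the loop reverses the list before copying: if the oldest version is copied first, the version that is latest in the source also ends up latest in the destination.

```bash
# Reverse the listing so every key's versions are copied oldest first;
# the last copy written becomes the latest version in the destination.
# "// []" keeps jq from failing when the listing has no Versions array.
jq -r '(.Versions // []) | reverse | .[] | @base64' versions.json |
while read -r row; do
  # The base64 round-trip keeps each entry on one line,
  # whatever characters its key contains.
  item=$(printf '%s' "$row" | base64 --decode)
  key=$(printf '%s' "$item" | jq -r '.Key')
  version_id=$(printf '%s' "$item" | jq -r '.VersionId')

  aws s3api copy-object \
    --bucket my-destination-bucket \
    --key "$key" \
    --copy-source "my-source-bucket/$key?versionId=$version_id"
done
```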

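This creates new versions in the destination bucket that contain the same object data as the source versions. The destination version IDs will be different, because S3 assigns a new ID to every write. A single copy-object call also only accepts source objects up to 5 GB; larger versions need a multipart copy using upload-part-copy.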

Handling delete markers and large migrations

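Delete markers matter because a key can look deleted even though older versions still exist. list-object-versions returns them separately under DeleteMarkers, each with an IsLatest flag. If the goal is operational continuity, copy the object versions first and then recreate only the delete markers that are latest in the source, by issuing a delete without a version ID against the destination key; in a versioned bucket that adds a delete marker instead of removing data. Markers that sit in the middle of a key's history cannot be placed correctly this way unless you interleave copies and deletes in LastModified order.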
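The delete-marker step can be sketched as a small function that reads the same versions.json listing. It is a sketch, not a complete migration tool: the function name recreate_latest_delete_markers is hypothetical, and it only recreates markers flagged IsLatest, which reproduces the visible state of each key but not where older markers sat in its history.

```bash
#!/usr/bin/env bash

# recreate_latest_delete_markers LISTING DEST_BUCKET  (hypothetical helper)
# Run only after all object versions have been copied.
recreate_latest_delete_markers() {
  local listing="$1"      # JSON output of list-object-versions
  local dest_bucket="$2"
  local row key

  # Select markers that hide their key in the source; base64 keeps
  # unusual keys on a single line while they pass through the loop.
  jq -r '(.DeleteMarkers // [])[] | select(.IsLatest) | .Key | @base64' "$listing" |
  while read -r row; do
    key=$(printf '%s' "$row" | base64 --decode)
    # No --version-id: on a versioned bucket this adds a delete marker.
    aws s3api delete-object --bucket "$dest_bucket" --key "$key"
  done
}

# Usage:
# recreate_latest_delete_markers versions.json my-destination-bucket
```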

For very large buckets, a sequential loop that makes one copy-object call per version becomes slow and hard to resume after a failure. In production, the better options are:

S3 Replication for ongoing bucket-to-bucket version-aware copying
S3 Batch Operations for large one-off jobs
AWS DataSync when the job includes other storage systems or repeated transfers

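S3 Replication is the cleanest option when you need AWS to keep mirroring new object versions after the initial setup. It requires versioning on both buckets, only applies to objects written after the rule exists (S3 Batch Replication covers objects that already exist), and replicates delete markers only when delete marker replication is enabled. Batch Operations is better when you already have an inventory or a known migration scope and just want AWS to execute the copies at scale.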
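For the replication route, a minimal configuration might look like the following. This is a sketch under stated assumptions: configure_replication is a hypothetical helper, the rule ID is arbitrary, the role ARN in the usage line is a placeholder for an IAM role that S3 can assume to read the source and write to the destination, and the single rule replicates every object plus delete markers. Applying it does not copy versions that already exist.

```bash
#!/usr/bin/env bash

# configure_replication SOURCE_BUCKET DEST_BUCKET ROLE_ARN  (hypothetical helper)
# Both buckets must already have versioning enabled.
configure_replication() {
  local src="$1" dest="$2" role_arn="$3"
  local config rc
  config=$(mktemp)

  # Unquoted heredoc so $role_arn and $dest expand.
  # An empty Filter applies the rule to every object in the bucket.
  cat > "$config" <<EOF
{
  "Role": "$role_arn",
  "Rules": [
    {
      "ID": "replicate-all-objects",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Enabled" },
      "Destination": { "Bucket": "arn:aws:s3:::$dest" }
    }
  ]
}
EOF

  aws s3api put-bucket-replication \
    --bucket "$src" \
    --replication-configuration "file://$config"
  rc=$?
  rm -f "$config"
  return "$rc"
}

# Usage (the role ARN is a placeholder):
# configure_replication my-source-bucket my-destination-bucket \
#   arn:aws:iam::123456789012:role/example-s3-replication-role
```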

What gets preserved and what does not

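A version-aware copy preserves the object bytes, user-defined metadata and object tags (copy-object copies both by default), and the fact that multiple historical versions existed. It does not preserve the original version IDs. Each destination version's LastModified time is the time of the copy, storage class and ACLs are not carried over unless you set them on the copy request, and replication status will not match between buckets.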
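To check that history actually arrived, one option is to list the destination the same way as the source and compare how many versions each key has. This is a minimal sketch with hypothetical function names; it compares counts only (version IDs are expected to differ) and ignores delete markers.

```bash
#!/usr/bin/env bash

# version_counts LISTING  (hypothetical helper)
# Prints "<count> <json-quoted key>" for every key in a list-object-versions file.
version_counts() {
  jq -r '(.Versions // [])[] | .Key | tojson' "$1" | sort | uniq -c
}

# compare_version_counts SRC_LISTING DEST_LISTING  (hypothetical helper)
compare_version_counts() {
  local src_listing="$1" dest_listing="$2"
  # diff prints any keys whose version counts differ.
  if diff <(version_counts "$src_listing") <(version_counts "$dest_listing"); then
    echo "version counts match"
  else
    echo "version counts differ" >&2
    return 1
  fi
}
```

Usage, after listing the destination with the same prefix:

```bash
aws s3api list-object-versions --bucket my-destination-bucket \
  --prefix documents/ --output json > dest-versions.json
compare_version_counts versions.json dest-versions.json
```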

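If compliance or audit workflows depend on exact historical identity, document that limitation before migrating. In many teams, "preserve recoverability" is enough. In others, "preserve exact historical identifiers" is the real requirement, and a copy-based migration will not satisfy it. S3 Replication is the exception: replicas keep the source object's version ID, and S3 Batch Replication can apply a replication rule to objects that already exist.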

Common Pitfalls

Using aws s3 sync and expecting old versions to appear in the destination. It only transfers the current version.
Forgetting to enable destination bucket versioning before the copy. That collapses all history into a single latest object.
Ignoring delete markers. The copied bucket may show files that appear deleted in the source.
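Assuming version IDs stay the same. With copy-object they do not; S3 assigns new version IDs in the destination. (Replicas created by S3 Replication are the exception and keep the source version IDs.)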
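Skipping encoding checks for unusual keys. The key in a copy source has to be URL-encoded exactly once. The AWS CLI encodes the key part of a string --copy-source value for you, so encoding it again yourself produces a double-encoded key; test with keys containing spaces, +, % or non-ASCII characters before running the full job.
Copying versions newest first. list-object-versions lists each key's history newest first, and the last copy wins, so copying in that order leaves the oldest source version as the latest destination version.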

Summary

Versioned S3 buckets need a version-aware migration process, not just aws s3 cp or aws s3 sync.
Use list-object-versions to enumerate every object version and copy each one explicitly.
Enable versioning on the destination bucket before writing anything.
Recreate delete markers if you need the destination to match the visible state of the source.
For large or ongoing migrations, prefer S3 Replication or S3 Batch Operations over ad hoc shell loops.