Exclude multiple folders using AWS S3 sync

Amazon Simple Storage Service (Amazon S3) is a widely adopted cloud storage service that provides an interface for storing and retrieving virtually unlimited amounts of data. One common use case for S3 is synchronizing a local directory to an S3 bucket. However, in some scenarios, it becomes necessary to exclude certain folders while performing a sync operation. This article explores how to exclude multiple folders when using the aws s3 sync command.

Understanding AWS S3 Sync

The aws s3 sync command recursively copies new and updated files from a source to a destination. The pair can be a local directory and an S3 bucket, in either direction, or two S3 buckets. Files that exist only in the destination are removed only when the --delete flag is given. The command is part of the AWS Command Line Interface (CLI), which provides further options for managing data in S3.

Basic Syntax

The basic syntax for the aws s3 sync command is as follows:

bash
aws s3 sync <source> <destination> <options>
  • <source>: The origin of the files (local directory or S3 bucket).
  • <destination>: The location where files are to be synced (S3 bucket or local directory).
  • <options>: Various flags and options to modify the behavior of the sync process.

Excluding Folders

At times, it is necessary to exclude certain folders during sync to avoid uploading unnecessary files or to adhere to certain restrictions or policies. AWS provides the --exclude option that allows users to specify which objects to omit from the synchronization process.

Excluding Multiple Folders

To exclude multiple folders, you can use the --exclude option repeatedly. This option uses Unix-style glob patterns to specify filenames and directories to exclude.

Example: Exclude Multiple Folders

Suppose you want to sync a local directory to an S3 bucket but exclude folders named logs and temp. Below is an example of how to do this:

bash
aws s3 sync /path/to/local/dir s3://your-bucket-name \
  --exclude 'logs/*' \
  --exclude 'temp/*'

In this example:

  • logs/*: Excludes all files and subdirectories within the logs folder.
  • temp/*: Excludes all files and subdirectories within the temp folder.
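
When the list of folders grows, repeating --exclude by hand becomes error-prone. The following is a minimal sketch, assuming bash and the same placeholder path and bucket name used above, that builds one --exclude flag per folder from an array. The folder name cache is a hypothetical addition for illustration. The sketch adds --dryrun, which makes the CLI report what it would transfer without copying anything, and it only prints the assembled command rather than running it.

```bash
#!/usr/bin/env bash
# Placeholder source, destination, and folder names; replace with your own.
# "cache" is a hypothetical extra folder added for illustration.
src="/path/to/local/dir"
dest="s3://your-bucket-name"
folders=(logs temp cache)

# Build one --exclude flag per folder.
exclude_args=()
for folder in "${folders[@]}"; do
  exclude_args+=(--exclude "${folder}/*")
done

# --dryrun lists the planned transfers without copying anything.
cmd=(aws s3 sync "$src" "$dest" "${exclude_args[@]}" --dryrun)

# Print the assembled command for review (shown without shell quoting).
# To execute it instead, run: "${cmd[@]}"
printf '%s\n' "${cmd[*]}"
```

Keeping the arguments in an array, rather than in a single string, preserves each pattern as one argument, so the shell never expands the * before the CLI sees it.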

Wildcards and Patterns

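Patterns can combine literal text with the wildcards *, ?, [sequence], and [!sequence]. The asterisk * matches any sequence of characters, including none and including the "/" separator. The question mark ? matches exactly one character, [sequence] matches any one character listed in the sequence, and [!sequence] matches any one character not listed in it. Patterns are evaluated against each file's path relative to the source directory.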
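
To get a feel for these wildcards without touching a bucket, the sketch below tests a few patterns against sample paths using the shell's own case matching. This is an approximation, not the AWS CLI's implementation: it assumes that, for simple patterns like these, shell case matching behaves the same way (in particular, * also matches "/" and the whole path must match). The helper names matches and check and all sample paths are hypothetical.

```bash
#!/usr/bin/env bash
# matches PATTERN PATH: succeed when the whole PATH matches the glob PATTERN.
# In a case statement, * also matches "/" characters.
matches() {
  case "$2" in
    $1) return 0 ;;
    *)  return 1 ;;
  esac
}

# check PATTERN PATH: print whether the pattern matches the path.
check() {
  if matches "$1" "$2"; then
    printf '%-14s matches        %s\n' "$1" "$2"
  else
    printf '%-14s does not match %s\n' "$1" "$2"
  fi
}

check 'logs/*'       'logs/app/2024/error.log'   # * spans nested directories
check 'file?.txt'    'file1.txt'                 # ? is exactly one character
check 'file?.txt'    'file10.txt'                # two characters: no match
check 'img[0-9].png' 'img7.png'                  # [0-9] is one digit
check 'backup*/*'    'backup-2024/db.sql'        # folder at the top level
check 'backup*/*'    'data/backup-2024/db.sql'   # nested folder: no match
```

The last two lines show why a pattern such as backup*/* only covers folders at the top of the source directory: the pattern has to match from the start of the relative path.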

Example: More Complex Patterns

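Suppose you also want to exclude folders whose names begin with "backup". Add a third pattern:

bash
aws s3 sync /path/to/local/dir s3://your-bucket-name \
  --exclude 'logs/*' \
  --exclude 'temp/*' \
  --exclude 'backup*/**'
  • backup*/**: Excludes everything inside folders whose names begin with "backup" at the top level of the source directory. Because patterns are matched against the path relative to the source, a nested folder such as data/backup-old is not covered; add a pattern such as '*/backup*/*' to exclude nested folders as well. The doubled ** has no special meaning here and behaves like a single *.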

Inclusion Overrides

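If you want to include certain files that sit inside excluded folders, use the --include option. Filters are applied in the order they appear on the command line, and later filters take precedence over earlier ones, so an --include must come after the --exclude it is meant to override. All files are included by default, so an --include only has an effect on files that an earlier --exclude matched.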

Example: Include Specific Files

Suppose you need to exclude the logs folder and all .tmp files, but still upload .config files wherever they are, including inside logs:

bash
aws s3 sync /path/to/local/dir s3://your-bucket-name \
  --exclude 'logs/*' \
  --exclude '*.tmp' \
  --include '*.config'

In this setup:

  • logs/* and *.tmp: Exclude everything under the logs folder and all files with the .tmp extension.
  • *.config: Because it comes last, re-includes every file with the .config extension, even one that an earlier exclude pattern matched (for example logs/app.config).
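
The order dependence can be sketched in a few lines of shell. This is a simplified model under stated assumptions, not the CLI's own code: every file starts out included, filters are checked in order, and the last filter whose pattern matches decides the outcome. The function names is_synced and report, the "exclude:<pattern>" argument format, and the sample paths are all hypothetical.

```bash
#!/usr/bin/env bash
# is_synced PATH FILTER...
# Each FILTER is "exclude:<pattern>" or "include:<pattern>".
# Files start out included; the last matching filter wins.
is_synced() {
  local path=$1 decision=include filter kind pattern
  shift
  for filter in "$@"; do
    kind=${filter%%:*}      # text before the first ":"
    pattern=${filter#*:}    # text after the first ":"
    case "$path" in
      $pattern) decision=$kind ;;
    esac
  done
  [ "$decision" = include ]
}

# report PATH FILTER...: print "sync" or "skip" for the path.
report() {
  if is_synced "$@"; then echo "sync  $1"; else echo "skip  $1"; fi
}

# Include after exclude: the .config file inside logs is synced.
report 'logs/app.config' 'exclude:logs/*' 'include:*.config'
# Same filters in the opposite order: the exclude wins, the file is skipped.
report 'logs/app.config' 'include:*.config' 'exclude:logs/*'
# Only the exclude matches: skipped.
report 'logs/error.log'  'exclude:logs/*' 'include:*.config'
# No filter matches: synced by default.
report 'src/main.py'     'exclude:logs/*' 'include:*.config'
```

Swapping the two filters in the second call is enough to flip the result, which is why the position of --include on the command line matters.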

Summary Table

Below is a summary table illustrating key patterns for the aws s3 sync command:

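Pattern               | Meaning
----------------------|--------------------------------------------------------------
logs/*                | Excludes everything under logs, including subdirectories (* also matches "/").
logs/**               | Same effect as logs/*; ** has no special meaning in these patterns.
*.tmp                 | Excludes all files with a .tmp extension, in any folder.
!important/*          | A leading ! for negation is not supported by the AWS CLI; use --include instead.
backup*/**            | Excludes everything inside top-level folders whose names begin with "backup".
--include '*.config'  | Re-includes all .config files, overriding earlier exclusions.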

Conclusion

Synchronizing data with AWS S3 while excluding certain folders is a straightforward yet very useful feature, particularly in complex environments where only essential data should be transferred. By leveraging the --exclude and --include options with patterns, you can precisely control the behavior of data sync operations. Knowing how to effectively use these options can streamline your data management tasks in AWS, saving both time and resources.

