Exclude multiple folders using AWS S3 sync
Amazon Simple Storage Service (Amazon S3) is a widely adopted cloud storage service that provides an interface for storing and retrieving virtually unlimited amounts of data. One common use case for S3 is synchronizing a local directory to an S3 bucket. However, in some scenarios, it becomes necessary to exclude certain folders while performing a sync operation. This article explores how to exclude multiple folders when using the aws s3 sync command.
Understanding AWS S3 Sync
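The aws s3 sync command recursively copies new and updated files from a source to a destination, either between a local system and an S3 bucket or between two S3 buckets. By default it does not remove anything; files that exist only in the destination are deleted only when the --delete option is given. The command is part of the AWS Command Line Interface (CLI), which provides a range of options for managing data in S3.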
Basic Syntax
The basic syntax for the aws s3 sync command is aws s3 sync <source> <destination> [<options>], where:
- <source>: The origin of the files (local directory or S3 bucket).
- <destination>: The location where files are to be synced (S3 bucket or local directory).
- <options>: Various flags and options to modify the behavior of the sync process.
Excluding Folders
At times, it is necessary to exclude certain folders during sync to avoid uploading unnecessary files or to adhere to certain restrictions or policies. AWS provides the --exclude option that allows users to specify which objects to omit from the synchronization process.
Excluding Multiple Folders
To exclude multiple folders, you can use the --exclude option repeatedly. This option uses Unix-style glob patterns to specify filenames and directories to exclude.
Example: Exclude Multiple Folders
Suppose you want to sync a local directory to an S3 bucket but exclude folders named logs and temp. Below is an example of how to do this:
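A minimal sketch of such a command, wrapped in a shell function so that the paths stay as parameters. The function name is hypothetical, as are the directory and bucket in the usage note. Any extra arguments are passed through unchanged, for instance --dryrun, which previews the operations without performing them.

```shell
# Sync a source to a destination while skipping the logs and temp folders.
# $1 = source, $2 = destination; any remaining arguments are passed through.
# The function name is illustrative, not part of the AWS CLI.
sync_without_logs_and_temp() {
  src=$1
  dest=$2
  shift 2
  # Quote each pattern so the shell does not expand it before the CLI sees it.
  aws s3 sync "$src" "$dest" \
    --exclude "logs/*" \
    --exclude "temp/*" \
    "$@"
}
```

For example, sync_without_logs_and_temp ./my-project s3://my-example-bucket/my-project --dryrun would list what the sync intends to upload.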
In this example:
- logs/*: Excludes all files and subdirectories within the logs folder.
- temp/*: Excludes all files and subdirectories within the temp folder.
Wildcards and Patterns
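You can use partial filenames or wildcard characters (*, ?, and []) to create patterns. The asterisk * matches any sequence of characters, including none and including the / separator, while the question mark ? matches exactly one character. A bracket expression [sequence] matches any single character in the sequence, and [!sequence] matches any single character not in it. Patterns are evaluated relative to the source directory.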
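The wildcard behavior described above can be tried out locally before running a sync. The sketch below uses the shell's own case matching as a stand-in for the CLI's matcher. That is an assumption: the two agree on simple patterns like these, but this is an approximation, not the CLI's implementation. The helper name glob_matches and the sample paths are made up for illustration.

```shell
# Return success (0) when path $1 matches glob pattern $2.
# Assumption: the shell's case matching approximates the CLI's matcher
# for simple patterns; in both, * also matches the / separator.
glob_matches() {
  case "$1" in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

# Each path below is written relative to a hypothetical source directory.
glob_matches "logs/app/debug.log" "logs/*" && echo "logs/* matches a nested file"
glob_matches "temp/a.txt" "temp/?.txt" && echo "? matches exactly one character"
glob_matches "data/file1.csv" "data/file[0-9].csv" && echo "[0-9] matches one digit"
glob_matches "src/logs/x.log" "logs/*" || echo "logs/* does not match a nested logs folder"
```

The last check reflects the fact that a pattern is matched against the whole relative path, so logs/* only covers a logs folder directly under the source.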
Example: More Complex Patterns
Suppose we also want to exclude folders whose names begin with "backup". An additional pattern handles this:
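A sketch of the extended command, again as a hypothetical wrapper function that takes the source and destination as parameters. It keeps the logs and temp exclusions from the previous example and adds the backup pattern.

```shell
# Sync while skipping logs, temp, and folders whose names start with "backup".
# $1 = source, $2 = destination; any remaining arguments are passed through.
# The function name is illustrative, not part of the AWS CLI.
sync_without_logs_temp_backups() {
  src=$1
  dest=$2
  shift 2
  aws s3 sync "$src" "$dest" \
    --exclude "logs/*" \
    --exclude "temp/*" \
    --exclude "backup*/**" \
    "$@"
}
```

Appending --dryrun to the call is a low-risk way to confirm which files the patterns actually skip.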
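backup*/**: Excludes every folder directly under the source directory whose name begins with "backup", together with everything inside it. Because patterns are evaluated relative to the source directory, a nested folder such as src/backup1 is not matched; an additional pattern such as */backup*/* could cover those.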
Inclusion Overrides
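If you want to include certain files inside otherwise excluded folders, add an --include option after the --exclude it should override. Filters are applied in the order they appear on the command line, and a later filter takes precedence over an earlier one, so an --include placed before the matching --exclude has no effect.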
Example: Include Specific Files
Suppose you need to exclude all .tmp files, but include .config files in all directories:
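A minimal sketch of this combination, written as a hypothetical wrapper function. The --include is placed after the --exclude so that it takes precedence wherever the two overlap.

```shell
# Sync while skipping .tmp files and explicitly keeping .config files.
# $1 = source, $2 = destination; any remaining arguments are passed through.
# The function name is illustrative, not part of the AWS CLI.
sync_without_tmp_keep_config() {
  src=$1
  dest=$2
  shift 2
  # Order matters: later filters take precedence over earlier ones.
  aws s3 sync "$src" "$dest" \
    --exclude "*.tmp" \
    --include "*.config" \
    "$@"
}
```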
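In this setup:
- *.tmp: Excludes all files with the .tmp extension.
- *.config: Re-includes files with the .config extension. Because --include comes after --exclude, it takes precedence for any .config file that an earlier exclude pattern would otherwise drop. Files that no exclude pattern matches are synced by default and need no --include.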
Summary Table
Below is a summary table illustrating key patterns for the aws s3 sync command:
| Pattern | Meaning |
| --- | --- |
| logs/* | Excludes everything under the logs folder, including subdirectories, because * also matches the / separator. |
| logs/** | Behaves the same as logs/* in the AWS CLI. |
| *.tmp | Excludes all files with a .tmp extension. |
| !important/* | A leading ! is not supported by the AWS CLI; use --include to negate an exclusion. |
| backup*/** | Excludes folders directly under the source whose names start with "backup", and their contents. |
| --include '*.config' | Re-includes .config files matched by earlier --exclude patterns. |
Conclusion
Synchronizing data with AWS S3 while excluding certain folders is a straightforward yet very useful feature, particularly in complex environments where only essential data should be transferred. By leveraging the --exclude and --include options with patterns, you can precisely control the behavior of data sync operations. Knowing how to effectively use these options can streamline your data management tasks in AWS, saving both time and resources.

