aws s3 sync is a powerful tool for keeping local directories and S3 buckets in sync, but its behavior with exclude patterns and incremental updates can be surprisingly nuanced.

Let’s see it in action. Imagine we have a local directory my-local-data with a few files:

my-local-data/
├── file1.txt
├── file2.log
└── important/
    └── config.yaml

And a corresponding S3 bucket my-s3-bucket.

First sync:

aws s3 sync my-local-data s3://my-s3-bucket

This will upload all files. Now, let’s say we want to sync again, but we want to exclude all .log files. We might try this:

aws s3 sync my-local-data s3://my-s3-bucket --exclude "*.log"

This does what it looks like: sync skips every local file that matches *.log and uploads the rest. It does not remove the copy of file2.log that the first sync already uploaded, because sync never deletes anything from the destination unless you pass --delete. The object stays in the bucket and simply stops being updated.

The counter-intuitive part is the opposite of what many expect. It is tempting to think sync always makes the destination an exact mirror of the source, so that excluding a file would also remove it from the bucket. Without --delete, sync only adds and updates. Objects that exist only in the destination, or that are filtered out, are left alone.

The core problem aws s3 sync solves is efficiently mirroring a directory structure between your local machine and an S3 bucket. It does this by performing an incremental update: it compares the source and destination, identifies differences (new files, modified files, missing files), and then performs only the necessary operations (uploads or downloads, plus deletes when --delete is given) to bring the destination in line with the source.

Here’s how you control it. The --exclude and --include options are pattern-based filters with a glob-like syntax (*, ?, [sequence], [!sequence]). Every file is included by default. Filters are applied in the order they appear on the command line, and when more than one matches a file, the one that appears last wins.

The order of --exclude and --include arguments therefore matters. For example, --exclude "*" --include "*.txt" syncs only .txt files, while --include "*.txt" --exclude "*" syncs nothing, because the final --exclude "*" overrides everything before it.

Let’s say you want to sync everything except .log files, but you do want to include a specific .log file named important.log.

aws s3 sync my-local-data s3://my-s3-bucket --exclude "*.log" --include "important.log"

In this case, *.log excludes every .log file, and the later --include "important.log" re-includes that one file. It wins because it comes later on the command line, not because it is more specific. With the two options swapped, --exclude "*.log" would have the last word and important.log would be skipped as well.
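The precedence rule can be modeled in a few lines of shell. This is only a sketch of the behavior described in the AWS CLI documentation (every file starts out included, and the last filter that matches decides), not the CLI's own code. is_included is a hypothetical helper, and shell case patterns stand in for the CLI's glob matching.

```shell
# is_included PATH [exclude|include PATTERN]...
# Hypothetical helper: succeeds if PATH would be synced under the given filters.
is_included() {
    path=$1
    shift
    verdict=include                 # default: every file is included
    while [ "$#" -ge 2 ]; do
        kind=$1
        pattern=$2
        shift 2
        # Unquoted $pattern is used as a glob; in case, '*' also matches '/'.
        case $path in
            $pattern) verdict=$kind ;;   # a later match overwrites an earlier one
        esac
    done
    [ "$verdict" = include ]
}
```

With the filters from the example, is_included important.log exclude '*.log' include 'important.log' succeeds, while the same call for file2.log fails. Swapping the two filters makes important.log fail too.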

The "incremental" part of sync relies on comparing metadata, primarily file size and last modified timestamp. For S3, sync fetches a list of objects and their metadata. For the local directory, it reads file metadata. It then compares these.

If a file exists in both source and destination:

  • If the sizes differ, or the source copy has a newer last-modified time than the destination copy, the file is considered modified and will be uploaded from source to destination. (With --size-only, only the size is compared.)
  • Otherwise, the file is skipped.
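For an upload, this check can be sketched as a small predicate. needs_upload is a hypothetical helper, not part of the CLI, and it assumes the documented default rule for local-to-S3 syncs: copy when the sizes differ or the local file is newer. Timestamps are taken as epoch seconds.

```shell
# needs_upload LOCAL_SIZE LOCAL_MTIME REMOTE_SIZE REMOTE_MTIME
# Hypothetical helper: succeeds if the local file would be uploaded.
needs_upload() {
    # Different size, or the local copy was modified after the S3 object.
    [ "$1" -ne "$3" ] || [ "$2" -gt "$4" ]
}
```

Under this model, a local file with the same size and an older timestamp than the S3 object is skipped.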

The most surprising thing about aws s3 sync for many is how it handles deletions when --delete is used in conjunction with --exclude or --include. With --delete, sync removes destination files that do not exist in the source. Files matched by an exclude filter are exempt: the AWS CLI documentation notes that files excluded by filters are excluded from deletion. So with --delete and --exclude "*.log", a stale .txt object that no longer exists locally is deleted from the bucket, but .log objects in the bucket are left alone whether or not they still exist locally. The filters apply to what sync considers for deletion as well as to what it considers for upload.
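A toy model can make the deletion rule concrete. It assumes the documented behavior that --delete removes destination files missing from the source and that files excluded by filters are never deleted. would_delete is a hypothetical helper that works on plain newline-separated key lists and a single exclude pattern. It does not talk to S3.

```shell
# would_delete EXCLUDE_PATTERN LOCAL_KEYS REMOTE_KEYS
# Hypothetical helper: prints the remote keys that --delete would remove.
would_delete() {
    pattern=$1
    local_keys=$2
    remote_keys=$3
    printf '%s\n' "$remote_keys" | while IFS= read -r key; do
        [ -n "$key" ] || continue
        # Keys matched by the exclude filter are never considered for deletion.
        case $key in
            $pattern) continue ;;
        esac
        # Keys that still exist in the source are kept.
        if printf '%s\n' "$local_keys" | grep -qxF -- "$key"; then
            continue
        fi
        printf '%s\n' "$key"
    done
}
```

Given local keys file1.txt and file2.log, remote keys file1.txt, old.log, and stale.txt, and the pattern '*.log', only stale.txt is printed. old.log survives because the filter shields it.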

Filters also let you skip a whole directory while still syncing individual files, but only if you remember that patterns are matched against paths relative to the source directory.

Consider this scenario: you have a logs directory, and you want to sync everything in my-local-data except the logs directory, but you do want to sync a specific-log.txt file that happens to be in the root.

aws s3 sync my-local-data s3://my-s3-bucket --exclude "logs/*" --include "specific-log.txt"

This will sync file1.txt, file2.log, important/config.yaml, and specific-log.txt, and it will skip everything under logs/. The --include is redundant here, since specific-log.txt in the root was never excluded. If specific-log.txt lived inside the logs directory instead, --include "specific-log.txt" still would not cause it to be uploaded: patterns are matched against the path relative to the source directory, and specific-log.txt does not match logs/specific-log.txt. You would need --include "logs/specific-log.txt", placed after the --exclude so that it takes precedence.

Misunderstanding how --delete interacts with --exclude and --include is a common source of data loss or unexpected state. Always test --delete operations with a dry run (--dryrun) first, especially when complex filtering is involved.
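One way to make the dry run a habit is a small wrapper. preview_sync is a hypothetical helper name. The options it passes (--delete, --dryrun, and whatever filters you add) are real aws s3 sync options. With --dryrun, the CLI prints the operations it would perform without executing them.

```shell
# preview_sync SRC DST [extra sync options...]
# Hypothetical helper: shows what a --delete sync would do, without changing anything.
preview_sync() {
    src=$1
    dst=$2
    shift 2
    aws s3 sync "$src" "$dst" --delete "$@" --dryrun
}
```

For example, preview_sync my-local-data s3://my-s3-bucket --exclude "*.log" lists the uploads and deletes that the real command would perform. Drop --dryrun only after the list looks right.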

The next common pitfall is understanding how sync handles empty directories. S3 has no real directories, only object keys that happen to contain slashes, so aws s3 sync has nothing to upload for an empty local directory and skips it. The --acl bucket-owner-full-control flag does not change this; it only sets the ACL on the objects that are uploaded. To make the directory show up in the bucket, ensure there’s at least one file within it, such as a placeholder.
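A placeholder file in each empty directory can be created with find before syncing. This is a sketch under two assumptions: the placeholder name .keep is arbitrary (any file name works), and find supports -empty, which GNU and BSD find do but POSIX does not require. add_placeholders is a hypothetical helper.

```shell
# add_placeholders DIR
# Hypothetical helper: creates a zero-byte .keep file in every empty directory under DIR.
add_placeholders() {
    find "$1" -type d -empty -exec sh -c 'touch "$1/.keep"' sh {} \;
}
```

Running add_placeholders my-local-data before the sync gives every empty directory one object to upload, so its prefix appears in the bucket.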
