S3 Analytics (officially, S3 Storage Class Analysis) is often dismissed as just a reporting tool. In practice it is the data source for optimizing storage costs: it shows at what age your data stops being read, so you know when it can move to a cheaper storage tier.

Let’s see how it works. Imagine you have an S3 bucket my-production-bucket storing logs generated by your web servers. Over time, these logs become less frequently accessed but are still important for compliance or historical analysis. You want to move them to S3 Standard-Infrequent Access (S3 Standard-IA) to save money.

Here’s the setup:

aws s3api put-bucket-analytics-configuration \
    --bucket my-production-bucket \
    --id DailyLogAnalysis \
    --analytics-configuration '{
        "Id": "DailyLogAnalysis",
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": "arn:aws:s3:::my-analytics-destination-bucket",
                        "Prefix": "s3-analytics/daily-logs/"
                    }
                }
            }
        }
    }'

This command enables S3 Storage Class Analysis for my-production-bucket. There is no separate on/off flag: the analysis is active for as long as the configuration exists. The DataExport section configures where the results are sent: to a separate bucket, referenced by its ARN (arn:aws:s3:::my-analytics-destination-bucket), under the prefix s3-analytics/daily-logs/. CSV is the only export format the API accepts. The destination bucket also needs a bucket policy that allows S3 to write the export files into it.

Once enabled, S3 begins collecting data. It doesn’t tell you "move this object." Instead, it gathers statistics over time and needs at least 30 days of observation before it has enough information to make a recommendation. The analysis runs automatically; you don’t need to trigger it. The key is that S3 relates actual read activity to object age rather than looking only at LastModified dates: data can be old but still frequently read, which makes it unsuitable for IA.

After a few days, you’ll start seeing files appear in my-analytics-destination-bucket. You can list them:

aws s3 ls s3://my-analytics-destination-bucket/s3-analytics/daily-logs/

You’ll find CSV files, exported daily. Storage Class Analysis does not report on individual objects: each row summarizes one object age group within one storage class. The header row has these columns:

Date,ConfigId,Filter,StorageClass,ObjectAge,ObjectCount,DataUploaded_MB,Storage_MB,DataRetrieved_MB,GetRequestCount,CumulativeAccessRatio,ObjectAgeForSIATransition,RecommendedObjectAgeForSIATransition

ConfigId is the ID you chose (DailyLogAnalysis). ObjectAge is an age group in days, such as 000-014, 030-044, or 730+, plus an ALL row that covers every age. RecommendedObjectAgeForSIATransition, where populated, is S3’s own suggestion for the age at which to transition to Standard-IA.

The crucial columns here are ObjectAge, Storage_MB, DataRetrieved_MB, and GetRequestCount. S3 Storage Class Analysis calculates them from the GET requests it actually observed during the analysis period. If an age group holds a lot of data (high Storage_MB) but almost none of it is read (DataRetrieved_MB and GetRequestCount near zero), objects of that age are prime candidates for a cheaper tier like S3 Standard-IA.

To act on this, you’d typically parse these CSV files, find the age beyond which retrieval stays low for the STANDARD storage class (e.g., everything older than 60 days), and then use a lifecycle policy to transition objects once they reach that age.
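The parsing step can be sketched in a few lines of Python. This is a minimal sketch, assuming the report is the CSV export with StorageClass, ObjectAge, Storage_MB, and DataRetrieved_MB columns. The sample rows are invented for illustration and trimmed to the columns used here, and the helper names and the 5% threshold are hypothetical choices, not something S3 prescribes.

```python
import csv
import io

# Invented sample rows, trimmed to the columns this sketch reads.
# A real export has more columns and real numbers.
SAMPLE_EXPORT = """\
StorageClass,ObjectAge,Storage_MB,DataRetrieved_MB
STANDARD,000-014,1500.0,900.0
STANDARD,015-029,1400.0,210.0
STANDARD,030-044,1300.0,78.0
STANDARD,045-059,1200.0,36.0
STANDARD,060-074,1100.0,5.5
STANDARD,ALL,6500.0,1229.5
"""


def age_group_start(age_group: str) -> int:
    """Lower bound in days of a label such as '030-044' or '730+'."""
    return int(age_group.rstrip("+").split("-")[0])


def retrieval_ratios(report_csv: str, storage_class: str = "STANDARD") -> dict:
    """Map each age group's starting day to DataRetrieved_MB / Storage_MB."""
    ratios = {}
    for row in csv.DictReader(io.StringIO(report_csv)):
        # Skip other storage classes and the ALL summary row.
        if row["StorageClass"] != storage_class or row["ObjectAge"] == "ALL":
            continue
        stored = float(row["Storage_MB"] or 0)
        retrieved = float(row["DataRetrieved_MB"] or 0)
        if stored > 0:
            ratios[age_group_start(row["ObjectAge"])] = retrieved / stored
    return ratios


def suggest_transition_days(ratios: dict, max_ratio: float = 0.05):
    """Smallest age from which every older group stays at or below max_ratio.

    Returns None when even the oldest group is read too often.
    """
    candidate = None
    for start in sorted(ratios, reverse=True):  # walk from oldest to youngest
        if ratios[start] > max_ratio:
            break
        candidate = start
    return candidate


ratios = retrieval_ratios(SAMPLE_EXPORT)
print(suggest_transition_days(ratios))
```

Walking from the oldest group toward the youngest avoids picking an age where a quiet group is followed by a busier, older one. The result is only a starting point for the Days value in a lifecycle rule; the threshold should reflect your own retrieval costs.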

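Here’s a lifecycle policy that moves objects under logs/webserver/ to S3 Standard-IA 30 days after creation and then to Glacier Instant Retrieval at 90 days, another 60 days later. Lifecycle transitions are driven purely by object age; they do not check whether an object has been accessed recently, which is why the analysis matters when you choose the thresholds: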

{
  "Rules": [
    {
      "ID": "MoveLogsToIA",
      "Filter": {
        "Prefix": "logs/webserver/"
      },
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 30,
          "StorageClass": "STANDARD_IA"
        }
      ]
    },
    {
      "ID": "MoveLogsToGlacierIR",
      "Filter": {
        "Prefix": "logs/webserver/"
      },
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "GLACIER_IR"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 90,
          "StorageClass": "GLACIER_IR"
        }
      ]
    }
  ]
}

These two rules are applied to the bucket as a single lifecycle configuration (for example with aws s3api put-bucket-lifecycle-configuration). The Days value in Transitions counts days since the object was created, and it is the number the analysis helps you choose. If your analysis shows objects are still being read after 30 days but are rarely touched after 60, you might change the STANDARD_IA transition to Days: 60. The NoncurrentVersionTransitions entries only take effect if the bucket is versioned. The analysis report is the input to your lifecycle policy decisions, not an automated action itself.
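Turning a chosen threshold into a lifecycle document can be sketched as follows. This is an illustration under stated assumptions: build_log_lifecycle is a hypothetical helper, it folds both transitions into one rule (S3 accepts several transitions per rule), and it leaves out NoncurrentVersionTransitions on the assumption that the bucket is not versioned. The 30-day minimum it enforces is the limit S3 applies to STANDARD_IA transitions.

```python
import json


def build_log_lifecycle(prefix: str, ia_days: int, glacier_ir_days: int) -> dict:
    """Build a lifecycle configuration with two age-based transitions."""
    if ia_days < 30:
        # S3 rejects STANDARD_IA transitions earlier than 30 days.
        raise ValueError("STANDARD_IA transitions need Days >= 30")
    if glacier_ir_days <= ia_days:
        raise ValueError("the GLACIER_IR transition must come after STANDARD_IA")
    return {
        "Rules": [
            {
                "ID": "TierLogsByAge",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_ir_days, "StorageClass": "GLACIER_IR"},
                ],
            }
        ]
    }


# Example: the analysis suggested reads drop off after about 60 days.
config = build_log_lifecycle("logs/webserver/", ia_days=60, glacier_ir_days=120)
print(json.dumps(config, indent=2))
```

Saving the printed JSON to a file such as lifecycle.json lets you apply it with aws s3api put-bucket-lifecycle-configuration --bucket my-production-bucket --lifecycle-configuration file://lifecycle.json. Keep in mind that this call replaces the bucket’s entire existing lifecycle configuration.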

The most counterintuitive aspect of S3 Storage Class Analysis is that it never tells you when a particular object was last accessed. Every figure is a total for a whole group of objects of similar age. This means a few frequently read objects can make an entire age group look "recently accessed," and a group that looks cold can still contain an object that someone reads every day.
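A toy calculation makes this limitation concrete. The object counts, sizes, and read counts below are invented; the point is only that a total per age group can hide very uneven access inside the group.

```python
# Hypothetical age group: 1,000 log objects of 1 MB each, all of similar age.
object_sizes_mb = [1.0] * 1000

# 999 objects were never read; one object was downloaded 200 times.
reads_per_object = [0] * 999 + [200]

storage_mb = sum(object_sizes_mb)
retrieved_mb = sum(
    size * reads for size, reads in zip(object_sizes_mb, reads_per_object)
)

# What an aggregated report can show: retrieved data relative to stored data.
group_ratio = retrieved_mb / storage_mb

# What it cannot show: how many objects account for that retrieval.
never_read_share = reads_per_object.count(0) / len(reads_per_object)

print(f"group retrieval ratio: {group_ratio:.2f}")
print(f"objects never read:    {never_read_share:.1%}")
```

A group like this looks moderately active in aggregate even though almost every object in it is idle. Narrowing the analysis to a tighter prefix is one way to keep such outliers from skewing the picture.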

The next thing you’ll likely encounter is the need to automate the parsing of these Storage Class Analysis reports and to update lifecycle policies as new cost-saving opportunities appear.
