The most surprising thing about S3 Glacier restore is that the "expedited" option isn’t always faster than "standard."
Let’s see it in action. Imagine you’ve got a few terabytes of data archived in Glacier, and you need a critical report from yesterday. You’re looking at the restore options:
- Expedited: "Restores data from Amazon S3 Glacier Flexible Retrieval in minutes (1–5 minutes)."
- Standard: "Restores data from Amazon S3 Glacier Flexible Retrieval in 3–5 hours."
- Bulk: "Restores data from Amazon S3 Glacier Flexible Retrieval in 5–12 hours."
You’d instinctively pick Expedited, right? But here’s the catch: on-demand Expedited retrievals draw on shared capacity and are subject to demand. If many accounts are requesting Expedited restores at the same time, S3 can reject your request outright, and very large objects take longer than the quoted few minutes anyway. By the time you have retried, your "minutes" can stretch into hours, potentially longer than a Standard restore would have taken. Standard and Bulk are not dedicated capacity either, but they draw on much larger pools, so their completion times are more predictable. Note that all of the figures above are typical times, not SLAs.
The system works by copying your archived data from the deep storage layer of Glacier to a temporary, readily accessible location that S3 can serve from. The archived original stays where it is. The tiers dictate how much effort and what kind of resources S3 allocates to this retrieval process.
- Expedited: This tier uses a small pool of high-priority, on-demand processing resources. It’s like flagging your request to the front of a very short, but sometimes busy, express lane. It’s designed for those "I need this now, and I’m willing to pay a premium" scenarios. The cost is significantly higher per GB and per request.
- Standard: This is the workhorse. It uses a larger, more predictable pool of resources. It’s the "normal" lane: you get in line, and your request is typically processed within the quoted window. The cost is moderate.
- Bulk: This tier is the most cost-effective for large amounts of data. It uses the lowest priority processing, meaning your request might take longer to start but will eventually be fulfilled. It’s like the cargo shipping lane – cheapest, but takes the longest.
Your mental model should be one of resource allocation. Expedited is like a high-performance compute cluster you temporarily borrow for a few minutes; Standard is like a standard EC2 instance; Bulk is like a batch processing job. The underlying data is the same, but the pathway to access it varies drastically in cost and speed.
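To make the trade-off concrete, here is a minimal sketch of picking the cheapest tier that still fits a deadline. The time windows are the typical figures quoted above, the cost ordering (Bulk, then Standard, then Expedited) follows the tier descriptions, and the names `TYPICAL_WINDOW` and `pick_tier` are made up for this illustration. It deliberately ignores Expedited contention, so treat the result as a starting point rather than a guarantee.

```python
from datetime import timedelta
from typing import Optional

# Typical completion windows quoted above for S3 Glacier Flexible Retrieval.
# These are typical figures, not guarantees.
TYPICAL_WINDOW = {
    "Bulk": (timedelta(hours=5), timedelta(hours=12)),
    "Standard": (timedelta(hours=3), timedelta(hours=5)),
    "Expedited": (timedelta(minutes=1), timedelta(minutes=5)),
}

# Cheapest first, following the cost ordering described above.
CHEAPEST_FIRST = ["Bulk", "Standard", "Expedited"]


def pick_tier(deadline: timedelta) -> Optional[str]:
    """Return the cheapest tier whose typical upper bound fits the deadline.

    Returns None when even Expedited's typical upper bound is too slow.
    """
    for tier in CHEAPEST_FIRST:
        _fastest, slowest = TYPICAL_WINDOW[tier]
        if slowest <= deadline:
            return tier
    return None
```

For example, `pick_tier(timedelta(hours=6))` skips Bulk (up to 12 hours) and settles on Standard.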
When you initiate a restore, you’re essentially creating a retrieval job. You specify the object you want, how long the restored copy should stay available, and the retrieval tier. There is no destination bucket to choose: S3 places a temporary copy alongside the archived object, under the same bucket and key. S3 then queues this job.
aws s3api restore-object --bucket my-bucket --key my-archive.zip --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}}'
This command initiates a Standard restore for my-archive.zip in my-bucket (a placeholder bucket name), making the restored copy available for 7 days. The call does not return a job ID; you check progress by inspecting the object itself:
aws s3api head-object --bucket my-bucket --key my-archive.zip
While the restore is running, the Restore field in the response reads ongoing-request="true". Once it completes, the field changes to ongoing-request="false" and includes an expiry-date.
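When restoring through the S3 API, progress is reported in the `Restore` field of a HeadObject response, a string such as `ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"`. The sketch below parses that string so a script can decide whether to keep waiting. `RestoreStatus` and `parse_restore_field` are hypothetical names for this example, and the function only handles the string itself; fetching it (for instance with boto3's `head_object`) is left out.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import NamedTuple, Optional


class RestoreStatus(NamedTuple):
    in_progress: bool
    expires_at: Optional[datetime]  # only set once the restore has completed


# Matches key="value" pairs such as ongoing-request="true".
_FIELD = re.compile(r'([\w-]+)="([^"]*)"')


def parse_restore_field(value: Optional[str]) -> Optional[RestoreStatus]:
    """Parse the Restore field from a HeadObject response.

    Returns None when the field is absent, i.e. no restore was requested.
    """
    if not value:
        return None
    fields = dict(_FIELD.findall(value))
    in_progress = fields.get("ongoing-request") == "true"
    expiry = fields.get("expiry-date")
    expires_at = parsedate_to_datetime(expiry) if expiry else None
    return RestoreStatus(in_progress, expires_at)
```

A polling loop would call this on each HeadObject response and stop once `in_progress` is false.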
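The key levers you control are the Tier (Expedited, Standard, Bulk) and Days (how long the restored copy will be available in S3). The cost is a function of the tier (which sets the per-GB and per-request retrieval charges), the amount of data, and how long the restored copy is kept, since you pay for its storage on top of the archived original.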
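Since the request boils down to those two levers, a small helper can build and sanity-check the restore request before it is sent. This is a sketch: `build_restore_request` and `VALID_TIERS` are names invented here, and the validation rules are simply the constraints described in this article, not an exhaustive copy of what S3 enforces.

```python
VALID_TIERS = ("Expedited", "Standard", "Bulk")


def build_restore_request(days: int, tier: str = "Standard") -> dict:
    """Build the restore request body from the two levers: Days and Tier."""
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {VALID_TIERS}, got {tier!r}")
    if days < 1:
        raise ValueError("days must be at least 1")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}
```

The resulting dictionary has the same shape as the JSON passed to `--restore-request` on the command line, and it should also be usable as the `RestoreRequest` argument of boto3's `restore_object`.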
The most common pitfall is assuming Expedited is always the fastest. If you have a very large object or multiple large objects to restore simultaneously, and you hit a "hot" moment for Expedited restores, your request might be rejected or delayed. Expedited is designed for speed, but unless you have purchased provisioned capacity, extreme demand can cause contention. For predictable access times, especially with large datasets, Standard often offers a better balance of speed and cost if you don’t absolutely need the data in under an hour.
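One way to guard against this pitfall is to try Expedited first and fall back to Standard straight away if the request is turned down, instead of retrying Expedited indefinitely. The sketch below shows only the control flow. `ExpeditedUnavailable` is a hypothetical exception, and `restore` is a caller-supplied function that issues the actual request. S3 documents the error code `GlacierExpeditedRetrievalNotAvailable` for insufficient Expedited capacity; translating that error into the exception used here is assumed to happen inside `restore`.

```python
from typing import Callable


class ExpeditedUnavailable(Exception):
    """Hypothetical stand-in for S3 rejecting an Expedited request."""


def restore_with_fallback(
    restore: Callable[[str], None],
    preferred: str = "Expedited",
    fallback: str = "Standard",
) -> str:
    """Request a restore at the preferred tier, falling back once if rejected.

    `restore` takes a tier name and issues the request. Returns the tier
    that was actually accepted.
    """
    try:
        restore(preferred)
        return preferred
    except ExpeditedUnavailable:
        # Capacity is unavailable right now; queue at the slower tier
        # so the clock starts ticking instead of waiting on retries.
        restore(fallback)
        return fallback
```

Falling back immediately trades a possible few-minute restore for a predictable few-hour one, which is usually the better deal when the alternative is an open-ended wait.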
The next concept you’ll grapple with is how to automate these restores for frequently accessed data that’s still cost-effectively stored in Glacier.