Old AMIs are about to become a ticking time bomb for your deployments.
Packer, the tool you likely use to build your Amazon Machine Images (AMIs), does not manage what happens to an image after it is built. Every build registers a brand-new AMI and leaves the earlier ones in place. Over time, these old AMIs accumulate, incurring storage costs for their backing snapshots and, more importantly, posing a security risk, because an image is never patched after it is built. AWS supports AMI deprecation but will not clean up the AMIs you own, so if you’re not managing them yourself, you’ll eventually find deployments still pinned to an old, unpatched, or deprecated AMI.
Let’s see how this plays out in practice. Imagine you have a Packer build that runs nightly, creating a new my-app-base AMI.
    {
      "variables": {
        "aws_region": "us-east-1",
        "source_ami_filter_name": "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*",
        "ami_name_prefix": "my-app-base"
      },
      "builders": [
        {
          "type": "amazon-ebs",
          "region": "{{user `aws_region`}}",
          "source_ami_filter": {
            "filters": {
              "name": "{{user `source_ami_filter_name`}}",
              "root-device-type": "ebs",
              "virtualization-type": "hvm"
            },
            "owners": ["099720109477"],
            "most_recent": true
          },
          "ami_name": "{{user `ami_name_prefix`}}-{{timestamp}}",
          "instance_type": "t3.micro",
          "ssh_username": "ubuntu"
        }
      ],
      "provisioners": [
        {
          "type": "shell",
          "inline": [
            "sudo apt-get update -y",
            "sudo apt-get upgrade -y",
            "echo \"AMI built at $(date)\" | sudo tee /etc/build_timestamp"
          ]
        }
      ]
    }
This configuration builds a new Ubuntu 20.04 AMI every time it runs and appends a timestamp to its name. Without a lifecycle management strategy, you’ll end up with dozens, if not hundreds, of AMIs like these (shown with date-style suffixes for readability; Packer’s `{{timestamp}}` actually expands to Unix epoch seconds):
- my-app-base-20230115103000
- my-app-base-20230116103000
- my-app-base-20230117103000
- … and so on.
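The problem isn’t just storage; it’s that these AMIs grow stale. An AMI is a point-in-time image, so security patches released after the build never reach it. AMIs can also be marked as deprecated, at which point they drop out of image searches for other accounts, which makes them a poor foundation for new deployments.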
The core concept of managing AMI lifecycle is retention. You need to decide how many recent AMIs you want to keep and automatically delete the older ones. This is usually done by tagging your AMIs and then using a script or a dedicated tool to enforce your retention policy.
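To make a count-based policy concrete, here is a minimal sketch that keeps the N most recent AMIs and returns the rest as deletion candidates. It assumes the input is a list of dicts shaped like the `Images` entries of an EC2 DescribeImages response; the function names, the `keep_latest` parameter, and the `ami-aaa`-style IDs are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone


def parse_creation_date(value):
    """Parse an EC2 CreationDate string such as '2023-01-15T10:30:00.000Z'."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def select_amis_to_delete(images, keep_latest=5):
    """Return the images that fall outside the `keep_latest` most recent ones.

    Only the ImageId and CreationDate keys of each image dict are used.
    """
    if keep_latest < 0:
        raise ValueError("keep_latest must be >= 0")
    newest_first = sorted(
        images,
        key=lambda image: parse_creation_date(image["CreationDate"]),
        reverse=True,
    )
    return newest_first[keep_latest:]


# Hypothetical sample data standing in for a DescribeImages response.
images = [
    {"ImageId": "ami-aaa", "CreationDate": "2023-01-15T10:30:00.000Z"},
    {"ImageId": "ami-bbb", "CreationDate": "2023-01-16T10:30:00.000Z"},
    {"ImageId": "ami-ccc", "CreationDate": "2023-01-17T10:30:00.000Z"},
]
candidates = select_amis_to_delete(images, keep_latest=2)
print([image["ImageId"] for image in candidates])
```

Keeping the selection logic separate from the AWS calls makes the policy easy to test before it is allowed to delete anything.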
Here’s a common approach:
- Tagging AMIs: When Packer builds an AMI, you can add tags. A common scheme is `CreatedBy: Packer` plus either `KeepUntil: YYYY-MM-DD` or `RetentionDays: 30`. For example, you can add a `tags` block to the `amazon-ebs` builder:

      "tags": {
        "CreatedBy": "Packer",
        "Name": "{{user `ami_name_prefix`}}-{{timestamp}}",
        "RetentionDays": "30"
      }

- Automated Cleanup: You’ll need a mechanism to find and delete old AMIs. This can be a Lambda function triggered on a schedule (e.g., daily) or a script run by a cron job.
Here’s a Python script snippet using `boto3` to find and deregister AMIs older than a specified retention period, based on the `RetentionDays` tag:

    import boto3
    from datetime import datetime, timedelta, timezone

    ec2 = boto3.client('ec2', region_name='us-east-1')


    def lambda_handler(event, context):
        retention_days = 30  # global fallback when an AMI has no RetentionDays tag
        now = datetime.now(timezone.utc)
        cutoff_date = now - timedelta(days=retention_days)
        print(f"Searching for AMIs created before: {cutoff_date.isoformat()}")

        paginator = ec2.get_paginator('describe_images')
        page_iterator = paginator.paginate(
            Owners=['self'],
            Filters=[{'Name': 'tag:CreatedBy', 'Values': ['Packer']}],
        )

        for page in page_iterator:
            for image in page['Images']:
                # CreationDate looks like '2023-01-15T10:30:00.000Z'. Python versions
                # before 3.11 reject the trailing 'Z', so swap it for an explicit offset.
                creation_date = datetime.fromisoformat(image['CreationDate'].replace('Z', '+00:00'))
                ami_id = image['ImageId']
                ami_name = image.get('Name', 'N/A')

                # Prefer the AMI's own RetentionDays tag when it exists and is a number
                retention_tag = next((tag for tag in image.get('Tags', []) if tag['Key'] == 'RetentionDays'), None)
                if retention_tag:
                    try:
                        ami_retention_days = int(retention_tag['Value'])
                    except ValueError:
                        print(f"Skipping AMI {ami_id}: Invalid RetentionDays value: {retention_tag['Value']}")
                        continue
                    if creation_date < now - timedelta(days=ami_retention_days):
                        print(f"Deregistering AMI: {ami_name} ({ami_id}) created on {creation_date.isoformat()} (older than {ami_retention_days} days)")
                        deregister_ami(ami_id)
                elif creation_date < cutoff_date:
                    # Fallback if the RetentionDays tag is missing: use the global cutoff
                    print(f"Deregistering AMI: {ami_name} ({ami_id}) created on {creation_date.isoformat()} (older than global {retention_days} days)")
                    deregister_ami(ami_id)


    def deregister_ami(ami_id):
        try:
            ec2.deregister_image(ImageId=ami_id)
            print(f"Successfully deregistered AMI: {ami_id}")
            # Deregistering does not delete the backing EBS snapshots.
            # You might also want to call find_and_delete_snapshots(ami_id) here.
        except Exception as e:
            print(f"Error deregistering AMI {ami_id}: {e}")


    # Example of finding and deleting snapshots (requires more logic).
    # Assumes the snapshot description mentions the AMI ID, hence the wildcards.
    # def find_and_delete_snapshots(ami_id):
    #     snapshots = ec2.describe_snapshots(
    #         OwnerIds=['self'],
    #         Filters=[{'Name': 'description', 'Values': [f'*{ami_id}*']}]
    #     )
    #     for snapshot in snapshots['Snapshots']:
    #         print(f"Deleting snapshot: {snapshot['SnapshotId']}")
    #         ec2.delete_snapshot(SnapshotId=snapshot['SnapshotId'])


    if __name__ == '__main__':
        lambda_handler(None, None)  # For local testing
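The tagging step mentions a `KeepUntil` tag as an alternative to `RetentionDays`, but the script above only reads `RetentionDays`. The following sketch shows one way the expiry decision could cover both tags. The function name `is_expired`, the precedence order (`KeepUntil` first, then `RetentionDays`, then a default), and the 30-day default are assumptions of this example, not something the tags themselves define.

```python
from datetime import date, datetime, timedelta, timezone


def is_expired(tags, creation_date, now, default_retention_days=30):
    """Decide whether an AMI is past its retention window.

    `tags` is a plain dict of tag key to tag value. Malformed tag values
    raise ValueError so the caller can choose to skip that AMI.
    """
    if "KeepUntil" in tags:
        # KeepUntil is expected in YYYY-MM-DD form; the AMI is kept through that day.
        keep_until = date.fromisoformat(tags["KeepUntil"])
        return now.date() > keep_until
    retention_days = int(tags.get("RetentionDays", default_retention_days))
    return creation_date < now - timedelta(days=retention_days)


# Hypothetical values for illustration.
now = datetime(2023, 3, 1, tzinfo=timezone.utc)
created = datetime(2023, 1, 15, 10, 30, tzinfo=timezone.utc)
print(is_expired({"KeepUntil": "2023-02-28"}, created, now))
print(is_expired({"RetentionDays": "90"}, created, now))
```

In the cleanup loop, the tag list from `describe_images` could be flattened with `{t['Key']: t['Value'] for t in image.get('Tags', [])}` before calling a helper like this.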
The most surprising part of AMI lifecycle management is how many resources you can inadvertently accumulate. It’s not just the AMIs themselves, but also the EBS snapshots that are created when you build an AMI. Deregistering an AMI does not automatically delete its associated snapshots. You need a separate step in your cleanup process to identify and delete these orphaned snapshots, which can quickly become a significant cost.
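One way to handle the snapshot step is to read the snapshot IDs from the AMI’s block device mappings before deregistering it, then delete them afterwards. The sketch below assumes `ec2_client` is a boto3 EC2 client and `image` is one entry from a `describe_images` response; the helper names are hypothetical. It does no error handling, and `delete_snapshot` can fail if a snapshot is still referenced by another registered AMI.

```python
def snapshot_ids_for_image(image):
    """Return the EBS snapshot IDs referenced by an AMI's block device mappings."""
    return [
        mapping["Ebs"]["SnapshotId"]
        for mapping in image.get("BlockDeviceMappings", [])
        # Instance-store (ephemeral) mappings have no "Ebs" entry.
        if "SnapshotId" in mapping.get("Ebs", {})
    ]


def deregister_with_snapshots(ec2_client, image):
    """Deregister an AMI, then delete the snapshots that backed it.

    The snapshot IDs are collected first so they are still known
    once the AMI has been deregistered.
    """
    snapshot_ids = snapshot_ids_for_image(image)
    ec2_client.deregister_image(ImageId=image["ImageId"])
    for snapshot_id in snapshot_ids:
        ec2_client.delete_snapshot(SnapshotId=snapshot_id)
    return snapshot_ids
```

Passing the client in as an argument, rather than creating it inside the function, keeps the helper easy to exercise with a stub before pointing it at a real account.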
Once you have your AMI cleanup script running, the next logical step is to think about how to manage the versions of your application running on these AMIs.