S3 delete markers are the silent hoarders of your bucket, and if you’re not careful, the old versions hiding behind them will cost you a fortune in storage.
Let’s see this in action. Imagine you have an object my-object.txt in a versioned S3 bucket. You upload it, then you upload a new version, then you delete it.
# Initial upload (--body takes a plain local path, not a file:// URL)
aws s3api put-object --bucket my-bucket --key my-object.txt --body local-file.txt
# Upload a new version
aws s3api put-object --bucket my-bucket --key my-object.txt --body another-file.txt
# Delete the object (no --version-id, so S3 adds a delete marker)
aws s3api delete-object --bucket my-bucket --key my-object.txt
Now, if you list the versions of my-object.txt in your bucket:
aws s3api list-object-versions --bucket my-bucket --prefix my-object.txt
A plain listing (for example, aws s3 ls) no longer shows my-object.txt, but the list-object-versions output looks something like this:
{
    "Versions": [
        {
            "Key": "my-object.txt",
            "VersionId": "ExampleVersionID123",
            "IsLatest": false,
            "LastModified": "2023-10-27T10:00:00.000Z",
            "ETag": "\"abcde12345\"",
            "Size": 1024
        },
        {
            "Key": "my-object.txt",
            "VersionId": "ExampleVersionID456",
            "IsLatest": false,
            "LastModified": "2023-10-27T10:05:00.000Z",
            "ETag": "\"fghij67890\"",
            "Size": 2048
        }
    ],
    "DeleteMarkers": [
        {
            "Key": "my-object.txt",
            "VersionId": "ExampleDeleteMarkerID789",
            "IsLatest": true,
            "LastModified": "2023-10-27T10:10:00.000Z"
        }
    ]
}
Notice the DeleteMarkers section. The delete marker is now the current version of the key, which is what makes the object appear deleted. Both entries under Versions are still stored in S3, and you’re still paying for them. This happens because a delete on a versioned bucket never removes data by default; it only adds another layer of history.
The problem is that every delete-object operation without a version ID on a versioned bucket creates a delete marker. If you perform many updates and deletes on the same objects without cleaning up, your bucket can accumulate a massive number of delete markers. The markers themselves hold no data (you are billed only for the storage of the key name), but they hide the older versions beneath them and do nothing to remove them. Those noncurrent versions stay in the bucket, and on your bill, until you delete them by version ID or a lifecycle rule expires them. So, while the marker is small, it conceals potentially large amounts of older, unneeded data.
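To see how much storage is hiding behind delete markers, you can tally the sizes in a version listing. The following is a minimal sketch, not an official tool: the function name summarize_versions and the sample data are made up for illustration, and it assumes the input has the same shape as a list-object-versions response.

```python
from typing import Iterable


def summarize_versions(pages: Iterable[dict]) -> dict:
    """Tally current bytes, noncurrent bytes, and delete markers.

    `pages` is any iterable of list_object_versions response dicts,
    for example the pages yielded by a boto3 paginator.
    """
    summary = {"current_bytes": 0, "noncurrent_bytes": 0, "delete_markers": 0}
    for page in pages:
        for version in page.get("Versions", []):
            field = "current_bytes" if version["IsLatest"] else "noncurrent_bytes"
            summary[field] += version["Size"]
        # Delete markers carry no Size field, so they are only counted.
        summary["delete_markers"] += len(page.get("DeleteMarkers", []))
    return summary


# Hypothetical listing for a key that was uploaded twice and then deleted.
sample_page = {
    "Versions": [
        {"Key": "my-object.txt", "VersionId": "v1", "IsLatest": False, "Size": 1024},
        {"Key": "my-object.txt", "VersionId": "v2", "IsLatest": False, "Size": 2048},
    ],
    "DeleteMarkers": [
        {"Key": "my-object.txt", "VersionId": "dm1", "IsLatest": True},
    ],
}

print(summarize_versions([sample_page]))
```

Against a real bucket, you would pass the pages from boto3.client("s3").get_paginator("list_object_versions").paginate(Bucket="my-bucket") instead of the sample. A large noncurrent_bytes figure next to a small current_bytes figure is the signature of the problem described above.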
To clean up this clutter, you need to explicitly delete these delete markers. S3 provides a mechanism for this: the delete-object operation with a specific version-id.
Here’s how you can automate this cleanup. You’ll need to identify the delete markers and then send a delete request for each one.
First, let’s find all the delete markers for a specific object:
aws s3api list-object-versions --bucket my-bucket --prefix my-object.txt --query 'DeleteMarkers[*].[Key, VersionId]' --output text
This command outputs tab-separated Key and VersionId pairs for every delete marker on keys that begin with my-object.txt (--prefix is a prefix match, not an exact match). For example, it might output:
my-object.txt ExampleDeleteMarkerID789
my-object.txt AnotherDeleteMarkerIDXYZ
Now, to delete a specific delete marker, you use the delete-object command, specifying the key and the version-id of the delete marker:
aws s3api delete-object --bucket my-bucket --key my-object.txt --version-id ExampleDeleteMarkerID789
When you delete a delete marker, S3 removes that "deletion" record. If the marker was the current version and older versions of the object still exist, the most recent of them becomes the current version and the object is visible again. If no versions remain, the key disappears from the bucket entirely.
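Because removing a marker can bring an object back, it helps to know which markers are current before deleting anything. Here is a small sketch that splits the markers in one listing page by their IsLatest flag. The function name split_delete_markers and the sample keys are hypothetical.

```python
from typing import List, Tuple


def split_delete_markers(page: dict) -> Tuple[List[dict], List[dict]]:
    """Split one list_object_versions page into (current, superseded) markers.

    Removing a current marker makes the previous version visible again.
    Removing a superseded marker changes nothing a normal listing shows.
    """
    current, superseded = [], []
    for marker in page.get("DeleteMarkers", []):
        if marker.get("IsLatest"):
            current.append(marker)
        else:
            superseded.append(marker)
    return current, superseded


# Hypothetical page: report.txt is currently deleted, while logs/app.log
# was deleted once and then uploaded again on top of the marker.
sample_page = {
    "DeleteMarkers": [
        {"Key": "report.txt", "VersionId": "dm-1", "IsLatest": True},
        {"Key": "logs/app.log", "VersionId": "dm-2", "IsLatest": False},
    ]
}

current, superseded = split_delete_markers(sample_page)
print("would reappear:", [m["Key"] for m in current])
print("no visible change:", [m["Key"] for m in superseded])
```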
The most common scenario where delete markers become a problem is in buckets used for frequent data ingestion and lifecycle management, especially when objects are frequently overwritten or deleted. Think of logs, temporary files, or datasets that are updated regularly. Without a cleanup strategy, these buckets can grow unexpectedly.
A practical approach is to use AWS Lambda and CloudWatch Events (now EventBridge) to periodically scan your buckets for delete markers and delete them. You can set up a Lambda function that uses list-object-versions to find delete markers and then iterates through them, issuing delete-object commands.
Here’s a snippet of what a Python Lambda function might look like:
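import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket_name = 'my-bucket'
    paginator = s3.get_paginator('list_object_versions')
    # Walk every page of versions and delete markers in the bucket.
    for page in paginator.paginate(Bucket=bucket_name):
        # Pages without delete markers omit the 'DeleteMarkers' key entirely.
        for marker in page.get('DeleteMarkers', []):
            print(f"Deleting delete marker for key: {marker['Key']}, version ID: {marker['VersionId']}")
            # Passing VersionId removes the marker itself instead of adding a new one.
            # Caution: if the marker is the current version, the object becomes visible again.
            s3.delete_object(
                Bucket=bucket_name,
                Key=marker['Key'],
                VersionId=marker['VersionId']
            )
    return {
        'statusCode': 200,
        'body': 'Delete marker cleanup complete.'
    }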
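This function iterates through all object versions and delete markers in the bucket. For each delete marker found, it issues a delete_object call with the specific VersionId, which removes the marker. Keep two consequences in mind. First, removing a marker that is the current version makes the newest remaining version of that key visible again. Second, removing markers frees almost no storage by itself; space is reclaimed only when the object versions themselves are deleted by version ID or expired by a lifecycle rule.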
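A more conservative variation, which is not the function above but a sketch under stated assumptions, removes only markers that are no longer the current version, so no deleted object reappears, and sends them in batches with the delete_objects API (up to 1000 keys per request). The names delete_superseded_markers and RecordingClient are hypothetical; RecordingClient stands in for a real boto3 S3 client so the example runs without AWS access.

```python
from typing import Iterable, List


def delete_superseded_markers(client, bucket: str, pages: Iterable[dict],
                              batch_size: int = 1000) -> int:
    """Remove delete markers that are no longer the current version.

    Returns the number of markers S3 accepted for deletion.
    """
    if not 1 <= batch_size <= 1000:
        raise ValueError("delete_objects accepts 1 to 1000 keys per request")

    def flush(batch: List[dict]) -> int:
        if not batch:
            return 0
        response = client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": batch, "Quiet": True},
        )
        # In quiet mode the response lists only the entries that failed.
        return len(batch) - len(response.get("Errors", []))

    removed = 0
    batch: List[dict] = []
    for page in pages:
        for marker in page.get("DeleteMarkers", []):
            if marker.get("IsLatest"):
                continue  # current marker: removing it would undelete the key
            batch.append({"Key": marker["Key"], "VersionId": marker["VersionId"]})
            if len(batch) == batch_size:
                removed += flush(batch)
                batch = []
    removed += flush(batch)
    return removed


class RecordingClient:
    """Hypothetical stand-in for boto3.client("s3") so the sketch runs offline."""

    def __init__(self):
        self.calls = []

    def delete_objects(self, Bucket, Delete):
        self.calls.append((Bucket, Delete["Objects"]))
        return {}


pages = [{
    "DeleteMarkers": [
        {"Key": "a.txt", "VersionId": "dm-1", "IsLatest": True},
        {"Key": "b.txt", "VersionId": "dm-2", "IsLatest": False},
        {"Key": "b.txt", "VersionId": "dm-3", "IsLatest": False},
    ]
}]
client = RecordingClient()
print(delete_superseded_markers(client, "my-bucket", pages))
```

In a Lambda function you would pass boto3.client("s3") as client and the paginator's pages as pages. Batching matters mostly for cost and runtime: one request per marker adds up quickly in a bucket with millions of markers.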
Another crucial aspect to consider is the interaction with S3 Lifecycle policies. Lifecycle rules can be configured to expire current objects or previous versions. On a versioned bucket, a standard "Expire current versions of objects" rule does not remove any data: it adds a delete marker on top of the current version, just like a manual delete. To truly reclaim storage you need a noncurrent version expiration rule. A lifecycle rule like {"NoncurrentVersionExpiration": {"NoncurrentDays": 30}} permanently deletes object versions 30 days after they become noncurrent. The versions sitting under a delete marker are already noncurrent, so the rule applies to them without any marker cleanup; removing the current marker first would actually make the newest version current again and exempt it from the rule. The rule does not touch the markers themselves. Once every version of a key has expired, the leftover marker is an "expired object delete marker", and adding "Expiration": {"ExpiredObjectDeleteMarker": true} to the rule lets S3 remove it as well.
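As a rough illustration of such a lifecycle rule, the sketch below builds a configuration that expires noncurrent versions and then removes the delete markers left with nothing beneath them. The helper name build_cleanup_rule, the rule ID, and the 30-day default are assumptions chosen for the example; the dictionary layout follows the LifecycleConfiguration argument of boto3's put_bucket_lifecycle_configuration.

```python
import json


def build_cleanup_rule(noncurrent_days: int = 30, prefix: str = "") -> dict:
    """Build a lifecycle configuration for versioned-bucket cleanup."""
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-and-orphaned-markers",  # hypothetical name
                "Status": "Enabled",
                # An empty prefix applies the rule to every key in the bucket.
                "Filter": {"Prefix": prefix},
                # Permanently delete versions N days after they become noncurrent.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
                # Remove delete markers that have no versions left beneath them.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    }


print(json.dumps(build_cleanup_rule(), indent=2))
```

To apply it, you would call boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=build_cleanup_rule()). Be aware that this call replaces the bucket's entire existing lifecycle configuration, so merge in any rules you already have.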
The most surprising thing is that deleting a delete marker removes no object data at all. The marker is only a placeholder that makes the key look absent; the older versions are stored as independent objects and are not garbage collected when the marker goes away. You keep paying for them until each one is permanently deleted by version ID or expired by a lifecycle rule. Lifecycle expiration also runs asynchronously, so expired versions can linger in listings for a while after they become eligible.
Finally, after you’ve cleaned up all your delete markers, you might notice that your bucket’s storage and version count have barely changed. This is because every object version is still there: for keys whose current version was a marker, the newest remaining version is current again, and the older versions sit behind it. The next thing you’ll likely run into is figuring out how to actually delete old object versions that are no longer needed.