S3 Object Lambda lets you transform data as it’s retrieved from S3, without changing the original object.
Let’s see it in action. Imagine you have a large CSV file in S3, and you want to retrieve only specific columns for a particular application. Instead of downloading the whole file and parsing it client-side, S3 Object Lambda can do this transformation for you.
Here’s a simplified setup:
- S3 Bucket: Your source data bucket (e.g., my-original-data-bucket).
- Lambda Function: A function that performs the transformation (e.g., csvColumnSelectorLambda). This function fetches the original object, processes it, and sends back the transformed data.
- S3 Object Lambda Access Point: An access point configured to use your Lambda function for transformations. This is the entry point for your read requests.
- Application: The client that requests data from the Object Lambda Access Point.
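When your application requests an object (e.g., data.csv) from the Object Lambda Access Point, here’s the flow:
- The request goes to the Object Lambda Access Point.
- S3 Object Lambda invokes your configured Lambda function, passing an event that contains a presigned URL for the original object (inputS3Url), a route and token for the response (outputRoute and outputToken), and details of the original request.
- Your Lambda function fetches the original object through the presigned URL, which resolves via the supporting access point to the original S3 bucket.
- It performs the transformation (e.g., filters columns, redacts PII, converts formats).
- It sends the transformed data back to S3 Object Lambda by calling the WriteGetObjectResponse API with the route and token from the event.
- S3 Object Lambda then provides this transformed data to your application.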
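The flow above can be sketched as a Lambda handler. This is a minimal illustration rather than a definitive implementation: it assumes the column list arrives as JSON in the access point's static payload under a hypothetical "columns" key, that the object is small enough to hold in memory, and that it is UTF-8 encoded. The select_columns helper is a name introduced here for illustration.

```python
import csv
import io
import json
import urllib.request


def select_columns(csv_text, columns):
    """Return csv_text reduced to the named columns, in the order given."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # keys not listed in `columns` are dropped
    return out.getvalue()


def lambda_handler(event, context):
    # Imported lazily so select_columns can be exercised without the AWS SDK.
    import boto3

    ctx = event["getObjectContext"]
    # Assumption: the static payload is JSON such as {"columns": ["id", "name"]}.
    payload = json.loads(event["configuration"].get("payload") or "{}")
    columns = payload.get("columns", [])

    # Fetch the original object through the presigned URL in the event.
    with urllib.request.urlopen(ctx["inputS3Url"]) as resp:
        original = resp.read().decode("utf-8")

    transformed = select_columns(original, columns)

    # Hand the transformed bytes back to S3 Object Lambda.
    boto3.client("s3").write_get_object_response(
        Body=transformed.encode("utf-8"),
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )
    return {"statusCode": 200}
```

For large files, a streaming variant that transforms the object chunk by chunk would avoid loading it all into memory; the in-memory version is used here only to keep the sketch short.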
The core problem S3 Object Lambda solves is the need to process data on demand when it’s read, without the overhead of managing separate, transformed copies of your data or complex client-side parsing logic. This is particularly useful for:
- Data Redaction: Removing sensitive information (like PII) before it reaches certain users or applications.
- Format Conversion: Providing data in a format that’s easier for specific applications to consume (e.g., converting JSON to CSV, or a subset of JSON fields to a simpler structure).
- Image/Video Processing: Resizing images, generating thumbnails, or converting video formats on the fly.
- Data Filtering: Selecting specific rows or columns from large datasets.
The mental model is that you’re creating a "view" or "proxy" for your S3 data. The original data remains untouched and immutable. When you request data through the Object Lambda Access Point, you’re not getting the raw object; you’re getting the result of a computation applied to that object.
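From the client's side, the "view" looks like an ordinary GetObject call whose Bucket argument is the Object Lambda Access Point ARN. The sketch below assumes a boto3 S3 client is passed in; the ARN, account ID, and the read_transformed helper name are illustrative.

```python
def read_transformed(s3_client, olap_arn, key):
    """Fetch `key` through an Object Lambda Access Point and return the bytes.

    `s3_client` is expected to behave like boto3.client("s3"): the access
    point ARN is passed where a bucket name would normally go.
    """
    response = s3_client.get_object(Bucket=olap_arn, Key=key)
    return response["Body"].read()


# Example usage (requires boto3 and AWS credentials; the ARN is hypothetical):
#
#   import boto3
#   arn = ("arn:aws:s3-object-lambda:us-east-1:123456789012:"
#          "accesspoint/my-object-lambda-access-point")
#   data = read_transformed(boto3.client("s3"), arn, "data.csv")
```

Reading the same key directly from the bucket still returns the untransformed object, which is what makes the access point behave like a view.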
Here’s a look at the configuration for an Object Lambda Access Point. It associates a Lambda function ARN with a supporting S3 Access Point.
{
  "Name": "my-object-lambda-access-point",
  "ObjectLambdaConfiguration": {
    "SupportingAccessPoint": "arn:aws:s3:us-east-1:123456789012:accesspoint/my-read-only-access-point",
    "TransformationConfigurations": [
      {
        "Actions": ["GetObject"],
        "ContentTransformation": {
          "AwsLambda": {
            "FunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:csvColumnSelectorLambda"
          }
        }
      }
    ]
  }
}
In this configuration:
- Name: The name of the Object Lambda Access Point that clients address.
- SupportingAccessPoint: A regular S3 Access Point, attached to my-original-data-bucket, that the Object Lambda Access Point uses to retrieve the original object. The source bucket is not named here; it is determined by the supporting access point, which also ensures proper IAM permissions are handled.
- TransformationConfigurations: Defines which actions trigger transformations and which Lambda function to use. Here, GetObject requests are routed to csvColumnSelectorLambda.
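As a rough sketch, the same association can be created programmatically with boto3's s3control client, whose create_access_point_for_object_lambda call takes the supporting access point and transformation list as a Configuration dictionary. The helper names and the account ID below are illustrative assumptions.

```python
def build_olap_configuration(supporting_access_point_arn, function_arn,
                             actions=("GetObject",)):
    """Build the Configuration dict for an Object Lambda Access Point."""
    return {
        "SupportingAccessPoint": supporting_access_point_arn,
        "TransformationConfigurations": [
            {
                "Actions": list(actions),
                "ContentTransformation": {
                    "AwsLambda": {"FunctionArn": function_arn}
                },
            }
        ],
    }


def create_olap(s3control_client, account_id, name, configuration):
    """Create the access point; `s3control_client` is boto3.client("s3control")."""
    return s3control_client.create_access_point_for_object_lambda(
        AccountId=account_id, Name=name, Configuration=configuration
    )


# Example usage (requires boto3 and AWS credentials; values are hypothetical):
#
#   import boto3
#   config = build_olap_configuration(
#       "arn:aws:s3:us-east-1:123456789012:accesspoint/my-read-only-access-point",
#       "arn:aws:lambda:us-east-1:123456789012:function:csvColumnSelectorLambda",
#   )
#   create_olap(boto3.client("s3control"), "123456789012",
#               "my-object-lambda-access-point", config)
```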
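The Lambda function itself needs to be designed to handle the specific event structure S3 Object Lambda provides. The event contains a getObjectContext block with a presigned URL for fetching the original object (inputS3Url) plus an outputRoute and outputToken, a configuration block with any static payload defined on the access point, and a userRequest block with the URL and headers of the original request. The function does not return the object as its return value; it sends the transformed data back by calling the WriteGetObjectResponse API with that route and token.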
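To make the event shape concrete, here is a trimmed sample event and a small parser. The field names follow the S3 Object Lambda event format; the values, the ObjectLambdaRequest dataclass, and the assumption that the static payload is JSON are illustrative.

```python
import json
from dataclasses import dataclass


@dataclass
class ObjectLambdaRequest:
    input_url: str      # presigned URL for the original object
    output_route: str   # passed back as RequestRoute
    output_token: str   # passed back as RequestToken
    payload: dict       # static payload from the access point configuration
    user_url: str       # URL the client originally requested
    user_headers: dict  # headers from the client's request


def parse_event(event):
    """Pull the fields a transformation usually needs out of the raw event."""
    ctx = event["getObjectContext"]
    raw_payload = event.get("configuration", {}).get("payload") or "{}"
    user = event.get("userRequest", {})
    return ObjectLambdaRequest(
        input_url=ctx["inputS3Url"],
        output_route=ctx["outputRoute"],
        output_token=ctx["outputToken"],
        payload=json.loads(raw_payload),  # assumption: payload is JSON text
        user_url=user.get("url", ""),
        user_headers=user.get("headers", {}),
    )


# Trimmed sample event with placeholder values.
SAMPLE_EVENT = {
    "getObjectContext": {
        "inputS3Url": "https://example.invalid/data.csv?X-Amz-Signature=placeholder",
        "outputRoute": "io-use1-001",
        "outputToken": "placeholder-token",
    },
    "configuration": {"payload": "{\"columns\": [\"id\", \"name\"]}"},
    "userRequest": {
        "url": "https://example.invalid/data.csv",
        "headers": {"Accept-Encoding": "identity"},
    },
}
```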
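Crucially, the Lambda function is invoked by S3 Object Lambda, not directly by your application. Your application only ever interacts with the Object Lambda Access Point. Permissions are managed through IAM: the function’s execution role needs s3-object-lambda:WriteGetObjectResponse to send data back, while the presigned URL in the event carries the authorization for reading the original object through the SupportingAccessPoint. The calling identity, in turn, needs access to the Object Lambda Access Point and the supporting access point, along with permission to invoke the function.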
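One permission the function's execution role needs in any case is the right to call WriteGetObjectResponse. A minimal policy document might look like the following, built here as a Python dictionary; the Sid and function name are arbitrary, and a real role would usually also allow writing CloudWatch logs.

```python
import json


def write_response_policy():
    """Return an IAM policy document allowing WriteGetObjectResponse."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowObjectLambdaResponse",  # arbitrary label
                "Effect": "Allow",
                "Action": "s3-object-lambda:WriteGetObjectResponse",
                "Resource": "*",
            }
        ],
    }


def policy_json():
    """Serialize the policy for attaching to the execution role."""
    return json.dumps(write_response_policy(), indent=2)
```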
A common point of confusion is how S3 Object Lambda handles errors from the Lambda function. If your Lambda function fails (e.g., throws an unhandled exception or times out), S3 Object Lambda will return a 500 Internal Server Error to your application. You can also handle errors inside the Lambda function and return more granular responses by passing a custom HTTP status code, error code, and message (the StatusCode, ErrorCode, and ErrorMessage parameters) to WriteGetObjectResponse.
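A sketch of that granular error handling is shown below. It assumes the transformation signals bad input by raising ValueError and that the client, fetch, and transform callables are injected by the caller; the error code "InvalidTransformInput" and the function names are made up for illustration.

```python
def respond_with_error(s3_client, ctx, status_code, error_code, message):
    """Send an error response to the caller instead of object data."""
    s3_client.write_get_object_response(
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
        StatusCode=status_code,
        ErrorCode=error_code,
        ErrorMessage=message,
    )


def handle(event, s3_client, transform, fetch):
    """Fetch, transform, and respond; map ValueError to a 400 response.

    `fetch(url)` returns the original bytes and `transform(data)` returns the
    transformed bytes. Both are supplied by the caller.
    """
    ctx = event["getObjectContext"]
    try:
        body = transform(fetch(ctx["inputS3Url"]))
    except ValueError as exc:
        # Hypothetical error code; the caller sees it in the error response.
        respond_with_error(s3_client, ctx, 400, "InvalidTransformInput", str(exc))
        return {"statusCode": 400}

    s3_client.write_get_object_response(
        Body=body,
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )
    return {"statusCode": 200}
```

Exceptions other than ValueError are deliberately left unhandled here, so they surface as a function failure rather than being masked as a client error.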
The next step after successfully transforming data on read is to consider how to manage the lifecycle and versioning of your original objects when they are frequently accessed and transformed.