The RDS Data API lets you run SQL queries against Amazon Aurora DB clusters over HTTPS, without opening database connections or managing connection pools yourself. That is why it is often seen as a magic bullet for serverless applications.
Here’s the Data API in action with AWS Lambda, showing how a Python function can directly execute a SELECT statement against an Aurora PostgreSQL cluster:
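import json
import boto3
# Create the client once per execution environment so warm invocations reuse it.
rds_data = boto3.client('rds-data')
def lambda_handler(event, context):
    # Placeholder ARNs: replace with your Aurora cluster ARN and Secrets Manager secret ARN.
    db_cluster_arn = "arn:aws:rds:us-east-1:123456789012:cluster:my-rds-cluster"
    secret_arn = "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-rds-credentials-abcdef"
    db_name = "mydatabase"
    sql = "SELECT NOW()"
    try:
        # A single HTTPS request to the Data API: no driver or connection object to manage.
        response = rds_data.execute_statement(
            secretArn=secret_arn,
            resourceArn=db_cluster_arn,
            sql=sql,
            database=db_name
        )
        return {
            'statusCode': 200,
            # 'records' is a list of rows; each row is a list of typed field dicts.
            'body': json.dumps({'result': response.get('records', [])})
        }
    except Exception as e:
        # Covers botocore ClientError (bad ARN, missing IAM permission, SQL error, ...).
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }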
This code never opens a database connection itself. Each execute_statement call is a single HTTPS request to the Data API endpoint. The service connects to the cluster on your behalf, runs the statement, and returns the result, and AWS handles all connection management. This is the core of its "serverless" appeal. Your functions hold no idle connections, you provision no EC2 instances for a connection broker, and you size no client-side pool against your database's max_connections limit. The service abstracts away the complexities of database connectivity.
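Because every execute_statement call stands alone, everything a query needs travels with the request. The sketch below shows this under two assumptions. First, it uses a hypothetical users table. Second, it reuses the placeholder ARNs from the handler above. Values are bound through the Data API's named parameters (:user_id) rather than formatted into the SQL string. The client is passed in as an argument so that the function can be exercised with a stub.

```python
# Placeholder ARNs and database name (same illustrative values as the handler above).
CLUSTER_ARN = "arn:aws:rds:us-east-1:123456789012:cluster:my-rds-cluster"
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-rds-credentials-abcdef"
DB_NAME = "mydatabase"


def fetch_user(client, user_id):
    """Run one self-contained Data API request.

    Each call carries the cluster ARN, the secret ARN, the database name and
    the SQL; no connection object is created or kept between calls.
    The `users` table and its columns are hypothetical.
    """
    response = client.execute_statement(
        resourceArn=CLUSTER_ARN,
        secretArn=SECRET_ARN,
        database=DB_NAME,
        # Named placeholder bound via `parameters`, not string formatting.
        sql="SELECT id, email FROM users WHERE id = :user_id",
        parameters=[{"name": "user_id", "value": {"longValue": user_id}}],
    )
    # 'records' is only present for statements that return rows.
    return response.get("records", [])
```

In Lambda you would call it as fetch_user(rds_data, 42) with the module-level client created in the handler module.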
The problem the Data API solves is the tension between the ephemeral nature of serverless compute (like Lambda functions) and the persistent, stateful nature of traditional relational databases. Lambda functions spin up and down rapidly, and establishing a database connection is a relatively slow operation. If every Lambda invocation had to create a new connection, performance would suffer. A burst of concurrent invocations, each holding its own connection, can also quickly exhaust the database's connection limit. Traditional solutions involve connection pooling, for example RDS Proxy or a dedicated pooling service, which adds operational overhead. The Data API bypasses this by acting as a proxy itself, managing connections on your behalf.
AWS doesn't document the Data API's internals in detail. Conceptually, when you call rds_data.execute_statement, the service receives your HTTPS request and retrieves the database credentials from Secrets Manager using the secret you specified. It then connects to your Aurora cluster through connections it manages, executes your SQL, formats the results, and returns them. You control its behavior through three request parameters:
- resourceArn: the ARN of your Aurora DB cluster, not an endpoint hostname.
- secretArn: the ARN of the Secrets Manager secret holding your database credentials.
- database: the name of the database to run against.
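The results come back in a typed, fairly verbose shape. response['records'] is a list of rows, and each row is a list of field dicts such as {'longValue': 42} or {'isNull': True}. The helper below is a hypothetical sketch that flattens the scalar field types into plain Python tuples. It deliberately rejects arrayValue and anything else it doesn't recognize. As an alternative, execute_statement also accepts formatRecordsAs='JSON', which returns the rows as a JSON string in formattedRecords instead.

```python
from typing import Any

# Scalar field keys handled by this sketch (arrayValue is intentionally left out).
_SCALAR_KEYS = ("stringValue", "longValue", "doubleValue", "booleanValue", "blobValue")


def decode_field(field: dict) -> Any:
    """Convert one Data API field dict, e.g. {'longValue': 7}, to a Python value."""
    if field.get("isNull"):
        return None
    for key in _SCALAR_KEYS:
        # Membership test, so False / 0 / "" values are still returned correctly.
        if key in field:
            return field[key]
    raise ValueError(f"unsupported field type: {sorted(field)}")


def decode_records(records: list) -> list:
    """Turn response['records'] (rows of field dicts) into a list of tuples."""
    return [tuple(decode_field(f) for f in row) for row in records]
```

With the handler above, decode_records(response.get('records', [])) would give a list of one-element tuples holding the timestamp string.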
The most surprising aspect for many is how the Data API handles transactions. You can explicitly begin, commit, or roll back transactions with the begin_transaction, commit_transaction, and rollback_transaction calls. If you don’t, each execute_statement call runs as its own auto-committed operation. So if several statements logically belong to a single unit of work, you must wrap them in an explicit transaction by passing the transactionId returned by begin_transaction to each execute_statement call. Failing to do so can lead to partial updates if an intermediate statement fails, leaving your data in an inconsistent state. Also note that an open transaction times out and is rolled back if no call uses its transactionId for three minutes.
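Here is a minimal sketch of an explicit transaction. It assumes a hypothetical accounts table whose balances are integers (for example, cents), and the transfer function and its schema are illustrative only. The flow follows the API: begin_transaction returns a transactionId, and every execute_statement in the unit of work passes it. The function commits only if both updates succeed. Otherwise it calls rollback_transaction and re-raises the error.

```python
def transfer(client, resource_arn, secret_arn, database, from_id, to_id, amount):
    """Move `amount` between two rows of a hypothetical `accounts` table atomically.

    Balances are assumed to be integers (e.g. cents), so they bind as longValue.
    """
    tx_id = client.begin_transaction(
        resourceArn=resource_arn, secretArn=secret_arn, database=database
    )["transactionId"]
    # Arguments shared by every statement in this unit of work.
    common = {
        "resourceArn": resource_arn,
        "secretArn": secret_arn,
        "database": database,
        "transactionId": tx_id,
    }
    try:
        client.execute_statement(
            sql="UPDATE accounts SET balance = balance - :amt WHERE id = :id",
            parameters=[
                {"name": "amt", "value": {"longValue": amount}},
                {"name": "id", "value": {"longValue": from_id}},
            ],
            **common,
        )
        client.execute_statement(
            sql="UPDATE accounts SET balance = balance + :amt WHERE id = :id",
            parameters=[
                {"name": "amt", "value": {"longValue": amount}},
                {"name": "id", "value": {"longValue": to_id}},
            ],
            **common,
        )
    except Exception:
        # An update failed: undo everything done under tx_id, then re-raise.
        client.rollback_transaction(
            resourceArn=resource_arn, secretArn=secret_arn, transactionId=tx_id
        )
        raise
    # Both updates succeeded: commit them together.
    result = client.commit_transaction(
        resourceArn=resource_arn, secretArn=secret_arn, transactionId=tx_id
    )
    return result["transactionStatus"]
```

Without the transactionId, a failure in the second UPDATE would leave the first one already committed. That is exactly the partial update described above.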
The next hurdle is handling large result sets efficiently and understanding the cost implications of frequent, small requests versus fewer, larger ones.