RDS Zero-ETL Integration with Redshift can feel like magic, but it’s just a clever orchestration of existing AWS services.

Let’s see it in action. Imagine you have a PostgreSQL database in RDS that’s constantly being updated with new customer orders. You want these orders to appear in your Redshift data warehouse in near real-time so your analytics team can track sales trends.

Here’s a simplified view of the data flow:

  1. RDS PostgreSQL:

    CREATE TABLE orders (
        order_id SERIAL PRIMARY KEY,
        customer_id INT,
        order_date TIMESTAMP,
        amount DECIMAL(10, 2)
    );
    
    INSERT INTO orders (customer_id, order_date, amount) VALUES (101, NOW(), 55.75);
    INSERT INTO orders (customer_id, order_date, amount) VALUES (102, NOW(), 120.00);
    
  2. RDS Zero-ETL Integration: You create an integration in the AWS console, naming the RDS source and the Redshift target. You don’t push data yourself. Instead, the integration runs a managed change data capture (CDC) process that reads changes from the source’s transaction log (the WAL for PostgreSQL) and streams them to Redshift. Conceptually it behaves like a managed AWS Database Migration Service (DMS) pipeline, and steps 3 and 4 describe the pieces you would otherwise configure by hand.

  3. AWS DMS Replication Instance: In a hand-built DMS pipeline, you provision a replication instance and choose an instance class such as dms.t3.medium. This instance reads the CDC events from your RDS source. With Zero-ETL, you don’t size or manage this compute yourself.

  4. AWS DMS Replication Task: In a hand-built DMS pipeline, a replication task reads from the RDS source and writes to the Redshift target, and its table mappings specify which tables to replicate.

  5. Redshift: The replicated rows land in Redshift. A common pattern is to treat the replicated tables as a staging layer and load your final analytical tables from them.

    -- Example of a staging table in Redshift
    CREATE TABLE staging_orders (
        order_id INT,
        customer_id INT,
        order_date TIMESTAMP,
        amount DECIMAL(10, 2)
    );
    
    -- Example of a final analytical table
    CREATE TABLE fact_orders (
        order_id INT PRIMARY KEY,
        customer_id INT,
        order_date DATE, -- Often you'd transform timestamps to dates
        amount DECIMAL(10, 2)
    );
    

The Zero-ETL integration simplifies this by removing the need to configure replication instances and tasks yourself. It’s designed to be a managed, end-to-end path for streaming transactional data from RDS to Redshift.
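
To make the change-capture idea concrete, here is a minimal Python sketch of what "applying a CDC stream" means: row-level events are replayed in log order against a target keyed by primary key. The `ChangeEvent` shape and the `apply_changes` helper are hypothetical names made up for illustration; they model the concept, not the actual wire format or internals of the managed service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """One row-level change read from the source's log (hypothetical shape)."""
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary key of the affected row (order_id)
    row: Optional[dict] = None   # full row image for insert/update, None for delete


def apply_changes(target: dict, events: list) -> dict:
    """Replay change events in log order against a target keyed by primary key."""
    for event in events:
        if event.op in ("insert", "update"):
            # Upsert: replaying the same event twice leaves the same result.
            target[event.key] = dict(event.row)
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return target


# The two inserts from step 1, followed by a later correction and a cancellation.
events = [
    ChangeEvent("insert", 1, {"order_id": 1, "customer_id": 101, "amount": "55.75"}),
    ChangeEvent("insert", 2, {"order_id": 2, "customer_id": 102, "amount": "120.00"}),
    ChangeEvent("update", 1, {"order_id": 1, "customer_id": 101, "amount": "60.00"}),
    ChangeEvent("delete", 2),
]

orders_in_redshift = apply_changes({}, events)
print(orders_in_redshift)
```

Writing each insert or update as an upsert is a deliberate choice in this sketch: if a batch of events is delivered twice, the target ends up in the same state.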

The core problem Zero-ETL solves is the latency and complexity of traditional ETL processes. Instead of batching data, waiting for ETL jobs to run, and then loading it, Zero-ETL aims for near real-time synchronization. This is crucial for use cases like fraud detection, real-time inventory management, or immediate sales dashboards where up-to-the-minute data is essential.

Conceptually, it behaves like a managed DMS setup. You define the source (RDS) and the target (Redshift) when you create the integration. AWS provisions the replication infrastructure, relies on CDC being enabled on the RDS instance, and continuously streams changes to the target. The integration handles schema evolution to a degree, though significant changes might still require manual intervention. You control the scope of replication by selecting specific schemas or tables. Fine-grained settings such as load batch size and DMS transformation rules belong to DMS tasks you run yourself; the managed integration exposes a much smaller set of options.
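
Defining the source and target can also be done from code rather than the console. The sketch below assumes boto3 is installed and credentials are configured; it uses the RDS client's `create_integration` call with its `SourceArn`, `TargetArn`, and `IntegrationName` parameters. The helper names and the ARNs are hypothetical placeholders, and the request builder is kept separate so it can be checked without AWS access.

```python
def build_integration_request(source_arn: str, target_arn: str, name: str) -> dict:
    """Assemble keyword arguments for the RDS CreateIntegration API call."""
    for label, value in (("source_arn", source_arn),
                         ("target_arn", target_arn),
                         ("name", name)):
        if not value:
            raise ValueError(f"{label} must not be empty")
    return {
        "SourceArn": source_arn,    # the RDS database to replicate from
        "TargetArn": target_arn,    # the Redshift namespace to replicate into
        "IntegrationName": name,
    }


def create_integration(request: dict) -> dict:
    """Submit the request. Needs boto3, AWS credentials, and suitable permissions."""
    import boto3  # imported lazily so the builder above works without AWS access

    rds = boto3.client("rds")
    return rds.create_integration(**request)


# Placeholder ARNs for illustration only; substitute your own resources.
request = build_integration_request(
    source_arn="arn:aws:rds:us-east-1:123456789012:db:orders-db",
    target_arn="arn:aws:redshift-serverless:us-east-1:123456789012:namespace/example-id",
    name="orders-to-redshift",
)
print(request)
```

Calling `create_integration(request)` would then start provisioning; it is left uncalled here because it creates real resources.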

The most surprising thing about how RDS Zero-ETL handles schema changes is that they travel through the same change data capture pipeline as row changes. When you alter a table in RDS (e.g., ALTER TABLE orders ADD COLUMN quantity INT;), the DDL event is captured alongside the data changes. For many common DDL operations, such as adding a column, the change can be propagated to Redshift automatically, creating the new column in your target table. This isn’t magic; the replication process tracks DDL on the source as well as row changes and applies it to the target schema, provided the target schema is compatible. However, more disruptive DDL, like dropping a column or changing a data type in an incompatible way, can still break the stream and require manual resolution in the Redshift target.
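
The policy described above (additive changes flow through, disruptive ones stop the stream) can be modeled in a few lines of Python. This is a toy sketch under stated assumptions: the event dictionaries, `apply_ddl`, and `ManualInterventionRequired` are hypothetical names, and the real service's rules for which DDL is safe are more detailed than this.

```python
class ManualInterventionRequired(Exception):
    """Raised when a DDL event cannot be applied to the target automatically."""


def apply_ddl(target_columns: dict, ddl: dict) -> dict:
    """Return the target schema after one DDL event, or raise if it is not additive."""
    kind = ddl["kind"]
    if kind == "add_column":
        name, col_type = ddl["column"], ddl["type"]
        existing = target_columns.get(name)
        if existing is not None and existing != col_type:
            raise ManualInterventionRequired(
                f"column {name!r} already exists as {existing}, cannot add as {col_type}"
            )
        # Copy so the caller's schema is untouched if a later event fails.
        updated = dict(target_columns)
        updated[name] = col_type
        return updated
    # Drops and type changes are treated as disruptive in this sketch.
    raise ManualInterventionRequired(f"cannot auto-apply DDL of kind {kind!r}")


# Target schema mirroring the orders table from step 1.
orders_schema = {
    "order_id": "INT",
    "customer_id": "INT",
    "order_date": "TIMESTAMP",
    "amount": "DECIMAL(10, 2)",
}

# ALTER TABLE orders ADD COLUMN quantity INT;
orders_schema = apply_ddl(
    orders_schema, {"kind": "add_column", "column": "quantity", "type": "INT"}
)
print(list(orders_schema))

try:
    # ALTER TABLE orders DROP COLUMN amount;
    apply_ddl(orders_schema, {"kind": "drop_column", "column": "amount"})
except ManualInterventionRequired as exc:
    print("stream paused:", exc)
```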

The next concept you’ll likely explore is how to handle data transformations and filtering within the Zero-ETL pipeline, or managing data consistency across multiple target tables in Redshift.
