FinOps: Beyond Cloud Cost Cutting

Database storage and query costs are often the biggest line items in cloud infrastructure bills, but they’re also the most amenable to optimization.

Let’s look at an example of a PostgreSQL database and how we can tune it.

-- Sample table
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT,
    product_id INT,
    order_date TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    quantity INT,
    price NUMERIC(10, 2),
    status VARCHAR(50)
);

-- Sample data insertion
INSERT INTO orders (customer_id, product_id, quantity, price, status) VALUES
(101, 201, 2, 19.99, 'Shipped'),
(102, 205, 1, 45.50, 'Processing'),
(101, 203, 3, 10.00, 'Delivered');

-- A common query that might be slow and expensive
SELECT
    o.order_id,
    o.order_date,
    c.customer_name,
    p.product_name,
    o.quantity,
    o.price
FROM
    orders o
JOIN
    customers c ON o.customer_id = c.customer_id
JOIN
    products p ON o.product_id = p.product_id
WHERE
    o.order_date >= '2023-01-01' AND o.status = 'Shipped';

The most surprising truth about database cost optimization is that it’s rarely about "faster queries" in the sense of reducing latency for a single request. It’s almost always about reducing the amount of work the database has to do, which directly translates to less CPU, I/O, and memory usage, and therefore, lower costs. This means optimizing storage and query execution patterns together.

Storage Optimization

Databases consume storage for data, indexes, and transaction logs. Each has different optimization strategies.

1. Data Compression: Most databases offer data compression at various levels. For PostgreSQL, this often involves extensions like pg_partman for partitioning (which can then be combined with compression on older partitions) or using features like TOAST (The Oversized-Attribute Storage Technique) which automatically compresses large fields.

Diagnosis: Check table and index sizes.

SELECT
    pg_size_pretty(pg_total_relation_size('orders'));

Fix: For TOASTable columns (like TEXT, BYTEA, JSONB, VARCHAR), compression is often automatic. For other types or to enforce compression on larger tables, consider partitioning and then using compression on older, less accessed partitions. For example, with pg_partman, you can configure automatic detachment and compression of older partitions.
```
-- Example of enabling compression on a table (hypothetical, depends on extension)
ALTER TABLE orders SET (compression = 'lz4'); -- This syntax is illustrative, actual command depends on the compression method/extension
```
Why it works: Compression reduces the number of bytes that need to be read from disk and stored, directly lowering storage costs and improving I/O performance for queries that need to scan large amounts of data.

2. Table Partitioning: For large tables, especially those with time-series data like orders, partitioning can significantly reduce query costs and storage management overhead. Partitioning splits a large table into smaller, more manageable pieces based on a key (e.g., date).

Diagnosis: Identify large tables and their access patterns. If queries frequently filter by date, partitioning by date is a strong candidate.

SELECT
    relname,
    pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM
    pg_class
WHERE
    relkind = 'r' -- 'r' for relation (table)
ORDER BY
    pg_total_relation_size(oid) DESC
LIMIT 10;

Fix: Implement declarative partitioning in PostgreSQL (version 10+). For example, partition orders by order_date into monthly partitions.

-- Create the parent table
CREATE TABLE orders (
    order_id SERIAL,
    customer_id INT,
    product_id INT,
    order_date TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    quantity INT,
    price NUMERIC(10, 2),
    status VARCHAR(50)
) PARTITION BY RANGE (order_date);

-- Create partitions (example for one month)
CREATE TABLE orders_2023_01 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01 00:00:00+00') TO ('2023-02-01 00:00:00+00');

-- Use pg_partman for automated management
-- Install and configure pg_partman extension
-- Then, use its functions to create and manage partitions automatically.

Why it works: Queries that filter by the partition key (e.g., WHERE order_date >= '2023-01-01') can prune partitions, meaning the database only scans relevant partitions, drastically reducing I/O and CPU. Older, unneeded partitions can be detached, archived, or dropped without affecting the active data.

3. Index Management: Unused or redundant indexes consume storage and slow down writes. Conversely, missing indexes on frequently queried columns can lead to full table scans.

Diagnosis: Identify unused indexes.

SELECT
    schemaname,
    relname,
    indexrelname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS reads,
    idx_tup_read AS tuples_read
FROM
    pg_stat_user_indexes
WHERE
    idx_scan = 0 AND indexrelname NOT LIKE '%_pkey' -- Exclude primary key indexes
ORDER BY
    pg_relation_size(indexrelid) DESC;

Fix: Drop unused indexes.

DROP INDEX IF EXISTS unused_index_name;

Why it works: Removing indexes reduces storage footprint and speeds up INSERT, UPDATE, and DELETE operations because fewer index structures need to be maintained.

Query Cost Optimization

Optimizing queries is about reducing the number of blocks read from disk and the amount of CPU/memory used.

1. Indexing Strategy: Ensure appropriate indexes exist for WHERE clauses, JOIN conditions, and ORDER BY clauses.

Diagnosis: Use EXPLAIN ANALYZE on your slow queries. Look for "Seq Scan" (Sequential Scan) on large tables where a targeted index could be used, and "Bitmap Heap Scan" with a large number of rows fetched.

EXPLAIN ANALYZE
SELECT
    o.order_id,
    o.order_date,
    c.customer_name,
    p.product_name,
    o.quantity,
    o.price
FROM
    orders o
JOIN
    customers c ON o.customer_id = c.customer_id
JOIN
    products p ON o.product_id = p.product_id
WHERE
    o.order_date >= '2023-01-01' AND o.status = 'Shipped';

Fix: Create composite indexes that cover the WHERE and JOIN conditions. The order of columns in the index matters.

CREATE INDEX idx_orders_date_status ON orders (order_date, status);
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
CREATE INDEX idx_orders_product_id ON orders (product_id);

Why it works: Indexes allow the database to quickly locate specific rows without scanning the entire table, dramatically reducing I/O and CPU for filtered queries. A composite index on (order_date, status) can efficiently serve queries filtering on both columns.

2. Query Rewriting & Avoiding SELECT *: Fetch only the columns you need. Avoid SELECT *.

Diagnosis: Review application code for SELECT *. Check query plans for unnecessary columns being fetched.

Fix: Explicitly list the required columns.

-- Instead of: SELECT * FROM orders WHERE order_id = 123;
SELECT order_id, customer_id, order_date, price FROM orders WHERE order_id = 123;

Why it works: Reduces the amount of data read from disk (if indexes don’t cover all requested columns) and transferred over the network, saving I/O and bandwidth costs.

3. Understanding VACUUM and ANALYZE: PostgreSQL uses MVCC (Multi-Version Concurrency Control). VACUUM reclaims space from dead rows, and ANALYZE updates statistics for the query planner. Autovacuum usually handles this, but misconfiguration or very high transaction rates can lead to issues.

Diagnosis: Check for high xid wraparound risks or stale statistics.

SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY age(datfrozenxid) DESC LIMIT 5;
-- Check autovacuum settings
SHOW autovacuum;
SHOW autovacuum_vacuum_threshold;
SHOW autovacuum_analyze_threshold;

Fix: Ensure autovacuum is enabled and properly tuned. Manually run VACUUM ANALYZE on heavily modified tables if needed.
```
VACUUM ANALYZE orders;
```
Why it works: Proper vacuuming prevents table bloat and keeps statistics up-to-date, ensuring the query planner makes optimal choices, which indirectly reduces query costs.

4. Connection Pooling: While not directly a query/storage cost, establishing database connections is resource-intensive. Connection pooling reduces this overhead.

Diagnosis: High number of idle connections in the database, or frequent connection/disconnection logs.
```
SELECT count(*), state FROM pg_stat_activity GROUP BY state;
```
Fix: Implement a connection pooler like PgBouncer or use built-in pooling in application frameworks. Configure it with appropriate pool sizes (e.g., max_client_conn = 100 in PgBouncer).
Why it works: Reuses existing connections, reducing CPU and memory overhead on the database server for connection management, thereby lowering overall operational costs.

The next problem you’ll likely encounter after optimizing storage and query costs is managing the lifecycle of your data and ensuring that older, less frequently accessed data is moved to cheaper storage tiers or archived.