Pagination is how you break large datasets into smaller, manageable chunks for display. The two most common methods are "offset" and "cursor" pagination. At scale, one of these methods grinds to a halt while the other keeps humming.
Let’s see offset pagination in action. Imagine a table of users, users, with a million rows.
-- Fetching the first page of 100 users
SELECT * FROM users ORDER BY id LIMIT 100 OFFSET 0;
-- Fetching the second page (users 101-200)
SELECT * FROM users ORDER BY id LIMIT 100 OFFSET 100;
-- Fetching the 100th page (users 9901-10000)
SELECT * FROM users ORDER BY id LIMIT 100 OFFSET 9900;
This looks straightforward. The database skips OFFSET rows and then returns LIMIT rows. The problem arises when OFFSET becomes large.
Here’s the mental model for offset pagination: The database has to physically scan through OFFSET rows before it can even start collecting the LIMIT rows. For OFFSET 0, it’s fast. For OFFSET 9900, it’s not. It’s like looking for page 9900 in a book by flipping through every single page until you get there. The larger the offset, the longer it takes. This linear scan time means performance degrades drastically as you paginate deeper.
Now, let’s look at cursor pagination. Instead of saying "skip X items," you say "give me items after this specific one."
-- Fetching the first page of 100 users
SELECT * FROM users ORDER BY id LIMIT 100; -- No offset needed for the first page
-- To get the next page, you'd use the ID of the last item from the previous page.
-- Let's say the last user on page 1 had id = 95
SELECT * FROM users WHERE id > 95 ORDER BY id LIMIT 100;
-- If the last user on page 2 had id = 187
SELECT * FROM users WHERE id > 187 ORDER BY id LIMIT 100;
In cursor pagination, the database doesn’t need to scan through a large number of rows. It just needs to find the last seen item’s value (the "cursor") and then scan forward from there. The query becomes WHERE column > cursor_value ORDER BY column LIMIT page_size. This is a range scan, which is typically much more efficient than a large offset scan, especially when the ORDER BY column is indexed. The time to fetch any page is relatively constant, regardless of how deep you are in the dataset.
The problem cursor pagination solves is the performance cliff of offset pagination at scale. When you have millions or billions of records, fetching page 1000 using offset might take seconds, making the user experience unbearable. Cursor pagination keeps the fetch time for any page within milliseconds.
The key levers you control are the ORDER BY column and the LIMIT (page size). For cursor pagination to be truly effective, the ORDER BY column must be indexed. If it’s not, the database will still have to scan the whole table. Also, the cursor itself needs to be unique or you risk skipping or duplicating items if multiple rows share the same value for the ORDER BY column. Often, a composite key (e.g., created_at, id) is used as the cursor.
What most people don’t realize is that the "cursor" doesn’t have to be a single database ID. It can be a combination of values that uniquely identify the last row fetched, such as a timestamp and a primary key. This allows you to paginate correctly even if the ORDER BY column isn’t unique, by including additional fields in the WHERE clause to break ties. For example, if paginating by created_at and two users were created at the exact same millisecond, you’d use WHERE created_at > '2023-10-27 10:00:00.123' OR (created_at = '2023-10-27 10:00:00.123' AND id > last_id_from_that_timestamp).
The next problem you’ll likely encounter is handling deleted or updated records between page fetches, which can cause gaps or duplicates in your cursor-paginated results.