SQLAlchemy’s ORM is a powerful tool for interacting with databases in Python, but it’s also a common source of performance issues, most notably the "N+1 select" problem.

Here’s a look at how to avoid common performance traps, focusing on the N+1 problem and other pitfalls.

The N+1 Select Problem

The N+1 select problem occurs when you fetch a list of parent objects and then, for each parent object, you execute a separate query to fetch its related child objects. If you have N parent objects, you end up with 1 query for the parents and N queries for the children, totaling N+1 queries.

Let’s illustrate with an example. Suppose we have Author and Book models, where an Author can have many Books.

from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import sessionmaker, relationship
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String)
    author_id = Column(Integer, ForeignKey('authors.id'))
    author = relationship("Author", back_populates="books")

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    books = relationship("Book", back_populates="author")

# Setup (replace with your actual database URL)
engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

# Populate some data
author1 = Author(name="Jane Austen")
author2 = Author(name="Charles Dickens")

book1 = Book(title="Pride and Prejudice", author=author1)
book2 = Book(title="Sense and Sensibility", author=author1)
book3 = Book(title="Oliver Twist", author=author2)

session.add_all([author1, author2, book1, book2, book3])
session.commit()

Now, let’s simulate the N+1 problem:

# Fetch all authors
authors = session.query(Author).all()

# For each author, fetch their books (THIS IS THE N+1 PART)
for author in authors:
    print(f"Author: {author.name}, Books: {[book.title for book in author.books]}")

When you run this, SQLAlchemy will execute one query to fetch all authors, and then for each author, it will execute a separate query to fetch their books. If you had 100 authors, that’s 101 queries!

Solutions to the N+1 Problem

The primary way to combat N+1 selects is by using eager loading. SQLAlchemy provides several strategies for this.

1. joinedload

joinedload tells SQLAlchemy to fetch the related objects in the same query using a JOIN. This is often the most straightforward and efficient method.

# Using joinedload
authors = session.query(Author).options(joinedload(Author.books)).all()

for author in authors:
    print(f"Author: {author.name}, Books: {[book.title for book in author.books]}")

When you execute this, you’ll see only one SQL query in your logs (or when inspecting session.query(Author).options(joinedload(Author.books)).statement). This query will join the authors and books tables and retrieve all necessary data in a single round trip to the database.

Why it works: joinedload instructs SQLAlchemy to perform a LEFT OUTER JOIN between the parent and child tables and select all columns for both. The ORM then reconstructs the object graph in memory.

2. selectinload

selectinload is another eager loading strategy. Instead of joining, it executes a separate, optimized query for the related collections after the initial query for the parent objects has completed. It uses an IN clause to fetch all related items for the collection in one go.

# Using selectinload
authors = session.query(Author).options(selectinload(Author.books)).all()

for author in authors:
    print(f"Author: {author.name}, Books: {[book.title for book in author.books]}")

This will result in two queries: one to fetch all authors, and a second query like SELECT books.id, books.title, books.author_id FROM books WHERE books.author_id IN (1, 2, ...) where (1, 2, ...) are the IDs of the authors fetched in the first query.

Why it works: selectinload is often more efficient than joinedload when dealing with large collections of related items because it avoids fetching a massive result set from a complex join. It fetches parent IDs, then fetches all children related to those IDs in a single, targeted query.

3. subqueryload

subqueryload is similar to selectinload in that it uses a separate query for the related collection, but it uses a subquery to select the primary keys of the parent objects.

# Using subqueryload
authors = session.query(Author).options(subqueryload(Author.books)).all()

for author in authors:
    print(f"Author: {author.name}, Books: {[book.title for book in author.books]}")

This will also result in two queries, but the second query will use a subquery to select the author IDs. It’s generally less common to use subqueryload over selectinload unless you have specific reasons related to database performance or complex loading scenarios.

Why it works: subqueryload uses a subquery to identify the primary keys of the parent objects, then performs a second query to fetch the related children using those keys. This can be beneficial in scenarios where joinedload might lead to duplicate parent rows.

Other Performance Traps

1. Fetching Too Much Data

Even with eager loading, you can still face performance issues if you fetch more data than you need.

# Fetching all columns when only names are needed
authors = session.query(Author).options(joinedload(Author.books)).all()
for author in authors:
    print(author.name) # We only needed the author's name, but we loaded all books too.

Solution: Use with_entities or with_polymorphic to select only the columns you require.

from sqlalchemy import select

# Fetching only author names
author_names = session.query(Author.name).all()
for name in author_names:
    print(name[0]) # Note: query returns tuples

# Fetching author names and book titles with joinedload (more complex, but possible)
# This can be tricky to get right with select statements. Often, it's better
# to load full objects and then project them if needed.

A more common approach for projection while still leveraging ORM features is to load the objects and then project in Python, or to use aliased entities if you need to join and project.

2. Too Many Small Queries (The Opposite of N+1)

While N+1 is about too many queries, sometimes you might construct a query that is technically one SQL statement but incredibly inefficient due to its complexity or the sheer amount of data it tries to process.

For example, a query with many OR conditions, or a query that attempts to load vast, deeply nested relationships without proper filtering.

Solution: Analyze your queries using profiling tools (like SQLAlchemy’s built-in logging or external tools) and break down complex operations into simpler, more targeted queries. Batching operations can also help.

3. Session Management and Large Transactions

Keeping a session open for too long or accumulating too many objects within a single session can lead to excessive memory usage and performance degradation.

Solution: Commit transactions frequently and, if necessary, clear the session.

# Example of clearing session
session.flush() # Ensure pending changes are written to DB
session.expunge_all() # Remove all objects from the session
# Now you can start a new "transaction" or clear memory

The most surprising true thing about SQLAlchemy’s ORM is that its default behavior is often lazy loading for relationships, which is great for initial object instantiation speed but a direct trap for N+1 queries when you later access those relationships.

Here’s a glimpse of how joinedload works behind the scenes, simplified. When you execute session.query(Author).options(joinedload(Author.books)).all(), SQLAlchemy generates a SQL query that looks something like this:

SELECT
    authors.id AS authors_id,
    authors.name AS authors_name,
    books.id AS books_id,
    books.title AS books_title,
    books.author_id AS books_author_id
FROM
    authors
LEFT OUTER JOIN
    books ON authors.id = books.author_id;

SQLAlchemy then processes this result set, recognizing that rows with the same authors_id belong to the same Author object and that the books columns in those rows correspond to the books collection of that Author. It reconstructs the object graph in memory from this single, denormalized result set.

The next concept you’ll likely encounter is how to handle more complex relationship patterns, such as many-to-many relationships and polymorphic loading, and how to optimize queries involving them.

Want structured learning?

Take the full Python course →