pytest TDD Workflow: Red-Green-Refactor in Python (2026)

The most surprising thing about pytest’s TDD workflow is that the "Green" phase, where you make the failing test pass, is also where you write the least code: just enough to work and nothing more.

Let’s see this in action. Imagine we’re building a simple calculator that can add two numbers. We’ll start with a failing test, which is the "Red" in Red-Green-Refactor.

First, we need a test file. Let’s call it test_calculator.py.

# test_calculator.py
import pytest

def test_add_two_numbers():
    assert calculator.add(2, 3) == 5

If we run pytest in our terminal now, the test fails with a NameError.

=================================== FAILURES ===================================
_____________________________ test_add_two_numbers _____________________________
    def test_add_two_numbers():
>       assert calculator.add(2, 3) == 5
E       NameError: name 'calculator' is not defined

This is our "Red" state. The test fails because the name calculator is never imported or defined, and the code it refers to doesn’t exist yet: there is no calculator module and no add method. This is exactly what we want. We have stated the behavior we expect, and the test can’t pass until the functionality is there.

Now we work toward the "Green" phase by writing the minimum amount of code needed to make the test pass. We start with a stub, so that the test can at least import and call the code.

Create a file named calculator.py:

# calculator.py
class Calculator:
    def add(self, a, b):
        pass # We'll fill this in next

And modify test_calculator.py to import our Calculator class:

# test_calculator.py
import pytest
from calculator import Calculator

def test_add_two_numbers():
    calc = Calculator()
    assert calc.add(2, 3) == 5

Run pytest again.

=================================== FAILURES ===================================
_____________________________ test_add_two_numbers _____________________________
    def test_add_two_numbers():
        calc = Calculator()
>       assert calc.add(2, 3) == 5
E       assert None == 5

Still "Red"! The test now runs without a NameError, but it fails with an AssertionError. This is because our Calculator.add method currently does nothing (pass), so it implicitly returns None. This is still progress – we’ve moved from a non-existent dependency to a defined one that’s just not working correctly.

Let’s make it pass. Update calculator.py:

# calculator.py
class Calculator:
    def add(self, a, b):
        return a + b # The actual implementation

Run pytest one more time.

================================= 1 passed in 0.01s =================================

"Green"! The test passes. We’ve successfully gone from Red to Green. This is the core loop.

Now, for the "Refactor" phase. In this simple example, there’s not much to refactor. But imagine our add method had become more complex, perhaps handling different types or edge cases. We would use this phase to clean up the code, improve readability, remove duplication, or optimize performance, all while ensuring our tests still pass. The tests act as a safety net. If a refactor breaks something, a test will fail, and we’ll know immediately.
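
As a rough sketch of what such a refactor could look like, assume (hypothetically, this is not part of the calculator above) that add had grown to accept numeric strings as well as numbers. The add_before method and the _to_number helper are illustrative names only. The behavior stays the same; only the duplicated conversion logic is pulled into one place, and the same tests keep passing before and after.

```python
# refactor_sketch.py (hypothetical example, not part of the original calculator)
class Calculator:
    def add_before(self, a, b):
        # Before the refactor: the string-to-number conversion is duplicated.
        if isinstance(a, str):
            a = float(a)
        if isinstance(b, str):
            b = float(b)
        return a + b

    @staticmethod
    def _to_number(value):
        # After the refactor: the conversion lives in a single helper.
        return float(value) if isinstance(value, str) else value

    def add(self, a, b):
        return self._to_number(a) + self._to_number(b)


def test_add_two_numbers():
    assert Calculator().add(2, 3) == 5


def test_add_numeric_strings():
    # Assumed requirement for this sketch: numeric strings are accepted.
    assert Calculator().add("2", 3) == 5.0
```

If the refactored add ever disagreed with the old behavior, one of these tests would fail and point at the change that caused it.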

The mental model here is about building functionality incrementally and with confidence. You’re not trying to write perfect, comprehensive code from the start. Instead, you’re driven by the requirements expressed in your failing tests. Each test defines a small piece of desired behavior. You write the minimum code to satisfy that specific behavior, and then you have a working, tested piece of functionality.

The "Red-Green" cycle forces you to think about the interface and behavior of your code before you get bogged down in implementation details. You’re defining what your code should do before you decide how it will do it. This often leads to cleaner, more modular designs because you’re constantly considering how the code will be used (by the tests).

When you’re writing the "Green" part, the goal isn’t elegance or efficiency; it’s just to make the existing failing test pass. This might mean writing a very specific, almost "hardcoded" solution at first. For instance, if your test was assert add(2, 3) == 5, and you only had that one test, you could technically get away with return 5 in the add method. This is fine: the hardcoded version survives only until the next failing test exposes it, and the "Refactor" step is where you generalize once more tests or a visible pattern call for it.

The real power of TDD emerges when you have multiple tests and a more complex system. You might write a test for add(5, 0), then another for add(-1, 1). Each time, you watch the new test fail, make it pass, and then refactor toward a single implementation that satisfies all of them. This iterative process keeps each change small and easy to verify, and it keeps your tests aligned with the desired behavior at each step.
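
One way to keep several such cases in a single test is pytest’s parametrize marker. The sketch below assumes the Calculator class from earlier (repeated here so the file runs on its own); the test name test_add and the choice of cases are illustrative.

```python
# test_calculator_cases.py (illustrative sketch)
import pytest


class Calculator:
    def add(self, a, b):
        return a + b


@pytest.mark.parametrize(
    "a, b, expected",
    [
        (2, 3, 5),    # the original case
        (5, 0, 5),    # adding zero
        (-1, 1, 0),   # a negative operand
    ],
)
def test_add(a, b, expected):
    assert Calculator().add(a, b) == expected
```

pytest runs test_add once per tuple and reports each case separately, so a failure points at the specific inputs that broke.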

After successfully refactoring, you’d typically look for the next piece of functionality and write another failing test, starting the Red-Green-Refactor cycle anew.