Mutation testing reveals how well your tests catch bugs by introducing small, deliberate "mutations" (code changes) and seeing if your tests fail.

Let’s see it in action. Imagine a simple Python function and a test for it.

Code:

# my_math.py
def add(a, b):
    return a + b

Test:

# test_my_math.py
from my_math import add

def test_add_positive_numbers():
    assert add(2, 3) == 5

Now, let’s introduce a mutation. A mutation testing tool might change return a + b to return a - b. If our test test_add_positive_numbers still passes, it means our test suite is not effective at catching this specific type of error. The mutation survived.

The Problem Mutation Testing Solves:

You write tests to ensure your code is correct and stays correct. But how do you know your tests are good? Are they actually catching the kinds of bugs that could slip into your codebase? You might have 100% test coverage, but a bug could still be lurking, undetected by your existing tests. Mutation testing directly addresses this by asking: "If a small, common bug were introduced, would my tests catch it?"

How it Works Internally:

Mutation testing tools, like mutpy or mutator, work in several stages:

  1. Identify Mutation Points: They scan your source code for valid places to introduce mutations. This typically involves operators like +, -, *, /, ==, !=, >, <, and, or, not, etc.
  2. Generate Mutants: For each mutation point, the tool creates a "mutant" program. This is a version of your original code with a single, small change. For example, changing a + b to a - b creates one mutant. Changing a > b to a < b creates another.
  3. Run Test Suite: The tool then executes your entire test suite against each mutant program.
  4. Analyze Results:
    • If a test fails when run against a mutant, that mutant is considered "killed." This means your test suite successfully detected the artificial bug.
    • If all tests pass when run against a mutant, that mutant "survived." This indicates a weakness in your test suite – it failed to catch the introduced error.

The Goal:

The ultimate goal is to kill as many mutants as possible. A high "mutation score" (e.g., 80-90%) suggests your tests are robust and likely to catch real-world bugs. A low score indicates that your tests are missing certain types of errors, and you should write new tests or modify existing ones to cover those scenarios.

Controlling the Levers:

When using a mutation testing tool, you’ll typically configure:

  • Target Files/Modules: Which parts of your codebase to mutate.
  • Exclusion Patterns: Which files or specific lines to not mutate (e.g., boilerplate, auto-generated code).
  • Mutation Operators: Which types of changes to make (e.g., only arithmetic, only logical operators).
  • Test Command: How to run your test suite (e.g., pytest, nose2).
  • Output Format: How to present the results (e.g., JSON, HTML report, console summary).

Example with mutpy:

First, install mutpy:

pip install mutpy pytest

Create your my_math.py and test_my_math.py as above.

Now, run mutpy:

mutpy run --test-command "pytest" my_math.py

mutpy will generate mutants of my_math.py and run pytest against each one.

Output Snippet (Illustrative):

...
Mutant 1: my_math.py:2: return a - b # Survived
Mutant 2: my_math.py:2: return a * b # Survived
Mutant 3: my_math.py:2: return a / b # Survived
...
Mutation score: 0.00%

In this simple example, our single test test_add_positive_numbers survives all mutations because it only checks 2 + 3 == 5. It doesn’t test edge cases or different operators.

To improve the score, we’d add more tests:

# test_my_math.py
from my_math import add

def test_add_positive_numbers():
    assert add(2, 3) == 5

def test_add_negative_numbers():
    assert add(-2, -3) == -5

def test_add_mixed_numbers():
    assert add(5, -3) == 2

Rerunning mutpy with these new tests would likely kill more mutants.

The one thing that often surprises people is how many mutants survive even with seemingly comprehensive test suites. It’s not uncommon for a "good" test suite to have a mutation score below 50% initially. This isn’t a sign of failure; it’s a direct roadmap to improvement, highlighting exactly where your tests are weak. For instance, if you have a test for if x > 0: but no test for if x < 0:, a mutation changing > to < will likely survive.

The next step after improving your mutation score is understanding how to integrate mutation testing into your CI/CD pipeline.

Want structured learning?

Take the full Pytest course →