Hypothesis can test your code with inputs you’d never dream of, often finding bugs in edge cases that traditional unit tests miss.
Let’s see Hypothesis in action with a simple function that reverses a string.
def reverse_string(s: str) -> str:
    return s[::-1]
Now, we’ll write a test using Hypothesis. We don’t need to list specific inputs; Hypothesis will generate them for us.
from hypothesis import given
import hypothesis.strategies as st

@given(st.text())
def test_reverse_string_is_involutive(s):
    assert reverse_string(reverse_string(s)) == s
When you run this test with pytest, Hypothesis will generate a variety of strings – empty strings, long strings, strings with special characters, strings with Unicode, etc. – and pass them to test_reverse_string_is_involutive. If it finds an input where reverse_string(reverse_string(s)) != s, it will report that specific failing input.
This "property-based" testing approach focuses on defining properties that should hold true for all valid inputs, rather than testing individual, hand-picked examples. The property here is that reversing a string twice should yield the original string.
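Other properties of the same function can be stated in the same way. The sketch below is illustrative only: it redefines reverse_string so that it runs on its own, and the two test names are made up for this example. It checks that reversing preserves length and that reversing a concatenation equals the reversed parts in swapped order.

```python
from hypothesis import given
import hypothesis.strategies as st


def reverse_string(s: str) -> str:
    return s[::-1]


@given(st.text())
def test_reverse_preserves_length(s):
    # Reversal only reorders characters, so the length must not change.
    assert len(reverse_string(s)) == len(s)


@given(st.text(), st.text())
def test_reverse_of_concatenation(a, b):
    # reverse(a + b) should equal reverse(b) + reverse(a).
    assert reverse_string(a + b) == reverse_string(b) + reverse_string(a)
```

Neither test names a single concrete string. Each one describes a relationship that must hold for whatever Hypothesis generates.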
Hypothesis uses "strategies" to generate data. st.text() is a strategy that generates strings. There are strategies for almost every Python type:
st.integers(): Generates integers. You can constrain them: st.integers(min_value=0, max_value=100).
st.lists(elements=st.integers()): Generates lists of integers. You can also constrain list length: st.lists(st.integers(), min_size=1, max_size=10).
st.dictionaries(keys=st.text(), values=st.integers()): Generates dictionaries.
st.sampled_from([1, 2, 3]): Generates values by sampling from a given list.
st.one_of(st.integers(), st.text()): Generates either an integer or a string.
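As a rough illustration, the strategies listed above can be combined in a single test by passing them to @given as keyword arguments. The test name, the argument names, and the "low"/"mid"/"high" labels are invented for this sketch. The assertions only confirm that the generated values respect the constraints given to each strategy.

```python
from hypothesis import given
import hypothesis.strategies as st


@given(
    scores=st.lists(st.integers(min_value=0, max_value=100), min_size=1, max_size=10),
    label=st.sampled_from(["low", "mid", "high"]),
    extra=st.one_of(st.integers(), st.text()),
    counts=st.dictionaries(keys=st.text(), values=st.integers()),
)
def test_generated_values_respect_constraints(scores, label, extra, counts):
    # The list length and element bounds come from the strategy arguments.
    assert 1 <= len(scores) <= 10
    assert all(0 <= x <= 100 for x in scores)
    # sampled_from only ever returns one of the listed values.
    assert label in {"low", "mid", "high"}
    # one_of picks a value from either of its alternatives.
    assert isinstance(extra, (int, str))
    # dictionaries builds keys and values from the two given strategies.
    assert all(isinstance(k, str) and isinstance(v, int) for k, v in counts.items())
```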
Hypothesis’s power comes from its ability to explore the input space efficiently. When a test fails, it doesn’t just report the first failing input it happened to generate; it performs a process called "shrinking." Shrinking takes the failing input and repeatedly tries simpler variants, keeping each one that still causes the failure, until it cannot simplify any further. The result is usually a much smaller example, though it is not guaranteed to be the absolute smallest. This makes debugging much easier. For example, if a test first fails with a 1000-character string, Hypothesis might shrink it down to a 3-character string that reveals the core issue.
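One way to watch shrinking at work is hypothesis.find, which searches for an input that satisfies a predicate and returns the simplest such input it can reach. The sketch below assumes a deliberately buggy, hypothetical buggy_reverse that strips surrounding whitespace. The predicate violates_property is also invented here, and is true exactly when the "reverse twice" property breaks.

```python
from hypothesis import find
import hypothesis.strategies as st


def buggy_reverse(s: str) -> str:
    # Hypothetical bug: whitespace at either end is silently dropped.
    return s.strip()[::-1]


def violates_property(s: str) -> bool:
    # True when reversing twice does NOT give back the original string.
    return buggy_reverse(buggy_reverse(s)) != s


# find() generates strings until the predicate holds, then shrinks the hit.
counterexample = find(st.text(), violates_property)
print(repr(counterexample))
```

The first failing string Hypothesis generates may be long and full of unusual characters. The value that is printed is typically very short, often a single whitespace character, which points directly at the strip() call.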
A common misconception is that Hypothesis is only for complex, mathematical algorithms. While it excels there, it’s just as useful for everyday code. Consider a pair of functions that format and parse date strings. Instead of testing only "2023-10-27" and "10/27/2023", you can generate date objects with st.dates(), format each one to a string, and assert that parsing that string yields the original date. This is a "round-trip" property.
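A minimal sketch of such a round-trip test follows. The helpers format_date and parse_date are stand-ins for your own code; here they simply wrap the standard library's ISO 8601 methods.

```python
from datetime import date

from hypothesis import given
import hypothesis.strategies as st


def format_date(d: date) -> str:
    # Stand-in for the formatter under test.
    return d.isoformat()


def parse_date(text: str) -> date:
    # Stand-in for the parser under test.
    return date.fromisoformat(text)


@given(st.dates())
def test_format_then_parse_round_trips(d):
    # Formatting a date and parsing the result should give the same date back.
    assert parse_date(format_date(d)) == d
```

If the real parser mishandled, say, single-digit months or years below 1000, Hypothesis would be likely to generate such a date and report it.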
The @given decorator is the entry point. It takes one or more strategies as arguments. If your test function accepts multiple arguments, each argument will be populated by a corresponding strategy.
@given(st.lists(st.integers()), st.integers())
def test_list_and_int(my_list, my_int):
    # Hypothesis will generate lists of integers and separate integers
    # and pass them to my_list and my_int respectively.
    assert len(my_list) >= 0
    assert isinstance(my_int, int)
When dealing with custom data structures or complex types, you can build your own strategies. For instance, if you have a User class, you could define a strategy that generates valid User objects.
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

user_strategy = st.builds(
    User,
    name=st.text(min_size=1),
    age=st.integers(min_value=0, max_value=120)
)

@given(user=user_strategy)
def test_user_creation(user: User):
    assert isinstance(user, User)
    assert len(user.name) >= 1
    assert 0 <= user.age <= 120
Hypothesis’s st.builds function is powerful for creating instances of classes or calling functions with generated arguments. It essentially tells Hypothesis how to construct an object that conforms to your defined structure.
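The same mechanism works for plain functions, not only classes. The sketch below uses a hypothetical factory function, make_range, to show st.builds calling a function with generated keyword arguments. The bounds on start and length are arbitrary choices for this illustration.

```python
from hypothesis import given
import hypothesis.strategies as st


def make_range(start: int, length: int) -> range:
    # Hypothetical factory: a range beginning at `start` with `length` items.
    return range(start, start + length)


# builds() draws a value for each keyword argument, then calls make_range.
range_strategy = st.builds(
    make_range,
    start=st.integers(min_value=-1000, max_value=1000),
    length=st.integers(min_value=0, max_value=50),
)


@given(range_strategy)
def test_range_length_matches_bounds(r):
    assert isinstance(r, range)
    assert len(r) == r.stop - r.start
```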
The real magic of Hypothesis lies in its ability to explore the combinatorial explosion of possible inputs for more complex functions. For example, a function that takes a list of dictionaries, where each dictionary can have varying keys and values of different types, would be a nightmare to test exhaustively with hand-written examples. Hypothesis, with carefully crafted strategies, can navigate this space and uncover unexpected interactions.
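As a sketch of what such a strategy might look like, the example below composes st.lists, st.dictionaries, and st.one_of to generate lists of dictionaries with mixed value types. It then checks a JSON round trip. The names json_scalars, records, and test_json_round_trip are invented for this example, and floats are left out on purpose because NaN never compares equal to itself.

```python
import json

from hypothesis import given
import hypothesis.strategies as st

# Values may be integers, strings, booleans, or None.
json_scalars = st.one_of(st.integers(), st.text(), st.booleans(), st.none())

# Each record is a dictionary with arbitrary string keys and mixed values.
records = st.lists(st.dictionaries(keys=st.text(), values=json_scalars))


@given(records)
def test_json_round_trip(data):
    # Serializing and then deserializing should reproduce the same structure.
    assert json.loads(json.dumps(data)) == data
```

A handful of lines describes an input space that would take many hand-written cases to cover: empty lists, empty dictionaries, empty-string keys, negative numbers, and unusual Unicode text all appear without being listed.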
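The st.text() strategy, by default, generates a wide range of characters, including control characters, non-ASCII characters, and characters that might cause issues in string processing or encoding/decoding logic. If your code has specific constraints on acceptable characters (e.g., only letters and digits), you’d refine the strategy: st.text(alphabet=st.characters(categories=('Lu', 'Ll', 'Nd')), min_size=1). Older Hypothesis releases spell this parameter whitelist_categories, which newer releases deprecate in favor of categories. Note that these Unicode categories also admit non-ASCII letters and digits, so restrict the alphabet further if you need ASCII only.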
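If the code under test accepts only ASCII letters and digits, a simpler option is to pass the allowed characters directly as the alphabet. The following is a minimal sketch; the name ascii_alnum_text and the test itself are made up for illustration.

```python
import string

from hypothesis import given
import hypothesis.strategies as st

# alphabet may be a plain string: every generated character is drawn from it.
ascii_alnum_text = st.text(
    alphabet=string.ascii_letters + string.digits,
    min_size=1,
)


@given(ascii_alnum_text)
def test_only_ascii_alphanumerics_are_generated(s):
    assert len(s) >= 1
    assert s.isascii() and s.isalnum()
```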
The next step after mastering basic property-based testing is exploring how to integrate Hypothesis with frameworks like Django or Flask, or how to use it for performance testing by generating large datasets.