How to Find and Fix Flaky Tests in pytest
Database state, network calls, import side effects — the most common causes of Python test flakiness and how to eliminate each one.
pytest is the gold standard for Python testing. Its fixture system, plugin ecosystem, and clean syntax make it a joy to write tests with. But those same powerful features — especially fixtures with broad scopes and plugin interactions — can introduce subtle flakiness that only shows up in CI.
This guide covers the most common patterns behind flaky pytest tests and gives you concrete fixes with real code. Whether you’re dealing with database state leaks, time-dependent assertions, or mysterious import side effects, you’ll find the solution here.
Want to skip the guesswork?
Instead of hunting through CI logs manually, Kleore analyzes your CI history and ranks every flaky test by failure rate and cost — so you fix the worst ones first.
Why pytest tests become flaky
Python’s dynamic nature and pytest’s powerful fixture system create unique flakiness vectors that don’t exist in more constrained testing frameworks. Here are the five most common root causes:
- Database state leaking between tests — Tests share a database and don’t properly isolate transactions. Test A creates a record, Test B doesn’t expect it to exist.
- File system conflicts — Tests write to the same files or directories. Parallel execution causes race conditions on file reads/writes.
- Network calls to real services — Tests make HTTP requests to external APIs that are slow, rate-limited, or occasionally down.
- Import side effects — Python modules that execute code at import time (database connections, config loading, signal handlers) create hidden coupling between tests.
- Test ordering dependencies — Test B only passes when Test A runs first because A sets up state that B implicitly relies on.
How to identify flaky pytest tests
pytest’s plugin ecosystem includes several tools specifically designed to flush out non-deterministic tests.
pytest-randomly: Shuffle test order
pytest-randomly is the most effective way to find tests with hidden ordering dependencies: it shuffles the order of test modules, classes, and functions on every run. A test that passes in file order but fails under randomization has an ordering dependency.
pip install pytest-randomly
# Run with randomized order (enabled by default after install)
pytest
# Reproduce a specific failure with the same seed
pytest -p randomly --randomly-seed=12345
# Disable randomization temporarily
pytest -p no:randomly
pytest-repeat: Stress-test suspected flakes
Run a specific test many times to confirm it’s non-deterministic.
pip install pytest-repeat
# Run a test 100 times — if it fails once, it's flaky
pytest --count=100 tests/test_checkout.py::test_apply_discount
# Stop on first failure
pytest --count=100 -x tests/test_checkout.py::test_apply_discount
pytest-rerunfailures: Detect and retry
This plugin automatically reruns failed tests. Tests that pass on rerun are flaky by definition. Use this for detection, not as a permanent solution.
pip install pytest-rerunfailures
# Rerun failed tests up to 3 times
pytest --reruns 3
# Add a delay between reruns (useful for timing-dependent flakes)
pytest --reruns 3 --reruns-delay 2
# Mark specific tests as expected to flake
@pytest.mark.flaky(reruns=3, reruns_delay=1)
def test_webhook_delivery():
    ...
Common patterns and fixes
Pattern 1: Database state leaking between tests
Symptom: Tests pass individually but fail when run together. Failures involve unexpected records in the database or unique constraint violations.
Root cause: Tests create database records that persist across test boundaries. One test’s setup data becomes another test’s pollution.
# conftest.py
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# TEST_DATABASE_URL should point at a dedicated test database
@pytest.fixture(autouse=True)
def db_session():
    """Wrap every test in a transaction that rolls back."""
    engine = create_engine(TEST_DATABASE_URL)
    connection = engine.connect()
    transaction = connection.begin()
    session = sessionmaker(bind=connection)()
    yield session
    session.close()
    transaction.rollback()
    connection.close()

# For Django projects, pytest-django handles this for you:
@pytest.fixture(autouse=True)
def enable_db_access(db):
    """pytest-django's db fixture already wraps each test in a transaction that rolls back."""
The autouse=True parameter ensures every test gets isolation automatically, without needing to request the fixture explicitly. This prevents new tests from accidentally skipping isolation.
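The transaction-per-test principle is easy to verify in isolation. Here is a minimal standard-library sketch of the same rollback idea, using sqlite3 in place of a real database (table and column names are illustrative):

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so we
# manage the transaction boundaries explicitly, like the fixture does.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (email TEXT UNIQUE)")

# What the fixture does around every test: begin, let the test write, roll back.
conn.execute("BEGIN")
conn.execute("INSERT INTO users VALUES ('alice@example.com')")
# The row is visible inside the "test"...
assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1

conn.execute("ROLLBACK")
# ...but the next test starts from a clean table.
assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 0
```

The same insert-then-rollback sequence is exactly what the SQLAlchemy fixture above performs around each test, just with the engine and session machinery layered on top.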
Pattern 2: Time-dependent tests
Symptom: Tests that check expiration, scheduling, or duration fail at certain times of day or run slower in CI than expected.
Root cause: Tests use datetime.now() or time.time() directly, and their assertions depend on the current time.
pip install freezegun
import pytest
from datetime import timedelta
from freezegun import freeze_time

from myapp.auth import create_token, is_token_expired

@freeze_time("2025-06-15 12:00:00")
def test_token_expiration():
    token = create_token(expires_in=timedelta(hours=1))
    # Still within the hour — not expired
    assert not is_token_expired(token)

@freeze_time("2025-06-15 14:00:00")
def test_token_is_expired():
    # Create a token that expired an hour ago
    with freeze_time("2025-06-15 12:00:00"):
        token = create_token(expires_in=timedelta(hours=1))
    # Now it's 2pm — token expired at 1pm
    assert is_token_expired(token)

# As a fixture for broader use:
@pytest.fixture
def frozen_time():
    with freeze_time("2025-01-01 00:00:00") as frozen:
        yield frozen
Pattern 3: Network calls to real services
Symptom: Tests fail with ConnectionError, Timeout, or 429 Too Many Requests. Failures happen in bursts when the external service has issues.
Root cause: Tests make real HTTP requests to APIs you don’t control.
pip install responses
import pytest
import requests
import responses

from myapp.payment import PaymentError, charge_customer

@responses.activate
def test_successful_charge():
    responses.add(
        responses.POST,
        "https://api.stripe.com/v1/charges",
        json={"id": "ch_test_123", "status": "succeeded"},
        status=200,
    )
    result = charge_customer(amount=2000, token="tok_visa")
    assert result.status == "succeeded"

@responses.activate
def test_payment_gateway_timeout():
    responses.add(
        responses.POST,
        "https://api.stripe.com/v1/charges",
        body=requests.exceptions.Timeout(),
    )
    with pytest.raises(PaymentError, match="timeout"):
        charge_customer(amount=2000, token="tok_visa")

# For httpx (async), use pytest-httpx, which provides the httpx_mock fixture:
# pip install pytest-httpx
@pytest.fixture
def mock_httpx(httpx_mock):
    httpx_mock.add_response(
        url="https://api.example.com/data",
        json={"results": []},
    )
    return httpx_mock
Pattern 4: File system conflicts
Symptom: Tests fail with FileNotFoundError, PermissionError, or produce corrupted output. Especially common with parallel test execution via pytest-xdist.
Root cause: Multiple tests read/write the same file paths concurrently.
import pytest

def test_export_csv(tmp_path):
    """tmp_path gives each test a unique temporary directory."""
    output_file = tmp_path / "export.csv"
    export_data(output_path=output_file)
    content = output_file.read_text()
    assert "header1,header2" in content
    assert len(content.splitlines()) == 101  # header + 100 rows
    # tmp_path is automatically cleaned up after the test

def test_config_loading(tmp_path):
    """Create isolated config files per test."""
    config_file = tmp_path / "config.yaml"
    config_file.write_text("""
database:
  host: localhost
  port: 5432
""")
    config = load_config(str(config_file))
    assert config["database"]["host"] == "localhost"

# For fixtures that need a persistent temp directory across a test class:
@pytest.fixture(scope="class")
def shared_tmp(tmp_path_factory):
    return tmp_path_factory.mktemp("shared")
Pattern 5: Import side effects
Symptom: Tests fail with errors about database connections already being open, signal handlers being registered twice, or global config having unexpected values.
Root cause: Python modules execute code at import time. If a module opens a database connection, registers a signal handler, or modifies global state when imported, that side effect persists for the entire test session.
# If the module connects to a database on import:
# myapp/db.py
# connection = psycopg2.connect(DATABASE_URL)  # Runs at import time!

# Option 1: Mock before import
import sys
from unittest.mock import MagicMock

# Prevent the real module from connecting
sys.modules["psycopg2"] = MagicMock()
from myapp.db import get_users  # Now uses mocked connection

# Option 2: Use importlib for fresh imports
import importlib

def test_with_fresh_module():
    import myapp.db
    importlib.reload(myapp.db)  # Re-executes module code
    # ... test with fresh state

# Option 3 (best): Refactor to lazy initialization
# myapp/db.py
_connection = None

def get_connection():
    global _connection
    if _connection is None:
        _connection = psycopg2.connect(DATABASE_URL)
    return _connection
Quarantining flaky pytest tests
The pytest-quarantine plugin lets you mark tests as known-flaky so they don’t block your CI pipeline while you work on fixes.
pip install pytest-quarantine
# Generate a quarantine list from your last test run
pytest --save-quarantine=quarantine.txt
# Run tests, treating quarantined tests as expected failures
pytest --quarantine=quarantine.txt
For a more automated approach, Kleore detects flaky tests automatically from your CI history — no manual tagging needed. It tracks every test that has passed and failed on the same commit, ranks them by impact, and gives you a prioritized fix list with cost estimates.
CI configuration tips for pytest
Beyond fixing individual tests, your CI configuration can reduce flakiness across the board.
[tool.pytest.ini_options]
# Randomize test order to catch hidden dependencies;
# --strict-markers turns typos in marker names into errors
addopts = "-p randomly --randomly-seed=last --strict-markers"
markers = [
    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
    "integration: marks integration tests",
    "flaky: marks known flaky tests",
]
# Timeout per test (requires pytest-timeout)
timeout = 30
# Fail on warnings to catch deprecation issues early
filterwarnings = [
    "error",
    "ignore::DeprecationWarning:third_party_lib.*",
]
jobs:
  test:
    runs-on: ubuntu-latest
    env:
      PYTHONDONTWRITEBYTECODE: "1"
      PYTHONHASHSEED: "0"
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version-file: ".python-version"
          cache: "pip"
      - run: pip install -r requirements-test.txt
      # -x: stop on first failure; --tb=short: concise tracebacks; -q: quiet output
      - run: pytest -x --tb=short -q

  # For parallel execution (requires pytest-xdist plus pytest-forked):
  test-parallel:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version-file: ".python-version"
          cache: "pip"
      - run: pip install -r requirements-test.txt
      # --forked: run each test in its own subprocess (not available on Windows)
      # -n auto: use all available CPUs
      - run: pytest --forked -n auto
Setting PYTHONDONTWRITEBYTECODE=1 prevents .pyc file conflicts in parallel runs. PYTHONHASHSEED=0 makes string hashing deterministic, so set iteration order (and anything else that depends on hash values) stops varying between interpreter runs, eliminating a whole class of order-dependent flakes.
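The hash-seed effect is easy to observe directly. A small standard-library sketch: each call spawns a fresh interpreter with a given PYTHONHASHSEED and returns the iteration order of a set of strings.

```python
import os
import subprocess
import sys

def set_order(seed):
    """Start a fresh interpreter with the given PYTHONHASHSEED and
    return the printed iteration order of a small set of strings."""
    env = {**os.environ, "PYTHONHASHSEED": seed}
    result = subprocess.run(
        [sys.executable, "-c", "print(list({'alpha', 'beta', 'gamma', 'delta'}))"],
        capture_output=True,
        text=True,
        env=env,
    )
    return result.stdout.strip()

# With a fixed seed, every fresh interpreter produces the same order.
assert set_order("0") == set_order("0")

# With the default random seed, the order can change from run to run,
# which is exactly how a hash-order-dependent assertion becomes flaky.
print(set_order("random"))
```

A test that asserts on `list(some_set)` or on JSON serialized from a set will pass or fail depending on that per-process seed; pinning PYTHONHASHSEED in CI makes the failure reproducible so you can fix the assertion properly.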
Stop guessing which pytest tests are flaky.
Kleore scans your GitHub Actions history and gives you a ranked list of every flaky test — with failure rates, cost estimates, and fix priority. Free to start.
Further reading
- How to Find and Fix Flaky Tests in Jest — The JavaScript equivalent of this guide.
- How to Fix Flaky Tests in GitHub Actions — Framework-agnostic patterns for CI-level flakiness.
- How Much Do Flaky Tests Actually Cost? — The dollar math to justify the fix.
- Flaky Test Cost Calculator — Plug in your team’s numbers and see the impact.