How to Fix Flaky Tests in GitHub Actions
Six patterns that cause 90% of test flakiness — and how to fix each one with concrete code changes.
You know the drill: CI goes red, you check the logs, the failure looks unrelated to your changes. You hit re-run. It passes. You merge. And the cycle repeats tomorrow.
This guide covers the six most common patterns behind flaky tests in GitHub Actions and gives you concrete fixes for each. Not theories — actual code changes and configuration updates you can apply today.
Before you start fixing
The first step is knowing which tests are flaky and how often they fail. If you’re guessing based on Slack complaints, you’re working blind. Kleore analyzes your CI history and ranks every flaky test by failure rate and cost — so you fix the worst ones first.
1. Timing & race conditions
Symptom: Test passes locally, fails intermittently in CI. Often involves UI tests, async operations, or anything that waits for a condition to become true.
Root cause: GitHub Actions runners have variable performance. A 2-core runner under load is slower than your M3 MacBook. Tests that assume operations complete within a specific window break when the runner is under pressure.
The fix: Replace fixed waits with condition-based polling.
```ts
// Bad: assumes the element appears within 500ms
await new Promise(r => setTimeout(r, 500));
expect(screen.getByText("Success")).toBeInTheDocument();

// Good: waits for the condition, not the clock
await waitFor(() => {
  expect(screen.getByText("Success")).toBeInTheDocument();
}, { timeout: 5000 });
```

For E2E tests with Playwright or Cypress, use their built-in auto-waiting mechanisms instead of explicit sleeps. For backend tests, poll with exponential backoff rather than sleeping.
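For backend tests, the backoff poller can be a few lines. A minimal sketch, assuming nothing beyond Node itself (pollUntil and its options are illustrative names, not a library API):

```typescript
// Hypothetical helper: retry `check` until it stops throwing, doubling
// the delay between attempts instead of sleeping for a fixed duration.
async function pollUntil<T>(
  check: () => T | Promise<T>,
  { timeoutMs = 5000, initialDelayMs = 50 } = {}
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  for (;;) {
    try {
      return await check();
    } catch (err) {
      // Out of time: surface the last failure instead of sleeping again.
      if (Date.now() + delay > deadline) throw err;
      await new Promise((r) => setTimeout(r, delay));
      delay = Math.min(delay * 2, 1000); // cap the backoff
    }
  }
}
```

A caller polls a condition, e.g. `await pollUntil(async () => { if (!(await jobDone(id))) throw new Error("pending"); })` (jobDone is illustrative), so the test waits only as long as the runner actually needs.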
2. Shared mutable state
Symptom: Test passes in isolation (it.only) but fails when run with the full suite. Or it fails only when a specific other test runs before it.
Root cause: Tests share a database, in-memory store, filesystem, or global variable. Test A writes data that Test B doesn’t expect, or Test A forgets to clean up.
The fix: Isolate test state completely.
```ts
// Run each test in a transaction that rolls back
beforeEach(async () => {
  await db.query("BEGIN");
});

afterEach(async () => {
  await db.query("ROLLBACK");
});

// Instead of hardcoding IDs that collide:
const userId = `test-user-${crypto.randomUUID()}`;
await createUser({ id: userId, name: "Test" });
```

If you’re using a shared test database, consider running each test file in its own database schema or using containers. The small overhead is worth the determinism.
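One way to get per-worker isolation is to key the schema name off the test runner's worker ID. A sketch, assuming Jest (which sets JEST_WORKER_ID) or Vitest (which sets VITEST_POOL_ID); testSchemaName is a hypothetical helper:

```typescript
// Hypothetical helper: derive an isolated schema name per test worker.
// Jest exposes JEST_WORKER_ID and Vitest exposes VITEST_POOL_ID; fall
// back to "1" for single-process runs.
function testSchemaName(prefix = "test"): string {
  const worker =
    process.env.JEST_WORKER_ID ?? process.env.VITEST_POOL_ID ?? "1";
  return `${prefix}_worker_${worker}`;
}

// In a setup file (db is your own client):
//   await db.query(`CREATE SCHEMA IF NOT EXISTS ${testSchemaName()}`);
```

Each parallel worker then reads and writes its own schema, so tests cannot see each other's rows.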
3. External service dependencies
Symptom: Tests fail with network timeouts, 503 errors, or rate-limit responses. Usually happens in bursts (when the external service has issues).
Root cause: Your tests make real HTTP calls to APIs you don’t control — payment gateways, auth providers, third-party data services.
The fix: Mock at the HTTP boundary, not the function level.
```ts
import { http, HttpResponse } from "msw";
import { setupServer } from "msw/node";

const server = setupServer(
  http.post("https://api.stripe.com/v1/charges", () => {
    return HttpResponse.json({
      id: "ch_test_123",
      status: "succeeded",
      amount: 2000,
    });
  })
);

beforeAll(() => server.listen());
afterEach(() => server.resetHandlers());
afterAll(() => server.close());
```

Use MSW or similar tools to intercept HTTP at the network level. This tests your actual HTTP client code (headers, serialization, error handling) while eliminating network flakiness. Reserve real API calls for a small set of integration tests that run separately.
4. Environment differences
Symptom: Tests pass on macOS, fail on Linux. Or pass with Node 20, fail with Node 22. Or pass Monday through Friday, fail on weekends.
Root cause: Assumptions baked into tests about the OS, timezone, locale, filesystem behavior, or available system resources.
The fix: Pin your CI environment explicitly.
```yaml
jobs:
  test:
    # Pin a specific image instead of ubuntu-latest, which moves over time
    runs-on: ubuntu-24.04
    env:
      TZ: UTC
      LC_ALL: C.UTF-8
      NODE_ENV: test
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version-file: ".node-version"
      - run: npm ci
      - run: npm test
```

Key practices: always set TZ=UTC, use a .node-version file instead of hardcoding versions, and test with the same OS as production. If your tests compare file paths, normalize separators.
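Separator normalization can be as small as splitting on the Windows separator and rejoining with forward slashes. A sketch using Node's path module (toPosixPath is a hypothetical helper):

```typescript
import * as path from "node:path";

// Normalize a path to forward slashes before comparing it, so an
// assertion written against "src/lib" also holds on Windows runners.
function toPosixPath(p: string): string {
  return p.split(path.win32.sep).join(path.posix.sep);
}
```

Compare `toPosixPath(actual)` against a forward-slash literal instead of baking `path.sep` into your expected values.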
5. Port & resource conflicts
Symptom: EADDRINUSE errors, database connection failures, or file lock errors. Happens especially when tests run in parallel.
Root cause: Multiple test processes or test files trying to bind to the same port, open the same file, or connect to the same database concurrently.
The fix: Use dynamic port allocation.
```ts
import type { AddressInfo } from "node:net";

// Instead of: server.listen(3000)
// Use port 0 to let the OS assign an available port
const server = app.listen(0);
const port = (server.address() as AddressInfo).port;

// Pass the port to your test client
const client = createTestClient(`http://localhost:${port}`);
```

For database tests, use unique database names per test worker or use Docker containers. For file-based tests, use os.tmpdir() with random suffixes.
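For the file-based case, Node's fs.mkdtempSync already appends a random suffix to a prefix for you. A minimal sketch (makeScratchDir is an illustrative wrapper):

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Create a unique scratch directory per test. mkdtempSync appends six
// random characters to the prefix, so parallel workers never collide.
function makeScratchDir(prefix = "test-"): string {
  return fs.mkdtempSync(path.join(os.tmpdir(), prefix));
}
```

Clean up in afterEach with `fs.rmSync(dir, { recursive: true, force: true })` so leftover files never leak between tests.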
6. Test order dependency
Symptom: Tests pass when run in the default order, fail when randomized or when a specific test file is skipped.
Root cause: Test A sets up state that Test B implicitly depends on. When A doesn’t run first, B fails.
The fix: Make every test self-contained.
```ts
describe("checkout flow", () => {
  // Each test creates its own state from scratch
  it("applies discount code", async () => {
    // Setup: create the user, cart, and product for this test
    const user = await createTestUser();
    const product = await createTestProduct({ price: 100 });
    const cart = await createCart(user.id, [product.id]);

    // Act
    const result = await applyDiscount(cart.id, "SAVE20");

    // Assert
    expect(result.total).toBe(80);
  });
});
```

Enable test randomization to catch these issues early. Jest supports --randomize, and Vitest can be configured with sequence.shuffle: true. If your tests slow down from redundant setup, invest in fast factory functions — not shared state.
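A factory keeps that setup fast and explicit: every call gets unique defaults, and the test overrides only the fields it asserts on. A sketch with entirely hypothetical names:

```typescript
// Hypothetical factory: unique defaults per call, caller overrides the rest.
let seq = 0;

interface TestUser {
  id: string;
  name: string;
  email: string;
}

function buildUser(overrides: Partial<TestUser> = {}): TestUser {
  seq += 1; // monotonically increasing, so IDs never collide in one run
  return {
    id: `user-${seq}`,
    name: `Test User ${seq}`,
    email: `user${seq}@example.test`,
    ...overrides, // the test states only what it cares about
  };
}
```

A test that checks display names then reads as `const user = buildUser({ name: "Ada" })`, with everything else generated fresh.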
The meta-fix: Retry as a bandaid, not a cure
GitHub Actions has no built-in step-level retry, so teams reach for a third-party action such as nick-fields/retry, or simply re-run the workflow. Many add retry logic as a first response:
```yaml
- uses: nick-fields/retry@v3
  with:
    max_attempts: 3
    timeout_minutes: 10
    command: npm test
```

This is fine as a short-term bandaid while you fix the root cause. But retrying hides the problem. A test that fails 30% of the time and is allowed 3 attempts will appear to pass about 97% of the time (1 − 0.3³ ≈ 0.973), while still burning up to 3x the CI minutes and masking the underlying issue.
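The arithmetic is simple: if each attempt fails independently with probability p, the job only goes red when all k attempts fail. A sketch (observedPassRate is an illustrative helper, not part of any library):

```typescript
// Probability a flaky test appears green when each attempt fails
// independently with probability p and the runner allows `attempts` tries.
function observedPassRate(p: number, attempts: number): number {
  return 1 - Math.pow(p, attempts);
}
```

Plugging in p = 0.3 and 3 attempts gives roughly 0.973, i.e. the retry makes a badly flaky test look almost healthy.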
Retry to unblock your team today. Fix the root cause this sprint.
How to prioritize which tests to fix first
Not all flaky tests are equal. A test that flakes once a month is annoying. A test that flakes daily on your critical path is an emergency. Prioritize by:
- Failure frequency — How often does it flake? Daily flakes first.
- Blast radius — Does it block all PRs, or just one workflow?
- Cost per failure — Long test suites cost more per re-run.
- Fix complexity — Can you fix it in an hour, or does it need a refactor?
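These four factors can be folded into a rough score, for example expected re-run minutes per week weighted by blast radius and divided by fix effort. A sketch with entirely illustrative field names and numbers:

```typescript
// Hypothetical scoring sketch: rank flaky tests by expected weekly cost,
// discounted by how hard each one is to fix.
interface FlakyTest {
  name: string;
  failuresPerWeek: number; // failure frequency
  blocksAllPRs: boolean;   // blast radius
  rerunMinutes: number;    // cost per re-run
  fixHours: number;        // fix complexity
}

function priority(t: FlakyTest): number {
  const radius = t.blocksAllPRs ? 2 : 1; // crude weight for critical-path tests
  return (t.failuresPerWeek * t.rerunMinutes * radius) / t.fixHours;
}

// Illustrative data: a daily critical-path flake vs. a rare cheap one.
const tests: FlakyTest[] = [
  { name: "checkout e2e", failuresPerWeek: 5, blocksAllPRs: true, rerunMinutes: 20, fixHours: 2 },
  { name: "date parsing", failuresPerWeek: 1, blocksAllPRs: false, rerunMinutes: 5, fixHours: 1 },
];
const ranked = tests.slice().sort((a, b) => priority(b) - priority(a));
```

The exact weights matter less than having any consistent ranking: it stops the team from fixing whichever flake complained loudest in Slack.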
Let Kleore do the prioritization for you.
Kleore analyzes your GitHub Actions history and ranks every flaky test by failure rate, cost, and impact. You get a prioritized list with dollar amounts — so you know exactly where to start.
Scan my repos — free
Further reading
- What Are Flaky Tests? — The fundamentals of test flakiness.
- How Much Do Flaky Tests Actually Cost? — The dollar math to justify the fix.
- Flaky Test Cost Calculator — Plug in your team’s numbers and see the impact.