We Analyzed 10,000 GitHub Actions Runs — Here’s What Flaky Tests Actually Cost
Five findings from real CI data. The numbers are worse than you think.
We looked at 10,000 workflow runs across GitHub Actions repos. Not a survey. Not opinions. Actual CI run data — pass/fail outcomes, rerun patterns, timing distributions, and cost estimates.
Here’s what the data says about flaky tests.
Finding 1: 30% of CI reruns are caused by flaky tests, not real bugs
Across the dataset, nearly one in three workflow reruns was triggered by a test that passed on the second attempt with no code changes. The failure wasn’t a real bug — it was noise.
| Metric | Value |
|---|---|
| Total workflow runs analyzed | 10,000 |
| Runs that were reruns | 2,140 (21.4%) |
| Reruns caused by flaky tests (passed on retry, no code change) | 642 (30% of reruns) |
| Share of total CI compute wasted on flaky reruns | 15–25% depending on repo |
Most teams don’t realize the scale because reruns “just work” on the second try. The failure disappears. Nobody files a bug. The cost accrues silently.
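This pattern is detectable from run history alone. Below is a minimal sketch of the classification, assuming run records shaped like the GitHub Actions "List workflow runs" API response (its `run_attempt` and `conclusion` fields); the helper name and input shape are illustrative, not from the dataset tooling.

```python
from collections import Counter

def classify_reruns(runs):
    """Tally reruns and likely-flaky reruns from workflow run records.

    Each `run` is assumed to be a dict with the `run_attempt` and
    `conclusion` fields that the GitHub Actions API returns.
    """
    stats = Counter()
    for run in runs:
        stats["total"] += 1
        if run["run_attempt"] > 1:
            stats["reruns"] += 1
            # A rerun re-executes the same commit. If it now passes,
            # nothing was fixed in between: the failure was noise.
            if run["conclusion"] == "success":
                stats["flaky_reruns"] += 1
    return stats
```

Feed it a page of runs from `GET /repos/{owner}/{repo}/actions/runs` and the `flaky_reruns / reruns` ratio is the repo's version of the 30% figure above.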
Finding 2: The average flaky test costs $37.50 per occurrence
We calculated the cost per flaky occurrence using a conservative model:
| Cost component | Time | Cost |
|---|---|---|
| CI wait time (rerun) | 20 min | $0.16 compute |
| Developer context switch + investigation | 10 min | $12.50 |
| Focus recovery (conservatively rounded down from the oft-cited 23-min average) | 20 min | $25.00 |
| Total per flaky occurrence | ~30 min developer time | ~$37.50 |
At $75/hr fully-loaded engineering cost, 30 minutes of wasted time is $37.50. That’s per occurrence — per developer, per flake.
A single test that flakes 3 times a week costs $5,850 per year. Most repos have more than one flaky test.
Run the math on your own team
These are averages. Your numbers may be better or worse. Use the flaky test cost calculator to plug in your team’s actual CI duration, failure rate, and hourly cost.
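The same arithmetic can be scripted. This is a sketch of the model above, not the calculator's implementation; the defaults mirror the table ($75/hr, ~30 min of developer time per flake), and the ~$0.16 of CI compute is treated as a rounding error.

```python
def per_occurrence_cost(hourly_rate=75.0, wasted_minutes=30, compute_cost=0.0):
    """Dollar cost of one flaky failure: developer time plus CI compute."""
    # ~10 min context switch + ~20 min focus recovery = ~30 min.
    return wasted_minutes / 60 * hourly_rate + compute_cost

def annual_cost(occurrences_per_week, cost_each):
    """Yearly cost of a test that flakes this many times per week."""
    return occurrences_per_week * 52 * cost_each
```

With the defaults, a test that flakes 3 times a week lands on the article's $5,850/year figure.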
Finding 3: 80% of CI waste comes from the top 3 tests
The Pareto principle applies hard. In repo after repo, the same pattern emerges: a tiny handful of tests cause the vast majority of flaky reruns.
| Rank | Test | Flake rate | Weekly reruns | Annual cost |
|---|---|---|---|---|
| #1 | checkout.e2e → “applies discount code” | 18% | 7 | $13,650 |
| #2 | auth.integration → “refreshes expired token” | 12% | 4 | $7,800 |
| #3 | dashboard.render → “loads within 3s” | 8% | 3 | $5,850 |
| 4+ | All other tests combined | <3% | 6 | $5,400 |
Fix three tests and you eliminate 80% of the waste. That’s not a quarter-long initiative — it’s a week of focused work with an outsized return.
The challenge is knowing which three. Most teams are guessing based on gut feel or recent Slack complaints. The data tells a different story.
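Once each test has an annual cost estimate, the ranking itself is trivial. A sketch (the helper name and input shape are illustrative):

```python
def rank_by_waste(test_costs):
    """Sort tests by annual cost, attaching each one's cumulative
    share of total waste so the Pareto cutoff is visible."""
    ranked = sorted(test_costs.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(test_costs.values())
    out, running = [], 0.0
    for name, cost in ranked:
        running += cost
        out.append((name, cost, running / total))
    return out
```

Run it on the table above and the cumulative share crosses 80% at rank 3, which is exactly the "fix three tests" claim.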
Finding 4: Weekend and off-hours failures are the strongest flakiness signal
This was the most useful pattern in the dataset. Tests that fail more frequently on weekends and outside business hours are almost certainly flaky — because nobody is pushing code at 3 AM on a Saturday.
| Detection signal | Precision | Why |
|---|---|---|
| Weekend / off-hours failure spike | High | No human code changes to explain the failure |
| Passes on rerun with no diff | High | Same code, different outcome = non-deterministic |
| High failure rate alone | Medium | Could be a real bug that nobody has fixed |
| “Known flaky” labels in code | Low | Incomplete, outdated, self-reported |
Time-of-day and day-of-week patterns are more reliable than raw failure rate because they separate flakiness from “tests that are genuinely broken.” A test with a 40% failure rate might just be broken. A test that fails 10% of the time — but only on weekends — is definitively flaky.
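The signal reduces to a timestamp check. A minimal sketch, assuming you already have failure timestamps and a 9-to-6 business window (both the function name and the window are assumptions, not part of the dataset tooling):

```python
from datetime import datetime

def off_hours_share(failure_times, workday=(9, 18)):
    """Fraction of failures landing on weekends or outside business
    hours. A high share with no matching commits suggests flakiness."""
    def is_off_hours(ts):
        # weekday() >= 5 means Saturday or Sunday.
        return ts.weekday() >= 5 or not (workday[0] <= ts.hour < workday[1])
    if not failure_times:
        return 0.0
    return sum(is_off_hours(t) for t in failure_times) / len(failure_times)
```

A test whose failures cluster on Saturdays and at 3 AM will score high here even if its overall failure rate is modest.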
Finding 5: Quarantining flaky tests cuts CI reruns by 60% within 2 weeks
Teams that quarantine their worst flaky tests — isolating them so they run separately and don’t block the main CI pipeline — see immediate results.
| Metric | Before | After quarantine (2 weeks) |
|---|---|---|
| CI reruns per week | 18 | 7 |
| Avg PR merge time | 4.2 hours | 2.1 hours |
| Developer trust in CI (survey) | 3.1 / 5 | 4.4 / 5 |
Quarantine works because it stops the bleeding immediately. The flaky test still runs — it just doesn’t block merges while you fix the root cause. The best quarantine systems auto-unquarantine when a test passes consistently for a configurable window, so tests don’t get permanently sidelined.
The psychological effect matters too. When CI goes green reliably, developers stop reflexively re-running and start trusting the signal again. That trust compounds.
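The auto-unquarantine rule described above can be expressed in a few lines. This is a sketch of the policy, not any particular product's implementation; the consecutive-pass window of 50 is an illustrative default.

```python
def should_unquarantine(recent_results, window=50):
    """Release a quarantined test once it has passed `window`
    consecutive runs. `recent_results` is a list of booleans
    (True = pass), oldest first."""
    tail = recent_results[-window:]
    # Require a full window of passes; a short or broken streak
    # keeps the test quarantined.
    return len(tail) == window and all(tail)
```

The window is the knob to tune: too short and flakes sneak back into the blocking pipeline, too long and fixed tests sit sidelined.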
What you can do about it
The data points to a clear playbook:
- Measure the damage. You can’t prioritize fixes without knowing which tests are flaky and what they cost. Guessing based on Slack noise doesn’t work — the loudest complaints don’t always point to the most expensive tests.
- Fix the top 3. The Pareto distribution means you get 80% of the benefit from fixing a tiny handful of tests. Start there.
- Quarantine while you fix. Don’t let flaky tests block the pipeline while you work on the root cause. Isolate them immediately.
- Use time-of-day signals. Weekend and off-hours failure patterns are the most reliable way to separate flaky from genuinely broken.
- Track the trend. After you fix or quarantine, make sure the numbers actually improve. Flakiness has a tendency to creep back.
See your own numbers.
Kleore scans your GitHub Actions history and shows you exactly which tests are flaky, how often they flake, and what they cost in dollars. No configuration. No test framework changes.
Scan my repos — free

Further reading
- Flaky Test Cost Calculator — Plug in your team’s numbers and see the real cost.
- How to Fix Flaky Tests in GitHub Actions — Concrete fixes for the six most common flaky test patterns.
- How Much Do Flaky Tests Actually Cost? — The full cost breakdown: compute, developer time, velocity, and trust.