Revenue

21st Jun 2026

How testpath makes money

We sell the statistics and the CI gate, not the compute. testpath wraps the customer’s existing eval, runs it N times, and returns a noise-aware red/green — the product principles layer. The thing worth paying for is the verdict and the history behind it, not the LLM calls underneath. That single choice shapes the whole model.

Bring-your-own-key (the margin decision)

The tests run against an LLM, which costs tokens. We do not want to be in the middle of that spend:

BYO key — the customer runs evals on their own LLM account; testpath orchestrates, repeats, and does the stats. Our cost of goods is hosting + storing run metadata → software-grade gross margin (≈ 85–90%).
Proxy the tokens — we’d resell compute at a markup: real COGS, thin margin, and we’d be a token middleman in a commodity race. Hard no.

Reselling tokens turns a SaaS business into an arbitrage business. BYO key keeps us a software business, which is the only kind a two-person team should bootstrap.

The model: open-core + tiered SaaS

The methods are public (required reading, Anthropic & AISI), so we can’t moat the math — we open-source it and monetize the product around it.

Free — OSS, local. The CLI runner + stats library. Runs locally, BYO key, $0. This is distribution and credibility, not charity: dev tools win by getting into the workflow first. (promptfoo, LangSmith, Braintrust all run this play.)
Team — ≈ $500/mo (the anchor). The hosted layer: a regression gate wired into CI, run history, flaky-test tracking, dashboards, a handful of seats and agents. $500/mo is the ICP price — a team running a support agent in production that can pay without a procurement cycle.
Enterprise — custom (4–5 figures/mo). SSO/SAML, VPC or on-prem, SLA, support, unlimited agents. Later, and only when inbound pulls us there.

What we meter on

Not per-eval. Metering each run punishes the exact behavior we want (running tests many times) and breeds bill anxiety — and our cost-aware sequential testing already lowers their compute, which is a far better story than charging them for it.

Charge per agent / project (the natural unit of “a thing you regression-test”) plus seats, with generous run limits. It lands small (one agent) and expands as they add agents and teammates — without a new contract each time.

Price to the pain, not to the cost

A silent regression shipped to a production support agent is expensive: bad answers, escalations, churned customers, a frantic rollback. The teardown shows nobody can even detect it. Against that, $500/mo is a rounding error — so we price to the cost of the failure we prevent, not to our hosting bill.

A back-of-envelope

For a bootstrapped two-founder target:

\text{ARR} = N \times \text{ARPA}

$\text{ARR}$ — annual recurring revenue
$N$ — number of paying accounts
$\text{ARPA}$ — average revenue per account ≈ $6,000/yr (the $500/mo Team tier)

So $300k ARR ≈ 50 paying accounts — a comfortable two-person bootstrap with ≈ 85% gross margin and near-zero CAC (cold email is time, not ad spend). The lead list is already in the dozens; the funnel math is a small-N problem, not a growth-hacking one. The expansion lever on top: a builder partnership (referral or “powered by testpath” rev-share) reaches a whole customer base at once — higher leverage than 1:1 outreach, but only after direct sales prove the pitch.

Honest risks

No moat on the math. It’s published. The moat is product polish, CI stickiness, and accumulated regression history — once testpath is the gate on someone’s pipeline, the switching cost is their entire test record.
Open-core tension. OSS too generous → no reason to pay; too thin → no adoption. The paid line is the hosted gate + history + team features, never the core algorithm.
Slow land. Dev-tool adoption is gradual — but sticky once embedded. The bootstrap model survives a slow land; a venture burn rate would not.