Definition of done

A 27/27 docs score does not mean a bot is done. v3 splits readiness into three independent scores; v3.5 adds a fourth. A bot is done only when all four scores are complete.

The forbidden shortcut: scoring documentation completeness as readiness. v2 made this mistake. The v3 status board shows docs/impl/runtime separately so it cannot recur.

The four scores

Score	Out of	What it means	What it does NOT mean
Docs	27	The spec is complete: purpose, inputs, parameters, defaults, thresholds, failure modes, wire examples, runbook, promotion gates — all 27 sections.	The bot is implemented or running.
Impl	15	The bot compiles against `@polytraders/contracts`, validates fixtures (normal/warning/hard/failure), emits ReportEnvelopes with reason codes from the registry, supports its declared modes (including quarantine where applicable), passes against the mock CLOB v2 adapter, and is green in CI.	The bot has been observed under live conditions.
Runtime	8	The bot has been running in shadow mode with clean telemetry, error rate under budget, latency within budget, reject rate calibrated, runbook battle-tested by an on-call rotation, dashboard live, alerts wired, and has been promoted through every gate.	That nothing will ever go wrong. The score only shows that we have evidence the bot behaves as documented.
Backtest	4	The bot runs end-to-end against the synthetic feed: a synthetic fixture exists, a recorded ReportEnvelope timeline is checked in, the parameter search space is declared, and at least one optimizer reference run is on file. See the synthetic demo and the optimizer.	That backtest results predict live PnL. They do not. Synthetic data only.
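
As a minimal sketch of why the four scores stay separate, the record below never collapses them into one number. The `Readiness` shape, `isDone`, and `formatStatus` are hypothetical names for illustration, not the status board's actual code, and the sketch assumes "done" means every score is at its maximum.

```typescript
// Hypothetical shape: the maxima come from the table above, the field
// names and functions are illustrative only.
interface Readiness {
  docs: number;     // out of 27
  impl: number;     // out of 15
  runtime: number;  // out of 8
  backtest: number; // out of 4
}

const KEYS: (keyof Readiness)[] = ["docs", "impl", "runtime", "backtest"];
const MAX: Readiness = { docs: 27, impl: 15, runtime: 8, backtest: 4 };

// Assumption: "done" means every score is at full marks.
// A perfect docs score alone never satisfies this.
function isDone(r: Readiness): boolean {
  return KEYS.every((k) => r[k] === MAX[k]);
}

// Render the scores side by side instead of summing them.
function formatStatus(r: Readiness): string {
  return KEYS.map((k) => `${k} ${r[k]}/${MAX[k]}`).join(" | ");
}
```

Under these assumptions, `isDone({ docs: 27, impl: 0, runtime: 0, backtest: 0 })` is false, which is exactly the v2 mistake the separate scores are meant to prevent.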

Impl checklist (15)

TypeScript compiles against the contracts package.
Implements the Bot<I,O> interface from @polytraders/contracts.
Config validates against the published BotConfig schema.
Default config falls within every declared hard threshold.
Fixture: normal-conditions input passes.
Fixture: warning-conditions input emits the warning reason code.
Fixture: hard-threshold input fires the safe fallback.
Fixture: failure-injection input does not fail open.
Emits ReportEnvelopes on every decision path.
Every reason code emitted is in the registry.
Authority is enforced: the bot can only do what its class allows.
Every declared mode_support entry is reachable at runtime.
Mock CLOB v2 adapter passes the bot's full fixture pack.
Standard metrics (decisions_total, decision_latency_ms, heartbeat_age_ms, errors_total, config_version, mode) are emitted.
CI is green — lint, type-check, test, schema validation, contract verification.
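
Two of the items above lend themselves to mechanical checks. The sketch below is one possible way to write them, not the project's verifier. `EnvelopeLike` is a hypothetical stand-in for the real ReportEnvelope type in `@polytraders/contracts`, whose fields are not shown here, and both function names are illustrative. The metric names are the ones listed in the checklist.

```typescript
// Standard metric names, as listed in the Impl checklist.
const STANDARD_METRICS = [
  "decisions_total",
  "decision_latency_ms",
  "heartbeat_age_ms",
  "errors_total",
  "config_version",
  "mode",
] as const;

// Hypothetical minimal envelope shape (assumption): only the reason code
// matters for this check.
interface EnvelopeLike {
  reasonCode: string;
}

// Returns every emitted reason code that is absent from the registry,
// de-duplicated and sorted. An empty result means the item passes.
function unregisteredReasonCodes(
  envelopes: readonly EnvelopeLike[],
  registry: ReadonlySet<string>,
): string[] {
  const unknown = new Set<string>();
  for (const e of envelopes) {
    if (!registry.has(e.reasonCode)) unknown.add(e.reasonCode);
  }
  return Array.from(unknown).sort();
}

// Returns the standard metrics that the bot did not emit, in checklist order.
function missingMetrics(emitted: readonly string[]): string[] {
  return STANDARD_METRICS.filter((m) => emitted.indexOf(m) === -1);
}
```

Returning the offending names rather than a boolean keeps a CI failure message specific about what is missing.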

Backtest checklist (4)

Synthetic fixture exists in packages/polytraders-bots/fixtures/<bot>/ and is loaded by tests/verify.js.
Recorded ReportEnvelope timeline (deterministic by seed) is checked in alongside the fixture.
The bot exports a SEARCH_SPACE object describing every tunable parameter, range, and step.
At least one optimizer reference run (random or LLM-driven) is on file in packages/polytraders-bots/runs/, with the winning params and the score function it was tuned against.
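
To make the SEARCH_SPACE item concrete, here is a sketch of what such an object and a seeded random draw from it might look like. The source only says the object describes each tunable parameter, range, and step, so the `ParamRange` shape, the parameter names `maxSpreadBps` and `cooldownMs`, and `sampleParams` are all assumptions for illustration. The generator is mulberry32, a small public-domain PRNG, used here only so that the same seed always yields the same draw.

```typescript
// Hypothetical shape (assumption): one entry per tunable parameter.
interface ParamRange {
  min: number;
  max: number;
  step: number;
}
type SearchSpace = Record<string, ParamRange>;

// Illustrative parameters only. A real bot module would export this object.
const SEARCH_SPACE: SearchSpace = {
  maxSpreadBps: { min: 5, max: 50, step: 5 },
  cooldownMs: { min: 100, max: 1000, step: 100 },
};

// mulberry32: small seeded PRNG returning values in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draws one value per parameter, always on the declared step grid.
// Parameters are visited in sorted name order so the draw depends only on
// the seed, not on object key order.
function sampleParams(space: SearchSpace, seed: number): Record<string, number> {
  const rand = mulberry32(seed);
  const out: Record<string, number> = {};
  const entries = Object.entries(space).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  for (const [name, range] of entries) {
    const steps = Math.floor((range.max - range.min) / range.step);
    out[name] = range.min + Math.floor(rand() * (steps + 1)) * range.step;
  }
  return out;
}
```

Calling `sampleParams(SEARCH_SPACE, 42)` twice returns identical parameters, which is the reproducibility property the backtest score measures. It says nothing about live performance.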

Honesty rule: Backtest score does not predict live performance. It measures whether the bot is reproducibly tunable against a known feed. Anything stronger than that is a forbidden claim.

Runtime checklist (8)

Shadow telemetry clean for at least 7 days.
Rolling error rate under budget.
p95 latency inside the declared latency budget.
Reject rate within the calibrated band.
Runbook battle-tested by the on-call rotation against a real (or replayed) incident.
Dashboard live in the team Grafana / equivalent.
Alerts wired to the correct severity tier.
Promoted through every gate: stub → shadow → advisory → enforced.
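
The gate order and the p95 latency item can be sketched as small checks. This is an illustration under assumptions, not the promotion tooling itself. The gate names come from the checklist. `canPromote`, `p95`, and `withinLatencyBudget` are hypothetical names. The sketch assumes gates are passed one at a time with no skipping, and that p95 is computed by the nearest-rank method.

```typescript
// Gate order as listed in the Runtime checklist.
const GATES = ["stub", "shadow", "advisory", "enforced"] as const;
type Gate = (typeof GATES)[number];

// Assumption: a promotion moves exactly one gate forward, so no gate is
// skipped and demotions are not promotions.
function canPromote(from: Gate, to: Gate): boolean {
  return GATES.indexOf(to) === GATES.indexOf(from) + 1;
}

// Nearest-rank 95th percentile: the smallest sample such that at least
// 95% of samples are less than or equal to it.
function p95(samplesMs: readonly number[]): number {
  if (samplesMs.length === 0) throw new Error("no latency samples");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100); // integer math avoids float drift
  return sorted[rank - 1];
}

// True when p95 latency is inside the declared budget (inclusive).
function withinLatencyBudget(samplesMs: readonly number[], budgetMs: number): boolean {
  return p95(samplesMs) <= budgetMs;
}
```

For example, `canPromote("stub", "advisory")` is false under the no-skipping assumption, so a bot cannot reach advisory without shadow evidence.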

What we do not count

Lines of code written.
PRs merged.
Hours spent.
How many sections of the spec are filled in (this is the docs score, which is necessary but not sufficient).