Definition of done
A 27/27 docs score does not mean a bot is done. v3 splits readiness into three independent scores; v3.5 adds a fourth. A bot is done only when all four scores are at their maximum.
The forbidden shortcut: scoring documentation completeness as readiness. v2 made this mistake. The v3 status board shows docs/impl/runtime separately so it cannot recur.
The four scores
| Score | Out of | What it means | What it does NOT mean |
|---|---|---|---|
| Docs | 27 | The spec is complete: purpose, inputs, parameters, defaults, thresholds, failure modes, wire examples, runbook, promotion gates — all 27 sections. | The bot is implemented or running. |
| Impl | 15 | The bot compiles against @polytraders/contracts, validates fixtures (normal/warning/hard/failure), emits ReportEnvelopes with reason codes from the registry, supports its declared modes (including quarantine where applicable), passes against the mock CLOB v2 adapter, and is green in CI. | The bot has been observed under live conditions. |
| Runtime | 8 | The bot has been running in shadow mode with clean telemetry, error rate under budget, latency within budget, reject rate calibrated, runbook battle-tested by an on-call rotation, dashboard live, alerts wired, and has been promoted through every gate. | That nothing will ever go wrong — only that we have evidence the bot behaves as documented. |
| Backtest | 4 | The bot runs end-to-end against the synthetic feed: a synthetic fixture exists, a recorded ReportEnvelope timeline is checked in, the parameter search space is declared, and at least one optimizer reference run is on file. See the synthetic demo and the optimizer. | That backtest results predict live PnL. They do not. Synthetic data only. |
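The rule that the four scores are independent and never summed can be sketched in a few lines. This is a minimal illustration, not the status board's actual code: the `Readiness` type and the `incompleteScores` and `isDone` helpers are hypothetical names, and it assumes a score counts as done only at its maximum from the "Out of" column.

```typescript
// Hypothetical names throughout; only the maxima come from the table above.
type ScoreName = "docs" | "impl" | "runtime" | "backtest";

const MAX_SCORE: Record<ScoreName, number> = {
  docs: 27,
  impl: 15,
  runtime: 8,
  backtest: 4,
};

type Readiness = Record<ScoreName, number>;

// Returns the scores that are still short of their maximum.
function incompleteScores(r: Readiness): ScoreName[] {
  return (Object.keys(MAX_SCORE) as ScoreName[]).filter(
    (name) => r[name] < MAX_SCORE[name],
  );
}

// Assumption: "done" means every score is at its maximum.
// The scores are checked one by one and never added together.
function isDone(r: Readiness): boolean {
  return incompleteScores(r).length === 0;
}

const docsOnly: Readiness = { docs: 27, impl: 0, runtime: 0, backtest: 0 };
console.log(isDone(docsOnly)); // false
console.log(incompleteScores(docsOnly)); // [ 'impl', 'runtime', 'backtest' ]
```

Keeping the check per score makes the forbidden shortcut hard to express: a perfect docs score cannot offset a missing impl or runtime score.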
Impl checklist (15)
- TypeScript compiles against the contracts package.
- Implements the `Bot<I,O>` interface from `@polytraders/contracts`.
- Config validates against the published `BotConfig` schema.
- Default config falls within every declared hard threshold.
- Fixture: normal-conditions input passes.
- Fixture: warning-conditions input emits the warning reason code.
- Fixture: hard-threshold input fires the safe fallback.
- Fixture: failure-injection input does not fail open.
- Emits `ReportEnvelope`s on every decision path.
- Every reason code emitted is in the registry.
- Authority is enforced: the bot can only do what its class allows.
- Every declared `mode_support` entry is reachable at runtime.
- Mock CLOB v2 adapter passes the bot's full fixture pack.
- Standard metrics (decisions_total, decision_latency_ms, heartbeat_age_ms, errors_total, config_version, mode) are emitted.
- CI is green — lint, type-check, test, schema validation, contract verification.
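Two of the items above, "does not fail open" and "every reason code emitted is in the registry", lend themselves to a small test helper. The sketch below is illustrative only. The `ReportEnvelope` shape, the registry contents, and the `failClosed` and `unregisteredCodes` helpers are all assumptions; the real types live in `@polytraders/contracts` and may look quite different.

```typescript
// Hypothetical envelope shape; the real one is defined in @polytraders/contracts.
interface ReportEnvelope {
  decision: "allow" | "reject";
  reasonCode: string;
}

// Hypothetical registry contents, for illustration only.
const REASON_REGISTRY: ReadonlySet<string> = new Set([
  "OK",
  "WARN_THRESHOLD",
  "HARD_THRESHOLD",
  "UPSTREAM_FAILURE",
]);

type Decide = (input: unknown) => ReportEnvelope;

// Wraps a decision function so that a thrown error becomes a reject,
// never an allow. Failing open would mean allowing on error.
function failClosed(decide: Decide): Decide {
  return (input) => {
    try {
      return decide(input);
    } catch {
      return { decision: "reject", reasonCode: "UPSTREAM_FAILURE" };
    }
  };
}

// Lists every emitted reason code that is missing from the registry.
function unregisteredCodes(envelopes: ReportEnvelope[]): string[] {
  return envelopes
    .map((e) => e.reasonCode)
    .filter((code) => !REASON_REGISTRY.has(code));
}

// Failure-injection fixture: the decision function throws.
const flaky: Decide = () => {
  throw new Error("feed down");
};
const envelope = failClosed(flaky)({});
console.log(envelope.decision); // reject
console.log(unregisteredCodes([envelope, { decision: "allow", reasonCode: "MADE_UP" }])); // [ 'MADE_UP' ]
```

A fixture run could collect every envelope the bot emits and assert that `unregisteredCodes` returns an empty list.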
Backtest checklist (4)
- Synthetic fixture exists in `packages/polytraders-bots/fixtures/<bot>/` and is loaded by `tests/verify.js`.
- Recorded `ReportEnvelope` timeline (deterministic by seed) is checked in alongside the fixture.
- The bot exports a `SEARCH_SPACE` object describing every tunable parameter, range, and step.
- At least one optimizer reference run (random or LLM-driven) is on file in `packages/polytraders-bots/runs/`, with the winning params and the score function it was tuned against.
Honesty rule: Backtest score does not predict live performance. It measures whether the bot is reproducibly tunable against a known feed. Anything stronger than that is a forbidden claim.
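To make the `SEARCH_SPACE` item from the backtest checklist concrete, here is one possible shape. It is a sketch under assumptions: the `min`/`max`/`step` field names, the two parameter names, and the `gridSize` and `sampleParams` helpers are hypothetical, and the seeded generator (mulberry32) simply stands in for whatever the real optimizer uses.

```typescript
// Hypothetical shape: field names and parameters are assumptions.
interface ParamRange {
  min: number;
  max: number;
  step: number;
}

// In a real bot module this object would be exported.
const SEARCH_SPACE: Record<string, ParamRange> = {
  maxSpreadBps: { min: 5, max: 50, step: 5 },
  lookbackSec: { min: 30, max: 300, step: 30 },
};

// Number of grid points a single parameter can take.
function stepsIn(r: ParamRange): number {
  return Math.floor((r.max - r.min) / r.step + 1e-9) + 1;
}

// Total number of candidate parameter sets in the grid.
function gridSize(space: Record<string, ParamRange>): number {
  return Object.values(space).reduce((n, r) => n * stepsIn(r), 1);
}

// mulberry32: a small seeded PRNG, so sampling is deterministic by seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draws one parameter set that lies on the declared grid.
function sampleParams(
  space: Record<string, ParamRange>,
  rng: () => number,
): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [name, r] of Object.entries(space)) {
    out[name] = r.min + Math.floor(rng() * stepsIn(r)) * r.step;
  }
  return out;
}

console.log(gridSize(SEARCH_SPACE)); // 100
console.log(sampleParams(SEARCH_SPACE, mulberry32(42)));
```

Running `sampleParams` twice with the same seed yields the same parameter set, which is the property that lets a reference run be reproduced.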
Runtime checklist (8)
- Shadow telemetry clean for at least 7 days.
- Rolling error rate under budget.
- p95 latency inside the declared latency budget.
- Reject rate within the calibrated band.
- Runbook battle-tested by the on-call rotation against a real (or replayed) incident.
- Dashboard live in the team Grafana / equivalent.
- Alerts wired to the correct severity tier.
- Promoted through every gate: stub → shadow → advisory → enforced.
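The numeric items in the runtime checklist and the gate order can be expressed as a small check. This is a sketch, not the promotion tooling itself: `RuntimeStats`, `RuntimeBudget`, `runtimeFailures`, and `nextGate` are hypothetical names, the budget values are invented for the example, and the nearest-rank method is just one common way to compute p95. The runbook, dashboard, and alert items need human sign-off and are not modeled here.

```typescript
// Gate order as listed in the runtime checklist.
const GATES = ["stub", "shadow", "advisory", "enforced"] as const;
type Gate = (typeof GATES)[number];

// Returns the gate after `current`, or null once the bot is enforced.
function nextGate(current: Gate): Gate | null {
  return GATES[GATES.indexOf(current) + 1] ?? null;
}

// Nearest-rank p95: the value at rank ceil(0.95 * n) in sorted order.
function p95(samples: number[]): number {
  if (samples.length === 0) throw new Error("no latency samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical shapes; all field names are assumptions.
interface RuntimeStats {
  shadowCleanDays: number;
  errorRate: number; // rolling error rate, as a fraction
  latenciesMs: number[];
  rejectRate: number; // as a fraction
}

interface RuntimeBudget {
  maxErrorRate: number;
  p95LatencyMs: number;
  rejectBand: [number, number]; // calibrated [low, high]
}

// Lists which numeric runtime items fail; an empty list means all pass.
function runtimeFailures(s: RuntimeStats, b: RuntimeBudget): string[] {
  const failures: string[] = [];
  if (s.shadowCleanDays < 7) failures.push("shadow telemetry clean for fewer than 7 days");
  if (s.errorRate >= b.maxErrorRate) failures.push("error rate not under budget");
  if (p95(s.latenciesMs) > b.p95LatencyMs) failures.push("p95 latency over budget");
  const [low, high] = b.rejectBand;
  if (s.rejectRate < low || s.rejectRate > high) failures.push("reject rate outside calibrated band");
  return failures;
}

// Example values are invented for illustration.
const budget: RuntimeBudget = { maxErrorRate: 0.01, p95LatencyMs: 50, rejectBand: [0.02, 0.1] };
const stats: RuntimeStats = {
  shadowCleanDays: 9,
  errorRate: 0.002,
  latenciesMs: [12, 15, 11, 14, 13, 18, 16, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 250],
  rejectRate: 0.05,
};
console.log(p95(stats.latenciesMs)); // 29
console.log(runtimeFailures(stats, budget)); // []
console.log(nextGate("shadow")); // advisory
```

The single 250 ms outlier does not move the p95 in this sample, which is why the checklist pairs the latency item with a separate error-rate budget.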
What we do not count
- Lines of code written.
- PRs merged.
- Hours spent.
- How many sections of the spec are filled in (this is the docs score, which is necessary but not sufficient).