The platform becomes trustworthy when the same raw inputs recreate the same state, the same decisions, and the same simulated fills.
Challenge we are solving
Most backtests tell comforting stories because they skip real state, real execution, or version changes. We need validation under the same rules as production.
What this stage does
Replays the event log, rebuilds canonical state, regenerates signals and scores, compares predicted behaviour against realised behaviour, and reports parity, calibration error, slippage error, Brier score, Sharpe ratio, and drawdown.
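Two of the reported metrics can be sketched in a few lines. This is a minimal illustration, not the code in the repository: the `Forecast` type and the function names `brierScore` and `maxDrawdown` are assumptions made for this example.

```typescript
// Hypothetical shapes and names, for illustration only.
type Forecast = { p: number; outcome: 0 | 1 };

/** Brier score: mean squared difference between forecast probability and outcome. */
function brierScore(forecasts: Forecast[]): number {
  if (forecasts.length === 0) {
    throw new Error("brierScore needs at least one forecast");
  }
  let sum = 0;
  for (const f of forecasts) {
    const diff = f.p - f.outcome;
    sum += diff * diff;
  }
  return sum / forecasts.length;
}

/** Maximum drawdown: largest peak-to-trough decline, as a fraction of the peak. */
function maxDrawdown(equity: number[]): number {
  let peak = -Infinity;
  let worst = 0;
  for (const value of equity) {
    if (value > peak) peak = value;
    // Skip non-positive peaks to avoid dividing by zero or a negative number.
    if (peak > 0) worst = Math.max(worst, (peak - value) / peak);
  }
  return worst;
}
```

A lower Brier score means better calibrated probabilities; a forecast of 0.5 on every event scores 0.25 regardless of outcome, which is a useful baseline to beat.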
Why this stage exists
This is how we separate genuine edge from wishful backtests. No promotion to runtime-live without parity.
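The parity gate can be pictured as a comparison between the decisions recorded in production and the decisions regenerated by the replay. The sketch below assumes a hypothetical `Decision` shape with a unique `eventId` per decision; the real types and the promotion logic live in the repository and may differ.

```typescript
// Hypothetical decision record; field names are assumptions for this sketch.
type Decision = { eventId: string; action: "buy" | "sell" | "hold"; size: number };

interface ParityReport {
  total: number;        // number of distinct decisions seen on either side
  mismatches: string[]; // eventIds that differ or exist on one side only
  parity: number;       // fraction of matching decisions; 1 means full parity
  canPromote: boolean;  // true only when there are no mismatches
}

function checkParity(recorded: Decision[], replayed: Decision[]): ParityReport {
  const replayedById = new Map<string, Decision>();
  for (const d of replayed) replayedById.set(d.eventId, d);

  const recordedIds = new Set<string>();
  const mismatches: string[] = [];

  // A recorded decision must be reproduced exactly by the replay.
  for (const r of recorded) {
    recordedIds.add(r.eventId);
    const s = replayedById.get(r.eventId);
    if (!s || s.action !== r.action || s.size !== r.size) {
      mismatches.push(r.eventId);
    }
  }

  // A decision that appears only in the replay also breaks parity.
  let replayOnly = 0;
  for (const s of replayed) {
    if (!recordedIds.has(s.eventId)) {
      replayOnly += 1;
      mismatches.push(s.eventId);
    }
  }

  const total = recorded.length + replayOnly;
  const parity = total === 0 ? 1 : 1 - mismatches.length / total;
  return { total, mismatches, parity, canPromote: mismatches.length === 0 };
}
```

The sketch compares decisions only. The same pattern would apply to rebuilt state and simulated fills, which this stage also requires to match.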
Every formula below is implemented in packages/polytraders-bots/ or packages/polytraders-runner/. Treat the worked example as the unit-test sanity check you should be able to reproduce locally.
Worked example
\[N = 120,\quad \mathrm{RMSE} = 0.0011 \;(\approx 11\ \text{bps})\]
Stage 11 alerts if the slippage RMSE exceeds the promotion budget. This is how otherwise undetected adverse selection surfaces.
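The slippage RMSE is the root mean squared difference between predicted and realised slippage over the N fills. A minimal sketch of the calculation and the budget check follows. The `FillPair` type, the function names, and the budget value used in the usage note are assumptions for illustration, and slippage is taken to be a price fraction, so 0.0001 equals 1 bp.

```typescript
// Hypothetical record pairing the simulated slippage with the realised one.
// Both values are price fractions (0.0001 = 1 bp); this unit is an assumption.
type FillPair = { predictedSlip: number; realisedSlip: number };

/** Root mean squared error between predicted and realised slippage. */
function slippageRmse(pairs: FillPair[]): number {
  if (pairs.length === 0) {
    throw new Error("slippageRmse needs at least one fill");
  }
  let sumSq = 0;
  for (const { predictedSlip, realisedSlip } of pairs) {
    const err = predictedSlip - realisedSlip;
    sumSq += err * err;
  }
  return Math.sqrt(sumSq / pairs.length);
}

/** Convert a price fraction to basis points. */
function toBps(fraction: number): number {
  return fraction * 10000;
}

/** True when the RMSE, expressed in bps, is above the given budget. */
function exceedsBudget(rmse: number, budgetBps: number): boolean {
  return toBps(rmse) > budgetBps;
}
```

With 120 fills that each miss by 0.0011, `slippageRmse` returns about 0.0011, which is roughly 11 bps. Against a placeholder budget of 10 bps, `exceedsBudget` would return true and the alert would fire; the actual budget is whatever the promotion rules define.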
How a developer codes this stage
The reference TypeScript implementation lives in packages/polytraders-* at the repository root. Stage owners maintain these files, so read them before writing new code.